aboutsummaryrefslogtreecommitdiffstatshomepage
AgeCommit message (Collapse)AuthorFilesLines
2014-11-28KVM: s390: allow injecting all kinds of machine checksJens Freimann1-3/+11
Allow to specify CR14, logout area, external damage code and failed storage address. Since more then one machine check can be indicated to the guest at a time we need to combine all indication bits with already pending requests. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: handle pending local interrupts via bitmapJens Freimann6-282/+380
This patch adapts handling of local interrupts to be more compliant with the z/Architecture Principles of Operation and introduces a data structure which allows more efficient handling of interrupts. * get rid of li->active flag, use bitmap instead * Keep interrupts in a bitmap instead of a list * Deliver interrupts in the order of their priority as defined in the PoP * Use a second bitmap for sigp emergency requests, as a CPU can have one request pending from every other CPU in the system. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: add bitmap for handling cpu-local interruptsJens Freimann1-0/+86
Adds a bitmap to the vcpu structure which is used to keep track of local pending interrupts. Also add enum with all interrupt types sorted in order of priority (highest to lowest) Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: refactor interrupt delivery codeJens Freimann1-177/+282
Move delivery code for cpu-local interrupt from the huge do_deliver_interrupt() to smaller functions which handle one type of interrupt. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: add defines for virtio and pfault interrupt codeJens Freimann1-2/+4
Get rid of open coded value for virtio and pfault completion interrupts. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: external param not valid for cpu timer and ckcDavid Hildenbrand1-4/+2
The 32bit external interrupt parameter is only valid for timing-alert and service-signal interrupts. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: refactor interrupt injection codeJens Freimann1-54/+167
In preparation for the rework of the local interrupt injection code, factor out injection routines from kvm_s390_inject_vcpu(). Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: S390: Create helper function get_guest_storage_keyJason J. Herne2-0/+40
Define get_guest_storage_key which can be used to get the value of a guest storage key. This compliments the functionality provided by the helper function set_guest_storage_key. Both functions are needed for live migration of s390 guests that use storage keys. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: trigger the right CPU exit for floating interruptsChristian Borntraeger1-1/+11
When injecting a floating interrupt and no CPU is idle we kick one CPU to do an external exit. In case of I/O we should trigger an I/O exit instead. This does not matter for Linux guests as external and I/O interrupts are enabled/disabled at the same time, but play safe anyway. The same holds true for machine checks. Since there is no special exit, just reuse the generic stop exit. The injection code inside the VCPU loop will recheck anyway and rearm the proper exits (e.g. control registers) if necessary. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2014-11-28KVM: s390: Fix rewinding of the PSW pointing to an EXECUTE instructionThomas Huth4-13/+23
A couple of our interception handlers rewind the PSW to the beginning of the instruction to run the intercepted instruction again during the next SIE entry. This normally works fine, but there is also the possibility that the instruction did not get run directly but via an EXECUTE instruction. In this case, the PSW does not point to the instruction that caused the interception, but to the EXECUTE instruction! So we've got to rewind the PSW to the beginning of the EXECUTE instruction instead. This is now accomplished with a new helper function kvm_s390_rewind_psw(). Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: Small fixes for the PFMF handlerThomas Huth1-4/+7
This patch includes two small fixes for the PFMF handler: First, the start address for PFMF has to be masked according to the current addressing mode, which is now done with kvm_s390_logical_to_effective(). Second, the protection exceptions have a lower priority than the specification exceptions, so the check for low-address protection has to be moved after the last spot where we inject a specification exception. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-23kvm: x86: mask out XSAVESPaolo Bonzini1-1/+10
This feature is not supported inside KVM guests yet, because we do not emulate MSR_IA32_XSS. Mask it out. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-23kvm: x86: move assigned-dev.c and iommu.c to arch/x86/Radim Krčmář7-30/+25
Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-21kvm: remove IA64 ioctlsRadim Krčmář1-3/+0
KVM ia64 is no longer present so new applications shouldn't use them. The main problem is that they most likely didn't work even before, because of a conflict in the #defines: #define KVM_SET_GUEST_DEBUG _IOW(KVMIO, 0x9b, struct kvm_guest_debug) #define KVM_IA64_VCPU_SET_STACK _IOW(KVMIO, 0x9b, void *) The argument to KVM_SET_GUEST_DEBUG is: struct kvm_guest_debug { __u32 control; __u32 pad; struct kvm_guest_debug_arch arch; }; struct kvm_guest_debug_arch { }; meaning that sizeof(struct kvm_guest_debug) == sizeof(void *) == 8 and KVM_SET_GUEST_DEBUG == KVM_IA64_VCPU_SET_STACK. KVM_SET_GUEST_DEBUG is handled in virt/kvm/kvm_main.c before even calling kvm_arch_vcpu_ioctl (which would have handled KVM_IA64_VCPU_SET_STACK), so KVM_IA64_VCPU_SET_STACK would just return -EINVAL. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-21kvm: remove CONFIG_X86 #ifdefs from files formerly shared with ia64Radim Krcmar2-24/+2
Signed-off-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-21kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/Paolo Bonzini9-30/+29
ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-20kvm: Documentation: remove ia64Tiejun Chen2-106/+22
kvm/ia64 is gone, clean up Documentation too. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-20KVM: ia64: removePaolo Bonzini31-13272/+0
KVM for ia64 has been marked as broken not just once, but twice even, and the last patch from the maintainer is now roughly 5 years old. Time for it to rest in peace. Acked-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: x86: Remove FIXMEs in emulate.cNicholas Krause1-4/+0
Remove FIXME comments about needing fault addresses to be returned. These are propaagated from walk_addr_generic to gva_to_gpa and from there to ops->read_std and ops->write_std. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: emulator: remove duplicated limit checkPaolo Bonzini1-9/+4
The check on the higher limit of the segment, and the check on the maximum accessible size, is the same for both expand-up and expand-down segments. Only the computation of "lim" varies. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: emulator: remove code duplication in register_address{,_increment}Paolo Bonzini1-12/+11
register_address has been a duplicate of address_mask ever since the ancestor of __linearize was born in 90de84f50b42 (KVM: x86 emulator: preserve an operand's segment identity, 2010-11-17). However, we can put it to a better use by including the call to reg_read in register_address. Similarly, the call to reg_rmw can be moved to register_address_increment. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: x86: Move __linearize masking of la into switchNadav Amit1-2/+1
In __linearize there is check of the condition whether to check if masking of the linear address is needed. It occurs immediately after switch that evaluates the same condition. Merge them. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: x86: Non-canonical access using SS should cause #SSNadav Amit1-1/+1
When SS is used using a non-canonical address, an #SS exception is generated on real hardware. KVM emulator causes a #GP instead. Fix it to behave as real x86 CPU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: x86: Perform limit checks when assigning EIPNadav Amit1-41/+54
If branch (e.g., jmp, ret) causes limit violations, since the target IP > limit, the #GP exception occurs before the branch. In other words, the RIP pushed on the stack should be that of the branch and not that of the target. To do so, we can call __linearize, with new EIP, which also saves us the code which performs the canonical address checks. On the case of assigning an EIP >= 2^32 (when switching cs.l), we also safe, as __linearize will check the new EIP does not exceed the limit and would trigger #GP(0) otherwise. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: x86: Emulator performs privilege checks on __linearizeNadav Amit1-15/+0
When segment is accessed, real hardware does not perform any privilege level checks. In contrast, KVM emulator does. This causes some discrepencies from real hardware. For instance, reading from readable code segment may fail due to incorrect segment checks. In addition, it introduces unnecassary overhead. To reference Intel SDM 5.5 ("Privilege Levels"): "Privilege levels are checked when the segment selector of a segment descriptor is loaded into a segment register." The SDM never mentions privilege level checks during memory access, except for loading far pointers in section 5.10 ("Pointer Validation"). Those are actually segment selector loads and are emulated in the similarily (i.e., regardless to __linearize checks). This behavior was also checked using sysexit. A data-segment whose DPL=0 was loaded, and after sysexit (CPL=3) it is still accessible. Therefore, all the privilege level checks in __linearize are removed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: x86: Stack size is overridden by __linearizeNadav Amit1-4/+5
When performing segmented-read/write in the emulator for stack operations, it ignores the stack size, and uses the ad_bytes as indication for the pointer size. As a result, a wrong address may be accessed. To fix this behavior, we can remove the masking of address in __linearize and perform it beforehand. It is already done for the operands (so currently it is inefficiently done twice). It is missing in two cases: 1. When using rip_relative 2. On fetch_bit_operand that changes the address. This patch masks the address on these two occassions, and removes the masking from __linearize. Note that it does not mask EIP during fetch. In protected/legacy mode code fetch when RIP >= 2^32 should result in #GP and not wrap-around. Since we make limit checks within __linearize, this is the expected behavior. Partial revert of commit 518547b32ab4 (KVM: x86: Emulator does not calculate address correctly, 2014-09-30). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-19KVM: x86: Revert NoBigReal patch in the emulatorNadav Amit1-7/+1
Commit 10e38fc7cab6 ("KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode") introduced NoBigReal for instructions such as MONITOR. Apparetnly, the Intel SDM description that led to this patch is misleading. Since no instruction is using NoBigReal, it is safe to remove it, we fully understand what the SDM means. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-18kvm: x86: vmx: remove MMIO_MAX_GENTiejun Chen1-4/+3
MMIO_MAX_GEN is the same as MMIO_GEN_MASK. Use only one. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-18kvm: x86: vmx: cleanup handle_ept_violationTiejun Chen1-3/+3
Instead, just use PFERR_{FETCH, PRESENT, WRITE}_MASK inside handle_ept_violation() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17KVM: x86: Fix lost interrupt on irr_pending raceNadav Amit1-6/+12
apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is set. If this assumption is broken and apicv is disabled, the injection of interrupts may be deferred until another interrupt is delivered to the guest. Ultimately, if no other interrupt should be injected to that vCPU, the pending interrupt may be lost. commit 56cc2406d68c ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use") changed the behavior of apic_clear_irr so irr_pending is cleared after setting APIC_IRR vector. After this commit, if apic_set_irr and apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR vector set, and irr_pending cleared. In the following example, assume a single vector is set in IRR prior to calling apic_clear_irr: apic_set_irr apic_clear_irr ------------ -------------- apic->irr_pending = true; apic_clear_vector(...); vec = apic_search_irr(apic); // => vec == -1 apic_set_vector(...); apic->irr_pending = (vec != -1); // => apic->irr_pending == false Nonetheless, it appears the race might even occur prior to this commit: apic_set_irr apic_clear_irr ------------ -------------- apic->irr_pending = true; apic->irr_pending = false; apic_clear_vector(...); if (apic_search_irr(apic) != -1) apic->irr_pending = true; // => apic->irr_pending == false apic_set_vector(...); Fixing this issue by: 1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending. 2. On apic_set_irr: first call apic_set_vector, then set irr_pending. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17KVM: compute correct map even if all APICs are software disabledPaolo Bonzini1-10/+15
Logical destination mode can be used to send NMI IPIs even when all APICs are software disabled, so if all APICs are software disabled we should still look at the DFRs. So the DFRs should all be the same, even if some or all APICs are software disabled. However, the SDM does not say this, so tweak the logic as follows: - if one APIC is enabled and has LDR != 0, use that one to build the map. This picks the right DFR in case an OS is only setting it for the software-enabled APICs, or in case an OS is using logical addressing on some APICs while leaving the rest in reset state (using LDR was suggested by Radim). - if all APICs are disabled, pick a random one to build the map. We use the last one with LDR != 0 for simplicity. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17KVM: x86: Software disabled APIC should still deliver NMIsNadav Amit1-8/+15
Currently, the APIC logical map does not consider VCPUs whose local-apic is software-disabled. However, NMIs, INIT, etc. should still be delivered to such VCPUs. Therefore, the APIC mode should first be determined, and then the map, considering all VCPUs should be constructed. To address this issue, first find the APIC mode, and only then construct the logical map. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17kvm: simplify update_memslots invocationPaolo Bonzini1-14/+6
The update_memslots invocation is only needed in one case. Make the code clearer by moving it to __kvm_set_memory_region, and removing the wrapper around insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-17kvm: commonize allocation of the new memory slotsPaolo Bonzini1-17/+11
The two kmemdup invocations can be unified. I find that the new placement of the comment makes it easier to see what happens. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-14kvm: memslots: track id_to_index changes during the insertion sortPaolo Bonzini1-19/+19
This completes the optimization from the previous patch, by removing the KVM_MEM_SLOTS_NUM-iteration loop from insert_memslot. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-14kvm: memslots: replace heap sort with an insertion sort passIgor Mammedov1-28/+28
memslots is a sorted array. When a slot is changed, heapsort (lib/sort.c) would take O(n log n) time to update it; an optimized insertion sort will only cost O(n) on an array with just one item out of order. Replace sort() with a custom sort that takes advantage of memslots usage pattern and the known position of the changed slot. performance change of 128 memslots insertions with gradually increasing size (the worst case): heap sort custom sort max: 249747 2500 cycles with custom sort alg taking ~98% less then original update time. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-14kvm: x86: increase user memory slots to 509Igor Mammedov1-1/+1
With the 3 private slots, this gives us 512 slots total. Motivation for this is in addition to assigned devices support more memory hotplug slots, where 1 slot is used by a hotplugged memory stick. It will allow to support upto 256 hotplug memory slots and leave 253 slots for assigned devices and other devices that use them. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-13kvm: svm: move WARN_ON in svm_adjust_tsc_offsetChris J Arges2-4/+6
When running the tsc_adjust kvm-unit-test on an AMD processor with the IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc is called; however it may trigger unnecessary warnings. This patch moves the WARN_ON to trigger only if __scale_tsc will actually be called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common s64 since this can have signed values. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-12x86, kvm, vmx: Don't set LOAD_IA32_EFER when host and guest matchAndy Lutomirski1-1/+3
There's nothing to switch if the host and guest values are the same. I am unable to find evidence that this makes any difference whatsoever. Signed-off-by: Andy Lutomirski <luto@amacapital.net> [I could see a difference on Nehalem. From 5 runs: userspace exit, guest!=host 12200 11772 12130 12164 12327 userspace exit, guest=host 11983 11780 11920 11919 12040 lightweight exit, guest!=host 3214 3220 3238 3218 3337 lightweight exit, guest=host 3178 3193 3193 3187 3220 This passes the t-test with 99% confidence for userspace exit, 98.5% confidence for lightweight exit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-12x86, kvm, vmx: Always use LOAD_IA32_EFER if availableAndy Lutomirski1-2/+8
At least on Sandy Bridge, letting the CPU switch IA32_EFER is much faster than switching it manually. I benchmarked this using the vmexit kvm-unit-test (single run, but GOAL multiplied by 5 to do more iterations): Test Before After Change cpuid 2000 1932 -3.40% vmcall 1914 1817 -5.07% mov_from_cr8 13 13 0.00% mov_to_cr8 19 19 0.00% inl_from_pmtimer 19164 10619 -44.59% inl_from_qemu 15662 10302 -34.22% inl_from_kernel 3916 3802 -2.91% outl_to_kernel 2230 2194 -1.61% mov_dr 172 176 2.33% ipi (skipped) (skipped) ipi+halt (skipped) (skipped) ple-round-robin 13 13 0.00% wr_tsc_adjust_msr 1920 1845 -3.91% rd_tsc_adjust_msr 1892 1814 -4.12% mmio-no-eventfd:pci-mem 16394 11165 -31.90% mmio-wildcard-eventfd:pci-mem 4607 4645 0.82% mmio-datamatch-eventfd:pci-mem 4601 4610 0.20% portio-no-eventfd:pci-io 11507 7942 -30.98% portio-wildcard-eventfd:pci-io 2239 2225 -0.63% portio-datamatch-eventfd:pci-io 2250 2234 -0.71% I haven't explicitly computed the significance of these numbers, but this isn't subtle. Signed-off-by: Andy Lutomirski <luto@amacapital.net> [The results were reproducible on all of Nehalem, Sandy Bridge and Ivy Bridge. The slowness of manual switching is because writing to EFER with WRMSR triggers a TLB flush, even if the only bit you're touching is SCE (so the page table format is not affected). Doing the write as part of vmentry/vmexit, instead, does not flush the TLB, probably because all processors that have EPT also have VPID. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-10KVM: x86: fix warning on 32-bit compilationPaolo Bonzini1-0/+2
PCIDs are only supported in 64-bit mode. No need to clear bit 63 of CR3 unless the host is 64-bit. Reported by Fengguang Wu's autobuilder. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08kvm: x86: add trace event for pvclock updatesDavid Matlack2-0/+39
The new trace event records: * the id of vcpu being updated * the pvclock_vcpu_time_info struct being written to guest memory This is useful for debugging pvclock bugs, such as the bug fixed by "[PATCH] kvm: x86: Fix kvm clock versioning.". Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08kvm: x86: Fix kvm clock versioning.Owen Hofmann1-5/+5
kvm updates the version number for the guest paravirt clock structure by incrementing the version of its private copy. It does not read the guest version, so will write version = 2 in the first update for every new VM, including after restoring a saved state. If guest state is saved during reading the clock, it could read and accept struct fields and guest TSC from two different updates. This changes the code to increment the guest version and write it back. Signed-off-by: Owen Hofmann <osh@google.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08KVM: x86: MOVNTI emulation min opsize is not respectedNadav Amit1-7/+7
Commit 3b32004a66e9 ("KVM: x86: movnti minimum op size of 32-bit is not kept") did not fully fix the minimum operand size of MONTI emulation. Still, MOVNTI may be mistakenly performed using 16-bit opsize. This patch add No16 flag to mark an instruction does not support 16-bits operand size. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08KVM: x86: update masterclock values on TSC writesMarcelo Tosatti1-9/+10
When the guest writes to the TSC, the masterclock TSC copy must be updated as well along with the TSC_OFFSET update, otherwise a negative tsc_timestamp is calculated at kvm_guest_time_update. Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)" becomes redundant, so remove the do_request boolean and collapse everything into a single condition. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08KVM: x86: Return UNHANDLABLE on unsupported SYSENTERNadav Amit1-4/+2
Now that KVM injects #UD on "unhandlable" error, it makes better sense to return such error on sysenter instead of directly injecting #UD to the guest. This allows to track more easily the unhandlable cases the emulator does not support. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08KVM: x86: Warn on APIC base relocationNadav Amit1-0/+4
APIC base relocation is unsupported by KVM. If anyone uses it, the least should be to report a warning in the hypervisor. Note that KVM-unit-tests uses this feature for some reason, so running the tests triggers the warning. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-08KVM: x86: Emulator mis-decodes VEX instructions on real-modeNadav Amit1-2/+1
Commit 7fe864dc942c (KVM: x86: Mark VEX-prefix instructions emulation as unimplemented, 2014-06-02) marked VEX instructions as such in protected mode. VEX-prefix instructions are not supported relevant on real-mode and VM86, but should cause #UD instead of being decoded as LES/LDS. Fix this behaviour to be consistent with real hardware. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Check for mod == 3, rather than 2 or 3. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07KVM: x86: Remove redundant and incorrect cpl check on task-switchNadav Amit1-6/+2
Task-switch emulation checks the privilege level prior to performing the task-switch. This check is incorrect in the case of task-gates, in which the tss.dpl is ignored, and can cause superfluous exceptions. Moreover this check is unnecassary, since the CPU checks the privilege levels prior to exiting. Intel SDM 25.4.2 says "If CALL or JMP accesses a TSS descriptor directly outside IA-32e mode, privilege levels are checked on the TSS descriptor" prior to exiting. AMD 15.14.1 says "The intercept is checked before the task switch takes place but after the incoming TSS and task gate (if one was involved) have been checked for correctness." This patch removes the CPL checks for CALL and JMP. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-11-07KVM: x86: Inject #GP when loading system segments with non-canonical baseNadav Amit1-0/+6
When emulating LTR/LDTR/LGDT/LIDT, #GP should be injected if the base is non-canonical. Otherwise, VM-entry will fail. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>