aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm (follow)
AgeCommit message (Collapse)AuthorFilesLines
2010-05-21Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds13-1964/+2792
* 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits) KVM: x86: Add missing locking to arch specific vcpu ioctls KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls KVM: MMU: Segregate shadow pages with different cr0.wp KVM: x86: Check LMA bit before set_efer KVM: Don't allow lmsw to clear cr0.pe KVM: Add cpuid.txt file KVM: x86: Tell the guest we'll warn it about tsc stability x86, paravirt: don't compute pvclock adjustments if we trust the tsc x86: KVM guest: Try using new kvm clock msrs KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID KVM: x86: add new KVMCLOCK cpuid feature KVM: x86: change msr numbers for kvmclock x86, paravirt: Add a global synchronization point for pvclock x86, paravirt: Enable pvclock flags in vcpu_time_info structure KVM: x86: Inject #GP with the right rip on efer writes KVM: SVM: Don't allow nested guest to VMMCALL into host KVM: x86: Fix exception reinjection forced to true KVM: Fix wallclock version writing race KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) ...
2010-05-19KVM: x86: Add missing locking to arch specific vcpu ioctlsAvi Kivity1-0/+6
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: MMU: Segregate shadow pages with different cr0.wpAvi Kivity1-1/+2
When cr0.wp=0, we may shadow a gpte having u/s=1 and r/w=0 with an spte having u/s=0 and r/w=1. This allows excessive access if the guest sets cr0.wp=1 and accesses through this spte. Fix by making cr0.wp part of the base role; we'll have different sptes for the two cases and the problem disappears. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: x86: Check LMA bit before set_eferSheng Yang1-2/+2
kvm_x86_ops->set_efer() would execute vcpu->arch.efer = efer, so the checking of LMA bit didn't work. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: Don't allow lmsw to clear cr0.peAvi Kivity1-1/+1
The current lmsw implementation allows the guest to clear cr0.pe, contrary to the manual, which breaks EMM386.EXE. Fix by ORing the old cr0.pe with lmsw's operand. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: x86: Tell the guest we'll warn it about tsc stabilityGlauber Costa1-1/+4
This patch puts up the flag that tells the guest that we'll warn it about the tsc being trustworthy or not. By now, we also say it is not. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUIDGlauber Costa1-0/+34
Right now, we were using individual KVM_CAP entities to communicate userspace about which cpuids we support. This is suboptimal, since it generates a delay between the feature arriving in the host, and being available at the guest. A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID. This makes userspace automatically aware of what we provide. And if we ever add a new cpuid bit in the future, we have to do that again, which create some complexity and delay in feature adoption. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: x86: change msr numbers for kvmclockGlauber Costa1-1/+6
Avi pointed out a while ago that those MSRs falls into the pentium PMU range. So the idea here is to add new ones, and after a while, deprecate the old ones. Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: x86: Inject #GP with the right rip on efer writesRoedel, Joerg1-19/+12
This patch fixes a bug in the KVM efer-msr write path. If a guest writes to a reserved efer bit the set_efer function injects the #GP directly. The architecture dependent wrmsr function does not see this, assumes success and advances the rip. This results in a #GP in the guest with the wrong rip. This patch fixes this by reporting efer write errors back to the architectural wrmsr function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: SVM: Don't allow nested guest to VMMCALL into hostJoerg Roedel1-0/+3
This patch disables the possibility for a l2-guest to do a VMMCALL directly into the host. This would happen if the l1-hypervisor doesn't intercept VMMCALL and the l2-guest executes this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: x86: Fix exception reinjection forced to trueJoerg Roedel1-1/+1
The patch merged recently which allowed to mark an exception as reinjected has a bug as it always marks the exception as reinjected. This breaks nested-svm shadow-on-shadow implementation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: Fix wallclock version writing raceAvi Kivity1-2/+10
Wallclock writing uses an unprotected global variable to hold the version; this can cause one guest to interfere with another if both write their wallclock at the same time. Acked-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_rootsAvi Kivity1-0/+7
On svm, kvm_read_pdptr() may require reading guest memory, which can sleep. Push the spinlock into mmu_alloc_roots(), and only take it after we've read the pdptr. Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: VMX: enable VMXON check with SMX enabled (Intel TXT)Shane Wang1-11/+21
Per document, for feature control MSR: Bit 1 enables VMXON in SMX operation. If the bit is clear, execution of VMXON in SMX operation causes a general-protection exception. Bit 2 enables VMXON outside SMX operation. If the bit is clear, execution of VMXON outside SMX operation causes a general-protection exception. This patch is to enable this kind of check with SMX for VMXON in KVM. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: x86: properly update ready_for_interrupt_injectionMarcelo Tosatti1-1/+1
The recent changes to emulate string instructions without entering guest mode exposed a bug where pending interrupts are not properly reflected in ready_for_interrupt_injection. The result is that userspace overwrites a previously queued interrupt, when irqchip's are emulated in userspace. Fix by always updating state before returning to userspace. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: VMX: Atomically switch efer if EPT && !EFER.NXAvi Kivity1-0/+12
When EPT is enabled, we cannot emulate EFER.NX=0 through the shadow page tables. This causes accesses through ptes with bit 63 set to succeed instead of failing a reserved bit check. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: VMX: Add facility to atomically switch MSRs on guest entry/exitAvi Kivity1-0/+49
Some guest msr values cannot be used on the host (for example. EFER.NX=0), so we need to switch them atomically during guest entry or exit. Add a facility to program the vmx msr autoload registers accordingly. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: Let vcpu structure alignment be determined at runtimeAvi Kivity2-2/+3
vmx and svm vcpus have different contents and therefore may have different alignmment requirements. Let each specify its required alignment. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-19KVM: MMU: cleanup invlpg codeXiao Guangrong1-3/+1
Using is_last_spte() to cleanup invlpg code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: MMU: move unsync/sync tracpoints to proper placeXiao Guangrong1-2/+2
Move unsync/sync tracepoints to the proper place, it's good for us to obtain unsync page live time Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: MMU: convert mmu tracepointsXiao Guangrong1-43/+26
Convert mmu tracepoints by using DECLARE_EVENT_CLASS Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: MMU: fix for calculating gpa in invlpg codeXiao Guangrong1-1/+6
If the guest is 32-bit, we should use 'quadrant' to adjust gpa offset Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: Fix mmu shrinker errorGui Jianfeng1-5/+5
kvm_mmu_remove_one_alloc_mmu_page() assumes kvm_mmu_zap_page() only reclaims only one sp, but that's not the case. This will cause mmu shrinker returns a wrong number. This patch fix the counting error. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-19KVM: MMU: fix hashing for TDP and non-paging modesEric Northup1-4/+8
For TDP mode, avoid creating multiple page table roots for the single guest-to-host physical address map by fixing the inputs used for the shadow page table hash in mmu_alloc_roots(). Signed-off-by: Eric Northup <digitaleric@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-18Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tipLinus Torvalds3-1/+57
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (311 commits) perf tools: Add mode to build without newt support perf symbols: symbol inconsistency message should be done only at verbose=1 perf tui: Add explicit -lslang option perf options: Type check all the remaining OPT_ variants perf options: Type check OPT_BOOLEAN and fix the offenders perf options: Check v type in OPT_U?INTEGER perf options: Introduce OPT_UINTEGER perf tui: Add workaround for slang < 2.1.4 perf record: Fix bug mismatch with -c option definition perf options: Introduce OPT_U64 perf tui: Add help window to show key associations perf tui: Make <- exit menus too perf newt: Add single key shortcuts for zoom into DSO and threads perf newt: Exit browser unconditionally when CTRL+C, q or Q is pressed perf newt: Fix the 'A'/'a' shortcut for annotate perf newt: Make <- exit the ui_browser x86, perf: P4 PMU - fix counters management logic perf newt: Make <- zoom out filters perf report: Report number of events, not samples perf hist: Clarify events_stats fields usage ... Fix up trivial conflicts in kernel/fork.c and tools/perf/builtin-record.c
2010-05-17KVM: MMU: fix sp->unsync type error in trace event definitionGui Jianfeng1-1/+1
sp->unsync is bool now, so update trace event declaration. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17KVM: SVM: Handle MCE intercepts always on host levelJoerg Roedel1-0/+1
This patch prevents MCE intercepts from being propagated into the L1 guest if they happened in an L2 guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: x86: Allow marking an exception as reinjectedJoerg Roedel3-11/+27
This patch adds logic to kvm/x86 which allows to mark an injected exception as reinjected. This allows to remove an ugly hack from svm_complete_interrupts that prevented exceptions from being reinjected at all in the nested case. The hack was necessary because an reinjected exception into the nested guest could cause a nested vmexit emulation. But reinjected exceptions must not intercept. The downside of the hack is that a exception that in injected could get lost. This patch fixes the problem and puts the code for it into generic x86 files because. Nested-VMX will likely have the same problem and could reuse the code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: SVM: Report emulated SVM features to userspaceJoerg Roedel1-0/+10
This patch implements the reporting of the emulated SVM features to userspace instead of the real hardware capabilities. Every real hardware capability needs emulation in nested svm so the old behavior was broken. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: x86: Add callback to let modules decide over some supported cpuid bitsJoerg Roedel3-0/+15
This patch adds the get_supported_cpuid callback to kvm_x86_ops. It will be used in do_cpuid_ent to delegate the decission about some supported cpuid bits to the architecture modules. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: SVM: Propagate nested entry failure into guest hypervisorJoerg Roedel1-0/+4
This patch implements propagation of a failes guest vmrun back into the guest instead of killing the whole guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: SVM: Sync cr0 and cr3 to kvm state before nested handlingJoerg Roedel1-9/+6
This patch syncs cr0 and cr3 from the vmcb to the kvm state before nested intercept handling is done. This allows to simplify the vmexit path. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: SVM: Make sure rip is synced to vmcb before nested vmexitJoerg Roedel1-4/+4
This patch fixes a bug where a nested guest always went over the same instruction because the rip was not advanced on a nested vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: SVM: Fix nested nmi handlingJoerg Roedel1-7/+9
The patch introducing nested nmi handling had a bug. The check does not belong to enable_nmi_window but must be in nmi_allowed. This patch fixes this. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17Merge remote branch 'tip/perf/core'Avi Kivity1-0/+4
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: VMX: free vpid when fail to create vcpuLai Jiangshan1-4/+12
Fix bug of the exception path, free allocated vpid when fail to create vcpu. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: MMU: cleanup for function unaccount_shadowed()Wei Yongjun1-1/+1
Since gfn is not changed in the for loop, we do not need to call gfn_to_memslot_unaliased() under the loop, and it is safe to move it out. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: Get rid of dead function gva_to_page()Gui Jianfeng1-14/+0
Nobody use gva_to_page() anymore, get rid of it. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: MMU: Remove unused varialbe in rmap_next()Gui Jianfeng1-2/+0
Remove unused varialbe in rmap_next() Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: MMU: Make use of is_large_pte() in walkerGui Jianfeng1-2/+2
Make use of is_large_pte() instead of checking PT_PAGE_SIZE_MASK bit directly. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: MMU: Move sync_page() first pte address calculation out of loopGui Jianfeng1-2/+4
Move first pte address calculation out of loop to save some cycles. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: MMU: Drop cr4.pge from shadow page roleAvi Kivity2-3/+1
Since commit bf47a760f66ad, we no longer handle ptes with the global bit set specially, so there is no reason to distinguish between shadow pages created with cr4.gpe set and clear. Such tracking is expensive when the guest toggles cr4.pge, so drop it. Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: use the correct RCU API for PROVE_RCU=yLai Jiangshan4-6/+14
The RCU/SRCU API have already changed for proving RCU usage. I got the following dmesg when PROVE_RCU=y because we used incorrect API. This patch coverts rcu_deference() to srcu_dereference() or family API. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/8550: #0: (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm] #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] stack backtrace: Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27 Call Trace: [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm] [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm] [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel] [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] [<ffffffff810a8692>] ? unlock_page+0x27/0x2c [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm] [<ffffffff81060cfa>] ? up_read+0x23/0x3d [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a [<ffffffff810021db>] system_call_fastpath+0x16/0x1b Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17Merge branch 'perf'Avi Kivity3-1/+53
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-05-17KVM: MMU: cleanup for hlist walk restartXiao Guangrong1-5/+10
Quote from Avi: |Just change the assignment to a 'goto restart;' please, |I don't like playing with list_for_each internals. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17KVM: prevent spurious exit to userspace during task switch emulation.Gleb Natapov2-4/+14
If kvm_task_switch() fails code exits to userspace without specifying exit reason, so the previous exit reason is reused by userspace. Fix this by specifying exit reason correctly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17KVM: MMU: remove unused parameter in mmu_parent_walk()Xiao Guangrong1-13/+11
'vcpu' is unused, remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17KVM: MMU: remove unused struct kvm_unsync_walkXiao Guangrong1-5/+0
Remove 'struct kvm_unsync_walk' since it's not used. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17KVM: fix emulator_task_switch() return value.Gleb Natapov2-4/+5
emulator_task_switch() should return -1 for failure and 0 for success to the caller, just like x86_emulate_insn() does. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-05-17KVM: MMU: Replace role.glevels with role.cr4_paeAvi Kivity2-8/+9
There is no real distinction between glevels=3 and glevels=4; both have exactly the same format and the code is treated exactly the same way. Drop role.glevels and replace is with role.cr4_pae (which is meaningful). This simplifies the code a bit. As a side effect, it allows sharing shadow page tables between pae and longmode guest page tables at the same guest page. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>