aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds13-425/+1673
Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...
2015-02-10KVM: x86: fix build with !CONFIG_SMPRadim Krčmář1-0/+1
<asm/apic.h> isn't included directly and without CONFIG_SMP, an option that automagically pulls it can't be enabled. Reported-by: Jim Davis <jim.epost@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-09Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+1
Pull RCU updates from Ingo Molnar: "The main RCU changes in this cycle are: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) rcu: Initialize tiny RCU stall-warning timeouts at boot rcu: Fix RCU CPU stall detection in tiny implementation rcu: Add GP-kthread-starvation checks to CPU stall warnings rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors rcu: Optionally run grace-period kthreads at real-time priority ksoftirqd: Use new cond_resched_rcu_qs() function ksoftirqd: Enable IRQs and call cond_resched() before poking RCU rcutorture: Add more diagnostics in rcu_barrier() test failure case torture: Flag console.log file to prevent holdovers from earlier runs torture: Add "-enable-kvm -soundhw pcspk" to qemu command line rcutorture: Handle different mpstat versions rcutorture: Check from beginning to end of grace period rcu: Remove redundant rcu_batches_completed() declaration rcutorture: Drop rcu_torture_completed() and friends rcu: Provide rcu_batches_completed_sched() for TINY_RCU rcutorture: Use unsigned for Reader Batch computations rcutorture: Make build-output parsing correctly flag RCU's warnings rcu: Make _batches_completed() functions return unsigned long rcutorture: Issue warnings on close calls due to Reader Batch blows documentation: Fix smp typo in memory-barriers.txt ...
2015-02-09KVM: x86: emulate: correct page fault error code for NoWrite instructionsPaolo Bonzini1-1/+2
NoWrite instructions (e.g. cmp or test) never set the "write access" bit in the error code, even if one of the operands is treated as a destination. Fixes: c205fb7d7d4f81e46fc577b707ceb9e356af1456 Cc: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-06kvm: add halt_poll_ns module parameterPaolo Bonzini1-0/+1
This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03KVM: nVMX: Enable nested posted interrupt processingWincy Van3-7/+161
If vcpu has a interrupt in vmx non-root mode, injecting that interrupt requires a vmexit. With posted interrupt processing, the vmexit is not needed, and interrupts are fully taken care of by hardware. In nested vmx, this feature avoids much more vmexits than non-nested vmx. When L1 asks L0 to deliver L1's posted interrupt vector, and the target VCPU is in non-root mode, we use a physical ipi to deliver POSTED_INTR_NV to the target vCPU. Using POSTED_INTR_NV avoids unexpected interrupts if a concurrent vmexit happens and L1's vector is different with L0's. The IPI triggers posted interrupt processing in the target physical CPU. In case the target vCPU was not in guest mode, complete the posted interrupt delivery on the next entry to L2. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03KVM: nVMX: Enable nested virtual interrupt deliveryWincy Van1-2/+65
With virtual interrupt delivery, the hardware lets KVM use a more efficient mechanism for interrupt injection. This is an important feature for nested VMX, because it reduces vmexits substantially and they are much more expensive with nested virtualization. This is especially important for throughput-bound scenarios. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03KVM: nVMX: Enable nested apic register virtualizationWincy Van1-4/+35
We can reduce apic register virtualization cost with this feature, it is also a requirement for virtual interrupt delivery and posted interrupt processing. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03KVM: nVMX: Make nested control MSRs per-cpuWincy Van1-86/+129
To enable nested apicv support, we need per-cpu vmx control MSRs: 1. If in-kernel irqchip is enabled, we can enable nested posted interrupt, we should set posted intr bit in the nested_vmx_pinbased_ctls_high. 2. If in-kernel irqchip is disabled, we can not enable nested posted interrupt, the posted intr bit in the nested_vmx_pinbased_ctls_high will be cleared. Since there would be different settings about in-kernel irqchip between VMs, different nested control MSRs are needed. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03KVM: nVMX: Enable nested virtualize x2apic modeWincy Van1-2/+112
When L2 is using x2apic, we can use virtualize x2apic mode to gain higher performance, especially in apicv case. This patch also introduces nested_vmx_check_apicv_controls for the nested apicv patches. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-03KVM: nVMX: Prepare for using hardware MSR bitmapWincy Van1-11/+66
Currently, if L1 enables MSR_BITMAP, we will emulate this feature, all of L2's msr access is intercepted by L0. Features like "virtualize x2apic mode" require that the MSR bitmap is enabled, or the hardware will exit and for example not virtualize the x2apic MSRs. In order to let L1 use these features, we need to build a merged bitmap that only not cause a VMEXIT if 1) L1 requires that 2) the bit is not required by the processor for APIC virtualization. For now the guests are still run with MSR bitmap disabled, but this patch already introduces nested_vmx_merge_msr_bitmap for future use. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-02KVM: x86: revert "add method to test PIR bitmap vector"Marcelo Tosatti1-14/+0
Revert 7c6a98dfa1ba9dc64a62e73624ecea9995736bbd, given that testing PIR is not necessary anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-02KVM: x86: fix lapic_timer_int_injected with APIC-vMarcelo Tosatti1-6/+6
With APICv, LAPIC timer interrupt is always delivered via IRR: apic_find_highest_irr syncs PIR to IRR. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30kvm: vmx: fix oops with explicit flexpriority=0 optionPaolo Bonzini1-7/+7
A function pointer was not NULLed, causing kvm_vcpu_reload_apic_access_page to go down the wrong path and OOPS when doing put_page(NULL). This did not happen on old processors, only when setting the module option explicitly. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30KVM: x86: check LAPIC presence when building apic_mapRadim Krčmář1-0/+3
We forgot to re-check LAPIC after splitting the loop in commit 173beedc1601 (KVM: x86: Software disabled APIC should still deliver NMIs, 2014-11-02). Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: 173beedc1601f51dae9d579aa7a414c5aa8f700b Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30KVM: x86: fix x2apic logical address matchingRadim Krčmář1-1/+2
We cannot hit the bug now, but future patches will expose this path. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30KVM: x86: replace 0 with APIC_DEST_PHYSICALRadim Krčmář1-4/+2
To make the code self-documenting. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30KVM: x86: cleanup kvm_apic_match_*()Radim Krčmář1-30/+16
The majority of this patch turns result = 0; if (CODE) result = 1; return result; into return CODE; because we return bool now. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30KVM: x86: return bool from kvm_apic_match*()Radim Krčmář3-7/+5
And don't export the internal ones while at it. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-30KVM: VMX: Add PML support in VMXKai Huang3-1/+213
This patch adds PML support in VMX. A new module parameter 'enable_pml' is added to allow user to enable/disable it manually. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29KVM: x86: Add new dirty logging kvm_x86_ops for PMLKai Huang2-9/+68
This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty logging for particular memory slot, and to flush potentially logged dirty GPAs before reporting slot->dirty_bitmap to userspace. kvm x86 common code calls these hooks when they are available so PML logic can be hidden to VMX specific. SVM won't be impacted as these hooks remain NULL there. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29KVM: x86: Change parameter of kvm_mmu_slot_remove_write_accessKai Huang2-6/+9
This patch changes the second parameter of kvm_mmu_slot_remove_write_access from 'slot id' to 'struct kvm_memory_slot *' to align with kvm_x86_ops dirty logging hooks, which will be introduced in further patch. Better way is to change second parameter of kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region *' to 'struct kvm_memory_slot * new', but it requires changes on other non-x86 ARCH too, so avoid it now. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29KVM: MMU: Explicitly set D-bit for writable spte.Kai Huang1-1/+15
This patch avoids unnecessary dirty GPA logging to PML buffer in EPT violation path by setting D-bit manually prior to the occurrence of the write from guest. We only set D-bit manually in set_spte, and leave fast_page_fault path unchanged, as fast_page_fault is very unlikely to happen in case of PML. For the hva <-> pa change case, the spte is updated to either read-only (host pte is read-only) or be dropped (host pte is writeable), and both cases will be handled by above changes, therefore no change is necessary. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29KVM: MMU: Add mmu help functions to support PMLKai Huang1-0/+195
This patch adds new mmu layer functions to clear/set D-bit for memory slot, and to write protect superpages for memory slot. In case of PML, CPU logs the dirty GPA automatically to PML buffer when CPU updates D-bit from 0 to 1, therefore we don't have to write protect 4K pages, instead, we only need to clear D-bit in order to log that GPA. For superpages, we still write protect it and let page fault code to handle dirty page logging, as we still need to split superpage to 4K pages in PML. As PML is always enabled during guest's lifetime, to eliminate unnecessary PML GPA logging, we set D-bit manually for the slot with dirty logging disabled. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-29KVM: Rename kvm_arch_mmu_write_protect_pt_masked to be more generic for log dirtyKai Huang1-2/+19
We don't have to write protect guest memory for dirty logging if architecture supports hardware dirty logging, such as PML on VMX, so rename it to be more generic. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-27kvm: iommu: Add cond_resched to legacy device assignment codeJoerg Roedel1-1/+3
When assigning devices to large memory guests (>=128GB guest memory in the failure case) the functions to create the IOMMU page-tables for the whole guest might run for a very long time. On non-preemptible kernels this might cause Soft-Lockup warnings. Fix these by adding a cond_resched() to the mapping and unmapping loops. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26KVM: x86: Emulation of call may use incorrect stack sizeNadav Amit1-7/+18
On long-mode, when far call that changes cs.l takes place, the stack size is determined by the new mode. For instance, if we go from 32-bit mode to 64-bit mode, the stack-size if 64. KVM uses the old stack size. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26KVM: x86: 32-bit wraparound read/write not emulated correctlyNadav Amit2-3/+9
If we got a wraparound of 32-bit operand, and the limit is 0xffffffff, read and writes should be successful. It just needs to be done in two segments. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26KVM: x86: Fix defines in emulator.cNadav Amit1-1/+1
Unnecassary define was left after commit 7d882ffa81d5 ("KVM: x86: Revert NoBigReal patch in the emulator"). Commit 39f062ff51b2 ("KVM: x86: Generate #UD when memory operand is required") was missing undef. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26KVM: x86: ARPL emulation can cause spurious exceptionsNadav Amit1-6/+26
ARPL and MOVSXD are encoded the same and their execution depends on the execution mode. The operand sizes of each instruction are different. Currently, ARPL is detected too late, after the decoding was already done, and therefore may result in spurious exception (instead of failed emulation). Introduce a group to the emulator to handle instructions according to execution mode (32/64 bits). Note: in order not to make changes that may affect performance, the new ModeDual can only be applied to instructions with ModRM. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26KVM: x86: IRET emulation does not clear NMI maskingNadav Amit2-0/+7
The IRET instruction should clear NMI masking, but the current implementation does not do so. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26KVM: x86: Wrong operand size for far retNadav Amit1-2/+2
Indeed, Intel SDM specifically states that for the RET instruction "In 64-bit mode, the default operation size of this instruction is the stack-address size, i.e. 64 bits." However, experiments show this is not the case. Here is for example objdump of small 64-bit asm: 4004f1: ca 14 00 lret $0x14 4004f4: 48 cb lretq 4004f6: 48 ca 14 00 lretq $0x14 Therefore, remove the Stack flag from far-ret instructions. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-26KVM: x86: Dirty the dest op page on cmpxchg emulationNadav Amit1-4/+7
Intel SDM says for CMPXCHG: "To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison.". This means the destination page should be dirtied. Fix it to by writing back the original value if cmpxchg failed. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-23Merge tag 'kvm-s390-next-20150122' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-nextPaolo Bonzini1-7/+3
KVM: s390: fixes and features for kvm/next (3.20) 1. Generic - sparse warning (make function static) - optimize locking - bugfixes for interrupt injection - fix MVPG addressing modes 2. hrtimer/wakeup fun A recent change can cause KVM hangs if adjtime is used in the host. The hrtimer might wake up too early or too late. Too early is fatal as vcpu_block will see that the wakeup condition is not met and sleep again. This CPU might never wake up again. This series addresses this problem. adjclock slowing down the host clock will result in too late wakeups. This will require more work. In addition to that we also change the hrtimer from REALTIME to MONOTONIC to avoid similar problems with timedatectl set-time. 3. sigp rework We will move all "slow" sigps to QEMU (protected with a capability that can be enabled) to avoid several races between concurrent SIGP orders. 4. Optimize the shadow page table Provide an interface to announce the maximum guest size. The kernel will use that to make the pagetable 2,3,4 (or theoretically) 5 levels. 5. Provide an interface to set the guest TOD We now use two vm attributes instead of two oneregs, as oneregs are vcpu ioctl and we don't want to call them from other threads. 6. Protected key functions The real HMC allows to enable/disable protected key CPACF functions. Lets provide an implementation + an interface for QEMU to activate this the protected key instructions.
2015-01-23KVM: x86: SYSENTER emulation is brokenNadav Amit1-19/+8
SYSENTER emulation is broken in several ways: 1. It misses the case of 16-bit code segments completely (CVE-2015-0239). 2. MSR_IA32_SYSENTER_CS is checked in 64-bit mode incorrectly (bits 0 and 1 can still be set without causing #GP). 3. MSR_IA32_SYSENTER_EIP and MSR_IA32_SYSENTER_ESP are not masked in legacy-mode. 4. There is some unneeded code. Fix it. Cc: stable@vger.linux.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-23KVM: x86: Fix of previously incomplete fix for CVE-2014-8480Nadav Amit1-2/+2
STR and SLDT with rip-relative operand can cause a host kernel oops. Mark them as DstMem as well. Cc: stable@vger.linux.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-23Merge tag 'kvm-arm-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-nextPaolo Bonzini3-61/+16
KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. Conflicts: arch/arm64/include/asm/kvm_arm.h arch/arm64/kvm/handle_exit.c
2015-01-23KVM: remove unneeded return value of vcpu_postcreateDominik Dingel1-7/+3
The return value of kvm_arch_vcpu_postcreate is not checked in its caller. This is okay, because only x86 provides vcpu_postcreate right now and it could only fail if vcpu_load failed. But that is not possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so just get rid of the unchecked return value. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-21Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcuIngo Molnar1-0/+1
Pull RCU updates from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Preemptible-RCU fixes, including fixing an old bug in the interaction of RCU priority boosting and CPU hotplug. - SRCU updates. - RCU CPU stall-warning updates. - RCU torture-test updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-20KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issueMarcelo Tosatti1-1/+14
SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp and rdtsc is larger than a given threshold: * If we get more than the below threshold into the future, we rerequest * the real time from the host again which has only little offset then * that we need to adjust using the TSC. * * For now that threshold is 1/5th of a jiffie. That should be good * enough accuracy for completely broken systems, but also give us swing * to not call out to the host all the time. */ #define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5) Disable masterclock support (which increases said delta) in case the boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW. Upstreams kernels which support pvclock vsyscalls (and therefore make use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-20KVM: fix "Should it be static?" warnings from sparseFengguang Wu1-4/+4
arch/x86/kvm/x86.c:495:5: sparse: symbol 'kvm_read_nested_guest_page' was not declared. Should it be static? arch/x86/kvm/x86.c:646:5: sparse: symbol '__kvm_set_xcr' was not declared. Should it be static? arch/x86/kvm/x86.c:1183:15: sparse: symbol 'max_tsc_khz' was not declared. Should it be static? arch/x86/kvm/x86.c:1237:6: sparse: symbol 'kvm_track_tsc_matching' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-19Optimize TLB flush in kvm_mmu_slot_remove_write_access.Kai Huang1-2/+5
No TLB flush is needed when there's no valid rmap in memory slot. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-19x86: kvm: vmx: Remove some unused functionsRickard Strandqvist1-10/+0
Removes some functions that are not used anywhere: cpu_has_vmx_eptp_writeback() cpu_has_vmx_eptp_uncacheable() This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-16KVM: x86: switch to kvm_get_dirty_log_protectPaolo Bonzini3-61/+16
We now have a generic function that does most of the work of kvm_vm_ioctl_get_dirty_log, now use it. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
2015-01-09KVM: x86: #PF error-code on R/W operations is wrongNadav Amit2-13/+5
When emulating an instruction that reads the destination memory operand (i.e., instructions without the Mov flag in the emulator), the operand is first read. If a page-fault is detected in this phase, the error-code which would be delivered to the VM does not indicate that the access that caused the exception is a write one. This does not conform with real hardware, and may cause the VM to enter the page-fault handler twice for no reason (once for read, once for write). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-09KVM: x86: flush TLB when D bit is manually changed.Kai Huang1-0/+13
When software changes D bit (either from 1 to 0, or 0 to 1), the corresponding TLB entity in the hardware won't be updated immediately. We should flush it to guarantee the consistence of D bit between TLB and MMU page table in memory. This is especially important when clearing the D bit, since it may cause false negatives in reporting dirtiness. Sanity test was done on my machine with Intel processor. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> [Check A bit too. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-09KVM: x86: allow TSC deadline timer on all hostsRadim Krčmář1-3/+1
Emulation does not utilize the feature. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08kvm: x86: Remove kvm_make_request from lapic.cNicholas Krause3-6/+11
Adds a function kvm_vcpu_set_pending_timer instead of calling kvm_make_request in lapic.c. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08KVM: x86: Access to LDT/GDT that wraparound is incorrectNadav Amit1-13/+34
When access to descriptor in LDT/GDT wraparound outside long-mode, the address of the descriptor should be truncated to 32-bit. Citing Intel SDM 2.1.1.1 "Global and Local Descriptor Tables in IA-32e Mode": "GDTR and LDTR registers are expanded to 64-bits wide in both IA-32e sub-modes (64-bit mode and compatibility mode)." So in other cases, we need to truncate. Creating new function to return a pointer to descriptor table to avoid too much code duplication. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Wrap 64-bit check with #ifdef CONFIG_X86_64, to avoid a "right shift count >= width of type" warning and consequent undefined behavior. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-08KVM: x86: Do not set access bit on accessed segmentsNadav Amit1-4/+7
When segment is loaded, the segment access bit is set unconditionally. In fact, it should be set conditionally, based on whether the segment had the accessed bit set before. In addition, it can improve performance. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>