aboutsummaryrefslogtreecommitdiffstats
path: root/virt (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-02-08KVM: arm/arm64: Abstract virtual timer context into separate structureJintack Lim2-37/+42
Abstract virtual timer context into a separate structure and change all callers referring to timer registers, irq state and so on. No change in functionality. This is about to become very handy when adding the EL1 physical timer. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-08KVM: arm/arm64: vgic: Stop injecting the MSI occurrence twiceShanker Donthineni1-0/+3
The IRQFD framework calls the architecture dependent function twice if the corresponding GSI type is edge triggered. For ARM, the function kvm_set_msi() is getting called twice whenever the IRQFD receives the event signal. The rest of the code path is trying to inject the MSI without any validation checks. No need to call the function vgic_its_inject_msi() second time to avoid an unnecessary overhead in IRQ queue logic. It also avoids the possibility of VM seeing the MSI twice. Simple fix, return -1 if the argument 'level' value is zero. Cc: stable@vger.kernel.org Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-02-01KVM: arm/arm64: Remove kvm_vgic_inject_mapped_irqChristoffer Dall2-35/+18
The only benefit of having kvm_vgic_inject_mapped_irq separate from kvm_vgic_inject_irq is that we pass a boolean that we use for error checking on the injection path. While this could potentially help in some aspect of robustness, it's also a little bit of a defensive move, and arguably callers into the vgic should have make sure they have marked their virtual IRQs as mapped if required. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-01-30KVM: arm/arm64: vgic: Implement KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO ioctlVijaya Kumar K5-1/+119
Userspace requires to store and restore of line_level for level triggered interrupts using ioctl KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-30KVM: arm/arm64: vgic: Implement VGICv3 CPU interface accessVijaya Kumar K4-0/+66
VGICv3 CPU interface registers are accessed using KVM_DEV_ARM_VGIC_CPU_SYSREGS ioctl. These registers are accessed as 64-bit. The cpu MPIDR value is passed along with register id. It is used to identify the cpu for registers access. The VM that supports SEIs expect it on destination machine to handle guest aborts and hence checked for ICC_CTLR_EL1.SEIS compatibility. Similarly, VM that supports Affinity Level 3 that is required for AArch64 mode, is required to be supported on destination machine. Hence checked for ICC_CTLR_EL1.A3V compatibility. The arch/arm64/kvm/vgic-sys-reg-v3.c handles read and write of VGIC CPU registers for AArch64. For AArch32 mode, arch/arm/kvm/vgic-v3-coproc.c file is created but APIs are not implemented. Updated arch/arm/include/uapi/asm/kvm.h with new definitions required to compile for AArch32. The version of VGIC v3 specification is defined here Documentation/virtual/kvm/devices/arm-vgic-v3.txt Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-30KVM: arm/arm64: vgic: Introduce VENG0 and VENG1 fields to vmcr structVijaya Kumar K4-18/+39
ICC_VMCR_EL2 supports virtual access to ICC_IGRPEN1_EL1.Enable and ICC_IGRPEN0_EL1.Enable fields. Add grpen0 and grpen1 member variables to struct vmcr to support read and write of these fields. Also refactor vgic_set_vmcr and vgic_get_vmcr() code. Drop ICH_VMCR_CTLR_SHIFT and ICH_VMCR_CTLR_MASK macros and instead use ICH_VMCR_EOI* and ICH_VMCR_CBPR* macros. Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-30KVM: arm/arm64: vgic: Add distributor and redistributor accessVijaya Kumar K5-36/+292
VGICv3 Distributor and Redistributor registers are accessed using KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_REDIST_REGS with KVM_SET_DEVICE_ATTR and KVM_GET_DEVICE_ATTR ioctls. These registers are accessed as 32-bit and cpu mpidr value passed along with register offset is used to identify the cpu for redistributor registers access. The version of VGIC v3 specification is defined here Documentation/virtual/kvm/devices/arm-vgic-v3.txt Also update arch/arm/include/uapi/asm/kvm.h to compile for AArch32 mode. Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-30KVM: arm/arm64: vgic: Implement support for userspace accessVijaya Kumar K4-49/+169
Read and write of some registers like ISPENDR and ICPENDR from userspace requires special handling when compared to guest access for these registers. Refer to Documentation/virtual/kvm/devices/arm-vgic-v3.txt for handling of ISPENDR, ICPENDR registers handling. Add infrastructure to support guest and userspace read and write for the required registers Also moved vgic_uaccess from vgic-mmio-v2.c to vgic-mmio.c Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@cavium.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-25KVM: arm/arm64: vgic: Add debugfs vgic-state fileChristoffer Dall3-0/+290
Add a file to debugfs to read the in-kernel state of the vgic. We don't do any locking of the entire VGIC state while traversing all the IRQs, so if the VM is running the user/developer may not see a quiesced state, but should take care to pause the VM using facilities in user space for that purpose. We also don't support LPIs yet, but they can be added easily if needed. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-01-25KVM: arm/arm64: Remove struct vgic_irq pending fieldChristoffer Dall8-44/+37
One of the goals behind the VGIC redesign was to get rid of cached or intermediate state in the data structures, but we decided to allow ourselves to precompute the pending value of an IRQ based on the line level and pending latch state. However, this has now become difficult to base proper GICv3 save/restore on, because there is a potential to modify the pending state without knowing if an interrupt is edge or level configured. See the following post and related message for more background: https://lists.cs.columbia.edu/pipermail/kvmarm/2017-January/023195.html This commit gets rid of the precomputed pending field in favor of a function that calculates the value when needed, irq_is_pending(). The soft_pending field is renamed to pending_latch to represent that this latch is the equivalent hardware latch which gets manipulated by the input signal for edge-triggered interrupts and when writing to the SPENDR/CPENDR registers. After this commit save/restore code should be able to simply restore the pending_latch state, line_level state, and config state in any order and get the desired result. Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-01-17Merge tag 'kvm-arm-for-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarmRadim Krčmář5-24/+57
KVM/ARM updates for 4.10-rc4 - Fix for timer setup on VHE machines - Drop spurious warning when the timer races against the vcpu running again - Prevent a vgic deadlock when the initialization fails
2017-01-13KVM: arm/arm64: vgic: Fix deadlock on error handlingMarc Zyngier3-9/+13
Dmitry Vyukov reported that the syzkaller fuzzer triggered a deadlock in the vgic setup code when an error was detected, as the cleanup code tries to take a lock that is already held by the setup code. The fix is to avoid retaking the lock when cleaning up, by telling the cleanup function that we already hold it. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13KVM: arm64: Access CNTHCTL_EL2 bit fields correctly on VHE systemsJintack Lim2-12/+44
Current KVM world switch code is unintentionally setting wrong bits to CNTHCTL_EL2 when E2H == 1, which may allow guest OS to access physical timer. Bit positions of CNTHCTL_EL2 are changing depending on HCR_EL2.E2H bit. EL1PCEN and EL1PCTEN are 1st and 0th bits when E2H is not set, but they are 11th and 10th bits respectively when E2H is set. In fact, on VHE we only need to set those bits once, not for every world switch. This is because the host kernel runs in EL2 with HCR_EL2.TGE == 1, which makes those bits have no effect for the host kernel execution. So we just set those bits once for guests, and that's it. Signed-off-by: Jintack Lim <jintack@cs.columbia.edu> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-13KVM: arm/arm64: Fix occasional warning from the timer work functionChristoffer Dall1-3/+0
When a VCPU blocks (WFI) and has programmed the vtimer, we program a soft timer to expire in the future to wake up the vcpu thread when appropriate. Because such as wake up involves a vcpu kick, and the timer expire function can get called from interrupt context, and the kick may sleep, we have to schedule the kick in the work function. The work function currently has a warning that gets raised if it turns out that the timer shouldn't fire when it's run, which was added because the idea was that in that case the work should never have been cancelled. However, it turns out that this whole thing is racy and we can get spurious warnings. The problem is that we clear the armed flag in the work function, which may run in parallel with the kvm_timer_unschedule->timer_disarm() call. This results in a possible situation where the timer_disarm() call does not call cancel_work_sync(), which effectively synchronizes the completion of the work function with running the VCPU. As a result, the VCPU thread proceeds before the work function completees, causing changes to the timer state such that kvm_timer_should_fire(vcpu) returns false in the work function. All we do in the work function is to kick the VCPU, and an occasional rare extra kick never harmed anyone. Since the race above is extremely rare, we don't bother checking if the race happens but simply remove the check and the clearing of the armed flag from the work function. Reported-by: Matthias Brugger <mbrugger@suse.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-01-12KVM: eventfd: fix NULL deref irqbypass consumerWanpeng Li1-2/+2
Reported syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] PGD 0 Oops: 0002 [#1] SMP CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1 Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm] task: ffff9bbe0dfbb900 task.stack: ffffb61802014000 RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] Call Trace: irqfd_shutdown+0x66/0xa0 [kvm] process_one_work+0x16b/0x480 worker_thread+0x4b/0x500 kthread+0x101/0x140 ? process_one_work+0x480/0x480 ? kthread_create_on_node+0x60/0x60 ret_from_fork+0x25/0x30 RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20 CR2: 0000000000000008 The syzkaller folks reported a NULL pointer dereference that due to unregister an consumer which fails registration before. The syzkaller creates two VMs w/ an equal eventfd occasionally. So the second VM fails to register an irqbypass consumer. It will make irqfd as inactive and queue an workqueue work to shutdown irqfd and unregister the irqbypass consumer when eventfd is closed. However, the second consumer has been initialized though it fails registration. So the token(same as the first VM's) is taken to unregister the consumer through the workqueue, the consumer of the first VM is found and unregistered, then NULL deref incurred in the path of deleting consumer from the consumers list. This patch fixes it by making irq_bypass_register/unregister_consumer() looks for the consumer entry based on consumer pointer itself instead of token matching. Reported-by: Dmitry Vyukov <dvyukov@google.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-12-25Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-3/+3
Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t
2016-12-25Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-3/+3
Pull SMP hotplug notifier removal from Thomas Gleixner: "This is the final cleanup of the hotplug notifier infrastructure. The series has been reintgrated in the last two days because there came a new driver using the old infrastructure via the SCSI tree. Summary: - convert the last leftover drivers utilizing notifiers - fixup for a completely broken hotplug user - prevent setup of already used states - removal of the notifiers - treewide cleanup of hotplug state names - consolidation of state space There is a sphinx based documentation pending, but that needs review from the documentation folks" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-xp: Consolidate hotplug state space irqchip/gic: Consolidate hotplug state space coresight/etm3/4x: Consolidate hotplug state space cpu/hotplug: Cleanup state names cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions staging/lustre/libcfs: Convert to hotplug state machine scsi/bnx2i: Convert to hotplug state machine scsi/bnx2fc: Convert to hotplug state machine cpu/hotplug: Prevent overwriting of callbacks x86/msr: Remove bogus cleanup from the error path bus: arm-ccn: Prevent hotplug callback leak perf/x86/intel/cstate: Prevent hotplug callback leak ARM/imx/mmcd: Fix broken cpu hotplug handling scsi: qedi: Convert to hotplug state machine
2016-12-25clocksource: Use a plain u64 instead of cycle_tThomas Gleixner1-3/+3
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
2016-12-25cpu/hotplug: Cleanup state namesThomas Gleixner3-3/+3
When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-24Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds1-1/+1
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14mm: unexport __get_user_pages_unlocked()Lorenzo Stoakes2-6/+9
Unexport the low-level __get_user_pages_unlocked() function and replaces invocations with calls to more appropriate higher-level functions. In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with get_user_pages_unlocked() since we can now pass gup_flags. In async_pf_execute() and process_vm_rw_single_vec() we need to pass different tsk, mm arguments so get_user_pages_remote() is the sane replacement in these cases (having added manual acquisition and release of mmap_sem.) Additionally get_user_pages_remote() reintroduces use of the FOLL_TOUCH flag. However, this flag was originally silently dropped by commit 1e9877902dc7 ("mm/gup: Introduce get_user_pages_remote()"), so this appears to have been unintentional and reintroducing it is therefore not an issue. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20161027095141.2569-3-lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds7-56/+92
Pull KVM updates from Paolo Bonzini: "Small release, the most interesting stuff is x86 nested virt improvements. x86: - userspace can now hide nested VMX features from guests - nested VMX can now run Hyper-V in a guest - support for AVX512_4VNNIW and AVX512_FMAPS in KVM - infrastructure support for virtual Intel GPUs. PPC: - support for KVM guests on POWER9 - improved support for interrupt polling - optimizations and cleanups. s390: - two small optimizations, more stuff is in flight and will be in 4.11. ARM: - support for the GICv3 ITS on 32bit platforms" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (94 commits) arm64: KVM: pmu: Reset PMSELR_EL0.SEL to a sane value before entering the guest KVM: arm/arm64: timer: Check for properly initialized timer on init KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUs KVM: x86: Handle the kthread worker using the new API KVM: nVMX: invvpid handling improvements KVM: nVMX: check host CR3 on vmentry and vmexit KVM: nVMX: introduce nested_vmx_load_cr3 and call it on vmentry KVM: nVMX: propagate errors from prepare_vmcs02 KVM: nVMX: fix CR3 load if L2 uses PAE paging and EPT KVM: nVMX: load GUEST_EFER after GUEST_CR0 during emulated VM-entry KVM: nVMX: generate MSR_IA32_CR{0,4}_FIXED1 from guest CPUID KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation KVM: nVMX: support restore of VMX capability MSRs KVM: nVMX: generate non-true VMX MSRs based on true versions KVM: x86: Do not clear RFLAGS.TF when a singlestep trap occurs. KVM: x86: Add kvm_skip_emulated_instruction and use it. KVM: VMX: Move skip_emulated_instruction out of nested_vmx_check_vmcs12 KVM: VMX: Reorder some skip_emulated_instruction calls KVM: x86: Add a return value to kvm_emulate_cpuid KVM: PPC: Book3S: Move prototypes for KVM functions into kvm_ppc.h ...
2016-12-13Merge tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds1-0/+18
Pull VFIO updates from Alex Williamson: - VFIO updates for v4.10 primarily include a new Mediated Device interface, which essentially allows software defined devices to be exposed to users through VFIO. The host vendor driver providing this virtual device polices, or mediates user access to the device. These devices often incorporate portions of real devices, for instance the primary initial users of this interface expose vGPUs which allow the user to map mediated devices, or mdevs, to a portion of a physical GPU. QEMU composes these mdevs into PCI representations using the existing VFIO user API. This enables both Intel KVM-GT support, which is also expected to arrive into Linux mainline during the v4.10 merge window, as well as NVIDIA vGPU, and also Channel I/O devices (aka CCW devices) for s390 virtualization support. (Kirti Wankhede, Neo Jia) - Drop unnecessary uses of pcibios_err_to_errno() (Cao Jin) - Fixes to VFIO capability chain handling (Eric Auger) - Error handling fixes for fallout from mdev (Christophe JAILLET) - Notifiers to expose struct kvm to mdev vendor drivers (Jike Song) - type1 IOMMU model search fixes (Kirti Wankhede, Neo Jia) * tag 'vfio-v4.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits) vfio iommu type1: Fix size argument to vfio_find_dma() in pin_pages/unpin_pages vfio iommu type1: Fix size argument to vfio_find_dma() during DMA UNMAP. vfio iommu type1: WARN_ON if notifier block is not unregistered kvm: set/clear kvm to/from vfio_group when group add/delete vfio: support notifier chain in vfio_group vfio: vfio_register_notifier: classify iommu notifier vfio: Fix handling of error returned by 'vfio_group_get_from_dev()' vfio: fix vfio_info_cap_add/shift vfio/pci: Drop unnecessary pcibios_err_to_errno() MAINTAINERS: Add entry VFIO based Mediated device drivers docs: Sample driver to demonstrate how to use Mediated device framework. docs: Sysfs ABI for mediated device framework docs: Add Documentation for Mediated devices vfio: Define device_api strings vfio_platform: Updated to use vfio_set_irqs_validate_and_prepare() vfio_pci: Updated to use vfio_set_irqs_validate_and_prepare() vfio: Introduce vfio_set_irqs_validate_and_prepare() vfio_pci: Update vfio_pci to use vfio_info_add_capability() vfio: Introduce common function to add capabilities vfio iommu: Add blocking notifier to notify DMA_UNMAP ...
2016-12-12Merge tag 'kvm-arm-for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini6-47/+14
KVM/ARM updates for 4.10: - Support for the GICv3 ITS on 32bit platforms - A handful of timer and GIC emulation fixes - A PMU architecture fix
2016-12-11Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar3-5/+9
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-09KVM: arm/arm64: timer: Check for properly initialized timer on initChristoffer Dall1-0/+5
When the arch timer code fails to initialize (for example because the memory mapped timer doesn't work, which is currently seen with the AEM model), then KVM just continues happily with a final result that KVM eventually does a NULL pointer dereference of the uninitialized cycle counter. Check directly for this in the init path and give the user a reasonable error in this case. Cc: Shih-Wei Li <shihwei@cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-12-09KVM: arm/arm64: vgic-v2: Limit ITARGETSR bits to number of VCPUsAndre Przywara1-1/+2
The GICv2 spec says in section 4.3.12 that a "CPU targets field bit that corresponds to an unimplemented CPU interface is RAZ/WI." Currently we allow the guest to write any value in there and it can read that back. Mask the written value with the proper CPU mask to be spec compliant. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-12-01kvm: set/clear kvm to/from vfio_group when group add/deleteJike Song1-0/+18
Sometimes users need to be aware when a vfio_group attaches to a KVM or detaches from it. KVM already calls get/put method from vfio to manipulate the vfio_group reference, it can notify vfio_group in a similar way. Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Jike Song <jike.song@intel.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2016-12-01KVM: use after free in kvm_ioctl_create_device()Dan Carpenter1-1/+1
We should move the ops->destroy(dev) after the list_del(&dev->vm_node) so that we don't use "dev" after freeing it. Fixes: a28ebea2adc4 ("KVM: Protect device ops->create and list_add with kvm->lock") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-12-01Merge tag 'kvm-arm-for-4.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarmRadim Krčmář2-4/+8
KVM/ARM updates for v4.9-rc7 - Do not call kvm_notify_acked for PPIs
2016-11-28KVM: Export kvm module parameter variablesSuraj Jitindar Singh1-3/+6
The kvm module has the parameters halt_poll_ns, halt_poll_ns_grow, and halt_poll_ns_shrink. Halt polling was recently added to the powerpc kvm-hv module and these parameters were essentially duplicated for that. There is no benefit to this duplication and it can lead to confusion when trying to tune halt polling. Thus move the definition of these variables to kvm_host.h and export them. This will allow the kvm-hv module to use the same module parameters by accessing these variables, which will be implemented in the next patch, meaning that they will no longer be duplicated. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-11-24KVM: arm/arm64: vgic: Don't notify EOI for non-SPIsMarc Zyngier2-4/+8
When we inject a level triggerered interrupt (and unless it is backed by the physical distributor - timer style), we request a maintenance interrupt. Part of the processing for that interrupt is to feed to the rest of KVM (and to the eventfd subsystem) the information that the interrupt has been EOIed. But that notification only makes sense for SPIs, and not PPIs (such as the PMU interrupt). Skip over the notification if the interrupt is not an SPI. Cc: stable@vger.kernel.org # 4.7+ Fixes: 140b086dd197 ("KVM: arm/arm64: vgic-new: Add GICv2 world switch backend") Fixes: 59529f69f504 ("KVM: arm/arm64: vgic-new: Add GICv3 world switch backend") Reported-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-22kvm: Introduce kvm_write_guest_offset_cached()Pan Xinhui1-6/+14
It allows us to update some status or field of a structure partially. We can also save a kvm_read_guest_cached() call if we just update one fild of the struct regardless of its current value. Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: benh@kernel.crashing.org Cc: boqun.feng@gmail.com Cc: borntraeger@de.ibm.com Cc: bsingharora@gmail.com Cc: dave@stgolabs.net Cc: jgross@suse.com Cc: kernellwp@gmail.com Cc: konrad.wilk@oracle.com Cc: linuxppc-dev@lists.ozlabs.org Cc: mpe@ellerman.id.au Cc: paulmck@linux.vnet.ibm.com Cc: paulus@samba.org Cc: rkrcmar@redhat.com Cc: virtualization@lists.linux-foundation.org Cc: will.deacon@arm.com Cc: xen-devel-request@lists.xenproject.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1478077718-37424-8-git-send-email-xinhui.pan@linux.vnet.ibm.com [ Typo fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-19KVM: async_pf: avoid recursive flushing of work itemsPaolo Bonzini1-1/+12
This was reported by syzkaller: [ INFO: possible recursive locking detected ] 4.9.0-rc4+ #49 Not tainted --------------------------------------------- kworker/2:1/5658 is trying to acquire lock: ([ 1644.769018] (&work->work) [< inline >] list_empty include/linux/compiler.h:243 [<ffffffff8128dd60>] flush_work+0x0/0x660 kernel/workqueue.c:1511 but task is already holding lock: ([ 1644.769018] (&work->work) [<ffffffff812916ab>] process_one_work+0x94b/0x1900 kernel/workqueue.c:2093 stack backtrace: CPU: 2 PID: 5658 Comm: kworker/2:1 Not tainted 4.9.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: events async_pf_execute ffff8800676ff630 ffffffff81c2e46b ffffffff8485b930 ffff88006b1fc480 0000000000000000 ffffffff8485b930 ffff8800676ff7e0 ffffffff81339b27 ffff8800676ff7e8 0000000000000046 ffff88006b1fcce8 ffff88006b1fccf0 Call Trace: ... [<ffffffff8128ddf3>] flush_work+0x93/0x660 kernel/workqueue.c:2846 [<ffffffff812954ea>] __cancel_work_timer+0x17a/0x410 kernel/workqueue.c:2916 [<ffffffff81295797>] cancel_work_sync+0x17/0x20 kernel/workqueue.c:2951 [<ffffffff81073037>] kvm_clear_async_pf_completion_queue+0xd7/0x400 virt/kvm/async_pf.c:126 [< inline >] kvm_free_vcpus arch/x86/kvm/x86.c:7841 [<ffffffff810b728d>] kvm_arch_destroy_vm+0x23d/0x620 arch/x86/kvm/x86.c:7946 [< inline >] kvm_destroy_vm virt/kvm/kvm_main.c:731 [<ffffffff8105914e>] kvm_put_kvm+0x40e/0x790 virt/kvm/kvm_main.c:752 [<ffffffff81072b3d>] async_pf_execute+0x23d/0x4f0 virt/kvm/async_pf.c:111 [<ffffffff8129175c>] process_one_work+0x9fc/0x1900 kernel/workqueue.c:2096 [<ffffffff8129274f>] worker_thread+0xef/0x1480 kernel/workqueue.c:2230 [<ffffffff812a5a94>] kthread+0x244/0x2d0 kernel/kthread.c:209 [<ffffffff831f102a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 The reason is that kvm_put_kvm is causing the destruction of the VM, but the page fault is still on the ->queue list. The ->queue list is owned by the VCPU, not by the work items, so we cannot just add list_del to the work item. Instead, use work->vcpu to note async page faults that have been resolved and will be processed through the done list. There is no need to flush those. Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19Merge tag 'kvm-arm-for-4.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarmRadim Krčmář1-3/+5
KVM/ARM updates for v4.9-rc6 - Fix handling of the 32bit cycle counter - Fix cycle counter filtering
2016-11-18KVM: arm64: Fix the issues when guest PMCCFILTR is configuredWei Huang1-3/+5
KVM calls kvm_pmu_set_counter_event_type() when PMCCFILTR is configured. But this function can't deals with PMCCFILTR correctly because the evtCount bits of PMCCFILTR, which is reserved 0, conflits with the SW_INCR event type of other PMXEVTYPER<n> registers. To fix it, when eventsel == 0, this function shouldn't return immediately; instead it needs to check further if select_idx is ARMV8_PMU_CYCLE_IDX. Another issue is that KVM shouldn't copy the eventsel bits of PMCCFILTER blindly to attr.config. Instead it ought to convert the request to the "cpu cycle" event type (i.e. 0x11). To support this patch and to prevent duplicated definitions, a limited set of ARMv8 perf event types were relocated from perf_event.c to asm/perf_event.h. Cc: stable@vger.kernel.org # 4.6+ Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-16Merge branch 'x86/cpufeature' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into kvm/nextRadim Krčmář4-27/+50
Topic branch for AVX512_4VNNIW and AVX512_4FMAPS support in KVM.
2016-11-15arm/arm64: KVM: Clean up useless code in kvm_timer_enableLongpeng(Mike)1-11/+1
1) Since commit:41a54482 changed timer enabled variable to per-vcpu, the correlative comment in kvm_timer_enable is useless now. 2) After the kvm module init successfully, the timecounter is always non-null, so we can remove the checking of timercounter. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-14ARM: KVM: Support vGICv3 ITSVladimir Murzin3-30/+0
This patch allows to build and use vGICv3 ITS in 32-bit mode. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-14KVM: arm64: vgic-its: Fix compatibility with 32-bitVladimir Murzin1-5/+6
Evaluate GITS_BASER_ENTRY_SIZE once as an int data (GITS_BASER<n>'s Entry Size is 5-bit wide only), so when used as divider no reference to __aeabi_uldivmod is generated when build for AArch32. Use unsigned long long for GITS_BASER_PAGE_SIZE_* since they are used in conjunction with 64-bit data. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-11Merge tag 'kvm-arm-for-v4.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini3-21/+46
KVM/ARM updates for v4.9-rc4 - Kick the vcpu when a pending interrupt becomes pending again - Prevent access to invalid interrupt registers - Invalid TLBs when two vcpus from the same VM share a CPU
2016-11-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-3/+25
Pull KVM updates from Paolo Bonzini: "One NULL pointer dereference, and two fixes for regressions introduced during the merge window. The rest are fixes for MIPS, s390 and nested VMX" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Check memopp before dereference (CVE-2016-8630) kvm: nVMX: VMCLEAR an active shadow VMCS after last use KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK KVM: x86: fix wbinvd_dirty_mask use-after-free kvm/x86: Show WRMSR data is in hex kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types KVM: document lock orders KVM: fix OOPS on flush_work KVM: s390: Fix STHYI buffer alignment for diag224 KVM: MIPS: Precalculate MMIO load resume PC KVM: MIPS: Make ERET handle ERL before EXL KVM: MIPS: Fix lazy user ASID regenerate for SMP
2016-11-04KVM: arm/arm64: vgic: Kick VCPUs when queueing already pending IRQsShih-Wei Li1-0/+12
In cases like IPI, we could be queueing an interrupt for a VCPU that is already running and is not about to exit, because the VCPU has entered the VM with the interrupt pending and would not trap on EOI'ing that interrupt. This could result to delays in interrupt deliveries or even loss of interrupts. To guarantee prompt interrupt injection, here we have to try to kick the VCPU. Signed-off-by: Shih-Wei Li <shihwei@cs.columbia.edu> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-04KVM: arm/arm64: vgic: Prevent access to invalid SPIsAndre Przywara2-21/+34
In our VGIC implementation we limit the number of SPIs to a number that the userland application told us. Accordingly we limit the allocation of memory for virtual IRQs to that number. However in our MMIO dispatcher we didn't check if we ever access an IRQ beyond that limit, leading to out-of-bound accesses. Add a test against the number of allocated SPIs in check_region(). Adjust the VGIC_ADDR_TO_INT macro to avoid an actual division, which is not implemented on ARM(32). [maz: cleaned-up original patch] Cc: stable@vger.kernel.org Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-11-02kvm/stats: Update kvm stats to clear on write to their debugfs entrySuraj Jitindar Singh1-6/+72
Various kvm vm and vcpu stats are provided via debugfs entries. Currently there is no way to reset these stats back to zero. Add the ability to clear (reset back to zero) these stats on a per stat basis by writing to the debugfs files. Only a write value of 0 is accepted. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-10-26KVM: fix OOPS on flush_workPaolo Bonzini2-3/+25
The conversion done by commit 3706feacd007 ("KVM: Remove deprecated create_singlethread_workqueue") is broken. It flushes a single work item &irqfd->shutdown instead of all of them, and even worse if there is no irqfd on the list then you get a NULL pointer dereference. Revert the virt/kvm/eventfd.c part of that patch; to avoid the deprecated function, just allocate our own workqueue---it does not even have to be unbound---with alloc_workqueue. Fixes: 3706feacd007 Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-24mm: unexport __get_user_pages()Lorenzo Stoakes1-6/+4
This patch unexports the low-level __get_user_pages() function. Recent refactoring of the get_user_pages* functions allow flags to be passed through get_user_pages() which eliminates the need for access to this function from its one user, kvm. We can see that the two calls to get_user_pages() which replace __get_user_pages() in kvm_main.c are equivalent by examining their call stacks: get_user_page_nowait(): get_user_pages(start, 1, flags, page, NULL) __get_user_pages_locked(current, current->mm, start, 1, page, NULL, NULL, false, flags | FOLL_TOUCH) __get_user_pages(current, current->mm, start, 1, flags | FOLL_TOUCH | FOLL_GET, page, NULL, NULL) check_user_page_hwpoison(): get_user_pages(addr, 1, flags, NULL, NULL) __get_user_pages_locked(current, current->mm, addr, 1, NULL, NULL, NULL, false, flags | FOLL_TOUCH) __get_user_pages(current, current->mm, addr, 1, flags | FOLL_TOUCH, NULL, NULL, NULL) Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-18mm: remove write/force parameters from __get_user_pages_unlocked()Lorenzo Stoakes2-4/+10
This removes the redundant 'write' and 'force' parameters from __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-09-29Merge tag 'kvm-arm-for-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into nextRadim Krčmář14-147/+694
KVM/ARM Changes for v4.9 - Various cleanups and removal of redundant code - Two important fixes for not using an in-kernel irqchip - A bit of optimizations - Handle SError exceptions and present them to guests if appropriate - Proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - Preparations for GICv3 save/restore, including ABI docs
2016-09-27KVM: arm/arm64: vgic: Don't flush/sync without a working vgicChristoffer Dall1-0/+6
If the vgic hasn't been created and initialized, we shouldn't attempt to look at its data structures or flush/sync anything to the GIC hardware. This fixes an issue reported by Alexander Graf when using a userspace irqchip. Fixes: 0919e84c0fc1 ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework") Cc: stable@vger.kernel.org Reported-by: Alexander Graf <agraf@suse.de> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>