aboutsummaryrefslogtreecommitdiffstats
path: root/arch/s390/kvm (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-02-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds8-216/+822
Pull KVM update from Paolo Bonzini: "Fairly small update, but there are some interesting new features. Common: Optional support for adding a small amount of polling on each HLT instruction executed in the guest (or equivalent for other architectures). This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This also has to be enabled manually for now, but the plan is to auto-tune this in the future. ARM/ARM64: The highlights are support for GICv3 emulation and dirty page tracking s390: Several optimizations and bugfixes. Also a first: a feature exposed by KVM (UUID and long guest name in /proc/sysinfo) before it is available in IBM's hypervisor! :) MIPS: Bugfixes. x86: Support for PML (page modification logging, a new feature in Broadwell Xeons that speeds up dirty page tracking), nested virtualization improvements (nested APICv---a nice optimization), usual round of emulation fixes. There is also a new option to reduce latency of the TSC deadline timer in the guest; this needs to be tuned manually. Some commits are common between this pull and Catalin's; I see you have already included his tree. Powerpc: Nothing yet. The KVM/PPC changes will come in through the PPC maintainers, because I haven't received them yet and I might end up being offline for some part of next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: ia64: drop kvm.h from installed user headers KVM: x86: fix build with !CONFIG_SMP KVM: x86: emulate: correct page fault error code for NoWrite instructions KVM: Disable compat ioctl for s390 KVM: s390: add cpu model support KVM: s390: use facilities and cpu_id per KVM KVM: s390/CPACF: Choose crypto control block format s390/kernel: Update /proc/sysinfo file with Extended Name and UUID KVM: s390: reenable LPP facility KVM: s390: floating irqs: fix user triggerable endless loop kvm: add halt_poll_ns module parameter kvm: remove KVM_MMIO_SIZE KVM: MIPS: Don't leak FPU/DSP to guest KVM: MIPS: Disable HTW while in guest KVM: nVMX: Enable nested posted interrupt processing KVM: nVMX: Enable nested virtual interrupt delivery KVM: nVMX: Enable nested apic register virtualization KVM: nVMX: Make nested control MSRs per-cpu KVM: nVMX: Enable nested virtualize x2apic mode KVM: nVMX: Prepare for using hardware MSR bitmap ...
2015-02-09KVM: s390: add cpu model supportMichael Mueller1-0/+132
This patch enables cpu model support in kvm/s390 via the vm attribute interface. During KVM initialization, the host properties cpuid, IBC value and the facility list are stored in the architecture specific cpu model structure. During vcpu setup, these properties are taken to initialize the related SIE state. This mechanism allows to adjust the properties from user space and thus to implement different selectable cpu models. This patch uses the IBC functionality to block instructions that have not been implemented at the requested CPU type and GA level compared to the full host capability. Userspace has to initialize the cpu model before vcpu creation. A cpu model change of running vcpus is not possible. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09KVM: s390: use facilities and cpu_id per KVMMichael Mueller4-44/+78
The patch introduces facilities and cpu_ids per virtual machine. Different virtual machines may want to expose different facilities and cpu ids to the guest, so let's make them per-vm instead of global. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09KVM: s390/CPACF: Choose crypto control block formatTony Krowiak1-2/+47
We need to specify a different format for the crypto control block depending on whether the APXA facility is installed or not. Let's test for it by executing the PQAP(QCI) function and use either a format-1 or a format-2 crypto control block accordingly. This is a host only change for z13 and does not affect the guest view. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09KVM: s390: reenable LPP facilityChristian Borntraeger1-1/+1
commit 7be81a46695d ("KVM: s390/facilities: allow TOD-CLOCK steering facility bit") accidentially disabled the "load program parameter" facility bit during rebase for upstream submission (my fault). Re-add that bit. As this is only for a performance measurement helper instruction (used by KVM itself) cc stable is not necessary see http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a (SA23-2260 The Load-Program-Parameter and CPU-Measurement Facilities) for details about LPP and its usecase. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Fixes: 7be81a46695d ("KVM: s390/facilities: allow TOD-CLOCK steering")
2015-02-09KVM: s390: floating irqs: fix user triggerable endless loopDavid Hildenbrand1-0/+2
If a vm with no VCPUs is created, the injection of a floating irq leads to an endless loop in the kernel. Let's skip the search for a destination VCPU for a floating irq if no VCPUs were created. Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-06kvm: add halt_poll_ns module parameterPaolo Bonzini1-0/+1
This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-23KVM: s390: remove redundant setting of interrupt typeJens Freimann1-1/+0
Setting inti->type again is unnecessary here, so let's remove this. Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: fix bug in interrupt parameter checkJens Freimann1-2/+2
When we convert interrupt data from struct kvm_s390_interrupt to struct kvm_s390_irq we need to check the data in the input parameter not the output parameter. Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: avoid memory leaks if __inject_vm() failsDavid Hildenbrand1-1/+5
We have to delete the allocated interrupt info if __inject_vm() fails. Otherwise user space can keep flooding kvm with floating interrupts and provoke more and more memory leaks. Reported-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390/cpacf: Enable/disable protected key functions for kvm guestTony Krowiak1-0/+75
Created new KVM device attributes for indicating whether the AES and DES/TDES protected key functions are available for programs running on the KVM guest. The attributes are used to set up the controls in the guest SIE block that specify whether programs running on the guest will be given access to the protected key functions available on the s390 hardware. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [split MSA4/protected key into two patches]
2015-01-23KVM: s390: Provide guest TOD Clock Get/Set ControlsJason J. Herne1-0/+128
Provide controls for setting/getting the guest TOD clock based on the VM attribute interface. Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of Day clock value. TOD_HIGH is presently always set to 0. In the future it will contain a high order expansion of the tod clock value after it overflows the 64-bits of the TOD. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: trace correct values for set prefix and machine checksJens Freimann1-4/+4
When injecting SIGP set prefix or a machine check, we trace the values in our per-vcpu local_int data structure instead of the parameters passed to the function. Fix this by changing the trace statement to use the correct values. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: fix bug in sigp emergency signal injectionJens Freimann1-3/+2
Currently we are always setting the wrong bit in the bitmap for pending emergency signals. Instead of using emerg.code from the passed in irq parameter, we use the value in our per-vcpu local_int structure, which is always zero. That means all emergency signals will have address 0 as parameter. If two CPUs send a SIGP to the same target, one might be lost. Let's fix this by using the value from the parameter and also trace the correct value. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: Take addressing mode into account for MVPG interceptionThomas Huth1-6/+8
The handler for MVPG partial execution interception does not take the current CPU addressing mode into account yet, so addresses are always treated as 64-bit addresses. For correct behaviour, we should properly handle 24-bit and 31-bit addresses, too. Since MVPG is defined to work with logical addresses, we can simply use guest_translate_address() to achieve the required behaviour (since DAT is disabled here, guest_translate_address() skips the MMU translation and only translates the address via kvm_s390_logical_to_effective() and kvm_s390_real_to_abs(), which is exactly what we want here). Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: no need to hold the kvm->mutex for floating interruptsChristian Borntraeger1-8/+0
The kvm mutex was (probably) used to protect against cpu hotplug. The current code no longer needs to protect against that, as we only rely on CPU data structures that are guaranteed to be available if we can access the CPU. (e.g. vcpu_create will put the cpu in the array AFTER the cpu is ready). Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
2015-01-23KVM: s390: forward most SIGP orders to user spaceDavid Hildenbrand2-0/+54
Most SIGP orders are handled partially in kernel and partially in user space. In order to: - Get a correct SIGP SET PREFIX handler that informs user space - Avoid race conditions between concurrently executed SIGP orders - Serialize SIGP orders per VCPU We need to handle all "slow" SIGP orders in user space. The remaining ones to be handled completely in kernel are: - SENSE - SENSE RUNNING - EXTERNAL CALL - EMERGENCY SIGNAL - CONDITIONAL EMERGENCY SIGNAL According to the PoP, they have to be fast. They can be executed without conflicting to the actions of other pending/concurrently executing orders (e.g. STOP vs. START). This patch introduces a new capability that will - when enabled - forward all but the mentioned SIGP orders to user space. The instruction counters in the kernel are still updated. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: clear the pfault queue if user space sets the invalid tokenDavid Hildenbrand1-0/+4
We need a way to clear the async pfault queue from user space (e.g. for resets and SIGP SET ARCHITECTURE). This patch simply clears the queue as soon as user space sets the invalid pfault token. The definition of the invalid token is moved to uapi. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: only one external call may be pending at a timeDavid Hildenbrand5-25/+63
Only one external call may be pending at a vcpu at a time. For this reason, we have to detect whether the SIGP externcal call interpretation facility is available. If so, all external calls have to be injected using this mechanism. SIGP EXTERNAL CALL orders have to return whether another external call is already pending. This check was missing until now. SIGP SENSE hasn't returned yet in all conditions whether an external call was pending. If a SIGP EXTERNAL CALL irq is to be injected and one is already pending, -EBUSY is returned. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: SIGP SET PREFIX cleanupDavid Hildenbrand2-19/+14
This patch cleanes up the the SIGP SET PREFIX code. A SIGP SET PREFIX irq may only be injected if the target vcpu is stopped. Let's move the checking code into the injection code and return -EBUSY if the target vcpu is not stopped. Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: a VCPU may only stop when no interrupts are left pendingDavid Hildenbrand4-5/+9
As a SIGP STOP is an interrupt with the least priority, it may only result in stop of the vcpu when no other interrupts are left pending. To detect whether a non-stop irq is pending, we need a way to mask out stop irqs from the general kvm_cpu_has_interrupt() function. For this reason, the existing function (with an outdated name) is replaced by kvm_s390_vcpu_has_irq() which allows to mask out pending stop irqs. Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: handle stop irqs without action_bitsDavid Hildenbrand6-87/+88
This patch removes the famous action_bits and moves the handling of SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt. The new local interrupt infrastructure is used to track pending stop requests. STOP irqs are the only irqs that don't get actively delivered. They remain pending until the stop function is executed (=stop intercept). If another STOP irq is already pending, -EBUSY will now be returned (needed for the SIGP handling code). Migration of pending SIGP STOP (AND STORE STATUS) orders should now be supported out of the box. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: new parameter for SIGP STOP irqsDavid Hildenbrand1-1/+17
In order to get rid of the action_flags and to properly migrate pending SIGP STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember whether to store the status when stopping. For this reason, a new parameter (flags) for the SIGP STOP irq is introduced. These flags further define details of the requested STOP and can be easily migrated. Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: forward hrtimer if guest ckc not pending yetDavid Hildenbrand1-2/+12
Patch 0759d0681cae ("KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block") changed the way pending guest clock comparator interrupts are detected. It was assumed that as soon as the hrtimer wakes up, the condition for the guest ckc is satisfied. This is however only true as long as adjclock() doesn't speed up the monotonic clock. Reason is that the hrtimer is based on CLOCK_MONOTONIC, the guest clock comparator detection is based on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the TOD clock, the hrtimer wakes the target VCPU up too early and the target VCPU will not detect any pending interrupts, therefore going back to sleep. It will never be woken up again because the hrtimer has finished. The VCPU is stuck. As a quick fix, we have to forward the hrtimer until the guest clock comparator is really due, to guarantee properly timed wake ups. As the hrtimer callback might be triggered on another cpu, we have to make sure that the timer is really stopped and not currently executing the callback on another cpu. This can happen if the vcpu thread is scheduled onto another physical cpu, but the timer base is not migrated. So lets use hrtimer_cancel instead of try_to_cancel. A proper fix might be to introduce a RAW based hrtimer. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: base hrtimer on a monotonic clockDavid Hildenbrand1-1/+1
The hrtimer that handles the wait with enabled timer interrupts should not be disturbed by changes of the host time. This patch changes our hrtimer to be based on a monotonic clock. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: prevent sleep duration underflows in handle_wait()David Hildenbrand1-1/+7
We sometimes get an underflow for the sleep duration, which most likely won't result in the short sleep time we wanted. So let's check for sleep duration underflows and directly continue to run the guest if we get one. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: Allow userspace to limit guest memory sizeDominik Dingel1-3/+62
With commit c6c956b80bdf ("KVM: s390/mm: support gmap page tables with less than 5 levels") we are able to define a limit for the guest memory size. As we round up the guest size in respect to the levels of page tables we get to guest limits of: 2048 MB, 4096 GB, 8192 TB and 16384 PB. We currently limit the guest size to 16 TB, which means we end up creating a page table structure supporting guest sizes up to 8192 TB. This patch introduces an interface that allows userspace to tune this limit. This may bring performance improvements for small guests. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: move vcpu specific initalization to a later pointDominik Dingel1-9/+16
As we will allow in a later patch to recreate gmaps with new limits, we need to make sure that vcpus get their reference for that gmap after they increased the online_vcpu counter, so there is no possible race. While we are doing this, we also can simplify the vcpu_init function, by moving ucontrol specifics to an own function. That way we also start now setting the kvm_valid_regs for the ucontrol path. Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: s390: make local function staticChristian Borntraeger1-1/+1
sparse rightfully complains about warning: symbol '__inject_extcall' was not declared. Should it be static? Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23KVM: remove unneeded return value of vcpu_postcreateDominik Dingel1-2/+1
The return value of kvm_arch_vcpu_postcreate is not checked in its caller. This is okay, because only x86 provides vcpu_postcreate right now and it could only fail if vcpu_load failed. But that is not possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so just get rid of the unchecked return value. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-06rcu: Make SRCU optional by using CONFIG_SRCUPranith Kumar1-0/+1
SRCU is not necessary to be compiled by default in all cases. For tinification efforts not compiling SRCU unless necessary is desirable. The current patch tries to make compiling SRCU optional by introducing a new Kconfig option CONFIG_SRCU which is selected when any of the components making use of SRCU are selected. If we do not select CONFIG_SRCU, srcu.o will not be compiled at all. text data bss dec hex filename 2007 0 0 2007 7d7 kernel/rcu/srcu.o Size of arch/powerpc/boot/zImage changes from text data bss dec hex filename 831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before 829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after so the savings are about ~2000 bytes. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> CC: Josh Triplett <josh@joshtriplett.org> CC: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
2014-12-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linuxLinus Torvalds1-12/+6
Pull ACCESS_ONCE cleanup preparation from Christian Borntraeger: "kernel: Provide READ_ONCE and ASSIGN_ONCE As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com ACCESS_ONCE might fail with specific compilers for non-scalar accesses. Here is a set of patches to tackle that problem. The first patch introduce READ_ONCE and ASSIGN_ONCE. If the data structure is larger than the machine word size memcpy is used and a warning is emitted. The next patches fix up several in-tree users of ACCESS_ONCE on non-scalar types. This does not yet contain a patch that forces ACCESS_ONCE to work only on scalar types. This is targetted for the next merge window as Linux next already contains new offenders regarding ACCESS_ONCE vs. non-scalar types" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux: s390/kvm: REPLACE barrier fixup with READ_ONCE arm/spinlock: Replace ACCESS_ONCE with READ_ONCE arm64/spinlock: Replace ACCESS_ONCE READ_ONCE mips/gup: Replace ACCESS_ONCE with READ_ONCE x86/gup: Replace ACCESS_ONCE with READ_ONCE x86/spinlock: Replace ACCESS_ONCE with READ_ONCE mm: replace ACCESS_ONCE with READ_ONCE or barriers kernel: Provide READ_ONCE and ASSIGN_ONCE
2014-12-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds7-583/+954
Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...
2014-12-18s390/kvm: REPLACE barrier fixup with READ_ONCEChristian Borntraeger1-12/+6
ACCESS_ONCE does not work reliably on non-scalar types. For example gcc 4.6 and 4.7 might remove the volatile tag for such accesses during the SRA (scalar replacement of aggregates) step (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145) Commit 1365039d0cb3 ("KVM: s390: Fix ipte locking") replace ACCESS_ONCE with barriers. Lets use READ_ONCE instead. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-12-04KVM: s390: clean up return code handling in irq delivery codeJens Freimann1-13/+13
Instead of returning a possibly random or'ed together value, let's always return -EFAULT if rc is set. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-12-04KVM: s390: use atomic bitops to access pending_irqs bitmapJens Freimann1-2/+2
Currently we use a mixture of atomic/non-atomic bitops and the local_int spin lock to protect the pending_irqs bitmap and interrupt payload data. We need to use atomic bitops for the pending_irqs bitmap everywhere and in addition acquire the local_int lock where interrupt data needs to be protected. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-12-04KVM: s390: some ext irqs have to clear the ext cpu addrDavid Hildenbrand1-0/+3
The cpu address of a source cpu (responsible for an external irq) is only to be stored if bit 6 of the ext irq code is set. If bit 6 is not set, it is to be zeroed out. The special external irq code used for virtio and pfault uses the cpu addr as a parameter field. As bit 6 is set, this implementation is correct. Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: allow injecting all kinds of machine checksJens Freimann1-3/+11
Allow to specify CR14, logout area, external damage code and failed storage address. Since more then one machine check can be indicated to the guest at a time we need to combine all indication bits with already pending requests. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: handle pending local interrupts via bitmapJens Freimann5-280/+380
This patch adapts handling of local interrupts to be more compliant with the z/Architecture Principles of Operation and introduces a data structure which allows more efficient handling of interrupts. * get rid of li->active flag, use bitmap instead * Keep interrupts in a bitmap instead of a list * Deliver interrupts in the order of their priority as defined in the PoP * Use a second bitmap for sigp emergency requests, as a CPU can have one request pending from every other CPU in the system. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: refactor interrupt delivery codeJens Freimann1-177/+282
Move delivery code for cpu-local interrupt from the huge do_deliver_interrupt() to smaller functions which handle one type of interrupt. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: add defines for virtio and pfault interrupt codeJens Freimann1-2/+4
Get rid of open coded value for virtio and pfault completion interrupts. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: external param not valid for cpu timer and ckcDavid Hildenbrand1-4/+2
The 32bit external interrupt parameter is only valid for timing-alert and service-signal interrupts. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: refactor interrupt injection codeJens Freimann1-54/+167
In preparation for the rework of the local interrupt injection code, factor out injection routines from kvm_s390_inject_vcpu(). Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: trigger the right CPU exit for floating interruptsChristian Borntraeger1-1/+11
When injecting a floating interrupt and no CPU is idle we kick one CPU to do an external exit. In case of I/O we should trigger an I/O exit instead. This does not matter for Linux guests as external and I/O interrupts are enabled/disabled at the same time, but play safe anyway. The same holds true for machine checks. Since there is no special exit, just reuse the generic stop exit. The injection code inside the VCPU loop will recheck anyway and rearm the proper exits (e.g. control registers) if necessary. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2014-11-28KVM: s390: Fix rewinding of the PSW pointing to an EXECUTE instructionThomas Huth3-12/+22
A couple of our interception handlers rewind the PSW to the beginning of the instruction to run the intercepted instruction again during the next SIE entry. This normally works fine, but there is also the possibility that the instruction did not get run directly but via an EXECUTE instruction. In this case, the PSW does not point to the instruction that caused the interception, but to the EXECUTE instruction! So we've got to rewind the PSW to the beginning of the EXECUTE instruction instead. This is now accomplished with a new helper function kvm_s390_rewind_psw(). Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28KVM: s390: Small fixes for the PFMF handlerThomas Huth1-4/+7
This patch includes two small fixes for the PFMF handler: First, the start address for PFMF has to be masked according to the current addressing mode, which is now done with kvm_s390_logical_to_effective(). Second, the protection exceptions have a lower priority than the specification exceptions, so the check for low-address protection has to be moved after the last spot where we inject a specification exception. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-07Merge tag 'kvm-s390-next-20141107' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEADPaolo Bonzini2-44/+48
KVM: s390: Fixes for kvm/next (3.19) and stable 1. We should flush TLBs for load control instruction emulation (stable) 2. A workaround for a compiler bug that renders ACCESS_ONCE broken (stable) 3. Fix program check handling for load control 4. Documentation Fix
2014-11-07KVM: s390: fix handling of lctl[g]/stctl[g]Heiko Carstens1-36/+32
According to the architecture all instructions are suppressing if memory access is prohibited due to DAT protection, unless stated otherwise for an instruction. The lctl[g]/stctl[g] implementations handled this incorrectly since control register handling was done piecemeal, which means they had terminating instead of suppressing semantics. This patch fixes this. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-07KVM: s390: flush CPU on load controlChristian Borntraeger1-2/+2
some control register changes will flush some aspects of the CPU, e.g. POP explicitely mentions that for CR9-CR11 "TLBs may be cleared". Instead of trying to be clever and only flush on specific CRs, let play safe and flush on all lctl(g) as future machines might define new bits in CRs. Load control intercept should not happen that often. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: stable@vger.kernel.org
2014-11-07KVM: s390: Fix ipte lockingChristian Borntraeger1-6/+14
ipte_unlock_siif uses cmpxchg to replace the in-memory data of the ipte lock together with ACCESS_ONCE for the intial read. union ipte_control { unsigned long val; struct { unsigned long k : 1; unsigned long kh : 31; unsigned long kg : 32; }; }; [...] static void ipte_unlock_siif(struct kvm_vcpu *vcpu) { union ipte_control old, new, *ic; ic = &vcpu->kvm->arch.sca->ipte_control; do { new = old = ACCESS_ONCE(*ic); new.kh--; if (!new.kh) new.k = 0; } while (cmpxchg(&ic->val, old.val, new.val) != old.val); if (!new.kh) wake_up(&vcpu->kvm->arch.ipte_wq); } The new value, is loaded twice from memory with gcc 4.7.2 of fedora 18, despite the ACCESS_ONCE: ---> l %r4,0(%r3) <--- load first 32 bit of lock (k and kh) in r4 alfi %r4,2147483647 <--- add -1 to r4 llgtr %r4,%r4 <--- zero out the sign bit of r4 lg %r1,0(%r3) <--- load all 64 bit of lock into new lgr %r2,%r1 <--- load the same into old risbg %r1,%r4,1,31,32 <--- shift and insert r4 into the bits 1-31 of new llihf %r4,2147483647 ngrk %r4,%r1,%r4 jne aa0 <ipte_unlock+0xf8> nihh %r1,32767 lgr %r4,%r2 csg %r4,%r1,0(%r3) cgr %r2,%r4 jne a70 <ipte_unlock+0xc8> If the memory value changes between the first load (l) and the second load (lg) we are broken. If that happens VCPU threads will hang (unkillable) in handle_ipte_interlock. Andreas Krebbel analyzed this and tracked it down to a compiler bug in that version: "while it is not that obvious the C99 standard basically forbids duplicating the memory access also in that case. For an argumentation of a similiar case please see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=22278#c43 For the implementation-defined cases regarding volatile there are some GCC-specific clarifications which can be found here: https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html#Volatiles I've tracked down the problem with a reduced testcase. The problem was that during a tree level optimization (SRA - scalar replacement of aggregates) the volatile marker is lost. And an RTL level optimizer (CSE - common subexpression elimination) then propagated the memory read into its second use introducing another access to the memory location. So indeed Christian's suspicion that the union access has something to do with it is correct (since it triggered the SRA optimization). This issue has been reported and fixed in the GCC 4.8 development cycle: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145" This patch replaces the ACCESS_ONCE scheme with a barrier() based scheme that should work for all supported compilers. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # v3.16+