aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorPaolo Bonzini <pbonzini@redhat.com>2019-05-15 23:38:42 +0200
committerPaolo Bonzini <pbonzini@redhat.com>2019-05-15 23:39:38 +0200
commit59c5c58c5b93285753d5c1de34d2e00039c27bc0 (patch)
tree19ac0493a5eb3bef477cb04f8117dad12b6bddb9 /Documentation
parentRevert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" (diff)
parentKVM: PPC: Book3S: Remove useless checks in 'release' method of KVM device (diff)
downloadlinux-dev-59c5c58c5b93285753d5c1de34d2e00039c27bc0.tar.xz
linux-dev-59c5c58c5b93285753d5c1de34d2e00039c27bc0.zip
Merge tag 'kvm-ppc-next-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
PPC KVM update for 5.2 * Support for guests to access the new POWER9 XIVE interrupt controller hardware directly, reducing interrupt latency and overhead for guests. * In-kernel implementation of the H_PAGE_INIT hypercall. * Reduce memory usage of sparsely-populated IOMMU tables. * Several bug fixes. Second PPC KVM update for 5.2 * Fix a bug, fix a spelling mistake, remove some useless code.
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/powerpc/DAWR-POWER9.txt32
-rw-r--r--Documentation/virtual/kvm/api.txt10
-rw-r--r--Documentation/virtual/kvm/devices/xive.txt197
3 files changed, 239 insertions, 0 deletions
diff --git a/Documentation/powerpc/DAWR-POWER9.txt b/Documentation/powerpc/DAWR-POWER9.txt
index 2feaa6619658..bdec03650941 100644
--- a/Documentation/powerpc/DAWR-POWER9.txt
+++ b/Documentation/powerpc/DAWR-POWER9.txt
@@ -56,3 +56,35 @@ POWER9. Loads and stores to the watchpoint locations will not be
trapped in GDB. The watchpoint is remembered, so if the guest is
migrated back to the POWER8 host, it will start working again.
+Force enabling the DAWR
+=============================
+Kernels (since ~v5.2) have an option to force enable the DAWR via:
+
+ echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous
+
+This enables the DAWR even on POWER9.
+
+This is a dangerous setting, USE AT YOUR OWN RISK.
+
+Some users may not care about a bad user crashing their box
+(ie. single user/desktop systems) and really want the DAWR. This
+allows them to force enable DAWR.
+
+This flag can also be used to disable DAWR access. Once this is
+cleared, all DAWR access should be cleared immediately and your
+machine once again safe from crashing.
+
+Userspace may get confused by toggling this. If DAWR is force
+enabled/disabled between getting the number of breakpoints (via
+PTRACE_GETHWDBGINFO) and setting the breakpoint, userspace will get an
+inconsistent view of what's available. Similarly for guests.
+
+For the DAWR to be enabled in a KVM guest, the DAWR needs to be force
+enabled in the host AND the guest. For this reason, this won't work on
+POWERVM as it doesn't allow the HCALL to work. Writes of 'Y' to the
+dawr_enable_dangerous file will fail if the hypervisor doesn't support
+writing the DAWR.
+
+To double check the DAWR is working, run this kernel selftest:
+ tools/testing/selftests/powerpc/ptrace/ptrace-hwbreak.c
+Any errors/failures/skips mean something is wrong.
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 47a5eb00bc53..8ffd9beb931b 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1967,6 +1967,7 @@ registers, find a list below:
PPC | KVM_REG_PPC_TLB3PS | 32
PPC | KVM_REG_PPC_EPTCFG | 32
PPC | KVM_REG_PPC_ICP_STATE | 64
+ PPC | KVM_REG_PPC_VP_STATE | 128
PPC | KVM_REG_PPC_TB_OFFSET | 64
PPC | KVM_REG_PPC_SPMC1 | 32
PPC | KVM_REG_PPC_SPMC2 | 32
@@ -4487,6 +4488,15 @@ struct kvm_sync_regs {
struct kvm_vcpu_events events;
};
+6.75 KVM_CAP_PPC_IRQ_XIVE
+
+Architectures: ppc
+Target: vcpu
+Parameters: args[0] is the XIVE device fd
+ args[1] is the XIVE CPU number (server ID) for this vcpu
+
+This capability connects the vcpu to an in-kernel XIVE device.
+
7. Capabilities that can be enabled on VMs
------------------------------------------
diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
new file mode 100644
index 000000000000..9a24a4525253
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/xive.txt
@@ -0,0 +1,197 @@
+POWER9 eXternal Interrupt Virtualization Engine (XIVE Gen1)
+==========================================================
+
+Device types supported:
+ KVM_DEV_TYPE_XIVE POWER9 XIVE Interrupt Controller generation 1
+
+This device acts as a VM interrupt controller. It provides the KVM
+interface to configure the interrupt sources of a VM in the underlying
+POWER9 XIVE interrupt controller.
+
+Only one XIVE instance may be instantiated. A guest XIVE device
+requires a POWER9 host and the guest OS should have support for the
+XIVE native exploitation interrupt mode. If not, it should run using
+the legacy interrupt mode, referred as XICS (POWER7/8).
+
+* Device Mappings
+
+ The KVM device exposes different MMIO ranges of the XIVE HW which
+ are required for interrupt management. These are exposed to the
+ guest in VMAs populated with a custom VM fault handler.
+
+ 1. Thread Interrupt Management Area (TIMA)
+
+ Each thread has an associated Thread Interrupt Management context
+ composed of a set of registers. These registers let the thread
+ handle priority management and interrupt acknowledgment. The most
+ important are :
+
+ - Interrupt Pending Buffer (IPB)
+ - Current Processor Priority (CPPR)
+ - Notification Source Register (NSR)
+
+ They are exposed to software in four different pages each proposing
+ a view with a different privilege. The first page is for the
+ physical thread context and the second for the hypervisor. Only the
+ third (operating system) and the fourth (user level) are exposed the
+ guest.
+
+ 2. Event State Buffer (ESB)
+
+ Each source is associated with an Event State Buffer (ESB) with
+ either a pair of even/odd pair of pages which provides commands to
+ manage the source: to trigger, to EOI, to turn off the source for
+ instance.
+
+ 3. Device pass-through
+
+ When a device is passed-through into the guest, the source
+ interrupts are from a different HW controller (PHB4) and the ESB
+ pages exposed to the guest should accommadate this change.
+
+ The passthru_irq helpers, kvmppc_xive_set_mapped() and
+ kvmppc_xive_clr_mapped() are called when the device HW irqs are
+ mapped into or unmapped from the guest IRQ number space. The KVM
+ device extends these helpers to clear the ESB pages of the guest IRQ
+ number being mapped and then lets the VM fault handler repopulate.
+ The handler will insert the ESB page corresponding to the HW
+ interrupt of the device being passed-through or the initial IPI ESB
+ page if the device has being removed.
+
+ The ESB remapping is fully transparent to the guest and the OS
+ device driver. All handling is done within VFIO and the above
+ helpers in KVM-PPC.
+
+* Groups:
+
+ 1. KVM_DEV_XIVE_GRP_CTRL
+ Provides global controls on the device
+ Attributes:
+ 1.1 KVM_DEV_XIVE_RESET (write only)
+ Resets the interrupt controller configuration for sources and event
+ queues. To be used by kexec and kdump.
+ Errors: none
+
+ 1.2 KVM_DEV_XIVE_EQ_SYNC (write only)
+ Sync all the sources and queues and mark the EQ pages dirty. This
+ to make sure that a consistent memory state is captured when
+ migrating the VM.
+ Errors: none
+
+ 2. KVM_DEV_XIVE_GRP_SOURCE (write only)
+ Initializes a new source in the XIVE device and mask it.
+ Attributes:
+ Interrupt source number (64-bit)
+ The kvm_device_attr.addr points to a __u64 value:
+ bits: | 63 .... 2 | 1 | 0
+ values: | unused | level | type
+ - type: 0:MSI 1:LSI
+ - level: assertion level in case of an LSI.
+ Errors:
+ -E2BIG: Interrupt source number is out of range
+ -ENOMEM: Could not create a new source block
+ -EFAULT: Invalid user pointer for attr->addr.
+ -ENXIO: Could not allocate underlying HW interrupt
+
+ 3. KVM_DEV_XIVE_GRP_SOURCE_CONFIG (write only)
+ Configures source targeting
+ Attributes:
+ Interrupt source number (64-bit)
+ The kvm_device_attr.addr points to a __u64 value:
+ bits: | 63 .... 33 | 32 | 31 .. 3 | 2 .. 0
+ values: | eisn | mask | server | priority
+ - priority: 0-7 interrupt priority level
+ - server: CPU number chosen to handle the interrupt
+ - mask: mask flag (unused)
+ - eisn: Effective Interrupt Source Number
+ Errors:
+ -ENOENT: Unknown source number
+ -EINVAL: Not initialized source number
+ -EINVAL: Invalid priority
+ -EINVAL: Invalid CPU number.
+ -EFAULT: Invalid user pointer for attr->addr.
+ -ENXIO: CPU event queues not configured or configuration of the
+ underlying HW interrupt failed
+ -EBUSY: No CPU available to serve interrupt
+
+ 4. KVM_DEV_XIVE_GRP_EQ_CONFIG (read-write)
+ Configures an event queue of a CPU
+ Attributes:
+ EQ descriptor identifier (64-bit)
+ The EQ descriptor identifier is a tuple (server, priority) :
+ bits: | 63 .... 32 | 31 .. 3 | 2 .. 0
+ values: | unused | server | priority
+ The kvm_device_attr.addr points to :
+ struct kvm_ppc_xive_eq {
+ __u32 flags;
+ __u32 qshift;
+ __u64 qaddr;
+ __u32 qtoggle;
+ __u32 qindex;
+ __u8 pad[40];
+ };
+ - flags: queue flags
+ KVM_XIVE_EQ_ALWAYS_NOTIFY (required)
+ forces notification without using the coalescing mechanism
+ provided by the XIVE END ESBs.
+ - qshift: queue size (power of 2)
+ - qaddr: real address of queue
+ - qtoggle: current queue toggle bit
+ - qindex: current queue index
+ - pad: reserved for future use
+ Errors:
+ -ENOENT: Invalid CPU number
+ -EINVAL: Invalid priority
+ -EINVAL: Invalid flags
+ -EINVAL: Invalid queue size
+ -EINVAL: Invalid queue address
+ -EFAULT: Invalid user pointer for attr->addr.
+ -EIO: Configuration of the underlying HW failed
+
+ 5. KVM_DEV_XIVE_GRP_SOURCE_SYNC (write only)
+ Synchronize the source to flush event notifications
+ Attributes:
+ Interrupt source number (64-bit)
+ Errors:
+ -ENOENT: Unknown source number
+ -EINVAL: Not initialized source number
+
+* VCPU state
+
+ The XIVE IC maintains VP interrupt state in an internal structure
+ called the NVT. When a VP is not dispatched on a HW processor
+ thread, this structure can be updated by HW if the VP is the target
+ of an event notification.
+
+ It is important for migration to capture the cached IPB from the NVT
+ as it synthesizes the priorities of the pending interrupts. We
+ capture a bit more to report debug information.
+
+ KVM_REG_PPC_VP_STATE (2 * 64bits)
+ bits: | 63 .... 32 | 31 .... 0 |
+ values: | TIMA word0 | TIMA word1 |
+ bits: | 127 .......... 64 |
+ values: | unused |
+
+* Migration:
+
+ Saving the state of a VM using the XIVE native exploitation mode
+ should follow a specific sequence. When the VM is stopped :
+
+ 1. Mask all sources (PQ=01) to stop the flow of events.
+
+ 2. Sync the XIVE device with the KVM control KVM_DEV_XIVE_EQ_SYNC to
+ flush any in-flight event notification and to stabilize the EQs. At
+ this stage, the EQ pages are marked dirty to make sure they are
+ transferred in the migration sequence.
+
+ 3. Capture the state of the source targeting, the EQs configuration
+ and the state of thread interrupt context registers.
+
+ Restore is similar :
+
+ 1. Restore the EQ configuration. As targeting depends on it.
+ 2. Restore targeting
+ 3. Restore the thread interrupt contexts
+ 4. Restore the source states
+ 5. Let the vCPU run