diff options
Diffstat (limited to 'Documentation/virt/kvm')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 255 | ||||
-rw-r--r-- | Documentation/virt/kvm/devices/vcpu.rst | 70 | ||||
-rw-r--r-- | Documentation/virt/kvm/devices/xics.rst | 2 | ||||
-rw-r--r-- | Documentation/virt/kvm/devices/xive.rst | 2 |
4 files changed, 309 insertions, 20 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index a6729c8cf063..aeeb071c7688 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -532,7 +532,7 @@ translation mode. ------------------ :Capability: basic -:Architectures: x86, ppc, mips +:Architectures: x86, ppc, mips, riscv :Type: vcpu ioctl :Parameters: struct kvm_interrupt (in) :Returns: 0 on success, negative on failure. @@ -601,6 +601,23 @@ interrupt number dequeues the interrupt. This is an asynchronous vcpu ioctl and can be invoked from any thread. +RISC-V: +^^^^^^^ + +Queues an external interrupt to be injected into the virutal CPU. This ioctl +is overloaded with 2 different irq values: + +a) KVM_INTERRUPT_SET + + This sets external interrupt for a virtual CPU and it will receive + once it is ready. + +b) KVM_INTERRUPT_UNSET + + This clears pending external interrupt for a virtual CPU. + +This is an asynchronous vcpu ioctl and can be invoked from any thread. + 4.17 KVM_DEBUG_GUEST -------------------- @@ -993,20 +1010,37 @@ such as migration. When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the set of bits that KVM can return in struct kvm_clock_data's flag member. -The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned -value is the exact kvmclock value seen by all VCPUs at the instant -when KVM_GET_CLOCK was called. If clear, the returned value is simply -CLOCK_MONOTONIC plus a constant offset; the offset can be modified -with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock, -but the exact value read by each VCPU could differ, because the host -TSC is not stable. +The following flags are defined: + +KVM_CLOCK_TSC_STABLE + If set, the returned value is the exact kvmclock + value seen by all VCPUs at the instant when KVM_GET_CLOCK was called. + If clear, the returned value is simply CLOCK_MONOTONIC plus a constant + offset; the offset can be modified with KVM_SET_CLOCK. KVM will try + to make all VCPUs follow this clock, but the exact value read by each + VCPU could differ, because the host TSC is not stable. + +KVM_CLOCK_REALTIME + If set, the `realtime` field in the kvm_clock_data + structure is populated with the value of the host's real time + clocksource at the instant when KVM_GET_CLOCK was called. If clear, + the `realtime` field does not contain a value. + +KVM_CLOCK_HOST_TSC + If set, the `host_tsc` field in the kvm_clock_data + structure is populated with the value of the host's timestamp counter (TSC) + at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field + does not contain a value. :: struct kvm_clock_data { __u64 clock; /* kvmclock current value */ __u32 flags; - __u32 pad[9]; + __u32 pad0; + __u64 realtime; + __u64 host_tsc; + __u32 pad[4]; }; @@ -1023,12 +1057,25 @@ Sets the current timestamp of kvmclock to the value specified in its parameter. In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios such as migration. +The following flags can be passed: + +KVM_CLOCK_REALTIME + If set, KVM will compare the value of the `realtime` field + with the value of the host's real time clocksource at the instant when + KVM_SET_CLOCK was called. The difference in elapsed time is added to the final + kvmclock value that will be provided to guests. + +Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored. + :: struct kvm_clock_data { __u64 clock; /* kvmclock current value */ __u32 flags; - __u32 pad[9]; + __u32 pad0; + __u64 realtime; + __u64 host_tsc; + __u32 pad[4]; }; @@ -1399,7 +1446,7 @@ for vm-wide capabilities. --------------------- :Capability: KVM_CAP_MP_STATE -:Architectures: x86, s390, arm, arm64 +:Architectures: x86, s390, arm, arm64, riscv :Type: vcpu ioctl :Parameters: struct kvm_mp_state (out) :Returns: 0 on success; -1 on error @@ -1416,7 +1463,8 @@ uniprocessor guests). Possible values are: ========================== =============================================== - KVM_MP_STATE_RUNNABLE the vcpu is currently running [x86,arm/arm64] + KVM_MP_STATE_RUNNABLE the vcpu is currently running + [x86,arm/arm64,riscv] KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP) which has not yet received an INIT signal [x86] KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is @@ -1425,7 +1473,7 @@ Possible values are: is waiting for an interrupt [x86] KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector accessible via KVM_GET_VCPU_EVENTS) [x86] - KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64] + KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64,riscv] KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390] KVM_MP_STATE_OPERATING the vcpu is operating (running or halted) [s390] @@ -1437,8 +1485,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace on these architectures. -For arm/arm64: -^^^^^^^^^^^^^^ +For arm/arm64/riscv: +^^^^^^^^^^^^^^^^^^^^ The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. @@ -1447,7 +1495,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. --------------------- :Capability: KVM_CAP_MP_STATE -:Architectures: x86, s390, arm, arm64 +:Architectures: x86, s390, arm, arm64, riscv :Type: vcpu ioctl :Parameters: struct kvm_mp_state (in) :Returns: 0 on success; -1 on error @@ -1459,8 +1507,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an in-kernel irqchip, the multiprocessing state must be maintained by userspace on these architectures. -For arm/arm64: -^^^^^^^^^^^^^^ +For arm/arm64/riscv: +^^^^^^^^^^^^^^^^^^^^ The only states that are valid are KVM_MP_STATE_STOPPED and KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not. @@ -2577,6 +2625,144 @@ following id bit patterns:: 0x7020 0000 0003 02 <0:3> <reg:5> +RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of +that is the register group type. + +RISC-V config registers are meant for configuring a Guest VCPU and it has +the following id bit patterns:: + + 0x8020 0000 01 <index into the kvm_riscv_config struct:24> (32bit Host) + 0x8030 0000 01 <index into the kvm_riscv_config struct:24> (64bit Host) + +Following are the RISC-V config registers: + +======================= ========= ============================================= + Encoding Register Description +======================= ========= ============================================= + 0x80x0 0000 0100 0000 isa ISA feature bitmap of Guest VCPU +======================= ========= ============================================= + +The isa config register can be read anytime but can only be written before +a Guest VCPU runs. It will have ISA feature bits matching underlying host +set by default. + +RISC-V core registers represent the general excution state of a Guest VCPU +and it has the following id bit patterns:: + + 0x8020 0000 02 <index into the kvm_riscv_core struct:24> (32bit Host) + 0x8030 0000 02 <index into the kvm_riscv_core struct:24> (64bit Host) + +Following are the RISC-V core registers: + +======================= ========= ============================================= + Encoding Register Description +======================= ========= ============================================= + 0x80x0 0000 0200 0000 regs.pc Program counter + 0x80x0 0000 0200 0001 regs.ra Return address + 0x80x0 0000 0200 0002 regs.sp Stack pointer + 0x80x0 0000 0200 0003 regs.gp Global pointer + 0x80x0 0000 0200 0004 regs.tp Task pointer + 0x80x0 0000 0200 0005 regs.t0 Caller saved register 0 + 0x80x0 0000 0200 0006 regs.t1 Caller saved register 1 + 0x80x0 0000 0200 0007 regs.t2 Caller saved register 2 + 0x80x0 0000 0200 0008 regs.s0 Callee saved register 0 + 0x80x0 0000 0200 0009 regs.s1 Callee saved register 1 + 0x80x0 0000 0200 000a regs.a0 Function argument (or return value) 0 + 0x80x0 0000 0200 000b regs.a1 Function argument (or return value) 1 + 0x80x0 0000 0200 000c regs.a2 Function argument 2 + 0x80x0 0000 0200 000d regs.a3 Function argument 3 + 0x80x0 0000 0200 000e regs.a4 Function argument 4 + 0x80x0 0000 0200 000f regs.a5 Function argument 5 + 0x80x0 0000 0200 0010 regs.a6 Function argument 6 + 0x80x0 0000 0200 0011 regs.a7 Function argument 7 + 0x80x0 0000 0200 0012 regs.s2 Callee saved register 2 + 0x80x0 0000 0200 0013 regs.s3 Callee saved register 3 + 0x80x0 0000 0200 0014 regs.s4 Callee saved register 4 + 0x80x0 0000 0200 0015 regs.s5 Callee saved register 5 + 0x80x0 0000 0200 0016 regs.s6 Callee saved register 6 + 0x80x0 0000 0200 0017 regs.s7 Callee saved register 7 + 0x80x0 0000 0200 0018 regs.s8 Callee saved register 8 + 0x80x0 0000 0200 0019 regs.s9 Callee saved register 9 + 0x80x0 0000 0200 001a regs.s10 Callee saved register 10 + 0x80x0 0000 0200 001b regs.s11 Callee saved register 11 + 0x80x0 0000 0200 001c regs.t3 Caller saved register 3 + 0x80x0 0000 0200 001d regs.t4 Caller saved register 4 + 0x80x0 0000 0200 001e regs.t5 Caller saved register 5 + 0x80x0 0000 0200 001f regs.t6 Caller saved register 6 + 0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode) +======================= ========= ============================================= + +RISC-V csr registers represent the supervisor mode control/status registers +of a Guest VCPU and it has the following id bit patterns:: + + 0x8020 0000 03 <index into the kvm_riscv_csr struct:24> (32bit Host) + 0x8030 0000 03 <index into the kvm_riscv_csr struct:24> (64bit Host) + +Following are the RISC-V csr registers: + +======================= ========= ============================================= + Encoding Register Description +======================= ========= ============================================= + 0x80x0 0000 0300 0000 sstatus Supervisor status + 0x80x0 0000 0300 0001 sie Supervisor interrupt enable + 0x80x0 0000 0300 0002 stvec Supervisor trap vector base + 0x80x0 0000 0300 0003 sscratch Supervisor scratch register + 0x80x0 0000 0300 0004 sepc Supervisor exception program counter + 0x80x0 0000 0300 0005 scause Supervisor trap cause + 0x80x0 0000 0300 0006 stval Supervisor bad address or instruction + 0x80x0 0000 0300 0007 sip Supervisor interrupt pending + 0x80x0 0000 0300 0008 satp Supervisor address translation and protection +======================= ========= ============================================= + +RISC-V timer registers represent the timer state of a Guest VCPU and it has +the following id bit patterns:: + + 0x8030 0000 04 <index into the kvm_riscv_timer struct:24> + +Following are the RISC-V timer registers: + +======================= ========= ============================================= + Encoding Register Description +======================= ========= ============================================= + 0x8030 0000 0400 0000 frequency Time base frequency (read-only) + 0x8030 0000 0400 0001 time Time value visible to Guest + 0x8030 0000 0400 0002 compare Time compare programmed by Guest + 0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF) +======================= ========= ============================================= + +RISC-V F-extension registers represent the single precision floating point +state of a Guest VCPU and it has the following id bit patterns:: + + 0x8020 0000 05 <index into the __riscv_f_ext_state struct:24> + +Following are the RISC-V F-extension registers: + +======================= ========= ============================================= + Encoding Register Description +======================= ========= ============================================= + 0x8020 0000 0500 0000 f[0] Floating point register 0 + ... + 0x8020 0000 0500 001f f[31] Floating point register 31 + 0x8020 0000 0500 0020 fcsr Floating point control and status register +======================= ========= ============================================= + +RISC-V D-extension registers represent the double precision floating point +state of a Guest VCPU and it has the following id bit patterns:: + + 0x8020 0000 06 <index into the __riscv_d_ext_state struct:24> (fcsr) + 0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr) + +Following are the RISC-V D-extension registers: + +======================= ========= ============================================= + Encoding Register Description +======================= ========= ============================================= + 0x8030 0000 0600 0000 f[0] Floating point register 0 + ... + 0x8030 0000 0600 001f f[31] Floating point register 31 + 0x8020 0000 0600 0020 fcsr Floating point control and status register +======================= ========= ============================================= + 4.69 KVM_GET_ONE_REG -------------------- @@ -5850,6 +6036,25 @@ Valid values for 'type' are: :: + /* KVM_EXIT_RISCV_SBI */ + struct { + unsigned long extension_id; + unsigned long function_id; + unsigned long args[6]; + unsigned long ret[2]; + } riscv_sbi; +If exit reason is KVM_EXIT_RISCV_SBI then it indicates that the VCPU has +done a SBI call which is not handled by KVM RISC-V kernel module. The details +of the SBI call are available in 'riscv_sbi' member of kvm_run structure. The +'extension_id' field of 'riscv_sbi' represents SBI extension ID whereas the +'function_id' field represents function ID of given SBI extension. The 'args' +array field of 'riscv_sbi' represents parameters for the SBI call and 'ret' +array field represents return values. The userspace should update the return +values of SBI call before resuming the VCPU. For more details on RISC-V SBI +spec refer, https://github.com/riscv/riscv-sbi-doc. + +:: + /* Fix the size of the union. */ char padding[256]; }; @@ -6706,6 +6911,20 @@ MAP_SHARED mmap will result in an -EINVAL return. When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to perform a bulk copy of tags to/from the guest. +7.29 KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM +------------------------------------- + +Architectures: x86 SEV enabled +Type: vm +Parameters: args[0] is the fd of the source vm +Returns: 0 on success + +This capability enables userspace to migrate the encryption context from the VM +indicated by the fd to the VM this is called on. + +This is intended to support intra-host migration of VMs between userspace VMMs, +upgrading the VMM process without interrupting the guest. + 8. Other capabilities. ====================== diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst index 2acec3b9ef65..60a29972d3f1 100644 --- a/Documentation/virt/kvm/devices/vcpu.rst +++ b/Documentation/virt/kvm/devices/vcpu.rst @@ -161,3 +161,73 @@ Specifies the base address of the stolen time structure for this VCPU. The base address must be 64 byte aligned and exist within a valid guest memory region. See Documentation/virt/kvm/arm/pvtime.rst for more information including the layout of the stolen time structure. + +4. GROUP: KVM_VCPU_TSC_CTRL +=========================== + +:Architectures: x86 + +4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET + +:Parameters: 64-bit unsigned TSC offset + +Returns: + + ======= ====================================== + -EFAULT Error reading/writing the provided + parameter address. + -ENXIO Attribute not supported + ======= ====================================== + +Specifies the guest's TSC offset relative to the host's TSC. The guest's +TSC is then derived by the following equation: + + guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET + +This attribute is useful to adjust the guest's TSC on live migration, +so that the TSC counts the time during which the VM was paused. The +following describes a possible algorithm to use for this purpose. + +From the source VMM process: + +1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src), + kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds + (host_src). + +2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the + guest TSC offset (ofs_src[i]). + +3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the + guest's TSC (freq). + +From the destination VMM process: + +4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from + kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective + fields. Ensure that the KVM_CLOCK_REALTIME flag is set in the provided + structure. + + KVM will advance the VM's kvmclock to account for elapsed time since + recording the clock values. Note that this will cause problems in + the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized + between the source and destination, and a reasonably short time passes + between the source pausing the VMs and the destination executing + steps 4-7. + +5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and + kvmclock nanoseconds (guest_dest). + +6. Adjust the guest TSC offsets for every vCPU to account for (1) time + elapsed since recording state and (2) difference in TSCs between the + source and destination machine: + + ofs_dst[i] = ofs_src[i] - + (guest_src - guest_dest) * freq + + (tsc_src - tsc_dest) + + ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to + a time of 0 in kvmclock. The above formula ensures that it is the + same on the destination as it was on the source). + +7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the + respective value derived in the previous step. diff --git a/Documentation/virt/kvm/devices/xics.rst b/Documentation/virt/kvm/devices/xics.rst index 2d6927e0b776..bf32c77174ab 100644 --- a/Documentation/virt/kvm/devices/xics.rst +++ b/Documentation/virt/kvm/devices/xics.rst @@ -22,7 +22,7 @@ Groups: Errors: ======= ========================================== - -EINVAL Value greater than KVM_MAX_VCPU_ID. + -EINVAL Value greater than KVM_MAX_VCPU_IDS. -EFAULT Invalid user pointer for attr->addr. -EBUSY A vcpu is already connected to the device. ======= ========================================== diff --git a/Documentation/virt/kvm/devices/xive.rst b/Documentation/virt/kvm/devices/xive.rst index 8bdf3dc38f01..8b5e7b40bdf8 100644 --- a/Documentation/virt/kvm/devices/xive.rst +++ b/Documentation/virt/kvm/devices/xive.rst @@ -91,7 +91,7 @@ the legacy interrupt mode, referred as XICS (POWER7/8). Errors: ======= ========================================== - -EINVAL Value greater than KVM_MAX_VCPU_ID. + -EINVAL Value greater than KVM_MAX_VCPU_IDS. -EFAULT Invalid user pointer for attr->addr. -EBUSY A vCPU is already connected to the device. ======= ========================================== |