Diffstat (limited to 'Documentation/virt')
-rw-r--r-- Documentation/virt/kvm/api.rst | 255
-rw-r--r-- Documentation/virt/kvm/devices/vcpu.rst | 70
-rw-r--r-- Documentation/virt/kvm/devices/xics.rst | 2
-rw-r--r-- Documentation/virt/kvm/devices/xive.rst | 2
-rw-r--r-- Documentation/virt/ne_overview.rst | 21
-rw-r--r-- Documentation/virt/uml/user_mode_linux_howto_v2.rst | 119
6 files changed, 384 insertions, 85 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a6729c8cf063..aeeb071c7688 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -532,7 +532,7 @@ translation mode.
------------------
:Capability: basic
-:Architectures: x86, ppc, mips
+:Architectures: x86, ppc, mips, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_interrupt (in)
:Returns: 0 on success, negative on failure.
@@ -601,6 +601,23 @@ interrupt number dequeues the interrupt.
This is an asynchronous vcpu ioctl and can be invoked from any thread.
+RISC-V:
+^^^^^^^
+
+Queues an external interrupt to be injected into the virtual CPU. This ioctl
+is overloaded with 2 different irq values:
+
+a) KVM_INTERRUPT_SET
+
+ This sets an external interrupt for a virtual CPU, which will receive it
+ once it is ready.
+
+b) KVM_INTERRUPT_UNSET
+
+ This clears the pending external interrupt for a virtual CPU.
+
+This is an asynchronous vcpu ioctl and can be invoked from any thread.
+
4.17 KVM_DEBUG_GUEST
--------------------
@@ -993,20 +1010,37 @@ such as migration.
When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
set of bits that KVM can return in struct kvm_clock_data's flag member.
-The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned
-value is the exact kvmclock value seen by all VCPUs at the instant
-when KVM_GET_CLOCK was called. If clear, the returned value is simply
-CLOCK_MONOTONIC plus a constant offset; the offset can be modified
-with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock,
-but the exact value read by each VCPU could differ, because the host
-TSC is not stable.
+The following flags are defined:
+
+KVM_CLOCK_TSC_STABLE
+ If set, the returned value is the exact kvmclock
+ value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
+ If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
+ offset; the offset can be modified with KVM_SET_CLOCK. KVM will try
+ to make all VCPUs follow this clock, but the exact value read by each
+ VCPU could differ, because the host TSC is not stable.
+
+KVM_CLOCK_REALTIME
+ If set, the `realtime` field in the kvm_clock_data
+ structure is populated with the value of the host's real time
+ clocksource at the instant when KVM_GET_CLOCK was called. If clear,
+ the `realtime` field does not contain a value.
+
+KVM_CLOCK_HOST_TSC
+ If set, the `host_tsc` field in the kvm_clock_data
+ structure is populated with the value of the host's timestamp counter (TSC)
+ at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+ does not contain a value.
::
struct kvm_clock_data {
__u64 clock; /* kvmclock current value */
__u32 flags;
- __u32 pad[9];
+ __u32 pad0;
+ __u64 realtime;
+ __u64 host_tsc;
+ __u32 pad[4];
};
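Review note: the hunk above turns part of the old padding into the new `realtime` and `host_tsc` fields. The following is a minimal sketch, not kernel code, that checks the two layouts occupy the same number of bytes. The struct names `clock_data_old` and `clock_data_new` are hypothetical local mirrors of the layouts shown in the diff; real code would use `struct kvm_clock_data` from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the layout removed by the hunk. */
struct clock_data_old {
	uint64_t clock;   /* kvmclock current value */
	uint32_t flags;
	uint32_t pad[9];
};

/* Hypothetical mirror of the layout added by the hunk. */
struct clock_data_new {
	uint64_t clock;   /* kvmclock current value */
	uint32_t flags;
	uint32_t pad0;
	uint64_t realtime;
	uint64_t host_tsc;
	uint32_t pad[4];
};

/* The new fields reuse former padding, so the overall size is unchanged. */
static_assert(sizeof(struct clock_data_old) == sizeof(struct clock_data_new),
	      "clock data layouts differ in size");

/* The fields that both layouts share stay at the same offsets. */
static_assert(offsetof(struct clock_data_old, flags) ==
	      offsetof(struct clock_data_new, flags),
	      "flags moved");
```

Because the size and the offsets of `clock` and `flags` are preserved in this sketch, a caller built against the old layout would still pass a buffer of the expected size.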
@@ -1023,12 +1057,25 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
such as migration.
+The following flags can be passed:
+
+KVM_CLOCK_REALTIME
+ If set, KVM will compare the value of the `realtime` field
+ with the value of the host's real time clocksource at the instant when
+ KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
+ kvmclock value that will be provided to guests.
+
+Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored.
+
::
struct kvm_clock_data {
__u64 clock; /* kvmclock current value */
__u32 flags;
- __u32 pad[9];
+ __u32 pad0;
+ __u64 realtime;
+ __u64 host_tsc;
+ __u32 pad[4];
};
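Review note: the KVM_CLOCK_REALTIME behaviour described for KVM_SET_CLOCK in the hunk above can be modelled as plain arithmetic. This is a sketch under stated assumptions, not the kernel implementation: the flag value `SKETCH_CLOCK_REALTIME` and the function name are hypothetical, and the choice to ignore a host clock that appears to have gone backwards is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag bit for illustration; the text above only names the flag. */
#define SKETCH_CLOCK_REALTIME (1u << 2)

/*
 * Model of the documented behaviour: when the realtime flag is set, the
 * time elapsed between recording `realtime` and the current host real
 * time is added to the kvmclock value. All values are in nanoseconds.
 */
static uint64_t sketch_set_clock(uint64_t clock, uint32_t flags,
				 uint64_t realtime, uint64_t host_realtime_now)
{
	/* Assumption: a negative elapsed time is ignored rather than applied. */
	if ((flags & SKETCH_CLOCK_REALTIME) && host_realtime_now > realtime)
		clock += host_realtime_now - realtime;

	return clock;
}
```

For example, with the flag set, a recorded `clock` of 1000 ns, a recorded `realtime` of 500 ns and a current host real time of 800 ns, the model yields 1300 ns. Without the flag, the `realtime` field is not consulted.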
@@ -1399,7 +1446,7 @@ for vm-wide capabilities.
---------------------
:Capability: KVM_CAP_MP_STATE
-:Architectures: x86, s390, arm, arm64
+:Architectures: x86, s390, arm, arm64, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_mp_state (out)
:Returns: 0 on success; -1 on error
@@ -1416,7 +1463,8 @@ uniprocessor guests).
Possible values are:
========================== ===============================================
- KVM_MP_STATE_RUNNABLE the vcpu is currently running [x86,arm/arm64]
+ KVM_MP_STATE_RUNNABLE the vcpu is currently running
+ [x86,arm/arm64,riscv]
KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP)
which has not yet received an INIT signal [x86]
KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is
@@ -1425,7 +1473,7 @@ Possible values are:
is waiting for an interrupt [x86]
KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector
accessible via KVM_GET_VCPU_EVENTS) [x86]
- KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64]
+ KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64,riscv]
KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390]
KVM_MP_STATE_OPERATING the vcpu is operating (running or halted)
[s390]
@@ -1437,8 +1485,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.
-For arm/arm64:
-^^^^^^^^^^^^^^
+For arm/arm64/riscv:
+^^^^^^^^^^^^^^^^^^^^
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
@@ -1447,7 +1495,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
---------------------
:Capability: KVM_CAP_MP_STATE
-:Architectures: x86, s390, arm, arm64
+:Architectures: x86, s390, arm, arm64, riscv
:Type: vcpu ioctl
:Parameters: struct kvm_mp_state (in)
:Returns: 0 on success; -1 on error
@@ -1459,8 +1507,8 @@ On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
in-kernel irqchip, the multiprocessing state must be maintained by userspace on
these architectures.
-For arm/arm64:
-^^^^^^^^^^^^^^
+For arm/arm64/riscv:
+^^^^^^^^^^^^^^^^^^^^
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
@@ -2577,6 +2625,144 @@ following id bit patterns::
0x7020 0000 0003 02 <0:3> <reg:5>
+RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of
+those are the register group type.
+
+RISC-V config registers are meant for configuring a Guest VCPU and they have
+the following id bit patterns::
+
+ 0x8020 0000 01 <index into the kvm_riscv_config struct:24> (32bit Host)
+ 0x8030 0000 01 <index into the kvm_riscv_config struct:24> (64bit Host)
+
+Following are the RISC-V config registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x80x0 0000 0100 0000 isa ISA feature bitmap of Guest VCPU
+======================= ========= =============================================
+
+The isa config register can be read anytime but can only be written before
+a Guest VCPU runs. By default, it has the ISA feature bits matching the
+underlying host set.
+
+RISC-V core registers represent the general execution state of a Guest VCPU
+and they have the following id bit patterns::
+
+ 0x8020 0000 02 <index into the kvm_riscv_core struct:24> (32bit Host)
+ 0x8030 0000 02 <index into the kvm_riscv_core struct:24> (64bit Host)
+
+Following are the RISC-V core registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x80x0 0000 0200 0000 regs.pc Program counter
+ 0x80x0 0000 0200 0001 regs.ra Return address
+ 0x80x0 0000 0200 0002 regs.sp Stack pointer
+ 0x80x0 0000 0200 0003 regs.gp Global pointer
+ 0x80x0 0000 0200 0004 regs.tp Task pointer
+ 0x80x0 0000 0200 0005 regs.t0 Caller saved register 0
+ 0x80x0 0000 0200 0006 regs.t1 Caller saved register 1
+ 0x80x0 0000 0200 0007 regs.t2 Caller saved register 2
+ 0x80x0 0000 0200 0008 regs.s0 Callee saved register 0
+ 0x80x0 0000 0200 0009 regs.s1 Callee saved register 1
+ 0x80x0 0000 0200 000a regs.a0 Function argument (or return value) 0
+ 0x80x0 0000 0200 000b regs.a1 Function argument (or return value) 1
+ 0x80x0 0000 0200 000c regs.a2 Function argument 2
+ 0x80x0 0000 0200 000d regs.a3 Function argument 3
+ 0x80x0 0000 0200 000e regs.a4 Function argument 4
+ 0x80x0 0000 0200 000f regs.a5 Function argument 5
+ 0x80x0 0000 0200 0010 regs.a6 Function argument 6
+ 0x80x0 0000 0200 0011 regs.a7 Function argument 7
+ 0x80x0 0000 0200 0012 regs.s2 Callee saved register 2
+ 0x80x0 0000 0200 0013 regs.s3 Callee saved register 3
+ 0x80x0 0000 0200 0014 regs.s4 Callee saved register 4
+ 0x80x0 0000 0200 0015 regs.s5 Callee saved register 5
+ 0x80x0 0000 0200 0016 regs.s6 Callee saved register 6
+ 0x80x0 0000 0200 0017 regs.s7 Callee saved register 7
+ 0x80x0 0000 0200 0018 regs.s8 Callee saved register 8
+ 0x80x0 0000 0200 0019 regs.s9 Callee saved register 9
+ 0x80x0 0000 0200 001a regs.s10 Callee saved register 10
+ 0x80x0 0000 0200 001b regs.s11 Callee saved register 11
+ 0x80x0 0000 0200 001c regs.t3 Caller saved register 3
+ 0x80x0 0000 0200 001d regs.t4 Caller saved register 4
+ 0x80x0 0000 0200 001e regs.t5 Caller saved register 5
+ 0x80x0 0000 0200 001f regs.t6 Caller saved register 6
+ 0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode)
+======================= ========= =============================================
+
+RISC-V csr registers represent the supervisor mode control/status registers
+of a Guest VCPU and they have the following id bit patterns::
+
+ 0x8020 0000 03 <index into the kvm_riscv_csr struct:24> (32bit Host)
+ 0x8030 0000 03 <index into the kvm_riscv_csr struct:24> (64bit Host)
+
+Following are the RISC-V csr registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x80x0 0000 0300 0000 sstatus Supervisor status
+ 0x80x0 0000 0300 0001 sie Supervisor interrupt enable
+ 0x80x0 0000 0300 0002 stvec Supervisor trap vector base
+ 0x80x0 0000 0300 0003 sscratch Supervisor scratch register
+ 0x80x0 0000 0300 0004 sepc Supervisor exception program counter
+ 0x80x0 0000 0300 0005 scause Supervisor trap cause
+ 0x80x0 0000 0300 0006 stval Supervisor bad address or instruction
+ 0x80x0 0000 0300 0007 sip Supervisor interrupt pending
+ 0x80x0 0000 0300 0008 satp Supervisor address translation and protection
+======================= ========= =============================================
+
+RISC-V timer registers represent the timer state of a Guest VCPU and they have
+the following id bit patterns::
+
+ 0x8030 0000 04 <index into the kvm_riscv_timer struct:24>
+
+Following are the RISC-V timer registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x8030 0000 0400 0000 frequency Time base frequency (read-only)
+ 0x8030 0000 0400 0001 time Time value visible to Guest
+ 0x8030 0000 0400 0002 compare Time compare programmed by Guest
+ 0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF)
+======================= ========= =============================================
+
+RISC-V F-extension registers represent the single precision floating point
+state of a Guest VCPU and they have the following id bit patterns::
+
+ 0x8020 0000 05 <index into the __riscv_f_ext_state struct:24>
+
+Following are the RISC-V F-extension registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x8020 0000 0500 0000 f[0] Floating point register 0
+ ...
+ 0x8020 0000 0500 001f f[31] Floating point register 31
+ 0x8020 0000 0500 0020 fcsr Floating point control and status register
+======================= ========= =============================================
+
+RISC-V D-extension registers represent the double precision floating point
+state of a Guest VCPU and they have the following id bit patterns::
+
+ 0x8020 0000 06 <index into the __riscv_d_ext_state struct:24> (fcsr)
+ 0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr)
+
+Following are the RISC-V D-extension registers:
+
+======================= ========= =============================================
+ Encoding Register Description
+======================= ========= =============================================
+ 0x8030 0000 0600 0000 f[0] Floating point register 0
+ ...
+ 0x8030 0000 0600 001f f[31] Floating point register 31
+ 0x8020 0000 0600 0020 fcsr Floating point control and status register
+======================= ========= =============================================
+
4.69 KVM_GET_ONE_REG
--------------------
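Review note: the RISC-V register id bit patterns added in the hunk above can be composed mechanically. The sketch below is illustrative only; every `SKETCH_`/`sketch_` name is hypothetical, and the constants simply restate the bit patterns from the tables (architecture prefix `0x80`, size `0x20` for 32-bit or `0x30` for 64-bit, group type in bits 31:24, index in bits 23:0).

```c
#include <assert.h>
#include <stdint.h>

/* Constants restating the id bit patterns from the tables above. */
#define SKETCH_REG_RISCV      0x8000000000000000ULL
#define SKETCH_REG_SIZE_U32   0x0020000000000000ULL
#define SKETCH_REG_SIZE_U64   0x0030000000000000ULL
#define SKETCH_REG_TYPE_SHIFT 24

/* Register group types, numbered as in the documented patterns. */
enum sketch_reg_type {
	SKETCH_REG_CONFIG = 0x01,
	SKETCH_REG_CORE   = 0x02,
	SKETCH_REG_CSR    = 0x03,
	SKETCH_REG_TIMER  = 0x04,
	SKETCH_REG_FP_F   = 0x05,
	SKETCH_REG_FP_D   = 0x06,
};

/* Compose a register id from its size, group type and 24-bit index. */
static uint64_t sketch_riscv_reg_id(uint64_t size, enum sketch_reg_type type,
				    uint32_t index)
{
	/* The index must fit in the low 24 bits. */
	assert(index < (1u << SKETCH_REG_TYPE_SHIFT));

	return SKETCH_REG_RISCV | size |
	       ((uint64_t)type << SKETCH_REG_TYPE_SHIFT) | index;
}
```

With these definitions, `sketch_riscv_reg_id(SKETCH_REG_SIZE_U64, SKETCH_REG_CORE, 2)` gives `0x8030000002000002`, which corresponds to the `regs.sp` row for a 64-bit host.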
@@ -5850,6 +6036,25 @@ Valid values for 'type' are:
::
+ /* KVM_EXIT_RISCV_SBI */
+ struct {
+ unsigned long extension_id;
+ unsigned long function_id;
+ unsigned long args[6];
+ unsigned long ret[2];
+ } riscv_sbi;
+If the exit reason is KVM_EXIT_RISCV_SBI, it indicates that the VCPU has
+made an SBI call which is not handled by the KVM RISC-V kernel module. The details
+of the SBI call are available in 'riscv_sbi' member of kvm_run structure. The
+'extension_id' field of 'riscv_sbi' represents SBI extension ID whereas the
+'function_id' field represents function ID of given SBI extension. The 'args'
+array field of 'riscv_sbi' represents parameters for the SBI call and 'ret'
+array field represents return values. Userspace should update the return
+values of the SBI call before resuming the VCPU. For more details, refer to the
+RISC-V SBI specification at https://github.com/riscv/riscv-sbi-doc.
+
+::
+
/* Fix the size of the union. */
char padding[256];
};
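Review note: the KVM_EXIT_RISCV_SBI text in the hunk above says userspace should fill in the return values before resuming the VCPU. A minimal sketch of such a handler follows, assuming a VMM that emulates no SBI extension itself. The struct `sketch_riscv_sbi` is a hypothetical local mirror of the `riscv_sbi` member, the handler name is made up, and the mapping of `ret[0]` to the SBI error code and `ret[1]` to the SBI value is an assumption based on the SBI calling convention.

```c
#include <assert.h>

/* SBI_ERR_NOT_SUPPORTED as defined by the RISC-V SBI specification. */
#define SKETCH_SBI_ERR_NOT_SUPPORTED (-2L)

/* Hypothetical local mirror of the riscv_sbi member of struct kvm_run. */
struct sketch_riscv_sbi {
	unsigned long extension_id;
	unsigned long function_id;
	unsigned long args[6];
	unsigned long ret[2];
};

/*
 * Fill in the return values for an SBI call that was forwarded to
 * userspace. This sketch emulates nothing, so every call is answered
 * with "not supported".
 */
static void sketch_handle_sbi_exit(struct sketch_riscv_sbi *sbi)
{
	/* Assumption: ret[0] carries the error code, ret[1] the value. */
	sbi->ret[0] = (unsigned long)SKETCH_SBI_ERR_NOT_SUPPORTED;
	sbi->ret[1] = 0;
}
```

A real VMM would dispatch on `extension_id` and `function_id` first and only fall back to an error for calls it does not implement.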
@@ -6706,6 +6911,20 @@ MAP_SHARED mmap will result in an -EINVAL return.
When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to
perform a bulk copy of tags to/from the guest.
+7.29 KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM
+-------------------------------------
+
+Architectures: x86 SEV enabled
+Type: vm
+Parameters: args[0] is the fd of the source vm
+Returns: 0 on success
+
+This capability enables userspace to migrate the encryption context from the VM
+indicated by the fd to the VM this is called on.
+
+This is intended to support intra-host migration of VMs between userspace VMMs,
+upgrading the VMM process without interrupting the guest.
+
8. Other capabilities.
======================
diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 2acec3b9ef65..60a29972d3f1 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -161,3 +161,73 @@ Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64 byte aligned and exist within a valid guest memory
region. See Documentation/virt/kvm/arm/pvtime.rst for more information
including the layout of the stolen time structure.
+
+4. GROUP: KVM_VCPU_TSC_CTRL
+===========================
+
+:Architectures: x86
+
+4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
+
+:Parameters: 64-bit unsigned TSC offset
+
+Returns:
+
+ ======= ======================================
+ -EFAULT Error reading/writing the provided
+ parameter address.
+ -ENXIO Attribute not supported
+ ======= ======================================
+
+Specifies the guest's TSC offset relative to the host's TSC. The guest's
+TSC is then derived by the following equation:
+
+ guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
+
+This attribute is useful to adjust the guest's TSC on live migration,
+so that the TSC counts the time during which the VM was paused. The
+following describes a possible algorithm to use for this purpose.
+
+From the source VMM process:
+
+1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
+ kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
+ (host_src).
+
+2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
+ guest TSC offset (ofs_src[i]).
+
+3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
+ guest's TSC (freq).
+
+From the destination VMM process:
+
+4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
+ kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
+ fields. Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
+ structure.
+
+ KVM will advance the VM's kvmclock to account for elapsed time since
+ recording the clock values. Note that this will cause problems in
+ the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
+ between the source and destination, and a reasonably short time passes
+ between the source pausing the VMs and the destination executing
+ steps 4-7.
+
+5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and
+ kvmclock nanoseconds (guest_dest).
+
+6. Adjust the guest TSC offsets for every vCPU to account for (1) time
+ elapsed since recording state and (2) difference in TSCs between the
+ source and destination machine:
+
+ ofs_dst[i] = ofs_src[i] -
+ (guest_src - guest_dest) * freq +
+ (tsc_src - tsc_dest)
+
+ ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to
+ a time of 0 in kvmclock. The above formula ensures that it is the
+ same on the destination as it was on the source).
+
+7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
+ respective value derived in the previous step.
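Review note: step 6 of the algorithm above can be written out as integer arithmetic. This is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, kvmclock values are taken to be in nanoseconds, and `freq_khz` is taken to be the value reported by KVM_GET_TSC_KHZ, so the nanosecond difference is scaled by `freq_khz / 1000000` to obtain TSC ticks. The formula in the text leaves that unit conversion implicit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a nanosecond interval to TSC ticks for a frequency in kHz.
 * Assumption: the product fits in 64 bits, which only holds for
 * reasonably short intervals.
 */
static int64_t sketch_ns_to_ticks(int64_t ns, int64_t freq_khz)
{
	return ns * freq_khz / 1000000;
}

/*
 * ofs_dst = ofs_src - (guest_src - guest_dest) * freq + (tsc_src - tsc_dest)
 *
 * The offset is a 64-bit unsigned value, so the arithmetic is done
 * modulo 2^64 and negative terms wrap as intended.
 */
static uint64_t sketch_dest_tsc_offset(uint64_t ofs_src,
				       int64_t guest_src_ns,
				       int64_t guest_dest_ns,
				       uint64_t tsc_src, uint64_t tsc_dest,
				       int64_t freq_khz)
{
	assert(freq_khz > 0);

	int64_t elapsed_ticks =
		sketch_ns_to_ticks(guest_src_ns - guest_dest_ns, freq_khz);

	return ofs_src - (uint64_t)elapsed_ticks + (tsc_src - tsc_dest);
}
```

As a worked case, take `ofs_src = 1000`, `guest_src = 5000000` ns, `guest_dest = 7000000` ns, `tsc_src = 10000000`, `tsc_dest = 3000000` and a 2 GHz guest TSC (`freq_khz = 2000000`). The result is 11001000, and the guest TSC value at kvmclock time 0 is 1000 on both sides, which is the invariant the parenthetical note in step 6 describes.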
diff --git a/Documentation/virt/kvm/devices/xics.rst b/Documentation/virt/kvm/devices/xics.rst
index 2d6927e0b776..bf32c77174ab 100644
--- a/Documentation/virt/kvm/devices/xics.rst
+++ b/Documentation/virt/kvm/devices/xics.rst
@@ -22,7 +22,7 @@ Groups:
Errors:
======= ==========================================
- -EINVAL Value greater than KVM_MAX_VCPU_ID.
+ -EINVAL Value greater than KVM_MAX_VCPU_IDS.
-EFAULT Invalid user pointer for attr->addr.
-EBUSY A vcpu is already connected to the device.
======= ==========================================
diff --git a/Documentation/virt/kvm/devices/xive.rst b/Documentation/virt/kvm/devices/xive.rst
index 8bdf3dc38f01..8b5e7b40bdf8 100644
--- a/Documentation/virt/kvm/devices/xive.rst
+++ b/Documentation/virt/kvm/devices/xive.rst
@@ -91,7 +91,7 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
Errors:
======= ==========================================
- -EINVAL Value greater than KVM_MAX_VCPU_ID.
+ -EINVAL Value greater than KVM_MAX_VCPU_IDS.
-EFAULT Invalid user pointer for attr->addr.
-EBUSY A vCPU is already connected to the device.
======= ==========================================
diff --git a/Documentation/virt/ne_overview.rst b/Documentation/virt/ne_overview.rst
index 39b0c8fe2654..74c2f5919c88 100644
--- a/Documentation/virt/ne_overview.rst
+++ b/Documentation/virt/ne_overview.rst
@@ -14,12 +14,15 @@ instances [1].
For example, an application that processes sensitive data and runs in a VM,
can be separated from other applications running in the same VM. This
application then runs in a separate VM than the primary VM, namely an enclave.
+It runs alongside the VM that spawned it. This setup matches the needs of
+low-latency applications.
-An enclave runs alongside the VM that spawned it. This setup matches low latency
-applications needs. The resources that are allocated for the enclave, such as
-memory and CPUs, are carved out of the primary VM. Each enclave is mapped to a
-process running in the primary VM, that communicates with the NE driver via an
-ioctl interface.
+The currently supported architectures for the NE kernel driver, available in the
+upstream Linux kernel, are x86 and ARM64.
+
+The resources that are allocated for the enclave, such as memory and CPUs, are
+carved out of the primary VM. Each enclave is mapped to a process running in the
+primary VM, that communicates with the NE kernel driver via an ioctl interface.
In this sense, there are two components:
@@ -43,8 +46,8 @@ for the enclave VM. An enclave does not have persistent storage attached.
The memory regions carved out of the primary VM and given to an enclave need to
be aligned 2 MiB / 1 GiB physically contiguous memory regions (or multiple of
this size e.g. 8 MiB). The memory can be allocated e.g. by using hugetlbfs from
-user space [2][3]. The memory size for an enclave needs to be at least 64 MiB.
-The enclave memory and CPUs need to be from the same NUMA node.
+user space [2][3][7]. The memory size for an enclave needs to be at least
+64 MiB. The enclave memory and CPUs need to be from the same NUMA node.
An enclave runs on dedicated cores. CPU 0 and its CPU siblings need to remain
available for the primary VM. A CPU pool has to be set for NE purposes by an
@@ -61,7 +64,7 @@ device is placed in memory below the typical 4 GiB.
The application that runs in the enclave needs to be packaged in an enclave
image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
-boot protocol [6].
+boot protocol [6][8].
The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
Enclave Image Format (EIF); plus an EIF header including metadata such as magic
@@ -93,3 +96,5 @@ enclave process can exit.
[4] https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[5] https://man7.org/linux/man-pages/man7/vsock.7.html
[6] https://www.kernel.org/doc/html/latest/x86/boot.html
+[7] https://www.kernel.org/doc/html/latest/arm64/hugetlbpage.html
+[8] https://www.kernel.org/doc/html/latest/arm64/booting.html
diff --git a/Documentation/virt/uml/user_mode_linux_howto_v2.rst b/Documentation/virt/uml/user_mode_linux_howto_v2.rst
index 312e431695d9..2cafd3c3c6cb 100644
--- a/Documentation/virt/uml/user_mode_linux_howto_v2.rst
+++ b/Documentation/virt/uml/user_mode_linux_howto_v2.rst
@@ -128,7 +128,7 @@ Create a minimal OS installation on the mounted filesystem::
debootstrap does not set up the root password, fstab, hostname or
anything related to networking. It is up to the user to do that.
-Set the root password -t he easiest way to do that is to chroot into the
+Set the root password - the easiest way to do that is to chroot into the
mounted image::
# chroot /mnt
@@ -144,7 +144,7 @@ will be empty and it needs an entry for the root file system::
/dev/ubd0 ext4 discard,errors=remount-ro 0 1
The image hostname will be set to the same as the host on which you
-are creating it image. It is a good idea to change that to avoid
+are creating its image. It is a good idea to change that to avoid
"Oh, bummer, I rebooted the wrong machine".
UML supports two classes of network devices - the older uml_net ones
@@ -162,7 +162,7 @@ need entries like::
# vector UML network devices
auto vec0
- iface eth0 inet dhcp
+ iface vec0 inet dhcp
We now have a UML image which is nearly ready to run, all we need is a
UML kernel and modules for it.
@@ -179,7 +179,12 @@ directory to the mounted UML filesystem::
If you have compiled your own kernel, you need to use the usual "install
modules to a location" procedure by running::
- # make install MODULES_DIR=/mnt/lib/modules
+ # make INSTALL_MOD_PATH=/mnt/lib/modules modules_install
+
+This will install modules into /mnt/lib/modules/$(KERNELRELEASE).
+To specify the full module installation path, use::
+
+ # make MODLIB=/mnt/lib/modules modules_install
At this point the image is ready to be brought up.
@@ -188,7 +193,7 @@ Setting Up UML Networking
*************************
UML networking is designed to emulate an Ethernet connection. This
-connection may be either a point-to-point (similar to a connection
+connection may be either point-to-point (similar to a connection
between machines using a back-to-back cable) or a connection to a
switch. UML supports a wide variety of means to build these
connections to all of: local machine, remote machine(s), local and
@@ -231,7 +236,7 @@ remote UML and other VM instances.
* All transports which have multi-packet rx and/or tx can deliver pps
rates of up to 1Mps or more.
-* All legacy transports are generally limited to ~600-700MBit and 0.05Mps
+* All legacy transports are generally limited to ~600-700MBit and 0.05Mps.
* GRE and L2TPv3 allow connections to all of: local machine, remote
machines, remote network devices and remote UML instances.
@@ -255,7 +260,7 @@ raw sockets where needed.
This can be achieved by granting the user a particular capability instead
of running UML as root. In case of vector transport, a user can add the
-capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW``, to the uml binary.
+capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW`` to the uml binary.
Thenceforth, UML can be run with normal user privileges, along with
full networking.
@@ -286,7 +291,7 @@ These options are common for all transports:
* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.
-* ``gro=[0,1]`` - sets GRO on or off. Enables receive/transmit offloads.
+* ``gro=[0,1]`` - sets GRO off or on. Enables receive/transmit offloads.
The effect of this option depends on the host side support in the transport
which is being configured. In most cases it will enable TCP segmentation and
RX/TX checksumming offloads. The setting must be identical on the host side
@@ -301,7 +306,7 @@ These options are common for all transports:
* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
if a packet will need to be re-encapsulated into for instance VXLAN.
-* ``vec=0`` - disable multipacket io and fall back to packet at a
+* ``vec=0`` - disable multipacket IO and fall back to packet at a
time mode
Shared Options
@@ -331,7 +336,7 @@ Example::
This will connect vec0 to tap0 on the host. Tap0 must already exist (for example
created using tunctl) and UP.
-tap0 can be configured as a point-to-point interface and given an ip
+tap0 can be configured as a point-to-point interface and given an IP
address so that UML can talk to the host. Alternatively, it is possible
to connect UML to a tap interface which is connected to a bridge.
@@ -358,7 +363,7 @@ Example::
This is an experimental/demo transport which couples tap for transmit
and a raw socket for receive. The raw socket allows multi-packet
-receive resulting in significantly higher packet rates than normal tap
+receive resulting in significantly higher packet rates than normal tap.
Privileges required: hybrid requires ``CAP_NET_RAW`` capability by
the UML user as well as the requirements for the tap transport.
@@ -426,10 +431,10 @@ This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
endpoint at host dst_host. ``GRE`` supports the following additional
options:
-* ``rx_key=int`` - GRE 32 bit integer key for rx packets, if set,
+* ``rx_key=int`` - GRE 32-bit integer key for rx packets, if set,
``tx_key`` must be set too
-* ``tx_key=int`` - GRE 32 bit integer key for tx packets, if set
+* ``tx_key=int`` - GRE 32-bit integer key for tx packets, if set
``rx_key`` must be set too
* ``sequence=[0,1]`` - enable GRE sequence
@@ -444,12 +449,12 @@ options:
GRE has a number of caveats:
-* You can use only one GRE connection per ip address. There is no way to
+* You can use only one GRE connection per IP address. There is no way to
multiplex connections as each GRE tunnel is terminated directly on
the UML instance.
* The key is not really a security feature. While it was intended as such
- it's "security" is laughable. It is, however, a useful feature to
+ its "security" is laughable. It is, however, a useful feature to
ensure that the tunnel is not misconfigured.
An example configuration for a Linux host with a local address of
@@ -489,22 +494,22 @@ the L2TPv3 UDP flavour and UDP destination port $dst_port.
L2TPv3 always requires the following additional options:
-* ``rx_session=int`` - l2tpv3 32 bit integer session for rx packets
+* ``rx_session=int`` - l2tpv3 32-bit integer session for rx packets
-* ``tx_session=int`` - l2tpv3 32 bit integer session for tx packets
+* ``tx_session=int`` - l2tpv3 32-bit integer session for tx packets
As the tunnel is fixed these are not negotiated and they are
preconfigured on both ends.
-Additionally, L2TPv3 supports the following optional parameters
+Additionally, L2TPv3 supports the following optional parameters.
-* ``rx_cookie=int`` - l2tpv3 32 bit integer cookie for rx packets - same
+* ``rx_cookie=int`` - l2tpv3 32-bit integer cookie for rx packets - same
functionality as GRE key, more to prevent misconfiguration than provide
actual security
-* ``tx_cookie=int`` - l2tpv3 32 bit integer cookie for tx packets
+* ``tx_cookie=int`` - l2tpv3 32-bit integer cookie for tx packets
-* ``cookie64=[0,1]`` - use 64 bit cookies instead of 32 bit.
+* ``cookie64=[0,1]`` - use 64-bit cookies instead of 32-bit.
* ``counter=[0,1]`` - enable l2tpv3 counter
@@ -518,12 +523,12 @@ Additionally, L2TPv3 supports the following optional parameters
L2TPv3 has a number of caveats:
-* you can use only one connection per ip address in raw mode. There is
+* you can use only one connection per IP address in raw mode. There is
no way to multiplex connections as each L2TPv3 tunnel is terminated
directly on the UML instance. UDP mode can use different ports for
this purpose.
-Here is an example of how to configure a linux host to connect to UML
+Here is an example of how to configure a Linux host to connect to UML
via L2TPv3:
**/etc/network/interfaces**::
@@ -586,7 +591,7 @@ distribution or a custom built kernel has been installed on the host.
These add an executable called linux to the system. This is the UML
kernel. It can be run just like any other executable.
It will take most normal linux kernel arguments as command line
-arguments. Additionally, it will need some UML specific arguments
+arguments. Additionally, it will need some UML-specific arguments
in order to do something useful.
Arguments
@@ -595,7 +600,7 @@ Arguments
Mandatory Arguments:
--------------------
-* ``mem=int[K,M,G]`` - amount of memory. By default bytes. It will
+* ``mem=int[K,M,G]`` - amount of memory. By default in bytes. It will
also accept K, M or G qualifiers.
* ``ubdX[s,d,c,t]=`` virtual disk specification. This is not really
@@ -603,7 +608,7 @@ Mandatory Arguments:
specify a root file system.
The simplest possible image specification is the name of the image
file for the filesystem (created using one of the methods described
- in `Creating an image`_)
+ in `Creating an image`_).
* UBD devices support copy on write (COW). The changes are kept in
a separate file which can be discarded allowing a rollback to the
@@ -613,15 +618,15 @@ Mandatory Arguments:
* UBD devices can be set to use synchronous IO. Any writes are
immediately flushed to disk. This is done by adding ``s`` after
- the ``ubdX`` specification
+ the ``ubdX`` specification.
- * UBD performs some euristics on devices specified as a single
+ * UBD performs some heuristics on devices specified as a single
filename to make sure that a COW file has not been specified as
- the image. To turn them off, use the ``d`` flag after ``ubdX``
+ the image. To turn them off, use the ``d`` flag after ``ubdX``.
* UBD supports TRIM - asking the Host OS to reclaim any unused
blocks in the image. To turn it off, specify the ``t`` flag after
- ``ubdX``
+ ``ubdX``.
* ``root=`` root device - most likely ``/dev/ubd0`` (this is a Linux
filesystem image)
@@ -631,7 +636,7 @@ Important Optional Arguments
If UML is run as "linux" with no extra arguments, it will try to start an
xterm for every console configured inside the image (up to 6 in most
-linux distributions). Each console is started inside an
+Linux distributions). Each console is started inside an
xterm. This makes it nice and easy to use UML on a host with a GUI. It is,
however, the wrong approach if UML is to be used as a testing harness or run
in a text-only environment.
@@ -656,10 +661,10 @@ one is input, the second one output.
* The null channel - Discard all input or output. Example ``con=null`` will set
all consoles to null by default.
-* The fd channel - use file descriptor numbers for input/out. Example:
+* The fd channel - use file descriptor numbers for input/output. Example:
``con1=fd:0,fd:1``.
-* The port channel - listen on tcp port number. Example: ``con1=port:4321``
+* The port channel - listen on TCP port number. Example: ``con1=port:4321``
* The pty and pts channels - use system pty/pts.
@@ -667,7 +672,7 @@ one is input, the second one output.
will make UML use the host 8th console (usually unused).
* The xterm channel - this is the default - bring up an xterm on this channel
- and direct IO to it. Note, that in order for xterm to work, the host must
+ and direct IO to it. Note that in order for xterm to work, the host must
have the UML distribution package installed. This usually contains the
port-helper and other utilities needed for UML to communicate with the xterm.
Alternatively, these need to be compiled and installed from source. All
@@ -685,7 +690,7 @@ We can now run UML.
vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1
-This will run an instance with ``2048M RAM``, try to use the image file
+This will run an instance with ``2048M RAM`` and try to use the image file
called ``Filesystem.img`` as root. It will connect to the host using tap0.
All consoles except ``con1`` will be disabled and console 1 will
use standard input/output making it appear in the same terminal it was started.
@@ -702,7 +707,7 @@ The UML Management Console
============================
In addition to managing the image from "the inside" using normal sysadmin tools,
-it is possible to perform a number of low level operations using the UML
+it is possible to perform a number of low-level operations using the UML
management console. The UML management console is a low-level interface to the
kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
there is a full-blown operating system under UML, there is much greater
@@ -726,7 +731,7 @@ kernel. When you boot UML, you'll see a line like::
mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole
-If you specify a unique machine id one the UML command line, i.e.
+If you specify a unique machine id on the UML command line, i.e.
``umid=debian``, you'll see this::
mconsole initialized on /home/jdike/.uml/debian/mconsole
@@ -881,11 +886,11 @@ be able to cache the shared data using a much smaller amount of memory,
so UML disk requests will be served from the host's memory rather than
its disks. There is a major caveat in doing this on multisocket NUMA
machines. On such hardware, running many UML instances with a shared
-master image and COW changes may caise issues like NMIs from excess of
+master image and COW changes may cause issues like NMIs from excess of
inter-socket traffic.
-If you are running UML on high end hardware like this, make sure to
-bind UML to a set of logical cpus residing on the same socket using the
+If you are running UML on high-end hardware like this, make sure to
+bind UML to a set of logical CPUs residing on the same socket using the
``taskset`` command or have a look at the "tuning" section.
To add a copy-on-write layer to an existing block device file, simply
@@ -986,7 +991,7 @@ specify a subdirectory to mount with the -o switch to mount::
# mount none /mnt/home -t hostfs -o /home
-will mount the hosts's /home on the virtual machine's /mnt/home.
+will mount the host's /home on the virtual machine's /mnt/home.
hostfs as the root filesystem
-----------------------------
@@ -1035,7 +1040,7 @@ The UBD driver, SIGIO and the MMU emulation do that. If the system is
idle, these threads will be migrated to other processors on a SMP host.
This, unfortunately, will usually result in LOWER performance because of
all of the cache/memory synchronization traffic between cores. As a
-result, UML will usually benefit from being pinned on a single CPU
+result, UML will usually benefit from being pinned on a single CPU,
especially on a large system. This can result in performance differences
of 5 times or higher on some benchmarks.
@@ -1061,7 +1066,7 @@ filesystems, devices, virtualization, etc. It provides unrivalled
opportunities to create and test them without being constrained to
emulating specific hardware.
-Example - want to try how linux will work with 4096 "proper" network
+Example - want to try how Linux will work with 4096 "proper" network
devices?
Not an issue with UML. At the same time, this is something which
@@ -1070,10 +1075,10 @@ constrained by the number of devices allowed on the hardware bus
they are trying to emulate (for example 16 on a PCI bus in qemu).
If you have something to contribute such as a patch, a bugfix, a
-new feature, please send it to ``linux-um@lists.infradead.org``
+new feature, please send it to ``linux-um@lists.infradead.org``.
Please follow all standard Linux patch guidelines such as cc-ing
-relevant maintainers and run ``./sripts/checkpatch.pl`` on your patch.
+relevant maintainers and run ``./scripts/checkpatch.pl`` on your patch.
For more details see ``Documentation/process/submitting-patches.rst``
Note - the list does not accept HTML or attachments, all emails must
@@ -1082,21 +1087,21 @@ be formatted as plain text.
Developing always goes hand in hand with debugging. First of all,
you can always run UML under gdb and there will be a whole section
later on how to do that. That, however, is not the only way to
-debug a linux kernel. Quite often adding tracing statements and/or
+debug a Linux kernel. Quite often adding tracing statements and/or
using UML-specific approaches such as ptracing the UML kernel process
are significantly more informative.
Tracing UML
=============
-When running UML consists of a main kernel thread and a number of
+When running, UML consists of a main kernel thread and a number of
helper threads. The ones of interest for tracing are NOT the ones
that are already ptraced by UML as a part of its MMU emulation.
These are usually the first three threads visible in a ps display.
The one with the lowest PID number and using most CPU is usually the
kernel thread. The other threads are the disk
-(ubd) device helper thread and the sigio helper thread.
+(ubd) device helper thread and the SIGIO helper thread.
Running ptrace on this thread usually results in the following picture::
host$ strace -p 16566
@@ -1121,21 +1126,21 @@ Running ptrace on this thread usually results in the following picture::
--- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
rt_sigreturn({mask=[PIPE]}) = -1 EINTR (Interrupted system call)
-This is a typical picture from a mostly idle UML instance
+This is a typical picture from a mostly idle UML instance.
* UML interrupt controller uses epoll - this is UML waiting for IO
interrupts:
epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
-* The sequence of ptrace calls is part of MMU emulation and runnin the
- UML userspace
+* The sequence of ptrace calls is part of MMU emulation and running the
+ UML userspace.
* ``timer_settime`` is part of the UML high res timer subsystem mapping
- timer requests from inside UML onto the host high resultion timers.
+ timer requests from inside UML onto the host high resolution timers.
* ``clock_nanosleep`` is UML going into idle (similar to the way a PC
will execute an ACPI idle).
-As you can see UML will generate quite a bit of output even in idle.The output
+As you can see UML will generate quite a bit of output even in idle. The output
can be very informative when observing IO. It shows the actual IO calls, their
arguments and return values.
@@ -1164,14 +1169,14 @@ in order to really leverage UML, one needs to write a piece of
userspace code which maps driver concepts onto actual userspace host
calls.
-This forms the so called "user" portion of the driver. While it can
+This forms the so-called "user" portion of the driver. While it can
reuse a lot of kernel concepts, it is generally just another piece of
userspace code. This portion needs some matching "kernel" code which
resides inside the UML image and which implements the Linux kernel part.
*Note: There are very few limitations in the way "kernel" and "user" interact*.
-UML does not have a strictly defined kernel to host API. It does not
+UML does not have a strictly defined kernel-to-host API. It does not
try to emulate a specific architecture or bus. UML's "kernel" and
"user" can share memory, code and interact as needed to implement
whatever design the software developer has in mind. The only
@@ -1180,7 +1185,7 @@ variables having the same names, the developer should be careful
which includes and libraries they are trying to refer to.
As a result a lot of userspace code consists of simple wrappers.
-F.e. ``os_close_file()`` is just a wrapper around ``close()``
+E.g. ``os_close_file()`` is just a wrapper around ``close()``
which ensures that the userspace function close does not clash
with similarly named function(s) in the kernel part.
@@ -1188,7 +1193,7 @@ Security Considerations
-----------------------
Drivers or any new functionality should default to not
-accepting arbitrary filename, bpf code or other parameters
+accepting arbitrary filename, bpf code or other parameters
which can affect the host from inside the UML instance.
For example, specifying the socket used for IPC communication
between a driver and the host at the UML command line is OK