diff options
Diffstat (limited to 'Documentation/arch/riscv')
-rw-r--r-- | Documentation/arch/riscv/acpi.rst | 10 | ||||
-rw-r--r-- | Documentation/arch/riscv/boot-image-header.rst | 59 | ||||
-rw-r--r-- | Documentation/arch/riscv/boot.rst | 169 | ||||
-rw-r--r-- | Documentation/arch/riscv/cmodx.rst | 130 | ||||
-rw-r--r-- | Documentation/arch/riscv/features.rst | 3 | ||||
-rw-r--r-- | Documentation/arch/riscv/hwprobe.rst | 363 | ||||
-rw-r--r-- | Documentation/arch/riscv/index.rst | 25 | ||||
-rw-r--r-- | Documentation/arch/riscv/patch-acceptance.rst | 59 | ||||
-rw-r--r-- | Documentation/arch/riscv/uabi.rst | 86 | ||||
-rw-r--r-- | Documentation/arch/riscv/vector.rst | 140 | ||||
-rw-r--r-- | Documentation/arch/riscv/vm-layout.rst | 136 |
11 files changed, 1180 insertions, 0 deletions
diff --git a/Documentation/arch/riscv/acpi.rst b/Documentation/arch/riscv/acpi.rst new file mode 100644 index 000000000000..9870a282815b --- /dev/null +++ b/Documentation/arch/riscv/acpi.rst @@ -0,0 +1,10 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============== +ACPI on RISC-V +============== + +The ISA string parsing rules for ACPI are defined by `Version ASCIIDOC +Conversion, 12/2022 of the RISC-V specifications, as defined by tag +"riscv-isa-release-1239329-2023-05-23" (commit 1239329 +) <https://github.com/riscv/riscv-isa-manual/releases/tag/riscv-isa-release-1239329-2023-05-23>`_ diff --git a/Documentation/arch/riscv/boot-image-header.rst b/Documentation/arch/riscv/boot-image-header.rst new file mode 100644 index 000000000000..df2ffc173e80 --- /dev/null +++ b/Documentation/arch/riscv/boot-image-header.rst @@ -0,0 +1,59 @@ +================================= +Boot image header in RISC-V Linux +================================= + +:Author: Atish Patra <atish.patra@wdc.com> +:Date: 20 May 2019 + +This document only describes the boot image header details for RISC-V Linux. + +The following 64-byte header is present in decompressed Linux kernel image:: + + u32 code0; /* Executable code */ + u32 code1; /* Executable code */ + u64 text_offset; /* Image load offset, little endian */ + u64 image_size; /* Effective Image size, little endian */ + u64 flags; /* kernel flags, little endian */ + u32 version; /* Version of this header */ + u32 res1 = 0; /* Reserved */ + u64 res2 = 0; /* Reserved */ + u64 magic = 0x5643534952; /* Magic number, little endian, "RISCV" */ + u32 magic2 = 0x05435352; /* Magic number 2, little endian, "RSC\x05" */ + u32 res3; /* Reserved for PE COFF offset */ + +This header format is compliant with PE/COFF header and largely inspired from +ARM64 header. Thus, both ARM64 & RISC-V header can be combined into one common +header in future. + +Notes +===== + +- This header is also reused to support EFI stub for RISC-V. EFI specification + needs PE/COFF image header in the beginning of the kernel image in order to + load it as an EFI application. In order to support EFI stub, code0 is replaced + with "MZ" magic string and res3(at offset 0x3c) points to the rest of the + PE/COFF header. + +- version field indicate header version number + + ========== ============= + Bits 0:15 Minor version + Bits 16:31 Major version + ========== ============= + + This preserves compatibility across newer and older version of the header. + The current version is defined as 0.2. + +- The "magic" field is deprecated as of version 0.2. In a future + release, it may be removed. This originally should have matched up + with the ARM64 header "magic" field, but unfortunately does not. + The "magic2" field replaces it, matching up with the ARM64 header. + +- In current header, the flags field has only one field. + + ===== ==================================== + Bit 0 Kernel endianness. 1 if BE, 0 if LE. + ===== ==================================== + +- Image size is mandatory for boot loader to load kernel image. Booting will + fail otherwise. diff --git a/Documentation/arch/riscv/boot.rst b/Documentation/arch/riscv/boot.rst new file mode 100644 index 000000000000..6077b587a842 --- /dev/null +++ b/Documentation/arch/riscv/boot.rst @@ -0,0 +1,169 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============================================== +RISC-V Kernel Boot Requirements and Constraints +=============================================== + +:Author: Alexandre Ghiti <alexghiti@rivosinc.com> +:Date: 23 May 2023 + +This document describes what the RISC-V kernel expects from bootloaders and +firmware, and also the constraints that any developer must have in mind when +touching the early boot process. For the purposes of this document, the +``early boot process`` refers to any code that runs before the final virtual +mapping is set up. + +Pre-kernel Requirements and Constraints +======================================= + +The RISC-V kernel expects the following of bootloaders and platform firmware: + +Register state +-------------- + +The RISC-V kernel expects: + + * ``$a0`` to contain the hartid of the current core. + * ``$a1`` to contain the address of the devicetree in memory. + +CSR state +--------- + +The RISC-V kernel expects: + + * ``$satp = 0``: the MMU, if present, must be disabled. + +Reserved memory for resident firmware +------------------------------------- + +The RISC-V kernel must not map any resident memory, or memory protected with +PMPs, in the direct mapping, so the firmware must correctly mark those regions +as per the devicetree specification and/or the UEFI specification. + +Kernel location +--------------- + +The RISC-V kernel expects to be placed at a PMD boundary (2MB aligned for rv64 +and 4MB aligned for rv32). Note that the EFI stub will physically relocate the +kernel if that's not the case. + +Hardware description +-------------------- + +The firmware can pass either a devicetree or ACPI tables to the RISC-V kernel. + +The devicetree is either passed directly to the kernel from the previous stage +using the ``$a1`` register, or when booting with UEFI, it can be passed using the +EFI configuration table. + +The ACPI tables are passed to the kernel using the EFI configuration table. In +this case, a tiny devicetree is still created by the EFI stub. Please refer to +"EFI stub and devicetree" section below for details about this devicetree. + +Kernel entry +------------ + +On SMP systems, there are 2 methods to enter the kernel: + +- ``RISCV_BOOT_SPINWAIT``: the firmware releases all harts in the kernel, one hart + wins a lottery and executes the early boot code while the other harts are + parked waiting for the initialization to finish. This method is mostly used to + support older firmwares without SBI HSM extension and M-mode RISC-V kernel. +- ``Ordered booting``: the firmware releases only one hart that will execute the + initialization phase and then will start all other harts using the SBI HSM + extension. The ordered booting method is the preferred booting method for + booting the RISC-V kernel because it can support CPU hotplug and kexec. + +UEFI +---- + +UEFI memory map +~~~~~~~~~~~~~~~ + +When booting with UEFI, the RISC-V kernel will use only the EFI memory map to +populate the system memory. + +The UEFI firmware must parse the subnodes of the ``/reserved-memory`` devicetree +node and abide by the devicetree specification to convert the attributes of +those subnodes (``no-map`` and ``reusable``) into their correct EFI equivalent +(refer to section "3.5.4 /reserved-memory and UEFI" of the devicetree +specification v0.4-rc1). + +RISCV_EFI_BOOT_PROTOCOL +~~~~~~~~~~~~~~~~~~~~~~~ + +When booting with UEFI, the EFI stub requires the boot hartid in order to pass +it to the RISC-V kernel in ``$a1``. The EFI stub retrieves the boot hartid using +one of the following methods: + +- ``RISCV_EFI_BOOT_PROTOCOL`` (**preferred**). +- ``boot-hartid`` devicetree subnode (**deprecated**). + +Any new firmware must implement ``RISCV_EFI_BOOT_PROTOCOL`` as the devicetree +based approach is deprecated now. + +Early Boot Requirements and Constraints +======================================= + +The RISC-V kernel's early boot process operates under the following constraints: + +EFI stub and devicetree +----------------------- + +When booting with UEFI, the devicetree is supplemented (or created) by the EFI +stub with the same parameters as arm64 which are described at the paragraph +"UEFI kernel support on ARM" in Documentation/arch/arm/uefi.rst. + +Virtual mapping installation +---------------------------- + +The installation of the virtual mapping is done in 2 steps in the RISC-V kernel: + +1. ``setup_vm()`` installs a temporary kernel mapping in ``early_pg_dir`` which + allows discovery of the system memory. Only the kernel text/data are mapped + at this point. When establishing this mapping, no allocation can be done + (since the system memory is not known yet), so ``early_pg_dir`` page table is + statically allocated (using only one table for each level). + +2. ``setup_vm_final()`` creates the final kernel mapping in ``swapper_pg_dir`` + and takes advantage of the discovered system memory to create the linear + mapping. When establishing this mapping, the kernel can allocate memory but + cannot access it directly (since the direct mapping is not present yet), so + it uses temporary mappings in the fixmap region to be able to access the + newly allocated page table levels. + +For ``virt_to_phys()`` and ``phys_to_virt()`` to be able to correctly convert +direct mapping addresses to physical addresses, they need to know the start of +the DRAM. This happens after step 1, right before step 2 installs the direct +mapping (see ``setup_bootmem()`` function in arch/riscv/mm/init.c). Any usage of +those macros before the final virtual mapping is installed must be carefully +examined. + +Devicetree mapping via fixmap +----------------------------- + +As the ``reserved_mem`` array is initialized with virtual addresses established +by ``setup_vm()``, and used with the mapping established by +``setup_vm_final()``, the RISC-V kernel uses the fixmap region to map the +devicetree. This ensures that the devicetree remains accessible by both virtual +mappings. + +Pre-MMU execution +----------------- + +A few pieces of code need to run before even the first virtual mapping is +established. These are the installation of the first virtual mapping itself, +patching of early alternatives and the early parsing of the kernel command line. +That code must be very carefully compiled as: + +- ``-fno-pie``: This is needed for relocatable kernels which use ``-fPIE``, + since otherwise, any access to a global symbol would go through the GOT which + is only relocated virtually. +- ``-mcmodel=medany``: Any access to a global symbol must be PC-relative to + avoid any relocations to happen before the MMU is setup. +- *all* instrumentation must also be disabled (that includes KASAN, ftrace and + others). + +As using a symbol from a different compilation unit requires this unit to be +compiled with those flags, we advise, as much as possible, not to use external +symbols. diff --git a/Documentation/arch/riscv/cmodx.rst b/Documentation/arch/riscv/cmodx.rst new file mode 100644 index 000000000000..40ba53bed5df --- /dev/null +++ b/Documentation/arch/riscv/cmodx.rst @@ -0,0 +1,130 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============================================================================== +Concurrent Modification and Execution of Instructions (CMODX) for RISC-V Linux +============================================================================== + +CMODX is a programming technique where a program executes instructions that were +modified by the program itself. Instruction storage and the instruction cache +(icache) are not guaranteed to be synchronized on RISC-V hardware. Therefore, the +program must enforce its own synchronization with the unprivileged fence.i +instruction. + +CMODX in the Kernel Space +------------------------- + +Dynamic ftrace +--------------------- + +Essentially, dynamic ftrace directs the control flow by inserting a function +call at each patchable function entry, and patches it dynamically at runtime to +enable or disable the redirection. In the case of RISC-V, 2 instructions, +AUIPC + JALR, are required to compose a function call. However, it is impossible +to patch 2 instructions and expect that a concurrent read-side executes them +without a race condition. This series makes atmoic code patching possible in +RISC-V ftrace. Kernel preemption makes things even worse as it allows the old +state to persist across the patching process with stop_machine(). + +In order to get rid of stop_machine() and run dynamic ftrace with full kernel +preemption, we partially initialize each patchable function entry at boot-time, +setting the first instruction to AUIPC, and the second to NOP. Now, atmoic +patching is possible because the kernel only has to update one instruction. +According to Ziccif, as long as an instruction is naturally aligned, the ISA +guarantee an atomic update. + +By fixing down the first instruction, AUIPC, the range of the ftrace trampoline +is limited to +-2K from the predetermined target, ftrace_caller, due to the lack +of immediate encoding space in RISC-V. To address the issue, we introduce +CALL_OPS, where an 8B naturally align metadata is added in front of each +pacthable function. The metadata is resolved at the first trampoline, then the +execution can be derect to another custom trampoline. + +CMODX in the User Space +----------------------- + +Though fence.i is an unprivileged instruction, the default Linux ABI prohibits +the use of fence.i in userspace applications. At any point the scheduler may +migrate a task onto a new hart. If migration occurs after the userspace +synchronized the icache and instruction storage with fence.i, the icache on the +new hart will no longer be clean. This is due to the behavior of fence.i only +affecting the hart that it is called on. Thus, the hart that the task has been +migrated to may not have synchronized instruction storage and icache. + +There are two ways to solve this problem: use the riscv_flush_icache() syscall, +or use the ``PR_RISCV_SET_ICACHE_FLUSH_CTX`` prctl() and emit fence.i in +userspace. The syscall performs a one-off icache flushing operation. The prctl +changes the Linux ABI to allow userspace to emit icache flushing operations. + +As an aside, "deferred" icache flushes can sometimes be triggered in the kernel. +At the time of writing, this only occurs during the riscv_flush_icache() syscall +and when the kernel uses copy_to_user_page(). These deferred flushes happen only +when the memory map being used by a hart changes. If the prctl() context caused +an icache flush, this deferred icache flush will be skipped as it is redundant. +Therefore, there will be no additional flush when using the riscv_flush_icache() +syscall inside of the prctl() context. + +prctl() Interface +--------------------- + +Call prctl() with ``PR_RISCV_SET_ICACHE_FLUSH_CTX`` as the first argument. The +remaining arguments will be delegated to the riscv_set_icache_flush_ctx +function detailed below. + +.. kernel-doc:: arch/riscv/mm/cacheflush.c + :identifiers: riscv_set_icache_flush_ctx + +Example usage: + +The following files are meant to be compiled and linked with each other. The +modify_instruction() function replaces an add with 0 with an add with one, +causing the instruction sequence in get_value() to change from returning a zero +to returning a one. + +cmodx.c:: + + #include <stdio.h> + #include <sys/prctl.h> + + extern int get_value(); + extern void modify_instruction(); + + int main() + { + int value = get_value(); + printf("Value before cmodx: %d\n", value); + + // Call prctl before first fence.i is called inside modify_instruction + prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_ON, PR_RISCV_SCOPE_PER_PROCESS); + modify_instruction(); + // Call prctl after final fence.i is called in process + prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_OFF, PR_RISCV_SCOPE_PER_PROCESS); + + value = get_value(); + printf("Value after cmodx: %d\n", value); + return 0; + } + +cmodx.S:: + + .option norvc + + .text + .global modify_instruction + modify_instruction: + lw a0, new_insn + lui a5,%hi(old_insn) + sw a0,%lo(old_insn)(a5) + fence.i + ret + + .section modifiable, "awx" + .global get_value + get_value: + li a0, 0 + old_insn: + addi a0, a0, 0 + ret + + .data + new_insn: + addi a0, a0, 1 diff --git a/Documentation/arch/riscv/features.rst b/Documentation/arch/riscv/features.rst new file mode 100644 index 000000000000..36e90144adab --- /dev/null +++ b/Documentation/arch/riscv/features.rst @@ -0,0 +1,3 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. kernel-feat:: features riscv diff --git a/Documentation/arch/riscv/hwprobe.rst b/Documentation/arch/riscv/hwprobe.rst new file mode 100644 index 000000000000..2aa9be272d5d --- /dev/null +++ b/Documentation/arch/riscv/hwprobe.rst @@ -0,0 +1,363 @@ +.. SPDX-License-Identifier: GPL-2.0 + +RISC-V Hardware Probing Interface +--------------------------------- + +The RISC-V hardware probing interface is based around a single syscall, which +is defined in <asm/hwprobe.h>:: + + struct riscv_hwprobe { + __s64 key; + __u64 value; + }; + + long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count, + size_t cpusetsize, cpu_set_t *cpus, + unsigned int flags); + +The arguments are split into three groups: an array of key-value pairs, a CPU +set, and some flags. The key-value pairs are supplied with a count. Userspace +must prepopulate the key field for each element, and the kernel will fill in the +value if the key is recognized. If a key is unknown to the kernel, its key field +will be cleared to -1, and its value set to 0. The CPU set is defined by +CPU_SET(3) with size ``cpusetsize`` bytes. For value-like keys (eg. vendor, +arch, impl), the returned value will only be valid if all CPUs in the given set +have the same value. Otherwise -1 will be returned. For boolean-like keys, the +value returned will be a logical AND of the values for the specified CPUs. +Usermode can supply NULL for ``cpus`` and 0 for ``cpusetsize`` as a shortcut for +all online CPUs. The currently supported flags are: + +* :c:macro:`RISCV_HWPROBE_WHICH_CPUS`: This flag basically reverses the behavior + of sys_riscv_hwprobe(). Instead of populating the values of keys for a given + set of CPUs, the values of each key are given and the set of CPUs is reduced + by sys_riscv_hwprobe() to only those which match each of the key-value pairs. + How matching is done depends on the key type. For value-like keys, matching + means to be the exact same as the value. For boolean-like keys, matching + means the result of a logical AND of the pair's value with the CPU's value is + exactly the same as the pair's value. Additionally, when ``cpus`` is an empty + set, then it is initialized to all online CPUs which fit within it, i.e. the + CPU set returned is the reduction of all the online CPUs which can be + represented with a CPU set of size ``cpusetsize``. + +All other flags are reserved for future compatibility and must be zero. + +On success 0 is returned, on failure a negative error code is returned. + +The following keys are defined: + +* :c:macro:`RISCV_HWPROBE_KEY_MVENDORID`: Contains the value of ``mvendorid``, + as defined by the RISC-V privileged architecture specification. + +* :c:macro:`RISCV_HWPROBE_KEY_MARCHID`: Contains the value of ``marchid``, as + defined by the RISC-V privileged architecture specification. + +* :c:macro:`RISCV_HWPROBE_KEY_MIMPID`: Contains the value of ``mimpid``, as + defined by the RISC-V privileged architecture specification. + +* :c:macro:`RISCV_HWPROBE_KEY_BASE_BEHAVIOR`: A bitmask containing the base + user-visible behavior that this kernel supports. The following base user ABIs + are defined: + + * :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: Support for rv32ima or + rv64ima, as defined by version 2.2 of the user ISA and version 1.10 of the + privileged ISA, with the following known exceptions (more exceptions may be + added, but only if it can be demonstrated that the user ABI is not broken): + + * The ``fence.i`` instruction cannot be directly executed by userspace + programs (it may still be executed in userspace via a + kernel-controlled mechanism such as the vDSO). + +* :c:macro:`RISCV_HWPROBE_KEY_IMA_EXT_0`: A bitmask containing the extensions + that are compatible with the :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: + base system behavior. + + * :c:macro:`RISCV_HWPROBE_IMA_FD`: The F and D extensions are supported, as + defined by commit cd20cee ("FMIN/FMAX now implement + minimumNumber/maximumNumber, not minNum/maxNum") of the RISC-V ISA manual. + + * :c:macro:`RISCV_HWPROBE_IMA_C`: The C extension is supported, as defined + by version 2.2 of the RISC-V ISA manual. + + * :c:macro:`RISCV_HWPROBE_IMA_V`: The V extension is supported, as defined by + version 1.0 of the RISC-V Vector extension manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZBA`: The Zba address generation extension is + supported, as defined in version 1.0 of the Bit-Manipulation ISA + extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZBB`: The Zbb extension is supported, as defined + in version 1.0 of the Bit-Manipulation ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZBS`: The Zbs extension is supported, as defined + in version 1.0 of the Bit-Manipulation ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZICBOZ`: The Zicboz extension is supported, as + ratified in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs. + + * :c:macro:`RISCV_HWPROBE_EXT_ZBC` The Zbc extension is supported, as defined + in version 1.0 of the Bit-Manipulation ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZBKB` The Zbkb extension is supported, as + defined in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZBKC` The Zbkc extension is supported, as + defined in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZBKX` The Zbkx extension is supported, as + defined in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZKND` The Zknd extension is supported, as + defined in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZKNE` The Zkne extension is supported, as + defined in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZKNH` The Zknh extension is supported, as + defined in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZKSED` The Zksed extension is supported, as + defined in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZKSH` The Zksh extension is supported, as + defined in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZKT` The Zkt extension is supported, as defined + in version 1.0 of the Scalar Crypto ISA extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVBB`: The Zvbb extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVBC`: The Zvbc extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVKB`: The Zvkb extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVKG`: The Zvkg extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVKNED`: The Zvkned extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVKNHA`: The Zvknha extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVKNHB`: The Zvknhb extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVKSED`: The Zvksed extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVKSH`: The Zvksh extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVKT`: The Zvkt extension is supported as + defined in version 1.0 of the RISC-V Cryptography Extensions Volume II. + + * :c:macro:`RISCV_HWPROBE_EXT_ZFH`: The Zfh extension version 1.0 is supported + as defined in the RISC-V ISA manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZFHMIN`: The Zfhmin extension version 1.0 is + supported as defined in the RISC-V ISA manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZIHINTNTL`: The Zihintntl extension version 1.0 + is supported as defined in the RISC-V ISA manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVFH`: The Zvfh extension is supported as + defined in the RISC-V Vector manual starting from commit e2ccd0548d6c + ("Remove draft warnings from Zvfh[min]"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZVFHMIN`: The Zvfhmin extension is supported as + defined in the RISC-V Vector manual starting from commit e2ccd0548d6c + ("Remove draft warnings from Zvfh[min]"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZFA`: The Zfa extension is supported as + defined in the RISC-V ISA manual starting from commit 056b6ff467c7 + ("Zfa is ratified"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZTSO`: The Ztso extension is supported as + defined in the RISC-V ISA manual starting from commit 5618fb5a216b + ("Ztso is now ratified.") + + * :c:macro:`RISCV_HWPROBE_EXT_ZACAS`: The Zacas extension is supported as + defined in the Atomic Compare-and-Swap (CAS) instructions manual starting + from commit 5059e0ca641c ("update to ratified"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZICNTR`: The Zicntr extension version 2.0 + is supported as defined in the RISC-V ISA manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZICOND`: The Zicond extension is supported as + defined in the RISC-V Integer Conditional (Zicond) operations extension + manual starting from commit 95cf1f9 ("Add changes requested by Ved + during signoff") + + * :c:macro:`RISCV_HWPROBE_EXT_ZIHINTPAUSE`: The Zihintpause extension is + supported as defined in the RISC-V ISA manual starting from commit + d8ab5c78c207 ("Zihintpause is ratified"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZIHPM`: The Zihpm extension version 2.0 + is supported as defined in the RISC-V ISA manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVE32X`: The Vector sub-extension Zve32x is + supported, as defined by version 1.0 of the RISC-V Vector extension manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVE32F`: The Vector sub-extension Zve32f is + supported, as defined by version 1.0 of the RISC-V Vector extension manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVE64X`: The Vector sub-extension Zve64x is + supported, as defined by version 1.0 of the RISC-V Vector extension manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVE64F`: The Vector sub-extension Zve64f is + supported, as defined by version 1.0 of the RISC-V Vector extension manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZVE64D`: The Vector sub-extension Zve64d is + supported, as defined by version 1.0 of the RISC-V Vector extension manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZIMOP`: The Zimop May-Be-Operations extension is + supported as defined in the RISC-V ISA manual starting from commit + 58220614a5f ("Zimop is ratified/1.0"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZCA`: The Zca extension part of Zc* standard + extensions for code size reduction, as ratified in commit 8be3419c1c0 + ("Zcf doesn't exist on RV64 as it contains no instructions") of + riscv-code-size-reduction. + + * :c:macro:`RISCV_HWPROBE_EXT_ZCB`: The Zcb extension part of Zc* standard + extensions for code size reduction, as ratified in commit 8be3419c1c0 + ("Zcf doesn't exist on RV64 as it contains no instructions") of + riscv-code-size-reduction. + + * :c:macro:`RISCV_HWPROBE_EXT_ZCD`: The Zcd extension part of Zc* standard + extensions for code size reduction, as ratified in commit 8be3419c1c0 + ("Zcf doesn't exist on RV64 as it contains no instructions") of + riscv-code-size-reduction. + + * :c:macro:`RISCV_HWPROBE_EXT_ZCF`: The Zcf extension part of Zc* standard + extensions for code size reduction, as ratified in commit 8be3419c1c0 + ("Zcf doesn't exist on RV64 as it contains no instructions") of + riscv-code-size-reduction. + + * :c:macro:`RISCV_HWPROBE_EXT_ZCMOP`: The Zcmop May-Be-Operations extension is + supported as defined in the RISC-V ISA manual starting from commit + c732a4f39a4 ("Zcmop is ratified/1.0"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZAWRS`: The Zawrs extension is supported as + ratified in commit 98918c844281 ("Merge pull request #1217 from + riscv/zawrs") of riscv-isa-manual. + + * :c:macro:`RISCV_HWPROBE_EXT_ZAAMO`: The Zaamo extension is supported as + defined in the in the RISC-V ISA manual starting from commit e87412e621f1 + ("integrate Zaamo and Zalrsc text (#1304)"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZALRSC`: The Zalrsc extension is supported as + defined in the in the RISC-V ISA manual starting from commit e87412e621f1 + ("integrate Zaamo and Zalrsc text (#1304)"). + + * :c:macro:`RISCV_HWPROBE_EXT_SUPM`: The Supm extension is supported as + defined in version 1.0 of the RISC-V Pointer Masking extensions. + + * :c:macro:`RISCV_HWPROBE_EXT_ZFBFMIN`: The Zfbfmin extension is supported as + defined in the RISC-V ISA manual starting from commit 4dc23d6229de + ("Added Chapter title to BF16"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZVFBFMIN`: The Zvfbfmin extension is supported as + defined in the RISC-V ISA manual starting from commit 4dc23d6229de + ("Added Chapter title to BF16"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZVFBFWMA`: The Zvfbfwma extension is supported as + defined in the RISC-V ISA manual starting from commit 4dc23d6229de + ("Added Chapter title to BF16"). + + * :c:macro:`RISCV_HWPROBE_EXT_ZICBOM`: The Zicbom extension is supported, as + ratified in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs. + + * :c:macro:`RISCV_HWPROBE_EXT_ZABHA`: The Zabha extension is supported as + ratified in commit 49f49c842ff9 ("Update to Rafified state") of + riscv-zabha. + +* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: Deprecated. Returns similar values to + :c:macro:`RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF`, but the key was + mistakenly classified as a bitmask rather than a value. + +* :c:macro:`RISCV_HWPROBE_KEY_MISALIGNED_SCALAR_PERF`: An enum value describing + the performance of misaligned scalar native word accesses on the selected set + of processors. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN`: The performance of + misaligned scalar accesses is unknown. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SCALAR_EMULATED`: Misaligned scalar + accesses are emulated via software, either in or below the kernel. These + accesses are always extremely slow. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW`: Misaligned scalar native + word sized accesses are slower than the equivalent quantity of byte + accesses. Misaligned accesses may be supported directly in hardware, or + trapped and emulated by software. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SCALAR_FAST`: Misaligned scalar native + word sized accesses are faster than the equivalent quantity of byte + accesses. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_SCALAR_UNSUPPORTED`: Misaligned scalar + accesses are not supported at all and will generate a misaligned address + fault. + +* :c:macro:`RISCV_HWPROBE_KEY_ZICBOZ_BLOCK_SIZE`: An unsigned int which + represents the size of the Zicboz block in bytes. + +* :c:macro:`RISCV_HWPROBE_KEY_HIGHEST_VIRT_ADDRESS`: An unsigned long which + represent the highest userspace virtual address usable. + +* :c:macro:`RISCV_HWPROBE_KEY_TIME_CSR_FREQ`: Frequency (in Hz) of `time CSR`. + +* :c:macro:`RISCV_HWPROBE_KEY_MISALIGNED_VECTOR_PERF`: An enum value describing the + performance of misaligned vector accesses on the selected set of processors. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN`: The performance of misaligned + vector accesses is unknown. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW`: 32-bit misaligned accesses using vector + registers are slower than the equivalent quantity of byte accesses via vector registers. + Misaligned accesses may be supported directly in hardware, or trapped and emulated by software. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_FAST`: 32-bit misaligned accesses using vector + registers are faster than the equivalent quantity of byte accesses via vector registers. + + * :c:macro:`RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED`: Misaligned vector accesses are + not supported at all and will generate a misaligned address fault. + +* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0`: A bitmask containing the + thead vendor extensions that are compatible with the + :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: base system behavior. + + * T-HEAD + + * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XTHEADVECTOR`: The xtheadvector vendor + extension is supported in the T-Head ISA extensions spec starting from + commit a18c801634 ("Add T-Head VECTOR vendor extension. "). + +* :c:macro:`RISCV_HWPROBE_KEY_ZICBOM_BLOCK_SIZE`: An unsigned int which + represents the size of the Zicbom block in bytes. + +* :c:macro:`RISCV_HWPROBE_KEY_VENDOR_EXT_SIFIVE_0`: A bitmask containing the + sifive vendor extensions that are compatible with the + :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: base system behavior. + + * SIFIVE + + * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCDOD`: The Xsfqmaccdod vendor + extension is supported in version 1.1 of SiFive Int8 Matrix Multiplication + Extensions Specification. + + * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XSFVQMACCQOQ`: The Xsfqmaccqoq vendor + extension is supported in version 1.1 of SiFive Int8 Matrix Multiplication + Instruction Extensions Specification. + + * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XSFVFNRCLIPXFQF`: The Xsfvfnrclipxfqf + vendor extension is supported in version 1.0 of SiFive FP32-to-int8 Ranged + Clip Instructions Extensions Specification. + + * :c:macro:`RISCV_HWPROBE_VENDOR_EXT_XSFVFWMACCQQQ`: The Xsfvfwmaccqqq + vendor extension is supported in version 1.0 of Matrix Multiply Accumulate + Instruction Extensions Specification.
\ No newline at end of file diff --git a/Documentation/arch/riscv/index.rst b/Documentation/arch/riscv/index.rst new file mode 100644 index 000000000000..eecf347ce849 --- /dev/null +++ b/Documentation/arch/riscv/index.rst @@ -0,0 +1,25 @@ +=================== +RISC-V architecture +=================== + +.. toctree:: + :maxdepth: 1 + + acpi + boot + boot-image-header + vm-layout + hwprobe + patch-acceptance + uabi + vector + cmodx + + features + +.. only:: subproject and html + + Indices + ======= + + * :ref:`genindex` diff --git a/Documentation/arch/riscv/patch-acceptance.rst b/Documentation/arch/riscv/patch-acceptance.rst new file mode 100644 index 000000000000..634aa222b410 --- /dev/null +++ b/Documentation/arch/riscv/patch-acceptance.rst @@ -0,0 +1,59 @@ +.. SPDX-License-Identifier: GPL-2.0 + +arch/riscv maintenance guidelines for developers +================================================ + +Overview +-------- +The RISC-V instruction set architecture is developed in the open: +in-progress drafts are available for all to review and to experiment +with implementations. New module or extension drafts can change +during the development process - sometimes in ways that are +incompatible with previous drafts. This flexibility can present a +challenge for RISC-V Linux maintenance. Linux maintainers disapprove +of churn, and the Linux development process prefers well-reviewed and +tested code over experimental code. We wish to extend these same +principles to the RISC-V-related code that will be accepted for +inclusion in the kernel. + +Patchwork +--------- + +RISC-V has a patchwork instance, where the status of patches can be checked: + + https://patchwork.kernel.org/project/linux-riscv/list/ + +If your patch does not appear in the default view, the RISC-V maintainers have +likely either requested changes, or expect it to be applied to another tree. + +Automation runs against this patchwork instance, building/testing patches as +they arrive. The automation applies patches against the current HEAD of the +RISC-V `for-next` and `fixes` branches, depending on whether the patch has been +detected as a fix. Failing those, it will use the RISC-V `master` branch. +The exact commit to which a series has been applied will be noted on patchwork. +Patches for which any of the checks fail are unlikely to be applied and in most +cases will need to be resubmitted. + +Submit Checklist Addendum +------------------------- +We'll only accept patches for new modules or extensions if the +specifications for those modules or extensions are listed as being +unlikely to be incompatibly changed in the future. For +specifications from the RISC-V foundation this means "Frozen" or +"Ratified", for the UEFI forum specifications this means a published +ECR. (Developers may, of course, maintain their own Linux kernel trees +that contain code for any draft extensions that they wish.) + +Additionally, the RISC-V specification allows implementers to create +their own custom extensions. These custom extensions aren't required +to go through any review or ratification process by the RISC-V +Foundation. To avoid the maintenance complexity and potential +performance impact of adding kernel code for implementor-specific +RISC-V extensions, we'll only consider patches for extensions that either: + +- Have been officially frozen or ratified by the RISC-V Foundation, or +- Have been implemented in hardware that is widely available, per standard + Linux practice. + +(Implementers, may, of course, maintain their own Linux kernel trees containing +code for any custom extensions that they wish.) diff --git a/Documentation/arch/riscv/uabi.rst b/Documentation/arch/riscv/uabi.rst new file mode 100644 index 000000000000..243e40062e34 --- /dev/null +++ b/Documentation/arch/riscv/uabi.rst @@ -0,0 +1,86 @@ +.. SPDX-License-Identifier: GPL-2.0 + +RISC-V Linux User ABI +===================== + +ISA string ordering in /proc/cpuinfo +------------------------------------ + +The canonical order of ISA extension names in the ISA string is defined in +chapter 27 of the unprivileged specification. +The specification uses vague wording, such as should, when it comes to ordering, +so for our purposes the following rules apply: + +#. Single-letter extensions come first, in canonical order. + The canonical order is "IMAFDQLCBKJTPVH". + +#. All multi-letter extensions will be separated from other extensions by an + underscore. + +#. Additional standard extensions (starting with 'Z') will be sorted after + single-letter extensions and before any higher-privileged extensions. + +#. For additional standard extensions, the first letter following the 'Z' + conventionally indicates the most closely related alphabetical + extension category. If multiple 'Z' extensions are named, they will be + ordered first by category, in canonical order, as listed above, then + alphabetically within a category. + +#. Standard supervisor-level extensions (starting with 'S') will be listed + after standard unprivileged extensions. If multiple supervisor-level + extensions are listed, they will be ordered alphabetically. + +#. Standard machine-level extensions (starting with 'Zxm') will be listed + after any lower-privileged, standard extensions. If multiple machine-level + extensions are listed, they will be ordered alphabetically. + +#. Non-standard extensions (starting with 'X') will be listed after all standard + extensions. If multiple non-standard extensions are listed, they will be + ordered alphabetically. + +An example string following the order is:: + + rv64imadc_zifoo_zigoo_zafoo_sbar_scar_zxmbaz_xqux_xrux + +"isa" and "hart isa" lines in /proc/cpuinfo +------------------------------------------- + +The "isa" line in /proc/cpuinfo describes the lowest common denominator of +RISC-V ISA extensions recognized by the kernel and implemented on all harts. The +"hart isa" line, in contrast, describes the set of extensions recognized by the +kernel on the particular hart being described, even if those extensions may not +be present on all harts in the system. + +In both lines, the presence of an extension guarantees only that the hardware +has the described capability. Additional kernel support or policy changes may be +required before an extension's capability is fully usable by userspace programs. +Similarly, for S-mode extensions, presence in one of these lines does not +guarantee that the kernel is taking advantage of the extension, or that the +feature will be visible in guest VMs managed by this kernel. + +Inversely, the absence of an extension in these lines does not necessarily mean +the hardware does not support that feature. The running kernel may not recognize +the extension, or may have deliberately removed it from the listing. + +Misaligned accesses +------------------- + +Misaligned scalar accesses are supported in userspace, but they may perform +poorly. Misaligned vector accesses are only supported if the Zicclsm extension +is supported. + +Pointer masking +--------------- + +Support for pointer masking in userspace (the Supm extension) is provided via +the ``PR_SET_TAGGED_ADDR_CTRL`` and ``PR_GET_TAGGED_ADDR_CTRL`` ``prctl()`` +operations. Pointer masking is disabled by default. To enable it, userspace +must call ``PR_SET_TAGGED_ADDR_CTRL`` with the ``PR_PMLEN`` field set to the +number of mask/tag bits needed by the application. ``PR_PMLEN`` is interpreted +as a lower bound; if the kernel is unable to satisfy the request, the +``PR_SET_TAGGED_ADDR_CTRL`` operation will fail. The actual number of tag bits +is returned in ``PR_PMLEN`` by the ``PR_GET_TAGGED_ADDR_CTRL`` operation. + +Additionally, when pointer masking is enabled (``PR_PMLEN`` is greater than 0), +a tagged address ABI is supported, with the same interface and behavior as +documented for AArch64 (Documentation/arch/arm64/tagged-address-abi.rst). diff --git a/Documentation/arch/riscv/vector.rst b/Documentation/arch/riscv/vector.rst new file mode 100644 index 000000000000..3987f5f76a9d --- /dev/null +++ b/Documentation/arch/riscv/vector.rst @@ -0,0 +1,140 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================================= +Vector Extension Support for RISC-V Linux +========================================= + +This document briefly outlines the interface provided to userspace by Linux in +order to support the use of the RISC-V Vector Extension. + +1. prctl() Interface +--------------------- + +Two new prctl() calls are added to allow programs to manage the enablement +status for the use of Vector in userspace. The intended usage guideline for +these interfaces is to give init systems a way to modify the availability of V +for processes running under its domain. Calling these interfaces is not +recommended in libraries routines because libraries should not override policies +configured from the parent process. Also, users must note that these interfaces +are not portable to non-Linux, nor non-RISC-V environments, so it is discourage +to use in a portable code. To get the availability of V in an ELF program, +please read :c:macro:`COMPAT_HWCAP_ISA_V` bit of :c:macro:`ELF_HWCAP` in the +auxiliary vector. + +* prctl(PR_RISCV_V_SET_CONTROL, unsigned long arg) + + Sets the Vector enablement status of the calling thread, where the control + argument consists of two 2-bit enablement statuses and a bit for inheritance + mode. Other threads of the calling process are unaffected. + + Enablement status is a tri-state value each occupying 2-bit of space in + the control argument: + + * :c:macro:`PR_RISCV_V_VSTATE_CTRL_DEFAULT`: Use the system-wide default + enablement status on execve(). The system-wide default setting can be + controlled via sysctl interface (see sysctl section below). + + * :c:macro:`PR_RISCV_V_VSTATE_CTRL_ON`: Allow Vector to be run for the + thread. + + * :c:macro:`PR_RISCV_V_VSTATE_CTRL_OFF`: Disallow Vector. Executing Vector + instructions under such condition will trap and casuse the termination of the thread. + + arg: The control argument is a 5-bit value consisting of 3 parts, and + accessed by 3 masks respectively. + + The 3 masks, PR_RISCV_V_VSTATE_CTRL_CUR_MASK, + PR_RISCV_V_VSTATE_CTRL_NEXT_MASK, and PR_RISCV_V_VSTATE_CTRL_INHERIT + represents bit[1:0], bit[3:2], and bit[4]. bit[1:0] accounts for the + enablement status of current thread, and the setting at bit[3:2] takes place + at next execve(). bit[4] defines the inheritance mode of the setting in + bit[3:2]. + + * :c:macro:`PR_RISCV_V_VSTATE_CTRL_CUR_MASK`: bit[1:0]: Account for the + Vector enablement status for the calling thread. The calling thread is + not able to turn off Vector once it has been enabled. The prctl() call + fails with EPERM if the value in this mask is PR_RISCV_V_VSTATE_CTRL_OFF + but the current enablement status is not off. Setting + PR_RISCV_V_VSTATE_CTRL_DEFAULT here takes no effect but to set back + the original enablement status. + + * :c:macro:`PR_RISCV_V_VSTATE_CTRL_NEXT_MASK`: bit[3:2]: Account for the + Vector enablement setting for the calling thread at the next execve() + system call. If PR_RISCV_V_VSTATE_CTRL_DEFAULT is used in this mask, + then the enablement status will be decided by the system-wide + enablement status when execve() happen. + + * :c:macro:`PR_RISCV_V_VSTATE_CTRL_INHERIT`: bit[4]: the inheritance + mode for the setting at PR_RISCV_V_VSTATE_CTRL_NEXT_MASK. If the bit + is set then the following execve() will not clear the setting in both + PR_RISCV_V_VSTATE_CTRL_NEXT_MASK and PR_RISCV_V_VSTATE_CTRL_INHERIT. + This setting persists across changes in the system-wide default value. + + Return value: + * 0 on success; + * EINVAL: Vector not supported, invalid enablement status for current or + next mask; + * EPERM: Turning off Vector in PR_RISCV_V_VSTATE_CTRL_CUR_MASK if Vector + was enabled for the calling thread. + + On success: + * A valid setting for PR_RISCV_V_VSTATE_CTRL_CUR_MASK takes place + immediately. The enablement status specified in + PR_RISCV_V_VSTATE_CTRL_NEXT_MASK happens at the next execve() call, or + all following execve() calls if PR_RISCV_V_VSTATE_CTRL_INHERIT bit is + set. + * Every successful call overwrites a previous setting for the calling + thread. + +* prctl(PR_RISCV_V_GET_CONTROL) + + Gets the same Vector enablement status for the calling thread. Setting for + next execve() call and the inheritance bit are all OR-ed together. + + Note that ELF programs are able to get the availability of V for itself by + reading :c:macro:`COMPAT_HWCAP_ISA_V` bit of :c:macro:`ELF_HWCAP` in the + auxiliary vector. + + Return value: + * a nonnegative value on success; + * EINVAL: Vector not supported. + +2. System runtime configuration (sysctl) +----------------------------------------- + +To mitigate the ABI impact of expansion of the signal stack, a +policy mechanism is provided to the administrators, distro maintainers, and +developers to control the default Vector enablement status for userspace +processes in form of sysctl knob: + +* /proc/sys/abi/riscv_v_default_allow + + Writing the text representation of 0 or 1 to this file sets the default + system enablement status for new starting userspace programs. Valid values + are: + + * 0: Do not allow Vector code to be executed as the default for new processes. + * 1: Allow Vector code to be executed as the default for new processes. + + Reading this file returns the current system default enablement status. + + At every execve() call, a new enablement status of the new process is set to + the system default, unless: + + * PR_RISCV_V_VSTATE_CTRL_INHERIT is set for the calling process, and the + setting in PR_RISCV_V_VSTATE_CTRL_NEXT_MASK is not + PR_RISCV_V_VSTATE_CTRL_DEFAULT. Or, + + * The setting in PR_RISCV_V_VSTATE_CTRL_NEXT_MASK is not + PR_RISCV_V_VSTATE_CTRL_DEFAULT. + + Modifying the system default enablement status does not affect the enablement + status of any existing process of thread that do not make an execve() call. + +3. Vector Register State Across System Calls +--------------------------------------------- + +As indicated by version 1.0 of the V extension [1], vector registers are +clobbered by system calls. + +1: https://github.com/riscv/riscv-v-spec/blob/master/calling-convention.adoc diff --git a/Documentation/arch/riscv/vm-layout.rst b/Documentation/arch/riscv/vm-layout.rst new file mode 100644 index 000000000000..eabec99b5852 --- /dev/null +++ b/Documentation/arch/riscv/vm-layout.rst @@ -0,0 +1,136 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +Virtual Memory Layout on RISC-V Linux +===================================== + +:Author: Alexandre Ghiti <alex@ghiti.fr> +:Date: 12 February 2021 + +This document describes the virtual memory layout used by the RISC-V Linux +Kernel. + +RISC-V Linux Kernel 32bit +========================= + +RISC-V Linux Kernel SV32 +------------------------ + +TODO + +RISC-V Linux Kernel 64bit +========================= + +The RISC-V privileged architecture document states that the 64bit addresses +"must have bits 63–48 all equal to bit 47, or else a page-fault exception will +occur.": that splits the virtual address space into 2 halves separated by a very +big hole, the lower half is where the userspace resides, the upper half is where +the RISC-V Linux Kernel resides. + +RISC-V Linux Kernel SV39 +------------------------ + +:: + + ======================================================================================================================== + Start addr | Offset | End addr | Size | VM area description + ======================================================================================================================== + | | | | + 0000000000000000 | 0 | 0000003fffffffff | 256 GB | user-space virtual memory, different per mm + __________________|____________|__________________|_________|___________________________________________________________ + | | | | + 0000004000000000 | +256 GB | ffffffbfffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -256 GB + | | | | starting offset of kernel mappings. + __________________|____________|__________________|_________|___________________________________________________________ + | + | Kernel-space virtual memory, shared between all processes: + ____________________________________________________________|___________________________________________________________ + | | | | + ffffffc4fea00000 | -236 GB | ffffffc4feffffff | 6 MB | fixmap + ffffffc4ff000000 | -236 GB | ffffffc4ffffffff | 16 MB | PCI io + ffffffc500000000 | -236 GB | ffffffc5ffffffff | 4 GB | vmemmap + ffffffc600000000 | -232 GB | ffffffd5ffffffff | 64 GB | vmalloc/ioremap space + ffffffd600000000 | -168 GB | fffffff5ffffffff | 128 GB | direct mapping of all physical memory + | | | | + fffffff700000000 | -36 GB | fffffffeffffffff | 32 GB | kasan + __________________|____________|__________________|_________|____________________________________________________________ + | + | + ____________________________________________________________|____________________________________________________________ + | | | | + ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | modules, BPF + ffffffff80000000 | -2 GB | ffffffffffffffff | 2 GB | kernel + __________________|____________|__________________|_________|____________________________________________________________ + + +RISC-V Linux Kernel SV48 +------------------------ + +:: + + ======================================================================================================================== + Start addr | Offset | End addr | Size | VM area description + ======================================================================================================================== + | | | | + 0000000000000000 | 0 | 00007fffffffffff | 128 TB | user-space virtual memory, different per mm + __________________|____________|__________________|_________|___________________________________________________________ + | | | | + 0000800000000000 | +128 TB | ffff7fffffffffff | ~16M TB | ... huge, almost 64 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -128 TB + | | | | starting offset of kernel mappings. + __________________|____________|__________________|_________|___________________________________________________________ + | + | Kernel-space virtual memory, shared between all processes: + ____________________________________________________________|___________________________________________________________ + | | | | + ffff8d7ffea00000 | -114.5 TB | ffff8d7ffeffffff | 6 MB | fixmap + ffff8d7fff000000 | -114.5 TB | ffff8d7fffffffff | 16 MB | PCI io + ffff8d8000000000 | -114.5 TB | ffff8f7fffffffff | 2 TB | vmemmap + ffff8f8000000000 | -112.5 TB | ffffaf7fffffffff | 32 TB | vmalloc/ioremap space + ffffaf8000000000 | -80.5 TB | ffffef7fffffffff | 64 TB | direct mapping of all physical memory + ffffef8000000000 | -16.5 TB | fffffffeffffffff | 16.5 TB | kasan + __________________|____________|__________________|_________|____________________________________________________________ + | + | Identical layout to the 39-bit one from here on: + ____________________________________________________________|____________________________________________________________ + | | | | + ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | modules, BPF + ffffffff80000000 | -2 GB | ffffffffffffffff | 2 GB | kernel + __________________|____________|__________________|_________|____________________________________________________________ + + +RISC-V Linux Kernel SV57 +------------------------ + +:: + + ======================================================================================================================== + Start addr | Offset | End addr | Size | VM area description + ======================================================================================================================== + | | | | + 0000000000000000 | 0 | 00ffffffffffffff | 64 PB | user-space virtual memory, different per mm + __________________|____________|__________________|_________|___________________________________________________________ + | | | | + 0100000000000000 | +64 PB | feffffffffffffff | ~16K PB | ... huge, almost 64 bits wide hole of non-canonical + | | | | virtual memory addresses up to the -64 PB + | | | | starting offset of kernel mappings. + __________________|____________|__________________|_________|___________________________________________________________ + | + | Kernel-space virtual memory, shared between all processes: + ____________________________________________________________|___________________________________________________________ + | | | | + ff1bfffffea00000 | -57 PB | ff1bfffffeffffff | 6 MB | fixmap + ff1bffffff000000 | -57 PB | ff1bffffffffffff | 16 MB | PCI io + ff1c000000000000 | -57 PB | ff1fffffffffffff | 1 PB | vmemmap + ff20000000000000 | -56 PB | ff5fffffffffffff | 16 PB | vmalloc/ioremap space + ff60000000000000 | -40 PB | ffdeffffffffffff | 32 PB | direct mapping of all physical memory + ffdf000000000000 | -8 PB | fffffffeffffffff | 8 PB | kasan + __________________|____________|__________________|_________|____________________________________________________________ + | + | Identical layout to the 39-bit one from here on: + ____________________________________________________________|____________________________________________________________ + | | | | + ffffffff00000000 | -4 GB | ffffffff7fffffff | 2 GB | modules, BPF + ffffffff80000000 | -2 GB | ffffffffffffffff | 2 GB | kernel + __________________|____________|__________________|_________|____________________________________________________________ |