aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2019-09-16 14:31:40 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2019-09-16 14:31:40 -0700
commite77fafe9afb53b7f4d8176c5cd5c10c43a905bc8 (patch)
tree828ad771a2951f7ac06111c3a9a30e0f368a9b5e /Documentation
parentMerge tag 'iommu-updates-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu (diff)
parentarm64: remove __iounmap (diff)
downloadlinux-dev-e77fafe9afb53b7f4d8176c5cd5c10c43a905bc8.tar.xz
linux-dev-e77fafe9afb53b7f4d8176c5cd5c10c43a905bc8.zip
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon: "Although there isn't tonnes of code in terms of line count, there are a fair few headline features which I've noted both in the tag and also in the merge commits when I pulled everything together. The part I'm most pleased with is that we had 35 contributors this time around, which feels like a big jump from the usual small group of core arm64 arch developers. Hopefully they all enjoyed it so much that they'll continue to contribute, but we'll see. It's probably worth highlighting that we've pulled in a branch from the risc-v folks which moves our CPU topology code out to where it can be shared with others. Summary: - 52-bit virtual addressing in the kernel - New ABI to allow tagged user pointers to be dereferenced by syscalls - Early RNG seeding by the bootloader - Improve robustness of SMP boot - Fix TLB invalidation in light of recent architectural clarifications - Support for i.MX8 DDR PMU - Remove direct LSE instruction patching in favour of static keys - Function error injection using kprobes - Support for the PPTT "thread" flag introduced by ACPI 6.3 - Move PSCI idle code into proper cpuidle driver - Relaxation of implicit I/O memory barriers - Build with RELR relocations when toolchain supports them - Numerous cleanups and non-critical fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits) arm64: remove __iounmap arm64: atomics: Use K constraint when toolchain appears to support it arm64: atomics: Undefine internal macros after use arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL arm64: asm: Kill 'asm/atomic_arch.h' arm64: lse: Remove unused 'alt_lse' assembly macro arm64: atomics: Remove atomic_ll_sc compilation unit arm64: avoid using hard-coded registers for LSE atomics arm64: atomics: avoid out-of-line ll/sc atomics arm64: Use correct ll/sc atomic constraints jump_label: Don't warn on __exit jump entries docs/perf: Add documentation for the i.MX8 DDR PMU perf/imx_ddr: Add support for AXI ID filtering arm64: kpti: ensure patched kernel text is fetched from PoU arm64: fix fixmap copy for 16K pages and 48-bit VA perf/smmuv3: Validate groups for global filtering perf/smmuv3: Validate group size arm64: Relax Documentation/arm64/tagged-pointers.rst arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F arm64: mm: Ignore spurious translation faults taken from the kernel ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/perf/imx-ddr.rst52
-rw-r--r--Documentation/arm64/index.rst1
-rw-r--r--Documentation/arm64/kasan-offsets.sh27
-rw-r--r--Documentation/arm64/memory.rst123
-rw-r--r--Documentation/arm64/tagged-address-abi.rst156
-rw-r--r--Documentation/arm64/tagged-pointers.rst21
-rw-r--r--Documentation/devicetree/bindings/cpu/cpu-topology.txt (renamed from Documentation/devicetree/bindings/arm/topology.txt)256
7 files changed, 512 insertions, 124 deletions
diff --git a/Documentation/admin-guide/perf/imx-ddr.rst b/Documentation/admin-guide/perf/imx-ddr.rst
new file mode 100644
index 000000000000..517a205abad6
--- /dev/null
+++ b/Documentation/admin-guide/perf/imx-ddr.rst
@@ -0,0 +1,52 @@
+=====================================================
+Freescale i.MX8 DDR Performance Monitoring Unit (PMU)
+=====================================================
+
+There are no performance counters inside the DRAM controller, so performance
+signals are brought out to the edge of the controller where a set of 4 x 32 bit
+counters is implemented. This is controlled by the CSV modes programed in counter
+control register which causes a large number of PERF signals to be generated.
+
+Selection of the value for each counter is done via the config registers. There
+is one register for each counter. Counter 0 is special in that it always counts
+“time” and when expired causes a lock on itself and the other counters and an
+interrupt is raised. If any other counter overflows, it continues counting, and
+no interrupt is raised.
+
+The "format" directory describes format of the config (event ID) and config1
+(AXI filtering) fields of the perf_event_attr structure, see /sys/bus/event_source/
+devices/imx8_ddr0/format/. The "events" directory describes the events types
+hardware supported that can be used with perf tool, see /sys/bus/event_source/
+devices/imx8_ddr0/events/.
+ e.g.::
+ perf stat -a -e imx8_ddr0/cycles/ cmd
+ perf stat -a -e imx8_ddr0/read/,imx8_ddr0/write/ cmd
+
+AXI filtering is only used by CSV modes 0x41 (axid-read) and 0x42 (axid-write)
+to count reading or writing matches filter setting. Filter setting is various
+from different DRAM controller implementations, which is distinguished by quirks
+in the driver.
+
+* With DDR_CAP_AXI_ID_FILTER quirk.
+ Filter is defined with two configuration parts:
+ --AXI_ID defines AxID matching value.
+ --AXI_MASKING defines which bits of AxID are meaningful for the matching.
+ 0:corresponding bit is masked.
+ 1: corresponding bit is not masked, i.e. used to do the matching.
+
+ AXI_ID and AXI_MASKING are mapped on DPCR1 register in performance counter.
+ When non-masked bits are matching corresponding AXI_ID bits then counter is
+ incremented. Perf counter is incremented if
+ AxID && AXI_MASKING == AXI_ID && AXI_MASKING
+
+ This filter doesn't support filter different AXI ID for axid-read and axid-write
+ event at the same time as this filter is shared between counters.
+ e.g.::
+ perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD/ cmd
+ perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD/ cmd
+
+ NOTE: axi_mask is inverted in userspace(i.e. set bits are bits to mask), and
+ it will be reverted in driver automatically. so that the user can just specify
+ axi_id to monitor a specific id, rather than having to specify axi_mask.
+ e.g.::
+ perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will monitor ARID=0x12
diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst
index 96b696ba4e6c..5c0c69dc58aa 100644
--- a/Documentation/arm64/index.rst
+++ b/Documentation/arm64/index.rst
@@ -16,6 +16,7 @@ ARM64 Architecture
pointer-authentication
silicon-errata
sve
+ tagged-address-abi
tagged-pointers
.. only:: subproject and html
diff --git a/Documentation/arm64/kasan-offsets.sh b/Documentation/arm64/kasan-offsets.sh
new file mode 100644
index 000000000000..2b7a021db363
--- /dev/null
+++ b/Documentation/arm64/kasan-offsets.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+# Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW
+# start address at the mid-point of the kernel VA space
+
+print_kasan_offset () {
+ printf "%02d\t" $1
+ printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \
+ + (1 << ($1 - 32 - $2)) \
+ - (1 << (64 - 32 - $2)) ))
+}
+
+echo KASAN_SHADOW_SCALE_SHIFT = 3
+printf "VABITS\tKASAN_SHADOW_OFFSET\n"
+print_kasan_offset 48 3
+print_kasan_offset 47 3
+print_kasan_offset 42 3
+print_kasan_offset 39 3
+print_kasan_offset 36 3
+echo
+echo KASAN_SHADOW_SCALE_SHIFT = 4
+printf "VABITS\tKASAN_SHADOW_OFFSET\n"
+print_kasan_offset 48 4
+print_kasan_offset 47 4
+print_kasan_offset 42 4
+print_kasan_offset 39 4
+print_kasan_offset 36 4
diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst
index 464b880fc4b7..b040909e45f8 100644
--- a/Documentation/arm64/memory.rst
+++ b/Documentation/arm64/memory.rst
@@ -14,6 +14,10 @@ with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit
64KB pages, only 2 levels of translation tables, allowing 42-bit (4TB)
virtual address, are used but the memory layout is the same.
+ARMv8.2 adds optional support for Large Virtual Address space. This is
+only available when running with a 64KB page size and expands the
+number of descriptors in the first level of translation.
+
User addresses have bits 63:48 set to 0 while the kernel addresses have
the same bits set to 1. TTBRx selection is given by bit 63 of the
virtual address. The swapper_pg_dir contains only kernel (global)
@@ -22,40 +26,43 @@ The swapper_pg_dir address is written to TTBR1 and never written to
TTBR0.
-AArch64 Linux memory layout with 4KB pages + 3 levels::
-
- Start End Size Use
- -----------------------------------------------------------------------
- 0000000000000000 0000007fffffffff 512GB user
- ffffff8000000000 ffffffffffffffff 512GB kernel
-
-
-AArch64 Linux memory layout with 4KB pages + 4 levels::
+AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit)::
Start End Size Use
-----------------------------------------------------------------------
0000000000000000 0000ffffffffffff 256TB user
- ffff000000000000 ffffffffffffffff 256TB kernel
-
-
-AArch64 Linux memory layout with 64KB pages + 2 levels::
+ ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map
+ ffff800000000000 ffff9fffffffffff 32TB kasan shadow region
+ ffffa00000000000 ffffa00007ffffff 128MB bpf jit region
+ ffffa00008000000 ffffa0000fffffff 128MB modules
+ ffffa00010000000 fffffdffbffeffff ~93TB vmalloc
+ fffffdffbfff0000 fffffdfffe5f8fff ~998MB [guard region]
+ fffffdfffe5f9000 fffffdfffe9fffff 4124KB fixed mappings
+ fffffdfffea00000 fffffdfffebfffff 2MB [guard region]
+ fffffdfffec00000 fffffdffffbfffff 16MB PCI I/O space
+ fffffdffffc00000 fffffdffffdfffff 2MB [guard region]
+ fffffdffffe00000 ffffffffffdfffff 2TB vmemmap
+ ffffffffffe00000 ffffffffffffffff 2MB [guard region]
+
+
+AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support)::
Start End Size Use
-----------------------------------------------------------------------
- 0000000000000000 000003ffffffffff 4TB user
- fffffc0000000000 ffffffffffffffff 4TB kernel
-
-
-AArch64 Linux memory layout with 64KB pages + 3 levels::
-
- Start End Size Use
- -----------------------------------------------------------------------
- 0000000000000000 0000ffffffffffff 256TB user
- ffff000000000000 ffffffffffffffff 256TB kernel
-
-
-For details of the virtual kernel memory layout please see the kernel
-booting log.
+ 0000000000000000 000fffffffffffff 4PB user
+ fff0000000000000 fff7ffffffffffff 2PB kernel logical memory map
+ fff8000000000000 fffd9fffffffffff 1440TB [gap]
+ fffda00000000000 ffff9fffffffffff 512TB kasan shadow region
+ ffffa00000000000 ffffa00007ffffff 128MB bpf jit region
+ ffffa00008000000 ffffa0000fffffff 128MB modules
+ ffffa00010000000 fffff81ffffeffff ~88TB vmalloc
+ fffff81fffff0000 fffffc1ffe58ffff ~3TB [guard region]
+ fffffc1ffe590000 fffffc1ffe9fffff 4544KB fixed mappings
+ fffffc1ffea00000 fffffc1ffebfffff 2MB [guard region]
+ fffffc1ffec00000 fffffc1fffbfffff 16MB PCI I/O space
+ fffffc1fffc00000 fffffc1fffdfffff 2MB [guard region]
+ fffffc1fffe00000 ffffffffffdfffff 3968GB vmemmap
+ ffffffffffe00000 ffffffffffffffff 2MB [guard region]
Translation table lookup with 4KB pages::
@@ -83,7 +90,8 @@ Translation table lookup with 64KB pages::
| | | | [15:0] in-page offset
| | | +----------> [28:16] L3 index
| | +--------------------------> [41:29] L2 index
- | +-------------------------------> [47:42] L1 index
+ | +-------------------------------> [47:42] L1 index (48-bit)
+ | [51:42] L1 index (52-bit)
+-------------------------------------------------> [63] TTBR0/1
@@ -96,3 +104,62 @@ ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.
When using KVM with the Virtualization Host Extensions, no additional
mappings are created, since the host kernel runs directly in EL2.
+
+52-bit VA support in the kernel
+-------------------------------
+If the ARMv8.2-LVA optional feature is present, and we are running
+with a 64KB page size; then it is possible to use 52-bits of address
+space for both userspace and kernel addresses. However, any kernel
+binary that supports 52-bit must also be able to fall back to 48-bit
+at early boot time if the hardware feature is not present.
+
+This fallback mechanism necessitates the kernel .text to be in the
+higher addresses such that they are invariant to 48/52-bit VAs. Due
+to the kasan shadow being a fraction of the entire kernel VA space,
+the end of the kasan shadow must also be in the higher half of the
+kernel VA space for both 48/52-bit. (Switching from 48-bit to 52-bit,
+the end of the kasan shadow is invariant and dependent on ~0UL,
+whilst the start address will "grow" towards the lower addresses).
+
+In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET
+is kept constant at 0xFFF0000000000000 (corresponding to 52-bit),
+this obviates the need for an extra variable read. The physvirt
+offset and vmemmap offsets are computed at early boot to enable
+this logic.
+
+As a single binary will need to support both 48-bit and 52-bit VA
+spaces, the VMEMMAP must be sized large enough for 52-bit VAs and
+also must be sized large enought to accommodate a fixed PAGE_OFFSET.
+
+Most code in the kernel should not need to consider the VA_BITS, for
+code that does need to know the VA size the variables are
+defined as follows:
+
+VA_BITS constant the *maximum* VA space size
+
+VA_BITS_MIN constant the *minimum* VA space size
+
+vabits_actual variable the *actual* VA space size
+
+
+Maximum and minimum sizes can be useful to ensure that buffers are
+sized large enough or that addresses are positioned close enough for
+the "worst" case.
+
+52-bit userspace VAs
+--------------------
+To maintain compatibility with software that relies on the ARMv8.0
+VA space maximum size of 48-bits, the kernel will, by default,
+return virtual addresses to userspace from a 48-bit range.
+
+Software can "opt-in" to receiving VAs from a 52-bit space by
+specifying an mmap hint parameter that is larger than 48-bit.
+For example:
+ maybe_high_address = mmap(~0UL, size, prot, flags,...);
+
+It is also possible to build a debug kernel that returns addresses
+from a 52-bit space by enabling the following kernel config options:
+ CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y
+
+Note that this option is only intended for debugging applications
+and should not be used in production.
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst
new file mode 100644
index 000000000000..d4a85d535bf9
--- /dev/null
+++ b/Documentation/arm64/tagged-address-abi.rst
@@ -0,0 +1,156 @@
+==========================
+AArch64 TAGGED ADDRESS ABI
+==========================
+
+Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
+ Catalin Marinas <catalin.marinas@arm.com>
+
+Date: 21 August 2019
+
+This document describes the usage and semantics of the Tagged Address
+ABI on AArch64 Linux.
+
+1. Introduction
+---------------
+
+On AArch64 the ``TCR_EL1.TBI0`` bit is set by default, allowing
+userspace (EL0) to perform memory accesses through 64-bit pointers with
+a non-zero top byte. This document describes the relaxation of the
+syscall ABI that allows userspace to pass certain tagged pointers to
+kernel syscalls.
+
+2. AArch64 Tagged Address ABI
+-----------------------------
+
+From the kernel syscall interface perspective and for the purposes of
+this document, a "valid tagged pointer" is a pointer with a potentially
+non-zero top-byte that references an address in the user process address
+space obtained in one of the following ways:
+
+- ``mmap()`` syscall where either:
+
+ - flags have the ``MAP_ANONYMOUS`` bit set or
+ - the file descriptor refers to a regular file (including those
+ returned by ``memfd_create()``) or ``/dev/zero``
+
+- ``brk()`` syscall (i.e. the heap area between the initial location of
+ the program break at process creation and its current location).
+
+- any memory mapped by the kernel in the address space of the process
+ during creation and with the same restrictions as for ``mmap()`` above
+ (e.g. data, bss, stack).
+
+The AArch64 Tagged Address ABI has two stages of relaxation depending
+how the user addresses are used by the kernel:
+
+1. User addresses not accessed by the kernel but used for address space
+ management (e.g. ``mmap()``, ``mprotect()``, ``madvise()``). The use
+ of valid tagged pointers in this context is always allowed.
+
+2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
+ relaxation is disabled by default and the application thread needs to
+ explicitly enable it via ``prctl()`` as follows:
+
+ - ``PR_SET_TAGGED_ADDR_CTRL``: enable or disable the AArch64 Tagged
+ Address ABI for the calling thread.
+
+ The ``(unsigned int) arg2`` argument is a bit mask describing the
+ control mode used:
+
+ - ``PR_TAGGED_ADDR_ENABLE``: enable AArch64 Tagged Address ABI.
+ Default status is disabled.
+
+ Arguments ``arg3``, ``arg4``, and ``arg5`` must be 0.
+
+ - ``PR_GET_TAGGED_ADDR_CTRL``: get the status of the AArch64 Tagged
+ Address ABI for the calling thread.
+
+ Arguments ``arg2``, ``arg3``, ``arg4``, and ``arg5`` must be 0.
+
+ The ABI properties described above are thread-scoped, inherited on
+ clone() and fork() and cleared on exec().
+
+ Calling ``prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0)``
+ returns ``-EINVAL`` if the AArch64 Tagged Address ABI is globally
+ disabled by ``sysctl abi.tagged_addr_disabled=1``. The default
+ ``sysctl abi.tagged_addr_disabled`` configuration is 0.
+
+When the AArch64 Tagged Address ABI is enabled for a thread, the
+following behaviours are guaranteed:
+
+- All syscalls except the cases mentioned in section 3 can accept any
+ valid tagged pointer.
+
+- The syscall behaviour is undefined for invalid tagged pointers: it may
+ result in an error code being returned, a (fatal) signal being raised,
+ or other modes of failure.
+
+- The syscall behaviour for a valid tagged pointer is the same as for
+ the corresponding untagged pointer.
+
+
+A definition of the meaning of tagged pointers on AArch64 can be found
+in Documentation/arm64/tagged-pointers.rst.
+
+3. AArch64 Tagged Address ABI Exceptions
+-----------------------------------------
+
+The following system call parameters must be untagged regardless of the
+ABI relaxation:
+
+- ``prctl()`` other than pointers to user data either passed directly or
+ indirectly as arguments to be accessed by the kernel.
+
+- ``ioctl()`` other than pointers to user data either passed directly or
+ indirectly as arguments to be accessed by the kernel.
+
+- ``shmat()`` and ``shmdt()``.
+
+Any attempt to use non-zero tagged pointers may result in an error code
+being returned, a (fatal) signal being raised, or other modes of
+failure.
+
+4. Example of correct usage
+---------------------------
+.. code-block:: c
+
+ #include <stdlib.h>
+ #include <string.h>
+ #include <unistd.h>
+ #include <sys/mman.h>
+ #include <sys/prctl.h>
+
+ #define PR_SET_TAGGED_ADDR_CTRL 55
+ #define PR_TAGGED_ADDR_ENABLE (1UL << 0)
+
+ #define TAG_SHIFT 56
+
+ int main(void)
+ {
+ int tbi_enabled = 0;
+ unsigned long tag = 0;
+ char *ptr;
+
+ /* check/enable the tagged address ABI */
+ if (!prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE, 0, 0, 0))
+ tbi_enabled = 1;
+
+ /* memory allocation */
+ ptr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (ptr == MAP_FAILED)
+ return 1;
+
+ /* set a non-zero tag if the ABI is available */
+ if (tbi_enabled)
+ tag = rand() & 0xff;
+ ptr = (char *)((unsigned long)ptr | (tag << TAG_SHIFT));
+
+ /* memory access to a tagged address */
+ strcpy(ptr, "tagged pointer\n");
+
+ /* syscall with a tagged pointer */
+ write(1, ptr, strlen(ptr));
+
+ return 0;
+ }
diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst
index 2acdec3ebbeb..eab4323609b9 100644
--- a/Documentation/arm64/tagged-pointers.rst
+++ b/Documentation/arm64/tagged-pointers.rst
@@ -20,7 +20,9 @@ Passing tagged addresses to the kernel
--------------------------------------
All interpretation of userspace memory addresses by the kernel assumes
-an address tag of 0x00.
+an address tag of 0x00, unless the application enables the AArch64
+Tagged Address ABI explicitly
+(Documentation/arm64/tagged-address-abi.rst).
This includes, but is not limited to, addresses found in:
@@ -33,13 +35,15 @@ This includes, but is not limited to, addresses found in:
- the frame pointer (x29) and frame records, e.g. when interpreting
them to generate a backtrace or call graph.
-Using non-zero address tags in any of these locations may result in an
-error code being returned, a (fatal) signal being raised, or other modes
-of failure.
+Using non-zero address tags in any of these locations when the
+userspace application did not enable the AArch64 Tagged Address ABI may
+result in an error code being returned, a (fatal) signal being raised,
+or other modes of failure.
-For these reasons, passing non-zero address tags to the kernel via
-system calls is forbidden, and using a non-zero address tag for sp is
-strongly discouraged.
+For these reasons, when the AArch64 Tagged Address ABI is disabled,
+passing non-zero address tags to the kernel via system calls is
+forbidden, and using a non-zero address tag for sp is strongly
+discouraged.
Programs maintaining a frame pointer and frame records that use non-zero
address tags may suffer impaired or inaccurate debug and profiling
@@ -59,6 +63,9 @@ be preserved.
The architecture prevents the use of a tagged PC, so the upper byte will
be set to a sign-extension of bit 55 on exception return.
+This behaviour is maintained when the AArch64 Tagged Address ABI is
+enabled.
+
Other considerations
--------------------
diff --git a/Documentation/devicetree/bindings/arm/topology.txt b/Documentation/devicetree/bindings/cpu/cpu-topology.txt
index b0d80c0fb265..99918189403c 100644
--- a/Documentation/devicetree/bindings/arm/topology.txt
+++ b/Documentation/devicetree/bindings/cpu/cpu-topology.txt
@@ -1,21 +1,19 @@
===========================================
-ARM topology binding description
+CPU topology binding description
===========================================
===========================================
1 - Introduction
===========================================
-In an ARM system, the hierarchy of CPUs is defined through three entities that
+In a SMP system, the hierarchy of CPUs is defined through three entities that
are used to describe the layout of physical CPUs in the system:
+- socket
- cluster
- core
- thread
-The cpu nodes (bindings defined in [1]) represent the devices that
-correspond to physical CPUs and are to be mapped to the hierarchy levels.
-
The bottom hierarchy level sits at core or thread level depending on whether
symmetric multi-threading (SMT) is supported or not.
@@ -24,33 +22,31 @@ threads existing in the system and map to the hierarchy level "thread" above.
In systems where SMT is not supported "cpu" nodes represent all cores present
in the system and map to the hierarchy level "core" above.
-ARM topology bindings allow one to associate cpu nodes with hierarchical groups
+CPU topology bindings allow one to associate cpu nodes with hierarchical groups
corresponding to the system hierarchy; syntactically they are defined as device
tree nodes.
-The remainder of this document provides the topology bindings for ARM, based
-on the Devicetree Specification, available from:
+Currently, only ARM/RISC-V intend to use this cpu topology binding but it may be
+used for any other architecture as well.
-https://www.devicetree.org/specifications/
+The cpu nodes, as per bindings defined in [4], represent the devices that
+correspond to physical CPUs and are to be mapped to the hierarchy levels.
-If not stated otherwise, whenever a reference to a cpu node phandle is made its
-value must point to a cpu node compliant with the cpu node bindings as
-documented in [1].
A topology description containing phandles to cpu nodes that are not compliant
-with bindings standardized in [1] is therefore considered invalid.
+with bindings standardized in [4] is therefore considered invalid.
===========================================
2 - cpu-map node
===========================================
-The ARM CPU topology is defined within the cpu-map node, which is a direct
+The ARM/RISC-V CPU topology is defined within the cpu-map node, which is a direct
child of the cpus node and provides a container where the actual topology
nodes are listed.
- cpu-map node
- Usage: Optional - On ARM SMP systems provide CPUs topology to the OS.
- ARM uniprocessor systems do not require a topology
+ Usage: Optional - On SMP systems provide CPUs topology to the OS.
+ Uniprocessor systems do not require a topology
description and therefore should not define a
cpu-map node.
@@ -63,21 +59,23 @@ nodes are listed.
The cpu-map node's child nodes can be:
- - one or more cluster nodes
+ - one or more cluster nodes or
+ - one or more socket nodes in a multi-socket system
Any other configuration is considered invalid.
-The cpu-map node can only contain three types of child nodes:
+The cpu-map node can only contain 4 types of child nodes:
+- socket node
- cluster node
- core node
- thread node
whose bindings are described in paragraph 3.
-The nodes describing the CPU topology (cluster/core/thread) can only
-be defined within the cpu-map node and every core/thread in the system
-must be defined within the topology. Any other configuration is
+The nodes describing the CPU topology (socket/cluster/core/thread) can
+only be defined within the cpu-map node and every core/thread in the
+system must be defined within the topology. Any other configuration is
invalid and therefore must be ignored.
===========================================
@@ -85,26 +83,44 @@ invalid and therefore must be ignored.
===========================================
cpu-map child nodes must follow a naming convention where the node name
-must be "clusterN", "coreN", "threadN" depending on the node type (ie
-cluster/core/thread) (where N = {0, 1, ...} is the node number; nodes which
-are siblings within a single common parent node must be given a unique and
+must be "socketN", "clusterN", "coreN", "threadN" depending on the node type
+(ie socket/cluster/core/thread) (where N = {0, 1, ...} is the node number; nodes
+which are siblings within a single common parent node must be given a unique and
sequential N value, starting from 0).
cpu-map child nodes which do not share a common parent node can have the same
name (ie same number N as other cpu-map child nodes at different device tree
levels) since name uniqueness will be guaranteed by the device tree hierarchy.
===========================================
-3 - cluster/core/thread node bindings
+3 - socket/cluster/core/thread node bindings
===========================================
-Bindings for cluster/cpu/thread nodes are defined as follows:
+Bindings for socket/cluster/cpu/thread nodes are defined as follows:
+
+- socket node
+
+ Description: must be declared within a cpu-map node, one node
+ per physical socket in the system. A system can
+ contain single or multiple physical socket.
+ The association of sockets and NUMA nodes is beyond
+ the scope of this bindings, please refer [2] for
+ NUMA bindings.
+
+ This node is optional for a single socket system.
+
+ The socket node name must be "socketN" as described in 2.1 above.
+ A socket node can not be a leaf node.
+
+ A socket node's child nodes must be one or more cluster nodes.
+
+ Any other configuration is considered invalid.
- cluster node
Description: must be declared within a cpu-map node, one node
per cluster. A system can contain several layers of
- clustering and cluster nodes can be contained in parent
- cluster nodes.
+ clustering within a single physical socket and cluster
+ nodes can be contained in parent cluster nodes.
The cluster node name must be "clusterN" as described in 2.1 above.
A cluster node can not be a leaf node.
@@ -164,90 +180,93 @@ Bindings for cluster/cpu/thread nodes are defined as follows:
4 - Example dts
===========================================
-Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters):
+Example 1 (ARM 64-bit, 16-cpu system, two clusters of clusters in a single
+physical socket):
cpus {
#size-cells = <0>;
#address-cells = <2>;
cpu-map {
- cluster0 {
+ socket0 {
cluster0 {
- core0 {
- thread0 {
- cpu = <&CPU0>;
+ cluster0 {
+ core0 {
+ thread0 {
+ cpu = <&CPU0>;
+ };
+ thread1 {
+ cpu = <&CPU1>;
+ };
};
- thread1 {
- cpu = <&CPU1>;
- };
- };
- core1 {
- thread0 {
- cpu = <&CPU2>;
- };
- thread1 {
- cpu = <&CPU3>;
+ core1 {
+ thread0 {
+ cpu = <&CPU2>;
+ };
+ thread1 {
+ cpu = <&CPU3>;
+ };
};
};
- };
- cluster1 {
- core0 {
- thread0 {
- cpu = <&CPU4>;
- };
- thread1 {
- cpu = <&CPU5>;
+ cluster1 {
+ core0 {
+ thread0 {
+ cpu = <&CPU4>;
+ };
+ thread1 {
+ cpu = <&CPU5>;
+ };
};
- };
- core1 {
- thread0 {
- cpu = <&CPU6>;
- };
- thread1 {
- cpu = <&CPU7>;
- };
- };
- };
- };
-
- cluster1 {
- cluster0 {
- core0 {
- thread0 {
- cpu = <&CPU8>;
- };
- thread1 {
- cpu = <&CPU9>;
- };
- };
- core1 {
- thread0 {
- cpu = <&CPU10>;
- };
- thread1 {
- cpu = <&CPU11>;
+ core1 {
+ thread0 {
+ cpu = <&CPU6>;
+ };
+ thread1 {
+ cpu = <&CPU7>;
+ };
};
};
};
cluster1 {
- core0 {
- thread0 {
- cpu = <&CPU12>;
+ cluster0 {
+ core0 {
+ thread0 {
+ cpu = <&CPU8>;
+ };
+ thread1 {
+ cpu = <&CPU9>;
+ };
};
- thread1 {
- cpu = <&CPU13>;
+ core1 {
+ thread0 {
+ cpu = <&CPU10>;
+ };
+ thread1 {
+ cpu = <&CPU11>;
+ };
};
};
- core1 {
- thread0 {
- cpu = <&CPU14>;
+
+ cluster1 {
+ core0 {
+ thread0 {
+ cpu = <&CPU12>;
+ };
+ thread1 {
+ cpu = <&CPU13>;
+ };
};
- thread1 {
- cpu = <&CPU15>;
+ core1 {
+ thread0 {
+ cpu = <&CPU14>;
+ };
+ thread1 {
+ cpu = <&CPU15>;
+ };
};
};
};
@@ -470,6 +489,65 @@ cpus {
};
};
+Example 3: HiFive Unleashed (RISC-V 64 bit, 4 core system)
+
+{
+ #address-cells = <2>;
+ #size-cells = <2>;
+ compatible = "sifive,fu540g", "sifive,fu500";
+ model = "sifive,hifive-unleashed-a00";
+
+ ...
+ cpus {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ cpu-map {
+ socket0 {
+ cluster0 {
+ core0 {
+ cpu = <&CPU1>;
+ };
+ core1 {
+ cpu = <&CPU2>;
+ };
+ core2 {
+ cpu0 = <&CPU2>;
+ };
+ core3 {
+ cpu0 = <&CPU3>;
+ };
+ };
+ };
+ };
+
+ CPU1: cpu@1 {
+ device_type = "cpu";
+ compatible = "sifive,rocket0", "riscv";
+ reg = <0x1>;
+ }
+
+ CPU2: cpu@2 {
+ device_type = "cpu";
+ compatible = "sifive,rocket0", "riscv";
+ reg = <0x2>;
+ }
+ CPU3: cpu@3 {
+ device_type = "cpu";
+ compatible = "sifive,rocket0", "riscv";
+ reg = <0x3>;
+ }
+ CPU4: cpu@4 {
+ device_type = "cpu";
+ compatible = "sifive,rocket0", "riscv";
+ reg = <0x4>;
+ }
+ }
+};
===============================================================================
[1] ARM Linux kernel documentation
Documentation/devicetree/bindings/arm/cpus.yaml
+[2] Devicetree NUMA binding description
+ Documentation/devicetree/bindings/numa.txt
+[3] RISC-V Linux kernel documentation
+ Documentation/devicetree/bindings/riscv/cpus.txt
+[4] https://www.devicetree.org/specifications/