diff options
Diffstat (limited to 'Documentation/arm64')
21 files changed, 1788 insertions, 198 deletions
diff --git a/Documentation/arm64/acpi_object_usage.rst b/Documentation/arm64/acpi_object_usage.rst index d51b69dc624d..0609da73970b 100644 --- a/Documentation/arm64/acpi_object_usage.rst +++ b/Documentation/arm64/acpi_object_usage.rst @@ -17,12 +17,12 @@ For ACPI on arm64, tables also fall into the following categories: - Recommended: BERT, EINJ, ERST, HEST, PCCT, SSDT - - Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IORT, - MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, STAO, - TCPA, TPM2, UEFI, XENV + - Optional: BGRT, CPEP, CSRT, DBG2, DRTM, ECDT, FACS, FPDT, IBFT, + IORT, MCHI, MPST, MSCT, NFIT, PMTT, RASF, SBST, SLIT, SPMI, SRAT, + STAO, TCPA, TPM2, UEFI, XENV - - Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IBFT, IVRS, LPIT, - MSDM, OEMx, PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT + - Not supported: BOOT, DBGP, DMAR, ETDT, HPET, IVRS, LPIT, MSDM, OEMx, + PSDT, RSDT, SLIC, WAET, WDAT, WDRT, WPBT ====== ======================================================================== Table Usage for ARMv8 Linux @@ -220,7 +220,7 @@ LPIT Signature Reserved (signature == "LPIT") x86 only table as of ACPI 5.1; starting with ACPI 6.0, processor descriptions and power states on ARM platforms should use the DSDT and define processor container devices (_HID ACPI0010, Section 8.4, - and more specifically 8.4.3 and and 8.4.4). + and more specifically 8.4.3 and 8.4.4). MADT Section 5.2.12 (signature == "APIC") diff --git a/Documentation/arm64/amu.rst b/Documentation/arm64/amu.rst new file mode 100644 index 000000000000..01f2de2b0450 --- /dev/null +++ b/Documentation/arm64/amu.rst @@ -0,0 +1,119 @@ +.. _amu_index: + +======================================================= +Activity Monitors Unit (AMU) extension in AArch64 Linux +======================================================= + +Author: Ionela Voinescu <ionela.voinescu@arm.com> + +Date: 2019-09-10 + +This document briefly describes the provision of Activity Monitors Unit +support in AArch64 Linux. + + +Architecture overview +--------------------- + +The activity monitors extension is an optional extension introduced by the +ARMv8.4 CPU architecture. + +The activity monitors unit, implemented in each CPU, provides performance +counters intended for system management use. The AMU extension provides a +system register interface to the counter registers and also supports an +optional external memory-mapped interface. + +Version 1 of the Activity Monitors architecture implements a counter group +of four fixed and architecturally defined 64-bit event counters. + + - CPU cycle counter: increments at the frequency of the CPU. + - Constant counter: increments at the fixed frequency of the system + clock. + - Instructions retired: increments with every architecturally executed + instruction. + - Memory stall cycles: counts instruction dispatch stall cycles caused by + misses in the last level cache within the clock domain. + +When in WFI or WFE these counters do not increment. + +The Activity Monitors architecture provides space for up to 16 architected +event counters. Future versions of the architecture may use this space to +implement additional architected event counters. + +Additionally, version 1 implements a counter group of up to 16 auxiliary +64-bit event counters. + +On cold reset all counters reset to 0. + + +Basic support +------------- + +The kernel can safely run a mix of CPUs with and without support for the +activity monitors extension. Therefore, when CONFIG_ARM64_AMU_EXTN is +selected we unconditionally enable the capability to allow any late CPU +(secondary or hotplugged) to detect and use the feature. + +When the feature is detected on a CPU, we flag the availability of the +feature but this does not guarantee the correct functionality of the +counters, only the presence of the extension. + +Firmware (code running at higher exception levels, e.g. arm-tf) support is +needed to: + + - Enable access for lower exception levels (EL2 and EL1) to the AMU + registers. + - Enable the counters. If not enabled these will read as 0. + - Save/restore the counters before/after the CPU is being put/brought up + from the 'off' power state. + +When using kernels that have this feature enabled but boot with broken +firmware the user may experience panics or lockups when accessing the +counter registers. Even if these symptoms are not observed, the values +returned by the register reads might not correctly reflect reality. Most +commonly, the counters will read as 0, indicating that they are not +enabled. + +If proper support is not provided in firmware it's best to disable +CONFIG_ARM64_AMU_EXTN. To be noted that for security reasons, this does not +bypass the setting of AMUSERENR_EL0 to trap accesses from EL0 (userspace) to +EL1 (kernel). Therefore, firmware should still ensure accesses to AMU registers +are not trapped in EL2/EL3. + +The fixed counters of AMUv1 are accessible though the following system +register definitions: + + - SYS_AMEVCNTR0_CORE_EL0 + - SYS_AMEVCNTR0_CONST_EL0 + - SYS_AMEVCNTR0_INST_RET_EL0 + - SYS_AMEVCNTR0_MEM_STALL_EL0 + +Auxiliary platform specific counters can be accessed using +SYS_AMEVCNTR1_EL0(n), where n is a value between 0 and 15. + +Details can be found in: arch/arm64/include/asm/sysreg.h. + + +Userspace access +---------------- + +Currently, access from userspace to the AMU registers is disabled due to: + + - Security reasons: they might expose information about code executed in + secure mode. + - Purpose: AMU counters are intended for system management use. + +Also, the presence of the feature is not visible to userspace. + + +Virtualization +-------------- + +Currently, access from userspace (EL0) and kernelspace (EL1) on the KVM +guest side is disabled due to: + + - Security reasons: they might expose information about code executed + by other guests or the host. + +Any attempt to access the AMU registers will result in an UNDEFINED +exception being injected into the guest. diff --git a/Documentation/arm64/arm-acpi.rst b/Documentation/arm64/arm-acpi.rst index 872dbbc73d4a..47ecb9930dde 100644 --- a/Documentation/arm64/arm-acpi.rst +++ b/Documentation/arm64/arm-acpi.rst @@ -273,7 +273,7 @@ only use the _DSD Device Properties UUID [5]: - UUID: daffd814-6eba-4d8c-8a91-bc9bbf4aa301 - - http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf + - https://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf The UEFI Forum provides a mechanism for registering device properties [4] so that they may be used across all operating systems supporting ACPI. @@ -470,7 +470,7 @@ likely be willing to assist in submitting ECRs. Linux Code ---------- -Individual items specific to Linux on ARM, contained in the the Linux +Individual items specific to Linux on ARM, contained in the Linux source code, are in the list that follows: ACPI_OS_NAME diff --git a/Documentation/arm64/asymmetric-32bit.rst b/Documentation/arm64/asymmetric-32bit.rst new file mode 100644 index 000000000000..64a0b505da7d --- /dev/null +++ b/Documentation/arm64/asymmetric-32bit.rst @@ -0,0 +1,155 @@ +====================== +Asymmetric 32-bit SoCs +====================== + +Author: Will Deacon <will@kernel.org> + +This document describes the impact of asymmetric 32-bit SoCs on the +execution of 32-bit (``AArch32``) applications. + +Date: 2021-05-17 + +Introduction +============ + +Some Armv9 SoCs suffer from a big.LITTLE misfeature where only a subset +of the CPUs are capable of executing 32-bit user applications. On such +a system, Linux by default treats the asymmetry as a "mismatch" and +disables support for both the ``PER_LINUX32`` personality and +``execve(2)`` of 32-bit ELF binaries, with the latter returning +``-ENOEXEC``. If the mismatch is detected during late onlining of a +64-bit-only CPU, then the onlining operation fails and the new CPU is +unavailable for scheduling. + +Surprisingly, these SoCs have been produced with the intention of +running legacy 32-bit binaries. Unsurprisingly, that doesn't work very +well with the default behaviour of Linux. + +It seems inevitable that future SoCs will drop 32-bit support +altogether, so if you're stuck in the unenviable position of needing to +run 32-bit code on one of these transitionary platforms then you would +be wise to consider alternatives such as recompilation, emulation or +retirement. If neither of those options are practical, then read on. + +Enabling kernel support +======================= + +Since the kernel support is not completely transparent to userspace, +allowing 32-bit tasks to run on an asymmetric 32-bit system requires an +explicit "opt-in" and can be enabled by passing the +``allow_mismatched_32bit_el0`` parameter on the kernel command-line. + +For the remainder of this document we will refer to an *asymmetric +system* to mean an asymmetric 32-bit SoC running Linux with this kernel +command-line option enabled. + +Userspace impact +================ + +32-bit tasks running on an asymmetric system behave in mostly the same +way as on a homogeneous system, with a few key differences relating to +CPU affinity. + +sysfs +----- + +The subset of CPUs capable of running 32-bit tasks is described in +``/sys/devices/system/cpu/aarch32_el0`` and is documented further in +``Documentation/ABI/testing/sysfs-devices-system-cpu``. + +**Note:** CPUs are advertised by this file as they are detected and so +late-onlining of 32-bit-capable CPUs can result in the file contents +being modified by the kernel at runtime. Once advertised, CPUs are never +removed from the file. + +``execve(2)`` +------------- + +On a homogeneous system, the CPU affinity of a task is preserved across +``execve(2)``. This is not always possible on an asymmetric system, +specifically when the new program being executed is 32-bit yet the +affinity mask contains 64-bit-only CPUs. In this situation, the kernel +determines the new affinity mask as follows: + + 1. If the 32-bit-capable subset of the affinity mask is not empty, + then the affinity is restricted to that subset and the old affinity + mask is saved. This saved mask is inherited over ``fork(2)`` and + preserved across ``execve(2)`` of 32-bit programs. + + **Note:** This step does not apply to ``SCHED_DEADLINE`` tasks. + See `SCHED_DEADLINE`_. + + 2. Otherwise, the cpuset hierarchy of the task is walked until an + ancestor is found containing at least one 32-bit-capable CPU. The + affinity of the task is then changed to match the 32-bit-capable + subset of the cpuset determined by the walk. + + 3. On failure (i.e. out of memory), the affinity is changed to the set + of all 32-bit-capable CPUs of which the kernel is aware. + +A subsequent ``execve(2)`` of a 64-bit program by the 32-bit task will +invalidate the affinity mask saved in (1) and attempt to restore the CPU +affinity of the task using the saved mask if it was previously valid. +This restoration may fail due to intervening changes to the deadline +policy or cpuset hierarchy, in which case the ``execve(2)`` continues +with the affinity unchanged. + +Calls to ``sched_setaffinity(2)`` for a 32-bit task will consider only +the 32-bit-capable CPUs of the requested affinity mask. On success, the +affinity for the task is updated and any saved mask from a prior +``execve(2)`` is invalidated. + +``SCHED_DEADLINE`` +------------------ + +Explicit admission of a 32-bit deadline task to the default root domain +(e.g. by calling ``sched_setattr(2)``) is rejected on an asymmetric +32-bit system unless admission control is disabled by writing -1 to +``/proc/sys/kernel/sched_rt_runtime_us``. + +``execve(2)`` of a 32-bit program from a 64-bit deadline task will +return ``-ENOEXEC`` if the root domain for the task contains any +64-bit-only CPUs and admission control is enabled. Concurrent offlining +of 32-bit-capable CPUs may still necessitate the procedure described in +`execve(2)`_, in which case step (1) is skipped and a warning is +emitted on the console. + +**Note:** It is recommended that a set of 32-bit-capable CPUs are placed +into a separate root domain if ``SCHED_DEADLINE`` is to be used with +32-bit tasks on an asymmetric system. Failure to do so is likely to +result in missed deadlines. + +Cpusets +------- + +The affinity of a 32-bit task on an asymmetric system may include CPUs +that are not explicitly allowed by the cpuset to which it is attached. +This can occur as a result of the following two situations: + + - A 64-bit task attached to a cpuset which allows only 64-bit CPUs + executes a 32-bit program. + + - All of the 32-bit-capable CPUs allowed by a cpuset containing a + 32-bit task are offlined. + +In both of these cases, the new affinity is calculated according to step +(2) of the process described in `execve(2)`_ and the cpuset hierarchy is +unchanged irrespective of the cgroup version. + +CPU hotplug +----------- + +On an asymmetric system, the first detected 32-bit-capable CPU is +prevented from being offlined by userspace and any such attempt will +return ``-EPERM``. Note that suspend is still permitted even if the +primary CPU (i.e. CPU 0) is 64-bit-only. + +KVM +--- + +Although KVM will not advertise 32-bit EL0 support to any vCPUs on an +asymmetric system, a broken guest at EL1 could still attempt to execute +32-bit code at EL0. In this case, an exit from a vCPU thread in 32-bit +mode will return to host userspace with an ``exit_reason`` of +``KVM_EXIT_FAIL_ENTRY`` and will remain non-runnable until successfully +re-initialised by a subsequent ``KVM_ARM_VCPU_INIT`` operation. diff --git a/Documentation/arm64/booting.rst b/Documentation/arm64/booting.rst index 5d78a6f5b0ae..8c324ad638de 100644 --- a/Documentation/arm64/booting.rst +++ b/Documentation/arm64/booting.rst @@ -10,9 +10,9 @@ This document is based on the ARM booting document by Russell King and is relevant to all public releases of the AArch64 Linux kernel. The AArch64 exception model is made up of a number of exception levels -(EL0 - EL3), with EL0 and EL1 having a secure and a non-secure -counterpart. EL2 is the hypervisor level and exists only in non-secure -mode. EL3 is the highest priority level and exists only in secure mode. +(EL0 - EL3), with EL0, EL1 and EL2 having a secure and a non-secure +counterpart. EL2 is the hypervisor level, EL3 is the highest priority +level and exists only in secure mode. Both are architecturally optional. For the purposes of this document, we will use the term `boot loader` simply to define all software that executes on the CPU(s) before control @@ -167,13 +167,16 @@ Before jumping into the kernel, the following conditions must be met: All forms of interrupts must be masked in PSTATE.DAIF (Debug, SError, IRQ and FIQ). - The CPU must be in either EL2 (RECOMMENDED in order to have access to - the virtualisation extensions) or non-secure EL1. + The CPU must be in non-secure state, either in EL2 (RECOMMENDED in order + to have access to the virtualisation extensions), or in EL1. - Caches, MMUs The MMU must be off. - Instruction cache may be on or off. + + The instruction cache may be on or off, and must not hold any stale + entries corresponding to the loaded kernel image. + The address range corresponding to the loaded kernel image must be cleaned to the PoC. In the presence of a system cache or other coherent masters with caches enabled, this will typically require @@ -199,14 +202,22 @@ Before jumping into the kernel, the following conditions must be met: - System registers - All writable architected system registers at the exception level where - the kernel image will be entered must be initialised by software at a - higher exception level to prevent execution in an UNKNOWN state. + All writable architected system registers at or below the exception + level where the kernel image will be entered must be initialised by + software at a higher exception level to prevent execution in an UNKNOWN + state. + + For all systems: + - If EL3 is present: + + - SCR_EL3.FIQ must have the same value across all CPUs the kernel is + executing on. + - The value of SCR_EL3.FIQ must be the same as the one present at boot + time whenever the kernel is executing. + + - If EL3 is present and the kernel is entered at EL2: - - SCR_EL3.FIQ must have the same value across all CPUs the kernel is - executing on. - - The value of SCR_EL3.FIQ must be the same as the one present at boot - time whenever the kernel is executing. + - SCR_EL3.HCE (bit 8) must be initialised to 0b1. For systems with a GICv3 interrupt controller to be used in v3 mode: - If EL3 is present: @@ -238,6 +249,7 @@ Before jumping into the kernel, the following conditions must be met: - The DT or ACPI tables must describe a GICv2 interrupt controller. For CPUs with pointer authentication functionality: + - If EL3 is present: - SCR_EL3.APK (bit 16) must be initialised to 0b1 @@ -248,9 +260,120 @@ Before jumping into the kernel, the following conditions must be met: - HCR_EL2.APK (bit 40) must be initialised to 0b1 - HCR_EL2.API (bit 41) must be initialised to 0b1 + For CPUs with Activity Monitors Unit v1 (AMUv1) extension present: + + - If EL3 is present: + + - CPTR_EL3.TAM (bit 30) must be initialised to 0b0 + - CPTR_EL2.TAM (bit 30) must be initialised to 0b0 + - AMCNTENSET0_EL0 must be initialised to 0b1111 + - AMCNTENSET1_EL0 must be initialised to a platform specific value + having 0b1 set for the corresponding bit for each of the auxiliary + counters present. + + - If the kernel is entered at EL1: + + - AMCNTENSET0_EL0 must be initialised to 0b1111 + - AMCNTENSET1_EL0 must be initialised to a platform specific value + having 0b1 set for the corresponding bit for each of the auxiliary + counters present. + + For CPUs with the Fine Grained Traps (FEAT_FGT) extension present: + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.FGTEn (bit 27) must be initialised to 0b1. + + For CPUs with support for HCRX_EL2 (FEAT_HCX) present: + + - If EL3 is present and the kernel is entered at EL2: + + - SCR_EL3.HXEn (bit 38) must be initialised to 0b1. + + For CPUs with Advanced SIMD and floating point support: + + - If EL3 is present: + + - CPTR_EL3.TFP (bit 10) must be initialised to 0b0. + + - If EL2 is present and the kernel is entered at EL1: + + - CPTR_EL2.TFP (bit 10) must be initialised to 0b0. + + For CPUs with the Scalable Vector Extension (FEAT_SVE) present: + + - if EL3 is present: + + - CPTR_EL3.EZ (bit 8) must be initialised to 0b1. + + - ZCR_EL3.LEN must be initialised to the same value for all CPUs the + kernel is executed on. + + - If the kernel is entered at EL1 and EL2 is present: + + - CPTR_EL2.TZ (bit 8) must be initialised to 0b0. + + - CPTR_EL2.ZEN (bits 17:16) must be initialised to 0b11. + + - ZCR_EL2.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + For CPUs with the Scalable Matrix Extension (FEAT_SME): + + - If EL3 is present: + + - CPTR_EL3.ESM (bit 12) must be initialised to 0b1. + + - SCR_EL3.EnTP2 (bit 41) must be initialised to 0b1. + + - SMCR_EL3.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + - If the kernel is entered at EL1 and EL2 is present: + + - CPTR_EL2.TSM (bit 12) must be initialised to 0b0. + + - CPTR_EL2.SMEN (bits 25:24) must be initialised to 0b11. + + - SCTLR_EL2.EnTP2 (bit 60) must be initialised to 0b1. + + - SMCR_EL2.LEN must be initialised to the same value for all CPUs the + kernel will execute on. + + - HWFGRTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01. + + - HWFGWTR_EL2.nTPIDR2_EL0 (bit 55) must be initialised to 0b01. + + - HWFGRTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01. + + - HWFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01. + + For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64) + + - If EL3 is present: + + - SMCR_EL3.FA64 (bit 31) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - SMCR_EL2.FA64 (bit 31) must be initialised to 0b1. + + For CPUs with the Memory Tagging Extension feature (FEAT_MTE2): + + - If EL3 is present: + + - SCR_EL3.ATA (bit 26) must be initialised to 0b1. + + - If the kernel is entered at EL1 and EL2 is present: + + - HCR_EL2.ATA (bit 56) must be initialised to 0b1. + The requirements described above for CPU mode, caches, MMUs, architected timers, coherency and system registers apply to all CPUs. All CPUs must -enter the kernel in the same exception level. +enter the kernel in the same exception level. Where the values documented +disable traps it is permissible for these traps to be enabled so long as +those traps are handled transparently by higher exception levels as though +the values documented were set. The boot loader is expected to enter the kernel on each CPU in the following manner: @@ -290,7 +413,8 @@ following manner: Documentation/devicetree/bindings/arm/psci.yaml. - Secondary CPU general-purpose register settings - x0 = 0 (reserved for future use) - x1 = 0 (reserved for future use) - x2 = 0 (reserved for future use) - x3 = 0 (reserved for future use) + + - x0 = 0 (reserved for future use) + - x1 = 0 (reserved for future use) + - x2 = 0 (reserved for future use) + - x3 = 0 (reserved for future use) diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst index 41937a8091aa..c7adc7897df6 100644 --- a/Documentation/arm64/cpu-feature-registers.rst +++ b/Documentation/arm64/cpu-feature-registers.rst @@ -92,7 +92,7 @@ operation if the source belongs to the supported system register space. The infrastructure emulates only the following system register space:: - Op0=3, Op1=0, CRn=0, CRm=0,4,5,6,7 + Op0=3, Op1=0, CRn=0, CRm=0,2,3,4,5,6,7 (See Table C5-6 'System instruction encodings for non-Debug System register accesses' in ARMv8 ARM DDI 0487A.h, for the list of @@ -171,14 +171,20 @@ infrastructure: 3) ID_AA64PFR1_EL1 - Processor Feature Register 1 + +------------------------------+---------+---------+ | Name | bits | visible | +------------------------------+---------+---------+ + | MTE | [11-8] | y | + +------------------------------+---------+---------+ | SSBS | [7-4] | y | +------------------------------+---------+---------+ + | BT | [3-0] | y | + +------------------------------+---------+---------+ 4) MIDR_EL1 - Main ID Register + +------------------------------+---------+---------+ | Name | bits | visible | +------------------------------+---------+---------+ @@ -229,7 +235,15 @@ infrastructure: | DPB | [3-0] | y | +------------------------------+---------+---------+ - 6) ID_AA64MMFR2_EL1 - Memory model feature register 2 + 6) ID_AA64MMFR0_EL1 - Memory model feature register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | ECV | [63-60] | y | + +------------------------------+---------+---------+ + + 7) ID_AA64MMFR2_EL1 - Memory model feature register 2 +------------------------------+---------+---------+ | Name | bits | visible | @@ -237,7 +251,7 @@ infrastructure: | AT | [35-32] | y | +------------------------------+---------+---------+ - 7) ID_AA64ZFR0_EL1 - SVE feature ID register 0 + 8) ID_AA64ZFR0_EL1 - SVE feature ID register 0 +------------------------------+---------+---------+ | Name | bits | visible | @@ -261,6 +275,61 @@ infrastructure: | SVEVer | [3-0] | y | +------------------------------+---------+---------+ + 8) ID_AA64MMFR1_EL1 - Memory model feature register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | AFP | [47-44] | y | + +------------------------------+---------+---------+ + + 9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | RPRES | [7-4] | y | + +------------------------------+---------+---------+ + | WFXT | [3-0] | y | + +------------------------------+---------+---------+ + + 10) MVFR0_EL1 - AArch32 Media and VFP Feature Register 0 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | FPDP | [11-8] | y | + +------------------------------+---------+---------+ + + 11) MVFR1_EL1 - AArch32 Media and VFP Feature Register 1 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | SIMDFMAC | [31-28] | y | + +------------------------------+---------+---------+ + | SIMDSP | [19-16] | y | + +------------------------------+---------+---------+ + | SIMDInt | [15-12] | y | + +------------------------------+---------+---------+ + | SIMDLS | [11-8] | y | + +------------------------------+---------+---------+ + + 12) ID_ISAR5_EL1 - AArch32 Instruction Set Attribute Register 5 + + +------------------------------+---------+---------+ + | Name | bits | visible | + +------------------------------+---------+---------+ + | CRC32 | [19-16] | y | + +------------------------------+---------+---------+ + | SHA2 | [15-12] | y | + +------------------------------+---------+---------+ + | SHA1 | [11-8] | y | + +------------------------------+---------+---------+ + | AES | [7-4] | y | + +------------------------------+---------+---------+ + + Appendix I: Example ------------------- diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst index 7dfb97dfe416..bb34287c8e01 100644 --- a/Documentation/arm64/elf_hwcaps.rst +++ b/Documentation/arm64/elf_hwcaps.rst @@ -1,3 +1,5 @@ +.. _elf_hwcaps_index: + ================ ARM64 ELF hwcaps ================ @@ -72,7 +74,7 @@ HWCAP_ASIMD HWCAP_EVTSTRM The generic timer is configured to generate events at a frequency of - approximately 100KHz. + approximately 10KHz. HWCAP_AES Functionality implied by ID_AA64ISAR0_EL1.AES == 0b0001. @@ -169,73 +171,110 @@ HWCAP_PACG Documentation/arm64/pointer-authentication.rst. HWCAP2_DCPODP - Functionality implied by ID_AA64ISAR1_EL1.DPB == 0b0010. HWCAP2_SVE2 - Functionality implied by ID_AA64ZFR0_EL1.SVEVer == 0b0001. HWCAP2_SVEAES - Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0001. HWCAP2_SVEPMULL - Functionality implied by ID_AA64ZFR0_EL1.AES == 0b0010. HWCAP2_SVEBITPERM - Functionality implied by ID_AA64ZFR0_EL1.BitPerm == 0b0001. HWCAP2_SVESHA3 - Functionality implied by ID_AA64ZFR0_EL1.SHA3 == 0b0001. HWCAP2_SVESM4 - Functionality implied by ID_AA64ZFR0_EL1.SM4 == 0b0001. HWCAP2_FLAGM2 - Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010. HWCAP2_FRINT - Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001. HWCAP2_SVEI8MM - Functionality implied by ID_AA64ZFR0_EL1.I8MM == 0b0001. HWCAP2_SVEF32MM - Functionality implied by ID_AA64ZFR0_EL1.F32MM == 0b0001. HWCAP2_SVEF64MM - Functionality implied by ID_AA64ZFR0_EL1.F64MM == 0b0001. HWCAP2_SVEBF16 - Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0001. HWCAP2_I8MM - Functionality implied by ID_AA64ISAR1_EL1.I8MM == 0b0001. HWCAP2_BF16 - Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0001. HWCAP2_DGH - Functionality implied by ID_AA64ISAR1_EL1.DGH == 0b0001. HWCAP2_RNG - Functionality implied by ID_AA64ISAR0_EL1.RNDR == 0b0001. +HWCAP2_BTI + Functionality implied by ID_AA64PFR0_EL1.BT == 0b0001. + +HWCAP2_MTE + Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0010, as described + by Documentation/arm64/memory-tagging-extension.rst. + +HWCAP2_ECV + Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001. + +HWCAP2_AFP + Functionality implied by ID_AA64MFR1_EL1.AFP == 0b0001. + +HWCAP2_RPRES + Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001. + +HWCAP2_MTE3 + Functionality implied by ID_AA64PFR1_EL1.MTE == 0b0011, as described + by Documentation/arm64/memory-tagging-extension.rst. + +HWCAP2_SME + Functionality implied by ID_AA64PFR1_EL1.SME == 0b0001, as described + by Documentation/arm64/sme.rst. + +HWCAP2_SME_I16I64 + Functionality implied by ID_AA64SMFR0_EL1.I16I64 == 0b1111. + +HWCAP2_SME_F64F64 + Functionality implied by ID_AA64SMFR0_EL1.F64F64 == 0b1. + +HWCAP2_SME_I8I32 + Functionality implied by ID_AA64SMFR0_EL1.I8I32 == 0b1111. + +HWCAP2_SME_F16F32 + Functionality implied by ID_AA64SMFR0_EL1.F16F32 == 0b1. + +HWCAP2_SME_B16F32 + Functionality implied by ID_AA64SMFR0_EL1.B16F32 == 0b1. + +HWCAP2_SME_F32F32 + Functionality implied by ID_AA64SMFR0_EL1.F32F32 == 0b1. + +HWCAP2_SME_FA64 + Functionality implied by ID_AA64SMFR0_EL1.FA64 == 0b1. + +HWCAP2_WFXT + Functionality implied by ID_AA64ISAR2_EL1.WFXT == 0b0010. + +HWCAP2_EBF16 + Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0010. + +HWCAP2_SVE_EBF16 + Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0010. + 4. Unused AT_HWCAP bits ----------------------- diff --git a/Documentation/arm64/features.rst b/Documentation/arm64/features.rst new file mode 100644 index 000000000000..dfa4cb3cd3ef --- /dev/null +++ b/Documentation/arm64/features.rst @@ -0,0 +1,3 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. kernel-feat:: $srctree/Documentation/features arm64 diff --git a/Documentation/arm64/hugetlbpage.rst b/Documentation/arm64/hugetlbpage.rst index b44f939e5210..a110124c11e3 100644 --- a/Documentation/arm64/hugetlbpage.rst +++ b/Documentation/arm64/hugetlbpage.rst @@ -1,3 +1,5 @@ +.. _hugetlbpage_index: + ==================== HugeTLBpage on ARM64 ==================== diff --git a/Documentation/arm64/index.rst b/Documentation/arm64/index.rst index 5c0c69dc58aa..ae21f8118830 100644 --- a/Documentation/arm64/index.rst +++ b/Documentation/arm64/index.rst @@ -1,3 +1,5 @@ +.. _arm64_index: + ================== ARM64 Architecture ================== @@ -6,19 +8,26 @@ ARM64 Architecture :maxdepth: 1 acpi_object_usage + amu arm-acpi + asymmetric-32bit booting cpu-feature-registers elf_hwcaps hugetlbpage legacy_instructions memory + memory-tagging-extension + perf pointer-authentication silicon-errata + sme sve tagged-address-abi tagged-pointers + features + .. only:: subproject and html Indices diff --git a/Documentation/arm64/kasan-offsets.sh b/Documentation/arm64/kasan-offsets.sh index 2b7a021db363..2dc5f9e18039 100644 --- a/Documentation/arm64/kasan-offsets.sh +++ b/Documentation/arm64/kasan-offsets.sh @@ -1,12 +1,11 @@ #!/bin/sh # Print out the KASAN_SHADOW_OFFSETS required to place the KASAN SHADOW -# start address at the mid-point of the kernel VA space +# start address at the top of the linear region print_kasan_offset () { printf "%02d\t" $1 printf "0x%08x00000000\n" $(( (0xffffffff & (-1 << ($1 - 1 - 32))) \ - + (1 << ($1 - 32 - $2)) \ - (1 << (64 - 32 - $2)) )) } diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst new file mode 100644 index 000000000000..dbae47bba25e --- /dev/null +++ b/Documentation/arm64/memory-tagging-extension.rst @@ -0,0 +1,375 @@ +=============================================== +Memory Tagging Extension (MTE) in AArch64 Linux +=============================================== + +Authors: Vincenzo Frascino <vincenzo.frascino@arm.com> + Catalin Marinas <catalin.marinas@arm.com> + +Date: 2020-02-25 + +This document describes the provision of the Memory Tagging Extension +functionality in AArch64 Linux. + +Introduction +============ + +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE) +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI +(Top Byte Ignore) feature and allows software to access a 4-bit +allocation tag for each 16-byte granule in the physical address space. +Such memory range must be mapped with the Normal-Tagged memory +attribute. A logical tag is derived from bits 59-56 of the virtual +address used for the memory access. A CPU with MTE enabled will compare +the logical tag against the allocation tag and potentially raise an +exception on mismatch, subject to system registers configuration. + +Userspace Support +================= + +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is +supported by the hardware, the kernel advertises the feature to +userspace via ``HWCAP2_MTE``. + +PROT_MTE +-------- + +To access the allocation tags, a user process must enable the Tagged +memory attribute on an address range using a new ``prot`` flag for +``mmap()`` and ``mprotect()``: + +``PROT_MTE`` - Pages allow access to the MTE allocation tags. + +The allocation tag is set to 0 when such pages are first mapped in the +user address space and preserved on copy-on-write. ``MAP_SHARED`` is +supported and the allocation tags can be shared between processes. + +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other +types of mapping will result in ``-EINVAL`` returned by these system +calls. + +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot +be cleared by ``mprotect()``. + +**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and +``MADV_FREE`` may have the allocation tags cleared (set to 0) at any +point after the system call. + +Tag Check Faults +---------------- + +When ``PROT_MTE`` is enabled on an address range and a mismatch between +the logical and allocation tags occurs on access, there are three +configurable behaviours: + +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the + tag check fault. + +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with + ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The + memory access is not performed. If ``SIGSEGV`` is ignored or blocked + by the offending thread, the containing process is terminated with a + ``coredump``. + +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending + thread, asynchronously following one or multiple tag check faults, + with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting + address is unknown). + +- *Asymmetric* - Reads are handled as for synchronous mode while writes + are handled as for asynchronous mode. + +The user can select the above modes, per thread, using the +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags`` +contains any number of the following values in the ``PR_MTE_TCF_MASK`` +bit-field: + +- ``PR_MTE_TCF_NONE`` Â - *Ignore* tag check faults + (ignored if combined with other options) +- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode + +If no modes are specified, tag check faults are ignored. If a single +mode is specified, the program will run in that mode. If multiple +modes are specified, the mode is selected as described in the "Per-CPU +preferred tag checking modes" section below. + +The current tag check fault configuration can be read using the +``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call. If +multiple modes were requested then all will be reported. + +Tag checking can also be disabled for a user thread by setting the +``PSTATE.TCO`` bit with ``MSR TCO, #1``. + +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``, +irrespective of the interrupted context. ``PSTATE.TCO`` is restored on +``sigreturn()``. + +**Note**: There are no *match-all* logical tags available for user +applications. + +**Note**: Kernel accesses to the user address space (e.g. ``read()`` +system call) are not checked if the user thread tag checking mode is +``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is +``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user +address accesses, however it cannot always guarantee it. Kernel accesses +to user addresses are always performed with an effective ``PSTATE.TCO`` +value of zero, regardless of the user configuration. + +Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions +----------------------------------------------------------------- + +The architecture allows excluding certain tags to be randomly generated +via the ``GCR_EL1.Exclude`` register bit-field. By default, Linux +excludes all tags other than 0. A user thread can enable specific tags +in the randomly generated set using the ``prctl(PR_SET_TAGGED_ADDR_CTRL, +flags, 0, 0, 0)`` system call where ``flags`` contains the tags bitmap +in the ``PR_MTE_TAG_MASK`` bit-field. + +**Note**: The hardware uses an exclude mask but the ``prctl()`` +interface provides an include mask. An include mask of ``0`` (exclusion +mask ``0xffff``) results in the CPU always generating tag ``0``. + +Per-CPU preferred tag checking mode +----------------------------------- + +On some CPUs the performance of MTE in stricter tag checking modes +is similar to that of less strict tag checking modes. This makes it +worthwhile to enable stricter checks on those CPUs when a less strict +checking mode is requested, in order to gain the error detection +benefits of the stricter checks without the performance downsides. To +support this scenario, a privileged user may configure a stricter +tag checking mode as the CPU's preferred tag checking mode. + +The preferred tag checking mode for each CPU is controlled by +``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a +privileged user may write the value ``async``, ``sync`` or ``asymm``. The +default preferred mode for each CPU is ``async``. + +To allow a program to potentially run in the CPU's preferred tag +checking mode, the user program may set multiple tag check fault mode +bits in the ``flags`` argument to the ``prctl(PR_SET_TAGGED_ADDR_CTRL, +flags, 0, 0, 0)`` system call. If both synchronous and asynchronous +modes are requested then asymmetric mode may also be selected by the +kernel. If the CPU's preferred tag checking mode is in the task's set +of provided tag checking modes, that mode will be selected. Otherwise, +one of the modes in the task's mode will be selected by the kernel +from the task's mode set using the preference order: + + 1. Asynchronous + 2. Asymmetric + 3. Synchronous + +Note that there is no way for userspace to request multiple modes and +also disable asymmetric mode. + +Initial process state +--------------------- + +On ``execve()``, the new process has the following configuration: + +- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled) +- No tag checking modes are selected (tag check faults ignored) +- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded) +- ``PSTATE.TCO`` set to 0 +- ``PROT_MTE`` not set on any of the initial memory maps + +On ``fork()``, the new process inherits the parent's configuration and +memory map attributes with the exception of the ``madvise()`` ranges +with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set +to 0). + +The ``ptrace()`` interface +-------------------------- + +``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read +the tags from or set the tags to a tracee's address space. The +``ptrace()`` system call is invoked as ``ptrace(request, pid, addr, +data)`` where: + +- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``. +- ``pid`` - the tracee's PID. +- ``addr`` - address in the tracee's address space. +- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to + a buffer of ``iov_len`` length in the tracer's address space. + +The tags in the tracer's ``iov_base`` buffer are represented as one +4-bit tag per byte and correspond to a 16-byte MTE tag granule in the +tracee's address space. + +**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel +will use the corresponding aligned address. + +``ptrace()`` return value: + +- 0 - tags were copied, the tracer's ``iov_len`` was updated to the + number of tags transferred. This may be smaller than the requested + ``iov_len`` if the requested address range in the tracee's or the + tracer's space cannot be accessed or does not have valid tags. +- ``-EPERM`` - the specified process cannot be traced. +- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid + address) and no tags copied. ``iov_len`` not updated. +- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec`` + or ``iov_base`` buffer) and no tags copied. ``iov_len`` not updated. +- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never + mapped with the ``PROT_MTE`` flag). ``iov_len`` not updated. + +**Note**: There are no transient errors for the requests above, so user +programs should not retry in case of a non-zero system call return. + +``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr == +``NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged +address ABI control and MTE configuration of a process as per the +``prctl()`` options described in +Documentation/arm64/tagged-address-abi.rst and above. The corresponding +``regset`` is 1 element of 8 bytes (``sizeof(long))``). + +Core dump support +----------------- + +The allocation tags for user memory mapped with ``PROT_MTE`` are dumped +in the core file as additional ``PT_AARCH64_MEMTAG_MTE`` segments. The +program header for such segment is defined as: + +:``p_type``: ``PT_AARCH64_MEMTAG_MTE`` +:``p_flags``: 0 +:``p_offset``: segment file offset +:``p_vaddr``: segment virtual address, same as the corresponding + ``PT_LOAD`` segment +:``p_paddr``: 0 +:``p_filesz``: segment size in file, calculated as ``p_mem_sz / 32`` + (two 4-bit tags cover 32 bytes of memory) +:``p_memsz``: segment size in memory, same as the corresponding + ``PT_LOAD`` segment +:``p_align``: 0 + +The tags are stored in the core file at ``p_offset`` as two 4-bit tags +in a byte. With the tag granule of 16 bytes, a 4K page requires 128 +bytes in the core file. + +Example of correct usage +======================== + +*MTE Example code* + +.. code-block:: c + + /* + * To be compiled with -march=armv8.5-a+memtag + */ + #include <errno.h> + #include <stdint.h> + #include <stdio.h> + #include <stdlib.h> + #include <unistd.h> + #include <sys/auxv.h> + #include <sys/mman.h> + #include <sys/prctl.h> + + /* + * From arch/arm64/include/uapi/asm/hwcap.h + */ + #define HWCAP2_MTE (1 << 18) + + /* + * From arch/arm64/include/uapi/asm/mman.h + */ + #define PROT_MTE 0x20 + + /* + * From include/uapi/linux/prctl.h + */ + #define PR_SET_TAGGED_ADDR_CTRL 55 + #define PR_GET_TAGGED_ADDR_CTRL 56 + # define PR_TAGGED_ADDR_ENABLE (1UL << 0) + # define PR_MTE_TCF_SHIFT 1 + # define PR_MTE_TCF_NONE (0UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_SYNC (1UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_ASYNC (2UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TCF_MASK (3UL << PR_MTE_TCF_SHIFT) + # define PR_MTE_TAG_SHIFT 3 + # define PR_MTE_TAG_MASK (0xffffUL << PR_MTE_TAG_SHIFT) + + /* + * Insert a random logical tag into the given pointer. + */ + #define insert_random_tag(ptr) ({ \ + uint64_t __val; \ + asm("irg %0, %1" : "=r" (__val) : "r" (ptr)); \ + __val; \ + }) + + /* + * Set the allocation tag on the destination address. + */ + #define set_tag(tagged_addr) do { \ + asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \ + } while (0) + + int main() + { + unsigned char *a; + unsigned long page_sz = sysconf(_SC_PAGESIZE); + unsigned long hwcap2 = getauxval(AT_HWCAP2); + + /* check if MTE is present */ + if (!(hwcap2 & HWCAP2_MTE)) + return EXIT_FAILURE; + + /* + * Enable the tagged address ABI, synchronous or asynchronous MTE + * tag check faults (based on per-CPU preference) and allow all + * non-zero tags in the randomly generated set. + */ + if (prctl(PR_SET_TAGGED_ADDR_CTRL, + PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC | + (0xfffe << PR_MTE_TAG_SHIFT), + 0, 0, 0)) { + perror("prctl() failed"); + return EXIT_FAILURE; + } + + a = mmap(0, page_sz, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (a == MAP_FAILED) { + perror("mmap() failed"); + return EXIT_FAILURE; + } + + /* + * Enable MTE on the above anonymous mmap. The flag could be passed + * directly to mmap() and skip this step. + */ + if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) { + perror("mprotect() failed"); + return EXIT_FAILURE; + } + + /* access with the default tag (0) */ + a[0] = 1; + a[1] = 2; + + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); + + /* set the logical and allocation tags */ + a = (unsigned char *)insert_random_tag(a); + set_tag(a); + + printf("%p\n", a); + + /* non-zero tag access */ + a[0] = 3; + printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]); + + /* + * If MTE is enabled correctly the next instruction will generate an + * exception. + */ + printf("Expecting SIGSEGV...\n"); + a[16] = 0xdd; + + /* this should not be printed in the PR_MTE_TCF_SYNC mode */ + printf("...haven't got one\n"); + + return EXIT_FAILURE; + } diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst index cf03b3290800..2a641ba7be3b 100644 --- a/Documentation/arm64/memory.rst +++ b/Documentation/arm64/memory.rst @@ -32,17 +32,15 @@ AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: ----------------------------------------------------------------------- 0000000000000000 0000ffffffffffff 256TB user ffff000000000000 ffff7fffffffffff 128TB kernel logical memory map - ffff800000000000 ffff9fffffffffff 32TB kasan shadow region - ffffa00000000000 ffffa00007ffffff 128MB bpf jit region - ffffa00008000000 ffffa0000fffffff 128MB modules - ffffa00010000000 fffffdffbffeffff ~93TB vmalloc - fffffdffbfff0000 fffffdfffe5f8fff ~998MB [guard region] - fffffdfffe5f9000 fffffdfffe9fffff 4124KB fixed mappings - fffffdfffea00000 fffffdfffebfffff 2MB [guard region] - fffffdfffec00000 fffffdffffbfffff 16MB PCI I/O space - fffffdffffc00000 fffffdffffdfffff 2MB [guard region] - fffffdffffe00000 ffffffffffdfffff 2TB vmemmap - ffffffffffe00000 ffffffffffffffff 2MB [guard region] + [ffff600000000000 ffff7fffffffffff] 32TB [kasan shadow region] + ffff800000000000 ffff800007ffffff 128MB modules + ffff800008000000 fffffbffefffffff 124TB vmalloc + fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) + fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] + fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space + fffffbffff800000 fffffbffffffffff 8MB [guard region] + fffffc0000000000 fffffdffffffffff 2TB vmemmap + fffffe0000000000 ffffffffffffffff 2TB [guard region] AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: @@ -50,19 +48,16 @@ AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support): Start End Size Use ----------------------------------------------------------------------- 0000000000000000 000fffffffffffff 4PB user - fff0000000000000 fff7ffffffffffff 2PB kernel logical memory map - fff8000000000000 fffd9fffffffffff 1440TB [gap] - fffda00000000000 ffff9fffffffffff 512TB kasan shadow region - ffffa00000000000 ffffa00007ffffff 128MB bpf jit region - ffffa00008000000 ffffa0000fffffff 128MB modules - ffffa00010000000 fffff81ffffeffff ~88TB vmalloc - fffff81fffff0000 fffffc1ffe58ffff ~3TB [guard region] - fffffc1ffe590000 fffffc1ffe9fffff 4544KB fixed mappings - fffffc1ffea00000 fffffc1ffebfffff 2MB [guard region] - fffffc1ffec00000 fffffc1fffbfffff 16MB PCI I/O space - fffffc1fffc00000 fffffc1fffdfffff 2MB [guard region] - fffffc1fffe00000 ffffffffffdfffff 3968GB vmemmap - ffffffffffe00000 ffffffffffffffff 2MB [guard region] + fff0000000000000 ffff7fffffffffff ~4PB kernel logical memory map + [fffd800000000000 ffff7fffffffffff] 512TB [kasan shadow region] + ffff800000000000 ffff800007ffffff 128MB modules + ffff800008000000 fffffbffefffffff 124TB vmalloc + fffffbfff0000000 fffffbfffdffffff 224MB fixed mappings (top down) + fffffbfffe000000 fffffbfffe7fffff 8MB [guard region] + fffffbfffe800000 fffffbffff7fffff 16MB PCI I/O space + fffffbffff800000 fffffbffffffffff 8MB [guard region] + fffffc0000000000 ffffffdfffffffff ~4TB vmemmap + ffffffe000000000 ffffffffffffffff 128GB [guard region] Translation table lookup with 4KB pages:: @@ -100,7 +95,7 @@ hypervisor maps kernel pages in EL2 at a fixed (and potentially random) offset from the linear mapping. See the kern_hyp_va macro and kvm_update_va_mask function for more details. MMIO devices such as GICv2 gets mapped next to the HYP idmap page, as do vectors when -ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs. +ARM64_SPECTRE_V3A is enabled for particular CPUs. When using KVM with the Virtualization Host Extensions, no additional mappings are created, since the host kernel runs directly in EL2. diff --git a/Documentation/arm64/perf.rst b/Documentation/arm64/perf.rst new file mode 100644 index 000000000000..1f87b57c2332 --- /dev/null +++ b/Documentation/arm64/perf.rst @@ -0,0 +1,166 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. _perf_index: + +==== +Perf +==== + +Perf Event Attributes +===================== + +:Author: Andrew Murray <andrew.murray@arm.com> +:Date: 2019-03-06 + +exclude_user +------------ + +This attribute excludes userspace. + +Userspace always runs at EL0 and thus this attribute will exclude EL0. + + +exclude_kernel +-------------- + +This attribute excludes the kernel. + +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run +at EL1. + +For the host this attribute will exclude EL1 and additionally EL2 on a VHE +system. + +For the guest this attribute will exclude EL1. Please note that EL2 is +never counted within a guest. + + +exclude_hv +---------- + +This attribute excludes the hypervisor. + +For a VHE host this attribute is ignored as we consider the host kernel to +be the hypervisor. + +For a non-VHE host this attribute will exclude EL2 as we consider the +hypervisor to be any code that runs at EL2 which is predominantly used for +guest/host transitions. + +For the guest this attribute has no effect. Please note that EL2 is +never counted within a guest. + + +exclude_host / exclude_guest +---------------------------- + +These attributes exclude the KVM host and guest, respectively. + +The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE +kernel or non-VHE hypervisor). + +The KVM guest may run at EL0 (userspace) and EL1 (kernel). + +Due to the overlapping exception levels between host and guests we cannot +exclusively rely on the PMU's hardware exception filtering - therefore we +must enable/disable counting on the entry and exit to the guest. This is +performed differently on VHE and non-VHE systems. + +For non-VHE systems we exclude EL2 for exclude_host - upon entering and +exiting the guest we disable/enable the event as appropriate based on the +exclude_host and exclude_guest attributes. + +For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2 +for exclude_host. Upon entering and exiting the guest we modify the event +to include/exclude EL0 as appropriate based on the exclude_host and +exclude_guest attributes. + +The statements above also apply when these attributes are used within a +non-VHE guest however please note that EL2 is never counted within a guest. + + +Accuracy +-------- + +On non-VHE hosts we enable/disable counters on the entry/exit of host/guest +transition at EL2 - however there is a period of time between +enabling/disabling the counters and entering/exiting the guest. We are +able to eliminate counters counting host events on the boundaries of guest +entry/exit when counting guest events by filtering out EL2 for +exclude_host. However when using !exclude_hv there is a small blackout +window at the guest entry/exit where host events are not captured. + +On VHE systems there are no blackout windows. + +Perf Userspace PMU Hardware Counter Access +========================================== + +Overview +-------- +The perf userspace tool relies on the PMU to monitor events. It offers an +abstraction layer over the hardware counters since the underlying +implementation is cpu-dependent. +Arm64 allows userspace tools to have access to the registers storing the +hardware counters' values directly. + +This targets specifically self-monitoring tasks in order to reduce the overhead +by directly accessing the registers without having to go through the kernel. + +How-to +------ +The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu +registers is enabled and that the userspace has access to the relevant +information in order to use them. + +In order to have access to the hardware counters, the global sysctl +kernel/perf_user_access must first be enabled: + +.. code-block:: sh + + echo 1 > /proc/sys/kernel/perf_user_access + +It is necessary to open the event using the perf tool interface with config1:1 +attr bit set: the sys_perf_event_open syscall returns a fd which can +subsequently be used with the mmap syscall in order to retrieve a page of memory +containing information about the event. The PMU driver uses this page to expose +to the user the hardware counter's index and other necessary data. Using this +index enables the user to access the PMU registers using the `mrs` instruction. +Access to the PMU registers is only valid while the sequence lock is unchanged. +In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is +changed. + +The userspace access is supported in libperf using the perf_evsel__mmap() +and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for +an example. + +About heterogeneous systems +--------------------------- +On heterogeneous systems such as big.LITTLE, userspace PMU counter access can +only be enabled when the tasks are pinned to a homogeneous subset of cores and +the corresponding PMU instance is opened by specifying the 'type' attribute. +The use of generic event types is not supported in this case. + +Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It +can be run using the perf tool to check that the access to the registers works +correctly from userspace: + +.. code-block:: sh + + perf test -v user + +About chained events and counter sizes +-------------------------------------- +The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1) +counter along with userspace access. The sys_perf_event_open syscall will fail +if a 64-bit counter is requested and the hardware doesn't support 64-bit +counters. Chained events are not supported in conjunction with userspace counter +access. If a 32-bit counter is requested on hardware with 64-bit counters, then +userspace must treat the upper 32-bits read from the counter as UNKNOWN. The +'pmc_width' field in the user page will indicate the valid width of the counter +and should be used to mask the upper bits as needed. + +.. Links +.. _tools/perf/arch/arm64/tests/user-events.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c +.. _tools/lib/perf/tests/test-evsel.c: + https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c diff --git a/Documentation/arm64/perf.txt b/Documentation/arm64/perf.txt deleted file mode 100644 index 0d6a7d87d49e..000000000000 --- a/Documentation/arm64/perf.txt +++ /dev/null @@ -1,85 +0,0 @@ -Perf Event Attributes -===================== - -Author: Andrew Murray <andrew.murray@arm.com> -Date: 2019-03-06 - -exclude_user ------------- - -This attribute excludes userspace. - -Userspace always runs at EL0 and thus this attribute will exclude EL0. - - -exclude_kernel --------------- - -This attribute excludes the kernel. - -The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run -at EL1. - -For the host this attribute will exclude EL1 and additionally EL2 on a VHE -system. - -For the guest this attribute will exclude EL1. Please note that EL2 is -never counted within a guest. - - -exclude_hv ----------- - -This attribute excludes the hypervisor. - -For a VHE host this attribute is ignored as we consider the host kernel to -be the hypervisor. - -For a non-VHE host this attribute will exclude EL2 as we consider the -hypervisor to be any code that runs at EL2 which is predominantly used for -guest/host transitions. - -For the guest this attribute has no effect. Please note that EL2 is -never counted within a guest. - - -exclude_host / exclude_guest ----------------------------- - -These attributes exclude the KVM host and guest, respectively. - -The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE -kernel or non-VHE hypervisor). - -The KVM guest may run at EL0 (userspace) and EL1 (kernel). - -Due to the overlapping exception levels between host and guests we cannot -exclusively rely on the PMU's hardware exception filtering - therefore we -must enable/disable counting on the entry and exit to the guest. This is -performed differently on VHE and non-VHE systems. - -For non-VHE systems we exclude EL2 for exclude_host - upon entering and -exiting the guest we disable/enable the event as appropriate based on the -exclude_host and exclude_guest attributes. - -For VHE systems we exclude EL1 for exclude_guest and exclude both EL0,EL2 -for exclude_host. Upon entering and exiting the guest we modify the event -to include/exclude EL0 as appropriate based on the exclude_host and -exclude_guest attributes. - -The statements above also apply when these attributes are used within a -non-VHE guest however please note that EL2 is never counted within a guest. - - -Accuracy --------- - -On non-VHE hosts we enable/disable counters on the entry/exit of host/guest -transition at EL2 - however there is a period of time between -enabling/disabling the counters and entering/exiting the guest. We are -able to eliminate counters counting host events on the boundaries of guest -entry/exit when counting guest events by filtering out EL2 for -exclude_host. However when using !exclude_hv there is a small blackout -window at the guest entry/exit where host events are not captured. - -On VHE systems there are no blackout windows. diff --git a/Documentation/arm64/pointer-authentication.rst b/Documentation/arm64/pointer-authentication.rst index 30b2ab06526b..e5dad2e40aa8 100644 --- a/Documentation/arm64/pointer-authentication.rst +++ b/Documentation/arm64/pointer-authentication.rst @@ -53,11 +53,10 @@ The number of bits that the PAC occupies in a pointer is 55 minus the virtual address size configured by the kernel. For example, with a virtual address size of 48, the PAC is 7 bits wide. -Recent versions of GCC can compile code with APIAKey-based return -address protection when passed the -msign-return-address option. This -uses instructions in the HINT space (unless -march=armv8.3-a or higher -is also passed), and such code can run on systems without the pointer -authentication extension. +When ARM64_PTR_AUTH_KERNEL is selected, the kernel will be compiled +with HINT space pointer authentication instructions protecting +function returns. Kernels built with this option will work on hardware +with or without pointer authentication support. In addition to exec(), keys can also be reinitialized to random values using the PR_PAC_RESET_KEYS prctl. A bitmask of PR_PAC_APIAKEY, @@ -107,3 +106,37 @@ filter out the Pointer Authentication system key registers from KVM_GET/SET_REG_* ioctls and mask those features from cpufeature ID register. Any attempt to use the Pointer Authentication instructions will result in an UNDEFINED exception being injected into the guest. + + +Enabling and disabling keys +--------------------------- + +The prctl PR_PAC_SET_ENABLED_KEYS allows the user program to control which +PAC keys are enabled in a particular task. It takes two arguments, the +first being a bitmask of PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY +and PR_PAC_APDBKEY specifying which keys shall be affected by this prctl, +and the second being a bitmask of the same bits specifying whether the key +should be enabled or disabled. For example:: + + prctl(PR_PAC_SET_ENABLED_KEYS, + PR_PAC_APIAKEY | PR_PAC_APIBKEY | PR_PAC_APDAKEY | PR_PAC_APDBKEY, + PR_PAC_APIBKEY, 0, 0); + +disables all keys except the IB key. + +The main reason why this is useful is to enable a userspace ABI that uses PAC +instructions to sign and authenticate function pointers and other pointers +exposed outside of the function, while still allowing binaries conforming to +the ABI to interoperate with legacy binaries that do not sign or authenticate +pointers. + +The idea is that a dynamic loader or early startup code would issue this +prctl very early after establishing that a process may load legacy binaries, +but before executing any PAC instructions. + +For compatibility with previous kernel versions, processes start up with IA, +IB, DA and DB enabled, and are reset to this state on exec(). Processes created +via fork() and clone() inherit the key enabled state from the calling process. + +It is recommended to avoid disabling the IA key, as this has higher performance +overhead than disabling any of the other keys. diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst index 2c08c628febd..808ade4cc008 100644 --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -52,6 +52,14 @@ stable kernels. | Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2457168 | ARM64_ERRATUM_2457168 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2064142 | ARM64_ERRATUM_2064142 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2038923 | ARM64_ERRATUM_2038923 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #1902691 | ARM64_ERRATUM_1902691 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | @@ -64,6 +72,12 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A55 | #2441007 | ARM64_ERRATUM_2441007 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A57 | #852523 | N/A | @@ -72,13 +86,15 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A57 | #1319537 | ARM64_ERRATUM_1319367 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A57 | #1742098 | ARM64_ERRATUM_1742098 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A72 | #853709 | N/A | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A72 | #1319367 | ARM64_ERRATUM_1319367 | +----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | +| ARM | Cortex-A72 | #1655431 | ARM64_ERRATUM_1742098 | +----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A55 | #1024718 | ARM64_ERRATUM_1024718 | +| ARM | Cortex-A73 | #858921 | ARM64_ERRATUM_858921 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1188873,1418040| ARM64_ERRATUM_1418040 | +----------------+-----------------+-----------------+-----------------------------+ @@ -88,7 +104,25 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A76 | #1463225 | ARM64_ERRATUM_1463225 | +----------------+-----------------+-----------------+-----------------------------+ -| ARM | Cortex-A55 | #1530923 | ARM64_ERRATUM_1530923 | +| ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2051678 | ARM64_ERRATUM_2051678 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2077057 | ARM64_ERRATUM_2077057 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2224489 | ARM64_ERRATUM_2224489 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X2 | #2119858 | ARM64_ERRATUM_2119858 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-X2 | #2224489 | ARM64_ERRATUM_2224489 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | +----------------+-----------------+-----------------+-----------------------------+ @@ -96,6 +130,12 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 | ++----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2253138 | ARM64_ERRATUM_2253138 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | MMU-500 | #841119,826419 | N/A | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ @@ -108,7 +148,7 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX ITS | #23144 | CAVIUM_ERRATUM_23144 | +----------------+-----------------+-----------------+-----------------------------+ -| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | +| Cavium | ThunderX GICv3 | #23154,38545 | CAVIUM_ERRATUM_23154 | +----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX GICv3 | #38539 | N/A | +----------------+-----------------+-----------------+-----------------------------+ @@ -125,6 +165,12 @@ stable kernels. | Cavium | ThunderX2 Core | #219 | CAVIUM_TX2_ERRATUM_219 | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ +| Marvell | ARM-MMU-500 | #582743 | N/A | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ +| NVIDIA | Carmel Core | N/A | NVIDIA_CARMEL_CNP_ERRATUM | ++----------------+-----------------+-----------------+-----------------------------+ ++----------------+-----------------+-----------------+-----------------------------+ | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ @@ -147,6 +193,17 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | Qualcomm Tech. | Falkor v{1,2} | E1041 | QCOM_FALKOR_ERRATUM_1041 | +----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1463225 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1418040 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Silver | N/A | ARM64_ERRATUM_1530923 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Silver | N/A | ARM64_ERRATUM_1024718 | ++----------------+-----------------+-----------------+-----------------------------+ +| Qualcomm Tech. | Kryo4xx Gold | N/A | ARM64_ERRATUM_1286807 | ++----------------+-----------------+-----------------+-----------------------------+ + +----------------+-----------------+-----------------+-----------------------------+ | Fujitsu | A64FX | E#010001 | FUJITSU_ERRATUM_010001 | +----------------+-----------------+-----------------+-----------------------------+ diff --git a/Documentation/arm64/sme.rst b/Documentation/arm64/sme.rst new file mode 100644 index 000000000000..16d2db4c2e2e --- /dev/null +++ b/Documentation/arm64/sme.rst @@ -0,0 +1,431 @@ +=================================================== +Scalable Matrix Extension support for AArch64 Linux +=================================================== + +This document outlines briefly the interface provided to userspace by Linux in +order to support use of the ARM Scalable Matrix Extension (SME). + +This is an outline of the most important features and issues only and not +intended to be exhaustive. It should be read in conjunction with the SVE +documentation in sve.rst which provides details on the Streaming SVE mode +included in SME. + +This document does not aim to describe the SME architecture or programmer's +model. To aid understanding, a minimal description of relevant programmer's +model features for SME is included in Appendix A. + + +1. General +----------- + +* PSTATE.SM, PSTATE.ZA, the streaming mode vector length, the ZA + register state and TPIDR2_EL0 are tracked per thread. + +* The presence of SME is reported to userspace via HWCAP2_SME in the aux vector + AT_HWCAP2 entry. Presence of this flag implies the presence of the SME + instructions and registers, and the Linux-specific system interfaces + described in this document. SME is reported in /proc/cpuinfo as "sme". + +* Support for the execution of SME instructions in userspace can also be + detected by reading the CPU ID register ID_AA64PFR1_EL1 using an MRS + instruction, and checking that the value of the SME field is nonzero. [3] + + It does not guarantee the presence of the system interfaces described in the + following sections: software that needs to verify that those interfaces are + present must check for HWCAP2_SME instead. + +* There are a number of optional SME features, presence of these is reported + through AT_HWCAP2 through: + + HWCAP2_SME_I16I64 + HWCAP2_SME_F64F64 + HWCAP2_SME_I8I32 + HWCAP2_SME_F16F32 + HWCAP2_SME_B16F32 + HWCAP2_SME_F32F32 + HWCAP2_SME_FA64 + + This list may be extended over time as the SME architecture evolves. + + These extensions are also reported via the CPU ID register ID_AA64SMFR0_EL1, + which userspace can read using an MRS instruction. See elf_hwcaps.txt and + cpu-feature-registers.txt for details. + +* Debuggers should restrict themselves to interacting with the target via the + NT_ARM_SVE, NT_ARM_SSVE and NT_ARM_ZA regsets. The recommended way + of detecting support for these regsets is to connect to a target process + first and then attempt a + + ptrace(PTRACE_GETREGSET, pid, NT_ARM_<regset>, &iov). + +* Whenever ZA register values are exchanged in memory between userspace and + the kernel, the register value is encoded in memory as a series of horizontal + vectors from 0 to VL/8-1 stored in the same endianness invariant format as is + used for SVE vectors. + +* On thread creation TPIDR2_EL0 is preserved unless CLONE_SETTLS is specified, + in which case it is set to 0. + +2. Vector lengths +------------------ + +SME defines a second vector length similar to the SVE vector length which is +controls the size of the streaming mode SVE vectors and the ZA matrix array. +The ZA matrix is square with each side having as many bytes as a streaming +mode SVE vector. + + +3. Sharing of streaming and non-streaming mode SVE state +--------------------------------------------------------- + +It is implementation defined which if any parts of the SVE state are shared +between streaming and non-streaming modes. When switching between modes +via software interfaces such as ptrace if no register content is provided as +part of switching no state will be assumed to be shared and everything will +be zeroed. + + +4. System call behaviour +------------------------- + +* On syscall PSTATE.ZA is preserved, if PSTATE.ZA==1 then the contents of the + ZA matrix are preserved. + +* On syscall PSTATE.SM will be cleared and the SVE registers will be handled + as per the standard SVE ABI. + +* Neither the SVE registers nor ZA are used to pass arguments to or receive + results from any syscall. + +* On process creation (eg, clone()) the newly created process will have + PSTATE.SM cleared. + +* All other SME state of a thread, including the currently configured vector + length, the state of the PR_SME_VL_INHERIT flag, and the deferred vector + length (if any), is preserved across all syscalls, subject to the specific + exceptions for execve() described in section 6. + + +5. Signal handling +------------------- + +* Signal handlers are invoked with streaming mode and ZA disabled. + +* A new signal frame record za_context encodes the ZA register contents on + signal delivery. [1] + +* The signal frame record for ZA always contains basic metadata, in particular + the thread's vector length (in za_context.vl). + +* The ZA matrix may or may not be included in the record, depending on + the value of PSTATE.ZA. The registers are present if and only if: + za_context.head.size >= ZA_SIG_CONTEXT_SIZE(sve_vq_from_vl(za_context.vl)) + in which case PSTATE.ZA == 1. + +* If matrix data is present, the remainder of the record has a vl-dependent + size and layout. Macros ZA_SIG_* are defined [1] to facilitate access to + them. + +* The matrix is stored as a series of horizontal vectors in the same format as + is used for SVE vectors. + +* If the ZA context is too big to fit in sigcontext.__reserved[], then extra + space is allocated on the stack, an extra_context record is written in + __reserved[] referencing this space. za_context is then written in the + extra space. Refer to [1] for further details about this mechanism. + + +5. Signal return +----------------- + +When returning from a signal handler: + +* If there is no za_context record in the signal frame, or if the record is + present but contains no register data as described in the previous section, + then ZA is disabled. + +* If za_context is present in the signal frame and contains matrix data then + PSTATE.ZA is set to 1 and ZA is populated with the specified data. + +* The vector length cannot be changed via signal return. If za_context.vl in + the signal frame does not match the current vector length, the signal return + attempt is treated as illegal, resulting in a forced SIGSEGV. + + +6. prctl extensions +-------------------- + +Some new prctl() calls are added to allow programs to manage the SME vector +length: + +prctl(PR_SME_SET_VL, unsigned long arg) + + Sets the vector length of the calling thread and related flags, where + arg == vl | flags. Other threads of the calling process are unaffected. + + vl is the desired vector length, where sve_vl_valid(vl) must be true. + + flags: + + PR_SME_VL_INHERIT + + Inherit the current vector length across execve(). Otherwise, the + vector length is reset to the system default at execve(). (See + Section 9.) + + PR_SME_SET_VL_ONEXEC + + Defer the requested vector length change until the next execve() + performed by this thread. + + The effect is equivalent to implicit execution of the following + call immediately after the next execve() (if any) by the thread: + + prctl(PR_SME_SET_VL, arg & ~PR_SME_SET_VL_ONEXEC) + + This allows launching of a new program with a different vector + length, while avoiding runtime side effects in the caller. + + Without PR_SME_SET_VL_ONEXEC, the requested change takes effect + immediately. + + + Return value: a nonnegative on success, or a negative value on error: + EINVAL: SME not supported, invalid vector length requested, or + invalid flags. + + + On success: + + * Either the calling thread's vector length or the deferred vector length + to be applied at the next execve() by the thread (dependent on whether + PR_SME_SET_VL_ONEXEC is present in arg), is set to the largest value + supported by the system that is less than or equal to vl. If vl == + SVE_VL_MAX, the value set will be the largest value supported by the + system. + + * Any previously outstanding deferred vector length change in the calling + thread is cancelled. + + * The returned value describes the resulting configuration, encoded as for + PR_SME_GET_VL. The vector length reported in this value is the new + current vector length for this thread if PR_SME_SET_VL_ONEXEC was not + present in arg; otherwise, the reported vector length is the deferred + vector length that will be applied at the next execve() by the calling + thread. + + * Changing the vector length causes all of ZA, P0..P15, FFR and all bits of + Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become + unspecified, including both streaming and non-streaming SVE state. + Calling PR_SME_SET_VL with vl equal to the thread's current vector + length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, + does not constitute a change to the vector length for this purpose. + + * Changing the vector length causes PSTATE.ZA and PSTATE.SM to be cleared. + Calling PR_SME_SET_VL with vl equal to the thread's current vector + length, or calling PR_SME_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, + does not constitute a change to the vector length for this purpose. + + +prctl(PR_SME_GET_VL) + + Gets the vector length of the calling thread. + + The following flag may be OR-ed into the result: + + PR_SME_VL_INHERIT + + Vector length will be inherited across execve(). + + There is no way to determine whether there is an outstanding deferred + vector length change (which would only normally be the case between a + fork() or vfork() and the corresponding execve() in typical use). + + To extract the vector length from the result, bitwise and it with + PR_SME_VL_LEN_MASK. + + Return value: a nonnegative value on success, or a negative value on error: + EINVAL: SME not supported. + + +7. ptrace extensions +--------------------- + +* A new regset NT_ARM_SSVE is defined for access to streaming mode SVE + state via PTRACE_GETREGSET and PTRACE_SETREGSET, this is documented in + sve.rst. + +* A new regset NT_ARM_ZA is defined for ZA state for access to ZA state via + PTRACE_GETREGSET and PTRACE_SETREGSET. + + Refer to [2] for definitions. + +The regset data starts with struct user_za_header, containing: + + size + + Size of the complete regset, in bytes. + This depends on vl and possibly on other things in the future. + + If a call to PTRACE_GETREGSET requests less data than the value of + size, the caller can allocate a larger buffer and retry in order to + read the complete regset. + + max_size + + Maximum size in bytes that the regset can grow to for the target + thread. The regset won't grow bigger than this even if the target + thread changes its vector length etc. + + vl + + Target thread's current streaming vector length, in bytes. + + max_vl + + Maximum possible streaming vector length for the target thread. + + flags + + Zero or more of the following flags, which have the same + meaning and behaviour as the corresponding PR_SET_VL_* flags: + + SME_PT_VL_INHERIT + + SME_PT_VL_ONEXEC (SETREGSET only). + +* The effects of changing the vector length and/or flags are equivalent to + those documented for PR_SME_SET_VL. + + The caller must make a further GETREGSET call if it needs to know what VL is + actually set by SETREGSET, unless is it known in advance that the requested + VL is supported. + +* The size and layout of the payload depends on the header fields. The + SME_PT_ZA_*() macros are provided to facilitate access to the data. + +* In either case, for SETREGSET it is permissible to omit the payload, in which + case the vector length and flags are changed and PSTATE.ZA is set to 0 + (along with any consequences of those changes). If a payload is provided + then PSTATE.ZA will be set to 1. + +* For SETREGSET, if the requested VL is not supported, the effect will be the + same as if the payload were omitted, except that an EIO error is reported. + No attempt is made to translate the payload data to the correct layout + for the vector length actually set. It is up to the caller to translate the + payload layout for the actual VL and retry. + +* The effect of writing a partial, incomplete payload is unspecified. + + +8. ELF coredump extensions +--------------------------- + +* NT_ARM_SSVE notes will be added to each coredump for + each thread of the dumped process. The contents will be equivalent to the + data that would have been read if a PTRACE_GETREGSET of the corresponding + type were executed for each thread when the coredump was generated. + +* A NT_ARM_ZA note will be added to each coredump for each thread of the + dumped process. The contents will be equivalent to the data that would have + been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread + when the coredump was generated. + +* The NT_ARM_TLS note will be extended to two registers, the second register + will contain TPIDR2_EL0 on systems that support SME and will be read as + zero with writes ignored otherwise. + +9. System runtime configuration +-------------------------------- + +* To mitigate the ABI impact of expansion of the signal frame, a policy + mechanism is provided for administrators, distro maintainers and developers + to set the default vector length for userspace processes: + +/proc/sys/abi/sme_default_vector_length + + Writing the text representation of an integer to this file sets the system + default vector length to the specified value, unless the value is greater + than the maximum vector length supported by the system in which case the + default vector length is set to that maximum. + + The result can be determined by reopening the file and reading its + contents. + + At boot, the default vector length is initially set to 32 or the maximum + supported vector length, whichever is smaller and supported. This + determines the initial vector length of the init process (PID 1). + + Reading this file returns the current system default vector length. + +* At every execve() call, the new vector length of the new process is set to + the system default vector length, unless + + * PR_SME_VL_INHERIT (or equivalently SME_PT_VL_INHERIT) is set for the + calling thread, or + + * a deferred vector length change is pending, established via the + PR_SME_SET_VL_ONEXEC flag (or SME_PT_VL_ONEXEC). + +* Modifying the system default vector length does not affect the vector length + of any existing process or thread that does not make an execve() call. + + +Appendix A. SME programmer's model (informative) +================================================= + +This section provides a minimal description of the additions made by SME to the +ARMv8-A programmer's model that are relevant to this document. + +Note: This section is for information only and not intended to be complete or +to replace any architectural specification. + +A.1. Registers +--------------- + +In A64 state, SME adds the following: + +* A new mode, streaming mode, in which a subset of the normal FPSIMD and SVE + features are available. When supported EL0 software may enter and leave + streaming mode at any time. + + For best system performance it is strongly encouraged for software to enable + streaming mode only when it is actively being used. + +* A new vector length controlling the size of ZA and the Z registers when in + streaming mode, separately to the vector length used for SVE when not in + streaming mode. There is no requirement that either the currently selected + vector length or the set of vector lengths supported for the two modes in + a given system have any relationship. The streaming mode vector length + is referred to as SVL. + +* A new ZA matrix register. This is a square matrix of SVLxSVL bits. Most + operations on ZA require that streaming mode be enabled but ZA can be + enabled without streaming mode in order to load, save and retain data. + + For best system performance it is strongly encouraged for software to enable + ZA only when it is actively being used. + +* Two new 1 bit fields in PSTATE which may be controlled via the SMSTART and + SMSTOP instructions or by access to the SVCR system register: + + * PSTATE.ZA, if this is 1 then the ZA matrix is accessible and has valid + data while if it is 0 then ZA can not be accessed. When PSTATE.ZA is + changed from 0 to 1 all bits in ZA are cleared. + + * PSTATE.SM, if this is 1 then the PE is in streaming mode. When the value + of PSTATE.SM is changed then it is implementation defined if the subset + of the floating point register bits valid in both modes may be retained. + Any other bits will be cleared. + + +References +========== + +[1] arch/arm64/include/uapi/asm/sigcontext.h + AArch64 Linux signal ABI definitions + +[2] arch/arm64/include/uapi/asm/ptrace.h + AArch64 Linux ptrace ABI definitions + +[3] Documentation/arm64/cpu-feature-registers.rst diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst index 5689c74c8082..f338ee2df46d 100644 --- a/Documentation/arm64/sve.rst +++ b/Documentation/arm64/sve.rst @@ -7,7 +7,9 @@ Author: Dave Martin <Dave.Martin@arm.com> Date: 4 August 2017 This document outlines briefly the interface provided to userspace by Linux in -order to support use of the ARM Scalable Vector Extension (SVE). +order to support use of the ARM Scalable Vector Extension (SVE), including +interactions with Streaming SVE mode added by the Scalable Matrix Extension +(SME). This is an outline of the most important features and issues only and not intended to be exhaustive. @@ -23,6 +25,10 @@ model features for SVE is included in Appendix A. * SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL, are tracked per-thread. +* In streaming mode FFR is not accessible unless HWCAP2_SME_FA64 is present + in the system, when it is not supported and these interfaces are used to + access streaming mode FFR is read and written as zero. + * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector AT_HWCAP entry. Presence of this flag implies the presence of the SVE instructions and registers, and the Linux-specific system interfaces @@ -53,10 +59,19 @@ model features for SVE is included in Appendix A. which userspace can read using an MRS instruction. See elf_hwcaps.txt and cpu-feature-registers.txt for details. +* On hardware that supports the SME extensions, HWCAP2_SME will also be + reported in the AT_HWCAP2 aux vector entry. Among other things SME adds + streaming mode which provides a subset of the SVE feature set using a + separate SME vector length and the same Z/V registers. See sme.rst + for more details. + * Debuggers should restrict themselves to interacting with the target via the NT_ARM_SVE regset. The recommended way of detecting support for this regset is to connect to a target process first and then attempt a - ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). + ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is + present and streaming SVE mode is in use the FPSIMD subset of registers + will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode + in the target. * Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory between userspace and the kernel, the register value is encoded in memory in @@ -96,7 +111,7 @@ the SVE instruction set architecture. * On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR - become unspecified on return from a syscall. + become zero on return from a syscall. * The SVE registers are not used to pass arguments to or receive results from any syscall. @@ -126,6 +141,11 @@ the SVE instruction set architecture. are only present in fpsimd_context. For convenience, the content of V0..V31 is duplicated between sve_context and fpsimd_context. +* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which + if set indicates that the thread is in streaming mode and the vector length + and register data (if present) describe the streaming SVE data and vector + length. + * The signal frame record for SVE always contains basic metadata, in particular the thread's vector length (in sve_context.vl). @@ -170,6 +190,11 @@ When returning from a signal handler: the signal frame does not match the current vector length, the signal return attempt is treated as illegal, resulting in a forced SIGSEGV. +* It is permitted to enter or leave streaming mode by setting or clearing + the SVE_SIG_FLAG_SM flag but applications should take care to ensure that + when doing so sve_context.vl and any register data are appropriate for the + vector length in the new mode. + 6. prctl extensions -------------------- @@ -186,7 +211,7 @@ prctl(PR_SVE_SET_VL, unsigned long arg) flags: - PR_SVE_SET_VL_INHERIT + PR_SVE_VL_INHERIT Inherit the current vector length across execve(). Otherwise, the vector length is reset to the system default at execve(). (See @@ -247,7 +272,7 @@ prctl(PR_SVE_GET_VL) The following flag may be OR-ed into the result: - PR_SVE_SET_VL_INHERIT + PR_SVE_VL_INHERIT Vector length will be inherited across execve(). @@ -255,7 +280,7 @@ prctl(PR_SVE_GET_VL) vector length change (which would only normally be the case between a fork() or vfork() and the corresponding execve() in typical use). - To extract the vector length from the result, and it with + To extract the vector length from the result, bitwise and it with PR_SVE_VL_LEN_MASK. Return value: a nonnegative value on success, or a negative value on error: @@ -265,8 +290,14 @@ prctl(PR_SVE_GET_VL) 7. ptrace extensions --------------------- -* A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and - PTRACE_SETREGSET. +* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with + PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the + streaming mode SVE registers and NT_ARM_SVE describes the + non-streaming mode SVE registers. + + In this description a register set is referred to as being "live" when + the target is in the appropriate streaming or non-streaming mode and is + using data beyond the subset shared with the FPSIMD Vn registers. Refer to [2] for definitions. @@ -297,7 +328,7 @@ The regset data starts with struct user_sve_header, containing: flags - either + at most one of SVE_PT_REGS_FPSIMD @@ -331,6 +362,10 @@ The regset data starts with struct user_sve_header, containing: SVE_PT_VL_ONEXEC (SETREGSET only). + If neither FPSIMD nor SVE flags are provided then no register + payload is available, this is only possible when SME is implemented. + + * The effects of changing the vector length and/or flags are equivalent to those documented for PR_SVE_SET_VL. @@ -346,6 +381,13 @@ The regset data starts with struct user_sve_header, containing: case only the vector length and flags are changed (along with any consequences of those changes). +* In systems supporting SME when in streaming mode a GETREGSET for + NT_REG_SVE will return only the user_sve_header with no register data, + similarly a GETREGSET for NT_REG_SSVE will not return any register data + when not in streaming mode. + +* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD. + * For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the requested VL is not supported, the effect will be the same as if the payload were omitted, except that an EIO error is reported. No @@ -355,17 +397,25 @@ The regset data starts with struct user_sve_header, containing: unspecified. It is up to the caller to translate the payload layout for the actual VL and retry. +* Where SME is implemented it is not possible to GETREGSET the register + state for normal SVE when in streaming mode, nor the streaming mode + register state when in normal mode, regardless of the implementation defined + behaviour of the hardware for sharing data between the two modes. + +* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in + streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode + if the target was not in streaming mode. + * The effect of writing a partial, incomplete payload is unspecified. 8. ELF coredump extensions --------------------------- -* A NT_ARM_SVE note will be added to each coredump for each thread of the - dumped process. The contents will be equivalent to the data that would have - been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread - when the coredump was generated. - +* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for + each thread of the dumped process. The contents will be equivalent to the + data that would have been read if a PTRACE_GETREGSET of the corresponding + type were executed for each thread when the coredump was generated. 9. System runtime configuration -------------------------------- @@ -393,7 +443,7 @@ The regset data starts with struct user_sve_header, containing: * At every execve() call, the new vector length of the new process is set to the system default vector length, unless - * PR_SVE_SET_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the + * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the calling thread, or * a deferred vector length change is pending, established via the @@ -402,6 +452,24 @@ The regset data starts with struct user_sve_header, containing: * Modifying the system default vector length does not affect the vector length of any existing process or thread that does not make an execve() call. +10. Perf extensions +-------------------------------- + +* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register + at index 46. This register is used for DWARF unwinding when variable length + SVE registers are pushed onto the stack. + +* Its value is equivalent to the current SVE vector length (VL) in bits divided + by 64. + +* The value is included in Perf samples in the regs[46] field if + PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set. + +* The value is the current value at the time the sample was taken, and it can + change over time. + +* If the system doesn't support SVE when perf_event_open is called with these + settings, the event will fail to open. Appendix A. SVE programmer's model (informative) ================================================= @@ -494,7 +562,7 @@ Appendix B. ARMv8-A FP/SIMD programmer's model Note: This section is for information only and not intended to be complete or to replace any architectural specification. -Refer to [4] for for more information. +Refer to [4] for more information. ARMv8-A defines the following floating-point / SIMD register state: @@ -543,3 +611,5 @@ References http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html Procedure Call Standard for the ARM 64-bit Architecture (AArch64) + +[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst index 4a9d9c794ee5..540a1d4fc6c9 100644 --- a/Documentation/arm64/tagged-address-abi.rst +++ b/Documentation/arm64/tagged-address-abi.rst @@ -40,19 +40,29 @@ space obtained in one of the following ways: during creation and with the same restrictions as for ``mmap()`` above (e.g. data, bss, stack). -The AArch64 Tagged Address ABI has two stages of relaxation depending +The AArch64 Tagged Address ABI has two stages of relaxation depending on how the user addresses are used by the kernel: 1. User addresses not accessed by the kernel but used for address space management (e.g. ``mprotect()``, ``madvise()``). The use of valid - tagged pointers in this context is allowed with the exception of - ``brk()``, ``mmap()`` and the ``new_address`` argument to - ``mremap()`` as these have the potential to alias with existing - user addresses. + tagged pointers in this context is allowed with these exceptions: - NOTE: This behaviour changed in v5.6 and so some earlier kernels may - incorrectly accept valid tagged pointers for the ``brk()``, - ``mmap()`` and ``mremap()`` system calls. + - ``brk()``, ``mmap()`` and the ``new_address`` argument to + ``mremap()`` as these have the potential to alias with existing + user addresses. + + NOTE: This behaviour changed in v5.6 and so some earlier kernels may + incorrectly accept valid tagged pointers for the ``brk()``, + ``mmap()`` and ``mremap()`` system calls. + + - The ``range.start``, ``start`` and ``dst`` arguments to the + ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from + ``userfaultfd()``, as fault addresses subsequently obtained by reading + the file descriptor will be untagged, which may otherwise confuse + tag-unaware programs. + + NOTE: This behaviour changed in v5.14 and so some earlier kernels may + incorrectly accept valid tagged pointers for this system call. 2. User addresses accessed by the kernel (e.g. ``write()``). This ABI relaxation is disabled by default and the application thread needs to @@ -113,6 +123,12 @@ ABI relaxation: - ``shmat()`` and ``shmdt()``. +- ``brk()`` (since kernel v5.6). + +- ``mmap()`` (since kernel v5.6). + +- ``mremap()``, the ``new_address`` argument (since kernel v5.6). + Any attempt to use non-zero tagged pointers may result in an error code being returned, a (fatal) signal being raised, or other modes of failure. diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst index eab4323609b9..19d284b70384 100644 --- a/Documentation/arm64/tagged-pointers.rst +++ b/Documentation/arm64/tagged-pointers.rst @@ -53,12 +53,25 @@ visibility. Preserving tags --------------- -Non-zero tags are not preserved when delivering signals. This means that -signal handlers in applications making use of tags cannot rely on the -tag information for user virtual addresses being maintained for fields -inside siginfo_t. One exception to this rule is for signals raised in -response to watchpoint debug exceptions, where the tag information will -be preserved. +When delivering signals, non-zero tags are not preserved in +siginfo.si_addr unless the flag SA_EXPOSE_TAGBITS was set in +sigaction.sa_flags when the signal handler was installed. This means +that signal handlers in applications making use of tags cannot rely +on the tag information for user virtual addresses being maintained +in these fields unless the flag was set. + +Due to architecture limitations, bits 63:60 of the fault address +are not preserved in response to synchronous tag check faults +(SEGV_MTESERR) even if SA_EXPOSE_TAGBITS was set. Applications should +treat the values of these bits as undefined in order to accommodate +future architecture revisions which may preserve the bits. + +For signals raised in response to watchpoint debug exceptions, the +tag information will be preserved regardless of the SA_EXPOSE_TAGBITS +flag setting. + +Non-zero tags are never preserved in sigcontext.fault_address +regardless of the SA_EXPOSE_TAGBITS flag setting. The architecture prevents the use of a tagged PC, so the upper byte will be set to a sign-extension of bit 55 on exception return. |