aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2022-01-10 08:49:37 -0800
committerLinus Torvalds <torvalds@linux-foundation.org>2022-01-10 08:49:37 -0800
commit9b9e211360044c12d7738973c944d6f300134881 (patch)
treef994a379135e70397df3c653e6578ad7e5d295fe /Documentation
parentMerge tag 'asm-generic-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic (diff)
parentMerge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/... (diff)
downloadlinux-dev-9b9e211360044c12d7738973c944d6f300134881.tar.xz
linux-dev-9b9e211360044c12d7738973c944d6f300134881.zip
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas: - KCSAN enabled for arm64. - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD. - Some more SVE clean-ups and refactoring in preparation for SME support (scalable matrix extensions). - BTI clean-ups (SYM_FUNC macros etc.) - arm64 atomics clean-up and codegen improvements. - HWCAPs for FEAT_AFP (alternate floating point behaviour) and FEAT_RPRESS (increased precision of reciprocal estimate and reciprocal square root estimate). - Use SHA3 instructions to speed-up XOR. - arm64 unwind code refactoring/unification. - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1 (potentially set by a hypervisor; user-space already does this). - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU, Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups. - Other fixes and clean-ups; highlights: fix the handling of erratum 1418040, correct the calculation of the nomap region boundaries, introduce io_stop_wc() mapped to the new DGH instruction (data gathering hint). * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits) arm64: Use correct method to calculate nomap region boundaries arm64: Drop outdated links in comments arm64: perf: Don't register user access sysctl handler multiple times drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check perf/smmuv3: Fix unused variable warning when CONFIG_OF=n arm64: errata: Fix exec handling in erratum 1418040 workaround arm64: Unhash early pointer print plus improve comment asm-generic: introduce io_stop_wc() and add implementation for ARM64 arm64: Ensure that the 'bti' macro is defined where linkage.h is included arm64: remove __dma_*_area() aliases docs/arm64: delete a space from tagged-address-abi arm64: Enable KCSAN kselftest/arm64: Add pidbench for floating point syscall cases arm64/fp: Add comments documenting the usage of state restore functions kselftest/arm64: Add a test program to exercise the syscall ABI kselftest/arm64: Allow signal tests to trigger from a function kselftest/arm64: Parameterise ptrace vector length information arm64/sve: Minor clarification of ABI documentation arm64/sve: Generalise vector length configuration prctl() for SME arm64/sve: Make sysctl interface for SVE reusable by SME ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/admin-guide/perf/hisi-pcie-pmu.rst106
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst11
-rw-r--r--Documentation/arm64/cpu-feature-registers.rst17
-rw-r--r--Documentation/arm64/elf_hwcaps.rst8
-rw-r--r--Documentation/arm64/perf.rst78
-rw-r--r--Documentation/arm64/sve.rst2
-rw-r--r--Documentation/arm64/tagged-address-abi.rst2
-rw-r--r--Documentation/devicetree/bindings/perf/arm,cmn.yaml21
-rw-r--r--Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml70
-rw-r--r--Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml63
-rw-r--r--Documentation/memory-barriers.txt8
11 files changed, 378 insertions, 8 deletions
diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
new file mode 100644
index 000000000000..294ebbdb22af
--- /dev/null
+++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst
@@ -0,0 +1,106 @@
+================================================
+HiSilicon PCIe Performance Monitoring Unit (PMU)
+================================================
+
+On Hip09, HiSilicon PCIe Performance Monitoring Unit (PMU) could monitor
+bandwidth, latency, bus utilization and buffer occupancy data of PCIe.
+
+Each PCIe Core has a PMU to monitor multi Root Ports of this PCIe Core and
+all Endpoints downstream these Root Ports.
+
+
+HiSilicon PCIe PMU driver
+=========================
+
+The PCIe PMU driver registers a perf PMU with the name of its sicl-id and PCIe
+Core id.::
+
+ /sys/bus/event_source/hisi_pcie<sicl>_<core>
+
+PMU driver provides description of available events and filter options in sysfs,
+see /sys/bus/event_source/devices/hisi_pcie<sicl>_<core>.
+
+The "format" directory describes all formats of the config (events) and config1
+(filter options) fields of the perf_event_attr structure. The "events" directory
+describes all documented events shown in perf list.
+
+The "identifier" sysfs file allows users to identify the version of the
+PMU hardware device.
+
+The "bus" sysfs file allows users to get the bus number of Root Ports
+monitored by PMU.
+
+Example usage of perf::
+
+ $# perf list
+ hisi_pcie0_0/rx_mwr_latency/ [kernel PMU event]
+ hisi_pcie0_0/rx_mwr_cnt/ [kernel PMU event]
+ ------------------------------------------
+
+ $# perf stat -e hisi_pcie0_0/rx_mwr_latency/
+ $# perf stat -e hisi_pcie0_0/rx_mwr_cnt/
+ $# perf stat -g -e hisi_pcie0_0/rx_mwr_latency/ -e hisi_pcie0_0/rx_mwr_cnt/
+
+The current driver does not support sampling. So "perf record" is unsupported.
+Also attach to a task is unsupported for PCIe PMU.
+
+Filter options
+--------------
+
+1. Target filter
+PMU could only monitor the performance of traffic downstream target Root Ports
+or downstream target Endpoint. PCIe PMU driver support "port" and "bdf"
+interfaces for users, and these two interfaces aren't supported at the same
+time.
+
+-port
+"port" filter can be used in all PCIe PMU events, target Root Port can be
+selected by configuring the 16-bits-bitmap "port". Multi ports can be selected
+for AP-layer-events, and only one port can be selected for TL/DL-layer-events.
+
+For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of bitmap
+should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4 lanes),
+bit8 is set, port=0x100; if these two Root Ports are both monitored, port=0x101.
+
+Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_0/rx_mwr_latency,port=0x1/ sleep 5
+
+-bdf
+
+"bdf" filter can only be used in bandwidth events, target Endpoint is selected
+by configuring BDF to "bdf". Counter only counts the bandwidth of message
+requested by target Endpoint.
+
+For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0.
+
+Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_0/rx_mrd_flux,bdf=0x3900/ sleep 5
+
+2. Trigger filter
+Event statistics start when the first time TLP length is greater/smaller
+than trigger condition. You can set the trigger condition by writing "trig_len",
+and set the trigger mode by writing "trig_mode". This filter can only be used
+in bandwidth events.
+
+For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0"
+means statistics start when TLP length > trigger condition, "trig_mode=1"
+means start when TLP length < condition.
+
+Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5
+
+3. Threshold filter
+Counter counts when TLP length within the specified range. You can set the
+threshold by writing "thr_len", and set the threshold mode by writing
+"thr_mode". This filter can only be used in bandwidth events.
+
+For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means
+counter counts when TLP length >= threshold, and "thr_mode=1" means counts
+when TLP length < threshold.
+
+Example usage of perf::
+
+ $# perf stat -e hisi_pcie0_0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0e486f41185e..d359bcfadd39 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -905,6 +905,17 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
The default value is 8.
+perf_user_access (arm64 only)
+=================================
+
+Controls user space access for reading perf event counters. When set to 1,
+user space can read performance monitor counter registers directly.
+
+The default value is 0 (access disabled).
+
+See Documentation/arm64/perf.rst for more information.
+
+
pid_max
=======
diff --git a/Documentation/arm64/cpu-feature-registers.rst b/Documentation/arm64/cpu-feature-registers.rst
index 9f9b8fd06089..749ae970c319 100644
--- a/Documentation/arm64/cpu-feature-registers.rst
+++ b/Documentation/arm64/cpu-feature-registers.rst
@@ -275,6 +275,23 @@ infrastructure:
| SVEVer | [3-0] | y |
+------------------------------+---------+---------+
+ 8) ID_AA64MMFR1_EL1 - Memory model feature register 1
+
+ +------------------------------+---------+---------+
+ | Name | bits | visible |
+ +------------------------------+---------+---------+
+ | AFP | [47-44] | y |
+ +------------------------------+---------+---------+
+
+ 9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2
+
+ +------------------------------+---------+---------+
+ | Name | bits | visible |
+ +------------------------------+---------+---------+
+ | RPRES | [7-4] | y |
+ +------------------------------+---------+---------+
+
+
Appendix I: Example
-------------------
diff --git a/Documentation/arm64/elf_hwcaps.rst b/Documentation/arm64/elf_hwcaps.rst
index af106af8e1c0..b72ff17d600a 100644
--- a/Documentation/arm64/elf_hwcaps.rst
+++ b/Documentation/arm64/elf_hwcaps.rst
@@ -251,6 +251,14 @@ HWCAP2_ECV
Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001.
+HWCAP2_AFP
+
+ Functionality implied by ID_AA64MFR1_EL1.AFP == 0b0001.
+
+HWCAP2_RPRES
+
+ Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001.
+
4. Unused AT_HWCAP bits
-----------------------
diff --git a/Documentation/arm64/perf.rst b/Documentation/arm64/perf.rst
index b567f177d385..1f87b57c2332 100644
--- a/Documentation/arm64/perf.rst
+++ b/Documentation/arm64/perf.rst
@@ -2,7 +2,10 @@
.. _perf_index:
-=====================
+====
+Perf
+====
+
Perf Event Attributes
=====================
@@ -88,3 +91,76 @@ exclude_host. However when using !exclude_hv there is a small blackout
window at the guest entry/exit where host events are not captured.
On VHE systems there are no blackout windows.
+
+Perf Userspace PMU Hardware Counter Access
+==========================================
+
+Overview
+--------
+The perf userspace tool relies on the PMU to monitor events. It offers an
+abstraction layer over the hardware counters since the underlying
+implementation is cpu-dependent.
+Arm64 allows userspace tools to have access to the registers storing the
+hardware counters' values directly.
+
+This targets specifically self-monitoring tasks in order to reduce the overhead
+by directly accessing the registers without having to go through the kernel.
+
+How-to
+------
+The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu
+registers is enabled and that the userspace has access to the relevant
+information in order to use them.
+
+In order to have access to the hardware counters, the global sysctl
+kernel/perf_user_access must first be enabled:
+
+.. code-block:: sh
+
+ echo 1 > /proc/sys/kernel/perf_user_access
+
+It is necessary to open the event using the perf tool interface with config1:1
+attr bit set: the sys_perf_event_open syscall returns a fd which can
+subsequently be used with the mmap syscall in order to retrieve a page of memory
+containing information about the event. The PMU driver uses this page to expose
+to the user the hardware counter's index and other necessary data. Using this
+index enables the user to access the PMU registers using the `mrs` instruction.
+Access to the PMU registers is only valid while the sequence lock is unchanged.
+In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is
+changed.
+
+The userspace access is supported in libperf using the perf_evsel__mmap()
+and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for
+an example.
+
+About heterogeneous systems
+---------------------------
+On heterogeneous systems such as big.LITTLE, userspace PMU counter access can
+only be enabled when the tasks are pinned to a homogeneous subset of cores and
+the corresponding PMU instance is opened by specifying the 'type' attribute.
+The use of generic event types is not supported in this case.
+
+Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It
+can be run using the perf tool to check that the access to the registers works
+correctly from userspace:
+
+.. code-block:: sh
+
+ perf test -v user
+
+About chained events and counter sizes
+--------------------------------------
+The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1)
+counter along with userspace access. The sys_perf_event_open syscall will fail
+if a 64-bit counter is requested and the hardware doesn't support 64-bit
+counters. Chained events are not supported in conjunction with userspace counter
+access. If a 32-bit counter is requested on hardware with 64-bit counters, then
+userspace must treat the upper 32-bits read from the counter as UNKNOWN. The
+'pmc_width' field in the user page will indicate the valid width of the counter
+and should be used to mask the upper bits as needed.
+
+.. Links
+.. _tools/perf/arch/arm64/tests/user-events.c:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
+.. _tools/lib/perf/tests/test-evsel.c:
+ https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
diff --git a/Documentation/arm64/sve.rst b/Documentation/arm64/sve.rst
index 03137154299e..9d9a4de5bc34 100644
--- a/Documentation/arm64/sve.rst
+++ b/Documentation/arm64/sve.rst
@@ -255,7 +255,7 @@ prctl(PR_SVE_GET_VL)
vector length change (which would only normally be the case between a
fork() or vfork() and the corresponding execve() in typical use).
- To extract the vector length from the result, and it with
+ To extract the vector length from the result, bitwise and it with
PR_SVE_VL_LEN_MASK.
Return value: a nonnegative value on success, or a negative value on error:
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst
index 0c9120ec58ae..540a1d4fc6c9 100644
--- a/Documentation/arm64/tagged-address-abi.rst
+++ b/Documentation/arm64/tagged-address-abi.rst
@@ -49,7 +49,7 @@ how the user addresses are used by the kernel:
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
``mremap()`` as these have the potential to alias with existing
- user addresses.
+ user addresses.
NOTE: This behaviour changed in v5.6 and so some earlier kernels may
incorrectly accept valid tagged pointers for the ``brk()``,
diff --git a/Documentation/devicetree/bindings/perf/arm,cmn.yaml b/Documentation/devicetree/bindings/perf/arm,cmn.yaml
index 42424ccbdd0c..2d4219ec7eda 100644
--- a/Documentation/devicetree/bindings/perf/arm,cmn.yaml
+++ b/Documentation/devicetree/bindings/perf/arm,cmn.yaml
@@ -12,12 +12,14 @@ maintainers:
properties:
compatible:
- const: arm,cmn-600
+ enum:
+ - arm,cmn-600
+ - arm,ci-700
reg:
items:
- description: Physical address of the base (PERIPHBASE) and
- size (up to 64MB) of the configuration address space.
+ size of the configuration address space.
interrupts:
minItems: 1
@@ -31,14 +33,23 @@ properties:
arm,root-node:
$ref: /schemas/types.yaml#/definitions/uint32
- description: Offset from PERIPHBASE of the configuration
- discovery node (see TRM definition of ROOTNODEBASE).
+ description: Offset from PERIPHBASE of CMN-600's configuration
+ discovery node (see TRM definition of ROOTNODEBASE). Not
+ relevant for newer CMN/CI products.
required:
- compatible
- reg
- interrupts
- - arm,root-node
+
+if:
+ properties:
+ compatible:
+ contains:
+ const: arm,cmn-600
+then:
+ required:
+ - arm,root-node
additionalProperties: false
diff --git a/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml b/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml
new file mode 100644
index 000000000000..a4b53a6a1ebf
--- /dev/null
+++ b/Documentation/devicetree/bindings/perf/arm,smmu-v3-pmcg.yaml
@@ -0,0 +1,70 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/perf/arm,smmu-v3-pmcg.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Arm SMMUv3 Performance Monitor Counter Group
+
+maintainers:
+ - Will Deacon <will@kernel.org>
+ - Robin Murphy <robin.murphy@arm.com>
+
+description: |
+ An SMMUv3 may have several Performance Monitor Counter Group (PMCG).
+ They are standalone performance monitoring units that support both
+ architected and IMPLEMENTATION DEFINED event counters.
+
+properties:
+ $nodename:
+ pattern: "^pmu@[0-9a-f]*"
+ compatible:
+ oneOf:
+ - items:
+ - const: arm,mmu-600-pmcg
+ - const: arm,smmu-v3-pmcg
+ - const: arm,smmu-v3-pmcg
+
+ reg:
+ items:
+ - description: Register page 0
+ - description: Register page 1, if SMMU_PMCG_CFGR.RELOC_CTRS = 1
+ minItems: 1
+
+ interrupts:
+ maxItems: 1
+
+ msi-parent: true
+
+required:
+ - compatible
+ - reg
+
+anyOf:
+ - required:
+ - interrupts
+ - required:
+ - msi-parent
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+
+ pmu@2b420000 {
+ compatible = "arm,smmu-v3-pmcg";
+ reg = <0x2b420000 0x1000>,
+ <0x2b430000 0x1000>;
+ interrupts = <GIC_SPI 80 IRQ_TYPE_EDGE_RISING>;
+ msi-parent = <&its 0xff0000>;
+ };
+
+ pmu@2b440000 {
+ compatible = "arm,smmu-v3-pmcg";
+ reg = <0x2b440000 0x1000>,
+ <0x2b450000 0x1000>;
+ interrupts = <GIC_SPI 81 IRQ_TYPE_EDGE_RISING>;
+ msi-parent = <&its 0xff0000>;
+ };
diff --git a/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml b/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml
new file mode 100644
index 000000000000..362142252667
--- /dev/null
+++ b/Documentation/devicetree/bindings/perf/marvell-cn10k-tad.yaml
@@ -0,0 +1,63 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/perf/marvell-cn10k-tad.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Marvell CN10K LLC-TAD performance monitor
+
+maintainers:
+ - Bhaskara Budiredla <bbudiredla@marvell.com>
+
+description: |
+ The Tag-and-Data units (TADs) maintain coherence and contain CN10K
+ shared on-chip last level cache (LLC). The tad pmu measures the
+ performance of last-level cache. Each tad pmu supports up to eight
+ counters.
+
+ The DT setup comprises of number of tad blocks, the sizes of pmu
+ regions, tad blocks and overall base address of the HW.
+
+properties:
+ compatible:
+ const: marvell,cn10k-tad-pmu
+
+ reg:
+ maxItems: 1
+
+ marvell,tad-cnt:
+ description: specifies the number of tads on the soc
+ $ref: /schemas/types.yaml#/definitions/uint32
+
+ marvell,tad-page-size:
+ description: specifies the size of each tad page
+ $ref: /schemas/types.yaml#/definitions/uint32
+
+ marvell,tad-pmu-page-size:
+ description: specifies the size of page that the pmu uses
+ $ref: /schemas/types.yaml#/definitions/uint32
+
+required:
+ - compatible
+ - reg
+ - marvell,tad-cnt
+ - marvell,tad-page-size
+ - marvell,tad-pmu-page-size
+
+additionalProperties: false
+
+examples:
+ - |
+
+ tad {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ tad_pmu@80000000 {
+ compatible = "marvell,cn10k-tad-pmu";
+ reg = <0x87e2 0x80000000 0x0 0x1000>;
+ marvell,tad-cnt = <1>;
+ marvell,tad-page-size = <0x1000>;
+ marvell,tad-pmu-page-size = <0x1000>;
+ };
+ };
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 7367ada13208..b12df9137e1c 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1950,6 +1950,14 @@ There are some more advanced barrier functions:
For load from persistent memory, existing read memory barriers are sufficient
to ensure read ordering.
+ (*) io_stop_wc();
+
+ For memory accesses with write-combining attributes (e.g. those returned
+ by ioremap_wc(), the CPU may wait for prior accesses to be merged with
+ subsequent ones. io_stop_wc() can be used to prevent the merging of
+ write-combining memory accesses before this macro with those after it when
+ such wait has performance implications.
+
===============================
IMPLICIT KERNEL MEMORY BARRIERS
===============================