diff options
| author | 2020-08-03 15:06:44 +0000 | |
|---|---|---|
| committer | 2020-08-03 15:06:44 +0000 | |
| commit | b64793999546ed8adebaeebd9d8345d18db8927d (patch) | |
| tree | 4357c27b561d73b0e089727c6ed659f2ceff5f47 /gnu/llvm/docs/CommandGuide/llvm-exegesis.rst | |
| parent | Add support for UTF-8 DISPLAY-HINTs with octet length. For now only (diff) | |
| download | wireguard-openbsd-b64793999546ed8adebaeebd9d8345d18db8927d.tar.xz wireguard-openbsd-b64793999546ed8adebaeebd9d8345d18db8927d.zip | |
Remove LLVM 8.0.1 files.
Diffstat (limited to 'gnu/llvm/docs/CommandGuide/llvm-exegesis.rst')
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llvm-exegesis.rst | 236 |
1 files changed, 0 insertions, 236 deletions
diff --git a/gnu/llvm/docs/CommandGuide/llvm-exegesis.rst b/gnu/llvm/docs/CommandGuide/llvm-exegesis.rst deleted file mode 100644 index f27db9e57ed..00000000000 --- a/gnu/llvm/docs/CommandGuide/llvm-exegesis.rst +++ /dev/null @@ -1,236 +0,0 @@ -llvm-exegesis - LLVM Machine Instruction Benchmark -================================================== - -SYNOPSIS --------- - -:program:`llvm-exegesis` [*options*] - -DESCRIPTION ------------ - -:program:`llvm-exegesis` is a benchmarking tool that uses information available -in LLVM to measure host machine instruction characteristics like latency or port -decomposition. - -Given an LLVM opcode name and a benchmarking mode, :program:`llvm-exegesis` -generates a code snippet that makes execution as serial (resp. as parallel) as -possible so that we can measure the latency (resp. uop decomposition) of the -instruction. -The code snippet is jitted and executed on the host subtarget. The time taken -(resp. resource usage) is measured using hardware performance counters. The -result is printed out as YAML to the standard output. - -The main goal of this tool is to automatically (in)validate the LLVM's TableDef -scheduling models. To that end, we also provide analysis of the results. - -:program:`llvm-exegesis` can also benchmark arbitrary user-provided code -snippets. - -EXAMPLE 1: benchmarking instructions ------------------------------------- - -Assume you have an X86-64 machine. To measure the latency of a single -instruction, run: - -.. code-block:: bash - - $ llvm-exegesis -mode=latency -opcode-name=ADD64rr - -Measuring the uop decomposition of an instruction works similarly: - -.. code-block:: bash - - $ llvm-exegesis -mode=uops -opcode-name=ADD64rr - -The output is a YAML document (the default is to write to stdout, but you can -redirect the output to a file using `-benchmarks-file`): - -.. code-block:: none - - --- - key: - opcode_name: ADD64rr - mode: latency - config: '' - cpu_name: haswell - llvm_triple: x86_64-unknown-linux-gnu - num_repetitions: 10000 - measurements: - - { key: latency, value: 1.0058, debug_string: '' } - error: '' - info: 'explicit self cycles, selecting one aliasing configuration. - Snippet: - ADD64rr R8, R8, R10 - ' - ... - -To measure the latency of all instructions for the host architecture, run: - -.. code-block:: bash - - #!/bin/bash - readonly INSTRUCTIONS=$(($(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=) - 1)) - for INSTRUCTION in $(seq 1 ${INSTRUCTIONS}); - do - ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p' - done - -FIXME: Provide an :program:`llvm-exegesis` option to test all instructions. - - -EXAMPLE 2: benchmarking a custom code snippet ---------------------------------------------- - -To measure the latency/uops of a custom piece of code, you can specify the -`snippets-file` option (`-` reads from standard input). - -.. code-block:: bash - - $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=- - -Real-life code snippets typically depend on registers or memory. -:program:`llvm-exegesis` checks the liveliness of registers (i.e. any register -use has a corresponding def or is a "live in"). If your code depends on the -value of some registers, you have two options: - -- Mark the register as requiring a definition. :program:`llvm-exegesis` will - automatically assign a value to the register. This can be done using the - directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>` - is a bit pattern used to fill `<reg_name>`. If `<hex_value>` is smaller than - the register width, it will be sign-extended. -- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark - using whatever value was in this registers on entry. This can be done using - the directive `LLVM-EXEGESIS-LIVEIN <reg name>`. - -For example, the following code snippet depends on the values of XMM1 (which -will be set by the tool) and the memory buffer passed in RDI (live in). - -.. code-block:: none - - # LLVM-EXEGESIS-LIVEIN RDI - # LLVM-EXEGESIS-DEFREG XMM1 42 - vmulps (%rdi), %xmm1, %xmm2 - vhaddps %xmm2, %xmm2, %xmm3 - addq $0x10, %rdi - - -EXAMPLE 3: analysis -------------------- - -Assuming you have a set of benchmarked instructions (either latency or uops) as -YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the -following command: - -.. code-block:: bash - - $ llvm-exegesis -mode=analysis \ - -benchmarks-file=/tmp/benchmarks.yaml \ - -analysis-clusters-output-file=/tmp/clusters.csv \ - -analysis-inconsistencies-output-file=/tmp/inconsistencies.html - -This will group the instructions into clusters with the same performance -characteristics. The clusters will be written out to `/tmp/clusters.csv` in the -following format: - -.. code-block:: none - - cluster_id,opcode_name,config,sched_class - ... - 2,ADD32ri8_DB,,WriteALU,1.00 - 2,ADD32ri_DB,,WriteALU,1.01 - 2,ADD32rr,,WriteALU,1.01 - 2,ADD32rr_DB,,WriteALU,1.00 - 2,ADD32rr_REV,,WriteALU,1.00 - 2,ADD64i32,,WriteALU,1.01 - 2,ADD64ri32,,WriteALU,1.01 - 2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00 - 2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02 - 2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01 - 2,ADD64ri8,,WriteALU,1.00 - 2,SETBr,,WriteSETCC,1.01 - ... - -:program:`llvm-exegesis` will also analyze the clusters to point out -inconsistencies in the scheduling information. The output is an html file. For -example, `/tmp/inconsistencies.html` will contain messages like the following : - -.. image:: llvm-exegesis-analysis.png - :align: center - -Note that the scheduling class names will be resolved only when -:program:`llvm-exegesis` is compiled in debug mode, else only the class id will -be shown. This does not invalidate any of the analysis results though. - - -OPTIONS -------- - -.. option:: -help - - Print a summary of command line options. - -.. option:: -opcode-index=<LLVM opcode index> - - Specify the opcode to measure, by index. See example 1 for details. - Either `opcode-index`, `opcode-name` or `snippets-file` must be set. - -.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,... - - Specify the opcode to measure, by name. Several opcodes can be specified as - a comma-separated list. See example 1 for details. - Either `opcode-index`, `opcode-name` or `snippets-file` must be set. - - .. option:: -snippets-file=<filename> - - Specify the custom code snippet to measure. See example 2 for details. - Either `opcode-index`, `opcode-name` or `snippets-file` must be set. - -.. option:: -mode=[latency|uops|analysis] - - Specify the run mode. - -.. option:: -num-repetitions=<Number of repetition> - - Specify the number of repetitions of the asm snippet. - Higher values lead to more accurate measurements but lengthen the benchmark. - -.. option:: -benchmarks-file=</path/to/file> - - File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark - results. "-" uses stdin/stdout. - -.. option:: -analysis-clusters-output-file=</path/to/file> - - If provided, write the analysis clusters as CSV to this file. "-" prints to - stdout. - -.. option:: -analysis-inconsistencies-output-file=</path/to/file> - - If non-empty, write inconsistencies found during analysis to this file. `-` - prints to stdout. - -.. option:: -analysis-numpoints=<dbscan numPoints parameter> - - Specify the numPoints parameters to be used for DBSCAN clustering - (`analysis` mode). - -.. option:: -analysis-espilon=<dbscan epsilon parameter> - - Specify the numPoints parameters to be used for DBSCAN clustering - (`analysis` mode). - -.. option:: -ignore-invalid-sched-class=false - - If set, ignore instructions that do not have a sched class (class idx = 0). - - .. option:: -mcpu=<cpu name> - - If set, measure the cpu characteristics using the counters for this CPU. This - is useful when creating new sched models (the host CPU is unknown to LLVM). - -EXIT STATUS ------------ - -:program:`llvm-exegesis` returns 0 on success. Otherwise, an error message is -printed to standard error, and the tool returns a non 0 value. |
