| author | 2019-06-23 21:36:31 +0000 |
|---|---|
| committer | 2019-06-23 21:36:31 +0000 |
| commit | 23f101f37937a1bd4a29726cab2f76e0fb038b35 |
| tree | f7da7d6b32c2e07114da399150bfa88d72187012 /gnu/llvm/docs/CommandGuide |
| parent | sort previous; ok deraadt |
| download | wireguard-openbsd-23f101f37937a1bd4a29726cab2f76e0fb038b35.tar.xz, wireguard-openbsd-23f101f37937a1bd4a29726cab2f76e0fb038b35.zip |
Import LLVM 8.0.0 release including clang, lld and lldb.
Diffstat (limited to 'gnu/llvm/docs/CommandGuide')
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/FileCheck.rst | 43 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/index.rst | 2 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/lit.rst | 2 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llc.rst | 4 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/lli.rst | 1 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llvm-cov.rst | 22 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llvm-cxxmap.rst | 91 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llvm-exegesis.rst | 70 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llvm-mca.rst | 266 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llvm-objdump.rst | 123 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llvm-profdata.rst | 22 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/llvm-symbolizer.rst | 8 |
| -rw-r--r-- | gnu/llvm/docs/CommandGuide/tblgen.rst | 4 |
13 files changed, 511 insertions, 147 deletions
diff --git a/gnu/llvm/docs/CommandGuide/FileCheck.rst b/gnu/llvm/docs/CommandGuide/FileCheck.rst index b0324f40463..721d2c2e782 100644 --- a/gnu/llvm/docs/CommandGuide/FileCheck.rst +++ b/gnu/llvm/docs/CommandGuide/FileCheck.rst @@ -24,6 +24,9 @@ match. The file to verify is read from standard input unless the OPTIONS ------- +Options are parsed from the environment variable ``FILECHECK_OPTS`` +and from the command line. + .. option:: -help Print a summary of command line options. @@ -77,9 +80,16 @@ OPTIONS -verify``. With this option FileCheck will verify that input does not contain warnings not covered by any ``CHECK:`` patterns. +.. option:: --dump-input <mode> + + Dump input to stderr, adding annotations representing currently enabled + diagnostics. Do this either 'always', on 'fail', or 'never'. Specify 'help' + to explain the dump format and quit. + .. option:: --dump-input-on-failure - When the check fails, dump all of the original input. + When the check fails, dump all of the original input. This option is + deprecated in favor of `--dump-input=fail`. .. option:: --enable-var-scope @@ -116,6 +126,10 @@ OPTIONS as old tests are migrated to the new non-overlapping ``CHECK-DAG:`` implementation. +.. option:: --color + + Use colors in output (autodetected by default). + EXIT STATUS ----------- @@ -270,9 +284,9 @@ you can use the "``CHECK-EMPTY:``" directive. .. code-block:: llvm - foo + declare void @foo() - bar + declare void @bar() ; CHECK: foo ; CHECK-EMPTY: ; CHECK-NEXT: bar @@ -304,6 +318,29 @@ can be used: ; CHECK: ret i8 } +The "CHECK-COUNT:" directive +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If you need to match multiple lines with the same pattern over and over again +you can repeat a plain ``CHECK:`` as many times as needed. If that looks too +boring you can instead use a counted check "``CHECK-COUNT-<num>:``", where +``<num>`` is a positive decimal number. It will match the pattern exactly +``<num>`` times, no more and no less. 
If you specified a custom check prefix, +just use "``<PREFIX>-COUNT-<num>:``" for the same effect. +Here is a simple example: + +.. code-block:: text + + Loop at depth 1 + Loop at depth 1 + Loop at depth 1 + Loop at depth 1 + Loop at depth 2 + Loop at depth 3 + + ; CHECK-COUNT-6: Loop at depth {{[0-9]+}} + ; CHECK-NOT: Loop at depth {{[0-9]+}} + The "CHECK-DAG:" directive ~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/gnu/llvm/docs/CommandGuide/index.rst b/gnu/llvm/docs/CommandGuide/index.rst index 95efffdb656..9108ae6a96b 100644 --- a/gnu/llvm/docs/CommandGuide/index.rst +++ b/gnu/llvm/docs/CommandGuide/index.rst @@ -23,7 +23,9 @@ Basic Commands llvm-ar llvm-lib llvm-nm + llvm-objdump llvm-config + llvm-cxxmap llvm-diff llvm-cov llvm-profdata diff --git a/gnu/llvm/docs/CommandGuide/lit.rst b/gnu/llvm/docs/CommandGuide/lit.rst index 0d39311152d..e0d09ae977d 100644 --- a/gnu/llvm/docs/CommandGuide/lit.rst +++ b/gnu/llvm/docs/CommandGuide/lit.rst @@ -407,7 +407,7 @@ These are defined in TestRunner.py. The base set of substitutions are: %p same as %S %{pathsep} path separator %t temporary file name unique to the test - %T temporary directory unique to the test + %T parent directory of %t (not unique, deprecated, do not use) %% % ========== ============== diff --git a/gnu/llvm/docs/CommandGuide/llc.rst b/gnu/llvm/docs/CommandGuide/llc.rst index 11dfc902d20..da096f1263a 100644 --- a/gnu/llvm/docs/CommandGuide/llc.rst +++ b/gnu/llvm/docs/CommandGuide/llc.rst @@ -87,9 +87,9 @@ End-user Options llvm-as < /dev/null | llc -march=xyz -mattr=help -.. option:: --disable-fp-elim +.. option:: --frame-pointer - Disable frame pointer elimination optimization. + Specify effect of frame pointer elimination optimization (all,non-leaf,none). .. 
option:: --disable-excess-fp-precision diff --git a/gnu/llvm/docs/CommandGuide/lli.rst b/gnu/llvm/docs/CommandGuide/lli.rst index 58481073d06..1132ac3e6be 100644 --- a/gnu/llvm/docs/CommandGuide/lli.rst +++ b/gnu/llvm/docs/CommandGuide/lli.rst @@ -125,6 +125,7 @@ CODE GENERATION OPTIONS .. code-block:: text default: Target default code model + tiny: Tiny code model small: Small code model kernel: Kernel code model medium: Medium code model diff --git a/gnu/llvm/docs/CommandGuide/llvm-cov.rst b/gnu/llvm/docs/CommandGuide/llvm-cov.rst index 6f1b6e46c48..71924e997d9 100644 --- a/gnu/llvm/docs/CommandGuide/llvm-cov.rst +++ b/gnu/llvm/docs/CommandGuide/llvm-cov.rst @@ -374,9 +374,15 @@ SYNOPSIS DESCRIPTION ^^^^^^^^^^^ -The :program:`llvm-cov export` command exports regions, functions, expansions, -and summaries of the coverage of the binaries *BIN*,... using the profile data -*PROFILE* as JSON. It can optionally be filtered to only export the coverage +The :program:`llvm-cov export` command exports coverage data of the binaries +*BIN*,... using the profile data *PROFILE* in either JSON or lcov trace file +format. + +When exporting JSON, the regions, functions, expansions, and summaries of the +coverage data will be exported. When exporting an lcov trace file, the +line-based coverage and summaries will be exported. + +The exported data can optionally be filtered to only export the coverage for the files listed in *SOURCES*. For information on compiling programs for coverage and generating profile data, @@ -392,12 +398,18 @@ OPTIONS universal binary or to use an architecture that does not match a non-universal binary. +.. option:: -format=<FORMAT> + + Use the specified output format. The supported formats are: "text" (JSON), + "lcov". + .. option:: -summary-only Export only summary information for each file in the coverage data. This mode will not export coverage information for smaller units such as individual - functions or regions. 
The result will be the same as produced by :program: - `llvm-cov report` command, but presented in JSON format rather than text. + functions or regions. The result will contain the same information as produced + by the :program:`llvm-cov report` command, but presented in JSON or lcov + format rather than text. .. option:: -ignore-filename-regex=<PATTERN> diff --git a/gnu/llvm/docs/CommandGuide/llvm-cxxmap.rst b/gnu/llvm/docs/CommandGuide/llvm-cxxmap.rst new file mode 100644 index 00000000000..7293f60b55d --- /dev/null +++ b/gnu/llvm/docs/CommandGuide/llvm-cxxmap.rst @@ -0,0 +1,91 @@ +llvm-cxxmap - Mangled name remapping tool +========================================= + +SYNOPSIS +-------- + +:program:`llvm-cxxmap` [*options*] *symbol-file-1* *symbol-file-2* + +DESCRIPTION +----------- + +The :program:`llvm-cxxmap` tool performs fuzzy matching of C++ mangled names, +based on a file describing name components that should be considered equivalent. + +The symbol files should contain a list of C++ mangled names (one per line). +Blank lines and lines starting with ``#`` are ignored. The output is a list +of pairs of equivalent symbols, one per line, of the form + +.. code-block:: none + + <symbol-1> <symbol-2> + +where ``<symbol-1>`` is a symbol from *symbol-file-1* and ``<symbol-2>`` is +a symbol from *symbol-file-2*. Mappings for which the two symbols are identical +are omitted. + +OPTIONS +------- + +.. program:: llvm-cxxmap + +.. option:: -remapping-file=file, -r=file + + Specify a file containing a list of equivalence rules that should be used + to determine whether two symbols are equivalent. Required. + See :ref:`remapping-file`. + +.. option:: -output=file, -o=file + + Specify a file to write the list of matched names to. If unspecified, the + list will be written to stdout. + +.. option:: -Wambiguous + + Produce a warning if there are multiple equivalent (but distinct) symbols in + *symbol-file-2*. + +.. 
option:: -Wincomplete + + Produce a warning if *symbol-file-1* contains a symbol for which there is no + equivalent symbol in *symbol-file-2*. + +.. _remapping-file: + +REMAPPING FILE +-------------- + +The remapping file is a text file containing lines of the form + +.. code-block:: none + + fragmentkind fragment1 fragment2 + +where ``fragmentkind`` is one of ``name``, ``type``, or ``encoding``, +indicating whether the following mangled name fragments are +<`name <http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.name>`_>s, +<`type <http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.type>`_>s, or +<`encoding <http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.encoding>`_>s, +respectively. +Blank lines and lines starting with ``#`` are ignored. + +For convenience, built-in <substitution>s such as ``St`` and ``Ss`` +are accepted as <name>s (even though they technically are not <name>s). + +For example, to specify that ``absl::string_view`` and ``std::string_view`` +should be treated as equivalent, the following remapping file could be used: + +.. code-block:: none + + # absl::string_view is considered equivalent to std::string_view + type N4absl11string_viewE St17basic_string_viewIcSt11char_traitsIcEE + + # std:: might be std::__1:: in libc++ or std::__cxx11:: in libstdc++ + name St St3__1 + name St St7__cxx11 + +.. note:: + + Symbol remapping is currently only supported for C++ mangled names + following the Itanium C++ ABI mangling scheme. This covers all C++ targets + supported by Clang other than Windows targets. diff --git a/gnu/llvm/docs/CommandGuide/llvm-exegesis.rst b/gnu/llvm/docs/CommandGuide/llvm-exegesis.rst index d60434f5d02..f27db9e57ed 100644 --- a/gnu/llvm/docs/CommandGuide/llvm-exegesis.rst +++ b/gnu/llvm/docs/CommandGuide/llvm-exegesis.rst @@ -24,8 +24,11 @@ result is printed out as YAML to the standard output. The main goal of this tool is to automatically (in)validate the LLVM's TableDef scheduling models. 
To that end, we also provide analysis of the results. -EXAMPLES: benchmarking ----------------------- +:program:`llvm-exegesis` can also benchmark arbitrary user-provided code +snippets. + +EXAMPLE 1: benchmarking instructions +------------------------------------ Assume you have an X86-64 machine. To measure the latency of a single instruction, run: @@ -75,8 +78,45 @@ To measure the latency of all instructions for the host architecture, run: FIXME: Provide an :program:`llvm-exegesis` option to test all instructions. -EXAMPLES: analysis ----------------------- + +EXAMPLE 2: benchmarking a custom code snippet +--------------------------------------------- + +To measure the latency/uops of a custom piece of code, you can specify the +`snippets-file` option (`-` reads from standard input). + +.. code-block:: bash + + $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=- + +Real-life code snippets typically depend on registers or memory. +:program:`llvm-exegesis` checks the liveliness of registers (i.e. any register +use has a corresponding def or is a "live in"). If your code depends on the +value of some registers, you have two options: + +- Mark the register as requiring a definition. :program:`llvm-exegesis` will + automatically assign a value to the register. This can be done using the + directive `LLVM-EXEGESIS-DEFREG <reg name> <hex_value>`, where `<hex_value>` + is a bit pattern used to fill `<reg_name>`. If `<hex_value>` is smaller than + the register width, it will be sign-extended. +- Mark the register as a "live in". :program:`llvm-exegesis` will benchmark + using whatever value was in this registers on entry. This can be done using + the directive `LLVM-EXEGESIS-LIVEIN <reg name>`. + +For example, the following code snippet depends on the values of XMM1 (which +will be set by the tool) and the memory buffer passed in RDI (live in). + +.. 
code-block:: none + + # LLVM-EXEGESIS-LIVEIN RDI + # LLVM-EXEGESIS-DEFREG XMM1 42 + vmulps (%rdi), %xmm1, %xmm2 + vhaddps %xmm2, %xmm2, %xmm3 + addq $0x10, %rdi + + +EXAMPLE 3: analysis +------------------- Assuming you have a set of benchmarked instructions (either latency or uops) as YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the @@ -87,7 +127,7 @@ following command: $ llvm-exegesis -mode=analysis \ -benchmarks-file=/tmp/benchmarks.yaml \ -analysis-clusters-output-file=/tmp/clusters.csv \ - -analysis-inconsistencies-output-file=/tmp/inconsistencies.txt + -analysis-inconsistencies-output-file=/tmp/inconsistencies.html This will group the instructions into clusters with the same performance characteristics. The clusters will be written out to `/tmp/clusters.csv` in the @@ -132,13 +172,19 @@ OPTIONS .. option:: -opcode-index=<LLVM opcode index> - Specify the opcode to measure, by index. - Either `opcode-index` or `opcode-name` must be set. + Specify the opcode to measure, by index. See example 1 for details. + Either `opcode-index`, `opcode-name` or `snippets-file` must be set. + +.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,... -.. option:: -opcode-name=<LLVM opcode name> + Specify the opcode to measure, by name. Several opcodes can be specified as + a comma-separated list. See example 1 for details. + Either `opcode-index`, `opcode-name` or `snippets-file` must be set. - Specify the opcode to measure, by name. - Either `opcode-index` or `opcode-name` must be set. + .. option:: -snippets-file=<filename> + + Specify the custom code snippet to measure. See example 2 for details. + Either `opcode-index`, `opcode-name` or `snippets-file` must be set. .. option:: -mode=[latency|uops|analysis] @@ -178,6 +224,10 @@ OPTIONS If set, ignore instructions that do not have a sched class (class idx = 0). + .. option:: -mcpu=<cpu name> + + If set, measure the cpu characteristics using the counters for this CPU. 
This + is useful when creating new sched models (the host CPU is unknown to LLVM). EXIT STATUS ----------- diff --git a/gnu/llvm/docs/CommandGuide/llvm-mca.rst b/gnu/llvm/docs/CommandGuide/llvm-mca.rst index e44eb2f8ce9..bc50794e0cb 100644 --- a/gnu/llvm/docs/CommandGuide/llvm-mca.rst +++ b/gnu/llvm/docs/CommandGuide/llvm-mca.rst @@ -21,43 +21,12 @@ The main goal of this tool is not just to predict the performance of the code when run on the target, but also help with diagnosing potential performance issues. -Given an assembly code sequence, llvm-mca estimates the Instructions Per Cycle -(IPC), as well as hardware resource pressure. The analysis and reporting style -were inspired by the IACA tool from Intel. +Given an assembly code sequence, :program:`llvm-mca` estimates the Instructions +Per Cycle (IPC), as well as hardware resource pressure. The analysis and +reporting style were inspired by the IACA tool from Intel. -:program:`llvm-mca` allows the usage of special code comments to mark regions of -the assembly code to be analyzed. A comment starting with substring -``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment starting with -substring ``LLVM-MCA-END`` marks the end of a code region. For example: - -.. code-block:: none - - # LLVM-MCA-BEGIN My Code Region - ... - # LLVM-MCA-END - -Multiple regions can be specified provided that they do not overlap. A code -region can have an optional description. If no user-defined region is specified, -then :program:`llvm-mca` assumes a default region which contains every -instruction in the input file. Every region is analyzed in isolation, and the -final performance report is the union of all the reports generated for every -code region. - -Inline assembly directives may be used from source code to annotate the -assembly text: - -.. 
code-block:: c++ - - int foo(int a, int b) { - __asm volatile("# LLVM-MCA-BEGIN foo"); - a += 42; - __asm volatile("# LLVM-MCA-END"); - a *= b; - return a; - } - -So for example, you can compile code with clang, output assembly, and pipe it -directly into llvm-mca for analysis: +For example, you can compile code with clang, output assembly, and pipe it +directly into :program:`llvm-mca` for analysis: .. code-block:: bash @@ -207,6 +176,40 @@ EXIT STATUS :program:`llvm-mca` returns 0 on success. Otherwise, an error message is printed to standard error, and the tool returns 1. +USING MARKERS TO ANALYZE SPECIFIC CODE BLOCKS +--------------------------------------------- +:program:`llvm-mca` allows for the optional usage of special code comments to +mark regions of the assembly code to be analyzed. A comment starting with +substring ``LLVM-MCA-BEGIN`` marks the beginning of a code region. A comment +starting with substring ``LLVM-MCA-END`` marks the end of a code region. For +example: + +.. code-block:: none + + # LLVM-MCA-BEGIN My Code Region + ... + # LLVM-MCA-END + +Multiple regions can be specified provided that they do not overlap. A code +region can have an optional description. If no user-defined region is specified, +then :program:`llvm-mca` assumes a default region which contains every +instruction in the input file. Every region is analyzed in isolation, and the +final performance report is the union of all the reports generated for every +code region. + +Inline assembly directives may be used from source code to annotate the +assembly text: + +.. 
code-block:: c++ + + int foo(int a, int b) { + __asm volatile("# LLVM-MCA-BEGIN foo"); + a += 42; + __asm volatile("# LLVM-MCA-END"); + a *= b; + return a; + } + HOW LLVM-MCA WORKS ------------------ @@ -235,7 +238,10 @@ the following command using the example located at Iterations: 300 Instructions: 900 Total Cycles: 610 + Total uOps: 900 + Dispatch Width: 2 + uOps Per Cycle: 1.48 IPC: 1.48 Block RThroughput: 2.0 @@ -282,35 +288,45 @@ the following command using the example located at - - - 1.00 - 1.00 - - - - - - - - vhaddps %xmm3, %xmm3, %xmm4 According to this report, the dot-product kernel has been executed 300 times, -for a total of 900 dynamically executed instructions. +for a total of 900 simulated instructions. The total number of simulated micro +opcodes (uOps) is also 900. The report is structured in three main sections. The first section collects a few performance numbers; the goal of this section is to give a very quick -overview of the performance throughput. In this example, the two important -performance indicators are **IPC** and **Block RThroughput** (Block Reciprocal +overview of the performance throughput. Important performance indicators are +**IPC**, **uOps Per Cycle**, and **Block RThroughput** (Block Reciprocal Throughput). IPC is computed dividing the total number of simulated instructions by the total -number of cycles. A delta between Dispatch Width and IPC is an indicator of a -performance issue. In the absence of loop-carried data dependencies, the +number of cycles. In the absence of loop-carried data dependencies, the observed IPC tends to a theoretical maximum which can be computed by dividing the number of instructions of a single iteration by the *Block RThroughput*. -IPC is bounded from above by the dispatch width. That is because the dispatch -width limits the maximum size of a dispatch group. IPC is also limited by the -amount of hardware parallelism. 
The availability of hardware resources affects -the resource pressure distribution, and it limits the number of instructions -that can be executed in parallel every cycle. A delta between Dispatch -Width and the theoretical maximum IPC is an indicator of a performance -bottleneck caused by the lack of hardware resources. In general, the lower the -Block RThroughput, the better. - -In this example, ``Instructions per iteration/Block RThroughput`` is 1.50. Since -there are no loop-carried dependencies, the observed IPC is expected to approach -1.50 when the number of iterations tends to infinity. The delta between the -Dispatch Width (2.00), and the theoretical maximum IPC (1.50) is an indicator of -a performance bottleneck caused by the lack of hardware resources, and the -*Resource pressure view* can help to identify the problematic resource usage. +Field 'uOps Per Cycle' is computed dividing the total number of simulated micro +opcodes by the total number of cycles. A delta between Dispatch Width and this +field is an indicator of a performance issue. In the absence of loop-carried +data dependencies, the observed 'uOps Per Cycle' should tend to a theoretical +maximum throughput which can be computed by dividing the number of uOps of a +single iteration by the *Block RThroughput*. + +Field *uOps Per Cycle* is bounded from above by the dispatch width. That is +because the dispatch width limits the maximum size of a dispatch group. Both IPC +and 'uOps Per Cycle' are limited by the amount of hardware parallelism. The +availability of hardware resources affects the resource pressure distribution, +and it limits the number of instructions that can be executed in parallel every +cycle. A delta between Dispatch Width and the theoretical maximum uOps per +Cycle (computed by dividing the number of uOps of a single iteration by the +*Block RTrhoughput*) is an indicator of a performance bottleneck caused by the +lack of hardware resources. 
+In general, the lower the Block RThroughput, the better. + +In this example, ``uOps per iteration/Block RThroughput`` is 1.50. Since there +are no loop-carried dependencies, the observed *uOps Per Cycle* is expected to +approach 1.50 when the number of iterations tends to infinity. The delta between +the Dispatch Width (2.00), and the theoretical maximum uOp throughput (1.50) is +an indicator of a performance bottleneck caused by the lack of hardware +resources, and the *Resource pressure view* can help to identify the problematic +resource usage. The second section of the report shows the latency and reciprocal throughput of every instruction in the sequence. That section also reports @@ -454,21 +470,22 @@ The ``-all-stats`` command line option enables extra statistics and performance counters for the dispatch logic, the reorder buffer, the retire control unit, and the register file. -Below is an example of ``-all-stats`` output generated by MCA for the -dot-product example discussed in the previous sections. +Below is an example of ``-all-stats`` output generated by :program:`llvm-mca` +for 300 iterations of the dot-product example discussed in the previous +sections. .. code-block:: none Dynamic Dispatch Stall Cycles: RAT - Register unavailable: 0 RCU - Retire tokens unavailable: 0 - SCHEDQ - Scheduler full: 272 + SCHEDQ - Scheduler full: 272 (44.6%) LQ - Load queue full: 0 SQ - Store queue full: 0 GROUP - Static restrictions on the dispatch group: 0 - Dispatch Logic - number of cycles where we saw N instructions dispatched: + Dispatch Logic - number of cycles where we saw N micro opcodes dispatched: [# dispatched], [# cycles] 0, 24 (3.9%) 1, 272 (44.6%) @@ -481,11 +498,16 @@ dot-product example discussed in the previous sections. 1, 306 (50.2%) 2, 297 (48.7%) - Scheduler's queue usage: - JALU01, 0/20 - JFPU01, 18/18 - JLSAGU, 0/12 + [1] Resource name. + [2] Average number of used buffer entries. + [3] Maximum number of used buffer entries. 
+ [4] Total number of buffer entries. + + [1] [2] [3] [4] + JALU01 0 0 20 + JFPU01 17 18 18 + JLSAGU 0 0 12 Retire Control Unit - number of cycles where we saw N instructions retired: @@ -494,6 +516,10 @@ dot-product example discussed in the previous sections. 1, 102 (16.7%) 2, 399 (65.4%) + Total ROB Entries: 64 + Max Used ROB Entries: 35 ( 54.7% ) + Average Used ROB Entries per cy: 32 ( 50.0% ) + Register File statistics: Total number of mappings created: 900 @@ -511,23 +537,21 @@ dot-product example discussed in the previous sections. If we look at the *Dynamic Dispatch Stall Cycles* table, we see the counter for SCHEDQ reports 272 cycles. This counter is incremented every time the dispatch -logic is unable to dispatch a group of two instructions because the scheduler's -queue is full. +logic is unable to dispatch a full group because the scheduler's queue is full. -Looking at the *Dispatch Logic* table, we see that the pipeline was only able -to dispatch two instructions 51.5% of the time. The dispatch group was limited -to one instruction 44.6% of the cycles, which corresponds to 272 cycles. The +Looking at the *Dispatch Logic* table, we see that the pipeline was only able to +dispatch two micro opcodes 51.5% of the time. The dispatch group was limited to +one micro opcode 44.6% of the cycles, which corresponds to 272 cycles. The dispatch statistics are displayed by either using the command option ``-all-stats`` or ``-dispatch-stats``. The next table, *Schedulers*, presents a histogram displaying a count, representing the number of instructions issued on some number of cycles. In -this case, of the 610 simulated cycles, single -instructions were issued 306 times (50.2%) and there were 7 cycles where -no instructions were issued. +this case, of the 610 simulated cycles, single instructions were issued 306 +times (50.2%) and there were 7 cycles where no instructions were issued. 
-The *Scheduler's queue usage* table shows that the maximum number of buffer -entries (i.e., scheduler queue entries) used at runtime. Resource JFPU01 +The *Scheduler's queue usage* table shows that the average and maximum number of +buffer entries (i.e., scheduler queue entries) used at runtime. Resource JFPU01 reached its maximum (18 of 18 queue entries). Note that AMD Jaguar implements three schedulers: @@ -543,28 +567,28 @@ A full scheduler queue is either caused by data dependency chains or by a sub-optimal usage of hardware resources. Sometimes, resource pressure can be mitigated by rewriting the kernel using different instructions that consume different scheduler resources. Schedulers with a small queue are less resilient -to bottlenecks caused by the presence of long data dependencies. -The scheduler statistics are displayed by -using the command option ``-all-stats`` or ``-scheduler-stats``. +to bottlenecks caused by the presence of long data dependencies. The scheduler +statistics are displayed by using the command option ``-all-stats`` or +``-scheduler-stats``. The next table, *Retire Control Unit*, presents a histogram displaying a count, representing the number of instructions retired on some number of cycles. In -this case, of the 610 simulated cycles, two instructions were retired during -the same cycle 399 times (65.4%) and there were 109 cycles where no -instructions were retired. The retire statistics are displayed by using the -command option ``-all-stats`` or ``-retire-stats``. +this case, of the 610 simulated cycles, two instructions were retired during the +same cycle 399 times (65.4%) and there were 109 cycles where no instructions +were retired. The retire statistics are displayed by using the command option +``-all-stats`` or ``-retire-stats``. The last table presented is *Register File statistics*. Each physical register file (PRF) used by the pipeline is presented in this table. 
In the case of AMD -Jaguar, there are two register files, one for floating-point registers -(JFpuPRF) and one for integer registers (JIntegerPRF). The table shows that of -the 900 instructions processed, there were 900 mappings created. Since this -dot-product example utilized only floating point registers, the JFPuPRF was -responsible for creating the 900 mappings. However, we see that the pipeline -only used a maximum of 35 of 72 available register slots at any given time. We -can conclude that the floating point PRF was the only register file used for -the example, and that it was never resource constrained. The register file -statistics are displayed by using the command option ``-all-stats`` or +Jaguar, there are two register files, one for floating-point registers (JFpuPRF) +and one for integer registers (JIntegerPRF). The table shows that of the 900 +instructions processed, there were 900 mappings created. Since this dot-product +example utilized only floating point registers, the JFPuPRF was responsible for +creating the 900 mappings. However, we see that the pipeline only used a +maximum of 35 of 72 available register slots at any given time. We can conclude +that the floating point PRF was the only register file used for the example, and +that it was never resource constrained. The register file statistics are +displayed by using the command option ``-all-stats`` or ``-register-file-stats``. In this example, we can conclude that the IPC is mostly limited by data @@ -572,8 +596,8 @@ dependencies, and not by resource pressure. Instruction Flow ^^^^^^^^^^^^^^^^ -This section describes the instruction flow through MCA's default out-of-order -pipeline, as well as the functional units involved in the process. +This section describes the instruction flow through the default pipeline of +:program:`llvm-mca`, as well as the functional units involved in the process. The default pipeline implements the following sequence of stages used to process instructions. 
@@ -585,9 +609,9 @@ process instructions. The default pipeline only models the out-of-order portion of a processor. Therefore, the instruction fetch and decode stages are not modeled. Performance -bottlenecks in the frontend are not diagnosed. MCA assumes that instructions -have all been decoded and placed into a queue. Also, MCA does not model branch -prediction. +bottlenecks in the frontend are not diagnosed. :program:`llvm-mca` assumes that +instructions have all been decoded and placed into a queue before the simulation +start. Also, :program:`llvm-mca` does not model branch prediction. Instruction Dispatch """""""""""""""""""" @@ -607,19 +631,19 @@ An instruction can be dispatched if: * The schedulers are not full. Scheduling models can optionally specify which register files are available on -the processor. MCA uses that information to initialize register file -descriptors. Users can limit the number of physical registers that are +the processor. :program:`llvm-mca` uses that information to initialize register +file descriptors. Users can limit the number of physical registers that are globally available for register renaming by using the command option -``-register-file-size``. A value of zero for this option means *unbounded*. -By knowing how many registers are available for renaming, MCA can predict -dispatch stalls caused by the lack of registers. +``-register-file-size``. A value of zero for this option means *unbounded*. By +knowing how many registers are available for renaming, the tool can predict +dispatch stalls caused by the lack of physical registers. The number of reorder buffer entries consumed by an instruction depends on the -number of micro-opcodes specified by the target scheduling model. MCA's -reorder buffer's purpose is to track the progress of instructions that are -"in-flight," and to retire instructions in program order. 
The number of -entries in the reorder buffer defaults to the `MicroOpBufferSize` provided by -the target scheduling model. +number of micro-opcodes specified for that instruction by the target scheduling +model. The reorder buffer is responsible for tracking the progress of +instructions that are "in-flight", and retiring them in program order. The +number of entries in the reorder buffer defaults to the value specified by field +`MicroOpBufferSize` in the target scheduling model. Instructions that are dispatched to the schedulers consume scheduler buffer entries. :program:`llvm-mca` queries the scheduling model to determine the set @@ -646,32 +670,32 @@ available units from the group; by default, the resource manager uses a round-robin selector to guarantee that resource usage is uniformly distributed between all units of a group. -:program:`llvm-mca`'s scheduler implements three instruction queues: +:program:`llvm-mca`'s scheduler internally groups instructions into three sets: -* WaitQueue: a queue of instructions whose operands are not ready. -* ReadyQueue: a queue of instructions ready to execute. -* IssuedQueue: a queue of instructions executing. +* WaitSet: a set of instructions whose operands are not ready. +* ReadySet: a set of instructions ready to execute. +* IssuedSet: a set of instructions executing. -Depending on the operand availability, instructions that are dispatched to the -scheduler are either placed into the WaitQueue or into the ReadyQueue. +Depending on the operands availability, instructions that are dispatched to the +scheduler are either placed into the WaitSet or into the ReadySet. -Every cycle, the scheduler checks if instructions can be moved from the -WaitQueue to the ReadyQueue, and if instructions from the ReadyQueue can be -issued to the underlying pipelines. The algorithm prioritizes older instructions -over younger instructions. 
+Every cycle, the scheduler checks if instructions can be moved from the WaitSet
+to the ReadySet, and if instructions from the ReadySet can be issued to the
+underlying pipelines. The algorithm prioritizes older instructions over younger
+instructions.
 
 Write-Back and Retire Stage
 """""""""""""""""""""""""""
-Issued instructions are moved from the ReadyQueue to the IssuedQueue. There,
+Issued instructions are moved from the ReadySet to the IssuedSet. There,
 instructions wait until they reach the write-back stage. At that point, they
 get removed from the queue and the retire control unit is notified.
 
-When instructions are executed, the retire control unit flags the
-instruction as "ready to retire."
+When instructions are executed, the retire control unit flags the instruction as
+"ready to retire."
 
-Instructions are retired in program order. The register file is notified of
-the retirement so that it can free the physical registers that were allocated
-for the instruction during the register renaming stage.
+Instructions are retired in program order. The register file is notified of the
+retirement so that it can free the physical registers that were allocated for
+the instruction during the register renaming stage.
 
 Load/Store Unit and Memory Consistency Model
 """"""""""""""""""""""""""""""""""""""""""""
diff --git a/gnu/llvm/docs/CommandGuide/llvm-objdump.rst b/gnu/llvm/docs/CommandGuide/llvm-objdump.rst
new file mode 100644
index 00000000000..c3e7c166005
--- /dev/null
+++ b/gnu/llvm/docs/CommandGuide/llvm-objdump.rst
@@ -0,0 +1,123 @@
+llvm-objdump - LLVM's object file dumper
+========================================
+
+SYNOPSIS
+--------
+
+:program:`llvm-objdump` [*commands*] [*options*] [*filenames...*]
+
+DESCRIPTION
+-----------
+The :program:`llvm-objdump` utility prints the contents of object files and
+final linked images named on the command line. If no file name is specified,
+:program:`llvm-objdump` will attempt to read from *a.out*. If *-* is used as a
+file name, :program:`llvm-objdump` will process a file on its standard input
+stream.
+
+COMMANDS
+--------
+At least one of the following commands are required, and some commands can be
+combined with other commands:
+
+.. option:: -d, -disassemble
+
+  Display assembler mnemonics for the machine instructions. Disassembles all
+  text sections found in the input file(s).
+
+.. option:: -D, -disassemble-all
+
+  Display assembler mnemonics for the machine instructions. Disassembles all
+  sections found in the input file(s).
+
+.. option:: -help
+
+  Display usage information and exit. Does not stack with other commands.
+
+.. option:: -r
+
+  Display the relocation entries in the file.
+
+.. option:: -s
+
+  Display the content of each section.
+
+.. option:: -section-headers
+
+  Display summaries of the headers for each section.
+
+.. option:: -t
+
+  Display the symbol table.
+
+.. option:: -version
+
+  Display the version of this program. Does not stack with other commands.
+
+OPTIONS
+-------
+:program:`llvm-objdump` supports the following options:
+
+.. option:: -arch=<architecture>
+
+  Specify the architecture to disassemble. see ``-version`` for available
+  architectures.
+
+.. option:: -cfg
+
+  Create a CFG for every symbol in the object file and write it to a graphviz
+  file (Mach-O-only).
+
+.. option:: -dsym=<string>
+
+  Use .dSYM file for debug info.
+
+.. option:: -g
+
+  Print line information from debug info if available.
+
+.. option:: -m, -macho
+
+  Use Mach-O specific object file parser. Commands and other options may behave
+  differently when used with ``-macho``.
+
+.. option:: -mattr=<a1,+a2,-a3,...>
+
+  Target specific attributes.
+
+.. option:: -mc-x86-disable-arith-relaxation
+
+  Disable relaxation of arithmetic instruction for X86.
+
+.. option:: -stats
+
+  Enable statistics output from program.
+
+.. option:: -triple=<string>
+
+  Target triple to disassemble for, see ``-version`` for available targets.
+
+.. option:: -x86-asm-syntax=<style>
+
+  When used with the ``-disassemble`` option, choose style of code to emit from
+  X86 backend. Supported values are:
+
+   .. option:: att
+
+    AT&T-style assembly
+
+   .. option:: intel
+
+    Intel-style assembly
+
+
+  The default disassembly style is **att**.
+
+BUGS
+----
+
+To report bugs, please visit <http://llvm.org/bugs/>.
+
+SEE ALSO
+--------
+
+:manpage:`llvm-nm(1)`
diff --git a/gnu/llvm/docs/CommandGuide/llvm-profdata.rst b/gnu/llvm/docs/CommandGuide/llvm-profdata.rst
index 5b6330b5dc4..f66fb499697 100644
--- a/gnu/llvm/docs/CommandGuide/llvm-profdata.rst
+++ b/gnu/llvm/docs/CommandGuide/llvm-profdata.rst
@@ -74,6 +74,16 @@ OPTIONS
  file are newline-separated. Lines starting with '#' are skipped. Entries may
  be of the form <filename> or <weight>,<filename>.
 
+.. option:: -remapping-file=path, -r=path
+
+ Specify a file which contains a remapping from symbol names in the input
+ profile to the symbol names that should be used in the output profile. The
+ file should consist of lines of the form ``<input-symbol> <output-symbol>``.
+ Blank lines and lines starting with ``#`` are skipped.
+
+ The :doc:`llvm-cxxmap <llvm-cxxmap>` tool can be used to generate the symbol
+ remapping file.
+
 .. option:: -instr (default)
 
  Specify that the input profile is an instrumentation-based profile.
@@ -193,7 +203,7 @@ OPTIONS
  annotations.
 
 .. option:: -topn=n
- 
+
  Instruct the profile dumper to show the top ``n`` functions with the hottest
  basic blocks in the summary section. By default, the topn functions are not
  dumped.
@@ -206,6 +216,16 @@ OPTIONS
 
  Show the profiled sizes of the memory intrinsic calls for shown functions.
 
+.. option:: -value-cutoff=n
+
+ Show only those functions whose max count values are greater or equal to ``n``.
+ By default, the value-cutoff is set to 0.
+
+.. option:: -list-below-cutoff
+
+ Only output names of functions whose max count value are below the cutoff
+ value.
+
 EXIT STATUS
 -----------
 
diff --git a/gnu/llvm/docs/CommandGuide/llvm-symbolizer.rst b/gnu/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 7bcad1c12f1..3c7a26e486f 100644
--- a/gnu/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/gnu/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -68,7 +68,7 @@ EXAMPLE
 OPTIONS
 -------
 
-.. option:: -obj
+.. option:: -obj, -exe, -e
 
   Path to object file to be symbolized.
 
@@ -83,7 +83,7 @@ OPTIONS
   Prefer function names stored in symbol table to function names
   in debug info sections. Defaults to true.
 
-.. option:: -demangle
+.. option:: -demangle, -C
 
   Print demangled function names. Defaults to true.
 
@@ -106,11 +106,11 @@ OPTIONS
   location, look for the debug info at the .dSYM path provided via the
   ``-dsym-hint`` flag. This flag can be used multiple times.
 
-.. option:: -print-address
+.. option:: -print-address, -addresses, -a
 
   Print address before the source code location. Defaults to false.
 
-.. option:: -pretty-print
+.. option:: -pretty-print, -p
 
   Print human readable output. If ``-inlining`` is specified, enclosing scope is
   prefixed by (inlined by). Refer to listed examples.
diff --git a/gnu/llvm/docs/CommandGuide/tblgen.rst b/gnu/llvm/docs/CommandGuide/tblgen.rst
index 55b54294846..3105e0c8076 100644
--- a/gnu/llvm/docs/CommandGuide/tblgen.rst
+++ b/gnu/llvm/docs/CommandGuide/tblgen.rst
@@ -130,6 +130,10 @@ OPTIONS
 
   Generate enhanced disassembly info.
 
+.. option:: -gen-exegesis
+
+  Generate llvm-exegesis tables.
+
 .. option:: -version
 
   Show the version number of this program.
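Reading aid for the llvm-mca hunks above: a minimal Python sketch of the scheduler bookkeeping the updated text describes. Instructions are dispatched into a WaitSet or a ReadySet depending on operand availability; every cycle, waiting instructions whose operands have become ready move to the ReadySet, and ready instructions are issued oldest first into the IssuedSet until write-back. Everything here is an assumption made for illustration: `Inst`, `ToyScheduler`, `issue_width` and the fixed per-instruction latency are hypothetical, and processor resources, pipelines, register renaming and the reorder buffer are not modeled. This is not llvm-mca's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Inst:
    """Hypothetical instruction record (illustrative field names)."""
    id: int           # position in program order: smaller id == older
    ready_cycle: int  # cycle at which all input operands become available
    latency: int      # cycles between issue and write-back
    done_cycle: int = 0


@dataclass
class ToyScheduler:
    issue_width: int  # assumed limit on instructions issued per cycle
    wait_set: list = field(default_factory=list)
    ready_set: list = field(default_factory=list)
    issued_set: list = field(default_factory=list)
    issue_order: list = field(default_factory=list)  # ids, in issue order

    def dispatch(self, inst: Inst, cycle: int) -> None:
        # Operands available -> ReadySet, otherwise -> WaitSet.
        if inst.ready_cycle <= cycle:
            self.ready_set.append(inst)
        else:
            self.wait_set.append(inst)

    def tick(self, cycle: int) -> list:
        """Simulate one cycle and return the ids that reached write-back."""
        # Write-back: executed instructions leave the IssuedSet.
        done = [i.id for i in self.issued_set if i.done_cycle <= cycle]
        self.issued_set = [i for i in self.issued_set if i.done_cycle > cycle]

        # WaitSet -> ReadySet once the operands are available.
        woken = [i for i in self.wait_set if i.ready_cycle <= cycle]
        self.wait_set = [i for i in self.wait_set if i.ready_cycle > cycle]
        self.ready_set.extend(woken)

        # ReadySet -> IssuedSet, older instructions before younger ones.
        self.ready_set.sort(key=lambda i: i.id)
        issued = self.ready_set[: self.issue_width]
        self.ready_set = self.ready_set[self.issue_width :]
        for inst in issued:
            inst.done_cycle = cycle + inst.latency
            self.issued_set.append(inst)
            self.issue_order.append(inst.id)
        return done


if __name__ == "__main__":
    sched = ToyScheduler(issue_width=1)
    # Instruction 0 waits on an operand until cycle 1; 1..3 are ready at once.
    for inst in (Inst(0, 1, 1), Inst(1, 0, 1), Inst(2, 0, 1), Inst(3, 0, 1)):
        sched.dispatch(inst, cycle=0)
    for c in range(6):
        sched.tick(c)
    print(sched.issue_order)  # [1, 0, 2, 3]
```

Instruction 0 cannot issue in cycle 0, but once its operand is ready it overtakes the younger instructions 2 and 3, which is the "older instructions over younger instructions" rule. Re-sorting the ReadySet by program order every cycle is simply the most direct way to express that rule in a sketch; a simulator could equally keep the set ordered incrementally.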

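Likewise for the new llvm-profdata `-remapping-file` option: the documentation text above only specifies the file format, namely one `<input-symbol> <output-symbol>` pair per line, with blank lines and lines starting with `#` skipped. The following Python sketch parses that format and applies it to a list of profiled names. `parse_remapping` and `remap` are hypothetical helpers; splitting on whitespace, treating indented `#` lines as comments, and rejecting lines without exactly two fields are assumptions, not llvm-profdata's documented behaviour.

```python
def parse_remapping(text: str) -> dict:
    """Parse '<input-symbol> <output-symbol>' lines into a dict.

    Assumptions: fields are separated by whitespace, leading whitespace
    before '#' still marks a comment, and any line that does not have
    exactly two fields is rejected.
    """
    mapping = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are skipped
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        mapping[fields[0]] = fields[1]
    return mapping


def remap(names, mapping):
    """Rename profiled symbols; names without an entry are kept unchanged."""
    return [mapping.get(name, name) for name in names]


if __name__ == "__main__":
    sample = """
# input symbol   output symbol
_ZN1a1fEv        _ZN1b1fEv

_Z3foov          _Z3barv
"""
    table = parse_remapping(sample)
    print(remap(["_ZN1a1fEv", "main"], table))  # ['_ZN1b1fEv', 'main']
```

Leaving unmapped names untouched mirrors the idea that only the listed input symbols are renamed in the output profile; whether the real tool does anything else with unmatched names is not stated in the text above.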