Diffstat (limited to 'gnu/llvm/docs')
85 files changed, 10849 insertions, 1352 deletions
diff --git a/gnu/llvm/docs/AMDGPUUsage.rst b/gnu/llvm/docs/AMDGPUUsage.rst index 97d6662a2ed..34a9b6011d4 100644 --- a/gnu/llvm/docs/AMDGPUUsage.rst +++ b/gnu/llvm/docs/AMDGPUUsage.rst @@ -9,6 +9,29 @@ The AMDGPU back-end provides ISA code generation for AMD GPUs, starting with the R600 family up until the current Volcanic Islands (GCN Gen 3). +Conventions +=========== + +Address Spaces +-------------- + +The AMDGPU back-end uses the following address space mapping: + + ============= ============================================ + Address Space Memory Space + ============= ============================================ + 0 Private + 1 Global + 2 Constant + 3 Local + 4 Generic (Flat) + 5 Region + ============= ============================================ + +The terminology in the table, aside from the region memory space, is from the +OpenCL standard. + + Assembler ========= @@ -65,14 +88,14 @@ wait for. .. code-block:: nasm - // Wait for all counters to be 0 + ; Wait for all counters to be 0 s_waitcnt 0 - // Equivalent to s_waitcnt 0. Counter names can also be delimited by - // '&' or ','. + ; Equivalent to s_waitcnt 0. Counter names can also be delimited by + ; '&' or ','. s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0) - // Wait for vmcnt counter to be 1. 
s_waitcnt vmcnt(1) VOP1, VOP2, VOP3, VOPC Instructions @@ -153,7 +176,10 @@ Here is an example of a minimal amd_kernel_code_t specification: .hsa_code_object_version 1,0 .hsa_code_object_isa - .text + .hsatext + .globl hello_world + .p2align 8 + .amdgpu_hsa_kernel hello_world hello_world: @@ -173,5 +199,7 @@ Here is an example of a minimal amd_kernel_code_t specification: s_waitcnt lgkmcnt(0) v_mov_b32 v1, s0 v_mov_b32 v2, s1 - flat_store_dword v0, v[1:2] + flat_store_dword v[1:2], v0 s_endpgm + .Lfunc_end0: + .size hello_world, .Lfunc_end0-hello_world diff --git a/gnu/llvm/docs/AdvancedBuilds.rst b/gnu/llvm/docs/AdvancedBuilds.rst new file mode 100644 index 00000000000..dc808a0ab83 --- /dev/null +++ b/gnu/llvm/docs/AdvancedBuilds.rst @@ -0,0 +1,174 @@ +============================= +Advanced Build Configurations +============================= + +.. contents:: + :local: + +Introduction +============ + +`CMake <http://www.cmake.org/>`_ is a cross-platform build-generator tool. CMake +does not build the project, it generates the files needed by your build tool +(GNU make, Visual Studio, etc.) for building LLVM. + +If **you are a new contributor**, please start with the :doc:`GettingStarted` or +:doc:`CMake` pages. This page is intended for users doing more complex builds. + +Many of the examples below are written assuming specific CMake Generators. +Unless otherwise explicitly called out these commands should work with any CMake +generator. + +Bootstrap Builds +================ + +The Clang CMake build system supports bootstrap (aka multi-stage) builds. At a +high level a multi-stage build is a chain of builds that pass data from one +stage into the next. The most common and simple version of this is a traditional +bootstrap build. + +In a simple two-stage bootstrap build, we build clang using the system compiler, +then use that just-built clang to build clang again. 
In CMake this simplest form +of a bootstrap build can be configured with a single option, +CLANG_ENABLE_BOOTSTRAP. + +.. code-block:: console + + $ cmake -G Ninja -DCLANG_ENABLE_BOOTSTRAP=On <path to source> + $ ninja stage2 + +This command itself isn't terribly useful because it assumes default +configurations for each stage. The next series of examples utilize CMake cache +scripts to provide more complex options. + +The clang build system refers to builds as stages. A stage1 build is a standard +build using the compiler installed on the host, and a stage2 build is built +using the stage1 compiler. This nomenclature holds up to more stages too. In +general a stage*n* build is built using the output from stage*n-1*. + +Apple Clang Builds (A More Complex Bootstrap) +============================================= + +Apple's Clang builds are a slightly more complicated example of the simple +bootstrapping scenario. Apple Clang is built using a 2-stage build. + +The stage1 compiler is a host-only compiler with some options set. The stage1 +compiler is a balance of optimization vs build time because it is a throwaway. +The stage2 compiler is the fully optimized compiler intended to ship to users. + +Setting up these compilers requires a lot of options. To simplify the +configuration the Apple Clang build settings are contained in CMake Cache files. +You can build an Apple Clang compiler using the following commands: + +.. code-block:: console + + $ cmake -G Ninja -C <path to clang>/cmake/caches/Apple-stage1.cmake <path to source> + $ ninja stage2-distribution + +This CMake invocation configures the stage1 host compiler, and sets +CLANG_BOOTSTRAP_CMAKE_ARGS to pass the Apple-stage2.cmake cache script to the +stage2 configuration step. + +When you build the stage2-distribution target it builds the minimal stage1 +compiler and required tools, then configures and builds the stage2 compiler +based on the settings in Apple-stage2.cmake. 
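Cache files like Apple-stage1.cmake are ordinary CMake scripts that pre-seed cache variables before the main configuration runs. A hypothetical sketch of the shape of such a stage1 cache script (the variable values here are illustrative, not Apple's actual settings):

```cmake
# Hypothetical stage1 cache script -- not the real Apple-stage1.cmake.
# Each set(... CACHE ...) entry pre-seeds a cache variable before the
# main configuration runs.
set(CMAKE_BUILD_TYPE Release CACHE STRING "")
set(LLVM_TARGETS_TO_BUILD X86 CACHE STRING "")
set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")

# Arguments forwarded to the stage2 configuration step, including the
# stage2 cache script itself.
set(CLANG_BOOTSTRAP_CMAKE_ARGS
  -C /path/to/clang/cmake/caches/Apple-stage2.cmake
  CACHE STRING "")
```

Passing such a file with ``cmake -C`` executes it in an isolated scope; only the cached variables it sets survive into the main configuration.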
+ +This pattern of using cache scripts to set complex settings, and specifically to +make later stage builds include cache scripts is common in our more advanced +build configurations. + +Multi-stage PGO +=============== + +Profile-Guided Optimizations (PGO) is a really great way to optimize the code +clang generates. Our multi-stage PGO builds are a workflow for generating PGO +profiles that can be used to optimize clang. + +At a high level, the way PGO works is that you build an instrumented compiler, +then you run the instrumented compiler against sample source files. While the +instrumented compiler runs it will output a bunch of files containing +performance counters (.profraw files). After generating all the profraw files +you use llvm-profdata to merge the files into a single profdata file that you +can feed into the LLVM_PROFDATA_FILE option. + +Our PGO.cmake cache script automates that whole process. You can use it by +running: + +.. code-block:: console + + $ cmake -G Ninja -C <path_to_clang>/cmake/caches/PGO.cmake <source dir> + $ ninja stage2-instrumented-generate-profdata + +If you let that run for a few hours or so, it will place a profdata file in your +build directory. This takes a really long time because it builds clang twice, +and you *must* have compiler-rt in your build tree. + +This process uses any source files under the perf-training directory as training +data as long as the source files are marked up with LIT-style RUN lines. + +After it finishes you can use “find . -name clang.profdata” to find it, but it +should be at a path something like: + +.. code-block:: console + + <build dir>/tools/clang/stage2-instrumented-bins/utils/perf-training/clang.profdata + +You can feed that file into the LLVM_PROFDATA_FILE option when you build your +optimized compiler. + +The PGO CMake cache has a slightly different stage naming scheme than other +multi-stage builds. It generates three stages: stage1, stage2-instrumented, and +stage2. 
Both of the stage2 builds are built using the stage1 compiler. + +The PGO CMake cache generates the following additional targets: + +**stage2-instrumented** + Builds a stage1 x86 compiler, runtime, and required tools (llvm-config, + llvm-profdata) then uses that compiler to build an instrumented stage2 compiler. + +**stage2-instrumented-generate-profdata** + Depends on "stage2-instrumented" and will use the instrumented compiler to + generate profdata based on the training files in <clang>/utils/perf-training + +**stage2** + Depends on "stage2-instrumented-generate-profdata" and will use the stage1 + compiler with the stage2 profdata to build a PGO-optimized compiler. + +**stage2-check-llvm** + Depends on stage2 and runs check-llvm using the stage2 compiler. + +**stage2-check-clang** + Depends on stage2 and runs check-clang using the stage2 compiler. + +**stage2-check-all** + Depends on stage2 and runs check-all using the stage2 compiler. + +**stage2-test-suite** + Depends on stage2 and runs the test-suite using the stage2 compiler (requires + in-tree test-suite). + +3-Stage Non-Determinism +======================= + +In the ancient lore of compilers, non-determinism is like the multi-headed hydra. +Whenever its head pops up, terror and chaos ensue. + +Historically one of the tests to verify that a compiler was deterministic would +be a three stage build. The idea of a three stage build is that you take your sources +and build a compiler (stage1), then use that compiler to rebuild the sources +(stage2), then you use that compiler to rebuild the sources a third time +(stage3) with an identical configuration to the stage2 build. At the end of +this, you have a stage2 and stage3 compiler that should be bit-for-bit +identical. + +You can perform one of these 3-stage builds with LLVM & clang using the +following commands: + +.. 
code-block:: console + + $ cmake -G Ninja -C <path_to_clang>/cmake/caches/3-stage.cmake <source dir> + $ ninja stage3 + +After the build you can compare the stage2 & stage3 compilers. We have a bot +set up `here <http://lab.llvm.org:8011/builders/clang-3stage-ubuntu>`_ that runs +this build-and-compare configuration. diff --git a/gnu/llvm/docs/AliasAnalysis.rst b/gnu/llvm/docs/AliasAnalysis.rst index e055b4e1afb..097f7bf75cb 100644 --- a/gnu/llvm/docs/AliasAnalysis.rst +++ b/gnu/llvm/docs/AliasAnalysis.rst @@ -31,8 +31,7 @@ well together. This document contains information necessary to successfully implement this interface, use it, and to test both sides. It also explains some of the finer -points about what exactly results mean. If you feel that something is unclear -or should be added, please `let me know <mailto:sabre@nondot.org>`_. +points about what exactly results mean. ``AliasAnalysis`` Class Overview ================================ diff --git a/gnu/llvm/docs/Atomics.rst b/gnu/llvm/docs/Atomics.rst index 79ab74792dd..4961348d0c9 100644 --- a/gnu/llvm/docs/Atomics.rst +++ b/gnu/llvm/docs/Atomics.rst @@ -8,17 +8,13 @@ LLVM Atomic Instructions and Concurrency Guide Introduction ============ -Historically, LLVM has not had very strong support for concurrency; some minimal -intrinsics were provided, and ``volatile`` was used in some cases to achieve -rough semantics in the presence of concurrency. However, this is changing; -there are now new instructions which are well-defined in the presence of threads -and asynchronous signals, and the model for existing instructions has been -clarified in the IR. +LLVM supports instructions which are well-defined in the presence of threads and +asynchronous signals. The atomic instructions are designed specifically to provide readable IR and optimized code generation for the following: -* The new C++11 ``<atomic>`` header. (`C++11 draft available here +* The C++11 ``<atomic>`` header. 
(`C++11 draft available here <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here <http://www.open-std.org/jtc1/sc22/wg14/>`_.) @@ -371,7 +367,7 @@ Predicates for optimizer writers to query: that they return true for any operation which is volatile or at least Monotonic. -* ``isAtLeastAcquire()``/``isAtLeastRelease()``: These are predicates on +* ``isStrongerThan`` / ``isAtLeastOrStrongerThan``: These are predicates on orderings. They can be useful for passes that are aware of atomics, for example to do DSE across a single atomic access, but not across a release-acquire pair (see MemoryDependencyAnalysis for an example of this) @@ -402,7 +398,7 @@ operations: MemoryDependencyAnalysis (which is also used by other passes like GVN). * Folding a load: Any atomic load from a constant global can be constant-folded, - because it cannot be observed. Similar reasoning allows scalarrepl with + because it cannot be observed. Similar reasoning allows sroa with atomic loads and stores. Atomics and Codegen @@ -417,19 +413,28 @@ The MachineMemOperand for all atomic operations is currently marked as volatile; this is not correct in the IR sense of volatile, but CodeGen handles anything marked volatile very conservatively. This should get fixed at some point. -Common architectures have some way of representing at least a pointer-sized -lock-free ``cmpxchg``; such an operation can be used to implement all the other -atomic operations which can be represented in IR up to that size. Backends are -expected to implement all those operations, but not operations which cannot be -implemented in a lock-free manner. It is expected that backends will give an -error when given an operation which cannot be implemented. (The LLVM code -generator is not very helpful here at the moment, but hopefully that will -change.) 
+One very important property of the atomic operations is that if your backend +supports any inline lock-free atomic operations of a given size, you should +support *ALL* operations of that size in a lock-free manner. + +When the target implements atomic ``cmpxchg`` or LL/SC instructions (as most do) +this is trivial: all the other operations can be implemented on top of those +primitives. However, on many older CPUs (e.g. ARMv5, SparcV8, Intel 80386) there +are atomic load and store instructions, but no ``cmpxchg`` or LL/SC. As it is +invalid to implement ``atomic load`` using the native instruction, but +``cmpxchg`` using a library call to a function that uses a mutex, ``atomic +load`` must *also* expand to a library call on such architectures, so that it +can remain atomic with regards to a simultaneous ``cmpxchg``, by using the same +mutex. + +AtomicExpandPass can help with that: it will expand all atomic operations to the +proper ``__atomic_*`` libcalls for any size above the maximum set by +``setMaxAtomicSizeInBitsSupported`` (which defaults to 0). On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent fences generate an ``MFENCE``, other fences do not cause any code to be -generated. cmpxchg uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg`` +generated. ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg`` uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``. Depending on the users of the result, some ``atomicrmw`` operations can be translated into @@ -450,10 +455,151 @@ atomic constructs. 
Here are some lowerings it can do: ``emitStoreConditional()`` * large loads/stores -> ll-sc/cmpxchg by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()`` -* strong atomic accesses -> monotonic accesses + fences - by using ``setInsertFencesForAtomic()`` and overriding ``emitLeadingFence()`` - and ``emitTrailingFence()`` +* strong atomic accesses -> monotonic accesses + fences by overriding + ``shouldInsertFencesForAtomic()``, ``emitLeadingFence()``, and + ``emitTrailingFence()`` * atomic rmw -> loop with cmpxchg or load-linked/store-conditional by overriding ``expandAtomicRMWInIR()`` +* expansion to __atomic_* libcalls for unsupported sizes. For an example of all of these, look at the ARM backend. + +Libcalls: __atomic_* +==================== + +There are two kinds of atomic library calls that are generated by LLVM. Please +note that both sets of library functions somewhat confusingly share the names of +builtin functions defined by clang. Despite this, the library functions are +not directly related to the builtins: it is *not* the case that ``__atomic_*`` +builtins lower to ``__atomic_*`` library calls and ``__sync_*`` builtins lower +to ``__sync_*`` library calls. + +The first set of library functions are named ``__atomic_*``. This set has been +"standardized" by GCC, and is described below. (See also `GCC's documentation +<https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary>`_) + +LLVM's AtomicExpandPass will translate atomic operations on data sizes above +``MaxAtomicSizeInBitsSupported`` into calls to these functions. 
+ +There are four generic functions, which can be called with data of any size or +alignment:: + + void __atomic_load(size_t size, void *ptr, void *ret, int ordering) + void __atomic_store(size_t size, void *ptr, void *val, int ordering) + void __atomic_exchange(size_t size, void *ptr, void *val, void *ret, int ordering) + bool __atomic_compare_exchange(size_t size, void *ptr, void *expected, void *desired, int success_order, int failure_order) + +There are also size-specialized versions of the above functions, which can only +be used with *naturally-aligned* pointers of the appropriate size. In the +signatures below, "N" is one of 1, 2, 4, 8, and 16, and "iN" is the appropriate +integer type of that size; if no such integer type exists, the specialization +cannot be used:: + + iN __atomic_load_N(iN *ptr, int ordering) + void __atomic_store_N(iN *ptr, iN val, int ordering) + iN __atomic_exchange_N(iN *ptr, iN val, int ordering) + bool __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired, int success_order, int failure_order) + +Finally there are some read-modify-write functions, which are only available in +the size-specific variants (any other sizes use a ``__atomic_compare_exchange`` +loop):: + + iN __atomic_fetch_add_N(iN *ptr, iN val, int ordering) + iN __atomic_fetch_sub_N(iN *ptr, iN val, int ordering) + iN __atomic_fetch_and_N(iN *ptr, iN val, int ordering) + iN __atomic_fetch_or_N(iN *ptr, iN val, int ordering) + iN __atomic_fetch_xor_N(iN *ptr, iN val, int ordering) + iN __atomic_fetch_nand_N(iN *ptr, iN val, int ordering) + +This set of library functions has some interesting implementation requirements +to take note of: + +- They support all sizes and alignments -- including those which cannot be + implemented natively on any existing hardware. Therefore, they will certainly + use mutexes for some sizes/alignments. 
+ +- As a consequence, they cannot be shipped in a statically linked + compiler-support library, as they have state which must be shared amongst all + DSOs loaded in the program. They must be provided in a shared library used by + all objects. + +- The set of atomic sizes supported lock-free must be a superset of the sizes + any compiler can emit. That is: if a new compiler introduces support for + inline-lock-free atomics of size N, the ``__atomic_*`` functions must also have a + lock-free implementation for size N. This is a requirement so that code + produced by an old compiler (which will have called the ``__atomic_*`` function) + interoperates with code produced by the new compiler (which will use the + native atomic instruction). + +Note that it's possible to write an entirely target-independent implementation +of these library functions by using the compiler atomic builtins themselves to +implement the operations on naturally-aligned pointers of supported sizes, and a +generic mutex implementation otherwise. + +Libcalls: __sync_* +================== + +Some targets or OS/target combinations can support lock-free atomics, but for +various reasons, it is not practical to emit the instructions inline. + +There are two typical examples of this. + +Some CPUs support multiple instruction sets which can be switched back and forth +on function-call boundaries. For example, MIPS supports the MIPS16 ISA, which +has a smaller instruction encoding than the usual MIPS32 ISA. ARM, similarly, +has the Thumb ISA. In MIPS16 and earlier versions of Thumb, the atomic +instructions are not encodable. However, those instructions are available via a +function call to a function with the longer encoding. + +Additionally, a few OS/target pairs provide kernel-supported lock-free +atomics. 
ARM/Linux is an example of this: the kernel `provides +<https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt>`_ a +function which on older CPUs contains a "magically-restartable" atomic sequence +(which looks atomic so long as there's only one CPU), and contains actual atomic +instructions on newer multicore models. This sort of functionality can typically +be provided on any architecture, if all CPUs which are missing atomic +compare-and-swap support are uniprocessor (no SMP). This is almost always the +case. The only common architecture without that property is SPARC -- SPARCV8 SMP +systems were common, yet it doesn't support any sort of compare-and-swap +operation. + +In either of these cases, the Target in LLVM can claim support for atomics of an +appropriate size, and then implement some subset of the operations via libcalls +to a ``__sync_*`` function. Such functions *must* not use locks in their +implementation, because unlike the ``__atomic_*`` routines used by +AtomicExpandPass, these may be mixed-and-matched with native instructions by the +target lowering. + +Further, these routines do not need to be shared, as they are stateless. So, +there is no issue with having multiple copies included in one binary. Thus, +typically these routines are implemented by the statically-linked compiler +runtime support library. + +LLVM will emit a call to an appropriate ``__sync_*`` routine if the target +ISelLowering code has set the corresponding ``ATOMIC_CMPXCHG``, ``ATOMIC_SWAP``, +or ``ATOMIC_LOAD_*`` operation to "Expand", and if it has opted-into the +availability of those library functions via a call to ``initSyncLibcalls()``. 
+ +The full set of functions that may be called by LLVM is (for ``N`` being 1, 2, +4, 8, or 16):: + + iN __sync_val_compare_and_swap_N(iN *ptr, iN expected, iN desired) + iN __sync_lock_test_and_set_N(iN *ptr, iN val) + iN __sync_fetch_and_add_N(iN *ptr, iN val) + iN __sync_fetch_and_sub_N(iN *ptr, iN val) + iN __sync_fetch_and_and_N(iN *ptr, iN val) + iN __sync_fetch_and_or_N(iN *ptr, iN val) + iN __sync_fetch_and_xor_N(iN *ptr, iN val) + iN __sync_fetch_and_nand_N(iN *ptr, iN val) + iN __sync_fetch_and_max_N(iN *ptr, iN val) + iN __sync_fetch_and_umax_N(iN *ptr, iN val) + iN __sync_fetch_and_min_N(iN *ptr, iN val) + iN __sync_fetch_and_umin_N(iN *ptr, iN val) + +This list doesn't include any function for atomic load or store; all known +architectures support atomic loads and stores directly (possibly by emitting a +fence on either side of a normal load or store.) + +There's also, somewhat separately, the possibility to lower ``ATOMIC_FENCE`` to +``__sync_synchronize()``. This may happen or not happen independent of all the +above, controlled purely by ``setOperationAction(ISD::ATOMIC_FENCE, ...)``. diff --git a/gnu/llvm/docs/BitCodeFormat.rst b/gnu/llvm/docs/BitCodeFormat.rst index d6e3099bdb6..ffa21763252 100644 --- a/gnu/llvm/docs/BitCodeFormat.rst +++ b/gnu/llvm/docs/BitCodeFormat.rst @@ -467,10 +467,11 @@ Native Object File Wrapper Format ================================= Bitcode files for LLVM IR may also be wrapped in a native object file -(i.e. ELF, COFF, Mach-O). The bitcode must be stored in a section of the -object file named ``.llvmbc``. This wrapper format is useful for accommodating -LTO in compilation pipelines where intermediate objects must be native object -files which contain metadata in other sections. +(i.e. ELF, COFF, Mach-O). The bitcode must be stored in a section of the object +file named ``__LLVM,__bitcode`` for MachO and ``.llvmbc`` for the other object +formats. 
This wrapper format is useful for accommodating LTO in compilation +pipelines where intermediate objects must be native object files which contain +metadata in other sections. Not all tools support this format. @@ -689,6 +690,7 @@ global variable. The operand fields are: .. _linkage type: * *linkage*: An encoding of the linkage type for this variable: + * ``external``: code 0 * ``weak``: code 1 * ``appending``: code 2 @@ -713,20 +715,30 @@ global variable. The operand fields are: .. _visibility: * *visibility*: If present, an encoding of the visibility of this variable: + * ``default``: code 0 * ``hidden``: code 1 * ``protected``: code 2 +.. _bcthreadlocal: + * *threadlocal*: If present, an encoding of the thread local storage mode of the variable: + * ``not thread local``: code 0 * ``thread local; default TLS model``: code 1 * ``localdynamic``: code 2 * ``initialexec``: code 3 * ``localexec``: code 4 -* *unnamed_addr*: If present and non-zero, indicates that the variable has - ``unnamed_addr`` +.. _bcunnamedaddr: + +* *unnamed_addr*: If present, an encoding of the ``unnamed_addr`` attribute of this + variable: + + * not ``unnamed_addr``: code 0 + * ``unnamed_addr``: code 1 + * ``local_unnamed_addr``: code 2 .. _bcdllstorageclass: @@ -736,6 +748,8 @@ global variable. The operand fields are: * ``dllimport``: code 1 * ``dllexport``: code 2 +* *comdat*: An encoding of the COMDAT of this function + .. _FUNCTION: MODULE_CODE_FUNCTION Record @@ -756,6 +770,7 @@ function. The operand fields are: * ``anyregcc``: code 13 * ``preserve_mostcc``: code 14 * ``preserve_allcc``: code 15 + * ``swiftcc`` : code 16 * ``cxx_fast_tlscc``: code 17 * ``x86_stdcallcc``: code 64 * ``x86_fastcallcc``: code 65 @@ -782,8 +797,8 @@ function. The operand fields are: * *gc*: If present and nonzero, the 1-based garbage collector index in the table of `MODULE_CODE_GCNAME`_ entries. 
-* *unnamed_addr*: If present and non-zero, indicates that the function has - ``unnamed_addr`` +* *unnamed_addr*: If present, an encoding of the + :ref:`unnamed_addr<bcunnamedaddr>` attribute of this function * *prologuedata*: If non-zero, the value index of the prologue data for this function, plus 1. @@ -802,7 +817,7 @@ function. The operand fields are: MODULE_CODE_ALIAS Record ^^^^^^^^^^^^^^^^^^^^^^^^ -``[ALIAS, alias type, aliasee val#, linkage, visibility, dllstorageclass]`` +``[ALIAS, alias type, aliasee val#, linkage, visibility, dllstorageclass, threadlocal, unnamed_addr]`` The ``ALIAS`` record (code 9) marks the definition of an alias. The operand fields are @@ -818,6 +833,12 @@ fields are * *dllstorageclass*: If present, an encoding of the :ref:`dllstorageclass<bcdllstorageclass>` of the alias +* *threadlocal*: If present, an encoding of the + :ref:`thread local property<bcthreadlocal>` of the alias + +* *unnamed_addr*: If present, an encoding of the + :ref:`unnamed_addr<bcunnamedaddr>` attribute of this alias + MODULE_CODE_PURGEVALS Record ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ diff --git a/gnu/llvm/docs/CMake.rst b/gnu/llvm/docs/CMake.rst index 4e5feae9993..5d57bc98596 100644 --- a/gnu/llvm/docs/CMake.rst +++ b/gnu/llvm/docs/CMake.rst @@ -12,12 +12,20 @@ Introduction does not build the project, it generates the files needed by your build tool (GNU make, Visual Studio, etc.) for building LLVM. +If **you are a new contributor**, please start with the :doc:`GettingStarted` +page. This page is geared for existing contributors moving from the +legacy configure/make system. + If you are really anxious about getting a functional LLVM build, go to the `Quick start`_ section. If you are a CMake novice, start with `Basic CMake usage`_ and then go back to the `Quick start`_ section once you know what you are doing. The `Options and variables`_ section is a reference for customizing your build. 
If you already have experience with CMake, this is the recommended starting point. +This page is geared towards users of the LLVM CMake build. If you're looking for +information about modifying the LLVM CMake build system you may want to see the +:doc:`CMakePrimer` page. It has a basic overview of the CMake language. + .. _Quick start: Quick start @@ -26,10 +34,7 @@ Quick start We use here the command-line, non-interactive CMake interface. #. `Download <http://www.cmake.org/cmake/resources/software.html>`_ and install - CMake. Version 2.8.8 is the minimum required, but if you're using the Ninja - backend, CMake v3.2 or newer is required to `get interactive output - <http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20141117/244797.html>`_ - when running :doc:`Lit <CommandGuide/lit>`. + CMake. Version 3.4.3 is the minimum required. #. Open a shell. Your development tools must be reachable from this shell through the PATH environment variable. @@ -259,6 +264,9 @@ LLVM-specific variables link against LLVM libraries and make use of C++ exceptions in your own code that need to propagate through LLVM code. Defaults to OFF. +**LLVM_ENABLE_EXPENSIVE_CHECKS**:BOOL + Enable additional time/memory expensive checking. Defaults to OFF. + **LLVM_ENABLE_PIC**:BOOL Add the ``-fPIC`` flag to the compiler command-line, if the compiler supports this flag. Some systems, like Windows, do not need this flag. Defaults to ON. @@ -328,6 +336,14 @@ LLVM-specific variables will not be used. If the variable for an external project does not point to a valid path, then that project will not be built. +**LLVM_EXTERNAL_PROJECTS**:STRING + Semicolon-separated list of additional external projects to build as part of + llvm. For each project LLVM_EXTERNAL_<NAME>_SOURCE_DIR have to be specified + with the path for the source code of the project. Example: + ``-DLLVM_EXTERNAL_PROJECTS="Foo;Bar" + -DLLVM_EXTERNAL_FOO_SOURCE_DIR=/src/foo + -DLLVM_EXTERNAL_BAR_SOURCE_DIR=/src/bar``. 
+ +**LLVM_USE_OPROFILE**:BOOL + Enable building OProfile JIT support. Defaults to OFF. @@ -347,6 +363,11 @@ LLVM-specific variables are ``Address``, ``Memory``, ``MemoryWithOrigins``, ``Undefined``, ``Thread``, and ``Address;Undefined``. Defaults to empty string. +**LLVM_ENABLE_LTO**:STRING + Add ``-flto`` or ``-flto=`` flags to the compile and link command + lines, enabling link-time optimization. Possible values are ``Off``, + ``On``, ``Thin`` and ``Full``. Defaults to OFF. + **LLVM_PARALLEL_COMPILE_JOBS**:STRING Define the maximum number of concurrent compilation jobs. @@ -354,10 +375,12 @@ LLVM-specific variables Define the maximum number of concurrent link jobs. **LLVM_BUILD_DOCS**:BOOL - Enables all enabled documentation targets (i.e. Doxygen and Sphinx targets) to - be built as part of the normal build. If the ``install`` target is run then - this also enables all built documentation targets to be installed. Defaults to - OFF. + Adds all *enabled* documentation targets (i.e. Doxygen and Sphinx targets) as + dependencies of the default build targets. This results in all of the (enabled) + documentation targets being built as part of a normal build. If the ``install`` + target is run then this also enables all built documentation targets to be + installed. Defaults to OFF. To enable a particular documentation target, see + LLVM_ENABLE_SPHINX and LLVM_ENABLE_DOXYGEN. **LLVM_ENABLE_DOXYGEN**:BOOL Enables the generation of browsable HTML documentation using doxygen. @@ -409,7 +432,7 @@ LLVM-specific variables Defaults to OFF. **LLVM_ENABLE_SPHINX**:BOOL - If enabled CMake will search for the ``sphinx-build`` executable and will make + If specified, CMake will search for the ``sphinx-build`` executable and will make the ``SPHINX_OUTPUT_HTML`` and ``SPHINX_OUTPUT_MAN`` CMake options available. Defaults to OFF. @@ -463,6 +486,47 @@ LLVM-specific variables If you want to build LLVM as a shared library, you should use the ``LLVM_BUILD_LLVM_DYLIB`` option. 
+**LLVM_OPTIMIZED_TABLEGEN**:BOOL + If enabled and building a debug or asserts build the CMake build system will + generate a Release build tree to build a fully optimized tablegen for use + during the build. Enabling this option can significantly speed up build times + especially when building LLVM in Debug configurations. + +CMake Caches +============ + +Recently LLVM and Clang have been adding some more complicated build system +features. Utilizing these new features often involves a complicated chain of +CMake variables passed on the command line. Clang provides a collection of CMake +cache scripts to make these features more approachable. + +CMake cache files are utilized using CMake's -C flag: + +.. code-block:: console + + $ cmake -C <path to cache file> <path to sources> + +CMake cache scripts are processed in an isolated scope, only cached variables +remain set when the main configuration runs. CMake cached variables do not reset +variables that are already set unless the FORCE option is specified. + +A few notes about CMake Caches: + +- Order of command line arguments is important + + - -D arguments specified before -C are set before the cache is processed and + can be read inside the cache file + - -D arguments specified after -C are set after the cache is processed and + are unset inside the cache file + +- All -D arguments will override cache file settings +- CMAKE_TOOLCHAIN_FILE is evaluated after both the cache file and the command + line arguments +- It is recommended that all -D options should be specified *before* -C + +For more information about some of the advanced build configurations supported +via Cache files see :doc:`AdvancedBuilds`. + Executing the test suite ======================== @@ -502,7 +566,7 @@ and uses them to build a simple application ``simple-tool``. .. 
code-block:: cmake - cmake_minimum_required(VERSION 2.8.8) + cmake_minimum_required(VERSION 3.4.3) project(SimpleProject) find_package(LLVM REQUIRED CONFIG) @@ -532,16 +596,16 @@ The ``find_package(...)`` directive when used in CONFIG mode (as in the above example) will look for the ``LLVMConfig.cmake`` file in various locations (see cmake manual for details). It creates a ``LLVM_DIR`` cache entry to save the directory where ``LLVMConfig.cmake`` is found or allows the user to specify the -directory (e.g. by passing ``-DLLVM_DIR=/usr/share/llvm/cmake`` to +directory (e.g. by passing ``-DLLVM_DIR=/usr/lib/cmake/llvm`` to the ``cmake`` command or by setting it directly in ``ccmake`` or ``cmake-gui``). This file is available in two different locations. -* ``<INSTALL_PREFIX>/share/llvm/cmake/LLVMConfig.cmake`` where +* ``<INSTALL_PREFIX>/lib/cmake/llvm/LLVMConfig.cmake`` where ``<INSTALL_PREFIX>`` is the install prefix of an installed version of LLVM. - On Linux typically this is ``/usr/share/llvm/cmake/LLVMConfig.cmake``. + On Linux typically this is ``/usr/lib/cmake/llvm/LLVMConfig.cmake``. -* ``<LLVM_BUILD_ROOT>/share/llvm/cmake/LLVMConfig.cmake`` where +* ``<LLVM_BUILD_ROOT>/lib/cmake/llvm/LLVMConfig.cmake`` where ``<LLVM_BUILD_ROOT>`` is the root of the LLVM build tree. **Note: this is only available when building LLVM with CMake.** diff --git a/gnu/llvm/docs/CMakePrimer.rst b/gnu/llvm/docs/CMakePrimer.rst new file mode 100644 index 00000000000..03477902214 --- /dev/null +++ b/gnu/llvm/docs/CMakePrimer.rst @@ -0,0 +1,465 @@ +============ +CMake Primer +============ + +.. contents:: + :local: + +.. warning:: + Disclaimer: This documentation is written by LLVM project contributors `not` + anyone affiliated with the CMake project. This document may contain + inaccurate terminology, phrasing, or technical details. It is provided with + the best intentions. + + +Introduction +============ + +The LLVM project and many of the core projects built on LLVM build using CMake. 
+This document aims to provide a brief overview of CMake for developers modifying +LLVM projects or building their own projects on top of LLVM. + +The official CMake language reference is available in the cmake-language +manpage and `cmake-language online documentation +<https://cmake.org/cmake/help/v3.4/manual/cmake-language.7.html>`_. + +10,000 ft View +============== + +CMake is a tool that reads script files in its own language that describe how a +software project builds. As CMake evaluates the scripts, it constructs an +internal representation of the software project. Once the scripts have been +fully processed, if there are no errors, CMake will generate build files to +actually build the project. CMake supports generating build files for a variety +of command line build tools as well as for popular IDEs. + +When a user runs CMake, it performs a variety of checks similar to how autoconf +worked historically. During the checks and the evaluation of the build +description scripts CMake caches values into the CMakeCache. This is useful +because it allows the build system to skip long-running checks during +incremental development. CMake caching also has some drawbacks, but that will be +discussed later. + +Scripting Overview +================== + +CMake's scripting language has a very simple grammar. Every language construct +is a command that matches the pattern _name_(_args_). Commands come in three +primary types: language-defined (commands implemented in C++ in CMake), defined +functions, and defined macros. The CMake distribution also contains a suite of +CMake modules that contain definitions for useful functionality. + +The example below is the full CMake build for building a C++ "Hello World" +program. The example uses only CMake language-defined functions. + +..
code-block:: cmake + + cmake_minimum_required(VERSION 3.2) + project(HelloWorld) + add_executable(HelloWorld HelloWorld.cpp) + +The CMake language provides control flow constructs in the form of foreach loops +and if blocks. To make the example above more complicated, you could add an if +block to define "APPLE" when targeting Apple platforms: + +.. code-block:: cmake + + cmake_minimum_required(VERSION 3.2) + project(HelloWorld) + add_executable(HelloWorld HelloWorld.cpp) + if(APPLE) + target_compile_definitions(HelloWorld PUBLIC APPLE) + endif() + +Variables, Types, and Scope +=========================== + +Dereferencing +------------- + +In CMake variables are "stringly" typed. All variables are represented as +strings throughout evaluation. Wrapping a variable in ``${}`` dereferences it +and results in a literal substitution of the name for the value. CMake refers to +this as "variable evaluation" in their documentation. Dereferences are performed +*before* the command being called receives the arguments. This means +dereferencing a list results in multiple separate arguments being passed to the +command. + +Variable dereferences can be nested and be used to model complex data. For +example: + +.. code-block:: cmake + + set(var_name var1) + set(${var_name} foo) # same as "set(var1 foo)" + set(${${var_name}}_var bar) # same as "set(foo_var bar)" + +Dereferencing an unset variable results in an empty expansion. It is a common +pattern in CMake to conditionally set variables, knowing that they will be used +in code paths where the variable isn't set. There are examples of this throughout +the LLVM CMake build system. + +An example of variable empty expansion is: + +.. code-block:: cmake + + if(APPLE) + set(extra_sources Apple.cpp) + endif() + add_executable(HelloWorld HelloWorld.cpp ${extra_sources}) + +In this example the ``extra_sources`` variable is only defined if you're +targeting an Apple platform.
For all other targets the ``extra_sources`` will be +evaluated as empty before add_executable is given its arguments. + +One big "Gotcha" with variable dereferencing is that ``if`` commands implicitly +dereference values. This has some unexpected results. For example: + +.. code-block:: cmake + + if("${SOME_VAR}" STREQUAL "MSVC") + +In this code sample MSVC will be implicitly dereferenced, which will result in +the if command comparing the value of the dereferenced variables ``SOME_VAR`` +and ``MSVC``. A common workaround to this problem is to prepend strings being +compared with an ``x``. + +.. code-block:: cmake + + if("x${SOME_VAR}" STREQUAL "xMSVC") + +This works because while ``MSVC`` is a defined variable, ``xMSVC`` is not. This +pattern is uncommon, but it does occur in LLVM's CMake scripts. + +.. note:: + + Once the LLVM project upgrades its minimum CMake version to 3.1 or later we + can prevent this behavior by setting CMP0054 to new. For more information on + CMake policies please see the cmake-policies manpage or the `cmake-policies + online documentation +<https://cmake.org/cmake/help/v3.4/manual/cmake-policies.7.html>`_. + +Lists +----- + +In CMake lists are semi-colon delimited strings, and it is strongly advised that +you avoid using semi-colons in lists; it doesn't go smoothly. A few examples of +defining lists: + +.. code-block:: cmake + + # Creates a list with members a, b, c, and d + set(my_list a b c d) + set(my_list "a;b;c;d") + + # Creates a string "a b c d" + set(my_string "a b c d") + +Lists of Lists +-------------- + +One of the more complicated patterns in CMake is lists of lists. Because a list +cannot contain an element with a semi-colon, to construct a list of lists you +make a list of variable names that refer to other lists. For example: + +..
code-block:: cmake + + set(list_of_lists a b c) + set(a 1 2 3) + set(b 4 5 6) + set(c 7 8 9) + +With this layout you can iterate through the list of lists printing each value +with the following code: + +.. code-block:: cmake + + foreach(list_name IN LISTS list_of_lists) + foreach(value IN LISTS ${list_name}) + message(${value}) + endforeach() + endforeach() + +You'll notice that the inner foreach loop's list is doubly dereferenced. This is +because the first dereference turns ``list_name`` into the name of the sub-list +(a, b, or c in the example), then the second dereference is to get the value of +the list. + +This pattern is used throughout CMake; the most common example is the compiler +flags options, which CMake refers to using the following variable expansions: +CMAKE_${LANGUAGE}_FLAGS and CMAKE_${LANGUAGE}_FLAGS_${CMAKE_BUILD_TYPE}. + +Other Types +----------- + +Variables that are cached or specified on the command line can have types +associated with them. The variable's type is used by CMake's UI tool to display +the right input field. The variable's type generally doesn't impact evaluation. +One of the few examples is PATH variables, which CMake does have some special +handling for. You can read more about the special handling in `CMake's set +documentation +<https://cmake.org/cmake/help/v3.5/command/set.html#set-cache-entry>`_. + +Scope +----- + +CMake inherently has directory-based scoping. Setting a variable in a +CMakeLists file will set the variable for that file, and all subdirectories. +Variables set in a CMake module that is included in a CMakeLists file will be +set in the scope they are included from, and all subdirectories. + +When a variable that is already set is set again in a subdirectory it overrides +the value in that scope and any deeper subdirectories. + +The CMake set command provides two scope-related options. PARENT_SCOPE sets a +variable into the parent scope, and not the current scope.
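For example, a minimal sketch of ``PARENT_SCOPE`` in action (the function and variable names here are invented for illustration):

.. code-block:: cmake

  function(compute_output out_var)
    # Without PARENT_SCOPE this set() would be visible only inside the function.
    set(${out_var} "some-value" PARENT_SCOPE)
  endfunction()

  compute_output(result)
  # ``result`` now holds "some-value" in the calling scope.
  message(STATUS "${result}")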
The CACHE option sets +the variable in the CMakeCache, which results in it being set in all scopes. The +CACHE option will not set a variable that already exists in the CACHE unless the +FORCE option is specified. + +In addition to directory-based scope, CMake functions also have their own scope. +This means variables set inside functions do not bleed into the parent scope. +This is not true of macros, and it is for this reason LLVM prefers functions +over macros whenever reasonable. + +.. note:: + Unlike C-based languages, CMake's loop and control flow blocks do not have + their own scopes. + +Control Flow +============ + +CMake features the same basic control flow constructs you would expect in any +scripting language, but there are a few quirks because, as with everything in +CMake, control flow constructs are commands. + +If, ElseIf, Else +---------------- + +.. note:: + For the full documentation on the CMake if command go + `here <https://cmake.org/cmake/help/v3.4/command/if.html>`_. That resource is + far more complete. + +In general CMake if blocks work the way you'd expect: + +.. code-block:: cmake + + if(<condition>) + .. do stuff + elseif(<condition>) + .. do other stuff + else() + .. do other other stuff + endif() + +The single most important thing to know about CMake's if blocks coming from a C +background is that they do not have their own scope. Variables set inside +conditional blocks persist after the ``endif()``. + +Loops +----- + +The most common form of the CMake ``foreach`` block is: + +.. code-block:: cmake + + foreach(var ...) + .. do stuff + endforeach() + +The variable argument portion of the ``foreach`` block can contain dereferenced +lists, values to iterate, or a mix of both: + +..
code-block:: cmake + + foreach(var foo bar baz) + message(${var}) + endforeach() + # prints: + # foo + # bar + # baz + + set(my_list 1 2 3) + foreach(var ${my_list}) + message(${var}) + endforeach() + # prints: + # 1 + # 2 + # 3 + + foreach(var ${my_list} out_of_bounds) + message(${var}) + endforeach() + # prints: + # 1 + # 2 + # 3 + # out_of_bounds + +There is also a more modern CMake foreach syntax. The code below is equivalent +to the code above: + +.. code-block:: cmake + + foreach(var IN ITEMS foo bar baz) + message(${var}) + endforeach() + # prints: + # foo + # bar + # baz + + set(my_list 1 2 3) + foreach(var IN LISTS my_list) + message(${var}) + endforeach() + # prints: + # 1 + # 2 + # 3 + + foreach(var IN LISTS my_list ITEMS out_of_bounds) + message(${var}) + endforeach() + # prints: + # 1 + # 2 + # 3 + # out_of_bounds + +Similar to the conditional statements, these generally behave how you would +expect, and they do not have their own scope. + +CMake also supports ``while`` loops, although they are not widely used in LLVM. + +Modules, Functions and Macros +============================= + +Modules +------- + +Modules are CMake's vehicle for enabling code reuse. CMake modules are just +CMake script files. They can contain code to execute on include as well as +definitions for commands. + +In CMake macros and functions are universally referred to as commands, and they +are the primary method of defining code that can be called multiple times. + +In LLVM we have several CMake modules that are included as part of our +distribution for developers who don't build our project from source. Those +modules are the fundamental pieces needed to build LLVM-based projects with +CMake. We also rely on modules as a way of organizing the build system's +functionality for maintainability and re-use within LLVM projects. + +Argument Handling +----------------- + +When defining a CMake command handling arguments is very useful. 
The examples +in this section will all use the CMake ``function`` block, but this all applies +to the ``macro`` block as well. + +CMake commands can have named arguments, but all commands are implicitly +variadic. If the command has named arguments, they are required and must +be specified at every call site. Below is a trivial example of providing a +wrapper function for CMake's built in function ``add_dependencies``. + +.. code-block:: cmake + + function(add_deps target) + add_dependencies(${target} ${ARGN}) + endfunction() + +This example defines a new function named ``add_deps`` which takes a required +first argument, and just calls another function passing through the first +argument and all trailing arguments. When variable arguments are present CMake +defines the full argument list in ``ARGV``, the arguments past the last named +one in ``ARGN``, and the count of the arguments in ``ARGC``. + +CMake provides a module ``CMakeParseArguments`` which provides an implementation +of advanced argument parsing. We use this all over LLVM, and it is recommended +for any function that has complex argument-based behaviors or optional +arguments. CMake's official documentation for the module is in the +``cmake-modules`` manpage, and is also available at the +`cmake-modules online documentation +<https://cmake.org/cmake/help/v3.4/module/CMakeParseArguments.html>`_. + +.. note:: + As of CMake 3.5 the cmake_parse_arguments command has become a native command + and the CMakeParseArguments module is empty and only left around for + compatibility. + +Functions Vs Macros +------------------- + +Functions and Macros look very similar in how they are used, but there is one +fundamental difference between the two. Functions have their own scope, and +macros don't. This means variables set in macros will bleed out into the calling +scope. That makes macros suitable for defining very small bits of functionality +only. + +The other difference between CMake functions and macros is how arguments are +passed.
Arguments to macros are not set as variables; instead, dereferences to +the parameters are resolved across the macro before executing it. This can +result in some unexpected behavior if using non-dereferenced variables. For example: + +.. code-block:: cmake + + macro(print_list my_list) + foreach(var IN LISTS my_list) + message("${var}") + endforeach() + endmacro() + + set(my_list a b c d) + set(my_list_of_numbers 1 2 3 4) + print_list(my_list_of_numbers) + # prints: + # a + # b + # c + # d + +Generally speaking this issue is uncommon because it requires using +non-dereferenced variables with names that overlap in the parent scope, but it +is important to be aware of because it can lead to subtle bugs. + +LLVM Project Wrappers +===================== + +LLVM projects provide lots of wrappers around critical CMake built-in commands. +We use these wrappers to provide consistent behaviors across LLVM components +and to reduce code duplication. + +We generally (but not always) follow the convention that commands prefaced with +``llvm_`` are intended to be used only as building blocks for other commands. +Wrapper commands that are intended for direct use are generally named with the +project in the middle of the command name (i.e. ``add_llvm_executable`` +is the wrapper for ``add_executable``). The LLVM ``add_*`` wrapper functions are +all defined in ``AddLLVM.cmake`` which is installed as part of the LLVM +distribution. It can be included and used by any LLVM sub-project that requires +LLVM. + +.. note:: + + Not all LLVM projects require LLVM for all use cases. For example compiler-rt + can be built without LLVM, and the compiler-rt sanitizer libraries are used + with GCC. + +Useful Built-in Commands +======================== + +CMake has a bunch of useful built-in commands. This document isn't going to +go into details about them because the CMake project has excellent +documentation.
To highlight a few useful functions see: + +* `add_custom_command <https://cmake.org/cmake/help/v3.4/command/add_custom_command.html>`_ +* `add_custom_target <https://cmake.org/cmake/help/v3.4/command/add_custom_target.html>`_ +* `file <https://cmake.org/cmake/help/v3.4/command/file.html>`_ +* `list <https://cmake.org/cmake/help/v3.4/command/list.html>`_ +* `math <https://cmake.org/cmake/help/v3.4/command/math.html>`_ +* `string <https://cmake.org/cmake/help/v3.4/command/string.html>`_ + +The full documentation for CMake commands is in the ``cmake-commands`` manpage +and available on `CMake's website <https://cmake.org/cmake/help/v3.4/manual/cmake-commands.7.html>`_ diff --git a/gnu/llvm/docs/CodeGenerator.rst b/gnu/llvm/docs/CodeGenerator.rst index f3b949c7ad1..2f5a27c00af 100644 --- a/gnu/llvm/docs/CodeGenerator.rst +++ b/gnu/llvm/docs/CodeGenerator.rst @@ -45,7 +45,7 @@ components: ``include/llvm/CodeGen/``. At this level, concepts like "constant pool entries" and "jump tables" are explicitly exposed. -3. Classes and algorithms used to represent code as the object file level, the +3. Classes and algorithms used to represent code at the object file level, the `MC Layer`_. These classes represent assembly level constructs like labels, sections, and instructions. At this level, concepts like "constant pool entries" and "jump tables" don't exist. @@ -386,32 +386,27 @@ functions make it easy to build arbitrary machine instructions. Usage of the .. code-block:: c++ // Create a 'DestReg = mov 42' (rendered in X86 assembly as 'mov DestReg, 42') - // instruction. The '1' specifies how many operands will be added. - MachineInstr *MI = BuildMI(X86::MOV32ri, 1, DestReg).addImm(42); - - // Create the same instr, but insert it at the end of a basic block. + // instruction and insert it at the end of the given MachineBasicBlock. + const TargetInstrInfo &TII = ... MachineBasicBlock &MBB = ... 
- BuildMI(MBB, X86::MOV32ri, 1, DestReg).addImm(42); + DebugLoc DL; + MachineInstr *MI = BuildMI(MBB, DL, TII.get(X86::MOV32ri), DestReg).addImm(42); // Create the same instr, but insert it before a specified iterator point. MachineBasicBlock::iterator MBBI = ... - BuildMI(MBB, MBBI, X86::MOV32ri, 1, DestReg).addImm(42); + BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), DestReg).addImm(42); // Create a 'cmp Reg, 0' instruction, no destination reg. - MI = BuildMI(X86::CMP32ri, 2).addReg(Reg).addImm(0); + MI = BuildMI(MBB, DL, TII.get(X86::CMP32ri8)).addReg(Reg).addImm(0); // Create an 'sahf' instruction which takes no operands and stores nothing. - MI = BuildMI(X86::SAHF, 0); + MI = BuildMI(MBB, DL, TII.get(X86::SAHF)); // Create a self looping branch instruction. - BuildMI(MBB, X86::JNE, 1).addMBB(&MBB); + BuildMI(MBB, DL, TII.get(X86::JNE)).addMBB(&MBB); -The key thing to remember with the ``BuildMI`` functions is that you have to -specify the number of operands that the machine instruction will take. This -allows for efficient memory allocation. You also need to specify if operands -default to be uses of values, not definitions. If you need to add a definition -operand (other than the optional destination register), you must explicitly mark -it as such: +If you need to add a definition operand (other than the optional destination +register), you must explicitly mark it as such: .. code-block:: c++ @@ -441,7 +436,7 @@ For example, consider this simple LLVM example: The X86 instruction selector might produce this machine code for the ``div`` and ``ret``: -.. code-block:: llvm +.. code-block:: text ;; Start of div %EAX = mov %reg1024 ;; Copy X (in reg1024) into EAX @@ -458,7 +453,7 @@ By the end of code generation, the register allocator would coalesce the registers and delete the resultant identity moves producing the following code: -.. code-block:: llvm +..
code-block:: text ;; X is in EAX, Y is in ECX mov %EAX, %EDX @@ -632,7 +627,7 @@ directives through MCStreamer. On the implementation side of MCStreamer, there are two major implementations: one for writing out a .s file (MCAsmStreamer), and one for writing out a .o -file (MCObjectStreamer). MCAsmStreamer is a straight-forward implementation +file (MCObjectStreamer). MCAsmStreamer is a straightforward implementation that prints out a directive for each method (e.g. ``EmitValue -> .byte``), but MCObjectStreamer implements a full assembler. @@ -970,7 +965,7 @@ target code. For example, consider the following LLVM fragment: This LLVM code corresponds to a SelectionDAG that looks basically like this: -.. code-block:: llvm +.. code-block:: text (fadd:f32 (fmul:f32 (fadd:f32 W, X), Y), Z) @@ -1771,13 +1766,11 @@ table that summarizes what features are supported by each target. Target Feature Matrix --------------------- -Note that this table does not include the C backend or Cpp backends, since they -do not use the target independent code generator infrastructure. It also -doesn't list features that are not supported fully by any target yet. It -considers a feature to be supported if at least one subtarget supports it. A -feature being supported means that it is useful and works for most cases, it -does not indicate that there are zero known bugs in the implementation. Here is -the key: +Note that this table does not list features that are not supported fully by any +target yet. It considers a feature to be supported if at least one subtarget +supports it. A feature being supported means that it is useful and works for +most cases; it does not indicate that there are zero known bugs in the +implementation. Here is the key: :raw-html:`<table border="1" cellspacing="0">` :raw-html:`<tr>` @@ -2197,9 +2190,9 @@ prefix byte on an instruction causes the instruction's memory access to go to the specified segment.
LLVM address space 0 is the default address space, which includes the stack, and any unqualified memory accesses in a program. Address spaces 1-255 are currently reserved for user-defined code. The GS-segment is -represented by address space 256, while the FS-segment is represented by address -space 257. Other x86 segments have yet to be allocated address space -numbers. +represented by address space 256, the FS-segment is represented by address space +257, and the SS-segment is represented by address space 258. Other x86 segments +have yet to be allocated address space numbers. While these address spaces may seem similar to TLS via the ``thread_local`` keyword, and often use the same underlying hardware, there are some fundamental @@ -2645,3 +2638,59 @@ of a program is limited to 4K instructions: this ensures fast termination and a limited number of kernel function calls. Prior to running an eBPF program, a verifier performs static analysis to prevent loops in the code and to ensure valid register usage and operand types. + +The AMDGPU backend +------------------ + +The AMDGPU code generator lives in the lib/Target/AMDGPU directory, and is an +open source native AMD GCN ISA code generator. + +Target triples supported +^^^^^^^^^^^^^^^^^^^^^^^^ + +The following are the known target triples that are supported by the AMDGPU +backend. + +* **amdgcn--** --- AMD GCN GPUs (AMDGPU.7.0.0+) +* **amdgcn--amdhsa** --- AMD GCN GPUs (AMDGPU.7.0.0+) with HSA support +* **r600--** --- AMD GPUs HD2XXX-HD6XXX + +Relocations +^^^^^^^^^^^ + +Supported relocatable fields are: + +* **word32** --- This specifies a 32-bit field occupying 4 bytes with arbitrary + byte alignment. These values use the same byte order as other word values in + the AMD GPU architecture +* **word64** --- This specifies a 64-bit field occupying 8 bytes with arbitrary + byte alignment. 
These values use the same byte order as other word values in + the AMD GPU architecture + +Following notations are used for specifying relocation calculations: + +* **A** --- Represents the addend used to compute the value of the relocatable + field +* **G** --- Represents the offset into the global offset table at which the + relocation entry’s symbol will reside during execution. +* **GOT** --- Represents the address of the global offset table. +* **P** --- Represents the place (section offset or address) of the storage unit + being relocated (computed using ``r_offset``) +* **S** --- Represents the value of the symbol whose index resides in the + relocation entry + +AMDGPU Backend generates *Elf64_Rela* relocation records with the following +supported relocation types: + + ===================== ===== ========== ==================== + Relocation type Value Field Calculation + ===================== ===== ========== ==================== + ``R_AMDGPU_NONE`` 0 ``none`` ``none`` + ``R_AMDGPU_ABS32_LO`` 1 ``word32`` (S + A) & 0xFFFFFFFF + ``R_AMDGPU_ABS32_HI`` 2 ``word32`` (S + A) >> 32 + ``R_AMDGPU_ABS64`` 3 ``word64`` S + A + ``R_AMDGPU_REL32`` 4 ``word32`` S + A - P + ``R_AMDGPU_REL64`` 5 ``word64`` S + A - P + ``R_AMDGPU_ABS32`` 6 ``word32`` S + A + ``R_AMDGPU_GOTPCREL`` 7 ``word32`` G + GOT + A - P + ===================== ===== ========== ==================== diff --git a/gnu/llvm/docs/CodeOfConduct.rst b/gnu/llvm/docs/CodeOfConduct.rst new file mode 100644 index 00000000000..aa366f3514e --- /dev/null +++ b/gnu/llvm/docs/CodeOfConduct.rst @@ -0,0 +1,112 @@ +============================== +LLVM Community Code of Conduct +============================== + +.. note:: + + This document is currently a **DRAFT** document while it is being discussed + by the community. + +The LLVM community has always worked to be a welcoming and respectful +community, and we want to ensure that doesn't change as we grow and evolve. 
To +that end, we have a few ground rules that we ask people to adhere to: + +* `be friendly and patient`_, +* `be welcoming`_, +* `be considerate`_, +* `be respectful`_, +* `be careful in the words that you choose and be kind to others`_, and +* `when we disagree, try to understand why`_. + +This isn't an exhaustive list of things that you can't do. Rather, take it in +the spirit in which it's intended - a guide to make it easier to communicate +and participate in the community. + +This code of conduct applies to all spaces managed by the LLVM project or The +LLVM Foundation. This includes IRC channels, mailing lists, bug trackers, LLVM +events such as the developer meetings and socials, and any other forums created +by the project that the community uses for communication. It applies to all of +your communication and conduct in these spaces, including emails, chats, things +you say, slides, videos, posters, signs, or even t-shirts you display in these +spaces. In addition, violations of this code outside these spaces may, in rare +cases, affect a person's ability to participate within them, when the conduct +amounts to an egregious violation of this code. + +If you believe someone is violating the code of conduct, we ask that you report +it by emailing conduct@llvm.org. For more details please see our +:doc:`Reporting Guide <ReportingGuide>`. + +.. _be friendly and patient: + +* **Be friendly and patient.** + +.. _be welcoming: + +* **Be welcoming.** We strive to be a community that welcomes and supports + people of all backgrounds and identities. This includes, but is not limited + to members of any race, ethnicity, culture, national origin, colour, + immigration status, social and economic class, educational level, sex, sexual + orientation, gender identity and expression, age, size, family status, + political belief, religion or lack thereof, and mental and physical ability. + +.. 
_be considerate: + +* **Be considerate.** Your work will be used by other people, and you in turn + will depend on the work of others. Any decision you take will affect users + and colleagues, and you should take those consequences into account. Remember + that we're a world-wide community, so you might not be communicating in + someone else's primary language. + +.. _be respectful: + +* **Be respectful.** Not all of us will agree all the time, but disagreement is + no excuse for poor behavior and poor manners. We might all experience some + frustration now and then, but we cannot allow that frustration to turn into + a personal attack. It's important to remember that a community where people + feel uncomfortable or threatened is not a productive one. Members of the LLVM + community should be respectful when dealing with other members as well as + with people outside the LLVM community. + +.. _be careful in the words that you choose and be kind to others: + +* **Be careful in the words that you choose and be kind to others.** Do not + insult or put down other participants. Harassment and other exclusionary + behavior aren't acceptable. This includes, but is not limited to: + + * Violent threats or language directed against another person. + * Discriminatory jokes and language. + * Posting sexually explicit or violent material. + * Posting (or threatening to post) other people's personally identifying + information ("doxing"). + * Personal insults, especially those using racist or sexist terms. + * Unwelcome sexual attention. + * Advocating for, or encouraging, any of the above behavior. + + In general, if someone asks you to stop, then stop. Persisting in such + behavior after being asked to stop is considered harassment. + +.. _when we disagree, try to understand why: + +* **When we disagree, try to understand why.** Disagreements, both social and + technical, happen all the time and LLVM is no exception. 
It is important that + we resolve disagreements and differing views constructively. Remember that + we're different. The strength of LLVM comes from its varied community, people + from a wide range of backgrounds. Different people have different + perspectives on issues. Being unable to understand why someone holds + a viewpoint doesn't mean that they're wrong. Don't forget that it is human to + err and blaming each other doesn't get us anywhere. Instead, focus on helping + to resolve issues and learning from mistakes. + +Questions? +========== + +If you have questions, please feel free to contact the LLVM Foundation Code of +Conduct Advisory Committee by emailing conduct@llvm.org. + + +(This text is based on the `Django Project`_ Code of Conduct, which is in turn +based on wording from the `Speak Up! project`_.) + +.. _Django Project: https://www.djangoproject.com/conduct/ +.. _Speak Up! project: http://speakup.io/coc.html + diff --git a/gnu/llvm/docs/CommandGuide/FileCheck.rst b/gnu/llvm/docs/CommandGuide/FileCheck.rst index 03c88297677..413b6f41b0c 100644 --- a/gnu/llvm/docs/CommandGuide/FileCheck.rst +++ b/gnu/llvm/docs/CommandGuide/FileCheck.rst @@ -38,10 +38,27 @@ OPTIONS prefixes to match. Multiple prefixes are useful for tests which might change for different run options, but most lines remain the same. +.. option:: --check-prefixes prefix1,prefix2,... + + An alias of :option:`--check-prefix` that allows multiple prefixes to be + specified as a comma separated list. + .. option:: --input-file filename File to check (defaults to stdin). +.. option:: --match-full-lines + + By default, FileCheck allows a pattern to match anywhere on a line. This + option will require all positive matches to cover an entire + line. Leading and trailing whitespace is ignored, unless + :option:`--strict-whitespace` is also specified. (Note: negative + matches from ``CHECK-NOT`` are not affected by this option!)
+ + Passing this option is equivalent to inserting ``{{^ *}}`` or + ``{{^}}`` before, and ``{{ *$}}`` or ``{{$}}`` after every positive + check pattern. + .. option:: --strict-whitespace By default, FileCheck canonicalizes input horizontal whitespace (spaces and @@ -127,7 +144,7 @@ exists anywhere in the file. The FileCheck -check-prefix option ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -The FileCheck :option:`-check-prefix` option allows multiple test +The FileCheck `-check-prefix` option allows multiple test configurations to be driven from one `.ll` file. This is useful in many circumstances, for example, testing different architectural variants with :program:`llc`. Here's a simple example: @@ -286,7 +303,7 @@ be aware that the definition rule can match `after` its use. So, for instance, the code below will pass: -.. code-block:: llvm +.. code-block:: text ; CHECK-DAG: vmov.32 [[REG2:d[0-9]+]][0] ; CHECK-DAG: vmov.32 [[REG2]][1] @@ -295,7 +312,7 @@ So, for instance, the code below will pass: While this other code, will not: -.. code-block:: llvm +.. code-block:: text ; CHECK-DAG: vmov.32 [[REG2:d[0-9]+]][0] ; CHECK-DAG: vmov.32 [[REG2]][1] @@ -444,3 +461,22 @@ relative line number references, for example: // CHECK-NEXT: {{^ ;}} int a +Matching Newline Characters +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To match newline characters in regular expressions the character class +``[[:space:]]`` can be used. For example, the following pattern: + +.. code-block:: c++ + + // CHECK: DW_AT_location [DW_FORM_sec_offset] ([[DLOC:0x[0-9a-f]+]]){{[[:space:]].*}}"intd" + +matches output of the form (from llvm-dwarfdump): + +.. code-block:: text + + DW_AT_location [DW_FORM_sec_offset] (0x00000233) + DW_AT_name [DW_FORM_strp] ( .debug_str[0x000000c9] = "intd") + +letting us set the :program:`FileCheck` variable ``DLOC`` to the desired value +``0x00000233``, extracted from the line immediately preceding "``intd``". 
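The anchoring behaviour of ``--match-full-lines`` described above can be modelled with ordinary regular expressions. The following Python sketch is a toy model (not FileCheck's implementation; the pattern and input strings are invented) contrasting the default match-anywhere behaviour with the anchored full-line form:

```python
import re

def check_matches(pattern: str, line: str, full_lines: bool = False) -> bool:
    """Toy model of one positive CHECK pattern against one input line.

    With full_lines=True the pattern is wrapped the way the option
    describes: '{{^ *}}' before and '{{ *$}}' after, so only leading
    and trailing whitespace may surround the match.
    """
    if full_lines:
        pattern = r"^ *" + pattern + r" *$"
    return re.search(pattern, line) is not None

# Default: the pattern may match anywhere on the line.
assert check_matches(r"mov r0, r1", "  mov r0, r1  ; spill")
# --match-full-lines: trailing text beyond whitespace is rejected...
assert not check_matches(r"mov r0, r1", "  mov r0, r1  ; spill", full_lines=True)
# ...but leading and trailing whitespace alone is still ignored.
assert check_matches(r"mov r0, r1", "  mov r0, r1  ", full_lines=True)
```

Note that, as the option text says, real FileCheck leaves ``CHECK-NOT`` patterns unanchored even in this mode.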
diff --git a/gnu/llvm/docs/CommandGuide/bugpoint.rst b/gnu/llvm/docs/CommandGuide/bugpoint.rst index f11585d359c..8c2a0d12498 100644 --- a/gnu/llvm/docs/CommandGuide/bugpoint.rst +++ b/gnu/llvm/docs/CommandGuide/bugpoint.rst @@ -15,7 +15,7 @@ can be used to debug three types of failures: optimizer crashes, miscompilations by optimizers, or bad native code generation (including problems in the static and JIT compilers). It aims to reduce large test cases to small, useful ones. For more information on the design and inner workings of **bugpoint**, as well as -advice for using bugpoint, see *llvm/docs/Bugpoint.html* in the LLVM +advice for using bugpoint, see :doc:`/Bugpoint` in the LLVM distribution. OPTIONS @@ -151,7 +151,12 @@ OPTIONS **--compile-command** *command* This option defines the command to use with the **--compile-custom** - option to compile the bitcode testcase. This can be useful for + option to compile the bitcode testcase. The command should exit with a + failure exit code if the file is "interesting" and should exit with a + success exit code (i.e. 0) otherwise (this is the same as if it crashed on + "interesting" inputs). + + This can be useful for testing compiler output without running any link or execute stages. To generate a reduced unit test, you may add CHECK directives to the testcase and pass the name of an executable compile-command script in this form: @@ -171,6 +176,14 @@ OPTIONS **--safe-{int,jit,llc,custom}** option. +**--verbose-errors**\ =\ *{true,false}* + + The default behavior of bugpoint is to print "<crash>" when it finds a reduced + test that crashes compilation. This flag prints the output of the crashing + program to stderr. This is useful to make sure it is the same error being + tracked down and not a different error that happens to crash the compiler as + well. Defaults to false. 
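The exit-code convention that ``--compile-command`` relies on can be sketched as follows. This is a hypothetical interestingness test with an invented crash signature; a real script would run the actual compiler on the bitcode test case and inspect its output:

```python
# Hypothetical crash signature; a real script would look for the
# specific diagnostic being reduced, not this placeholder string.
CRASH_SIGNATURE = "Assertion failed"

def interestingness_exit_code(compiler_stderr: str) -> int:
    """Exit code for a --compile-command script under bugpoint's
    convention: a failure (non-zero) exit code marks the input as
    "interesting" (the bug reproduced), and 0 marks it as not
    interesting, the same as if the compile crashed on it."""
    return 1 if CRASH_SIGNATURE in compiler_stderr else 0

# A real script would end with something like:
#   proc = subprocess.run([compiler, "-c", sys.argv[1]], ...)
#   sys.exit(interestingness_exit_code(proc.stderr))
```

bugpoint keeps a candidate reduction only when the script exits non-zero, so the signature check must be as specific as possible; otherwise an unrelated crash can steer the reduction toward the wrong bug (the situation ``--verbose-errors`` helps diagnose).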
+
 EXIT STATUS
 -----------
 
diff --git a/gnu/llvm/docs/CommandGuide/lit.rst b/gnu/llvm/docs/CommandGuide/lit.rst
index 0ec14bb2236..b2da58ec02c 100644
--- a/gnu/llvm/docs/CommandGuide/lit.rst
+++ b/gnu/llvm/docs/CommandGuide/lit.rst
@@ -355,6 +355,35 @@ be used to define subdirectories of optional tests, or to change other
 configuration parameters --- for example, to change the test format, or the
 suffixes which identify test files.
 
+PRE-DEFINED SUBSTITUTIONS
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+:program:`lit` provides various patterns that can be used with the RUN command.
+These are defined in TestRunner.py.
+
+ ========== ==============
+ Macro      Substitution
+ ========== ==============
+ %s         source path (path to the file currently being run)
+ %S         source dir (directory of the file currently being run)
+ %p         same as %S
+ %{pathsep} path separator
+ %t         temporary file name unique to the test
+ %T         temporary directory unique to the test
+ %%         %
+ %/s        same as %s but with all \\ replaced by /
+ %/S        same as %S but with all \\ replaced by /
+ %/p        same as %p but with all \\ replaced by /
+ %/t        same as %t but with all \\ replaced by /
+ %/T        same as %T but with all \\ replaced by /
+ ========== ==============
+
+Further substitution patterns may be defined by each test module; see
+:ref:`local-configuration-files`.
+
+More information on the testing infrastructure can be found in the
+:doc:`../TestingGuide`.
+
 TEST RUN OUTPUT FORMAT
 ~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/gnu/llvm/docs/CommandGuide/llvm-cov.rst b/gnu/llvm/docs/CommandGuide/llvm-cov.rst
index d0e78a9a1d1..946b125a452 100644
--- a/gnu/llvm/docs/CommandGuide/llvm-cov.rst
+++ b/gnu/llvm/docs/CommandGuide/llvm-cov.rst
@@ -236,6 +236,26 @@ OPTIONS
 
  Show code coverage only for functions that match the given regular
  expression.
 
+.. option:: -format=<FORMAT>
+
+ Use the specified output format. The supported formats are: "text", "html".
+
+.. option:: -output-dir=PATH
+
+ Specify a directory to write coverage reports into.
If the directory does not
+ exist, it is created. When used in function view mode (i.e. when -name or
+ -name-regex are used to select specific functions), the report is written to
+ PATH/functions.EXTENSION. When used in file view mode, a report for each file
+ is written to PATH/REL_PATH_TO_FILE.EXTENSION.
+
+.. option:: -Xdemangler=<TOOL>|<TOOL-OPTION>
+
+ Specify a symbol demangler. This can be used to make reports more
+ human-readable. This option can be specified multiple times to supply
+ arguments to the demangler (e.g. `-Xdemangler c++filt -Xdemangler -n` for C++).
+ The demangler is expected to read a newline-separated list of symbols from
+ stdin and write a newline-separated list of the same length to stdout.
+
 .. option:: -line-coverage-gt=<N>
 
  Show code coverage only for functions with line coverage greater than the
diff --git a/gnu/llvm/docs/CommandGuide/llvm-nm.rst b/gnu/llvm/docs/CommandGuide/llvm-nm.rst
index 83d9fbaf9e8..319e6e6aecf 100644
--- a/gnu/llvm/docs/CommandGuide/llvm-nm.rst
+++ b/gnu/llvm/docs/CommandGuide/llvm-nm.rst
@@ -68,11 +68,11 @@ OPTIONS
 
 .. option:: -B    (default)
 
- Use BSD output format.  Alias for :option:`--format=bsd`.
+ Use BSD output format.  Alias for `--format=bsd`.
 
 .. option:: -P
 
- Use POSIX.2 output format.  Alias for :option:`--format=posix`.
+ Use POSIX.2 output format.  Alias for `--format=posix`.
 
 .. option:: --debug-syms, -a
 
@@ -126,6 +126,11 @@ OPTIONS
 
  Print only symbols referenced but not defined in this file.
 
+.. option:: --radix=RADIX, -t
+
+ Specify the radix of the symbol address(es). Accepted values are d (decimal),
+ x (hexadecimal), and o (octal).
+
 BUGS
 ----
 
diff --git a/gnu/llvm/docs/CommandGuide/llvm-profdata.rst b/gnu/llvm/docs/CommandGuide/llvm-profdata.rst
index 74fe4ee9d21..f5508b5b2b8 100644
--- a/gnu/llvm/docs/CommandGuide/llvm-profdata.rst
+++ b/gnu/llvm/docs/CommandGuide/llvm-profdata.rst
@@ -44,6 +44,9 @@ interpreted as relatively more important than a shorter run.
Depending on the nature of the
training runs it may be useful to adjust the weight given to each input file
by using the ``-weighted-input`` option.
 
+Profiles passed in via ``-weighted-input``, ``-input-files``, or via positional
+arguments are processed once for each time they are seen.
+
 OPTIONS
 ^^^^^^^
 
@@ -59,10 +62,17 @@ OPTIONS
 
 .. option:: -weighted-input=weight,filename
 
- Specify an input file name along with a weight. The profile counts of the input
- file will be scaled (multiplied) by the supplied ``weight``, where where ``weight``
- is a decimal integer >= 1. Input files specified without using this option are
- assigned a default weight of 1. Examples are shown below.
+ Specify an input file name along with a weight. The profile counts of the
+ supplied ``filename`` will be scaled (multiplied) by the supplied
+ ``weight``, where ``weight`` is a decimal integer >= 1.
+ Input files specified without using this option are assigned a default
+ weight of 1. Examples are shown below.
+
+.. option:: -input-files=path, -f=path
+
+ Specify a file which contains a list of files to merge. The entries in this
+ file are newline-separated. Lines starting with '#' are skipped. Entries may
+ be of the form <filename> or <weight>,<filename>.
 
 .. option:: -instr (default)
 
@@ -90,6 +100,12 @@ OPTIONS
 
  Emit the profile using GCC's gcov format (Not yet supported).
 
+.. option:: -sparse[=true|false]
+
+ Do not emit function records with 0 execution count. Can only be used in
+ conjunction with -instr. Defaults to false, since it can inhibit compiler
+ optimization during PGO.
+
 EXAMPLES
 ^^^^^^^^
 Basic Usage
diff --git a/gnu/llvm/docs/CommandGuide/llvm-readobj.rst b/gnu/llvm/docs/CommandGuide/llvm-readobj.rst
index b1918b548f8..417fcd05c8a 100644
--- a/gnu/llvm/docs/CommandGuide/llvm-readobj.rst
+++ b/gnu/llvm/docs/CommandGuide/llvm-readobj.rst
@@ -80,6 +80,10 @@ input. Otherwise, it will read from the specified ``filenames``.
Display the ELF program headers (only for ELF object files). +.. option:: -elf-section-groups, -g + + Display section groups (only for ELF object files). + EXIT STATUS ----------- diff --git a/gnu/llvm/docs/CommandGuide/opt.rst b/gnu/llvm/docs/CommandGuide/opt.rst index 3a050f7d815..7b9255d2642 100644 --- a/gnu/llvm/docs/CommandGuide/opt.rst +++ b/gnu/llvm/docs/CommandGuide/opt.rst @@ -12,16 +12,16 @@ DESCRIPTION The :program:`opt` command is the modular LLVM optimizer and analyzer. It takes LLVM source files as input, runs the specified optimizations or analyses on it, and then outputs the optimized file or the analysis results. The -function of :program:`opt` depends on whether the :option:`-analyze` option is +function of :program:`opt` depends on whether the `-analyze` option is given. -When :option:`-analyze` is specified, :program:`opt` performs various analyses +When `-analyze` is specified, :program:`opt` performs various analyses of the input source. It will usually print the results on standard output, but in a few cases, it will print output to standard error or generate a file with the analysis output, which is usually done when the output is meant for another program. -While :option:`-analyze` is *not* given, :program:`opt` attempts to produce an +While `-analyze` is *not* given, :program:`opt` attempts to produce an optimized output file. The optimizations available via :program:`opt` depend upon what libraries were linked into it as well as any additional libraries that have been loaded with the :option:`-load` option. Use the :option:`-help` @@ -68,19 +68,19 @@ OPTIONS .. option:: -disable-opt - This option is only meaningful when :option:`-std-link-opts` is given. It + This option is only meaningful when `-std-link-opts` is given. It disables most passes. .. option:: -strip-debug This option causes opt to strip debug information from the module before - applying other optimizations. 
It is essentially the same as :option:`-strip` + applying other optimizations. It is essentially the same as `-strip` but it ensures that stripping of debug information is done first. .. option:: -verify-each This option causes opt to add a verify pass after every pass otherwise - specified on the command line (including :option:`-verify`). This is useful + specified on the command line (including `-verify`). This is useful for cases where it is suspected that a pass is creating an invalid module but it is not clear which pass is doing it. diff --git a/gnu/llvm/docs/CompileCudaWithLLVM.rst b/gnu/llvm/docs/CompileCudaWithLLVM.rst index a981ffe1e8f..f57839cec96 100644 --- a/gnu/llvm/docs/CompileCudaWithLLVM.rst +++ b/gnu/llvm/docs/CompileCudaWithLLVM.rst @@ -18,9 +18,11 @@ familiarity with CUDA. Information about CUDA programming can be found in the How to Build LLVM with CUDA Support =================================== -Below is a quick summary of downloading and building LLVM. Consult the `Getting -Started <http://llvm.org/docs/GettingStarted.html>`_ page for more details on -setting up LLVM. +CUDA support is still in development and works the best in the trunk version +of LLVM. Below is a quick summary of downloading and building the trunk +version. Consult the `Getting Started +<http://llvm.org/docs/GettingStarted.html>`_ page for more details on setting +up LLVM. #. Checkout LLVM @@ -51,7 +53,7 @@ How to Compile CUDA C/C++ with LLVM =================================== We assume you have installed the CUDA driver and runtime. Consult the `NVIDIA -CUDA installation Guide +CUDA installation guide <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`_ if you have not. @@ -60,8 +62,6 @@ which multiplies a ``float`` array by a ``float`` scalar (AXPY). .. 
code-block:: c++ - #include <helper_cuda.h> // for checkCudaErrors - #include <iostream> __global__ void axpy(float a, float* x, float* y) { @@ -78,25 +78,25 @@ which multiplies a ``float`` array by a ``float`` scalar (AXPY). // Copy input data to device. float* device_x; float* device_y; - checkCudaErrors(cudaMalloc(&device_x, kDataLen * sizeof(float))); - checkCudaErrors(cudaMalloc(&device_y, kDataLen * sizeof(float))); - checkCudaErrors(cudaMemcpy(device_x, host_x, kDataLen * sizeof(float), - cudaMemcpyHostToDevice)); + cudaMalloc(&device_x, kDataLen * sizeof(float)); + cudaMalloc(&device_y, kDataLen * sizeof(float)); + cudaMemcpy(device_x, host_x, kDataLen * sizeof(float), + cudaMemcpyHostToDevice); // Launch the kernel. axpy<<<1, kDataLen>>>(a, device_x, device_y); // Copy output data to host. - checkCudaErrors(cudaDeviceSynchronize()); - checkCudaErrors(cudaMemcpy(host_y, device_y, kDataLen * sizeof(float), - cudaMemcpyDeviceToHost)); + cudaDeviceSynchronize(); + cudaMemcpy(host_y, device_y, kDataLen * sizeof(float), + cudaMemcpyDeviceToHost); // Print the results. for (int i = 0; i < kDataLen; ++i) { std::cout << "y[" << i << "] = " << host_y[i] << "\n"; } - checkCudaErrors(cudaDeviceReset()); + cudaDeviceReset(); return 0; } @@ -104,16 +104,89 @@ The command line for compilation is similar to what you would use for C++. .. code-block:: console - $ clang++ -o axpy -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cu -lcudart_static -lcuda -ldl -lrt -pthread + $ clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch> \ + -L<CUDA install path>/<lib64 or lib> \ + -lcudart_static -ldl -lrt -pthread $ ./axpy y[0] = 2 y[1] = 4 y[2] = 6 y[3] = 8 -Note that ``helper_cuda.h`` comes from the CUDA samples, so you need the -samples installed for this example. ``<CUDA install path>`` is the root -directory where you installed CUDA SDK, typically ``/usr/local/cuda``. 
+``<CUDA install path>`` is the root directory where you installed CUDA SDK, +typically ``/usr/local/cuda``. ``<GPU arch>`` is `the compute capability of +your GPU <https://developer.nvidia.com/cuda-gpus>`_. For example, if you want +to run your program on a GPU with compute capability of 3.5, you should specify +``--cuda-gpu-arch=sm_35``. + +Detecting clang vs NVCC +======================= + +Although clang's CUDA implementation is largely compatible with NVCC's, you may +still want to detect when you're compiling CUDA code specifically with clang. + +This is tricky, because NVCC may invoke clang as part of its own compilation +process! For example, NVCC uses the host compiler's preprocessor when +compiling for device code, and that host compiler may in fact be clang. + +When clang is actually compiling CUDA code -- rather than being used as a +subtool of NVCC's -- it defines the ``__CUDA__`` macro. ``__CUDA_ARCH__`` is +defined only in device mode (but will be defined if NVCC is using clang as a +preprocessor). So you can use the following incantations to detect clang CUDA +compilation, in host and device modes: + +.. code-block:: c++ + + #if defined(__clang__) && defined(__CUDA__) && !defined(__CUDA_ARCH__) + // clang compiling CUDA code, host mode. + #endif + + #if defined(__clang__) && defined(__CUDA__) && defined(__CUDA_ARCH__) + // clang compiling CUDA code, device mode. + #endif + +Both clang and nvcc define ``__CUDACC__`` during CUDA compilation. You can +detect NVCC specifically by looking for ``__NVCC__``. + +Flags that control numerical code +================================= + +If you're using GPUs, you probably care about making numerical code run fast. +GPU hardware allows for more control over numerical operations than most CPUs, +but this results in more compiler options for you to juggle. 
+ +Flags you may wish to tweak include: + +* ``-ffp-contract={on,off,fast}`` (defaults to ``fast`` on host and device when + compiling CUDA) Controls whether the compiler emits fused multiply-add + operations. + + * ``off``: never emit fma operations, and prevent ptxas from fusing multiply + and add instructions. + * ``on``: fuse multiplies and adds within a single statement, but never + across statements (C11 semantics). Prevent ptxas from fusing other + multiplies and adds. + * ``fast``: fuse multiplies and adds wherever profitable, even across + statements. Doesn't prevent ptxas from fusing additional multiplies and + adds. + + Fused multiply-add instructions can be much faster than the unfused + equivalents, but because the intermediate result in an fma is not rounded, + this flag can affect numerical code. + +* ``-fcuda-flush-denormals-to-zero`` (default: off) When this is enabled, + floating point operations may flush `denormal + <https://en.wikipedia.org/wiki/Denormal_number>`_ inputs and/or outputs to 0. + Operations on denormal numbers are often much slower than the same operations + on normal numbers. + +* ``-fcuda-approx-transcendentals`` (default: off) When this is enabled, the + compiler may emit calls to faster, approximate versions of transcendental + functions, instead of using the slower, fully IEEE-compliant versions. For + example, this flag allows clang to emit the ptx ``sin.approx.f32`` + instruction. + + This is implied by ``-ffast-math``. Optimizations ============= @@ -134,10 +207,9 @@ customizable target-independent optimization pipeline. straight-line scalar optimizations <https://goo.gl/4Rb9As>`_. * **Inferring memory spaces**. 
`This optimization
-  <http://www.llvm.org/docs/doxygen/html/NVPTXFavorNonGenericAddrSpaces_8cpp_source.html>`_
+  <https://github.com/llvm-mirror/llvm/blob/master/lib/Target/NVPTX/NVPTXInferAddressSpaces.cpp>`_
   infers the memory space of an address so that the backend can emit faster
-  special loads and stores from it. Details can be found in the `design
-  document for memory space inference <https://goo.gl/5wH2Ct>`_.
+  special loads and stores from it.
 
 * **Aggressive loop unrolling and function inlining**. Loop unrolling and
   function inlining need to be more aggressive for GPUs than for CPUs because
@@ -167,3 +239,22 @@ customizable target-independent optimization pipeline.
   32-bit ones on NVIDIA GPUs due to lack of a divide unit. Many of the 64-bit
   divides in our benchmarks have a divisor and dividend which fit in 32-bits at
   runtime. This optimization provides a fast path for this common case.
+
+Publication
+===========
+
+| `gpucc: An Open-Source GPGPU Compiler <http://dl.acm.org/citation.cfm?id=2854041>`_
+| Jingyue Wu, Artem Belevich, Eli Bendersky, Mark Heffernan, Chris Leary, Jacques Pienaar, Bjarke Roune, Rob Springer, Xuetian Weng, Robert Hundt
+| *Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO 2016)*
+| `Slides for the CGO talk <http://wujingyue.com/docs/gpucc-talk.pdf>`_
+
+Tutorial
+========
+
+`CGO 2016 gpucc tutorial <http://wujingyue.com/docs/gpucc-tutorial.pdf>`_
+
+Obtaining Help
+==============
+
+To obtain help on LLVM in general and its CUDA support, see `the LLVM
+community <http://llvm.org/docs/#mailing-lists>`_.
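The rounding effect behind ``-ffp-contract``, described in the flags section above, can be reproduced with plain host-side Python by emulating a fused multiply-add exactly with rational arithmetic. This is a sketch of IEEE binary64 rounding behaviour, not of anything GPU-specific; the operand values are chosen purely to expose the difference:

```python
from fractions import Fraction

def fma(a: float, b: float, c: float) -> float:
    """Fused multiply-add emulated exactly: a*b + c is computed as a
    rational and rounded once, as a hardware fma instruction does."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

x = 1.0 + 2.0**-27        # exactly representable in binary64
c = -(1.0 + 2.0**-26)     # cancels the *rounded* product exactly

unfused = x * x + c       # two roundings: the 2**-54 term is lost
fused = fma(x, x, c)      # one rounding: the term survives

assert unfused == 0.0
assert fused == 2.0**-54
```

This is why ``-ffp-contract=fast`` can change numerical results: the fused form carries the unrounded product into the addition, which is more accurate but is not bit-identical to what the unfused expression produces.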
diff --git a/gnu/llvm/docs/CompilerWriterInfo.rst b/gnu/llvm/docs/CompilerWriterInfo.rst
index 6c3ff4b10f1..5ae47ea89fe 100644
--- a/gnu/llvm/docs/CompilerWriterInfo.rst
+++ b/gnu/llvm/docs/CompilerWriterInfo.rst
@@ -13,25 +13,18 @@ Architecture & Platform Information for Compiler Writers
 Hardware
 ========
 
-ARM
----
+AArch64 & ARM
+-------------
 
-* `ARM documentation <http://www.arm.com/documentation/>`_ (`Processor Cores <http://www.arm.com/documentation/ARMProcessor_Cores/>`_ Cores)
+* `ARMv8-A Architecture Reference Manual <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.h/index.html>`_ (authentication required, free sign-up). This document covers both AArch64 and ARM instructions
 
-* `ABI <http://www.arm.com/products/DevTools/ABI.html>`_
+* `ARMv7-M Architecture Reference Manual <http://infocenter.arm.com/help/topic/com.arm.doc.ddi0403e.b/index.html>`_ (authentication required, free sign-up). This covers the Thumb2-only microcontrollers
 
-* `ABI Addenda and Errata <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0045d/IHI0045D_ABI_addenda.pdf>`_
+* `ARMv6-M Architecture Reference Manual <http://infocenter.arm.com/help/topic/com.arm.doc.ddi0419c/index.html>`_ (authentication required, free sign-up).
This covers the Thumb1-only microcontrollers * `ARM C Language Extensions <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf>`_ -AArch64 -------- - -* `ARMv8 Architecture Reference Manual <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.h/index.html>`_ - -* `ARMv8 Instruction Set Overview <http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.genc010197a/index.html>`_ - -* `ARM C Language Extensions <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf>`_ +* AArch32 `ABI Addenda and Errata <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0045d/IHI0045D_ABI_addenda.pdf>`_ Itanium (ia64) -------------- @@ -97,21 +90,10 @@ SystemZ X86 --- -AMD - Official manuals and docs -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - * `AMD processor manuals <http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739,00.html>`_ -* `X86-64 ABI <http://www.x86-64.org/documentation>`_ - -Intel - Official manuals and docs -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - * `Intel 64 and IA-32 manuals <http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html>`_ * `Intel Itanium documentation <http://www.intel.com/design/itanium/documentation.htm?iid=ipp_srvr_proc_itanium2+techdocs>`_ - -Other x86-specific information -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - +* `X86 and X86-64 SysV psABI <https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI>`_ * `Calling conventions for different C++ compilers and operating systems <http://www.agner.org/optimize/calling_conventions.pdf>`_ XCore @@ -134,6 +116,7 @@ ABI Linux ----- +* `Linux extensions to gabi <https://github.com/hjl-tools/linux-abi/wiki/Linux-Extensions-to-gABI>`_ * `PowerPC 64-bit ELF ABI Supplement <http://www.linuxbase.org/spec/ELF/ppc64/>`_ * `Procedure Call Standard for the AArch64 Architecture <http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055a/IHI0055A_aapcs64.pdf>`_ * `ELF for the ARM Architecture 
<http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044e/IHI0044E_aaelf.pdf>`_ diff --git a/gnu/llvm/docs/CoverageMappingFormat.rst b/gnu/llvm/docs/CoverageMappingFormat.rst index 84cddff5ed9..158255ab863 100644 --- a/gnu/llvm/docs/CoverageMappingFormat.rst +++ b/gnu/llvm/docs/CoverageMappingFormat.rst @@ -251,27 +251,40 @@ The coverage mapping variable generated by Clang has 3 fields: .. code-block:: llvm - @__llvm_coverage_mapping = internal constant { { i32, i32, i32, i32 }, [2 x { i8*, i32, i32 }], [40 x i8] } + @__llvm_coverage_mapping = internal constant { { i32, i32, i32, i32 }, [2 x { i64, i32, i64 }], [40 x i8] } { { i32, i32, i32, i32 } ; Coverage map header { i32 2, ; The number of function records i32 20, ; The length of the string that contains the encoded translation unit filenames i32 20, ; The length of the string that contains the encoded coverage mapping data - i32 0, ; Coverage mapping format version + i32 1, ; Coverage mapping format version }, - [2 x { i8*, i32, i32 }] [ ; Function records - { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_foo, i32 0, i32 0), ; Function's name - i32 3, ; Function's name length - i32 9 ; Function's encoded coverage mapping data string length + [2 x { i64, i32, i64 }] [ ; Function records + { i64, i32, i64 } { + i64 0x5cf8c24cdb18bdac, ; Function's name MD5 + i32 9, ; Function's encoded coverage mapping data string length + i64 0 ; Function's structural hash }, - { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_bar, i32 0, i32 0), ; Function's name - i32 3, ; Function's name length - i32 9 ; Function's encoded coverage mapping data string length + { i64, i32, i64 } { + i64 0xe413754a191db537, ; Function's name MD5 + i32 9, ; Function's encoded coverage mapping data string length + i64 0 ; Function's structural hash }], [40 x i8] c"..." 
; Encoded data (dissected later) }, section "__llvm_covmap", align 8 +The function record layout has evolved since version 1. In version 1, the function record for *foo* is defined as follows: + +.. code-block:: llvm + + { i8*, i32, i32, i64 } { i8* getelementptr inbounds ([3 x i8]* @__profn_foo, i32 0, i32 0), ; Function's name + i32 3, ; Function's name length + i32 9, ; Function's encoded coverage mapping data string length + i64 0 ; Function's structural hash + } + + Coverage Mapping Header: ------------------------ @@ -283,7 +296,7 @@ The coverage mapping header has the following fields: * The length of the string in the third field of *__llvm_coverage_mapping* that contains the encoded coverage mapping data. -* The format version. 0 is the first (current) version of the coverage mapping format. +* The format version. The current version is 2 (encoded as a 1). .. _function records: @@ -294,10 +307,10 @@ A function record is a structure of the following type: .. code-block:: llvm - { i8*, i32, i32 } + { i64, i32, i64 } -It contains the pointer to the function's name, function's name length, -and the length of the encoded mapping data for that function. +It contains function name's MD5, the length of the encoded mapping data for that function, and function's +structural hash value. Encoded data: ------------- @@ -417,7 +430,7 @@ and can appear after ``:`` in the ``[foo : type]`` description. LEB128 ^^^^^^ -LEB128 is an unsigned interger value that is encoded using DWARF's LEB128 +LEB128 is an unsigned integer value that is encoded using DWARF's LEB128 encoding, optimizing for the case where values are small (1 byte for values less than 128). diff --git a/gnu/llvm/docs/DeveloperPolicy.rst b/gnu/llvm/docs/DeveloperPolicy.rst index 17baf2d27b1..23bdb2fcf17 100644 --- a/gnu/llvm/docs/DeveloperPolicy.rst +++ b/gnu/llvm/docs/DeveloperPolicy.rst @@ -186,7 +186,7 @@ problem, we have a notion of an 'owner' for a piece of the code. 
The sole responsibility of a code owner is to ensure that a commit to their area of the code is appropriately reviewed, either by themself or by someone else. The list of current code owners can be found in the file -`CODE_OWNERS.TXT <http://llvm.org/viewvc/llvm-project/llvm/trunk/CODE_OWNERS.TXT?view=markup>`_ +`CODE_OWNERS.TXT <http://llvm.org/klaus/llvm/blob/master/CODE_OWNERS.TXT>`_ in the root of the LLVM source tree. Note that code ownership is completely different than reviewers: anyone can @@ -338,7 +338,7 @@ Obtaining Commit Access We grant commit access to contributors with a track record of submitting high quality patches. If you would like commit access, please send an email to -`Chris <mailto:sabre@nondot.org>`_ with the following information: +`Chris <mailto:clattner@llvm.org>`_ with the following information: #. The user name you want to commit with, e.g. "hacker". @@ -348,8 +348,10 @@ quality patches. If you would like commit access, please send an email to #. A "password hash" of the password you want to use, e.g. "``2ACR96qjUqsyM``". Note that you don't ever tell us what your password is; you just give it to us in an encrypted form. To get this, run "``htpasswd``" (a utility that - comes with apache) in crypt mode (often enabled with "``-d``"), or find a web - page that will do it for you. + comes with apache) in *crypt* mode (often enabled with "``-d``"), or find a web + page that will do it for you. Note that our system does not work with MD5 + hashes. These are significantly longer than a crypt hash - e.g. + "``$apr1$vea6bBV2$Z8IFx.AfeD8LhqlZFqJer0``", we only accept the shorter crypt hash. Once you've been granted commit access, you should be able to check out an LLVM tree with an SVN URL of "https://username@llvm.org/..." 
instead of the normal diff --git a/gnu/llvm/docs/ExceptionHandling.rst b/gnu/llvm/docs/ExceptionHandling.rst index 41dd4b606b1..a44fb92794c 100644 --- a/gnu/llvm/docs/ExceptionHandling.rst +++ b/gnu/llvm/docs/ExceptionHandling.rst @@ -406,7 +406,7 @@ outlined. After the handler is outlined, this intrinsic is simply removed. ``llvm.eh.exceptionpointer`` ---------------------------- -.. code-block:: llvm +.. code-block:: text i8 addrspace(N)* @llvm.eh.padparam.pNi8(token %catchpad) @@ -427,7 +427,7 @@ backend. Uses of them are generated by the backend's ``llvm.eh.sjlj.setjmp`` ~~~~~~~~~~~~~~~~~~~~~~~ -.. code-block:: llvm +.. code-block:: text i32 @llvm.eh.sjlj.setjmp(i8* %setjmp_buf) @@ -664,7 +664,7 @@ all of the new IR instructions: return 0; } -.. code-block:: llvm +.. code-block:: text define i32 @f() nounwind personality i32 (...)* @__CxxFrameHandler3 { entry: @@ -741,7 +741,7 @@ C++ code: } } -.. code-block:: llvm +.. code-block:: text define void @f() #0 personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*) { entry: diff --git a/gnu/llvm/docs/Extensions.rst b/gnu/llvm/docs/Extensions.rst index c8ff07c2b0c..f7029215c19 100644 --- a/gnu/llvm/docs/Extensions.rst +++ b/gnu/llvm/docs/Extensions.rst @@ -43,7 +43,7 @@ The following additional relocation types are supported: corresponds to the COFF relocation types ``IMAGE_REL_I386_DIR32NB`` (32-bit) or ``IMAGE_REL_AMD64_ADDR32NB`` (64-bit). -.. code-block:: gas +.. code-block:: text .text fun: diff --git a/gnu/llvm/docs/FAQ.rst b/gnu/llvm/docs/FAQ.rst index 0559a1ff215..0ab99f3452a 100644 --- a/gnu/llvm/docs/FAQ.rst +++ b/gnu/llvm/docs/FAQ.rst @@ -75,149 +75,17 @@ reference. In fact, the names of dummy numbered temporaries like ``%1`` are not explicitly represented in the in-memory representation at all (see ``Value::getName()``). -Build Problems -============== - -When I run configure, it finds the wrong C compiler. 
----------------------------------------------------- -The ``configure`` script attempts to locate first ``gcc`` and then ``cc``, -unless it finds compiler paths set in ``CC`` and ``CXX`` for the C and C++ -compiler, respectively. - -If ``configure`` finds the wrong compiler, either adjust your ``PATH`` -environment variable or set ``CC`` and ``CXX`` explicitly. - - -The ``configure`` script finds the right C compiler, but it uses the LLVM tools from a previous build. What do I do? ---------------------------------------------------------------------------------------------------------------------- -The ``configure`` script uses the ``PATH`` to find executables, so if it's -grabbing the wrong linker/assembler/etc, there are two ways to fix it: - -#. Adjust your ``PATH`` environment variable so that the correct program - appears first in the ``PATH``. This may work, but may not be convenient - when you want them *first* in your path for other work. - -#. Run ``configure`` with an alternative ``PATH`` that is correct. In a - Bourne compatible shell, the syntax would be: - -.. code-block:: console - - % PATH=[the path without the bad program] $LLVM_SRC_DIR/configure ... - -This is still somewhat inconvenient, but it allows ``configure`` to do its -work without having to adjust your ``PATH`` permanently. - - -When creating a dynamic library, I get a strange GLIBC error. -------------------------------------------------------------- -Under some operating systems (i.e. Linux), libtool does not work correctly if -GCC was compiled with the ``--disable-shared option``. To work around this, -install your own version of GCC that has shared libraries enabled by default. - - -I've updated my source tree from Subversion, and now my build is trying to use a file/directory that doesn't exist. -------------------------------------------------------------------------------------------------------------------- -You need to re-run configure in your object directory. 
When new Makefiles -are added to the source tree, they have to be copied over to the object tree -in order to be used by the build. - - -I've modified a Makefile in my source tree, but my build tree keeps using the old version. What do I do? ---------------------------------------------------------------------------------------------------------- -If the Makefile already exists in your object tree, you can just run the -following command in the top level directory of your object tree: - -.. code-block:: console - - % ./config.status <relative path to Makefile>; - -If the Makefile is new, you will have to modify the configure script to copy -it over. - - -I've upgraded to a new version of LLVM, and I get strange build errors. ------------------------------------------------------------------------ -Sometimes, changes to the LLVM source code alters how the build system works. -Changes in ``libtool``, ``autoconf``, or header file dependencies are -especially prone to this sort of problem. - -The best thing to try is to remove the old files and re-build. In most cases, -this takes care of the problem. To do this, just type ``make clean`` and then -``make`` in the directory that fails to build. - - -I've built LLVM and am testing it, but the tests freeze. --------------------------------------------------------- -This is most likely occurring because you built a profile or release -(optimized) build of LLVM and have not specified the same information on the -``gmake`` command line. - -For example, if you built LLVM with the command: - -.. code-block:: console - - % gmake ENABLE_PROFILING=1 - -...then you must run the tests with the following commands: - -.. code-block:: console - - % cd llvm/test - % gmake ENABLE_PROFILING=1 - -Why do test results differ when I perform different types of builds? --------------------------------------------------------------------- -The LLVM test suite is dependent upon several features of the LLVM tools and -libraries. 
- -First, the debugging assertions in code are not enabled in optimized or -profiling builds. Hence, tests that used to fail may pass. - -Second, some tests may rely upon debugging options or behavior that is only -available in the debug build. These tests will fail in an optimized or -profile build. - - -Compiling LLVM with GCC 3.3.2 fails, what should I do? ------------------------------------------------------- -This is `a bug in GCC <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=13392>`_, -and affects projects other than LLVM. Try upgrading or downgrading your GCC. - - -After Subversion update, rebuilding gives the error "No rule to make target". ------------------------------------------------------------------------------ -If the error is of the form: - -.. code-block:: console - - gmake[2]: *** No rule to make target `/path/to/somefile', - needed by `/path/to/another/file.d'. - Stop. - -This may occur anytime files are moved within the Subversion repository or -removed entirely. In this case, the best solution is to erase all ``.d`` -files, which list dependencies for source files, and rebuild: - -.. code-block:: console - - % cd $LLVM_OBJ_DIR - % rm -f `find . -name \*\.d` - % gmake - -In other cases, it may be necessary to run ``make clean`` before rebuilding. - Source Languages ================ What source languages are supported? ------------------------------------ -LLVM currently has full support for C and C++ source languages. These are -available through both `Clang <http://clang.llvm.org/>`_ and `DragonEgg -<http://dragonegg.llvm.org/>`_. -The PyPy developers are working on integrating LLVM into the PyPy backend so -that PyPy language can translate to LLVM. +LLVM currently has full support for C and C++ source languages through +`Clang <http://clang.llvm.org/>`_. Many other language frontends have +been written using LLVM, and an incomplete list is available at +`projects with LLVM <http://llvm.org/ProjectsWithLLVM/>`_. 
I'd like to write a self-hosting LLVM compiler. How should I interface with the LLVM middle-end optimizers and back-end code generators? diff --git a/gnu/llvm/docs/GarbageCollection.rst b/gnu/llvm/docs/GarbageCollection.rst index 56b4b9f8f95..81605bc2095 100644 --- a/gnu/llvm/docs/GarbageCollection.rst +++ b/gnu/llvm/docs/GarbageCollection.rst @@ -204,7 +204,7 @@ IR features is specified by the selected :ref:`GC strategy description Specifying GC code generation: ``gc "..."`` ------------------------------------------- -.. code-block:: llvm +.. code-block:: text define <returntype> @name(...) gc "name" { ... } diff --git a/gnu/llvm/docs/GetElementPtr.rst b/gnu/llvm/docs/GetElementPtr.rst index c9cfae64ace..f39f1d9207a 100644 --- a/gnu/llvm/docs/GetElementPtr.rst +++ b/gnu/llvm/docs/GetElementPtr.rst @@ -105,7 +105,7 @@ memory, or a global variable. To make this clear, let's consider a more obtuse example: -.. code-block:: llvm +.. code-block:: text %MyVar = uninitialized global i32 ... @@ -142,7 +142,7 @@ Quick answer: there are no superfluous indices. This question arises most often when the GEP instruction is applied to a global variable which is always a pointer type. For example, consider this: -.. code-block:: llvm +.. code-block:: text %MyStruct = uninitialized global { float*, i32 } ... @@ -178,7 +178,7 @@ The GetElementPtr instruction dereferences nothing. That is, it doesn't access memory in any way. That's what the Load and Store instructions are for. GEP is only involved in the computation of addresses. For example, consider this: -.. code-block:: llvm +.. code-block:: text %MyVar = uninitialized global { [40 x i32 ]* } ... @@ -195,7 +195,7 @@ illegal. In order to access the 18th integer in the array, you would need to do the following: -.. code-block:: llvm +.. 
code-block:: text %idx = getelementptr { [40 x i32]* }, { [40 x i32]* }* %, i64 0, i32 0 %arr = load [40 x i32]** %idx @@ -204,7 +204,7 @@ following: In this case, we have to load the pointer in the structure with a load instruction before we can index into the array. If the example was changed to: -.. code-block:: llvm +.. code-block:: text %MyVar = uninitialized global { [40 x i32 ] } ... diff --git a/gnu/llvm/docs/GettingStarted.rst b/gnu/llvm/docs/GettingStarted.rst index 6aba5003679..54240b92b6a 100644 --- a/gnu/llvm/docs/GettingStarted.rst +++ b/gnu/llvm/docs/GettingStarted.rst @@ -38,6 +38,9 @@ Here's the short story for getting up and running quickly with LLVM: #. Read the documentation. #. Read the documentation. #. Remember that you were warned twice about reading the documentation. + + * In particular, the *relative paths specified are important*. + #. Checkout LLVM: * ``cd where-you-want-llvm-to-live`` @@ -49,13 +52,13 @@ Here's the short story for getting up and running quickly with LLVM: * ``cd llvm/tools`` * ``svn co http://llvm.org/svn/llvm-project/cfe/trunk clang`` -#. Checkout Compiler-RT (required to build the sanitizers): +#. Checkout Compiler-RT (required to build the sanitizers) **[Optional]**: * ``cd where-you-want-llvm-to-live`` * ``cd llvm/projects`` * ``svn co http://llvm.org/svn/llvm-project/compiler-rt/trunk compiler-rt`` -#. Checkout Libomp (required for OpenMP support): +#. Checkout Libomp (required for OpenMP support) **[Optional]**: * ``cd where-you-want-llvm-to-live`` * ``cd llvm/projects`` @@ -76,10 +79,15 @@ Here's the short story for getting up and running quickly with LLVM: #. Configure and build LLVM and Clang: - The usual build uses `CMake <CMake.html>`_. If you would rather use - autotools, see `Building LLVM with autotools <BuildingLLVMWithAutotools.html>`_. - Although the build is known to work with CMake >= 2.8.8, we recommend CMake - >= v3.2, especially if you're generating Ninja build files. 
+ *Warning:* Make sure you've checked out *all of* the source code + before trying to configure with cmake. cmake does not pick up newly + added source directories in incremental builds. + + The build uses `CMake <CMake.html>`_. LLVM requires CMake 3.4.3 to build. It + is generally recommended to use a recent CMake, especially if you're + generating Ninja build files. This is because the CMake project is constantly + improving the quality of the generators, and the Ninja generator gets a lot + of attention. * ``cd where you want to build llvm`` * ``mkdir build`` @@ -89,10 +97,10 @@ Here's the short story for getting up and running quickly with LLVM: Some common generators are: * ``Unix Makefiles`` --- for generating make-compatible parallel makefiles. - * ``Ninja`` --- for generating `Ninja <http://martine.github.io/ninja/>` - build files. Most llvm developers use Ninja. + * ``Ninja`` --- for generating `Ninja <https://ninja-build.org>`_ + build files. Most llvm developers use Ninja. * ``Visual Studio`` --- for generating Visual Studio projects and - solutions. + solutions. * ``Xcode`` --- for generating Xcode projects. Some Common options: @@ -117,15 +125,17 @@ Here's the short story for getting up and running quickly with LLVM: * CMake will generate build targets for each tool and library, and most LLVM sub-projects generate their own ``check-<project>`` target. + * Running a serial build will be *slow*. Make sure you run a + parallel build; for ``make``, use ``make -j``. + + * For more information see `CMake <CMake.html>`_ + * If you get an "internal compiler error (ICE)" or test failures, see `below`_. Consult the `Getting Started with LLVM`_ section for detailed information on -configuring and compiling LLVM.
Go to `Directory Layout`_ to learn about the +layout of the source code tree. Requirements ============ @@ -161,16 +171,17 @@ Windows x64 x86-64 Visual Studio #. Code generation supported for Pentium processors and up #. Code generation supported for 32-bit ABI only #. To use LLVM modules on Win32-based system, you may configure LLVM - with ``-DBUILD_SHARED_LIBS=On`` for CMake builds or ``--enable-shared`` - for configure builds. + with ``-DBUILD_SHARED_LIBS=On``. #. MCJIT not working well pre-v7, old JIT engine not supported any more. -Note that you will need about 1-3 GB of space for a full LLVM build in Debug -mode, depending on the system (it is so large because of all the debugging -information and the fact that the libraries are statically linked into multiple -tools). If you do not need many of the tools and you are space-conscious, you -can pass ``ONLY_TOOLS="tools you need"`` to make. The Release build requires -considerably less space. +Note that Debug builds require a lot of time and disk space. An LLVM-only build +will need about 1-3 GB of space. A full build of LLVM and Clang will need around +15-20 GB of disk space. The exact space requirements will vary by system. (It +is so large because of all the debugging information and the fact that the +libraries are statically linked into multiple tools). + +If you are space-constrained, you can build only selected tools or only +selected targets. The Release build requires considerably less space. The LLVM suite *may* compile on other platforms, but it is not guaranteed to do so.
If compilation is successful, the LLVM utilities should be able to @@ -193,11 +204,7 @@ Package Version Notes `GNU Make <http://savannah.gnu.org/projects/make>`_ 3.79, 3.79.1 Makefile/build processor `GCC <http://gcc.gnu.org/>`_ >=4.7.0 C/C++ compiler\ :sup:`1` `python <http://www.python.org/>`_ >=2.7 Automated test suite\ :sup:`2` -`GNU M4 <http://savannah.gnu.org/projects/m4>`_ 1.4 Macro processor for configuration\ :sup:`3` -`GNU Autoconf <http://www.gnu.org/software/autoconf/>`_ 2.60 Configuration script builder\ :sup:`3` -`GNU Automake <http://www.gnu.org/software/automake/>`_ 1.9.6 aclocal macro generator\ :sup:`3` -`libtool <http://savannah.gnu.org/projects/libtool>`_ 1.5.22 Shared library manager\ :sup:`3` -`zlib <http://zlib.net>`_ >=1.2.3.4 Compression library\ :sup:`4` +`zlib <http://zlib.net>`_ >=1.2.3.4 Compression library\ :sup:`3` =========================================================== ============ ========================================== .. note:: @@ -207,9 +214,6 @@ Package Version Notes info. #. Only needed if you want to run the automated test suite in the ``llvm/test`` directory. - #. If you want to make changes to the configure scripts, you will need GNU - autoconf (2.60), and consequently, GNU M4 (version 1.4 or higher). You - will also need automake (1.9.6). We only use aclocal from that package. #. Optional, adds compression / uncompression capabilities to selected LLVM tools. @@ -421,22 +425,6 @@ appropriate pathname on your local system. All these paths are absolute: object files and compiled programs will be placed. It can be the same as SRC_ROOT). -.. _Setting Up Your Environment: - -Setting Up Your Environment ---------------------------- - -In order to compile and use LLVM, you may need to set some environment -variables. - -``LLVM_LIB_SEARCH_PATH=/path/to/your/bitcode/libs`` - - [Optional] This environment variable helps LLVM linking tools find the - locations of your bitcode libraries. 
It is provided only as a convenience - since you can specify the paths using the -L options of the tools and the - C/C++ front-end will automatically use the bitcode files installed in its - ``lib`` directory. - Unpacking the LLVM Archives --------------------------- @@ -513,8 +501,7 @@ get it from the Subversion repository: % svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite By placing it in the ``llvm/projects``, it will be automatically configured by -the LLVM configure script as well as automatically updated when you run ``svn -update``. +the LLVM cmake configuration. Git Mirror ---------- @@ -628,6 +615,8 @@ Then, your .git/config should have [imap] sections. ; example for Traditional Chinese folder = "[Gmail]/&g0l6Pw-" +.. _developers-work-with-git-svn: + For developers to work with git-svn ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -711,9 +700,8 @@ Local LLVM Configuration ------------------------ Once checked out from the Subversion repository, the LLVM suite source code must -be configured before being built. For instructions using autotools please see -`Building LLVM With Autotools <BuildingLLVMWithAutotools.html>`_. The -recommended process uses CMake. Unlinke the normal ``configure`` script, CMake +be configured before being built. This process uses CMake. +Unlike the normal ``configure`` script, CMake generates the build files in whatever format you request as well as various ``*.inc`` files, and ``llvm/include/Config/config.h``. @@ -744,9 +732,9 @@ used by people developing LLVM. | | the configure script. The default list is defined | | | as ``LLVM_ALL_TARGETS``, and can be set to include | | | out-of-tree targets. The default value includes: | -| | ``AArch64, AMDGPU, ARM, BPF, CppBackend, Hexagon, | -| | Mips, MSP430, NVPTX, PowerPC, Sparc, SystemZ | -| | X86, XCore``. | +| | ``AArch64, AMDGPU, ARM, BPF, Hexagon, Mips, | +| | MSP430, NVPTX, PowerPC, Sparc, SystemZ, X86, | +| | XCore``.
| +-------------------------+----------------------------------------------------+ | LLVM_ENABLE_DOXYGEN | Build doxygen-based documentation from the source | | | code This is disabled by default because it is | @@ -888,8 +876,6 @@ The LLVM build system is capable of sharing a single LLVM source tree among several LLVM builds. Hence, it is possible to build LLVM for several different platforms or configurations using the same source tree. -This is accomplished in the typical autoconf manner: - * Change directory to where the LLVM object files should live: .. code-block:: console @@ -942,40 +928,38 @@ use this command instead of the 'echo' command above: .. _Program Layout: .. _general layout: -Program Layout -============== +Directory Layout +================ One useful source of information about the LLVM source base is the LLVM `doxygen -<http://www.doxygen.org/>`_ documentation available at +<http://www.doxygen.org/>`_ documentation available at `<http://llvm.org/doxygen/>`_. The following is a brief introduction to code layout: ``llvm/examples`` ----------------- -This directory contains some simple examples of how to use the LLVM IR and JIT. +Simple examples using the LLVM IR and JIT. ``llvm/include`` ---------------- -This directory contains public header files exported from the LLVM library. The -three main subdirectories of this directory are: +Public header files exported from the LLVM library. The three main subdirectories: ``llvm/include/llvm`` - This directory contains all of the LLVM specific header files. This directory - also has subdirectories for different portions of LLVM: ``Analysis``, - ``CodeGen``, ``Target``, ``Transforms``, etc... + All LLVM-specific header files, and subdirectories for different portions of + LLVM: ``Analysis``, ``CodeGen``, ``Target``, ``Transforms``, etc... ``llvm/include/llvm/Support`` - This directory contains generic support libraries that are provided with LLVM - but not necessarily specific to LLVM. 
For example, some C++ STL utilities and - a Command Line option processing library store their header files here. + Generic support libraries provided with LLVM but not necessarily specific to + LLVM. For example, some C++ STL utilities and a Command Line option processing + library store header files here. ``llvm/include/llvm/Config`` - This directory contains header files configured by the ``configure`` script. + Header files configured by the ``configure`` script. They wrap "standard" UNIX and C header files. Source code can include these header files which automatically take care of the conditional #includes that the ``configure`` script generates. @@ -983,103 +967,76 @@ three main subdirectories of this directory are: ``llvm/lib`` ------------ -This directory contains most of the source files of the LLVM system. In LLVM, -almost all code exists in libraries, making it very easy to share code among the -different `tools`_. +Most source files are here. By putting code in libraries, LLVM makes it easy to +share code among the `tools`_. ``llvm/lib/IR/`` - This directory holds the core LLVM source files that implement core classes - like Instruction and BasicBlock. + Core LLVM source files that implement core classes like Instruction and + BasicBlock. ``llvm/lib/AsmParser/`` - This directory holds the source code for the LLVM assembly language parser - library. + Source code for the LLVM assembly language parser library. ``llvm/lib/Bitcode/`` - This directory holds code for reading and write LLVM bitcode. + Code for reading and writing bitcode. ``llvm/lib/Analysis/`` - This directory contains a variety of different program analyses, such as - Dominator Information, Call Graphs, Induction Variables, Interval - Identification, Natural Loop Identification, etc. + A variety of program analyses, such as Call Graphs, Induction Variables, + Natural Loop Identification, etc. 
``llvm/lib/Transforms/`` - This directory contains the source code for the LLVM to LLVM program - transformations, such as Aggressive Dead Code Elimination, Sparse Conditional - Constant Propagation, Inlining, Loop Invariant Code Motion, Dead Global - Elimination, and many others. + IR-to-IR program transformations, such as Aggressive Dead Code Elimination, + Sparse Conditional Constant Propagation, Inlining, Loop Invariant Code Motion, + Dead Global Elimination, and many others. ``llvm/lib/Target/`` - This directory contains files that describe various target architectures for - code generation. For example, the ``llvm/lib/Target/X86`` directory holds the - X86 machine description while ``llvm/lib/Target/ARM`` implements the ARM - backend. + Files describing target architectures for code generation. For example, + ``llvm/lib/Target/X86`` holds the X86 machine description. ``llvm/lib/CodeGen/`` - This directory contains the major parts of the code generator: Instruction - Selector, Instruction Scheduling, and Register Allocation. + The major parts of the code generator: Instruction Selector, Instruction + Scheduling, and Register Allocation. ``llvm/lib/MC/`` - (FIXME: T.B.D.) - -``llvm/lib/Debugger/`` - - This directory contains the source level debugger library that makes it - possible to instrument LLVM programs so that a debugger could identify source - code locations at which the program is executing. + (FIXME: T.B.D.) ....? ``llvm/lib/ExecutionEngine/`` - This directory contains libraries for executing LLVM bitcode directly at - runtime in both interpreted and JIT compiled fashions. + Libraries for directly executing bitcode at runtime in interpreted and + JIT-compiled scenarios. ``llvm/lib/Support/`` - This directory contains the source code that corresponds to the header files - located in ``llvm/include/ADT/`` and ``llvm/include/Support/``. + Source code corresponding to the header files in ``llvm/include/ADT/`` + and ``llvm/include/Support/``.
``llvm/projects`` ----------------- -This directory contains projects that are not strictly part of LLVM but are -shipped with LLVM. This is also the directory where you should create your own -LLVM-based projects. - -``llvm/runtime`` ----------------- - -This directory contains libraries which are compiled into LLVM bitcode and used -when linking programs with the Clang front end. Most of these libraries are -skeleton versions of real libraries; for example, libc is a stripped down -version of glibc. - -Unlike the rest of the LLVM suite, this directory needs the LLVM GCC front end -to compile. +Projects not strictly part of LLVM but shipped with LLVM. This is also the +directory for creating your own LLVM-based projects which leverage the LLVM +build system. ``llvm/test`` ------------- -This directory contains feature and regression tests and other basic sanity -checks on the LLVM infrastructure. These are intended to run quickly and cover a -lot of territory without being exhaustive. +Feature and regression tests and other sanity checks on LLVM infrastructure. These +are intended to run quickly and cover a lot of territory without being exhaustive. ``test-suite`` -------------- -This is not a directory in the normal llvm module; it is a separate Subversion -module that must be checked out (usually to ``projects/test-suite``). This -module contains a comprehensive correctness, performance, and benchmarking test -suite for LLVM. It is a separate Subversion module because not every LLVM user -is interested in downloading or building such a comprehensive test suite. For -further details on this test suite, please see the :doc:`Testing Guide +A comprehensive correctness, performance, and benchmarking test suite for LLVM. +Comes in a separate Subversion module because not every LLVM user is interested +in such a comprehensive suite. For details see the :doc:`Testing Guide <TestingGuide>` document. .. 
_tools: @@ -1087,7 +1044,7 @@ further details on this test suite, please see the :doc:`Testing Guide ``llvm/tools`` -------------- -The **tools** directory contains the executables built out of the libraries +Executables built out of the libraries above, which form the main part of the user interface. You can always get help for a tool by typing ``tool_name -help``. The following is a brief introduction to the most important tools. More detailed information is in @@ -1135,72 +1092,67 @@ the `Command Guide <CommandGuide/index.html>`_. ``opt`` ``opt`` reads LLVM bitcode, applies a series of LLVM to LLVM transformations - (which are specified on the command line), and then outputs the resultant - bitcode. The '``opt -help``' command is a good way to get a list of the + (which are specified on the command line), and outputs the resultant + bitcode. '``opt -help``' is a good way to get a list of the program transformations available in LLVM. - ``opt`` can also be used to run a specific analysis on an input LLVM bitcode - file and print out the results. It is primarily useful for debugging + ``opt`` can also run a specific analysis on an input LLVM bitcode + file and print the results. Primarily useful for debugging analyses, or familiarizing yourself with what an analysis does. ``llvm/utils`` -------------- -This directory contains utilities for working with LLVM source code, and some of -the utilities are actually required as part of the build process because they -are code generators for parts of LLVM infrastructure. +Utilities for working with LLVM source code; some are part of the build process +because they are code generators for parts of the infrastructure. ``codegen-diff`` - ``codegen-diff`` is a script that finds differences between code that LLC - generates and code that LLI generates. This is a useful tool if you are + ``codegen-diff`` finds differences between code that LLC + generates and code that LLI generates. 
This is useful if you are debugging one of them, assuming that the other generates correct output. For the full user manual, run ```perldoc codegen-diff'``. ``emacs/`` - The ``emacs`` directory contains syntax-highlighting files which will work - with Emacs and XEmacs editors, providing syntax highlighting support for LLVM - assembly files and TableGen description files. For information on how to use - the syntax files, consult the ``README`` file in that directory. + Emacs and XEmacs syntax highlighting for LLVM assembly files and TableGen + description files. See the ``README`` for information on using them. ``getsrcs.sh`` - The ``getsrcs.sh`` script finds and outputs all non-generated source files, - which is useful if one wishes to do a lot of development across directories - and does not want to individually find each file. One way to use it is to run, - for example: ``xemacs `utils/getsources.sh``` from the top of your LLVM source + Finds and outputs all non-generated source files, + useful if one wishes to do a lot of development across directories + and does not want to find each file. One way to use it is to run, + for example: ``xemacs `utils/getsources.sh``` from the top of the LLVM source tree. ``llvmgrep`` - This little tool performs an ``egrep -H -n`` on each source file in LLVM and + Performs an ``egrep -H -n`` on each source file in LLVM and passes to it a regular expression provided on ``llvmgrep``'s command - line. This is a very efficient way of searching the source base for a + line. This is an efficient way of searching the source base for a particular regular expression. ``makellvm`` - The ``makellvm`` script compiles all files in the current directory and then + Compiles all files in the current directory, then compiles and links the tool that is the first argument. 
For example, assuming - you are in the directory ``llvm/lib/Target/Sparc``, if ``makellvm`` is in your - path, simply running ``makellvm llc`` will make a build of the current + you are in ``llvm/lib/Target/Sparc``, if ``makellvm`` is in your + path, running ``makellvm llc`` will make a build of the current directory, switch to directory ``llvm/tools/llc`` and build it, causing a re-linking of LLC. ``TableGen/`` - The ``TableGen`` directory contains the tool used to generate register + Contains the tool used to generate register descriptions, instruction set descriptions, and even assemblers from common TableGen description files. ``vim/`` - The ``vim`` directory contains syntax-highlighting files which will work with - the VIM editor, providing syntax highlighting support for LLVM assembly files - and TableGen description files. For information on how to use the syntax - files, consult the ``README`` file in that directory. + vim syntax-highlighting for LLVM assembly files + and TableGen description files. See the ``README`` for how to use them. .. _simple example: diff --git a/gnu/llvm/docs/GettingStartedVS.rst b/gnu/llvm/docs/GettingStartedVS.rst index 0ca50904ce4..57ed875ca4f 100644 --- a/gnu/llvm/docs/GettingStartedVS.rst +++ b/gnu/llvm/docs/GettingStartedVS.rst @@ -45,10 +45,12 @@ approximately 3GB. Software -------- -You will need Visual Studio 2013 or higher. +You will need Visual Studio 2013 or higher, with the latest Update installed. You will also need the `CMake <http://www.cmake.org/>`_ build system since it -generates the project files you will use to build with. +generates the project files you will use to build with. CMake 2.8.12.2 is the +minimum required version for building with Visual Studio, though the latest +version of CMake is recommended. If you would like to run the LLVM tests you will need `Python <http://www.python.org/>`_. Version 2.7 and newer are known to work. 
You will @@ -91,6 +93,10 @@ Here's the short story for getting up and running quickly with LLVM: using LLVM. Another important option is ``LLVM_TARGETS_TO_BUILD``, which controls the LLVM target architectures that are included on the build. + * If CMake complains that it cannot find the compiler, make sure that + you have the Visual Studio C++ Tools installed, not just Visual Studio + itself (trying to create a C++ project in Visual Studio will generally + download the C++ tools if they haven't already been). * See the :doc:`LLVM CMake guide <CMake>` for detailed information about how to configure the LLVM build. * CMake generates project files for all build types. To select a specific diff --git a/gnu/llvm/docs/GoldPlugin.rst b/gnu/llvm/docs/GoldPlugin.rst index 6328934b37b..88b944a2a0f 100644 --- a/gnu/llvm/docs/GoldPlugin.rst +++ b/gnu/llvm/docs/GoldPlugin.rst @@ -44,9 +44,7 @@ will either need to build gold or install a version with plugin support. the ``-plugin`` option. Running ``make`` will additionally build ``build/binutils/ar`` and ``nm-new`` binaries supporting plugins. -* Build the LLVMgold plugin. If building with autotools, run configure with - ``--with-binutils-include=/path/to/binutils/include`` and run ``make``. - If building with CMake, run cmake with +* Build the LLVMgold plugin. Run CMake with ``-DLLVM_BINUTILS_INCDIR=/path/to/binutils/include``. The correct include path will contain the file ``plugin-api.h``. diff --git a/gnu/llvm/docs/HowToCrossCompileLLVM.rst b/gnu/llvm/docs/HowToCrossCompileLLVM.rst index 1072517e4c2..e71c0b07a7a 100644 --- a/gnu/llvm/docs/HowToCrossCompileLLVM.rst +++ b/gnu/llvm/docs/HowToCrossCompileLLVM.rst @@ -39,6 +39,7 @@ For more information on how to configure CMake for LLVM/Clang, see :doc:`CMake`. 
The CMake options you need to add are: + * ``-DCMAKE_CROSSCOMPILING=True`` * ``-DCMAKE_INSTALL_PREFIX=<install-dir>`` * ``-DLLVM_TABLEGEN=<path-to-host-bin>/llvm-tblgen`` @@ -46,20 +47,40 @@ The CMake options you need to add are: * ``-DLLVM_DEFAULT_TARGET_TRIPLE=arm-linux-gnueabihf`` * ``-DLLVM_TARGET_ARCH=ARM`` * ``-DLLVM_TARGETS_TO_BUILD=ARM`` - * ``-DCMAKE_CXX_FLAGS='-target armv7a-linux-gnueabihf -mcpu=cortex-a9 -I/usr/arm-linux-gnueabihf/include/c++/4.7.2/arm-linux-gnueabihf/ -I/usr/arm-linux-gnueabihf/include/ -mfloat-abi=hard -ccc-gcc-name arm-linux-gnueabihf-gcc'`` + +If you're compiling with GCC, you can use architecture options for your target, +and the compiler driver will detect everything that it needs: + + * ``-DCMAKE_CXX_FLAGS='-march=armv7-a -mcpu=cortex-a9 -mfloat-abi=hard'`` + +However, if you're using Clang, the driver might not be up-to-date with your +specific Linux distribution, version or GCC layout, so you'll need to fudge. + +In addition to the ones above, you'll also need: + + * ``'-target arm-linux-gnueabihf'`` or whatever is the triple of your cross GCC. + * ``'--sysroot=/usr/arm-linux-gnueabihf'``, ``'--sysroot=/opt/gcc/arm-linux-gnueabihf'`` + or whatever is the location of your GCC's sysroot (where /lib, /bin etc are). + * Appropriate use of ``-I`` and ``-L``, depending on how the cross GCC is installed, + and where are the libraries and headers. The TableGen options are required to compile it with the host compiler, so you'll need to compile LLVM (or at least ``llvm-tblgen``) to your host -platform before you start. The CXX flags define the target, cpu (which +platform before you start. The CXX flags define the target, cpu (which in this case defaults to ``fpu=VFP3`` with NEON), and forcing the hard-float ABI. If you're -using Clang as a cross-compiler, you will *also* have to set ``-ccc-gcc-name``, +using Clang as a cross-compiler, you will *also* have to set ``--sysroot`` to make sure it picks the correct linker. 
+When using Clang, it's important that you choose the triple to be *identical* +to the GCC triple and the sysroot. This will make it easier for Clang to +find the correct tools and include headers. But that won't mean all headers and +libraries will be found. You'll still need to use ``-I`` and ``-L`` to locate +those extra ones, depending on your distribution. + Most of the time, what you want is to have a native compiler to the -platform itself, but not others. It might not even be feasible to -produce x86 binaries from ARM targets, so there's no point in compiling +platform itself, but not others. So there's rarely a point in compiling all back-ends. For that reason, you should also set the -``TARGETS_TO_BUILD`` to only build the ARM back-end. +``TARGETS_TO_BUILD`` to only build the back-end you're targeting. You must set the ``CMAKE_INSTALL_PREFIX``, otherwise a ``ninja install`` will copy ARM binaries to your root filesystem, which is not what you @@ -83,14 +104,23 @@ running CMake: This is not a problem, since Clang/LLVM libraries are statically linked anyway, it shouldn't affect much. -#. The ARM libraries won't be installed in your system, and possibly - not easily installable anyway, so you'll have to build/download - them separately. But the CMake prepare step, which checks for +#. The ARM libraries won't be installed in your system. + But the CMake prepare step, which checks for dependencies, will check the *host* libraries, not the *target*
+ + But not all distros will have that, and possibly not an easy way to + install them anyway, so you'll have to build/download + them separately. A quick way of getting the libraries is to download them from - a distribution repository, like Debian (http://packages.debian.org/wheezy/), + a distribution repository, like Debian (http://packages.debian.org/jessie/), and download the missing libraries. Note that the ``libXXX`` will have the shared objects (``.so``) and the ``libXXX-dev`` will give you the headers and the static (``.a``) library. Just in diff --git a/gnu/llvm/docs/HowToReleaseLLVM.rst b/gnu/llvm/docs/HowToReleaseLLVM.rst index 33c547e97a8..d44ea04a9fa 100644 --- a/gnu/llvm/docs/HowToReleaseLLVM.rst +++ b/gnu/llvm/docs/HowToReleaseLLVM.rst @@ -332,9 +332,26 @@ Below are the rules regarding patching the release branch: #. During the remaining rounds of testing, only patches that fix critical regressions may be applied. -#. For dot releases all patches must mantain both API and ABI compatibility with +#. For dot releases all patches must maintain both API and ABI compatibility with the previous major release. Only bugfixes will be accepted. +Merging Patches +^^^^^^^^^^^^^^^ + +The ``utils/release/merge.sh`` script can be used to merge individual revisions +into any one of the llvm projects. To merge revision ``$N`` into project +``$PROJ``, do: + +#. ``svn co https://llvm.org/svn/llvm-project/$PROJ/branches/release_XX + $PROJ.src`` + +#. ``$PROJ.src/utils/release/merge.sh --proj $PROJ --rev $N`` + +#. Run regression tests. + +#. ``cd $PROJ.src``. Run the ``svn commit`` command printed out by ``merge.sh`` + in step 2. + Release Final Tasks ------------------- diff --git a/gnu/llvm/docs/HowToUseInstrMappings.rst b/gnu/llvm/docs/HowToUseInstrMappings.rst index 8a3e7c8d726..1c586b4bada 100644 --- a/gnu/llvm/docs/HowToUseInstrMappings.rst +++ b/gnu/llvm/docs/HowToUseInstrMappings.rst @@ -30,7 +30,7 @@ instructions with each other.
These tables are emitted in the ``XXXInstrInfo.inc`` file along with the functions to query them. Following is the definition of ``InstrMapping`` class definied in Target.td file: -.. code-block:: llvm +.. code-block:: text class InstrMapping { // Used to reduce search space only to the instructions using this @@ -69,7 +69,7 @@ non-predicated form by assigning appropriate values to the ``InstrMapping`` fields. For this relationship, non-predicated instructions are treated as key instruction since they are the one used to query the interface function. -.. code-block:: llvm +.. code-block:: text def getPredOpcode : InstrMapping { // Choose a FilterClass that is used as a base class for all the @@ -116,7 +116,7 @@ to include relevant information in its definition. For example, consider following to be the current definitions of ADD, ADD_pt (true) and ADD_pf (false) instructions: -.. code-block:: llvm +.. code-block:: text def ADD : ALU32_rr<(outs IntRegs:$dst), (ins IntRegs:$a, IntRegs:$b), "$dst = add($a, $b)", @@ -137,7 +137,7 @@ In this step, we modify these instructions to include the information required by the relationship model, <tt>getPredOpcode</tt>, so that they can be related. -.. code-block:: llvm +.. code-block:: text def ADD : PredRel, ALU32_rr<(outs IntRegs:$dst), (ins IntRegs:$a, IntRegs:$b), "$dst = add($a, $b)", diff --git a/gnu/llvm/docs/InAlloca.rst b/gnu/llvm/docs/InAlloca.rst index c7609cddb4f..a75f22da796 100644 --- a/gnu/llvm/docs/InAlloca.rst +++ b/gnu/llvm/docs/InAlloca.rst @@ -41,7 +41,7 @@ that passes two default-constructed ``Foo`` objects to ``g`` in the g(Foo(), Foo()); } -.. code-block:: llvm +.. 
code-block:: text %struct.Foo = type { i32, i32 } declare void @Foo_ctor(%struct.Foo* %this) diff --git a/gnu/llvm/docs/LLVMBuild.rst b/gnu/llvm/docs/LLVMBuild.rst index 58f6f4d20a0..0200f78bfb7 100644 --- a/gnu/llvm/docs/LLVMBuild.rst +++ b/gnu/llvm/docs/LLVMBuild.rst @@ -49,8 +49,7 @@ Build Integration The LLVMBuild files themselves are just a declarative way to describe the project structure. The actual building of the LLVM project is -handled by another build system (currently we support both -:doc:`Makefiles <MakefileGuide>` and :doc:`CMake <CMake>`). +handled by another build system (See: :doc:`CMake <CMake>`). The build system implementation will load the relevant contents of the LLVMBuild files and use that to drive the actual project build. diff --git a/gnu/llvm/docs/LangRef.rst b/gnu/llvm/docs/LangRef.rst index 5f8a3a5a4a9..ce15c47111c 100644 --- a/gnu/llvm/docs/LangRef.rst +++ b/gnu/llvm/docs/LangRef.rst @@ -250,6 +250,11 @@ linkage: together. This is the LLVM, typesafe, equivalent of having the system linker append together "sections" with identical names when .o files are linked. + + Unfortunately this doesn't correspond to any feature in .o files, so it + can only be used for variables like ``llvm.global_ctors`` which llvm + interprets specially. + ``extern_weak`` The semantics of this linkage follow the ELF object file model: the symbol is weak until linked, if not linked, the symbol becomes null @@ -427,6 +432,10 @@ added in the future: - On X86-64 the callee preserves all general purpose registers, except for RDI and RAX. +"``swiftcc``" - This calling convention is used for Swift language. + - On X86-64 RCX and R8 are available for additional integer returns, and + XMM2 and XMM3 are available for additional FP/vector returns. + - On iOS platforms, we use AAPCS-VFP calling convention. "``cc <n>``" - Numbered convention Any calling convention may be specified by number, allowing target-specific calling conventions to be used. 
Target specific @@ -580,6 +589,9 @@ initializer. Note that a constant with significant address *can* be merged with a ``unnamed_addr`` constant, the result being a constant whose address is significant. +If the ``local_unnamed_addr`` attribute is given, the address is known to +not be significant within the module. + A global variable may be declared to reside in a target-specific numbered address space. For targets that support them, address spaces may affect how optimizations are performed and/or what target @@ -610,18 +622,20 @@ assume that the globals are densely packed in their section and try to iterate over them as an array, alignment padding would break this iteration. The maximum alignment is ``1 << 29``. -Globals can also have a :ref:`DLL storage class <dllstorageclass>`. +Globals can also have a :ref:`DLL storage class <dllstorageclass>` and +an optional list of attached :ref:`metadata <metadata>`, Variables and aliases can have a :ref:`Thread Local Storage Model <tls_model>`. Syntax:: - [@<GlobalVarName> =] [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] - [unnamed_addr] [AddrSpace] [ExternallyInitialized] + @<GlobalVarName> = [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] + [(unnamed_addr|local_unnamed_addr)] [AddrSpace] + [ExternallyInitialized] <global | constant> <Type> [<InitializerConstant>] [, section "name"] [, comdat [($name)]] - [, align <Alignment>] + [, align <Alignment>] (, !name !N)* For example, the following defines a global in a numbered address space with an initializer, section, and alignment: @@ -665,14 +679,14 @@ an optional list of attached :ref:`metadata <metadata>`, an opening curly brace, a list of basic blocks, and a closing curly brace. 
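For illustration, a global variable using the extended syntax above (the name, section and metadata node are invented for this sketch) might be written as:

.. code-block:: llvm

   @counter = internal local_unnamed_addr global i32 0, section "mydata", align 4, !mynote !0

   !0 = !{!"example note"}

Here ``local_unnamed_addr`` tells the optimizer that the address of ``@counter`` is not significant within the module, and ``!mynote !0`` is an attached metadata node as permitted by the ``(, !name !N)*`` suffix of the syntax.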
LLVM function declarations consist of the "``declare``" keyword, an -optional :ref:`linkage type <linkage>`, an optional :ref:`visibility -style <visibility>`, an optional :ref:`DLL storage class <dllstorageclass>`, -an optional :ref:`calling convention <callingconv>`, -an optional ``unnamed_addr`` attribute, a return type, an optional -:ref:`parameter attribute <paramattrs>` for the return type, a function -name, a possibly empty list of arguments, an optional alignment, an optional -:ref:`garbage collector name <gc>`, an optional :ref:`prefix <prefixdata>`, -and an optional :ref:`prologue <prologuedata>`. +optional :ref:`linkage type <linkage>`, an optional :ref:`visibility style +<visibility>`, an optional :ref:`DLL storage class <dllstorageclass>`, an +optional :ref:`calling convention <callingconv>`, an optional ``unnamed_addr`` +or ``local_unnamed_addr`` attribute, a return type, an optional :ref:`parameter +attribute <paramattrs>` for the return type, a function name, a possibly +empty list of arguments, an optional alignment, an optional :ref:`garbage +collector name <gc>`, an optional :ref:`prefix <prefixdata>`, and an optional +:ref:`prologue <prologuedata>`. A function definition contains a list of basic blocks, forming the CFG (Control Flow Graph) for the function. Each basic block may optionally start with a label @@ -703,14 +717,17 @@ alignment. All alignments must be a power of 2. If the ``unnamed_addr`` attribute is given, the address is known to not be significant and two identical functions can be merged. +If the ``local_unnamed_addr`` attribute is given, the address is known to +not be significant within the module. + Syntax:: define [linkage] [visibility] [DLLStorageClass] [cconv] [ret attrs] <ResultType> @<FunctionName> ([argument list]) - [unnamed_addr] [fn Attrs] [section "name"] [comdat [($name)]] - [align N] [gc] [prefix Constant] [prologue Constant] - [personality Constant] (!name !N)* { ... 
} + [(unnamed_addr|local_unnamed_addr)] [fn Attrs] [section "name"] + [comdat [($name)]] [align N] [gc] [prefix Constant] + [prologue Constant] [personality Constant] (!name !N)* { ... } The argument list is a comma separated sequence of arguments where each argument is of the following form: @@ -737,7 +754,7 @@ Aliases may have an optional :ref:`linkage type <linkage>`, an optional Syntax:: - @<Name> = [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [unnamed_addr] alias <AliaseeTy>, <AliaseeTy>* @<Aliasee> + @<Name> = [Linkage] [Visibility] [DLLStorageClass] [ThreadLocal] [(unnamed_addr|local_unnamed_addr)] alias <AliaseeTy>, <AliaseeTy>* @<Aliasee> The linkage must be one of ``private``, ``internal``, ``linkonce``, ``weak``, ``linkonce_odr``, ``weak_odr``, ``external``. Note that some system linkers @@ -747,6 +764,9 @@ Aliases that are not ``unnamed_addr`` are guaranteed to have the same address as the aliasee expression. ``unnamed_addr`` ones are only guaranteed to point to the same content. +If the ``local_unnamed_addr`` attribute is given, the address is known to +not be significant within the module. + Since aliases are only a second name, some restrictions apply, of which some can only be checked when producing an object file: @@ -760,6 +780,25 @@ some can only be checked when producing an object file: * No global value in the expression can be a declaration, since that would require a relocation, which is not possible. +.. _langref_ifunc: + +IFuncs +------- + +IFuncs, like aliases, don't create any new data or code. They are just a new +symbol that the dynamic linker resolves at runtime by calling a resolver function. + +IFuncs have a name and a resolver that is a function called by the dynamic linker +that returns the address of another function associated with the name. + +An IFunc may have an optional :ref:`linkage type <linkage>` and an optional +:ref:`visibility style <visibility>`.
+ +Syntax:: + + @<Name> = [Linkage] [Visibility] ifunc <IFuncTy>, <ResolverTy>* @<Resolver> + + .. _langref_comdats: Comdats @@ -800,7 +839,7 @@ Note that the Mach-O platform doesn't support COMDATs and ELF only supports Here is an example of a COMDAT group where a function will only be selected if the COMDAT key's section is the largest: -.. code-block:: llvm +.. code-block:: text $foo = comdat largest @foo = global i32 2, comdat($foo) @@ -812,7 +851,7 @@ the COMDAT key's section is the largest: As a syntactic sugar the ``$name`` can be omitted if the name is the same as the global name: -.. code-block:: llvm +.. code-block:: text $foo = comdat any @foo = global i32 2, comdat @@ -836,7 +875,7 @@ if a collision occurs in the symbol table. The combined use of COMDATS and section attributes may yield surprising results. For example: -.. code-block:: llvm +.. code-block:: text $foo = comdat any $bar = comdat any @@ -907,8 +946,7 @@ Currently, only the following parameter attributes are defined: ``zeroext`` This indicates to the code generator that the parameter or return value should be zero-extended to the extent required by the target's - ABI (which is usually 32-bits, but is 8-bits for a i1 on x86-64) by - the caller (for a parameter) or the callee (for a return value). + ABI by the caller (for a parameter) or the callee (for a return value). ``signext`` This indicates to the code generator that the parameter or return value should be sign-extended to the extent required by the target's @@ -1010,7 +1048,8 @@ Currently, only the following parameter attributes are defined: ``nocapture`` This indicates that the callee does not make any copies of the pointer that outlive the callee itself. This is not a valid - attribute for return values. + attribute for return values. Addresses used in volatile operations + are considered to be captured. .. 
_nest: @@ -1021,12 +1060,13 @@ Currently, only the following parameter attributes are defined: ``returned`` This indicates that the function always returns the argument as its return - value. This is an optimization hint to the code generator when generating - the caller, allowing tail call optimization and omission of register saves - and restores in some cases; it is not checked or enforced when generating - the callee. The parameter and the function return type must be valid - operands for the :ref:`bitcast instruction <i_bitcast>`. This is not a - valid attribute for return values and can only be applied to one parameter. + value. This is a hint to the optimizer and code generator used when + generating the caller, allowing value propagation, tail call optimization, + and omission of register saves and restores in some cases; it is not + checked or enforced when generating the callee. The parameter and the + function return type must be valid operands for the + :ref:`bitcast instruction <i_bitcast>`. This is not a valid attribute for + return values and can only be applied to one parameter. ``nonnull`` This indicates that the parameter or return pointer is not null. This @@ -1059,6 +1099,30 @@ Currently, only the following parameter attributes are defined: ``dereferenceable(<n>)``). This attribute may only be applied to pointer typed parameters. +``swiftself`` + This indicates that the parameter is the self/context parameter. This is not + a valid attribute for return values and can only be applied to one + parameter. + +``swifterror`` + This attribute is motivated to model and optimize Swift error handling. It + can be applied to a parameter with pointer to pointer type or a + pointer-sized alloca. At the call site, the actual argument that corresponds + to a ``swifterror`` parameter has to come from a ``swifterror`` alloca. A + ``swifterror`` value (either the parameter or the alloca) can only be loaded + and stored from, or used as a ``swifterror`` argument. 
This is not a valid + attribute for return values and can only be applied to one parameter. + + These constraints allow the calling convention to optimize access to + ``swifterror`` variables by associating them with a specific register at + call boundaries rather than placing them in memory. Since this does change + the calling convention, a function which uses the ``swifterror`` attribute + on a parameter is not ABI-compatible with one which does not. + + These constraints also allow LLVM to assume that a ``swifterror`` argument + does not alias any other memory visible within a function and that a + ``swifterror`` alloca passed as an argument does not escape. + .. _gc: Garbage Collector Strategy Names @@ -1141,7 +1205,7 @@ makes the format of the prologue data highly target dependent. A trivial example of valid prologue data for the x86 architecture is ``i8 144``, which encodes the ``nop`` instruction: -.. code-block:: llvm +.. code-block:: text define void @f() prologue i8 144 { ... } @@ -1149,7 +1213,7 @@ Generally prologue data can be formed by encoding a relative branch instruction which skips the metadata, as in this example of valid prologue data for the x86_64 architecture, where the first two bytes encode ``jmp .+10``: -.. code-block:: llvm +.. code-block:: text %0 = type <{ i8, i8, i8* }> @@ -1223,6 +1287,15 @@ example: epilogue, the backend should forcibly align the stack pointer. Specify the desired alignment, which must be a power of two, in parentheses. +``allocsize(<EltSizeParam>[, <NumEltsParam>])`` + This attribute indicates that the annotated function will always return at + least a given number of bytes (or null). Its arguments are zero-indexed + parameter numbers; if one argument is provided, then it's assumed that at + least ``CallSite.Args[EltSizeParam]`` bytes will be available at the + returned pointer. If two are provided, then it's assumed that + ``CallSite.Args[EltSizeParam] * CallSite.Args[NumEltsParam]`` bytes are + available. 
The referenced parameters must be integer types. No assumptions + are made about the contents of the returned block of memory. ``alwaysinline`` This attribute indicates that the inliner should attempt to inline this function into callers whenever possible, ignoring any active @@ -1239,10 +1312,26 @@ example: function call are also considered to be cold; and, thus, given low weight. ``convergent`` - This attribute indicates that the callee is dependent on a convergent - thread execution pattern under certain parallel execution models. - Transformations that are execution model agnostic may not make the execution - of a convergent operation control dependent on any additional values. + In some parallel execution models, there exist operations that cannot be + made control-dependent on any additional values. We call such operations + ``convergent``, and mark them with this attribute. + + The ``convergent`` attribute may appear on functions or call/invoke + instructions. When it appears on a function, it indicates that calls to + this function should not be made control-dependent on additional values. + For example, the intrinsic ``llvm.nvvm.barrier0`` is ``convergent``, so + calls to this intrinsic cannot be made control-dependent on additional + values. + + When it appears on a call/invoke, the ``convergent`` attribute indicates + that we should treat the call as though we're calling a convergent + function. This is particularly useful on indirect calls; without this we + may treat such calls as though the target is non-convergent. + + The optimizer may remove the ``convergent`` attribute on functions when it + can prove that the function does not execute any convergent operations. + Similarly, the optimizer may remove ``convergent`` on calls/invokes when it + can prove that the call/invoke cannot call a convergent function. ``inaccessiblememonly`` This attribute indicates that the function may only access memory that is not accessible by the module being compiled. 
This is a weaker form @@ -1334,6 +1423,31 @@ example: passes make choices that keep the code size of this function low, and otherwise do optimizations specifically to reduce code size as long as they do not significantly impact runtime performance. +``"patchable-function"`` + This attribute tells the code generator that the code + generated for this function needs to follow certain conventions that + make it possible for a runtime function to patch over it later. + The exact effect of this attribute depends on its string value, + for which there currently is one legal possibility: + + * ``"prologue-short-redirect"`` - This style of patchable + function is intended to support patching a function prologue to + redirect control away from the function in a thread safe + manner. It guarantees that the first instruction of the + function will be large enough to accommodate a short jump + instruction, and will be sufficiently aligned to allow being + fully changed via an atomic compare-and-swap instruction. + While the first requirement can be satisfied by inserting a large + enough NOP, LLVM can and will try to re-purpose an existing + instruction (i.e. one that would have to be emitted anyway) as + the patchable instruction larger than a short jump. + + ``"prologue-short-redirect"`` is currently only supported on + x86-64. + + This attribute by itself does not imply restrictions on + inter-procedural optimizations. All of the semantic effects the + patching may have must be separately conveyed via the linkage type. ``readnone`` On a function, this attribute indicates that the function computes its result (or decides to unwind an exception) based strictly on its arguments, @@ -1361,6 +1475,13 @@ example: On an argument, this attribute indicates that the function does not write through this pointer argument, even though it may write to the memory that the pointer points to.
+``writeonly`` + On a function, this attribute indicates that the function may write to but + does not read from memory. + + On an argument, this attribute indicates that the function may write to but + does not read through this pointer argument (even though it may read from + the memory that the pointer points to). ``argmemonly`` This attribute indicates that the only memory accesses inside function are loads and stores from objects pointed to by its pointer-typed arguments, @@ -1511,7 +1632,7 @@ operand bundle to not miscompile programs containing it. ways before control is transferred to the callee or invokee. - Calls and invokes with operand bundles have unknown read / write effect on the heap on entry and exit (even if the call target is - ``readnone`` or ``readonly``), unless they're overriden with + ``readnone`` or ``readonly``), unless they're overridden with callsite specific attributes. - An operand bundle at a call site cannot change the implementation of the called function. Inter-procedural optimizations work as @@ -1519,6 +1640,8 @@ operand bundle to not miscompile programs containing it. More specific types of operand bundles are described below. +.. _deopt_opbundles: + Deoptimization Operand Bundles ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -1602,6 +1725,18 @@ it is undefined behavior to execute a ``call`` or ``invoke`` which: Similarly, if no funclet EH pads have been entered-but-not-yet-exited, executing a ``call`` or ``invoke`` with a ``"funclet"`` bundle is undefined behavior. +GC Transition Operand Bundles +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +GC transition operand bundles are characterized by the +``"gc-transition"`` operand bundle tag. These operand bundles mark a +call as a transition between a function with one GC strategy to a +function with a different GC strategy. If coordinating the transition +between GC strategies requires additional code generation at the call +site, these bundles may contain any values that are needed by the +generated code. 
For more details, see :ref:`GC Transitions +<gc_transition_args>`. + .. _moduleasm: Module-Level Inline Assembly @@ -2086,6 +2221,26 @@ function's scope. uselistorder i32 (i32) @bar, { 1, 0 } uselistorder_bb @foo, %bb, { 5, 1, 3, 2, 0, 4 } +.. _source_filename: + +Source Filename +--------------- + +The *source filename* string is set to the original module identifier, +which will be the name of the compiled source file when compiling from +source through the clang front end, for example. It is then preserved through +the IR and bitcode. + +This is currently necessary to generate a consistent unique global +identifier for local functions used in profile data, which prepends the +source file name to the local function name. + +The syntax for the source file name is simply: + +.. code-block:: text + + source_filename = "/path/to/source.c" + .. _typesystem: Type System @@ -2692,7 +2847,7 @@ cleared low bit. However, in the ``%C`` example, the optimizer is allowed to assume that the '``undef``' operand could be the same as ``%Y``, allowing the whole '``select``' to be eliminated. -.. code-block:: llvm +.. code-block:: text %A = xor undef, undef @@ -2744,7 +2899,7 @@ does not execute at all. This allows us to delete the divide and all code after it. Because the undefined operation "can't happen", the optimizer can assume that it occurs in dead code. -.. code-block:: llvm +.. code-block:: text a: store undef -> %X b: store %X -> undef @@ -3119,7 +3274,7 @@ the same register to an output and an input. If this is not safe (e.g. if the assembly contains two instructions, where the first writes to one output, and the second reads an input and writes to a second output), then the "``&``" modifier must be used (e.g. "``=&r``") to specify that the output is an -"early-clobber" output. Marking an ouput as "early-clobber" ensures that LLVM +"early-clobber" output. 
Marking an output as "early-clobber" ensures that LLVM will not use the same register for any inputs (other than an input tied to this output). @@ -3453,8 +3608,14 @@ SystemZ: - ``K``: An immediate signed 16-bit integer. - ``L``: An immediate signed 20-bit integer. - ``M``: An immediate integer 0x7fffffff. -- ``Q``, ``R``, ``S``, ``T``: A memory address operand, treated the same as - ``m``, at the moment. +- ``Q``: A memory address operand with a base address and a 12-bit immediate + unsigned displacement. +- ``R``: A memory address operand with a base address, a 12-bit immediate + unsigned displacement, and an index register. +- ``S``: A memory address operand with a base address and a 20-bit immediate + signed displacement. +- ``T``: A memory address operand with a base address, a 20-bit immediate + signed displacement, and an index register. - ``r`` or ``d``: A 32, 64, or 128-bit integer register. - ``a``: A 32, 64, or 128-bit integer address register (excludes R0, which in an address context evaluates as zero). @@ -3723,7 +3884,7 @@ their operand. For example: Metadata nodes that aren't uniqued use the ``distinct`` keyword. For example: -.. code-block:: llvm +.. code-block:: text !0 = distinct !{!"test\00", i32 10} @@ -3788,11 +3949,11 @@ fields are tuples containing the debug info to be emitted along with the compile unit, regardless of code optimizations (some nodes are only emitted if there are references to them from instructions). -.. code-block:: llvm +.. code-block:: text !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang", isOptimized: true, flags: "-O2", runtimeVersion: 2, - splitDebugFilename: "abc.debug", emissionKind: 1, + splitDebugFilename: "abc.debug", emissionKind: FullDebug, enums: !2, retainedTypes: !3, subprograms: !4, globals: !5, imports: !6, macros: !7, dwoId: 0x0abcd) @@ -3824,7 +3985,7 @@ DIBasicType ``DIBasicType`` nodes represent primitive types, such as ``int``, ``bool`` and ``float``. 
``tag:`` defaults to ``DW_TAG_base_type``. -.. code-block:: llvm +.. code-block:: text !0 = !DIBasicType(name: "unsigned char", size: 8, align: 8, encoding: DW_ATE_unsigned_char) @@ -3833,7 +3994,7 @@ DIBasicType The ``encoding:`` describes the details of the type. Usually it's one of the following: -.. code-block:: llvm +.. code-block:: text DW_ATE_address = 1 DW_ATE_boolean = 2 @@ -3853,7 +4014,7 @@ refers to a tuple; the first operand is the return type, while the rest are the types of the formal arguments in order. If the first operand is ``null``, that represents a function with no return value (such as ``void foo() {}`` in C++). -.. code-block:: llvm +.. code-block:: text !0 = !BasicType(name: "int", size: 32, align: 32, DW_ATE_signed) !1 = !BasicType(name: "char", size: 8, align: 8, DW_ATE_signed_char) @@ -3867,7 +4028,7 @@ DIDerivedType ``DIDerivedType`` nodes represent types derived from other types, such as qualified types. -.. code-block:: llvm +.. code-block:: text !0 = !DIBasicType(name: "unsigned char", size: 8, align: 8, encoding: DW_ATE_unsigned_char) @@ -3876,23 +4037,30 @@ qualified types. The following ``tag:`` values are valid: -.. code-block:: llvm +.. code-block:: text - DW_TAG_formal_parameter = 5 DW_TAG_member = 13 DW_TAG_pointer_type = 15 DW_TAG_reference_type = 16 DW_TAG_typedef = 22 + DW_TAG_inheritance = 28 DW_TAG_ptr_to_member_type = 31 DW_TAG_const_type = 38 + DW_TAG_friend = 42 DW_TAG_volatile_type = 53 DW_TAG_restrict_type = 55 +.. _DIDerivedTypeMember: + ``DW_TAG_member`` is used to define a member of a :ref:`composite type -<DICompositeType>` or :ref:`subprogram <DISubprogram>`. The type of the member -is the ``baseType:``. The ``offset:`` is the member's bit offset. -``DW_TAG_formal_parameter`` is used to define a member which is a formal -argument of a subprogram. +<DICompositeType>`. The type of the member is the ``baseType:``. The +``offset:`` is the member's bit offset. 
If the composite type has an ODR
+``identifier:`` and does not set ``flags: DIFlagFwdDecl``, then the member is
+uniqued based only on its ``name:`` and ``scope:``.
+
+``DW_TAG_inheritance`` and ``DW_TAG_friend`` are used in the ``elements:``
+field of :ref:`composite types <DICompositeType>` to describe parents and
+friends.

``DW_TAG_typedef`` is used to provide a name for the ``baseType:``.
@@ -3911,11 +4079,17 @@ DICompositeType
structures and unions. ``elements:`` points to a tuple of the composed types.

If the source language supports ODR, the ``identifier:`` field gives the unique
-identifier used for type merging between modules. When specified, other types
-can refer to composite types indirectly via a :ref:`metadata string
-<metadata-string>` that matches their identifier.
+identifier used for type merging between modules. When specified,
+:ref:`subprogram declarations <DISubprogramDeclaration>` and :ref:`member
+derived types <DIDerivedTypeMember>` that reference the ODR-type in their
+``scope:`` change uniquing rules.

-.. code-block:: llvm
+For a given ``identifier:``, there should only be a single composite type that
+does not have ``flags: DIFlagFwdDecl`` set. LLVM tools that link modules
+together will unique such definitions at parse time via the ``identifier:``
+field, even if the nodes are ``distinct``.
+
+.. code-block:: text

    !0 = !DIEnumerator(name: "SixKind", value: 7)
    !1 = !DIEnumerator(name: "SevenKind", value: 7)
@@ -3926,16 +4100,13 @@ can refer to composite types indirectly via a :ref:`metadata string

The following ``tag:`` values are valid:

-.. code-block:: llvm
+.. code-block:: text

    DW_TAG_array_type       = 1
    DW_TAG_class_type       = 2
    DW_TAG_enumeration_type = 4
    DW_TAG_structure_type   = 19
    DW_TAG_union_type       = 23
-   DW_TAG_subroutine_type  = 21
-   DW_TAG_inheritance      = 28
-
For ``DW_TAG_array_type``, the ``elements:`` should be :ref:`subrange
descriptors <DISubrange>`, each representing the range of subscripts at that
@@ -3949,7 +4120,9 @@ value for the set. All enumeration type descriptors are collected in the
For ``DW_TAG_structure_type``, ``DW_TAG_class_type``, and
``DW_TAG_union_type``, the ``elements:`` should be :ref:`derived types
-<DIDerivedType>` with ``tag: DW_TAG_member`` or ``tag: DW_TAG_inheritance``.
+<DIDerivedType>` with ``tag: DW_TAG_member``, ``tag: DW_TAG_inheritance``, or
+``tag: DW_TAG_friend``; or :ref:`subprograms <DISubprogram>` with
+``isDefinition: false``.

.. _DISubrange:

@@ -4038,7 +4211,15 @@ metadata.
The ``variables:`` field points at :ref:`variables <DILocalVariable>` that
must be retained, even if their IR counterparts are optimized out of the IR.
The ``type:`` field must point at an :ref:`DISubroutineType`.

-.. code-block:: llvm
+.. _DISubprogramDeclaration:
+
+When ``isDefinition: false``, subprograms describe a declaration in the type
+tree as opposed to a definition of a function. If the scope is a composite
+type with an ODR ``identifier:`` and that does not set ``flags: DIFlagFwdDecl``,
+then the subprogram declaration is uniqued based only on its ``linkageName:``
+and ``scope:``.
+
+.. code-block:: text

    define void @_Z3foov() !dbg !0 {
      ...
@@ -4046,7 +4227,7 @@ the IR. The ``type:`` field must point at an :ref:`DISubroutineType`.
    !0 = distinct !DISubprogram(name: "foo", linkageName: "_Zfoov", scope: !1,
                                file: !2, line: 7, type: !3, isLocal: true,
-                               isDefinition: false, scopeLine: 8,
+                               isDefinition: true, scopeLine: 8,
                                containingType: !4,
                                virtuality: DW_VIRTUALITY_pure_virtual,
                                virtualIndex: 10, flags: DIFlagPrototyped,
@@ -4063,7 +4244,7 @@ DILexicalBlock
two lexical blocks at same depth.
They are valid targets for ``scope:`` fields. -.. code-block:: llvm +.. code-block:: text !0 = distinct !DILexicalBlock(scope: !1, file: !2, line: 7, column: 35) @@ -4109,7 +4290,7 @@ the ``arg:`` field is set to non-zero, then this variable is a subprogram parameter, and it will be included in the ``variables:`` field of its :ref:`DISubprogram`. -.. code-block:: llvm +.. code-block:: text !0 = !DILocalVariable(name: "this", arg: 1, scope: !3, file: !2, line: 7, type: !3, flags: DIFlagArtificial) @@ -4132,7 +4313,7 @@ The current supported vocabulary is limited: - ``DW_OP_bit_piece, 16, 8`` specifies the offset and size (``16`` and ``8`` here, respectively) of the variable piece from the working expression. -.. code-block:: llvm +.. code-block:: text !0 = !DIExpression(DW_OP_deref) !1 = !DIExpression(DW_OP_plus, 3) @@ -4155,7 +4336,7 @@ DIImportedEntity ``DIImportedEntity`` nodes represent entities (such as modules) imported into a compile unit. -.. code-block:: llvm +.. code-block:: text !2 = !DIImportedEntity(tag: DW_TAG_imported_module, name: "foo", scope: !0, entity: !1, line: 7) @@ -4165,10 +4346,10 @@ DIMacro ``DIMacro`` nodes represent definition or undefinition of a macro identifiers. The ``name:`` field is the macro identifier, followed by macro parameters when -definining a function-like macro, and the ``value`` field is the token-string +defining a function-like macro, and the ``value`` field is the token-string used to expand the macro identifier. -.. code-block:: llvm +.. code-block:: text !2 = !DIMacro(macinfo: DW_MACINFO_define, line: 7, name: "foo(x)", value: "((x) + 1)") @@ -4181,7 +4362,7 @@ DIMacroFile The ``nodes:`` field is a list of ``DIMacro`` and ``DIMacroFile`` nodes that appear in the included source file. -.. code-block:: llvm +.. code-block:: text !2 = !DIMacroFile(macinfo: DW_MACINFO_start_file, line: 7, file: !2, nodes: !3) @@ -4262,12 +4443,20 @@ instructions (loads, stores, memory-accessing calls, etc.) 
that carry ``noalias`` metadata can specifically be specified not to alias with some other collection of memory access instructions that carry ``alias.scope`` metadata. Each type of metadata specifies a list of scopes where each scope has an id and -a domain. When evaluating an aliasing query, if for some domain, the set +a domain. + +When evaluating an aliasing query, if for some domain, the set of scopes with that domain in one instruction's ``alias.scope`` list is a subset of (or equal to) the set of scopes for that domain in another instruction's ``noalias`` list, then the two memory accesses are assumed not to alias. +Because scopes in one domain don't affect scopes in other domains, separate +domains can be used to compose multiple independent noalias sets. This is +used for example during inlining. As the noalias function parameters are +turned into noalias scope metadata, a new domain is used every time the +function is inlined. + The metadata identifying each domain is itself a list containing one or two entries. The first entry is the name of the domain. Note that if the name is a string then it can be combined across functions and translation units. A @@ -4329,8 +4518,8 @@ it. ULP is defined as follows: distance between the two non-equal finite floating-point numbers nearest ``x``. Moreover, ``ulp(NaN)`` is ``NaN``. -The metadata node shall consist of a single positive floating point -number representing the maximum relative error, for example: +The metadata node shall consist of a single positive float type number +representing the maximum relative error, for example: .. code-block:: llvm @@ -4542,6 +4731,38 @@ For example: !0 = !{!"llvm.loop.unroll.full"} +'``llvm.loop.licm_versioning.disable``' Metadata +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This metadata indicates that the loop should not be versioned for the purpose +of enabling loop-invariant code motion (LICM). 
The metadata has a single operand
+which is the string ``llvm.loop.licm_versioning.disable``. For example:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.licm_versioning.disable"}
+
+'``llvm.loop.distribute.enable``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Loop distribution allows splitting a loop into multiple loops. Currently,
+this is only performed if the entire loop cannot be vectorized due to unsafe
+memory dependencies. The transformation will attempt to isolate the unsafe
+dependencies into their own loop.
+
+This metadata can be used to selectively enable or disable distribution of the
+loop. The first operand is the string ``llvm.loop.distribute.enable`` and the
+second operand is a bit. If the bit operand value is 1, distribution is
+enabled. A value of 0 disables distribution:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.distribute.enable", i1 0}
+   !1 = !{!"llvm.loop.distribute.enable", i1 1}
+
+This metadata should be used in conjunction with ``llvm.loop`` loop
+identification metadata.
+
'``llvm.mem``'
^^^^^^^^^^^^^^^
@@ -4555,7 +4776,8 @@
The ``llvm.mem.parallel_loop_access`` metadata refers to a loop identifier,
or metadata containing a list of loop identifiers for nested loops. The
metadata is attached to memory accessing instructions and denotes that no loop
carried memory dependence exist between it and other instructions denoted
-with the same loop identifier.
+with the same loop identifier. The metadata on memory reads also implies that
+if conversion (i.e. speculative execution within a loop iteration) is safe.
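As a minimal sketch of the pattern described above (basic-block and value
names here are hypothetical), memory accesses in a parallel loop carry the
``llvm.mem.parallel_loop_access`` metadata, and the loop's back edge carries
the matching ``llvm.loop`` identifier:

.. code-block:: llvm

   for.body:
     ...
     %val0 = load i32, i32* %arrayidx, !llvm.mem.parallel_loop_access !0
     ...
     store i32 %val0, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
     ...
     br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

   !0 = !{!0} ; an identifier for the loop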
Precisely, given two instructions ``m1`` and ``m2`` that both have the ``llvm.mem.parallel_loop_access`` metadata, with ``L1`` and ``L2`` being the @@ -4625,12 +4847,6 @@ the loop identifier metadata node directly: !1 = !{!1} ; an identifier for the inner loop !2 = !{!2} ; an identifier for the outer loop -'``llvm.bitsets``' -^^^^^^^^^^^^^^^^^^ - -The ``llvm.bitsets`` global metadata is used to implement -:doc:`bitsets <BitSets>`. - '``invariant.group``' Metadata ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -5267,7 +5483,7 @@ Syntax: :: - <result> = invoke [cconv] [ret attrs] <ptr to function ty> <function ptr val>(<function args>) [fn attrs] + <result> = invoke [cconv] [ret attrs] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs] [operand bundles] to label <normal label> unwind label <exception label> Overview: @@ -5303,12 +5519,16 @@ This instruction requires several arguments: #. The optional :ref:`Parameter Attributes <paramattrs>` list for return values. Only '``zeroext``', '``signext``', and '``inreg``' attributes are valid here. -#. '``ptr to function ty``': shall be the signature of the pointer to - function value being invoked. In most cases, this is a direct - function invocation, but indirect ``invoke``'s are just as possible, - branching off an arbitrary pointer to function value. -#. '``function ptr val``': An LLVM value containing a pointer to a - function to be invoked. +#. '``ty``': the type of the call instruction itself which is also the + type of the return value. Functions that return no value are marked + ``void``. +#. '``fnty``': shall be the signature of the function being invoked. The + argument types must match the types implied by this signature. This + type can be omitted if the function is not varargs. +#. '``fnptrval``': An LLVM value containing a pointer to a function to + be invoked. In most cases, this is a direct function invocation, but + indirect ``invoke``'s are just as possible, calling an arbitrary pointer + to function value. #. 
'``function args``': argument list whose types match the function signature argument types and parameter attributes. All arguments must be of :ref:`first class <t_firstclass>` type. If the function signature @@ -5440,7 +5660,7 @@ block. Therefore, it must be the only non-phi instruction in the block. Example: """""""" -.. code-block:: llvm +.. code-block:: text dispatch1: %cs1 = catchswitch within none [label %handler0, label %handler1] unwind to caller @@ -5491,7 +5711,7 @@ the ``catchret``'s behavior is undefined. Example: """""""" -.. code-block:: llvm +.. code-block:: text catchret from %catch label %continue @@ -5541,7 +5761,7 @@ It transfers control to ``continue`` or unwinds out of the function. Example: """""""" -.. code-block:: llvm +.. code-block:: text cleanupret from %cleanup unwind to caller cleanupret from %cleanup unwind label %continue @@ -5631,7 +5851,7 @@ unsigned and/or signed overflow, respectively, occurs. Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = add i32 4, %var ; yields i32:result = 4 + %var @@ -5670,7 +5890,7 @@ optimizations: Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = fadd float 4.0, %var ; yields float:result = 4.0 + %var @@ -5722,7 +5942,7 @@ unsigned and/or signed overflow, respectively, occurs. Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = sub i32 4, %var ; yields i32:result = 4 - %var <result> = sub i32 0, %val ; yields i32:result = -%var @@ -5765,7 +5985,7 @@ unsafe floating point optimizations: Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = fsub float 4.0, %var ; yields float:result = 4.0 - %var <result> = fsub float -0.0, %val ; yields float:result = -%var @@ -5819,7 +6039,7 @@ unsigned and/or signed overflow, respectively, occurs. Example: """""""" -.. code-block:: llvm +.. 
code-block:: text <result> = mul i32 4, %var ; yields i32:result = 4 * %var @@ -5858,7 +6078,7 @@ unsafe floating point optimizations: Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = fmul float 4.0, %var ; yields float:result = 4.0 * %var @@ -5902,7 +6122,7 @@ such, "((a udiv exact b) mul b) == a"). Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = udiv i32 4, %var ; yields i32:result = 4 / %var @@ -5948,7 +6168,7 @@ a :ref:`poison value <poisonvalues>` if the result would be rounded. Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = sdiv i32 4, %var ; yields i32:result = 4 / %var @@ -5987,7 +6207,7 @@ unsafe floating point optimizations: Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = fdiv float 4.0, %var ; yields float:result = 4.0 / %var @@ -6029,7 +6249,7 @@ Taking the remainder of a division by zero leads to undefined behavior. Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = urem i32 4, %var ; yields i32:result = 4 % %var @@ -6084,7 +6304,7 @@ result of the division and the remainder.) Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = srem i32 4, %var ; yields i32:result = 4 % %var @@ -6124,7 +6344,7 @@ to enable otherwise unsafe floating point optimizations: Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = frem float 4.0, %var ; yields float:result = 4.0 % %var @@ -6186,7 +6406,7 @@ nsw/nuw bits in (mul %op1, (shl 1, %op2)). Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = shl i32 4, %var ; yields i32: 4 << %var <result> = shl i32 4, 2 ; yields i32: 16 @@ -6235,7 +6455,7 @@ non-zero. Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = lshr i32 4, 1 ; yields i32:result = 2 <result> = lshr i32 4, 2 ; yields i32:result = 1 @@ -6286,7 +6506,7 @@ non-zero. Example: """""""" -.. code-block:: llvm +.. 
code-block:: text <result> = ashr i32 4, 1 ; yields i32:result = 2 <result> = ashr i32 4, 2 ; yields i32:result = 1 @@ -6338,7 +6558,7 @@ The truth table used for the '``and``' instruction is: Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = and i32 4, %var ; yields i32:result = 4 & %var <result> = and i32 15, 40 ; yields i32:result = 8 @@ -6437,7 +6657,7 @@ The truth table used for the '``xor``' instruction is: Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = xor i32 4, %var ; yields i32:result = 4 ^ %var <result> = xor i32 15, 40 ; yields i32:result = 39 @@ -6490,7 +6710,7 @@ exceeds the length of ``val``, the results are undefined. Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = extractelement <4 x i32> %vec, i32 0 ; yields i32 @@ -6532,7 +6752,7 @@ undefined. Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = insertelement <4 x i32> %vec, i32 1, i32 0 ; yields <4 x i32> @@ -6580,7 +6800,7 @@ only one vector. Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = shufflevector <4 x i32> %v1, <4 x i32> %v2, <4 x i32> <i32 0, i32 4, i32 1, i32 5> ; yields <4 x i32> @@ -6639,7 +6859,7 @@ the index operands. Example: """""""" -.. code-block:: llvm +.. 
code-block:: text <result> = extractvalue {i32, float} %agg, 0 ; yields i32 @@ -6767,7 +6987,7 @@ Syntax: :: <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !invariant.group !<index>][, !nonnull !<index>][, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !align !<align_node>] - <result> = load atomic [volatile] <ty>* <pointer> [singlethread] <ordering>, align <alignment> [, !invariant.group !<index>] + <result> = load atomic [volatile] <ty>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> [, !invariant.group !<index>] !<index> = !{ i32 1 } !<deref_bytes_node> = !{i64 <dereferenceable_bytes>} !<align_node> = !{ i64 <value_alignment> } @@ -6780,12 +7000,12 @@ The '``load``' instruction is used to read from memory. Arguments: """""""""" -The argument to the ``load`` instruction specifies the memory address -from which to load. The type specified must be a :ref:`first -class <t_firstclass>` type. If the ``load`` is marked as ``volatile``, -then the optimizer is not allowed to modify the number or order of -execution of this ``load`` with other :ref:`volatile -operations <volatile>`. +The argument to the ``load`` instruction specifies the memory address from which +to load. The type specified must be a :ref:`first class <t_firstclass>` type of +known size (i.e. not containing an :ref:`opaque structural type <t_opaque>`). If +the ``load`` is marked as ``volatile``, then the optimizer is not allowed to +modify the number or order of execution of this ``load`` with other +:ref:`volatile operations <volatile>`. If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering <ordering>` and optional ``singlethread`` argument. The ``release`` and @@ -6805,7 +7025,12 @@ alignment for the target. It is the responsibility of the code emitter to ensure that the alignment information is correct. 
Overestimating the
alignment results in undefined behavior. Underestimating the alignment may
produce less efficient code. An alignment of 1 is always safe. The
-maximum possible alignment is ``1 << 29``.
+maximum possible alignment is ``1 << 29``. An alignment value higher
+than the size of the loaded type implies memory up to the alignment
+value bytes can be safely loaded without trapping in the default
+address space. Accessing the high bytes can interfere with debugging
+tools, so they should not be accessed if the function has the
+``sanitize_thread`` or ``sanitize_address`` attributes.

The optional ``!nontemporal`` metadata must reference a single
metadata name ``<index>`` corresponding to a metadata node with one
@@ -6903,13 +7128,14 @@
The '``store``' instruction is used to write to memory.

Arguments:
""""""""""

-There are two arguments to the ``store`` instruction: a value to store
-and an address at which to store it. The type of the ``<pointer>``
-operand must be a pointer to the :ref:`first class <t_firstclass>` type of
-the ``<value>`` operand. If the ``store`` is marked as ``volatile``,
-then the optimizer is not allowed to modify the number or order of
-execution of this ``store`` with other :ref:`volatile
-operations <volatile>`.
+There are two arguments to the ``store`` instruction: a value to store and an
+address at which to store it. The type of the ``<pointer>`` operand must be a
+pointer to the :ref:`first class <t_firstclass>` type of the ``<value>``
+operand. If the ``store`` is marked as ``volatile``, then the optimizer is not
+allowed to modify the number or order of execution of this ``store`` with other
+:ref:`volatile operations <volatile>`. Only values of :ref:`first class
+<t_firstclass>` types of known size (i.e. not containing an :ref:`opaque
+structural type <t_opaque>`) can be stored.

If the ``store`` is marked as ``atomic``, it takes an extra
:ref:`ordering <ordering>` and optional ``singlethread`` argument.
The ``acquire`` and @@ -6929,7 +7155,14 @@ alignment for the target. It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior. Underestimating the alignment may produce less efficient code. An alignment of 1 is always -safe. The maximum possible alignment is ``1 << 29``. +safe. The maximum possible alignment is ``1 << 29``. An alignment +value higher than the size of the stored type implies memory up to the +alignment value bytes can be stored to without trapping in the default +address space. Storing to the higher bytes however may result in data +races if another thread can access the same address. Introducing a +data race is not allowed. Storing to the extra bytes is not allowed +even in situations where a data race is known to not exist if the +function has the ``sanitize_address`` attribute. The optional ``!nontemporal`` metadata must reference a single metadata name ``<index>`` corresponding to a metadata node with one ``i32`` entry of @@ -7044,13 +7277,13 @@ Arguments: There are three arguments to the '``cmpxchg``' instruction: an address to operate on, a value to compare to the value currently be at that address, and a new value to place at that address if the compared values -are equal. The type of '<cmp>' must be an integer type whose bit width -is a power of two greater than or equal to eight and less than or equal -to a target-specific size limit. '<cmp>' and '<new>' must have the same -type, and the type of '<pointer>' must be a pointer to that type. If the -``cmpxchg`` is marked as ``volatile``, then the optimizer is not allowed -to modify the number or order of execution of this ``cmpxchg`` with -other :ref:`volatile operations <volatile>`. +are equal. The type of '<cmp>' must be an integer or pointer type whose +bit width is a power of two greater than or equal to eight and less +than or equal to a target-specific size limit. 
'<cmp>' and '<new>' must +have the same type, and the type of '<pointer>' must be a pointer to +that type. If the ``cmpxchg`` is marked as ``volatile``, then the +optimizer is not allowed to modify the number or order of execution of +this ``cmpxchg`` with other :ref:`volatile operations <volatile>`. The success and failure :ref:`ordering <ordering>` arguments specify how this ``cmpxchg`` synchronizes with other atomic operations. Both ordering parameters @@ -7091,11 +7324,11 @@ Example: .. code-block:: llvm entry: - %orig = atomic load i32, i32* %ptr unordered ; yields i32 + %orig = load atomic i32, i32* %ptr unordered, align 4 ; yields i32 br label %loop loop: - %cmp = phi i32 [ %orig, %entry ], [%old, %loop] + %cmp = phi i32 [ %orig, %entry ], [%value_loaded, %loop] %squared = mul i32 %cmp, %cmp %val_success = cmpxchg i32* %ptr, i32 %cmp, i32 %squared acq_rel monotonic ; yields { i32, i1 } %value_loaded = extractvalue { i32, i1 } %val_success, 0 @@ -7893,7 +8126,7 @@ or :ref:`ptrtoint <i_ptrtoint>` instructions first. Example: """""""" -.. code-block:: llvm +.. code-block:: text %X = bitcast i8 255 to i8 ; yields i8 :-1 %Y = bitcast i32* %x to sint* ; yields sint*:%x @@ -7977,7 +8210,7 @@ Arguments: The '``icmp``' instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is -not a value, just a keyword. The possible condition code are: +not a value, just a keyword. The possible condition codes are: #. ``eq``: equal #. ``ne``: not equal @@ -8032,7 +8265,7 @@ as the values being compared. Otherwise, the result is an ``i1``. Example: """""""" -.. code-block:: llvm +.. 
code-block:: text <result> = icmp eq i32 4, 5 ; yields: result=false <result> = icmp ne float* %X, %X ; yields: result=false @@ -8041,9 +8274,6 @@ Example: <result> = icmp ule i16 -4, 5 ; yields: result=false <result> = icmp sge i16 4, 5 ; yields: result=false -Note that the code generator does not yet support vector types with the -``icmp`` instruction. - .. _i_fcmp: '``fcmp``' Instruction @@ -8074,7 +8304,7 @@ Arguments: The '``fcmp``' instruction takes three operands. The first operand is the condition code indicating the kind of comparison to perform. It is -not a value, just a keyword. The possible condition code are: +not a value, just a keyword. The possible condition codes are: #. ``false``: no comparison, always returns false #. ``oeq``: ordered and equal @@ -8149,16 +8379,13 @@ assumptions to be made about the values of input arguments; namely Example: """""""" -.. code-block:: llvm +.. code-block:: text <result> = fcmp oeq float 4.0, 5.0 ; yields: result=false <result> = fcmp one float 4.0, 5.0 ; yields: result=true <result> = fcmp olt float 4.0, 5.0 ; yields: result=true <result> = fcmp ueq double 1.0, 2.0 ; yields: result=false -Note that the code generator does not yet support vector types with the -``fcmp`` instruction. - .. _i_phi: '``phi``' Instruction @@ -8270,7 +8497,7 @@ Syntax: :: - <result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] <ty> [<fnty>*] <fnptrval>(<function args>) [fn attrs] + <result> = [tail | musttail | notail ] call [fast-math flags] [cconv] [ret attrs] <ty>|<fnty> <fnptrval>(<function args>) [fn attrs] [ operand bundles ] Overview: @@ -8343,13 +8570,11 @@ This instruction requires several arguments: #. '``ty``': the type of the call instruction itself which is also the type of the return value. Functions that return no value are marked ``void``. -#. '``fnty``': shall be the signature of the pointer to function value - being invoked. 
The argument types must match the types implied by - this signature. This type can be omitted if the function is not - varargs and if the function type does not return a pointer to a - function. +#. '``fnty``': shall be the signature of the function being called. The + argument types must match the types implied by this signature. This + type can be omitted if the function is not varargs. #. '``fnptrval``': An LLVM value containing a pointer to a function to - be invoked. In most cases, this is a direct function invocation, but + be called. In most cases, this is a direct function call, but indirect ``call``'s are just as possible, calling an arbitrary pointer to function value. #. '``function args``': argument list whose types match the function @@ -8358,8 +8583,8 @@ This instruction requires several arguments: indicates the function accepts a variable number of arguments, the extra arguments can be specified. #. The optional :ref:`function attributes <fnattrs>` list. Only - '``noreturn``', '``nounwind``', '``readonly``' and '``readnone``' - attributes are valid here. + '``noreturn``', '``nounwind``', '``readonly``' , '``readnone``', + and '``convergent``' attributes are valid here. #. The optional :ref:`operand bundles <opbundles>` list. Semantics: @@ -8590,7 +8815,7 @@ that does not carry an appropriate :ref:`"funclet" bundle <ob_funclet>`. Example: """""""" -.. code-block:: llvm +.. code-block:: text dispatch: %cs = catchswitch within none [label %handler0] unwind to caller @@ -8660,7 +8885,7 @@ that does not carry an appropriate :ref:`"funclet" bundle <ob_funclet>`. Example: """""""" -.. code-block:: llvm +.. code-block:: text %tok = cleanuppad within %cs [] @@ -9497,6 +9722,33 @@ pass will generate the appropriate data structures and replace the ``llvm.instrprof_value_profile`` intrinsic with the call to the profile runtime library with proper arguments. 
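To illustrate the ``call`` signature rules discussed above, here is a hedged
sketch (the callee names ``@printf`` and ``@square`` stand in for any varargs
and non-varargs functions): the full function type must be written out when
calling a varargs callee, and may be omitted otherwise:

.. code-block:: llvm

   declare i32 @printf(i8*, ...)
   declare i32 @square(i32)

   ; varargs callee: the full function type is required
   %r1 = call i32 (i8*, ...) @printf(i8* %fmt, i32 %x)

   ; non-varargs callee: the return type alone suffices
   %r2 = call i32 @square(i32 %x)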
+'``llvm.thread.pointer``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8* @llvm.thread.pointer() + +Overview: +""""""""" + +The '``llvm.thread.pointer``' intrinsic returns the value of the thread +pointer. + +Semantics: +"""""""""" + +The '``llvm.thread.pointer``' intrinsic returns a pointer to the TLS area +for the current thread. The exact semantics of this value are target +specific: it may point to the start of TLS area, to the end, or somewhere +in the middle. Depending on the target, this intrinsic may read a register, +call a helper function, read from an alternate memory space, or perform +other operations necessary to locate the TLS area. Not all targets support +this intrinsic. + Standard C Library Intrinsics ----------------------------- @@ -10459,8 +10711,8 @@ Overview: """"""""" The '``llvm.bitreverse``' family of intrinsics is used to reverse the -bitpattern of an integer value; for example ``0b1234567`` becomes -``0b7654321``. +bitpattern of an integer value; for example ``0b10110110`` becomes +``0b01101101``. Semantics: """""""""" @@ -10558,7 +10810,7 @@ targets support all bit widths or vector types, however. declare i32 @llvm.ctlz.i32 (i32 <src>, i1 <is_zero_undef>) declare i64 @llvm.ctlz.i64 (i64 <src>, i1 <is_zero_undef>) declare i256 @llvm.ctlz.i256(i256 <src>, i1 <is_zero_undef>) - declase <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>) + declare <2 x i32> @llvm.ctlz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>) Overview: """"""""" @@ -10605,7 +10857,7 @@ support all bit widths or vector types, however. 
declare i32 @llvm.cttz.i32 (i32 <src>, i1 <is_zero_undef>) declare i64 @llvm.cttz.i64 (i64 <src>, i1 <is_zero_undef>) declare i256 @llvm.cttz.i256(i256 <src>, i1 <is_zero_undef>) - declase <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>) + declare <2 x i32> @llvm.cttz.v2i32(<2 x i32> <src>, i1 <is_zero_undef>) Overview: """"""""" @@ -10640,7 +10892,26 @@ then the result is the size in bits of the type of ``src`` if Arithmetic with Overflow Intrinsics ----------------------------------- -LLVM provides intrinsics for some arithmetic with overflow operations. +LLVM provides intrinsics for fast arithmetic overflow checking. + +Each of these intrinsics returns a two-element struct. The first +element of this struct contains the result of the corresponding +arithmetic operation modulo 2\ :sup:`n`\ , where n is the bit width of +the result. Therefore, for example, the first element of the struct +returned by ``llvm.sadd.with.overflow.i32`` is always the same as the +result of a 32-bit ``add`` instruction with the same operands, where +the ``add`` is *not* modified by an ``nsw`` or ``nuw`` flag. + +The second element of the result is an ``i1`` that is 1 if the +arithmetic operation overflowed and 0 otherwise. An operation +overflows if, for any values of its operands ``A`` and ``B`` and for +any ``N`` larger than the operands' width, ``ext(A op B) to iN`` is +not equal to ``(ext(A) to iN) op (ext(B) to iN)`` where ``ext`` is +``sext`` for signed overflow and ``zext`` for unsigned overflow, and +``op`` is the underlying arithmetic operation. + +The behavior of these intrinsics is well-defined for all argument +values. '``llvm.sadd.with.overflow.*``' Intrinsics ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -10980,7 +11251,7 @@ Examples of non-canonical encodings: - Many normal decimal floating point numbers have non-canonical alternative encodings. - Some machines, like GPUs or ARMv7 NEON, do not support subnormal values. 
- These are treated as non-canonical encodings of zero and with be flushed to + These are treated as non-canonical encodings of zero and will be flushed to a zero of the same sign by this operation. Note that per IEEE-754-2008 6.2, systems that support signaling NaNs with @@ -11304,12 +11575,12 @@ This is an overloaded intrinsic. The loaded data is a vector of any integer, flo :: - declare <16 x float> @llvm.masked.load.v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>) - declare <2 x double> @llvm.masked.load.v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>) + declare <16 x float> @llvm.masked.load.v16f32.p0v16f32 (<16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>, <16 x float> <passthru>) + declare <2 x double> @llvm.masked.load.v2f64.p0v2f64 (<2 x double>* <ptr>, i32 <alignment>, <2 x i1> <mask>, <2 x double> <passthru>) ;; The data is a vector of pointers to double - declare <8 x double*> @llvm.masked.load.v8p0f64 (<8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x double*> <passthru>) + declare <8 x double*> @llvm.masked.load.v8p0f64.p0v8p0f64 (<8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x double*> <passthru>) ;; The data is a vector of function pointers - declare <8 x i32 ()*> @llvm.masked.load.v8p0f_i32f (<8 x i32 ()*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x i32 ()*> <passthru>) + declare <8 x i32 ()*> @llvm.masked.load.v8p0f_i32f.p0v8p0f_i32f (<8 x i32 ()*>* <ptr>, i32 <alignment>, <8 x i1> <mask>, <8 x i32 ()*> <passthru>) Overview: """"""""" @@ -11332,7 +11603,7 @@ The result of this operation is equivalent to a regular vector load instruction :: - %res = call <16 x float> @llvm.masked.load.v16f32 (<16 x float>* %ptr, i32 4, <16 x i1>%mask, <16 x float> %passthru) + %res = call <16 x float> @llvm.masked.load.v16f32.p0v16f32 (<16 x float>* %ptr, i32 4, <16 x i1>%mask, <16 x float> %passthru) ;; The result of the two following instructions is 
identical aside from potential memory access exception %loadlal = load <16 x float>, <16 x float>* %ptr, align 4 @@ -11349,12 +11620,12 @@ This is an overloaded intrinsic. The data stored in memory is a vector of any in :: - declare void @llvm.masked.store.v8i32 (<8 x i32> <value>, <8 x i32>* <ptr>, i32 <alignment>, <8 x i1> <mask>) - declare void @llvm.masked.store.v16f32 (<16 x float> <value>, <16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>) + declare void @llvm.masked.store.v8i32.p0v8i32 (<8 x i32> <value>, <8 x i32>* <ptr>, i32 <alignment>, <8 x i1> <mask>) + declare void @llvm.masked.store.v16f32.p0v16f32 (<16 x float> <value>, <16 x float>* <ptr>, i32 <alignment>, <16 x i1> <mask>) ;; The data is a vector of pointers to double - declare void @llvm.masked.store.v8p0f64 (<8 x double*> <value>, <8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>) + declare void @llvm.masked.store.v8p0f64.p0v8p0f64 (<8 x double*> <value>, <8 x double*>* <ptr>, i32 <alignment>, <8 x i1> <mask>) ;; The data is a vector of function pointers - declare void @llvm.masked.store.v4p0f_i32f (<4 x i32 ()*> <value>, <4 x i32 ()*>* <ptr>, i32 <alignment>, <4 x i1> <mask>) + declare void @llvm.masked.store.v4p0f_i32f.p0v4p0f_i32f (<4 x i32 ()*> <value>, <4 x i32 ()*>* <ptr>, i32 <alignment>, <4 x i1> <mask>) Overview: """"""""" @@ -11375,7 +11646,7 @@ The result of this operation is equivalent to a load-modify-store sequence. 
Howe :: - call void @llvm.masked.store.v16f32(<16 x float> %value, <16 x float>* %ptr, i32 4, <16 x i1> %mask) + call void @llvm.masked.store.v16f32.p0v16f32(<16 x float> %value, <16 x float>* %ptr, i32 4, <16 x i1> %mask) ;; The result of the following instructions is identical aside from potential data races and memory access exceptions %oldval = load <16 x float>, <16 x float>* %ptr, align 4 @@ -11475,7 +11746,7 @@ The '``llvm.masked.scatter``' intrinsics is designed for writing selected vector :: - ;; This instruction unconditionaly stores data vector in multiple addresses + ;; This instruction unconditionally stores data vector in multiple addresses call @llvm.masked.scatter.v8i32 (<8 x i32> %value, <8 x i32*> %ptrs, i32 4, <8 x i1> <true, true, .. true>) ;; It is equivalent to a list of scalar stores @@ -11859,43 +12130,40 @@ checked against the original guard by ``llvm.stackprotectorcheck``. If they are different, then ``llvm.stackprotectorcheck`` causes the program to abort by calling the ``__stack_chk_fail()`` function. -'``llvm.stackprotectorcheck``' Intrinsic -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +'``llvm.stackguard``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: """"""" :: - declare void @llvm.stackprotectorcheck(i8** <guard>) + declare i8* @llvm.stackguard() Overview: """"""""" -The ``llvm.stackprotectorcheck`` intrinsic compares ``guard`` against an already -created stack protector and if they are not equal calls the -``__stack_chk_fail()`` function. +The ``llvm.stackguard`` intrinsic returns the system stack guard value. + +It should not be generated by frontends, since it is only for internal usage. +The reason why we create this intrinsic is that we still support IR form Stack +Protector in FastISel. Arguments: """""""""" -The ``llvm.stackprotectorcheck`` intrinsic requires one pointer argument, the -the variable ``@__stack_chk_guard``. +None. 
Semantics: """""""""" -This intrinsic is provided to perform the stack protector check by comparing -``guard`` with the stack slot created by ``llvm.stackprotector`` and if the -values do not match call the ``__stack_chk_fail()`` function. +On some platforms, the value returned by this intrinsic remains unchanged +between loads in the same thread. On other platforms, it returns the same +global variable value, if any, e.g. ``@__stack_chk_guard``. -The reason to provide this as an IR level intrinsic instead of implementing it -via other IR operations is that in order to perform this operation at the IR -level without an intrinsic, one would need to create additional basic blocks to -handle the success/failure cases. This makes it difficult to stop the stack -protector check from disrupting sibling tail calls in Codegen. With this -intrinsic, we are able to generate the stack protector basic blocks late in -codegen after the tail call decision has occurred. +Currently some platforms have IR-level customized stack guard loading (e.g. +X86 Linux) that is not handled by ``llvm.stackguard()``, while they should be +in the future. '``llvm.objectsize``' Intrinsic ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -12010,9 +12278,9 @@ sufficient overall improvement in code quality. For this reason, that the optimizer can otherwise deduce or facts that are of little use to the optimizer. -.. _bitset.test: +.. _type.test: -'``llvm.bitset.test``' Intrinsic +'``llvm.type.test``' Intrinsic ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Syntax: @@ -12020,20 +12288,74 @@ Syntax: :: - declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone + declare i1 @llvm.type.test(i8* %ptr, metadata %type) nounwind readnone Arguments: """""""""" The first argument is a pointer to be tested. The second argument is a -metadata object representing an identifier for a :doc:`bitset <BitSets>`. +metadata object representing a :doc:`type identifier <TypeMetadata>`. 
+ +Overview: +""""""""" + +The ``llvm.type.test`` intrinsic tests whether the given pointer is associated +with the given type identifier. + +'``llvm.type.checked.load``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare {i8*, i1} @llvm.type.checked.load(i8* %ptr, i32 %offset, metadata %type) argmemonly nounwind readonly + + +Arguments: +"""""""""" + +The first argument is a pointer from which to load a function pointer. The +second argument is the byte offset from which to load the function pointer. The +third argument is a metadata object representing a :doc:`type identifier +<TypeMetadata>`. Overview: """"""""" -The ``llvm.bitset.test`` intrinsic tests whether the given pointer is a -member of the given bitset. +The ``llvm.type.checked.load`` intrinsic safely loads a function pointer from a +virtual table pointer using type metadata. This intrinsic is used to implement +control flow integrity in conjunction with virtual call optimization. The +virtual call optimization pass will optimize away ``llvm.type.checked.load`` +intrinsics associated with devirtualized calls, thereby removing the type +check in cases where it is not needed to enforce the control flow integrity +constraint. + +If the given pointer is associated with a type metadata identifier, this +function returns true as the second element of its return value. (Note that +the function may also return true if the given pointer is not associated +with a type metadata identifier.) If the function's return value's second +element is true, the following rules apply to the first element: + +- If the given pointer is associated with the given type metadata identifier, + it is the function pointer loaded from the given byte offset from the given + pointer. + +- If the given pointer is not associated with the given type metadata + identifier, it is one of the following (the choice of which is unspecified): + + 1. 
The function pointer that would have been loaded from an arbitrarily chosen
+     (through an unspecified mechanism) pointer associated with the type
+     metadata.
+
+  2. If the function has a non-void return type, a pointer to a function that
+     returns an unspecified value without causing side effects.
+
+If the function's return value's second element is false, the value of the
+first element is undefined.
+
 '``llvm.donothing``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -12049,8 +12371,9 @@ Overview:
 """""""""
 
 The ``llvm.donothing`` intrinsic doesn't perform any operation. It's one of only
-two intrinsics (besides ``llvm.experimental.patchpoint``) that can be called
-with an invoke instruction.
+three intrinsics (besides ``llvm.experimental.patchpoint`` and
+``llvm.experimental.gc.statepoint``) that can be called with an invoke
+instruction.
 
 Arguments:
 """"""""""
@@ -12063,6 +12386,155 @@ Semantics:
 """"""""""
 
 This intrinsic does nothing, and it's removed by optimizers and ignored by
 codegen.
 
+'``llvm.experimental.deoptimize``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      declare type @llvm.experimental.deoptimize(...) [ "deopt"(...) ]
+
+Overview:
+"""""""""
+
+This intrinsic, together with :ref:`deoptimization operand bundles
+<deopt_opbundles>`, allows frontends to express transfer of control and
+frame-local state from the currently executing (typically more specialized,
+hence faster) version of a function into another (typically more generic, hence
+slower) version.
+
+In languages with a fully integrated managed runtime like Java and JavaScript,
+this intrinsic can be used to implement "uncommon trap" or "side exit" like
+functionality. In unmanaged languages like C and C++, this intrinsic can be
+used to represent the slow paths of specialized functions.
+
+
+Arguments:
+""""""""""
+
+The intrinsic takes an arbitrary number of arguments, whose meaning is
+decided by the :ref:`lowering strategy<deoptimize_lowering>`.
+ +Semantics: +"""""""""" + +The ``@llvm.experimental.deoptimize`` intrinsic executes an attached +deoptimization continuation (denoted using a :ref:`deoptimization +operand bundle <deopt_opbundles>`) and returns the value returned by +the deoptimization continuation. Defining the semantic properties of +the continuation itself is out of scope of the language reference -- +as far as LLVM is concerned, the deoptimization continuation can +invoke arbitrary side effects, including reading from and writing to +the entire heap. + +Deoptimization continuations expressed using ``"deopt"`` operand bundles always +continue execution to the end of the physical frame containing them, so all +calls to ``@llvm.experimental.deoptimize`` must be in "tail position": + + - ``@llvm.experimental.deoptimize`` cannot be invoked. + - The call must immediately precede a :ref:`ret <i_ret>` instruction. + - The ``ret`` instruction must return the value produced by the + ``@llvm.experimental.deoptimize`` call if there is one, or void. + +Note that the above restrictions imply that the return type for a call to +``@llvm.experimental.deoptimize`` will match the return type of its immediate +caller. + +The inliner composes the ``"deopt"`` continuations of the caller into the +``"deopt"`` continuations present in the inlinee, and also updates calls to this +intrinsic to return directly from the frame of the function it inlined into. + +All declarations of ``@llvm.experimental.deoptimize`` must share the +same calling convention. + +.. _deoptimize_lowering: + +Lowering: +""""""""" + +Calls to ``@llvm.experimental.deoptimize`` are lowered to calls to the +symbol ``__llvm_deoptimize`` (it is the frontend's responsibility to +ensure that this symbol is defined). The call arguments to +``@llvm.experimental.deoptimize`` are lowered as if they were formal +arguments of the specified types, and not as varargs. 
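To make the tail-position rules above concrete, here is a minimal IR sketch (the function, block, and operand names are illustrative, not taken from the reference): the intrinsic is overloaded by return type, the call immediately precedes the ``ret``, and the ``ret`` returns the call's value.

```llvm
; Overloaded by return type: this declaration returns i32.
declare i32 @llvm.experimental.deoptimize.i32(...)

define i32 @specialized(i32 %x) {
entry:
  %unexpected = icmp slt i32 %x, 0
  br i1 %unexpected, label %deopt, label %fast

deopt:
  ; Tail position: the call immediately precedes the ret, and the ret
  ; returns the call's value, whose type matches the caller's return type.
  %rv = call i32 (...) @llvm.experimental.deoptimize.i32(i32 %x) [ "deopt"(i32 %x) ]
  ret i32 %rv

fast:
  ret i32 0
}
```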
+ + +'``llvm.experimental.guard``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare void @llvm.experimental.guard(i1, ...) [ "deopt"(...) ] + +Overview: +""""""""" + +This intrinsic, together with :ref:`deoptimization operand bundles +<deopt_opbundles>`, allows frontends to express guards or checks on +optimistic assumptions made during compilation. The semantics of +``@llvm.experimental.guard`` is defined in terms of +``@llvm.experimental.deoptimize`` -- its body is defined to be +equivalent to: + +.. code-block:: text + + define void @llvm.experimental.guard(i1 %pred, <args...>) { + %realPred = and i1 %pred, undef + br i1 %realPred, label %continue, label %leave [, !make.implicit !{}] + + leave: + call void @llvm.experimental.deoptimize(<args...>) [ "deopt"() ] + ret void + + continue: + ret void + } + + +with the optional ``[, !make.implicit !{}]`` present if and only if it +is present on the call site. For more details on ``!make.implicit``, +see :doc:`FaultMaps`. + +In words, ``@llvm.experimental.guard`` executes the attached +``"deopt"`` continuation if (but **not** only if) its first argument +is ``false``. Since the optimizer is allowed to replace the ``undef`` +with an arbitrary value, it can optimize guard to fail "spuriously", +i.e. without the original condition being false (hence the "not only +if"); and this allows for "check widening" type optimizations. + +``@llvm.experimental.guard`` cannot be invoked. + + +'``llvm.load.relative``' Intrinsic +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +:: + + declare i8* @llvm.load.relative.iN(i8* %ptr, iN %offset) argmemonly nounwind readonly + +Overview: +""""""""" + +This intrinsic loads a 32-bit value from the address ``%ptr + %offset``, +adds ``%ptr`` to that value and returns it. 
The constant folder specifically +recognizes the form of this intrinsic and the constant initializers it may +load from; if a loaded constant initializer is known to have the form +``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``. + +LLVM provides that the calculation of such a constant initializer will +not overflow at link time under the medium code model if ``x`` is an +``unnamed_addr`` function. However, it does not provide this guarantee for +a constant initializer folded into a function body. This intrinsic can be +used to avoid the possibility of overflows when loading from such a constant. + Stack Map Intrinsics -------------------- diff --git a/gnu/llvm/docs/LibFuzzer.rst b/gnu/llvm/docs/LibFuzzer.rst index 84adff3616f..92937c2d0b5 100644 --- a/gnu/llvm/docs/LibFuzzer.rst +++ b/gnu/llvm/docs/LibFuzzer.rst @@ -1,90 +1,373 @@ -======================================================== -LibFuzzer -- a library for coverage-guided fuzz testing. -======================================================== +======================================================= +libFuzzer – a library for coverage-guided fuzz testing. +======================================================= .. contents:: :local: - :depth: 4 + :depth: 1 Introduction ============ -This library is intended primarily for in-process coverage-guided fuzz testing -(fuzzing) of other libraries. The typical workflow looks like this: - -* Build the Fuzzer library as a static archive (or just a set of .o files). - Note that the Fuzzer contains the main() function. - Preferably do *not* use sanitizers while building the Fuzzer. -* Build the library you are going to test with - `-fsanitize-coverage={bb,edge}[,indirect-calls,8bit-counters]` - and one of the sanitizers. We recommend to build the library in several - different modes (e.g. asan, msan, lsan, ubsan, etc) and even using different - optimizations options (e.g. -O0, -O1, -O2) to diversify testing. 
-* Build a test driver using the same options as the library. - The test driver is a C/C++ file containing interesting calls to the library - inside a single function ``extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);``. - Currently, the only expected return value is 0, others are reserved for future. -* Link the Fuzzer, the library and the driver together into an executable - using the same sanitizer options as for the library. -* Collect the initial corpus of inputs for the - fuzzer (a directory with test inputs, one file per input). - The better your inputs are the faster you will find something interesting. - Also try to keep your inputs small, otherwise the Fuzzer will run too slow. - By default, the Fuzzer limits the size of every input to 64 bytes - (use ``-max_len=N`` to override). -* Run the fuzzer with the test corpus. As new interesting test cases are - discovered they will be added to the corpus. If a bug is discovered by - the sanitizer (asan, etc) it will be reported as usual and the reproducer - will be written to disk. - Each Fuzzer process is single-threaded (unless the library starts its own - threads). You can run the Fuzzer on the same corpus in multiple processes - in parallel. - - -The Fuzzer is similar in concept to AFL_, -but uses in-process Fuzzing, which is more fragile, more restrictive, but -potentially much faster as it has no overhead for process start-up. -It uses LLVM's SanitizerCoverage_ instrumentation to get in-process -coverage-feedback - -The code resides in the LLVM repository, requires the fresh Clang compiler to build -and is used to fuzz various parts of LLVM, -but the Fuzzer itself does not (and should not) depend on any -part of LLVM and can be used for other projects w/o requiring the rest of LLVM. - -Flags -===== -The most important flags are:: - - seed 0 Random seed. If 0, seed is generated. - runs -1 Number of individual test runs (-1 for infinite runs). 
- max_len 64 Maximum length of the test input. - cross_over 1 If 1, cross over inputs. - mutate_depth 5 Apply this number of consecutive mutations to each input. - timeout 1200 Timeout in seconds (if positive). If one unit runs more than this number of seconds the process will abort. - max_total_time 0 If positive, indicates the maximal total time in seconds to run the fuzzer. - help 0 Print help. - merge 0 If 1, the 2-nd, 3-rd, etc corpora will be merged into the 1-st corpus. Only interesting units will be taken. - jobs 0 Number of jobs to run. If jobs >= 1 we spawn this number of jobs in separate worker processes with stdout/stderr redirected to fuzz-JOB.log. - workers 0 Number of simultaneous worker processes to run the jobs. If zero, "min(jobs,NumberOfCpuCores()/2)" is used. - sync_command 0 Execute an external command "<sync_command> <test_corpus>" to synchronize the test corpus. - sync_timeout 600 Minimum timeout between syncs. - use_traces 0 Experimental: use instruction traces - only_ascii 0 If 1, generate only ASCII (isprint+isspace) inputs. - test_single_input "" Use specified file content as test input. Test will be run only once. Useful for debugging a particular case. - artifact_prefix "" Write fuzzing artifacts (crash, timeout, or slow inputs) as $(artifact_prefix)file - exact_artifact_path "" Write the single artifact on failure (crash, timeout) as $(exact_artifact_path). This overrides -artifact_prefix and will not use checksum in the file name. Do not use the same path for several parallel processes. +LibFuzzer is a library for in-process, coverage-guided, evolutionary fuzzing +of other libraries. + +LibFuzzer is similar in concept to American Fuzzy Lop (AFL_), but it performs +all of its fuzzing inside a single process. This in-process fuzzing can be more +restrictive and fragile, but is potentially much faster as there is no overhead +for process start-up. 
+ +The fuzzer is linked with the library under test, and feeds fuzzed inputs to the +library via a specific fuzzing entrypoint (aka "target function"); the fuzzer +then tracks which areas of the code are reached, and generates mutations on the +corpus of input data in order to maximize the code coverage. The code coverage +information for libFuzzer is provided by LLVM's SanitizerCoverage_ +instrumentation. + +Contact: libfuzzer(#)googlegroups.com + +Versions +======== + +LibFuzzer is under active development so a current (or at least very recent) +version of Clang is the only supported variant. + +(If `building Clang from trunk`_ is too time-consuming or difficult, then +the Clang binaries that the Chromium developers build are likely to be +fairly recent: + +.. code-block:: console + + mkdir TMP_CLANG + cd TMP_CLANG + git clone https://chromium.googlesource.com/chromium/src/tools/clang + cd .. + TMP_CLANG/clang/scripts/update.py + +This installs the Clang binary as +``./third_party/llvm-build/Release+Asserts/bin/clang``) + +The libFuzzer code resides in the LLVM repository, and requires a recent Clang +compiler to build (and is used to `fuzz various parts of LLVM itself`_). +However the fuzzer itself does not (and should not) depend on any part of LLVM +infrastructure and can be used for other projects without requiring the rest +of LLVM. + + + +Getting Started +=============== + +.. contents:: + :local: + :depth: 1 + +Building +-------- + +The first step for using libFuzzer on a library is to implement a fuzzing +target function that accepts a sequence of bytes, like this: + +.. code-block:: c++ + + // fuzz_target.cc + extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) { + DoSomethingInterestingWithMyAPI(Data, Size); + return 0; // Non-zero return values are reserved for future use. + } + +Next, build the libFuzzer library as a static archive, without any sanitizer +options. 
Note that the libFuzzer library contains the ``main()`` function:
+
+.. code-block:: console
+
+  svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer
+  # Alternative: get libFuzzer from a dedicated git mirror:
+  # git clone https://chromium.googlesource.com/chromium/llvm-project/llvm/lib/Fuzzer
+  clang++ -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer
+  ar ruv libFuzzer.a Fuzzer*.o
+
+Then build the fuzzing target function and the library under test using
+the SanitizerCoverage_ option, which instruments the code so that the fuzzer
+can retrieve code coverage information (to guide the fuzzing). Linking with
+the libFuzzer code then gives a fuzzer executable.
+
+You should also enable one or more of the *sanitizers*, which help to expose
+latent bugs by making incorrect behavior generate errors at runtime:
+
+ - AddressSanitizer_ (ASAN) detects memory access errors. Use `-fsanitize=address`.
+ - UndefinedBehaviorSanitizer_ (UBSAN) detects the use of various features of C/C++ that are explicitly
+   listed as resulting in undefined behavior. Use `-fsanitize=undefined -fno-sanitize-recover=undefined`
+   or any individual UBSAN check, e.g. `-fsanitize=signed-integer-overflow -fno-sanitize-recover=undefined`.
+   You may combine ASAN and UBSAN in one build.
+ - MemorySanitizer_ (MSAN) detects uninitialized reads: code whose behavior relies on memory
+   contents that have not been initialized to a specific value. Use `-fsanitize=memory`.
+   MSAN cannot be combined with other sanitizers and should be used in a separate build.
+
+Finally, link with ``libFuzzer.a``::
+
+  clang -fsanitize-coverage=edge -fsanitize=address your_lib.cc fuzz_target.cc libFuzzer.a -o my_fuzzer
+
+Corpus
+------
+
+Coverage-guided fuzzers like libFuzzer rely on a corpus of sample inputs for the
+code under test.
This corpus should ideally be seeded with a varied collection +of valid and invalid inputs for the code under test; for example, for a graphics +library the initial corpus might hold a variety of different small PNG/JPG/GIF +files. The fuzzer generates random mutations based around the sample inputs in +the current corpus. If a mutation triggers execution of a previously-uncovered +path in the code under test, then that mutation is saved to the corpus for +future variations. + +LibFuzzer will work without any initial seeds, but will be less +efficient if the library under test accepts complex, +structured inputs. + +The corpus can also act as a sanity/regression check, to confirm that the +fuzzing entrypoint still works and that all of the sample inputs run through +the code under test without problems. + +If you have a large corpus (either generated by fuzzing or acquired by other means) +you may want to minimize it while still preserving the full coverage. One way to do that +is to use the `-merge=1` flag: + +.. code-block:: console + + mkdir NEW_CORPUS_DIR # Store minimized corpus here. + ./my_fuzzer -merge=1 NEW_CORPUS_DIR FULL_CORPUS_DIR + +You may use the same flag to add more interesting items to an existing corpus. +Only the inputs that trigger new coverage will be added to the first corpus. + +.. code-block:: console + + ./my_fuzzer -merge=1 CURRENT_CORPUS_DIR NEW_POTENTIALLY_INTERESTING_INPUTS_DIR + + +Running +------- + +To run the fuzzer, first create a Corpus_ directory that holds the +initial "seed" sample inputs: + +.. code-block:: console + + mkdir CORPUS_DIR + cp /some/input/samples/* CORPUS_DIR + +Then run the fuzzer on the corpus directory: + +.. code-block:: console + + ./my_fuzzer CORPUS_DIR # -max_len=1000 -jobs=20 ... + +As the fuzzer discovers new interesting test cases (i.e. test cases that +trigger coverage of new paths through the code under test), those test cases +will be added to the corpus directory. 
+
+By default, the fuzzing process will continue indefinitely – at least until
+a bug is found. Any crashes or sanitizer failures will be reported as usual,
+stopping the fuzzing process, and the particular input that triggered the bug
+will be written to disk (typically as ``crash-<sha1>``, ``leak-<sha1>``,
+or ``timeout-<sha1>``).
+
+
+Parallel Fuzzing
+----------------
+
+Each libFuzzer process is single-threaded, unless the library under test starts
+its own threads. However, it is possible to run multiple libFuzzer processes in
+parallel with a shared corpus directory; this has the advantage that any new
+inputs found by one fuzzer process will be available to the other fuzzer
+processes (unless you disable this with the ``-reload=0`` option).
+
+This is primarily controlled by the ``-jobs=N`` option, which indicates that
+`N` fuzzing jobs should be run to completion (i.e. until a bug is found or
+time/iteration limits are reached). These jobs will be run across a set of
+worker processes, by default using half of the available CPU cores; the count of
+worker processes can be overridden by the ``-workers=N`` option. For example,
+running with ``-jobs=30`` on a 12-core machine would run 6 workers by default,
+with each worker averaging 5 jobs by completion of the entire process.
+
+
+Options
+=======
+
+To run the fuzzer, pass zero or more corpus directories as command line
+arguments. The fuzzer will read test inputs from each of these corpus
+directories, and any new test inputs that are generated will be written
+back to the first corpus directory:
+
+.. code-block:: console
+
+  ./fuzzer [-flag1=val1 [-flag2=val2 ...] ] [dir1 [dir2 ...] ]
+
+If a list of files (rather than directories) is passed to the fuzzer program,
+then it will re-run those files as test inputs but will not perform any fuzzing.
+In this mode the fuzzer binary can be used as a regression test (e.g. on a
+continuous integration system) to check the target function and saved inputs
+still work.
+
+The most important command line options are:
+
+``-help``
+  Print help message.
+``-seed``
+  Random seed. If 0 (the default), the seed is generated.
+``-runs``
+  Number of individual test runs, -1 (the default) to run indefinitely.
+``-max_len``
+  Maximum length of a test input. If 0 (the default), libFuzzer tries to guess
+  a good value based on the corpus (and reports it).
+``-timeout``
+  Timeout in seconds, default 1200. If an input takes longer than this timeout,
+  the process is treated as a failure case.
+``-rss_limit_mb``
+  Memory usage limit in MB, default 2048. Use 0 to disable the limit.
+  If an input requires more than this amount of RSS memory to execute,
+  the process is treated as a failure case.
+  The limit is checked in a separate thread every second.
+  If running without ASAN/MSAN, you may use 'ulimit -v' instead.
+``-timeout_exitcode``
+  Exit code (default 77) to emit when terminating due to timeout, when
+  ``-abort_on_timeout`` is not set.
+``-max_total_time``
+  If positive, indicates the maximum total time in seconds to run the fuzzer.
+  If 0 (the default), run indefinitely.
+``-merge``
+  If set to 1, any corpus inputs from the 2nd, 3rd etc. corpus directories
+  that trigger new code coverage will be merged into the first corpus
+  directory. Defaults to 0. This flag can be used to minimize a corpus.
+``-reload``
+  If set to 1 (the default), the corpus directory is re-read periodically to
+  check for new inputs; this allows detection of new inputs that were discovered
+  by other fuzzing processes.
+``-jobs``
+  Number of fuzzing jobs to run to completion. Default value is 0, which runs a
+  single fuzzing process until completion. If the value is >= 1, then this
+  number of jobs performing fuzzing are run, in a collection of parallel
+  separate worker processes; each such worker process has its
+  ``stdout``/``stderr`` redirected to ``fuzz-<JOB>.log``.
+``-workers``
+  Number of simultaneous worker processes to run the fuzzing jobs to completion
+  in. If 0 (the default), ``min(jobs, NumberOfCpuCores()/2)`` is used.
+``-dict``
+  Provide a dictionary of input keywords; see Dictionaries_.
+``-use_counters``
+  Use `coverage counters`_ to generate approximate counts of how often code
+  blocks are hit; defaults to 1.
+``-use_traces``
+  Use instruction traces (experimental, defaults to 0); see `Data-flow-guided fuzzing`_.
+``-only_ascii``
+  If 1, generate only ASCII (``isprint``+``isspace``) inputs. Defaults to 0.
+``-artifact_prefix``
+  Provide a prefix to use when saving fuzzing artifacts (crash, timeout, or
+  slow inputs) as ``$(artifact_prefix)file``. Defaults to empty.
+``-exact_artifact_path``
+  Ignored if empty (the default). If non-empty, write the single artifact on
+  failure (crash, timeout) as ``$(exact_artifact_path)``. This overrides
+  ``-artifact_prefix`` and will not use a checksum in the file name. Do not use
+  the same path for several parallel processes.
+``-print_final_stats``
+  If 1, print statistics at exit. Defaults to 0.
+``-detect_leaks``
+  If 1 (the default) and if LeakSanitizer is enabled, try to detect memory
+  leaks during fuzzing (i.e. not only at shutdown).
+``-close_fd_mask``
+  Indicate output streams to close at startup. Be careful, this will
+  remove diagnostic output from target code (e.g. messages on assert failure).
+
+  - 0 (default): close neither ``stdout`` nor ``stderr``
+  - 1 : close ``stdout``
+  - 2 : close ``stderr``
+  - 3 : close both ``stdout`` and ``stderr``.
 
 For the full list of flags run the fuzzer binary with ``-help=1``.
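As a concrete illustration of combining these options, a typical invocation might look like the following (the binary and directory names are hypothetical):

```console
# Hypothetical example: run 8 fuzzing jobs over 4 worker processes,
# cap input size at 64 bytes, and print statistics when the run completes.
./my_fuzzer -max_len=64 -jobs=8 -workers=4 -print_final_stats=1 CORPUS_DIR SEEDS_DIR
```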
-Usage examples -============== +Output +====== + +During operation the fuzzer prints information to ``stderr``, for example:: + + INFO: Seed: 3338750330 + Loaded 1024/1211 files from corpus/ + INFO: -max_len is not provided, using 64 + #0 READ units: 1211 exec/s: 0 + #1211 INITED cov: 2575 bits: 8855 indir: 5 units: 830 exec/s: 1211 + #1422 NEW cov: 2580 bits: 8860 indir: 5 units: 831 exec/s: 1422 L: 21 MS: 1 ShuffleBytes- + #1688 NEW cov: 2581 bits: 8865 indir: 5 units: 832 exec/s: 1688 L: 19 MS: 2 EraseByte-CrossOver- + #1734 NEW cov: 2583 bits: 8879 indir: 5 units: 833 exec/s: 1734 L: 27 MS: 3 ChangeBit-EraseByte-ShuffleBytes- + ... + +The early parts of the output include information about the fuzzer options and +configuration, including the current random seed (in the ``Seed:`` line; this +can be overridden with the ``-seed=N`` flag). + +Further output lines have the form of an event code and statistics. The +possible event codes are: + +``READ`` + The fuzzer has read in all of the provided input samples from the corpus + directories. +``INITED`` + The fuzzer has completed initialization, which includes running each of + the initial input samples through the code under test. +``NEW`` + The fuzzer has created a test input that covers new areas of the code + under test. This input will be saved to the primary corpus directory. +``pulse`` + The fuzzer has generated 2\ :sup:`n` inputs (generated periodically to reassure + the user that the fuzzer is still working). +``DONE`` + The fuzzer has completed operation because it has reached the specified + iteration limit (``-runs``) or time limit (``-max_total_time``). +``MIN<n>`` + The fuzzer is minimizing the combination of input corpus directories into + a single unified corpus (due to the ``-merge`` command line option). 
+``RELOAD``
+  The fuzzer is performing a periodic reload of inputs from the corpus
+  directory; this allows it to pick up any inputs discovered by other
+  fuzzer processes (see `Parallel Fuzzing`_).
+
+Each output line also reports the following statistics (when non-zero):
+
+``cov:``
+  Total number of code blocks or edges covered by executing the current
+  corpus.
+``bits:``
+  Rough measure of the number of code blocks or edges covered, and how often;
+  only valid if the fuzzer is run with ``-use_counters=1``.
+``indir:``
+  Number of distinct function `caller-callee pairs`_ executed with the
+  current corpus; only valid if the code under test was built with
+  ``-fsanitize-coverage=indirect-calls``.
+``units:``
+  Number of entries in the current input corpus.
+``exec/s:``
+  Number of fuzzer iterations per second.
+
+For ``NEW`` events, the output line also includes information about the mutation
+operation that produced the new input:
+
+``L:``
+  Size of the new input in bytes.
+``MS: <n> <operations>``
+  Count and list of the mutation operations used to generate the input.
+
+
+Examples
+========
+.. contents::
+   :local:
+   :depth: 1
 
 Toy example
 -----------
 
-A simple function that does something interesting if it receives the input "HI!"::
+A simple function that does something interesting if it receives the input
+"HI!"::
 
-    cat << EOF >> test_fuzzer.cc
+    cat << EOF > test_fuzzer.cc
     #include <stdint.h>
     #include <stddef.h>
     extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
@@ -95,32 +378,37 @@ A simple function that does something interesting if it receives the input "HI!"
       return 0;
     }
     EOF
- clang++ -fsanitize=address -fsanitize-coverage=edge test_fuzzer.cc Fuzzer*.o + # Build test_fuzzer.cc with asan and link against libFuzzer.a + clang++ -fsanitize=address -fsanitize-coverage=edge test_fuzzer.cc libFuzzer.a # Run the fuzzer with no corpus. ./a.out -You should get ``Illegal instruction (core dumped)`` pretty quickly. +You should get an error pretty quickly:: + + #0 READ units: 1 exec/s: 0 + #1 INITED cov: 3 units: 1 exec/s: 0 + #2 NEW cov: 5 units: 2 exec/s: 0 L: 64 MS: 0 + #19237 NEW cov: 9 units: 3 exec/s: 0 L: 64 MS: 0 + #20595 NEW cov: 10 units: 4 exec/s: 0 L: 1 MS: 4 ChangeASCIIInt-ShuffleBytes-ChangeByte-CrossOver- + #34574 NEW cov: 13 units: 5 exec/s: 0 L: 2 MS: 3 ShuffleBytes-CrossOver-ChangeBit- + #34807 NEW cov: 15 units: 6 exec/s: 0 L: 3 MS: 1 CrossOver- + ==31511== ERROR: libFuzzer: deadly signal + ... + artifact_prefix='./'; Test unit written to ./crash-b13e8756b13a00cf168300179061fb4b91fefbed + PCRE2 ----- -Here we show how to use lib/Fuzzer on something real, yet simple: pcre2_:: +Here we show how to use libFuzzer on something real, yet simple: pcre2_:: COV_FLAGS=" -fsanitize-coverage=edge,indirect-calls,8bit-counters" # Get PCRE2 - svn co svn://vcs.exim.org/pcre2/code/trunk pcre - # Get lib/Fuzzer. Assuming that you already have fresh clang in PATH. - svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer - # Build PCRE2 with AddressSanitizer and coverage. - (cd pcre; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install) - # Build lib/Fuzzer files. - clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer - # Build the actual function that does something interesting with PCRE2. + wget ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-10.20.tar.gz + tar xf pcre2-10.20.tar.gz + # Build PCRE2 with AddressSanitizer and coverage; requires autotools. 
+ (cd pcre2-10.20; ./autogen.sh; CC="clang -fsanitize=address $COV_FLAGS" ./configure --prefix=`pwd`/../inst && make -j && make install) + # Build the fuzzing target function that does something interesting with PCRE2. cat << EOF > pcre_fuzzer.cc #include <string.h> #include <stdint.h> @@ -141,61 +429,67 @@ Here we show how to use lib/Fuzzer on something real, yet simple: pcre2_:: EOF clang++ -g -fsanitize=address $COV_FLAGS -c -std=c++11 -I inst/include/ pcre_fuzzer.cc # Link. - clang++ -g -fsanitize=address -Wl,--whole-archive inst/lib/*.a -Wl,-no-whole-archive Fuzzer*.o pcre_fuzzer.o -o pcre_fuzzer + clang++ -g -fsanitize=address -Wl,--whole-archive inst/lib/*.a -Wl,-no-whole-archive libFuzzer.a pcre_fuzzer.o -o pcre_fuzzer This will give you a binary of the fuzzer, called ``pcre_fuzzer``. -Now, create a directory that will hold the test corpus:: +Now, create a directory that will hold the test corpus: + +.. code-block:: console mkdir -p CORPUS For simple input languages like regular expressions this is all you need. -For more complicated inputs populate the directory with some input samples. -Now run the fuzzer with the corpus dir as the only parameter:: +For more complicated/structured inputs, the fuzzer works much more efficiently +if you can populate the corpus directory with a variety of valid and invalid +inputs for the code under test. +Now run the fuzzer with the corpus directory as the only parameter: - ./pcre_fuzzer ./CORPUS +.. 
code-block:: console -You will see output like this:: + ./pcre_fuzzer ./CORPUS - Seed: 1876794929 - #0 READ cov 0 bits 0 units 1 exec/s 0 - #1 pulse cov 3 bits 0 units 1 exec/s 0 - #1 INITED cov 3 bits 0 units 1 exec/s 0 - #2 pulse cov 208 bits 0 units 1 exec/s 0 - #2 NEW cov 208 bits 0 units 2 exec/s 0 L: 64 - #3 NEW cov 217 bits 0 units 3 exec/s 0 L: 63 - #4 pulse cov 217 bits 0 units 3 exec/s 0 +Initially, you will see Output_ like this:: -* The ``Seed:`` line shows you the current random seed (you can change it with ``-seed=N`` flag). -* The ``READ`` line shows you how many input files were read (since you passed an empty dir there were inputs, but one dummy input was synthesised). -* The ``INITED`` line shows you that how many inputs will be fuzzed. -* The ``NEW`` lines appear with the fuzzer finds a new interesting input, which is saved to the CORPUS dir. If multiple corpus dirs are given, the first one is used. -* The ``pulse`` lines appear periodically to show the current status. + INFO: Seed: 2938818941 + INFO: -max_len is not provided, using 64 + INFO: A corpus is not provided, starting from an empty corpus + #0 READ units: 1 exec/s: 0 + #1 INITED cov: 3 bits: 3 units: 1 exec/s: 0 + #2 NEW cov: 176 bits: 176 indir: 3 units: 2 exec/s: 0 L: 64 MS: 0 + #8 NEW cov: 176 bits: 179 indir: 3 units: 3 exec/s: 0 L: 63 MS: 2 ChangeByte-EraseByte- + ... + #14004 NEW cov: 1500 bits: 4536 indir: 5 units: 406 exec/s: 0 L: 54 MS: 3 ChangeBit-ChangeBit-CrossOver- Now, interrupt the fuzzer and run it again the same way. 
 You will see::
 
-  Seed: 1879995378
-  #0 READ cov 0 bits 0 units 564 exec/s 0
-  #1 pulse cov 502 bits 0 units 564 exec/s 0
+  INFO: Seed: 3398349082
+  INFO: -max_len is not provided, using 64
+  #0 READ units: 405 exec/s: 0
+  #405 INITED cov: 1499 bits: 4535 indir: 5 units: 286 exec/s: 0
+  #587 NEW cov: 1499 bits: 4540 indir: 5 units: 287 exec/s: 0 L: 52 MS: 2 InsertByte-EraseByte-
+  #667 NEW cov: 1501 bits: 4542 indir: 5 units: 288 exec/s: 0 L: 39 MS: 2 ChangeBit-InsertByte-
+  #672 NEW cov: 1501 bits: 4543 indir: 5 units: 289 exec/s: 0 L: 15 MS: 2 ChangeASCIIInt-ChangeBit-
+  #739 NEW cov: 1501 bits: 4544 indir: 5 units: 290 exec/s: 0 L: 64 MS: 4 ShuffleBytes-ChangeASCIIInt-InsertByte-ChangeBit-
   ...
-  #512 pulse cov 2933 bits 0 units 564 exec/s 512
-  #564 INITED cov 2991 bits 0 units 344 exec/s 564
-  #1024 pulse cov 2991 bits 0 units 344 exec/s 1024
-  #1455 NEW cov 2995 bits 0 units 345 exec/s 1455 L: 49
 
-This time you were running the fuzzer with a non-empty input corpus (564 items).
-As the first step, the fuzzer minimized the set to produce 344 interesting items (the ``INITED`` line)
+On the second execution the fuzzer has a non-empty input corpus (405 items). As
+the first step, the fuzzer minimized this corpus (the ``INITED`` line) to
+produce 286 interesting items, omitting inputs that do not hit any additional
+code.
 
-It is quite convenient to store test corpuses in git.
-As an example, here is a git repository with test inputs for the above PCRE2 fuzzer::
+(Aside: although the fuzzer only saves new inputs that hit additional code,
+this does not mean that the corpus as a whole is kept minimized. For example,
+if an input hitting A-B-C and then an input hitting A-B-C-D are generated,
+both will be saved, even though the latter subsumes the former.)
- git clone https://github.com/kcc/fuzzing-with-sanitizers.git - ./pcre_fuzzer ./fuzzing-with-sanitizers/pcre2/C1/ -You may run ``N`` independent fuzzer jobs in parallel on ``M`` CPUs:: +You may run ``N`` independent fuzzer jobs in parallel on ``M`` CPUs: + +.. code-block:: console N=100; M=4; ./pcre_fuzzer ./CORPUS -jobs=$N -workers=$M -By default (``-reload=1``) the fuzzer processes will periodically scan the CORPUS directory +By default (``-reload=1``) the fuzzer processes will periodically scan the corpus directory and reload any new tests. This way the test inputs found by one process will be picked up by all others. @@ -205,15 +499,15 @@ Heartbleed ---------- Remember Heartbleed_? As it was recently `shown <https://blog.hboeck.de/archives/868-How-Heartbleed-couldve-been-found.html>`_, -fuzzing with AddressSanitizer can find Heartbleed. Indeed, here are the step-by-step instructions -to find Heartbleed with LibFuzzer:: +fuzzing with AddressSanitizer_ can find Heartbleed. Indeed, here are the step-by-step instructions +to find Heartbleed with libFuzzer:: wget https://www.openssl.org/source/openssl-1.0.1f.tar.gz tar xf openssl-1.0.1f.tar.gz COV_FLAGS="-fsanitize-coverage=edge,indirect-calls" # -fsanitize-coverage=8bit-counters (cd openssl-1.0.1f/ && ./config && make -j 32 CC="clang -g -fsanitize=address $COV_FLAGS") - # Get and build LibFuzzer + # Get and build libFuzzer svn co http://llvm.org/svn/llvm-project/llvm/trunk/lib/Fuzzer clang -c -g -O2 -std=c++11 Fuzzer/*.cpp -IFuzzer # Get examples of key/pem files. @@ -267,14 +561,16 @@ Voila:: #2 0x580be3 in ssl3_read_bytes openssl-1.0.1f/ssl/s3_pkt.c:1092:4 Note: a `similar fuzzer <https://boringssl.googlesource.com/boringssl/+/HEAD/FUZZING.md>`_ -is now a part of the boringssl source tree. +is now a part of the BoringSSL_ source tree. Advanced features ================= +.. contents:: + :local: + :depth: 1 Dictionaries ------------ -*EXPERIMENTAL*. 
 LibFuzzer supports user-supplied dictionaries with input language keywords
 or other interesting byte sequences (e.g. multi-byte magic values).
 Use ``-dict=DICTIONARY_FILE``. For some input languages using a dictionary
@@ -304,16 +600,51 @@ It will later use those recorded inputs during mutations.
 
 This mode can be combined with DataFlowSanitizer_ to achieve better sensitivity.
 
+Fuzzer-friendly build mode
+---------------------------
+Sometimes the code under test is not fuzzing-friendly. Examples:
+
+  - The target code uses a PRNG seeded e.g. by system time, and
+    thus two consecutive invocations may execute different code paths
+    even if the end result is the same. This will cause the fuzzer to treat
+    two similar inputs as significantly different, and it will blow up the test corpus.
+    E.g. libxml uses ``rand()`` inside its hash table.
+  - The target code uses checksums to protect from invalid inputs.
+    E.g. png checks CRC for every chunk.
+
+In many cases it makes sense to make a special fuzzing-friendly build
+with certain fuzzing-unfriendly features disabled. We propose to use a common build macro
+for all such cases for consistency: ``FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION``.
+
+.. code-block:: c++
+
+  void MyInitPRNG() {
+  #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
+    // In fuzzing mode the behavior of the code should be deterministic.
+    srand(0);
+  #else
+    srand(time(0));
+  #endif
+  }
+
+
 AFL compatibility
 -----------------
-LibFuzzer can be used in parallel with AFL_ on the same test corpus.
+LibFuzzer can be used together with AFL_ on the same test corpus.
 Both fuzzers expect the test corpus to reside in a directory, one file per input.
-You can run both fuzzers on the same corpus in parallel::
+You can run both fuzzers on the same corpus, one after another:
+
+..
code-block:: console - ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program -r @@ + ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@ ./llvm-fuzz testcase_dir findings_dir # Will write new tests to testcase_dir Periodically restart both fuzzers so that they can use each other's findings. +Currently, there is no simple way to run both fuzzing engines in parallel while sharing the same corpus dir. + +You may also use AFL on your target function ``LLVMFuzzerTestOneInput``: +see an example `here <https://github.com/llvm-mirror/llvm/blob/master/lib/Fuzzer/afl/afl_driver.cpp>`__. How good is my fuzzer? ---------------------- @@ -321,14 +652,20 @@ How good is my fuzzer? Once you implement your target function ``LLVMFuzzerTestOneInput`` and fuzz it to death, you will want to know whether the function or the corpus can be improved further. One easy to use metric is, of course, code coverage. -You can get the coverage for your corpus like this:: +You can get the coverage for your corpus like this: + +.. code-block:: console + + ASAN_OPTIONS=coverage=1:html_cov_report=1 ./fuzzer CORPUS_DIR -runs=0 - ASAN_OPTIONS=coverage_pcs=1 ./fuzzer CORPUS_DIR -runs=0 +This will run all tests in the CORPUS_DIR but will not perform any fuzzing. +At the end of the process it will dump a single html file with coverage information. +See SanitizerCoverage_ for details. -This will run all the tests in the CORPUS_DIR but will not generate any new tests -and dump covered PCs to disk before exiting. -Then you can subtract the set of covered PCs from the set of all instrumented PCs in the binary, -see SanitizerCoverage_ for details. +You may also use other ways to visualize coverage, +e.g. using `Clang coverage <http://clang.llvm.org/docs/SourceBasedCodeCoverage.html>`_, +but those will require +you to rebuild the code with different compiler flags. 
 User-supplied mutators
 ----------------------
@@ -336,21 +673,83 @@
 
 LibFuzzer allows to use custom (user-supplied) mutators, see
 FuzzerInterface.h_
 
+Startup initialization
+----------------------
+If the library being tested needs to be initialized, there are several options.
+
+The simplest way is to have a statically initialized global object inside
+`LLVMFuzzerTestOneInput` (or in global scope if that works for you):
+
+.. code-block:: c++
+
+  extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
+    static bool Initialized = DoInitialization();
+    ...
+
+Alternatively, you may define an optional init function, and it will receive
+the program arguments, which you can read and modify. Do this **only** if you
+really need to access ``argv``/``argc``.
+
+.. code-block:: c++
+
+  extern "C" int LLVMFuzzerInitialize(int *argc, char ***argv) {
+    ReadAndMaybeModify(argc, argv);
+    return 0;
+  }
+
+
+Leaks
+-----
+
+Binaries built with AddressSanitizer_ or LeakSanitizer_ will try to detect
+memory leaks at process shutdown.
+For in-process fuzzing this is inconvenient
+since the fuzzer needs to report a leak with a reproducer as soon as the leaky
+mutation is found. However, running full leak detection after every mutation
+is expensive.
+
+By default (``-detect_leaks=1``) libFuzzer will count the number of
+``malloc`` and ``free`` calls when executing every mutation.
+If the numbers don't match (which by itself doesn't mean there is a leak)
+libFuzzer will invoke the more expensive LeakSanitizer_
+pass and, if an actual leak is found, it will be reported with the reproducer
+and the process will exit.
+
+If your target has massive leaks and leak detection is disabled,
+you will eventually run out of RAM (see the ``-rss_limit_mb`` flag).
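As an illustration of the ``malloc``/``free`` counting described above, consider a deliberately leaky target (hypothetical, for demonstration only). The allocation on the ``'L'`` path is never freed, so the counts for that single run will not match, which should prompt libFuzzer to run the full LeakSanitizer pass and report the leak together with a reproducer:

```cpp
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical leaky target: inputs starting with 'L' allocate memory that
// is never freed. With -detect_leaks=1 the malloc/free mismatch on such a
// run triggers the expensive LeakSanitizer pass, which reports the leak.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  if (Size > 0 && Data[0] == 'L')
    malloc(8);  // intentionally leaked
  return 0;
}
```

Built and run like the toy example above, the fuzzer should report the leak as soon as a mutation produces an input whose first byte is ``L``.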
+
+
+Developing libFuzzer
+====================
+
+Building libFuzzer as part of the LLVM project and running its tests requires
+a fresh Clang as the host compiler and a special CMake configuration:
+
+.. code-block:: console
+
+    cmake -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON /path/to/llvm
+    ninja check-fuzzer
+
+
 Fuzzing components of LLVM
 ==========================
+.. contents::
+   :local:
+   :depth: 1
+
+To build any of the LLVM fuzz targets, use the build instructions above.
 
 clang-format-fuzzer
 -------------------
 The inputs are random pieces of C++-like text.
 
-Build (make sure to use fresh clang as the host compiler)::
+.. code-block:: console
 
-  cmake -GNinja -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_USE_SANITIZER=Address -DLLVM_USE_SANITIZE_COVERAGE=YES -DCMAKE_BUILD_TYPE=Release /path/to/llvm
   ninja clang-format-fuzzer
   mkdir CORPUS_DIR
   ./bin/clang-format-fuzzer CORPUS_DIR
 
-Optionally build other kinds of binaries (asan+Debug, msan, ubsan, etc).
+Optionally build other kinds of binaries (ASan+Debug, MSan, UBSan, etc).
 
 Tracking bug: https://llvm.org/bugs/show_bug.cgi?id=23052
 
@@ -380,37 +779,27 @@ finds an invalid instruction or runs out of data.
 
 Please note that the command line interface differs slightly from that of other
 fuzzers. The fuzzer arguments should follow ``--fuzzer-args`` and should have
 a single dash, while other arguments control the operation mode and target in a
-similar manner to ``llvm-mc`` and should have two dashes. For example::
+similar manner to ``llvm-mc`` and should have two dashes. For example:
+
+.. code-block:: console
 
   llvm-mc-fuzzer --triple=aarch64-linux-gnu --disassemble --fuzzer-args -max_len=4 -jobs=10
 
 Buildbot
 --------
-We have a buildbot that runs the above fuzzers for LLVM components
-24/7/365 at http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer .
-
-Pre-fuzzed test inputs in git
------------------------------
-
-The buildbot occumulates large test corpuses over time.
-The corpuses are stored in git on github and can be used like this::
-
-  git clone https://github.com/kcc/fuzzing-with-sanitizers.git
-  bin/clang-format-fuzzer fuzzing-with-sanitizers/llvm/clang-format/C1
-  bin/clang-fuzzer fuzzing-with-sanitizers/llvm/clang/C1/
-  bin/llvm-as-fuzzer fuzzing-with-sanitizers/llvm/llvm-as/C1 -only_ascii=1
-
+A buildbot continuously runs the above fuzzers for LLVM components, with results
+shown at http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fuzzer .
 
 FAQ
 =========================
 
-Q. Why Fuzzer does not use any of the LLVM support?
---------------------------------------------------
+Q. Why doesn't libFuzzer use any of the LLVM support?
+-----------------------------------------------------
 
 There are two reasons.
 
-First, we want this library to be used outside of the LLVM w/o users having to
+First, we want this library to be used outside of LLVM without users having to
 build the rest of LLVM. This may sound unconvincing for many LLVM folks,
 but in practice the need for building the whole LLVM frightens many potential
 users -- and we want more users to use this code.
 
@@ -422,19 +811,17 @@ coverage set of the process (since the fuzzer is in-process).
 In other words, by using more external dependencies we will slow down
 the fuzzer while the main reason for it to exist is extreme speed.
 
-Q. What about Windows then? The Fuzzer contains code that does not build on Windows.
+Q. What about Windows then? The fuzzer contains code that does not build on Windows.
 ------------------------------------------------------------------------------------
 
-The sanitizer coverage support does not work on Windows either as of 01/2015.
-Once it's there, we'll need to re-implement OS-specific parts (I/O, signals).
+Volunteers are welcome.
 
 Q. When this Fuzzer is not a good solution for a problem?
--------------------------------------------------------- * If the test inputs are validated by the target library and the validator - asserts/crashes on invalid inputs, the in-process fuzzer is not applicable - (we could use fork() w/o exec, but it comes with extra overhead). -* Bugs in the target library may accumulate w/o being detected. E.g. a memory + asserts/crashes on invalid inputs, in-process fuzzing is not applicable. +* Bugs in the target library may accumulate without being detected. E.g. a memory corruption that goes undetected at first and then leads to a crash while testing another input. This is why it is highly recommended to run this in-process fuzzer with all sanitizers to detect most bugs on the spot. @@ -442,7 +829,7 @@ Q. When this Fuzzer is not a good solution for a problem? consumption and infinite loops in the target library (still possible). * The target library should not have significant global state that is not reset between the runs. -* Many interesting target libs are not designed in a way that supports +* Many interesting target libraries are not designed in a way that supports the in-process fuzzer interface (e.g. require a file path instead of a byte array). * If a single test run takes a considerable fraction of a second (or @@ -454,18 +841,16 @@ Q. So, what exactly this Fuzzer is good for? -------------------------------------------- This Fuzzer might be a good choice for testing libraries that have relatively -small inputs, each input takes < 1ms to run, and the library code is not expected +small inputs, each input takes < 10ms to run, and the library code is not expected to crash on invalid inputs. -Examples: regular expression matchers, text or binary format parsers. +Examples: regular expression matchers, text or binary format parsers, compression, +network, crypto. 
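As a concrete sketch of such a target, here is a fuzz target for a parser of a hypothetical binary format (a 4-byte magic followed by a one-byte payload length). The format and all names below are invented for illustration; the point is that each run is fast, self-contained, and tolerates invalid inputs without crashing:

```cpp
#include <stdint.h>
#include <stddef.h>
#include <string.h>

// Hypothetical parser for a toy format: 4-byte magic "TOY1", then a
// one-byte payload length, then the payload itself. Returns the payload
// length on success and -1 for inputs that are not valid instances of
// the format (wrong magic, truncated buffer, inconsistent length field).
int ParseToyFormat(const uint8_t *Data, size_t Size) {
  if (Size < 5 || memcmp(Data, "TOY1", 4) != 0)
    return -1;                 // not our format
  size_t PayloadLen = Data[4];
  if (PayloadLen > Size - 5)
    return -1;                 // length field exceeds the buffer
  return (int)PayloadLen;      // "parsed" successfully
}

// The fuzz target simply feeds every generated input to the parser.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  ParseToyFormat(Data, Size);
  return 0;
}
```

Because the parser rejects malformed inputs by returning an error rather than asserting, every mutation is safe to run in-process, which is exactly the property the answer above asks for.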
Trophies ======== * GLIBC: https://sourceware.org/glibc/wiki/FuzzingLibc -* MUSL LIBC: - - * http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c - * http://www.openwall.com/lists/oss-security/2015/03/30/3 +* MUSL LIBC: `[1] <http://git.musl-libc.org/cgit/musl/commit/?id=39dfd58417ef642307d90306e1c7e50aaec5a35c>`__ `[2] <http://www.openwall.com/lists/oss-security/2015/03/30/3>`__ * `pugixml <https://github.com/zeux/pugixml/issues/39>`_ @@ -482,23 +867,39 @@ Trophies * `Python <http://bugs.python.org/issue25388>`_ -* OpenSSL/BoringSSL: `[1] <https://boringssl.googlesource.com/boringssl/+/cb852981cd61733a7a1ae4fd8755b7ff950e857d>`_ +* OpenSSL/BoringSSL: `[1] <https://boringssl.googlesource.com/boringssl/+/cb852981cd61733a7a1ae4fd8755b7ff950e857d>`_ `[2] <https://openssl.org/news/secadv/20160301.txt>`_ `[3] <https://boringssl.googlesource.com/boringssl/+/2b07fa4b22198ac02e0cee8f37f3337c3dba91bc>`_ `[4] <https://boringssl.googlesource.com/boringssl/+/6b6e0b20893e2be0e68af605a60ffa2cbb0ffa64>`_ `[5] <https://github.com/openssl/openssl/pull/931/commits/dd5ac557f052cc2b7f718ac44a8cb7ac6f77dca8>`_ `[6] <https://github.com/openssl/openssl/pull/931/commits/19b5b9194071d1d84e38ac9a952e715afbc85a81>`_ * `Libxml2 - <https://bugzilla.gnome.org/buglist.cgi?bug_status=__all__&content=libFuzzer&list_id=68957&order=Importance&product=libxml2&query_format=specific>`_ + <https://bugzilla.gnome.org/buglist.cgi?bug_status=__all__&content=libFuzzer&list_id=68957&order=Importance&product=libxml2&query_format=specific>`_ and `[HT206167] <https://support.apple.com/en-gb/HT206167>`_ (CVE-2015-5312, CVE-2015-7500, CVE-2015-7942) * `Linux Kernel's BPF verifier <https://github.com/iovisor/bpf-fuzzer>`_ +* Capstone: `[1] <https://github.com/aquynh/capstone/issues/600>`__ `[2] <https://github.com/aquynh/capstone/commit/6b88d1d51eadf7175a8f8a11b690684443b11359>`__ + +* file:`[1] <http://bugs.gw.com/view.php?id=550>`__ `[2] <http://bugs.gw.com/view.php?id=551>`__ 
`[3] <http://bugs.gw.com/view.php?id=553>`__ `[4] <http://bugs.gw.com/view.php?id=554>`__ + +* Radare2: `[1] <https://github.com/revskills?tab=contributions&from=2016-04-09>`__ + +* gRPC: `[1] <https://github.com/grpc/grpc/pull/6071/commits/df04c1f7f6aec6e95722ec0b023a6b29b6ea871c>`__ `[2] <https://github.com/grpc/grpc/pull/6071/commits/22a3dfd95468daa0db7245a4e8e6679a52847579>`__ `[3] <https://github.com/grpc/grpc/pull/6071/commits/9cac2a12d9e181d130841092e9d40fa3309d7aa7>`__ `[4] <https://github.com/grpc/grpc/pull/6012/commits/82a91c91d01ce9b999c8821ed13515883468e203>`__ `[5] <https://github.com/grpc/grpc/pull/6202/commits/2e3e0039b30edaf89fb93bfb2c1d0909098519fa>`__ `[6] <https://github.com/grpc/grpc/pull/6106/files>`__ + +* WOFF2: `[1] <https://github.com/google/woff2/commit/a15a8ab>`__ + * LLVM: `Clang <https://llvm.org/bugs/show_bug.cgi?id=23057>`_, `Clang-format <https://llvm.org/bugs/show_bug.cgi?id=23052>`_, `libc++ <https://llvm.org/bugs/show_bug.cgi?id=24411>`_, `llvm-as <https://llvm.org/bugs/show_bug.cgi?id=24639>`_, Disassembler: http://reviews.llvm.org/rL247405, http://reviews.llvm.org/rL247414, http://reviews.llvm.org/rL247416, http://reviews.llvm.org/rL247417, http://reviews.llvm.org/rL247420, http://reviews.llvm.org/rL247422. .. _pcre2: http://www.pcre.org/ - .. _AFL: http://lcamtuf.coredump.cx/afl/ - .. _SanitizerCoverage: http://clang.llvm.org/docs/SanitizerCoverage.html .. _SanitizerCoverageTraceDataFlow: http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow .. _DataFlowSanitizer: http://clang.llvm.org/docs/DataFlowSanitizer.html - +.. _AddressSanitizer: http://clang.llvm.org/docs/AddressSanitizer.html +.. _LeakSanitizer: http://clang.llvm.org/docs/LeakSanitizer.html .. _Heartbleed: http://en.wikipedia.org/wiki/Heartbleed - .. _FuzzerInterface.h: https://github.com/llvm-mirror/llvm/blob/master/lib/Fuzzer/FuzzerInterface.h +.. _3.7.0: http://llvm.org/releases/3.7.0/docs/LibFuzzer.html +.. 
_building Clang from trunk: http://clang.llvm.org/get_started.html +.. _MemorySanitizer: http://clang.llvm.org/docs/MemorySanitizer.html +.. _UndefinedBehaviorSanitizer: http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html +.. _`coverage counters`: http://clang.llvm.org/docs/SanitizerCoverage.html#coverage-counters +.. _`caller-callee pairs`: http://clang.llvm.org/docs/SanitizerCoverage.html#caller-callee-coverage +.. _BoringSSL: https://boringssl.googlesource.com/boringssl/ +.. _`fuzz various parts of LLVM itself`: `Fuzzing components of LLVM`_ diff --git a/gnu/llvm/docs/LinkTimeOptimization.rst b/gnu/llvm/docs/LinkTimeOptimization.rst index 55a7486874a..9c1e5607596 100644 --- a/gnu/llvm/docs/LinkTimeOptimization.rst +++ b/gnu/llvm/docs/LinkTimeOptimization.rst @@ -87,9 +87,9 @@ To compile, run: .. code-block:: console - % clang -emit-llvm -c a.c -o a.o # <-- a.o is LLVM bitcode file + % clang -flto -c a.c -o a.o # <-- a.o is LLVM bitcode file % clang -c main.c -o main.o # <-- main.o is native object file - % clang a.o main.o -o main # <-- standard link command without modifications + % clang -flto a.o main.o -o main # <-- standard link command with -flto * In this example, the linker recognizes that ``foo2()`` is an externally visible symbol defined in LLVM bitcode file. The linker completes its usual diff --git a/gnu/llvm/docs/MIRLangRef.rst b/gnu/llvm/docs/MIRLangRef.rst index a5f8c8c743a..f6ee6ccd050 100644 --- a/gnu/llvm/docs/MIRLangRef.rst +++ b/gnu/llvm/docs/MIRLangRef.rst @@ -111,7 +111,6 @@ Here is an example of a YAML document that contains an LLVM module: .. code-block:: llvm - --- | define i32 @inc(i32* %x) { entry: %0 = load i32, i32* %x @@ -119,7 +118,6 @@ Here is an example of a YAML document that contains an LLVM module: store i32 %1, i32* %x ret i32 %1 } - ... .. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688 @@ -129,7 +127,7 @@ Machine Functions The remaining YAML documents contain the machine functions. 
This is an example of such YAML document: -.. code-block:: llvm +.. code-block:: text --- name: inc @@ -172,7 +170,7 @@ A machine basic block is defined in a single block definition source construct that contains the block's ID. The example below defines two blocks that have an ID of zero and one: -.. code-block:: llvm +.. code-block:: text bb.0: <instructions> @@ -182,7 +180,7 @@ The example below defines two blocks that have an ID of zero and one: A machine basic block can also have a name. It should be specified after the ID in the block's definition: -.. code-block:: llvm +.. code-block:: text bb.0.entry: ; This block's name is "entry" <instructions> @@ -196,7 +194,7 @@ Block References The machine basic blocks are identified by their ID numbers. Individual blocks are referenced using the following syntax: -.. code-block:: llvm +.. code-block:: text %bb.<id>[.<name>] @@ -213,7 +211,7 @@ Successors The machine basic block's successors have to be specified before any of the instructions: -.. code-block:: llvm +.. code-block:: text bb.0.entry: successors: %bb.1.then, %bb.2.else @@ -227,7 +225,7 @@ The branch weights can be specified in brackets after the successor blocks. The example below defines a block that has two successors with branch weights of 32 and 16: -.. code-block:: llvm +.. code-block:: text bb.0.entry: successors: %bb.1.then(32), %bb.2.else(16) @@ -240,7 +238,7 @@ Live In Registers The machine basic block's live in registers have to be specified before any of the instructions: -.. code-block:: llvm +.. code-block:: text bb.0.entry: liveins: %edi, %esi @@ -255,7 +253,7 @@ Miscellaneous Attributes The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be specified in brackets after the block's definition: -.. code-block:: llvm +.. code-block:: text bb.0.entry (address-taken): <instructions> @@ -278,7 +276,7 @@ The instruction's name is usually specified before the operands. 
The example below shows an instance of the X86 ``RETQ`` instruction with a single machine operand: -.. code-block:: llvm +.. code-block:: text RETQ %eax @@ -287,7 +285,7 @@ operands, the instruction's name has to be specified after them. The example below shows an instance of the AArch64 ``LDPXpost`` instruction with three defined register operands: -.. code-block:: llvm +.. code-block:: text %sp, %fp, %lr = LDPXpost %sp, 2 @@ -303,7 +301,7 @@ Instruction Flags The flag ``frame-setup`` can be specified before the instruction's name: -.. code-block:: llvm +.. code-block:: text %fp = frame-setup ADDXri %sp, 0, 0 @@ -321,13 +319,13 @@ but they can also be used in a number of other places, like the The physical registers are identified by their name. They use the following syntax: -.. code-block:: llvm +.. code-block:: text %<name> The example below shows three X86 physical registers: -.. code-block:: llvm +.. code-block:: text %eax %r15 @@ -336,13 +334,13 @@ The example below shows three X86 physical registers: The virtual registers are identified by their ID number. They use the following syntax: -.. code-block:: llvm +.. code-block:: text %<id> Example: -.. code-block:: llvm +.. code-block:: text %0 @@ -366,7 +364,7 @@ The immediate machine operands are untyped, 64-bit signed integers. The example below shows an instance of the X86 ``MOV32ri`` instruction that has an immediate machine operand ``-42``: -.. code-block:: llvm +.. code-block:: text %eax = MOV32ri -42 @@ -384,14 +382,14 @@ machine operands. The register operands can also have optional and a reference to the tied register operand. The full syntax of a register operand is shown below: -.. code-block:: llvm +.. code-block:: text [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ] This example shows an instance of the X86 ``XOR32rr`` instruction that has 5 register operands with different register flags: -.. code-block:: llvm +.. 
code-block:: text dead %eax = XOR32rr undef %eax, undef %eax, implicit-def dead %eflags, implicit-def %al @@ -446,7 +444,7 @@ the subregister indices. The example below shows an instance of the ``COPY`` pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy 8 lower bits from the 32-bit virtual register 0 to the 8-bit virtual register 1: -.. code-block:: llvm +.. code-block:: text %1 = COPY %0:sub_8bit @@ -461,7 +459,7 @@ The global value machine operands reference the global values from the The example below shows an instance of the X86 ``MOV64rm`` instruction that has a global value operand named ``G``: -.. code-block:: llvm +.. code-block:: text %rax = MOV64rm %rip, 1, _, @G, _ diff --git a/gnu/llvm/docs/MarkedUpDisassembly.rst b/gnu/llvm/docs/MarkedUpDisassembly.rst index cc4dbc817e0..df8befe45cd 100644 --- a/gnu/llvm/docs/MarkedUpDisassembly.rst +++ b/gnu/llvm/docs/MarkedUpDisassembly.rst @@ -70,7 +70,7 @@ clients. For example, a possible annotation of an ARM load of a stack-relative location might be annotated as: -.. code-block:: nasm +.. code-block:: text ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]> diff --git a/gnu/llvm/docs/MergeFunctions.rst b/gnu/llvm/docs/MergeFunctions.rst index b2f6030edc1..b87cea68ba5 100644 --- a/gnu/llvm/docs/MergeFunctions.rst +++ b/gnu/llvm/docs/MergeFunctions.rst @@ -56,7 +56,7 @@ As a good start point, Kaleidoscope tutorial could be used: Especially it's important to understand chapter 3 of tutorial: -:doc:`tutorial/LangImpl3` +:doc:`tutorial/LangImpl03` Reader also should know how passes work in LLVM, they could use next article as a reference and start point here: @@ -394,7 +394,7 @@ and in right function "*FR*". And every part of *left* place is equal to the corresponding part of *right* place, and (!) both parts use *Value* instances, for example: -.. code-block:: llvm +.. 
code-block:: text instr0 i32 %LV ; left side, function FL instr0 i32 %RV ; right side, function FR @@ -409,13 +409,13 @@ in "*FL*" and "*FR*". Consider small example here: -.. code-block:: llvm +.. code-block:: text define void %f(i32 %pf0, i32 %pf1) { instr0 i32 %pf0 instr1 i32 %pf1 instr2 i32 123 } -.. code-block:: llvm +.. code-block:: text define void %g(i32 %pg0, i32 %pg1) { instr0 i32 %pg0 instr1 i32 %pg0 instr2 i32 123 @@ -697,7 +697,7 @@ Below is detailed body description. If “F” may be overridden ------------------------ As follows from ``mayBeOverridden`` comments: “whether the definition of this -global may be replaced by something non-equivalent at link time”. If so, thats +global may be replaced by something non-equivalent at link time”. If so, that's ok: we can use alias to *F* instead of *G* or change call instructions itself. HasGlobalAliases, removeUsers diff --git a/gnu/llvm/docs/NVPTXUsage.rst b/gnu/llvm/docs/NVPTXUsage.rst index fc697ca0046..fdfc8e41dc3 100644 --- a/gnu/llvm/docs/NVPTXUsage.rst +++ b/gnu/llvm/docs/NVPTXUsage.rst @@ -37,9 +37,9 @@ code. By default, the back-end will emit device functions. Metadata is used to declare a function as a kernel function. This metadata is attached to the ``nvvm.annotations`` named metadata object, and has the following format: -.. code-block:: llvm +.. code-block:: text - !0 = metadata !{<function-ref>, metadata !"kernel", i32 1} + !0 = !{<function-ref>, metadata !"kernel", i32 1} The first parameter is a reference to the kernel function. The following example shows a kernel function calling a device function in LLVM IR. The @@ -54,14 +54,14 @@ function ``@my_kernel`` is callable from host code, but ``@my_fmad`` is not. 
} define void @my_kernel(float* %ptr) { - %val = load float* %ptr + %val = load float, float* %ptr %ret = call float @my_fmad(float %val, float %val, float %val) store float %ret, float* %ptr ret void } !nvvm.annotations = !{!1} - !1 = metadata !{void (float*)* @my_kernel, metadata !"kernel", i32 1} + !1 = !{void (float*)* @my_kernel, !"kernel", i32 1} When compiled, the PTX kernel functions are callable by host-side code. @@ -361,7 +361,7 @@ With programmatic pass pipeline: .. code-block:: c++ - extern ModulePass *llvm::createNVVMReflectPass(const StringMap<int>& Mapping); + extern FunctionPass *llvm::createNVVMReflectPass(const StringMap<int>& Mapping); StringMap<int> ReflectParams; ReflectParams["__CUDA_FTZ"] = 1; @@ -395,7 +395,7 @@ JIT compiling a PTX string to a device binary: .. code-block:: c++ CUmodule module; - CUfunction funcion; + CUfunction function; // JIT compile a null-terminated PTX string cuModuleLoadData(&module, (void*)PTXString); @@ -446,13 +446,13 @@ The Kernel %id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind ; Compute pointers into A, B, and C - %ptrA = getelementptr float addrspace(1)* %A, i32 %id - %ptrB = getelementptr float addrspace(1)* %B, i32 %id - %ptrC = getelementptr float addrspace(1)* %C, i32 %id + %ptrA = getelementptr float, float addrspace(1)* %A, i32 %id + %ptrB = getelementptr float, float addrspace(1)* %B, i32 %id + %ptrC = getelementptr float, float addrspace(1)* %C, i32 %id ; Read A, B - %valA = load float addrspace(1)* %ptrA, align 4 - %valB = load float addrspace(1)* %ptrB, align 4 + %valA = load float, float addrspace(1)* %ptrA, align 4 + %valB = load float, float addrspace(1)* %ptrB, align 4 ; Compute C = A + B %valC = fadd float %valA, %valB @@ -464,9 +464,9 @@ The Kernel } !nvvm.annotations = !{!0} - !0 = metadata !{void (float addrspace(1)*, - float addrspace(1)*, - float addrspace(1)*)* @kernel, metadata !"kernel", i32 1} + !0 = !{void (float addrspace(1)*, + float addrspace(1)*, + float 
addrspace(1)*)* @kernel, !"kernel", i32 1} We can use the LLVM ``llc`` tool to directly run the NVPTX code generator: @@ -566,7 +566,7 @@ Intrinsic CUDA Equivalent ``i32 @llvm.nvvm.read.ptx.sreg.ctaid.{x,y,z}`` blockIdx.{x,y,z} ``i32 @llvm.nvvm.read.ptx.sreg.ntid.{x,y,z}`` blockDim.{x,y,z} ``i32 @llvm.nvvm.read.ptx.sreg.nctaid.{x,y,z}`` gridDim.{x,y,z} -``void @llvm.cuda.syncthreads()`` __syncthreads() +``void @llvm.nvvm.barrier0()`` __syncthreads() ================================================ ==================== @@ -608,16 +608,16 @@ as a PTX `kernel` function. These metadata nodes take the form: .. code-block:: text - metadata !{<function ref>, metadata !"kernel", i32 1} + !{<function ref>, metadata !"kernel", i32 1} For the previous example, we have: .. code-block:: llvm !nvvm.annotations = !{!0} - !0 = metadata !{void (float addrspace(1)*, - float addrspace(1)*, - float addrspace(1)*)* @kernel, metadata !"kernel", i32 1} + !0 = !{void (float addrspace(1)*, + float addrspace(1)*, + float addrspace(1)*)* @kernel, !"kernel", i32 1} Here, we have a single metadata declaration in ``nvvm.annotations``. This metadata annotates our ``@kernel`` function with the ``kernel`` attribute. @@ -830,13 +830,13 @@ Libdevice provides an ``__nv_powf`` function that we will use. 
%id = tail call i32 @llvm.nvvm.read.ptx.sreg.tid.x() readnone nounwind ; Compute pointers into A, B, and C - %ptrA = getelementptr float addrspace(1)* %A, i32 %id - %ptrB = getelementptr float addrspace(1)* %B, i32 %id - %ptrC = getelementptr float addrspace(1)* %C, i32 %id + %ptrA = getelementptr float, float addrspace(1)* %A, i32 %id + %ptrB = getelementptr float, float addrspace(1)* %B, i32 %id + %ptrC = getelementptr float, float addrspace(1)* %C, i32 %id ; Read A, B - %valA = load float addrspace(1)* %ptrA, align 4 - %valB = load float addrspace(1)* %ptrB, align 4 + %valA = load float, float addrspace(1)* %ptrA, align 4 + %valB = load float, float addrspace(1)* %ptrB, align 4 ; Compute C = pow(A, B) %valC = call float @__nv_powf(float %valA, float %valB) @@ -848,9 +848,9 @@ Libdevice provides an ``__nv_powf`` function that we will use. } !nvvm.annotations = !{!0} - !0 = metadata !{void (float addrspace(1)*, - float addrspace(1)*, - float addrspace(1)*)* @kernel, metadata !"kernel", i32 1} + !0 = !{void (float addrspace(1)*, + float addrspace(1)*, + float addrspace(1)*)* @kernel, !"kernel", i32 1} To compile this kernel, we perform the following steps: diff --git a/gnu/llvm/docs/Passes.rst b/gnu/llvm/docs/Passes.rst index cc0a853bc4d..77461f3c52d 100644 --- a/gnu/llvm/docs/Passes.rst +++ b/gnu/llvm/docs/Passes.rst @@ -253,14 +253,6 @@ This pass decodes the debug info metadata in a module and prints in a For example, run this pass from ``opt`` along with the ``-analyze`` option, and it'll print to standard output. -``-no-aa``: No Alias Analysis (always returns 'may' alias) ----------------------------------------------------------- - -This is the default implementation of the Alias Analysis interface. It always -returns "I don't know" for alias queries. NoAA is unlike other alias analysis -implementations, in that it does not chain to a previous analysis. As such it -doesn't follow many of the rules that other alias analyses must. 
- ``-postdomfrontier``: Post-Dominance Frontier Construction ---------------------------------------------------------- @@ -955,7 +947,7 @@ that this should make CFG hacking much easier. To make later hacking easier, the entry block is split into two, such that all introduced ``alloca`` instructions (and nothing else) are in the entry block. -``-scalarrepl``: Scalar Replacement of Aggregates (DT) +``-sroa``: Scalar Replacement of Aggregates ------------------------------------------------------ The well-known scalar replacement of aggregates transformation. This transform @@ -964,12 +956,6 @@ individual ``alloca`` instructions for each member if possible. Then, if possible, it transforms the individual ``alloca`` instructions into nice clean scalar SSA form. -This combines a simple scalar replacement of aggregates algorithm with the -:ref:`mem2reg <passes-mem2reg>` algorithm because they often interact, -especially for C++ programs. As such, iterating between ``scalarrepl``, then -:ref:`mem2reg <passes-mem2reg>` until we run out of things to promote works -well. - .. _passes-sccp: ``-sccp``: Sparse Conditional Constant Propagation diff --git a/gnu/llvm/docs/Phabricator.rst b/gnu/llvm/docs/Phabricator.rst index af1e4429fda..04319a9a378 100644 --- a/gnu/llvm/docs/Phabricator.rst +++ b/gnu/llvm/docs/Phabricator.rst @@ -127,37 +127,80 @@ a change from Phabricator. Committing a change ------------------- -Arcanist can manage the commit transparently. It will retrieve the description, -reviewers, the ``Differential Revision``, etc from the review and commit it to the repository. +Once a patch has been reviewed and approved on Phabricator, it can then be +committed to trunk. There are multiple workflows to achieve this. Whichever +method you follow, it is recommended that your commit message ends with the line: + +:: + + Differential Revision: <URL> + +where ``<URL>`` is the URL for the code review, starting with +``http://reviews.llvm.org/``.
+ +This allows people reading the version history to see the review for +context. This also allows Phabricator to detect the commit, close the +review, and add a link from the review to the commit. + +Note that if you use the Arcanist tool the ``Differential Revision`` line will +be added automatically. If you don't want to use Arcanist, you can add the +``Differential Revision`` line (as the last line) to the commit message +yourself. + +Using the Arcanist tool can simplify the process of committing reviewed code +as it will retrieve reviewers, the ``Differential Revision``, etc from the review +and place it in the commit message. Several methods of using Arcanist to commit +code are given below. If you do not wish to use Arcanist then simply commit +the reviewed patch as you would normally. + +Note that if you commit the change without using Arcanist and forget to add the +``Differential Revision`` line to your commit message then it is recommended +that you close the review manually. In the web UI, under "Leap Into Action" put +the SVN revision number in the Comment, set the Action to "Close Revision" and +click Submit. Note the review must have been Accepted first. + +Subversion and Arcanist +^^^^^^^^^^^^^^^^^^^^^^^ + +On a clean Subversion working copy run the following (where ``<Revision>`` is +the Phabricator review number): :: arc patch D<Revision> arc commit --revision D<Revision> +The first command will take the latest version of the reviewed patch and apply it to the working +copy. The second command will commit this revision to trunk. + +git-svn and Arcanist +^^^^^^^^^^^^^^^^^^^^ -When committing a change that has been reviewed using -Phabricator, the convention is for the commit message to end with the -line: +This presumes that the git repository has been configured as described in :ref:`developers-work-with-git-svn`. 
+ +On a clean Git repository on an up to date ``master`` branch run the +following (where ``<Revision>`` is the Phabricator review number): :: - Differential Revision: <URL> + arc patch D<Revision> -where ``<URL>`` is the URL for the code review, starting with -``http://reviews.llvm.org/``. -Note that Arcanist will add this automatically. +This will create a new branch called ``arcpatch-D<Revision>`` based on the +current ``master`` and will create a commit corresponding to ``D<Revision>`` with a +commit message derived from information in the Phabricator review. + +Check you are happy with the commit message and amend it if necessary. Now switch to +the ``master`` branch and add the new commit to it and commit it to trunk. This +can be done by running the following: + +:: + + git checkout master + git merge --ff-only arcpatch-D<Revision> + git svn dcommit -This allows people reading the version history to see the review for -context. This also allows Phabricator to detect the commit, close the -review, and add a link from the review to the commit. -If you use ``git`` or ``svn`` to commit the change and forget to add the line -to your commit message, you should close the review manually. In the web UI, -under "Leap Into Action" put the SVN revision number in the Comment, set the -Action to "Close Revision" and click Submit. Note the review must have been -Accepted first. Abandoning a change ------------------- diff --git a/gnu/llvm/docs/ProgrammersManual.rst b/gnu/llvm/docs/ProgrammersManual.rst index 665e30aeb67..030637048bf 100644 --- a/gnu/llvm/docs/ProgrammersManual.rst +++ b/gnu/llvm/docs/ProgrammersManual.rst @@ -263,8 +263,193 @@ almost never be stored or mentioned directly. They are intended solely for use when defining a function which should be able to efficiently accept concatenated strings. +.. 
_error_apis: + +Error handling +-------------- + +Proper error handling helps us identify bugs in our code, and helps end-users +understand errors in their tool usage. Errors fall into two broad categories: +*programmatic* and *recoverable*, with different strategies for handling and +reporting. + +Programmatic Errors +^^^^^^^^^^^^^^^^^^^ + +Programmatic errors are violations of program invariants or API contracts, and +represent bugs within the program itself. Our aim is to document invariants, and +to abort quickly at the point of failure (providing some basic diagnostic) when +invariants are broken at runtime. + +The fundamental tools for handling programmatic errors are assertions and the +llvm_unreachable function. Assertions are used to express invariant conditions, +and should include a message describing the invariant: + +.. code-block:: c++ + + assert(isPhysReg(R) && "All virt regs should have been allocated already."); + +The llvm_unreachable function can be used to document areas of control flow +that should never be entered if the program invariants hold: + +.. code-block:: c++ + + enum { Foo, Bar, Baz } X = foo(); + + switch (X) { + case Foo: /* Handle Foo */; break; + case Bar: /* Handle Bar */; break; + default: + llvm_unreachable("X should be Foo or Bar here"); + } + +Recoverable Errors +^^^^^^^^^^^^^^^^^^ + +Recoverable errors represent an error in the program's environment, for example +a resource failure (a missing file, a dropped network connection, etc.), or +malformed input. These errors should be detected and communicated to a level of +the program where they can be handled appropriately. Handling the error may be +as simple as reporting the issue to the user, or it may involve attempts at +recovery. + +Recoverable errors are modeled using LLVM's ``Error`` scheme. This scheme +represents errors using function return values, similar to classic C integer +error codes, or C++'s ``std::error_code``. 
However, the ``Error`` class is +actually a lightweight wrapper for user-defined error types, allowing arbitrary +information to be attached to describe the error. This is similar to the way C++ +exceptions allow throwing of user-defined types. + +Success values are created by calling ``Error::success()``: + +.. code-block:: c++ + + Error foo() { + // Do something. + // Return success. + return Error::success(); + } + +Success values are very cheap to construct and return - they have minimal +impact on program performance. + +Failure values are constructed using ``make_error<T>``, where ``T`` is any class +that inherits from the ErrorInfo utility: + +.. code-block:: c++ + + class MyError : public ErrorInfo<MyError> { + public: + MyError(std::string Msg) : Msg(Msg) {} + void log(OStream &OS) const override { OS << "MyError - " << Msg; } + static char ID; + private: + std::string Msg; + }; + + char MyError::ID = 0; // In MyError.cpp + + Error bar() { + if (checkErrorCondition) + return make_error<MyError>("Error condition detected"); + + // No error - proceed with bar. + + // Return success value. + return Error::success(); + } + +Error values can be implicitly converted to bool: true for error, false for +success, enabling the following idiom: + +.. code-block:: c++ + + Error mayFail(); + + Error foo() { + if (auto Err = mayFail()) + return Err; + // Success! We can proceed. + ... + +For functions that can fail but need to return a value, the ``Expected<T>`` +utility can be used. Values of this type can be constructed with either a +``T`` or an ``Error``. ``Expected<T>`` values are also implicitly convertible to +boolean, but with the opposite convention to ``Error``: true for success, false for +error. On success, the ``T`` value can be accessed via the dereference operator. +On failure, the ``Error`` value can be extracted using the ``takeError()`` +method. Idiomatic usage looks like: +..
code-block:: c++ + + Expected<float> parseAndSquareRoot(IStream &IS) { + float f; + IS >> f; + if (f < 0) + return make_error<FloatingPointError>(...); + return sqrt(f); + } + + Error foo(IStream &IS) { + if (auto SqrtOrErr = parseAndSquareRoot(IS)) { + float Sqrt = *SqrtOrErr; + // ... + } else + return SqrtOrErr.takeError(); + } + +All Error instances, whether success or failure, must be either checked or +moved from (via std::move or a return) before they are destructed. Accidentally +discarding an unchecked error will cause a program abort at the point where the +unchecked value's destructor is run, making it easy to identify and fix +violations of this rule. + +Success values are considered checked once they have been tested (by invoking +the boolean conversion operator): + +.. code-block:: c++ + + if (auto Err = canFail(...)) + return Err; // Failure value - move error to caller. + + // Safe to continue: Err was checked. + +In contrast, the following code will always cause an abort, regardless of the +return value of ``canFail``: + +.. code-block:: c++ + + canFail(); + // Program will always abort here, even if canFail() returns Success, since + // the value is not checked. + +Failure values are considered checked once a handler for the error type has +been activated: + +.. code-block:: c++ + + auto Err = canFail(...); + if (auto Err2 = + handleErrors(std::move(Err), + [](std::unique_ptr<MyError> M) { + // Try to handle 'M'. If successful, return a success value from + // the handler. + if (tryToHandle(M)) + return Error::success(); + + // We failed to handle 'M' - return it from the handler. + // This value will be passed back from handleErrors and + // wind up in Err2, where it will be returned from this function. + return Error(std::move(M)); + })) + return Err2; + +More information on Error and its related utilities can be found in the +Error.h header file. + + .. _function_apis:
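The checked-error discipline described above can also be demonstrated outside of LLVM. The sketch below is a hypothetical, heavily simplified standalone analogue of ``Expected<T>``; the names ``SimpleExpected``, ``checkedSqrt`` and ``errorMessage`` are invented for illustration, and the real, far more general implementation lives in llvm/Support/Error.h. It shows how a result type can force callers to test it before it is destroyed:

```cpp
#include <cmath>
#include <cstdlib>
#include <string>
#include <utility>

// Hypothetical, much-simplified standalone analogue of LLVM's Expected<T>.
// It holds either a value or an error message, converts to bool
// (true == success), and enforces the "must be checked before destruction"
// rule by aborting if a result is silently dropped.
template <typename T> class SimpleExpected {
public:
  SimpleExpected(T Val) : HasValue(true), Value(Val) {}  // success value
  SimpleExpected(std::string Msg)                        // failure value
      : HasValue(false), ErrMsg(std::move(Msg)) {}

  SimpleExpected(SimpleExpected &&Other)
      : HasValue(Other.HasValue), Checked(Other.Checked), Value(Other.Value),
        ErrMsg(std::move(Other.ErrMsg)) {
    Other.Checked = true; // A moved-from result no longer needs checking.
  }

  ~SimpleExpected() {
    if (!Checked)
      std::abort(); // Mirrors LLVM's abort on an unchecked Error/Expected.
  }

  // Testing the result is what marks it as checked.
  explicit operator bool() {
    Checked = true;
    return HasValue;
  }

  T &operator*() { return Value; }                           // success only
  const std::string &errorMessage() const { return ErrMsg; } // failure only

private:
  bool HasValue;
  bool Checked = false;
  T Value{};
  std::string ErrMsg;
};

// The fallible-function idiom from the text, using the simplified type.
inline SimpleExpected<double> checkedSqrt(double X) {
  if (X < 0)
    return SimpleExpected<double>("square root of negative number");
  return SimpleExpected<double>(std::sqrt(X));
}
```

As with the real ``Expected<T>``, dereferencing is only valid after the boolean test has succeeded, and a result destroyed without being tested aborts the program, which is exactly how unchecked-error bugs are surfaced early.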
+ Passing functions and other callable objects -------------------------------------------- @@ -295,7 +480,7 @@ The ``function_ref`` class template ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The ``function_ref`` -(`doxygen <http://llvm.org/doxygen/classllvm_1_1function_ref.html>`__) class +(`doxygen <http://llvm.org/docs/doxygen/html/classllvm_1_1function__ref_3_01Ret_07Params_8_8_8_08_4.html>`__) class template represents a reference to a callable object, templated over the type of the callable. This is a good choice for passing a callback to a function, if you don't need to hold onto the callback after the function returns. In this @@ -420,7 +605,7 @@ system in place to ensure that names do not conflict. If two different modules use the same string, they will all be turned on when the name is specified. This allows, for example, all debug information for instruction scheduling to be enabled with ``-debug-only=InstrSched``, even if the source lives in multiple -files. The name must not include a comma (,) as that is used to seperate the +files. The name must not include a comma (,) as that is used to separate the arguments of the ``-debug-only`` option. For performance reasons, -debug-only is not available in optimized build @@ -1135,7 +1320,7 @@ llvm/ADT/StringSet.h ``StringSet`` is a thin wrapper around :ref:`StringMap\<char\> <dss_stringmap>`, and it allows efficient storage and retrieval of unique strings. -Functionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also suports +Functionally analogous to ``SmallSet<StringRef>``, ``StringSet`` also supports iteration. (The iterator dereferences to a ``StringMapEntry<char>``, so you need to call ``i->getKey()`` to access the item of the StringSet.) On the other hand, ``StringSet`` doesn't support range-insertion and @@ -1696,7 +1881,7 @@ pointer from an iterator is very straight-forward. 
Assuming that ``i`` is a However, the iterators you'll be working with in the LLVM framework are special: they will automatically convert to a ptr-to-instance type whenever they need to. -Instead of derferencing the iterator and then taking the address of the result, +Instead of dereferencing the iterator and then taking the address of the result, you can simply assign the iterator to the proper pointer type and you get the dereference and address-of operation as a result of the assignment (behind the scenes, this is a result of overloading casting mechanisms). Thus the second @@ -2036,7 +2221,7 @@ sequence of instructions that form a ``BasicBlock``: CallInst* callTwo = Builder.CreateCall(...); Value* result = Builder.CreateMul(callOne, callTwo); - See :doc:`tutorial/LangImpl3` for a practical use of the ``IRBuilder``. + See :doc:`tutorial/LangImpl03` for a practical use of the ``IRBuilder``. .. _schanges_deleting: @@ -2234,11 +2419,6 @@ determine what context they belong to by looking at their own ``Type``. If you are adding new entities to LLVM IR, please try to maintain this interface design. -For clients that do *not* require the benefits of isolation, LLVM provides a -convenience API ``getGlobalContext()``. This returns a global, lazily -initialized ``LLVMContext`` that may be used in situations where isolation is -not a concern. - .. _jitthreading: Threads and the JIT diff --git a/gnu/llvm/docs/README.txt b/gnu/llvm/docs/README.txt index 31764b2951b..6c6e5b90ecf 100644 --- a/gnu/llvm/docs/README.txt +++ b/gnu/llvm/docs/README.txt @@ -11,12 +11,13 @@ updated after every commit. Manpage output is also supported, see below. 
If you instead would like to generate and view the HTML locally, install Sphinx <http://sphinx-doc.org/> and then do: - cd docs/ - make -f Makefile.sphinx - $BROWSER _build/html/index.html + cd <build-dir> + cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true <src-dir> + make -j3 docs-llvm-html + $BROWSER <build-dir>/docs/html/index.html The mapping between reStructuredText files and generated documentation is -`docs/Foo.rst` <-> `_build/html/Foo.html` <-> `http://llvm.org/docs/Foo.html`. +`docs/Foo.rst` <-> `<build-dir>/docs/html/Foo.html` <-> `http://llvm.org/docs/Foo.html`. If you are interested in writing new documentation, you will want to read `SphinxQuickstartTemplate.rst` which will get you writing documentation @@ -29,14 +30,15 @@ Manpage Output Building the manpages is similar to building the HTML documentation. The primary difference is to use the `man` makefile target, instead of the default (which is `html`). Sphinx then produces the man pages in the -directory `_build/man/`. +directory `<build-dir>/docs/man/`. - cd docs/ - make -f Makefile.sphinx man - man -l _build/man/FileCheck.1 + cd <build-dir> + cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_MAN=true <src-dir> + make -j3 docs-llvm-man + man -l <build-dir>/docs/man/FileCheck.1 The correspondence between .rst files and man pages is -`docs/CommandGuide/Foo.rst` <-> `_build/man/Foo.1`. +`docs/CommandGuide/Foo.rst` <-> `<build-dir>/docs/man/Foo.1`. These .rst files are also included during HTML generation so they are also viewable online (as noted above) at e.g. `http://llvm.org/docs/CommandGuide/Foo.html`. diff --git a/gnu/llvm/docs/ReleaseNotes.rst b/gnu/llvm/docs/ReleaseNotes.rst index a25429734bb..757434a02ce 100644 --- a/gnu/llvm/docs/ReleaseNotes.rst +++ b/gnu/llvm/docs/ReleaseNotes.rst @@ -1,16 +1,15 @@ ====================== -LLVM 3.8 Release Notes +LLVM 3.9 Release Notes ====================== ..
contents:: :local: - Introduction ============ This document contains the release notes for the LLVM Compiler Infrastructure, -release 3.8. Here we describe the status of LLVM, including major improvements +release 3.9. Here we describe the status of LLVM, including major improvements from the previous release, improvements in various subprojects of LLVM, and some of the current users of the code. All LLVM releases may be downloaded from the `LLVM releases web site <http://llvm.org/releases/>`_. @@ -23,256 +22,232 @@ them. Non-comprehensive list of changes in this release ================================================= -* With this release, the minimum Windows version required for running LLVM is - Windows 7. Earlier versions, including Windows Vista and XP are no longer - supported. - -* With this release, the autoconf build system is deprecated. It will be removed - in the 3.9 release. Please migrate to using CMake. For more information see: - `Building LLVM with CMake <CMake.html>`_ - -* We have documented our C API stability guarantees for both development and - release branches, as well as documented how to extend the C API. Please see - the `developer documentation <DeveloperPolicy.html#c-api-changes>`_ for more - information. - -* The C API function ``LLVMLinkModules`` is deprecated. It will be removed in the - 3.9 release. Please migrate to ``LLVMLinkModules2``. Unlike the old function the - new one - - * Doesn't take an unused parameter. - * Destroys the source instead of only damaging it. - * Does not record a message. Use the diagnostic handler instead. - -* The C API functions ``LLVMParseBitcode``, ``LLVMParseBitcodeInContext``, - ``LLVMGetBitcodeModuleInContext`` and ``LLVMGetBitcodeModule`` have been deprecated. - They will be removed in 3.9. Please migrate to the versions with a 2 suffix. - Unlike the old ones the new ones do not record a diagnostic message. Use - the diagnostic handler instead. 
- -* The deprecated C APIs ``LLVMGetBitcodeModuleProviderInContext`` and - ``LLVMGetBitcodeModuleProvider`` have been removed. - -* The deprecated C APIs ``LLVMCreateExecutionEngine``, ``LLVMCreateInterpreter``, - ``LLVMCreateJITCompiler``, ``LLVMAddModuleProvider`` and ``LLVMRemoveModuleProvider`` - have been removed. - -* With this release, the C API headers have been reorganized to improve build - time. Type specific declarations have been moved to Type.h, and error - handling routines have been moved to ErrorHandling.h. Both are included in - Core.h so nothing should change for projects directly including the headers, - but transitive dependencies may be affected. - -* llvm-ar now supports thin archives. - -* llvm doesn't produce ``.data.rel.ro.local`` or ``.data.rel`` sections anymore. - -* Aliases to ``available_externally`` globals are now rejected by the verifier. - -* The IR Linker has been split into ``IRMover`` that moves bits from one module to - another and Linker proper that decides what to link. - -* Support for dematerializing has been dropped. - -* ``RegisterScheduler::setDefault`` was removed. Targets that used to call into the - command line parser to set the ``DAGScheduler``, and that don't have enough - control with ``setSchedulingPreference``, should look into overriding the - ``SubTargetHook`` "``getDAGScheduler()``". - -* ``ilist_iterator<T>`` no longer has implicit conversions to and from ``T*``, - since ``ilist_iterator<T>`` may be pointing at the sentinel (which is usually - not of type ``T`` at all). To convert from an iterator ``I`` to a pointer, - use ``&*I``; to convert from a pointer ``P`` to an iterator, use - ``P->getIterator()``. Alternatively, explicit conversions via - ``static_cast<T>(U)`` are still available. - -* ``ilist_node<T>::getNextNode()`` and ``ilist_node<T>::getPrevNode()`` now - fail at compile time when the node cannot access its parent list. 
- Previously, when the sentinel was was an ``ilist_half_node<T>``, this API - could return the sentinel instead of ``nullptr``. Frustrated callers should - be updated to use ``iplist<T>::getNextNode(T*)`` instead. Alternatively, if - the node ``N`` is guaranteed not to be the last in the list, it is safe to - call ``&*++N->getIterator()`` directly. - -* The `Kaleidoscope tutorials <tutorial/index.html>`_ have been updated to use - the ORC JIT APIs. - -* ORC now has a basic set of C bindings. - -* Optional support for linking clang and the LLVM tools with a single libLLVM - shared library. To enable this, pass ``-DLLVM_LINK_LLVM_DYLIB=ON`` to CMake. - See `Building LLVM with CMake`_ for more details. - -* The optimization to move the prologue and epilogue of functions in colder - code path (shrink-wrapping) is now enabled by default. +* The LLVMContext gains a new runtime check (see + LLVMContext::discardValueNames()) that can be set to discard Value names + (other than GlobalValue). This is intended to be used in release builds by + clients that are interested in saving CPU/memory as much as possible. + +* There is no longer a "global context" available in LLVM, except for the C API. + +* The autoconf build system has been removed in favor of CMake. LLVM 3.9 + requires CMake 3.4.3 or later to build. For information about using CMake + please see the documentation on :doc:`CMake`. For information about the CMake + language there is also a :doc:`CMakePrimer` document available. + +* C API functions LLVMParseBitcode, + LLVMParseBitcodeInContext, LLVMGetBitcodeModuleInContext and + LLVMGetBitcodeModule have been removed. LLVMGetTargetMachineData has been + removed (use LLVMGetDataLayout instead). + +* The C API function LLVMLinkModules has been removed. + +* The C API function LLVMAddTargetData has been removed. + +* The C API function LLVMGetDataLayout is deprecated + in favor of LLVMGetDataLayoutStr.
+ +* The C API enum LLVMAttribute and the associated API are deprecated in favor of + the new LLVMAttributeRef API. The deprecated functions are + LLVMAddFunctionAttr, LLVMAddTargetDependentFunctionAttr, + LLVMRemoveFunctionAttr, LLVMGetFunctionAttr, LLVMAddAttribute, + LLVMRemoveAttribute, LLVMGetAttribute, LLVMAddInstrAttribute, + LLVMRemoveInstrAttribute and LLVMSetInstrParamAlignment. + +* ``TargetFrameLowering::eliminateCallFramePseudoInstr`` now returns an + iterator to the next instruction instead of ``void``. Targets that previously + did ``MBB.erase(I); return;`` now probably want ``return MBB.erase(I);``. + +* ``SelectionDAGISel::Select`` now returns ``void``. Out-of-tree targets will + need to be updated to replace the argument node and remove any dead nodes in + cases where they currently return an ``SDNode *`` from this interface. + +* Added the MemorySSA analysis, which hopes to replace MemoryDependenceAnalysis. + It should provide higher-quality results than MemDep, and be algorithmically + faster than MemDep. Currently, GVNHoist (which is off by default) makes use of + MemorySSA. + +* The minimum density for lowering switches with jump tables has been reduced + from 40% to 10% for functions which are not marked ``optsize`` (that is, + not compiled with ``-Os``). + +GCC ABI Tag +----------- + +Recently, many of the Linux distributions (e.g. `Fedora <http://developerblog.redhat.com/2015/02/10/gcc-5-in-fedora/>`_, +`Debian <https://wiki.debian.org/GCC5>`_, `Ubuntu <https://wiki.ubuntu.com/GCC5>`_) +have moved on to use the new `GCC ABI <https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Attributes.html>`_ +to work around `C++11 incompatibilities in libstdc++ <https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html>`_. +This caused `incompatibility problems <https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00153.html>`_ +with other compilers (e.g. 
Clang), which needed to be fixed, but due to the +experimental nature of GCC's own implementation, it took a long time for it to +land in LLVM (`D18035 <https://reviews.llvm.org/D18035>`_ and +`D17567 <https://reviews.llvm.org/D17567>`_), not in time for the 3.8 release. + +Those patches are now present in the 3.9.0 release and should be working in the +majority of cases, as they have been tested thoroughly. However, some bugs were +`filed in GCC <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71712>`_ and have not +yet been fixed, so there may be corner cases not covered by either GCC or Clang. +Such problems should be reported in Bugzilla (either LLVM or GCC), +and patches to LLVM's trunk are very likely to be back-ported to future 3.9.x +releases (depending on how invasive they are). + +Unfortunately, these patches won't be back-ported to 3.8.x or earlier, so we +strongly recommend using 3.9.x when GCC ABI cases are at stake. + +For a more in-depth view of the issue, check our `Bugzilla entry <https://llvm.org/bugs/show_bug.cgi?id=23529>`_. + +Changes to the LLVM IR +---------------------- + +* New intrinsics ``llvm.masked.load``, ``llvm.masked.store``, + ``llvm.masked.gather`` and ``llvm.masked.scatter`` were introduced to the + LLVM IR to allow selective memory access for vector data types. + +* The new ``notail`` attribute prevents optimization passes from adding ``tail`` + or ``musttail`` markers to a call. It is used to prevent tail call + optimization from being performed on the call. + +Changes to LLVM's IPO model --------------------------- -LLVM no longer does inter-procedural analysis and optimization (except +inlining) on functions with comdat linkage. 
Doing IPO over such +functions is unsound because the implementation the linker chooses at +link-time may be differently optimized than the one that was visible +during optimization, and may have arbitrarily different observable +behavior. See `PR26774 <http://llvm.org/PR26774>`_ for more details. -* MSVC-compatible exception handling has been completely overhauled. New - instructions have been introduced to facilitate this: - `New exception handling instructions <ExceptionHandling.html#new-exception-handling-instructions>`_. - While we have done our best to test this feature thoroughly, it would - not be completely surprising if there were a few lingering issues that - early adopters might bump into. +Support for ThinLTO +------------------- +LLVM now supports ThinLTO compilation, which can be invoked by compiling +and linking with ``-flto=thin``. The gold linker plugin, as well as linkers +that use the new ThinLTO API in libLTO (like ld64), will transparently +execute the ThinLTO backends in parallel threads. +For more information on ThinLTO and the LLVM implementation, see the +`ThinLTO blog post <http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html>`_. -Changes to the ARM Backends ---------------------------- -During this release the AArch64 target has: - -* Added support for more sanitizers (MSAN, TSAN) and made them compatible with - all VMA kernel configurations (currently tested on 39 and 42 bits). -* Gained initial LLD support in the new ELF back-end -* Extended the Load/Store optimiser and cleaned up some of the bad decisions - made earlier. -* Expanded LLDB support, including watchpoints, native building, Renderscript, - LLDB-server, debugging 32-bit applications. -* Added support for the ``Exynos M1`` chip. 
- -During this release the ARM target has: - -* Gained massive performance improvements on embedded benchmarks due to finally - running the stride vectorizer in full form, incrementing the performance gains - that we already had in the previous releases with limited stride vectorization. -* Expanded LLDB support, including watchpoints, unwind tables -* Extended the Load/Store optimiser and cleaned up some of the bad decisions - made earlier. -* Simplified code generation for global variable addresses in ELF, resulting in - a significant (4% in Chromium) reduction in code size. -* Gained some additional code size improvements, though there's still a long road - ahead, especially for older cores. -* Added some EABI floating point comparison functions to Compiler-RT -* Added support for Windows+GNU triple, ``+features`` in ``-mcpu``/``-march`` options. +**During this release the AArch64 backend has:** + +* Gained support for Qualcomm's Kryo and Broadcom's Vulcan CPUs, including + scheduling models. +* Landed a scheduling model for Samsung's Exynos M1. +* Seen a lot of work on GlobalISel. +* Learned a few more useful combines (fadd and fmul into fmadd, adjustments to the + stack pointer for callee-save stack memory and local stack memory, etc.). +* Gained support for the Swift calling convention. +* Switched to using SubtargetFeatures rather than testing for specific CPUs and + to using TableGen for handling system instruction operands. +* Like ARM, AArch64 is now using the TargetParser, so no more StringSwitches + matching CPU, FPU or feature names will be accepted in normal code. +* Clang can now self-host using LLD on AArch64. +* Gained a big batch of tests from Halide. + + Furthermore, LLDB now supports AArch64 compact unwind tables, as used on iOS, + tvOS and watchOS. + +**During this release the ARM target has:** + +* ARMv8.2-A can now be targeted directly via Clang flags. +* Added preliminary support for Cortex-R8.
+* LLDB can now parse EABI attributes for an ELF input. +* Initial ARM/Thumb support was added to LLD. +* The ExecutionEngine now supports COFF/ARM. +* Swift calling convention was ported to ARM. +* A large number of codegen fixes around ARMv8, DSP, correct sub-target support, + relocations, EABI, EHABI, Windows on ARM, atomics, etc. +* Improved assembler support for Linux/Android/Chromium sub-projects. +* Initial support for MUSL (libc) on ARM. +* Support for Thumb1 targets in libunwind. +* Gained a big batch of tests from Halide. Changes to the MIPS Target -------------------------- -During this release the MIPS target has: - -* Significantly extended support for the Integrated Assembler. See below for - more information -* Added support for the ``P5600`` processor. -* Added support for the ``interrupt`` attribute for MIPS32R2 and later. This - attribute will generate a function which can be used as a interrupt handler - on bare metal MIPS targets using the static relocation model. -* Added support for the ``ERETNC`` instruction found in MIPS32R5 and later. -* Added support for OpenCL. See http://portablecl.org/. - -* Address spaces 1 to 255 are now reserved for software use and conversions - between them are no-op casts. - -* Removed the ``mips16`` value for the ``-mcpu`` option since it is an :abbr:`ASE - (Application Specific Extension)` and not a processor. If you were using this, - please specify another CPU and use ``-mips16`` to enable MIPS16. -* Removed ``copy_u.w`` from 32-bit MSA and ``copy_u.d`` from 64-bit MSA since - they have been removed from the MSA specification due to forward compatibility - issues. For example, 32-bit MSA code containing ``copy_u.w`` would behave - differently on a 64-bit processor supporting MSA. The corresponding intrinsics - are still available and may expand to ``copy_s.[wd]`` where this is - appropriate for forward compatibility purposes.
-* Relaxed the ``-mnan`` option to allow ``-mnan=2008`` on MIPS32R2/MIPS64R2 for - compatibility with GCC. -* Made MIPS64R6 the default CPU for 64-bit Android triples. - -The MIPS target has also fixed various bugs including the following notable -fixes: - -* Fixed reversed operands on ``mthi``/``mtlo`` in the DSP :abbr:`ASE - (Application Specific Extension)`. -* The code generator no longer uses ``jal`` for calls to absolute immediate - addresses. -* Disabled fast instruction selection on MIPS32R6 and MIPS64R6 since this is not - yet supported. -* Corrected addend for ``R_MIPS_HI16`` and ``R_MIPS_PCHI16`` in MCJIT -* The code generator no longer crashes when handling subregisters of an 64-bit - FPU register with undefined value. -* The code generator no longer attempts to use ``$zero`` for operands that do - not permit ``$zero``. -* Corrected the opcode used for ``ll``/``sc`` when using MIPS32R6/MIPS64R6 and - the Integrated Assembler. -* Added support for atomic load and atomic store. -* Corrected debug info when dynamically re-aligning the stack. - -We have made a large number of improvements to the integrated assembler for -MIPS. In this release, the integrated assembler isn't quite production-ready -since there are a few known issues related to bare-metal support, checking -immediates on instructions, and the N32/N64 ABI's. However, the current support -should be sufficient for many users of the O32 ABI, particularly those targeting -MIPS32 on Linux or bare-metal MIPS32. - -If you would like to try the integrated assembler, please use -``-fintegrated-as``. +**During this release the MIPS target has:** + +* Enabled the Integrated Assembler by default for all ``mips-*`` and + ``mipsel-*`` triples. +* Significantly improved the Integrated Assembler support for the n64 ABI. +* Added the Clang frontend ``-mcompact-branches={never,optimal,always}`` option + that controls how LLVM generates compact branches for MIPS targets. 
+* Improved performance and code size for stack pointer adjustments in functions + with large frames. +* Implemented many instructions from the microMIPS32R6 ISA and added CodeGen + support for most of them. +* Added support for the triple used by Debian Stretch for little-endian + MIPS64, i.e. ``mips64el-linux-gnuabi64``. +* Removed EABI, which was neither tested nor properly supported. +* Gained the ability to self-host on MIPS32R6. +* Gained the ability to self-host on MIPS64R2 and MIPS64R6 when using the n64 + ABI. +* Added support for the ``LA`` macro in PIC mode for o32. +* Added support for safestack in compiler-rt. +* Added support for the MIPS n64 ABI in LLD. +* Added LLD support for TLS relocations for both o32 and n64 MIPS ABIs. + +**The MIPS target has also fixed various bugs, including the following notable +fixes:** + +* Delay slots are no longer filled multiple times when either ``-save-temps`` + or ``-via-file-asm`` are used. +* Updated n32 and n64 to follow the standard ELF conventions for label prefixes + (``.L``), whereas o32 still uses its own (``$``). +* Properly sign-extend values to GPR width for instructions that expect 32-bit + values on 64-bit ISAs. +* Several fixes for the delay-slot filler pass, including correct + forbidden-slot hazard handling. +* Fixed several errors caught by the machine verifier when turned on for MIPS. +* Fixed broken predicate for ``SELECT`` patterns in MIPS64. +* Fixed wrong truncation of memory address for ``LL``/``SC`` sequences in + MIPS64. +* Fixed the o32, n32 and n64 handling of ``.cprestore`` directives when inside + a ``.set noat`` region by the Integrated Assembler. +* Fixed the ordering of ``HI``/``LO`` pairs in the relocation table. +* Fixed the generated ELF ``EFlags`` when Octeon is the target.
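One of the fixes above concerns sign-extending 32-bit values to full GPR width, as the 64-bit MIPS ISAs require for instructions that operate on 32-bit values held in 64-bit registers. A minimal sketch of that widening (illustrative Python, not LLVM code):

```python
def sign_extend_32_to_64(value):
    """Widen a 32-bit value to 64 bits the way 64-bit MIPS ISAs expect:
    the upper 32 bits replicate the sign bit of the low word."""
    value &= 0xFFFFFFFF                 # keep only the low 32 bits
    if value & 0x80000000:              # sign bit set: fill upper bits with 1s
        value |= 0xFFFFFFFF00000000
    return value

# A positive value is unchanged; a negative one gets a sign-extended upper half.
print(hex(sign_extend_32_to_64(0x12345678)))   # 0x12345678
print(hex(sign_extend_32_to_64(0xFFFFFFFE)))   # 0xfffffffffffffffe
```

Without this widening, a 32-bit comparison or arithmetic result left zero-extended in a 64-bit GPR violates the ABI's expectations and produces wrong results downstream.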
+ Changes to the PowerPC Target ----------------------------- -There are numerous improvements to the PowerPC target in this release: - -* Shrink wrapping optimization has been enabled for PowerPC Little Endian - -* Direct move instructions are used when converting scalars to vectors - -* Thread Sanitizer (TSAN) is now supported for PowerPC - -* New MI peephole pass to clean up redundant XXPERMDI instructions - -* Add branch hints to highly biased branch instructions (code reaching - unreachable terminators and exceptional control flow constructs) - -* Promote boolean return values to integer to prevent excessive usage of - condition registers - -* Additional vector APIs for vector comparisons and vector merges have been - added to altivec.h - -* Many bugs have been identified and fixed +* Moved some optimizations from O3 to O2 (D18562) +* Enabled sibling call optimization for the ppc64 ELFv1/ELFv2 ABIs Changes to the X86 Target ------------------------------ - -* TLS is enabled for Cygwin as emutls. +------------------------- -* Smaller code for materializing 32-bit 1 and -1 constants at ``-Os``. +* LLVM now supports the Intel CPU codenamed Skylake Server with AVX-512 + extensions using ``-march=skylake-avx512``. The switch enables the + ISA extensions AVX-512{F, CD, VL, BW, DQ}. -* More efficient code for wide integer compares. (E.g. 64-bit compares - on 32-bit targets.) +* LLVM now supports the Intel CPU codenamed Knights Landing with AVX-512 + extensions using ``-march=knl``. The switch enables the ISA extensions + AVX-512{F, CD, ER, PF}. -* Tail call support for ``thiscall``, ``stdcall``, ``vectorcall``, and - ``fastcall`` functions. +* LLVM will now prefer ``PUSH`` instructions rather than ``%esp``-relative + ``MOV`` instructions for function calls at all optimization levels greater + than ``-O0``. Previously this transformation only occurred at ``-Os``.
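The AVX-512 extension groups enabled by the two new ``-march`` values differ; a small sketch summarizing the feature lists given above (the dictionary layout is illustrative, not an LLVM data structure):

```python
# AVX-512 ISA extension groups enabled by the new -march values,
# as listed in the notes above.
AVX512_MARCH_FEATURES = {
    "skylake-avx512": {"F", "CD", "VL", "BW", "DQ"},
    "knl":            {"F", "CD", "ER", "PF"},
}

# Only the foundation (F) and conflict-detection (CD) groups are common to
# both CPUs, so code meant to run on either must restrict itself to those.
common = AVX512_MARCH_FEATURES["skylake-avx512"] & AVX512_MARCH_FEATURES["knl"]
print(sorted(common))  # ['CD', 'F']
```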
-Changes to the Hexagon Target +Changes to the AMDGPU Target ----------------------------- -In addition to general code size and performance improvements, Hexagon target -now has basic support for Hexagon V60 architecture and Hexagon Vector -Extensions (HVX). - -Changes to the AVR Target -------------------------- - -Slightly less than half of the AVR backend has been merged in at this point. It is still -missing a number large parts which cause it to be unusable, but is well on the -road to being completely merged and workable. + * Added backend support for OpenGL shader image, buffer storage, atomic + counter, and compute shader extensions (supported since Mesa 12) -Changes to the OCaml bindings ------------------------------ + * Mesa 11.0.x is no longer supported -* The ocaml function link_modules has been replaced with link_modules' which - uses LLVMLinkModules2. - -External Open Source Projects Using LLVM 3.8 +External Open Source Projects Using LLVM 3.9 ============================================ An exciting aspect of LLVM is that it is used as an enabling technology for a lot of other language and tools projects. This section lists some of the -projects that have already been updated to work with LLVM 3.8. +projects that have already been updated to work with LLVM 3.9. LDC - the LLVM-based D compiler ------------------------------- @@ -285,8 +260,9 @@ to concurrency and offers many classical paradigms. `LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler combined with LLVM as backend to produce efficient native code. LDC targets -x86/x86_64 systems like Linux, OS X and Windows and also PowerPC (32/64 bit) -and ARM. Ports to other architectures like AArch64 and MIPS64 are underway. +x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on ARM +and PowerPC (32/64 bit). Ports to other architectures like AArch64 and MIPS64 +are underway. 
Additional Information @@ -301,3 +277,4 @@ going into the ``llvm/docs/`` directory in the LLVM tree. If you have any questions or comments about LLVM, please feel free to contact us via the `mailing lists <http://llvm.org/docs/#maillist>`_. + diff --git a/gnu/llvm/docs/ReportingGuide.rst b/gnu/llvm/docs/ReportingGuide.rst new file mode 100644 index 00000000000..f7ecbb38d45 --- /dev/null +++ b/gnu/llvm/docs/ReportingGuide.rst @@ -0,0 +1,143 @@ +=============== +Reporting Guide +=============== + +.. note:: + + This document is currently a **DRAFT** while it is being discussed + by the community. + +If you believe someone is violating the :doc:`code of conduct <CodeOfConduct>`, +you can always report it to the LLVM Foundation Code of Conduct Advisory +Committee by emailing conduct@llvm.org. **All reports will be kept +confidential.** This isn't a public list and only `members`_ of the advisory +committee will receive the report. + +If you believe anyone is in **physical danger**, please notify appropriate law +enforcement first. If you are unsure what law enforcement agency is +appropriate, please include this in your report and we will attempt to notify +them. + +If the violation occurs at an event such as a Developer Meeting and requires +immediate attention, you can also reach out to any of the event organizers or +staff. Event organizers and staff will be prepared to handle the incident and +able to help. If you cannot find one of the organizers, the venue staff can +locate one for you. We will also post detailed contact information for specific +events as part of each event's information. In-person reports will still be +kept confidential exactly as above, but also feel free to (anonymously if +needed) email conduct@llvm.org. + +.. note:: + The LLVM community has long handled inappropriate behavior on its own, using + both private communication and public responses.
Nothing in this document is + intended to discourage this self-enforcement of community norms. Instead, + the mechanisms described here are intended to supplement any self-enforcement + within the community. They provide avenues for handling severe + cases or cases where the reporting party does not wish to respond directly + for any reason. + +Filing a report +=============== + +Reports can be as formal or informal as needed for the situation at hand. If +possible, please include as much information as you can. If you feel +comfortable, please consider including: + +* Your contact info (so we can get in touch with you if we need to follow up). +* Names (real, nicknames, or pseudonyms) of any individuals involved. If there + were other witnesses besides you, please try to include them as well. +* When and where the incident occurred. Please be as specific as possible. +* Your account of what occurred. If there is a publicly available record (e.g. + a mailing list archive or a public IRC logger) please include a link. +* Any extra context you believe existed for the incident. +* If you believe this incident is ongoing. +* Any other information you believe we should have. + +What happens after you file a report? +===================================== + +You will receive an email from the advisory committee acknowledging receipt +within 24 hours (and we will aim to respond much more quickly than that). + +The advisory committee will immediately meet to review the incident and try to +determine: + +* What happened and who was involved. +* Whether this event constitutes a code of conduct violation. +* Whether this is an ongoing situation, or if there is a threat to anyone's + physical safety. + +If this is determined to be an ongoing incident or a threat to physical safety, +the working group's immediate priority will be to protect everyone involved.
+This means we may delay an "official" response until we believe that the +situation has ended and that everyone is physically safe. + +The working group will try to contact other parties involved or witnessing the +event to gain clarity on what happened and understand any different +perspectives. + +Once the advisory committee has a complete account of the events, they will make +a decision as to how to respond. Responses may include: + +* Nothing, if we determine no violation occurred or it has already been + appropriately resolved. +* Providing either moderation or mediation to ongoing interactions (where + appropriate, safe, and desired by both parties). +* A private reprimand from the working group to the individuals involved. +* An imposed vacation (i.e. asking someone to "take a week off" from a mailing + list or IRC). +* A public reprimand. +* A permanent or temporary ban from some or all LLVM spaces (mailing lists, + IRC, etc.) +* Involvement of relevant law enforcement if appropriate. + +If the situation is not resolved within one week, we'll respond within one week +to the original reporter with an update and explanation. + +Once we've determined our response, we will separately contact the original +reporter and other individuals to let them know what actions (if any) we'll be +taking. We will take into account feedback from the individuals involved on the +appropriateness of our response, but we don't guarantee we'll act on it. + +After any incident, the advisory committee will make a report on the situation +to the LLVM Foundation board. The board may choose to make a public statement +about the incident. If that's the case, the identities of anyone involved will +remain confidential unless instructed by those individuals otherwise.
To appeal a decision of the working group, contact the LLVM +Foundation board at board@llvm.org with your appeal, and the board will review +the case. + +In general, it is **not** appropriate to appeal a particular decision on +a public mailing list. Doing so would involve disclosure of information which +would be confidential. Disclosing this kind of information publicly may be +considered a separate and (potentially) more serious violation of the Code of +Conduct. This is not meant to limit discussion of the Code of Conduct, the +advisory board itself, or the appropriateness of responses in general, but +**please** refrain from mentioning specific facts about cases without the +explicit permission of all parties involved. + +.. _members: + +Members of the Code of Conduct Advisory Committee +================================================= + +The members serving on the advisory committee are listed here with contact +information in case you are more comfortable talking directly to a specific +member of the committee. + +.. note:: + + FIXME: When we form the initial advisory committee, the members' names and private contact info need to be added here. + + + +(This text is based on the `Django Project`_ Code of Conduct, which is in turn +based on wording from the `Speak Up! project`_.) + +.. _Django Project: https://www.djangoproject.com/conduct/ +.. _Speak Up! project: http://speakup.io/coc.html diff --git a/gnu/llvm/docs/ScudoHardenedAllocator.rst b/gnu/llvm/docs/ScudoHardenedAllocator.rst new file mode 100644 index 00000000000..5bc390eadd5 --- /dev/null +++ b/gnu/llvm/docs/ScudoHardenedAllocator.rst @@ -0,0 +1,117 @@ +======================== +Scudo Hardened Allocator +======================== + +..
contents:: + :local: + :depth: 1 + +Introduction +============ +The Scudo Hardened Allocator is a user-mode allocator based on LLVM Sanitizer's +CombinedAllocator, which aims at providing additional mitigations against +heap-based vulnerabilities, while maintaining good performance. + +The name "Scudo" has been retained from the initial implementation (Escudo +meaning Shield in Spanish and Portuguese). + +Design +====== +Chunk Header +------------ +Every chunk of heap memory will be preceded by a chunk header. This has two +purposes: the first is to store various information about the chunk, the +second is to detect potential heap overflows. In order to achieve +this, the header is checksummed, incorporating the pointer to the chunk itself +and a global secret. Any corruption of the header will be detected when said +header is accessed, and the process terminated. + +The following information is stored in the header: + +- the 16-bit checksum; +- the user-requested size for that chunk, which is necessary for reallocation + purposes; +- the state of the chunk (available, allocated or quarantined); +- the allocation type (malloc, new, new[] or memalign), to detect potential + mismatches in the allocation APIs used; +- whether or not the chunk is offset (i.e. the beginning of the chunk differs + from the beginning of the backend allocation, which is most often the case + with some aligned allocations); +- the associated offset; +- a 16-bit salt. + +On x64, which is currently the only architecture supported, the header fits +within 16 bytes, which works nicely with the minimum alignment requirements. + +The checksum is computed as a CRC32 (requiring the SSE 4.2 instruction set) +of the global secret, the chunk pointer itself, and the 16 bytes of header with +the checksum field zeroed out. + +The header is atomically loaded and stored to prevent races (this requires +platform support such as the cmpxchg16b instruction).
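A simplified model of the checksum scheme just described (illustrative Python only: ``zlib.crc32`` stands in for the hardware ``crc32`` instruction, and the position of the 2-byte checksum field within the header is an assumption of this sketch):

```python
import struct
import zlib

def header_checksum(secret, chunk_ptr, header):
    """CRC32 over the global secret, the chunk pointer, and the 16-byte
    header with its checksum field zeroed, truncated to the 16-bit field.
    (Scudo itself uses the SSE 4.2 CRC32 instruction; zlib stands in.)"""
    assert len(header) == 16
    zeroed = b"\x00\x00" + header[2:]   # assumed: checksum in first 2 bytes
    crc = zlib.crc32(struct.pack("<QQ", secret, chunk_ptr))
    crc = zlib.crc32(zeroed, crc)
    return crc & 0xFFFF                 # the header stores a 16-bit checksum

# Deterministic for a given secret/pointer/header triple; corrupting the
# header or moving it to another address changes the result with high
# probability, which is what trips the verification on access.
chk = header_checksum(0xDEADBEEF, 0x7F0000001000, bytes(16))
assert 0 <= chk <= 0xFFFF
```

Because the chunk pointer participates in the hash, a valid header copied verbatim to a different address will generally fail verification, which is part of the overflow-detection story above.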
This is important as two +consecutive chunks could belong to different threads. We also want to avoid +any double fetches of information located in the header, and use local +copies of the header for this purpose. + +Delayed Freelist +----------------- +A delayed freelist allows us not to return a chunk directly to the backend, but +to keep it aside for a while. Once a criterion is met, the delayed freelist is +emptied, and the quarantined chunks are returned to the backend. This helps +mitigate use-after-free vulnerabilities by reducing the determinism of the +allocation and deallocation patterns. + +This feature uses the Sanitizer's Quarantine as its base, and the amount of +memory that it can hold is configurable by the user (see the Options section +below). + +Randomness +---------- +It is important for the allocator not to rely on fixed addresses. We use +the dynamic base option for the SizeClassAllocator, allowing us to benefit +from the randomness of mmap. + +Usage +===== + +Library +------- +The allocator static library can be built from the LLVM build tree via +the "scudo" CMake rule. The associated tests can be exercised via the +"check-scudo" CMake rule. + +Linking the static library to your project can require the use of the +"whole-archive" linker flag (or equivalent), depending on your linker. +Additional flags might also be necessary.
A lower value may reduce + memory usage but decrease the effectiveness of the mitigation; a negative + value will fall back to a default of 64Mb; + +- ThreadLocalQuarantineSizeKb (integer, defaults to 1024): the size (in Kb) of + per-thread cache used to offload the global quarantine. A lower value may + reduce memory usage but might increase contention on the global + quarantine. + +- DeallocationTypeMismatch (boolean, defaults to true): whether or not we report + errors on malloc/delete, new/free, new/delete[], etc.; + +- DeleteSizeMismatch (boolean, defaults to true): whether or not we report + errors on mismatch between size of new and delete; + +- ZeroContents (boolean, defaults to false): whether or not we zero chunk + contents on allocation and deallocation. + diff --git a/gnu/llvm/docs/SegmentedStacks.rst b/gnu/llvm/docs/SegmentedStacks.rst index c0bf32b3f92..b1c588cb163 100644 --- a/gnu/llvm/docs/SegmentedStacks.rst +++ b/gnu/llvm/docs/SegmentedStacks.rst @@ -33,7 +33,7 @@ current stack limit (minus the amount of space needed to allocate a new block) - this slot's offset is again dictated by ``libgcc``. The generated assembly looks like this on x86-64: -.. code-block:: nasm +.. code-block:: text leaq -8(%rsp), %r10 cmpq %fs:112, %r10 diff --git a/gnu/llvm/docs/SourceLevelDebugging.rst b/gnu/llvm/docs/SourceLevelDebugging.rst index 270c44eb50b..8c3142ed219 100644 --- a/gnu/llvm/docs/SourceLevelDebugging.rst +++ b/gnu/llvm/docs/SourceLevelDebugging.rst @@ -63,16 +63,18 @@ away during the compilation process. This meta information provides an LLVM user a relationship between generated code and the original program source code. -Currently, debug information is consumed by DwarfDebug to produce dwarf -information used by the gdb debugger. Other targets could use the same -information to produce stabs or other debug forms. +Currently, there are two backend consumers of debug info: DwarfDebug and +CodeViewDebug.
DwarfDebug produces DWARF suitable for use with GDB, LLDB, and +other DWARF-based debuggers. :ref:`CodeViewDebug <codeview>` produces CodeView, +the Microsoft debug info format, which is usable with Microsoft debuggers such +as Visual Studio and WinDBG. LLVM's debug information format is mostly derived +from and inspired by DWARF, but it is feasible to translate into other target +debug info formats such as STABS. It would also be reasonable to use debug information to feed profiling tools for analysis of generated code, or tools for reconstructing the original source from generated code. -TODO - expound a bit more. - .. _intro_debugopt: Debugging optimized code @@ -197,7 +199,7 @@ value. The first argument is the new value (wrapped as metadata). The second argument is the offset in the user source variable where the new value is written. The third argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a description of the variable. The -third argument is a `complex expression <LangRef.html#diexpression>`_. +fourth argument is a `complex expression <LangRef.html#diexpression>`_. Object lifetimes and scoping ============================ @@ -228,7 +230,7 @@ following C fragment, for example: Compiled to LLVM, this function would be represented like this: -.. code-block:: llvm +..
code-block:: text ; Function Attrs: nounwind ssp uwtable define void @foo() #0 !dbg !4 { @@ -259,7 +261,7 @@ Compiled to LLVM, this function would be represented like this: !llvm.module.flags = !{!7, !8, !9} !llvm.ident = !{!10} - !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: 1, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2) + !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2) !1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info") !2 = !{} !3 = !{!4} @@ -301,7 +303,7 @@ The first intrinsic ``%llvm.dbg.declare`` encodes debugging information for the variable ``X``. The metadata ``!dbg !14`` attached to the intrinsic provides scope information for the variable ``X``. -.. code-block:: llvm +.. code-block:: text !14 = !DILocation(line: 2, column: 9, scope: !4) !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, @@ -325,7 +327,7 @@ The third intrinsic ``%llvm.dbg.declare`` encodes debugging information for variable ``Z``. The metadata ``!dbg !19`` attached to the intrinsic provides scope information for the variable ``Z``. -.. code-block:: llvm +.. code-block:: text !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5) !19 = !DILocation(line: 5, column: 11, scope: !18) @@ -388,7 +390,7 @@ Given an integer global variable declared as follows: a C/C++ front-end would generate the following descriptors: -.. code-block:: llvm +.. code-block:: text ;; ;; Define the global itself. 
@@ -407,7 +409,7 @@ a C/C++ front-end would generate the following descriptors: !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", - isOptimized: false, runtimeVersion: 0, emissionKind: 1, + isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !2, subprograms: !2, globals: !3, imports: !2) @@ -454,7 +456,7 @@ Given a function declared as follows: a C/C++ front-end would generate the following descriptors: -.. code-block:: llvm +.. code-block:: text ;; ;; Define the anchor for subprograms. @@ -679,7 +681,13 @@ New DWARF Constants | DW_APPLE_PROPERTY_strong | 0x400 | +--------------------------------------+-------+ | DW_APPLE_PROPERTY_unsafe_unretained | 0x800 | -+--------------------------------+-----+-------+ ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_nullability | 0x1000| ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_null_resettable | 0x2000| ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_class | 0x4000| ++--------------------------------------+-------+ Name Accelerator Tables ----------------------- @@ -1333,3 +1341,74 @@ names as follows: * "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit) * "``.apple_objc``" -> "``__apple_objc``" +.. _codeview: + +CodeView Debug Info Format +========================== + +LLVM supports emitting CodeView, the Microsoft debug info format, and this +section describes the design and implementation of that support. + +Format Background +----------------- + +CodeView as a format is clearly oriented around C++ debugging, and in C++, the +majority of debug information tends to be type information. Therefore, the +overriding design constraint of CodeView is the separation of type information +from other "symbol" information so that type information can be efficiently +merged across translation units. 
Both type information and symbol information are +generally stored as a sequence of records, where each record begins with a +16-bit record size and a 16-bit record kind. + +Type information is usually stored in the ``.debug$T`` section of the object +file. All other debug info, such as line info, string table, symbol info, and +inlinee info, is stored in one or more ``.debug$S`` sections. There may only be +one ``.debug$T`` section per object file, since all other debug info refers to +it. If a PDB (enabled by the ``/Zi`` MSVC option) was used during compilation, +the ``.debug$T`` section will contain only an ``LF_TYPESERVER2`` record pointing +to the PDB. When using PDBs, symbol information appears to remain in the object +file ``.debug$S`` sections. + +Type records are referred to by their index, which is the number of records in +the stream before a given record plus ``0x1000``. Many common basic types, such +as the basic integral types and unqualified pointers to them, are represented +using type indices less than ``0x1000``. Such basic types are built into +CodeView consumers and do not require type records. + +Each type record may only contain type indices that are less than its own type +index. This ensures that the graph of type stream references is acyclic. While +the source-level type graph may contain cycles through pointer types (consider a +linked-list struct), these cycles are removed from the type stream by always +referring to the forward declaration record of user-defined record types. Only +"symbol" records in the ``.debug$S`` streams may refer to complete, +non-forward-declaration type records. + +Working with CodeView +--------------------- + +These are instructions for some common tasks for developers working to improve +LLVM's CodeView support. Most of them revolve around using the CodeView dumper +embedded in ``llvm-readobj``.
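The type-index rules above (indices count from ``0x1000``, and a record may only refer to lower indices) can be sketched as a validity check over a hypothetical, simplified record representation:

```python
FIRST_TYPE_INDEX = 0x1000   # indices below this denote built-in basic types

def is_valid_type_stream(records):
    """`records` is a list where entry i holds the type indices referenced
    by the record with type index FIRST_TYPE_INDEX + i.  A stream is valid
    only if every reference points at a built-in type or an earlier record,
    which is what keeps the reference graph acyclic."""
    for i, refs in enumerate(records):
        own_index = FIRST_TYPE_INDEX + i
        if any(ref >= own_index for ref in refs):
            return False    # forward (or self) reference: invalid stream
    return True

# Record 0x1000 refers to a built-in index (< 0x1000); record 0x1001 refers
# back to record 0x1000.  Both references point "backwards", so this passes.
assert is_valid_type_stream([[0x74], [0x1000]])
# A self-reference (the smallest possible cycle) is rejected.
assert not is_valid_type_stream([[0x1000]])
```

This is only a model of the ordering invariant, not of the actual record encoding; real records carry a size, a kind, and kind-specific payloads as described above.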
+ +* Testing MSVC's output:: + + $ cl -c -Z7 foo.cpp # Use /Z7 to keep types in the object file + $ llvm-readobj -codeview foo.obj + +* Getting LLVM IR debug info out of Clang:: + + $ clang -g -gcodeview --target=x86_64-windows-msvc foo.cpp -S -emit-llvm + + Use this to generate LLVM IR for LLVM test cases. + +* Generate and dump CodeView from LLVM IR metadata:: + + $ llc foo.ll -filetype=obj -o foo.obj + $ llvm-readobj -codeview foo.obj > foo.txt + + Use this pattern in lit test cases and FileCheck the output of llvm-readobj + +Improving LLVM's CodeView support is a process of finding interesting type +records, constructing a C++ test case that makes MSVC emit those records, +dumping the records, understanding them, and then generating equivalent records +in LLVM's backend. diff --git a/gnu/llvm/docs/Statepoints.rst b/gnu/llvm/docs/Statepoints.rst index 442b1c269c4..29b1be37a89 100644 --- a/gnu/llvm/docs/Statepoints.rst +++ b/gnu/llvm/docs/Statepoints.rst @@ -138,7 +138,7 @@ SSA value ``%obj.relocated`` which represents the potentially changed value of ``%obj`` after the safepoint and update any following uses appropriately. The resulting relocation sequence is: -.. code-block:: llvm +.. code-block:: text define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { @@ -237,7 +237,7 @@ afterwards. If we extend our previous example to include a pointless derived pointer, we get: -.. code-block:: llvm +.. code-block:: text define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { @@ -251,7 +251,9 @@ we get: Note that in this example %p and %obj.relocate are the same address and we could replace one with the other, potentially removing the derived pointer -from the live set at the safepoint entirely. +from the live set at the safepoint entirely. + +.. 
_gc_transition_args: GC Transitions ^^^^^^^^^^^^^^^^^^ @@ -260,7 +262,7 @@ As a practical consideration, many garbage-collected systems allow code that is collector-aware ("managed code") to call code that is not collector-aware ("unmanaged code"). It is common that such calls must also be safepoints, since it is desirable to allow the collector to run during the execution of -unmanaged code. Futhermore, it is common that coordinating the transition from +unmanaged code. Furthermore, it is common that coordinating the transition from managed to unmanaged code requires extra code generation at the call site to inform the collector of the transition. In order to support these needs, a statepoint may be marked as a GC transition, and data that is necessary to @@ -281,7 +283,7 @@ Let's assume a hypothetical GC--somewhat unimaginatively named "hypothetical-gc" --that requires that a TLS variable must be written to before and after a call to unmanaged code. The resulting relocation sequence is: -.. code-block:: llvm +.. code-block:: text @flag = thread_local global i32 0, align 4 @@ -566,15 +568,36 @@ Each statepoint generates the following Locations: * Constant which describes number of following deopt *Locations* (not operands) * Variable number of Locations, one for each deopt parameter listed in - the IR statepoint (same number as described by previous Constant) -* Variable number of Locations pairs, one pair for each unique pointer - which needs relocated. The first Location in each pair describes - the base pointer for the object. The second is the derived pointer - actually being relocated. It is guaranteed that the base pointer - must also appear explicitly as a relocation pair if used after the - statepoint. There may be fewer pairs then gc parameters in the IR + the IR statepoint (same number as described by previous Constant). At + the moment, only deopt parameters with a bitwidth of 64 bits or less + are supported. 
Values of a type larger than 64 bits can be specified + and reported only if a) the value is constant at the call site, and b) + the constant can be represented with less than 64 bits (assuming zero + extension to the original bitwidth). +* Variable number of relocation records, each of which consists of + exactly two Locations. Relocation records are described in detail + below. + +Each relocation record provides sufficient information for a collector to +relocate one or more derived pointers. Each record consists of a pair of +Locations. The second element in the record represents the pointer (or +pointers) which need to be updated. The first element in the record provides a +pointer to the base of the object with which the pointer(s) being relocated is +associated. This information is required for handling generalized derived +pointers since a pointer may be outside the bounds of the original allocation, +but still needs to be relocated with the allocation. Additionally: + +* It is guaranteed that the base pointer must also appear explicitly as a + relocation pair if used after the statepoint. +* There may be fewer relocation records than gc parameters in the IR statepoint. Each *unique* pair will occur at least once; duplicates - are possible. + are possible. +* The Locations within each record may either be of pointer size or a + multiple of pointer size. In the latter case, the record must be + interpreted as describing a sequence of pointers and their corresponding + base pointers. If the Location is of size N x sizeof(pointer), then + there will be N records of one pointer each contained within the Location. + Both Locations in a pair can be assumed to be of the same size. Note that the Locations used in each section may describe the same physical location. e.g. A stack slot may appear as a deopt location, @@ -639,7 +662,7 @@ distinguish between GC references and non-GC references in IR it is given. As an example, given this code: -.. code-block:: llvm +.. 
code-block:: text define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { @@ -649,7 +672,7 @@ As an example, given this code: The pass would produce this IR: -.. code-block:: llvm +.. code-block:: text define i8 addrspace(1)* @test1(i8 addrspace(1)* %obj) gc "statepoint-example" { @@ -714,7 +737,7 @@ As an example, given input IR of the following: This pass would produce the following IR: -.. code-block:: llvm +.. code-block:: text define void @test() gc "statepoint-example" { %safepoint_token = call token (i64, i32, void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 2882400000, i32 0, void ()* @do_safepoint, i32 0, i32 0, i32 0, i32 0) @@ -768,6 +791,41 @@ Supported Architectures Support for statepoint generation requires some code for each backend. Today, only X86_64 is supported. +Problem Areas and Active Work +============================= + +#. As the existing users of the late rewriting model have matured, we've found + cases where the optimizer breaks the assumption that an SSA value of + gc-pointer type actually contains a gc-pointer and vice-versa. We need to + clarify our expectations and propose at least one small IR change. (Today, + the gc-pointer distinction is managed via address spaces. This turns out + not to be quite strong enough.) + +#. Support for languages which allow unmanaged pointers to garbage collected + objects (i.e. pass a pointer to an object to a C routine) via pinning. + +#. Support for garbage collected objects allocated on the stack. Specifically, + allocas are always assumed to be in address space 0 and we need a + cast/promotion operator to let rewriting identify them. + +#. The current statepoint lowering is known to be somewhat poor. In the very + long term, we'd like to integrate statepoints with the register allocator; + in the near term this is unlikely to happen. 
We've found the quality of + lowering to be relatively unimportant as hot-statepoints are almost always + inliner bugs. + +#. Concerns have been raised that the statepoint representation results in a + large amount of IR being produced for some examples and that this + contributes to higher than expected memory usage and compile times. There are + no immediate plans to make changes due to this, but alternate models may be + explored in the future. + +#. Relocations along exceptional paths are currently broken in ToT. In + particular, there is currently no way to represent a rethrow on a path which + also has relocations. See `this llvm-dev discussion + <https://groups.google.com/forum/#!topic/llvm-dev/AE417XjgxvI>`_ for more + detail. + Bugs and Enhancements ===================== diff --git a/gnu/llvm/docs/TableGen/LangIntro.rst b/gnu/llvm/docs/TableGen/LangIntro.rst index a148634e3ed..c1391e73646 100644 --- a/gnu/llvm/docs/TableGen/LangIntro.rst +++ b/gnu/llvm/docs/TableGen/LangIntro.rst @@ -232,7 +232,7 @@ the record ends with a semicolon. Here is a simple TableGen file: -.. code-block:: llvm +.. code-block:: text class C { bit V = 1; } def X : C; @@ -276,7 +276,7 @@ derived class or definition wants to override. Let expressions consist of the value. For example, a new class could be added to the example above, redefining the ``V`` field for all of its subclasses: -.. code-block:: llvm +.. code-block:: text class D : C { let V = 0; } def Z : D; @@ -295,7 +295,7 @@ concrete classes. Parameterized TableGen classes specify a list of variable bindings (which may optionally have defaults) that are bound when used. Here is a simple example: -.. code-block:: llvm +.. code-block:: text class FPFormat<bits<3> val> { bits<3> Value = val; @@ -316,7 +316,7 @@ integer. The more esoteric forms of `TableGen expressions`_ are useful in conjunction with template arguments. As an example: -.. code-block:: llvm +.. 
code-block:: text class ModRefVal<bits<2> val> { bits<2> Value = val; @@ -346,7 +346,7 @@ be used to decouple the interface provided to the user of the class from the actual internal data representation expected by the class. In this case, running ``llvm-tblgen`` on the example prints the following definitions: -.. code-block:: llvm +.. code-block:: text def bork { // Value bit isMod = 1; @@ -379,7 +379,7 @@ commonality exists, then in a separate place indicate what all the ops are. Here is an example TableGen fragment that shows this idea: -.. code-block:: llvm +.. code-block:: text def ops; def GPR; @@ -405,7 +405,7 @@ inherit from multiple multiclasses, instantiating definitions from each multiclass. Using a multiclass this way is exactly equivalent to instantiating the classes multiple times yourself, e.g. by writing: -.. code-block:: llvm +.. code-block:: text def ops; def GPR; @@ -432,7 +432,7 @@ the classes multiple times yourself, e.g. by writing: A ``defm`` can also be used inside a multiclass providing several levels of multiclass instantiations. -.. code-block:: llvm +.. code-block:: text class Instruction<bits<4> opc, string Name> { bits<4> opcode = opc; @@ -473,7 +473,7 @@ multiclass instantiations. the class list must start after the last multiclass, and there must be at least one multiclass before them. -.. code-block:: llvm +.. code-block:: text class XD { bits<4> Prefix = 11; } class XS { bits<4> Prefix = 12; } @@ -516,7 +516,7 @@ specified file in place of the include directive. The filename should be specified as a double quoted string immediately after the '``include``' keyword. Example: -.. code-block:: llvm +.. code-block:: text include "foo.td" @@ -532,7 +532,7 @@ commonality from the records. File-scope "let" expressions take a comma-separated list of bindings to apply, and one or more records to bind the values in. Here are some examples: -.. code-block:: llvm +.. 
code-block:: text let isTerminator = 1, isReturn = 1, isBarrier = 1, hasCtrlDep = 1 in def RET : I<0xC3, RawFrm, (outs), (ins), "ret", [(X86retflag 0)]>; @@ -559,7 +559,7 @@ ways to factor out commonality from the records, specially if using several levels of multiclass instantiations. This also avoids the need of using "let" expressions within subsequent records inside a multiclass. -.. code-block:: llvm +.. code-block:: text multiclass basic_r<bits<4> opc> { let Predicates = [HasSSE2] in { @@ -587,7 +587,7 @@ TableGen supports the '``foreach``' block, which textually replicates the loop body, substituting iterator values for iterator references in the body. Example: -.. code-block:: llvm +.. code-block:: text foreach i = [0, 1, 2, 3] in { def R#i : Register<...>; @@ -598,7 +598,7 @@ This will create objects ``R0``, ``R1``, ``R2`` and ``R3``. ``foreach`` blocks may be nested. If there is only one item in the body the braces may be elided: -.. code-block:: llvm +.. code-block:: text foreach i = [0, 1, 2, 3] in def R#i : Register<...>; diff --git a/gnu/llvm/docs/TableGen/LangRef.rst b/gnu/llvm/docs/TableGen/LangRef.rst index 27b2c8beaa6..58da6285c07 100644 --- a/gnu/llvm/docs/TableGen/LangRef.rst +++ b/gnu/llvm/docs/TableGen/LangRef.rst @@ -154,7 +154,7 @@ programmer. .. productionlist:: Declaration: `Type` `TokIdentifier` ["=" `Value`] -It assigns the value to the identifer. +It assigns the value to the identifier. Types ----- diff --git a/gnu/llvm/docs/TableGen/index.rst b/gnu/llvm/docs/TableGen/index.rst index 9526240d54f..5ba555ac2d2 100644 --- a/gnu/llvm/docs/TableGen/index.rst +++ b/gnu/llvm/docs/TableGen/index.rst @@ -90,7 +90,7 @@ of the classes, then all of the definitions. This is a good way to see what the various definitions expand to fully. Running this on the ``X86.td`` file prints this (at the time of this writing): -.. code-block:: llvm +.. code-block:: text ... 
def ADD32rr { // Instruction X86Inst I @@ -155,7 +155,7 @@ by the code generator, and specifying it all manually would be unmaintainable, prone to bugs, and tiring to do in the first place. Because we are using TableGen, all of the information was derived from the following definition: -.. code-block:: llvm +.. code-block:: text let Defs = [EFLAGS], isCommutable = 1, // X = ADD Y,Z --> X = ADD Z,Y @@ -201,7 +201,7 @@ TableGen. **TableGen definitions** are the concrete form of 'records'. These generally do not have any undefined values, and are marked with the '``def``' keyword. -.. code-block:: llvm +.. code-block:: text def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true", "Enable ARMv8 FP">; @@ -220,7 +220,7 @@ floating point instructions in the X86 backend). TableGen keeps track of all of the classes that are used to build up a definition, so the backend can find all definitions of a particular class, such as "Instruction". -.. code-block:: llvm +.. code-block:: text class ProcNoItin<string Name, list<SubtargetFeature> Features> : Processor<Name, NoItineraries, Features>; @@ -235,7 +235,7 @@ If a multiclass inherits from another multiclass, the definitions in the sub-multiclass become part of the current multiclass, as if they were declared in the current multiclass. -.. code-block:: llvm +.. code-block:: text multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend, dag address, ValueType sty> { diff --git a/gnu/llvm/docs/TestSuiteMakefileGuide.rst b/gnu/llvm/docs/TestSuiteMakefileGuide.rst index e2852a07351..b6f32262b06 100644 --- a/gnu/llvm/docs/TestSuiteMakefileGuide.rst +++ b/gnu/llvm/docs/TestSuiteMakefileGuide.rst @@ -1,6 +1,6 @@ -============================== -LLVM test-suite Makefile Guide -============================== +===================== +LLVM test-suite Guide +===================== .. 
contents:: :local: @@ -9,10 +9,11 @@ Overview ======== This document describes the features of the Makefile-based LLVM -test-suite. This way of interacting with the test-suite is deprecated in -favor of running the test-suite using LNT, but may continue to prove -useful for some users. See the Testing Guide's :ref:`test-suite Quickstart -<test-suite-quickstart>` section for more information. +test-suite as well as the CMake-based replacement. This way of interacting +with the test-suite is deprecated in favor of running the test-suite using LNT, +but may continue to prove useful for some users. See the Testing +Guide's :ref:`test-suite Quickstart <test-suite-quickstart>` section for more +information. Test suite Structure ==================== @@ -83,8 +84,77 @@ generated. If a test fails, a large <program> FAILED message will be displayed. This will help you separate benign warnings from actual test failures. -Running the test suite -====================== +Running the test suite via CMake +================================ + +To run the test suite, follow these steps: + +#. The test suite uses the lit test runner, so + you need to have lit installed first. Check out LLVM and install lit: + + .. code-block:: bash + + % svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm + % cd llvm/utils/lit + % sudo python setup.py install # Or without sudo, install in virtual-env. + running install + running bdist_egg + running egg_info + writing lit.egg-info/PKG-INFO + ... + % lit --version + lit 0.5.0dev + +#. Check out the ``test-suite`` module with: + + .. code-block:: bash + + % svn co http://llvm.org/svn/llvm-project/test-suite/trunk test-suite + +#. Use CMake to configure the test suite in a new directory. You cannot build + the test suite in the source tree. + + .. code-block:: bash + + % mkdir test-suite-build + % cd test-suite-build + % cmake ../test-suite + +#. Build the benchmarks, using the makefiles that CMake generated. + +.. 
code-block:: bash + + % make + Scanning dependencies of target timeit-target + [ 0%] Building C object tools/CMakeFiles/timeit-target.dir/timeit.c.o + [ 0%] Linking C executable timeit-target + [ 0%] Built target timeit-target + Scanning dependencies of target fpcmp-host + [ 0%] [TEST_SUITE_HOST_CC] Building host executable fpcmp + [ 0%] Built target fpcmp-host + Scanning dependencies of target timeit-host + [ 0%] [TEST_SUITE_HOST_CC] Building host executable timeit + [ 0%] Built target timeit-host + + +#. Run the tests with lit: + +.. code-block:: bash + + % lit -v -j 1 . -o results.json + -- Testing: 474 tests, 1 threads -- + PASS: test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test (1 of 474) + ********** TEST 'test-suite :: MultiSource/Applications/ALAC/decode/alacconvert-decode.test' RESULTS ********** + compile_time: 0.2192 + exec_time: 0.0462 + hash: "59620e187c6ac38b36382685ccd2b63b" + size: 83348 + ********** + PASS: test-suite :: MultiSource/Applications/ALAC/encode/alacconvert-encode.test (2 of 474) + + +Running the test suite via Makefiles (deprecated) +================================================= First, all tests are executed within the LLVM object directory tree. They *are not* executed inside of the LLVM source tree. This is because diff --git a/gnu/llvm/docs/TestingGuide.rst b/gnu/llvm/docs/TestingGuide.rst index 134ddd88c87..5dac58309e4 100644 --- a/gnu/llvm/docs/TestingGuide.rst +++ b/gnu/llvm/docs/TestingGuide.rst @@ -25,6 +25,10 @@ In order to use the LLVM testing infrastructure, you will need all of the software required to build LLVM, as well as `Python <http://python.org>`_ 2.7 or later. +If you intend to run the :ref:`test-suite <test-suite-overview>`, you will also +need a development version of zlib (zlib1g-dev is known to work on several Linux +distributions). + LLVM testing infrastructure organization ======================================== @@ -99,19 +103,11 @@ is in the ``test-suite`` module. 
See :ref:`test-suite Quickstart Regression tests ---------------- -To run all of the LLVM regression tests, use the master Makefile in the -``llvm/test`` directory. LLVM Makefiles require GNU Make (read the :doc:`LLVM -Makefile Guide <MakefileGuide>` for more details): - -.. code-block:: bash - - % make -C llvm/test - -or: +To run all of the LLVM regression tests, use the check-llvm target: .. code-block:: bash - % make + % make check-llvm If you have `Clang <http://clang.llvm.org/>`_ checked out and built, you can run the LLVM and Clang tests simultaneously using: @@ -391,6 +387,23 @@ depends on special features of sub-architectures, you must add the specific triple, test with the specific FileCheck and put it into the specific directory that will filter out all other architectures. +REQUIRES and REQUIRES-ANY directive +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Some tests can be enabled only in specific situations, such as a +debug build. Use the ``REQUIRES`` directive to specify those requirements. + +.. code-block:: llvm + + ; This test will only be enabled in a build with asserts + ; REQUIRES: asserts + +You can separate requirements by a comma. +``REQUIRES`` means all listed requirements must be satisfied. +``REQUIRES-ANY`` means at least one must be satisfied. + +The list of features that can be used in ``REQUIRES`` and ``REQUIRES-ANY`` can +be found in lit.cfg files. Substitutions ------------- @@ -543,6 +556,8 @@ the last RUN: line. This has two side effects: (b) it speeds things up for really big test cases by avoiding interpretation of the remainder of the file. +.. 
_test-suite-overview: + ``test-suite`` Overview ======================= diff --git a/gnu/llvm/docs/TypeMetadata.rst b/gnu/llvm/docs/TypeMetadata.rst new file mode 100644 index 00000000000..98d58b71a6d --- /dev/null +++ b/gnu/llvm/docs/TypeMetadata.rst @@ -0,0 +1,226 @@ +============= +Type Metadata +============= + +Type metadata is a mechanism that allows IR modules to co-operatively build +pointer sets corresponding to addresses within a given set of globals. LLVM's +`control flow integrity`_ implementation uses this metadata to efficiently +check (at each call site) that a given address corresponds to either a +valid vtable or function pointer for a given class or function type, and its +whole-program devirtualization pass uses the metadata to identify potential +callees for a given virtual call. + +To use the mechanism, a client creates metadata nodes with two elements: + +1. a byte offset into the global (generally zero for functions) +2. a metadata object representing an identifier for the type + +These metadata nodes are associated with globals by using global object +metadata attachments with the ``!type`` metadata kind. + +Each type identifier must exclusively identify either global variables +or functions. + +.. admonition:: Limitation + + The current implementation only supports attaching metadata to functions on + the x86-32 and x86-64 architectures. + +An intrinsic, :ref:`llvm.type.test <type.test>`, is used to test whether a +given pointer is associated with a type identifier. + +.. _control flow integrity: http://clang.llvm.org/docs/ControlFlowIntegrity.html + +Representing Type Information using Type Metadata +================================================= + +This section describes how Clang represents C++ type information associated with +virtual tables using type metadata. + +Consider the following inheritance hierarchy: + +.. 
code-block:: c++ + + struct A { + virtual void f(); + }; + + struct B : A { + virtual void f(); + virtual void g(); + }; + + struct C { + virtual void h(); + }; + + struct D : A, C { + virtual void f(); + virtual void h(); + }; + +The virtual table objects for A, B, C and D look like this (under the Itanium ABI): + +.. csv-table:: Virtual Table Layout for A, B, C, D + :header: Class, 0, 1, 2, 3, 4, 5, 6 + + A, A::offset-to-top, &A::rtti, &A::f + B, B::offset-to-top, &B::rtti, &B::f, &B::g + C, C::offset-to-top, &C::rtti, &C::h + D, D::offset-to-top, &D::rtti, &D::f, &D::h, D::offset-to-top, &D::rtti, thunk for &D::h + +When an object of type A is constructed, the address of ``&A::f`` in A's +virtual table object is stored in the object's vtable pointer. In ABI parlance +this address is known as an `address point`_. Similarly, when an object of type +B is constructed, the address of ``&B::f`` is stored in the vtable pointer. In +this way, the vtable in B's virtual table object is compatible with A's vtable. + +D is a little more complicated, due to the use of multiple inheritance. Its +virtual table object contains two vtables, one compatible with A's vtable and +the other compatible with C's vtable. Objects of type D contain two virtual +pointers, one belonging to the A subobject and containing the address of +the vtable compatible with A's vtable, and the other belonging to the C +subobject and containing the address of the vtable compatible with C's vtable. + +The full set of compatibility information for the above class hierarchy is +shown below. The following table shows the name of a class, the offset of an +address point within that class's vtable and the name of one of the classes +with which that address point is compatible. + +.. 
csv-table:: Type Offsets for A, B, C, D + :header: VTable for, Offset, Compatible Class + + A, 16, A + B, 16, A + , , B + C, 16, C + D, 16, A + , , D + , 48, C + +The next step is to encode this compatibility information into the IR. The way +this is done is to create type metadata named after each of the compatible +classes, with which we associate each of the compatible address points in +each vtable. For example, these type metadata entries encode the compatibility +information for the above hierarchy: + +:: + + @_ZTV1A = constant [...], !type !0 + @_ZTV1B = constant [...], !type !0, !type !1 + @_ZTV1C = constant [...], !type !2 + @_ZTV1D = constant [...], !type !0, !type !3, !type !4 + + !0 = !{i64 16, !"_ZTS1A"} + !1 = !{i64 16, !"_ZTS1B"} + !2 = !{i64 16, !"_ZTS1C"} + !3 = !{i64 16, !"_ZTS1D"} + !4 = !{i64 48, !"_ZTS1C"} + +With this type metadata, we can now use the ``llvm.type.test`` intrinsic to +test whether a given pointer is compatible with a type identifier. Working +backwards, if ``llvm.type.test`` returns true for a particular pointer, +we can also statically determine the identities of the virtual functions +that a particular virtual call may call. For example, if a program assumes +a pointer to be a member of ``!"_ZTS1A"``, we know that the address can +be only one of ``_ZTV1A+16``, ``_ZTV1B+16`` or ``_ZTV1D+16`` (i.e. the +address points of the vtables of A, B and D respectively). If we then load +an address from that pointer, we know that the address can only be one of +``&A::f``, ``&B::f`` or ``&D::f``. + +.. _address point: https://mentorembedded.github.io/cxx-abi/abi.html#vtable-general + +Testing Addresses For Type Membership +===================================== + +If a program tests an address using ``llvm.type.test``, this will cause +a link-time optimization pass, ``LowerTypeTests``, to replace calls to this +intrinsic with efficient code to perform type member tests. 
At a high level, +the pass will lay out referenced globals in a consecutive memory region in +the object file, construct bit vectors that map onto that memory region, +and generate code at each of the ``llvm.type.test`` call sites to test +pointers against those bit vectors. Because of the layout manipulation, the +globals' definitions must be available at LTO time. For more information, +see the `control flow integrity design document`_. + +A type identifier that identifies functions is transformed into a jump table, +which is a block of code consisting of one branch instruction for each +of the functions associated with the type identifier that branches to the +target function. The pass will redirect any taken function addresses to the +corresponding jump table entry. In the object file's symbol table, the jump +table entries take the identities of the original functions, so that addresses +taken outside the module will pass any verification done inside the module. + +Jump tables may call external functions, so their definitions need not +be available at LTO time. Note that if an externally defined function is +associated with a type identifier, there is no guarantee that its identity +within the module will be the same as its identity outside of the module, +as the former will be the jump table entry if a jump table is necessary. + +The `GlobalLayoutBuilder`_ class is responsible for laying out the globals +efficiently to minimize the sizes of the underlying bitsets. + +.. 
_control flow integrity design document: http://clang.llvm.org/docs/ControlFlowIntegrityDesign.html + +:Example: + +:: + + target datalayout = "e-p:32:32" + + @a = internal global i32 0, !type !0 + @b = internal global i32 0, !type !0, !type !1 + @c = internal global i32 0, !type !1 + @d = internal global [2 x i32] [i32 0, i32 0], !type !2 + + define void @e() !type !3 { + ret void + } + + define void @f() { + ret void + } + + declare void @g() !type !3 + + !0 = !{i32 0, !"typeid1"} + !1 = !{i32 0, !"typeid2"} + !2 = !{i32 4, !"typeid2"} + !3 = !{i32 0, !"typeid3"} + + declare i1 @llvm.type.test(i8* %ptr, metadata %typeid) nounwind readnone + + define i1 @foo(i32* %p) { + %pi8 = bitcast i32* %p to i8* + %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid1") + ret i1 %x + } + + define i1 @bar(i32* %p) { + %pi8 = bitcast i32* %p to i8* + %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid2") + ret i1 %x + } + + define i1 @baz(void ()* %p) { + %pi8 = bitcast void ()* %p to i8* + %x = call i1 @llvm.type.test(i8* %pi8, metadata !"typeid3") + ret i1 %x + } + + define void @main() { + %a1 = call i1 @foo(i32* @a) ; returns 1 + %b1 = call i1 @foo(i32* @b) ; returns 1 + %c1 = call i1 @foo(i32* @c) ; returns 0 + %a2 = call i1 @bar(i32* @a) ; returns 0 + %b2 = call i1 @bar(i32* @b) ; returns 1 + %c2 = call i1 @bar(i32* @c) ; returns 1 + %d02 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 0)) ; returns 0 + %d12 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 1)) ; returns 1 + %e = call i1 @baz(void ()* @e) ; returns 1 + %f = call i1 @baz(void ()* @f) ; returns 0 + %g = call i1 @baz(void ()* @g) ; returns 1 + ret void + } + +.. 
_GlobalLayoutBuilder: http://llvm.org/klaus/llvm/blob/master/include/llvm/Transforms/IPO/LowerTypeTests.h diff --git a/gnu/llvm/docs/WritingAnLLVMBackend.rst b/gnu/llvm/docs/WritingAnLLVMBackend.rst index fdadbb04e94..f0f3ab5504d 100644 --- a/gnu/llvm/docs/WritingAnLLVMBackend.rst +++ b/gnu/llvm/docs/WritingAnLLVMBackend.rst @@ -135,14 +135,13 @@ First, you should create a subdirectory under ``lib/Target`` to hold all the files related to your target. If your target is called "Dummy", create the directory ``lib/Target/Dummy``. -In this new directory, create a ``Makefile``. It is easiest to copy a -``Makefile`` of another target and modify it. It should at least contain the -``LEVEL``, ``LIBRARYNAME`` and ``TARGET`` variables, and then include -``$(LEVEL)/Makefile.common``. The library can be named ``LLVMDummy`` (for -example, see the MIPS target). Alternatively, you can split the library into -``LLVMDummyCodeGen`` and ``LLVMDummyAsmPrinter``, the latter of which should be -implemented in a subdirectory below ``lib/Target/Dummy`` (for example, see the -PowerPC target). +In this new directory, create a ``CMakeLists.txt``. It is easiest to copy a +``CMakeLists.txt`` of another target and modify it. It should at least contain +the ``LLVM_TARGET_DEFINITIONS`` variable. The library can be named ``LLVMDummy`` +(for example, see the MIPS target). Alternatively, you can split the library +into ``LLVMDummyCodeGen`` and ``LLVMDummyAsmPrinter``, the latter of which +should be implemented in a subdirectory below ``lib/Target/Dummy`` (for example, +see the PowerPC target). Note that these two naming schemes are hardcoded into ``llvm-config``. Using any other naming scheme will confuse ``llvm-config`` and produce a lot of @@ -156,13 +155,12 @@ generator, you should do what all current machine backends do: create a subclass of ``LLVMTargetMachine``. (To create a target from scratch, create a subclass of ``TargetMachine``.) 
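To make the ``CMakeLists.txt`` advice above concrete, here is a rough sketch of what such a file for the hypothetical "Dummy" target might contain. The file names and the particular tablegen outputs are illustrative assumptions, not the authoritative list; copying an existing target's file (e.g. MIPS) remains the recommended approach.

```cmake
# Illustrative sketch only -- real targets generate additional .inc files
# and list more source files.
set(LLVM_TARGET_DEFINITIONS Dummy.td)

# Run llvm-tblgen over Dummy.td to produce the generated headers.
tablegen(LLVM DummyGenRegisterInfo.inc -gen-register-info)
tablegen(LLVM DummyGenInstrInfo.inc -gen-instr-info)
add_public_tablegen_target(DummyCommonTableGen)

# Build the LLVMDummy library from the target's C++ sources.
add_llvm_target(DummyCodeGen
  DummyTargetMachine.cpp
  )
```

The ``set(LLVM_TARGET_DEFINITIONS ...)`` line is the one piece the text above calls out as required; everything else follows the pattern of the existing in-tree targets.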
-To get LLVM to actually build and link your target, you need to add it to the -``TARGETS_TO_BUILD`` variable. To do this, you modify the configure script to -know about your target when parsing the ``--enable-targets`` option. Search -the configure script for ``TARGETS_TO_BUILD``, add your target to the lists -there (some creativity required), and then reconfigure. Alternatively, you can -change ``autoconf/configure.ac`` and regenerate configure by running -``./autoconf/AutoRegen.sh``. +To get LLVM to actually build and link your target, you need to run ``cmake`` +with ``-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=Dummy``. This will build your +target without needing to add it to the list of all the targets. + +Once your target is stable, you can add it to the ``LLVM_ALL_TARGETS`` variable +located in the main ``CMakeLists.txt``. Target Machine ============== @@ -347,7 +345,7 @@ to define an object for each register. The specified string ``n`` becomes the ``Name`` of the register. The basic ``Register`` object does not have any subregisters and does not specify any aliases. -.. code-block:: llvm +.. code-block:: text class Register<string n> { string Namespace = ""; @@ -363,7 +361,7 @@ subregisters and does not specify any aliases. For example, in the ``X86RegisterInfo.td`` file, there are register definitions that utilize the ``Register`` class, such as: -.. code-block:: llvm +.. code-block:: text def AL : Register<"AL">, DwarfRegNum<[0, 0, 0]>; @@ -416,7 +414,7 @@ classes. In ``Target.td``, the ``Register`` class is the base for the ``RegisterWithSubRegs`` class that is used to define registers that need to specify subregisters in the ``SubRegs`` list, as shown here: -.. code-block:: llvm +.. code-block:: text class RegisterWithSubRegs<string n, list<Register> subregs> : Register<n> { let SubRegs = subregs; @@ -429,7 +427,7 @@ feature common to these subclasses. 
Note the use of "``let``" expressions to override values that are initially defined in a superclass (such as ``SubRegs`` field in the ``Rd`` class). -.. code-block:: llvm +.. code-block:: text class SparcReg<string n> : Register<n> { field bits<5> Num; @@ -454,7 +452,7 @@ field in the ``Rd`` class). In the ``SparcRegisterInfo.td`` file, there are register definitions that utilize these subclasses of ``Register``, such as: -.. code-block:: llvm +.. code-block:: text def G0 : Ri< 0, "G0">, DwarfRegNum<[0]>; def G1 : Ri< 1, "G1">, DwarfRegNum<[1]>; @@ -480,7 +478,7 @@ default allocation order of the registers. A target description file ``XXXRegisterInfo.td`` that uses ``Target.td`` can construct register classes using the following class: -.. code-block:: llvm +.. code-block:: text class RegisterClass<string namespace, list<ValueType> regTypes, int alignment, dag regList> { @@ -534,7 +532,7 @@ defines a group of 32 single-precision floating-point registers (``F0`` to ``F31``); ``DFPRegs`` defines a group of 16 double-precision registers (``D0-D15``). -.. code-block:: llvm +.. code-block:: text // F0, F1, F2, ..., F31 def FPRegs : RegisterClass<"SP", [f32], 32, (sequence "F%u", 0, 31)>; @@ -705,7 +703,7 @@ which describes one instruction. An instruction descriptor defines: The Instruction class (defined in ``Target.td``) is mostly used as a base for more complex instruction classes. -.. code-block:: llvm +.. code-block:: text class Instruction { string Namespace = ""; @@ -762,7 +760,7 @@ specific operation value for ``LD``/Load Word. The third parameter is the output destination, which is a register operand and defined in the ``Register`` target description file (``IntRegs``). -.. code-block:: llvm +.. code-block:: text def LDrr : F3_1 <3, 0b000000, (outs IntRegs:$dst), (ins MEMrr:$addr), "ld [$addr], $dst", @@ -771,7 +769,7 @@ target description file (``IntRegs``). 
The fourth parameter is the input source, which uses the address operand ``MEMrr`` that is defined earlier in ``SparcInstrInfo.td``: -.. code-block:: llvm +.. code-block:: text def MEMrr : Operand<i32> { let PrintMethod = "printMemOperand"; @@ -790,7 +788,7 @@ immediate value operands. For example, to perform a Load Integer instruction for a Word from an immediate operand to a register, the following instruction class is defined: -.. code-block:: llvm +.. code-block:: text def LDri : F3_2 <3, 0b000000, (outs IntRegs:$dst), (ins MEMri:$addr), "ld [$addr], $dst", @@ -803,7 +801,7 @@ creation of templates to define several instruction classes at once (using the pattern ``F3_12`` is defined to create 2 instruction classes each time ``F3_12`` is invoked: -.. code-block:: llvm +.. code-block:: text multiclass F3_12 <string OpcStr, bits<6> Op3Val, SDNode OpNode> { def rr : F3_1 <2, Op3Val, @@ -820,7 +818,7 @@ So when the ``defm`` directive is used for the ``XOR`` and ``ADD`` instructions, as seen below, it creates four instruction objects: ``XORrr``, ``XORri``, ``ADDrr``, and ``ADDri``. -.. code-block:: llvm +.. code-block:: text defm XOR : F3_12<"xor", 0b000011, xor>; defm ADD : F3_12<"add", 0b000000, add>; @@ -832,7 +830,7 @@ For example, the 10\ :sup:`th` bit represents the "greater than" condition for integers, and the 22\ :sup:`nd` bit represents the "greater than" condition for floats. -.. code-block:: llvm +.. code-block:: text def ICC_NE : ICC_VAL< 9>; // Not Equal def ICC_E : ICC_VAL< 1>; // Equal @@ -857,7 +855,7 @@ order they are defined. Fields are bound when they are assigned a value. For example, the Sparc target defines the ``XNORrr`` instruction as a ``F3_1`` format instruction having three operands. -.. code-block:: llvm +.. code-block:: text def XNORrr : F3_1<2, 0b000111, (outs IntRegs:$dst), (ins IntRegs:$b, IntRegs:$c), @@ -867,7 +865,7 @@ format instruction having three operands. 
The instruction templates in ``SparcInstrFormats.td`` show the base class for ``F3_1`` is ``InstSP``. -.. code-block:: llvm +.. code-block:: text class InstSP<dag outs, dag ins, string asmstr, list<dag> pattern> : Instruction { field bits<32> Inst; @@ -882,7 +880,7 @@ The instruction templates in ``SparcInstrFormats.td`` show the base class for ``InstSP`` leaves the ``op`` field unbound. -.. code-block:: llvm +.. code-block:: text class F3<dag outs, dag ins, string asmstr, list<dag> pattern> : InstSP<outs, ins, asmstr, pattern> { @@ -899,7 +897,7 @@ The instruction templates in ``SparcInstrFormats.td`` show the base class for fields. ``F3`` format instructions will bind the operands ``rd``, ``op3``, and ``rs1`` fields. -.. code-block:: llvm +.. code-block:: text class F3_1<bits<2> opVal, bits<6> op3val, dag outs, dag ins, string asmstr, list<dag> pattern> : F3<outs, ins, asmstr, pattern> { @@ -927,7 +925,7 @@ TableGen definition will add all of its operands to an enumeration in the llvm::XXX:OpName namespace and also add an entry for it into the OperandMap table, which can be queried using getNamedOperandIdx() -.. code-block:: llvm +.. code-block:: text int DstIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::dst); // => 0 int BIndex = SP::getNamedOperandIdx(SP::XNORrr, SP::OpName::b); // => 1 @@ -974,7 +972,7 @@ For example, the X86 backend defines ``brtarget`` and ``brtarget8``, both instances of the TableGen ``Operand`` class, which represent branch target operands: -.. code-block:: llvm +.. code-block:: text def brtarget : Operand<OtherVT>; def brtarget8 : Operand<OtherVT>; @@ -1224,14 +1222,14 @@ definitions in ``XXXInstrInfo.td``. For example, in ``SparcInstrInfo.td``, this entry defines a register store operation, and the last parameter describes a pattern with the store DAG operator. -.. code-block:: llvm +.. 
code-block:: text def STrr : F3_1< 3, 0b000100, (outs), (ins MEMrr:$addr, IntRegs:$src), "st $src, [$addr]", [(store i32:$src, ADDRrr:$addr)]>; ``ADDRrr`` is a memory mode that is also defined in ``SparcInstrInfo.td``: -.. code-block:: llvm +.. code-block:: text def ADDRrr : ComplexPattern<i32, 2, "SelectADDRrr", [], []>; @@ -1242,7 +1240,7 @@ defined in an implementation of the Instructor Selector (such as In ``lib/Target/TargetSelectionDAG.td``, the DAG operator for store is defined below: -.. code-block:: llvm +.. code-block:: text def store : PatFrag<(ops node:$val, node:$ptr), (st node:$val, node:$ptr), [{ @@ -1460,7 +1458,7 @@ if the current argument is of type ``f32`` or ``f64``), then the action is performed. In this case, the ``CCAssignToReg`` action assigns the argument value to the first available register: either ``R0`` or ``R1``. -.. code-block:: llvm +.. code-block:: text CCIfType<[f32,f64], CCAssignToReg<[R0, R1]>> @@ -1471,7 +1469,7 @@ which registers are used for specified scalar return types. A single-precision float is returned to register ``F0``, and a double-precision float goes to register ``D0``. A 32-bit integer is returned in register ``I0`` or ``I1``. -.. code-block:: llvm +.. code-block:: text def RetCC_Sparc32 : CallingConv<[ CCIfType<[i32], CCAssignToReg<[I0, I1]>>, @@ -1486,7 +1484,7 @@ the size of the slot, and the second parameter, also 4, indicates the stack alignment along 4-byte units. (Special cases: if size is zero, then the ABI size is used; if alignment is zero, then the ABI alignment is used.) -.. code-block:: llvm +.. code-block:: text def CC_Sparc32 : CallingConv<[ // All arguments get passed in integer registers if there is space. @@ -1501,7 +1499,7 @@ the following example (in ``X86CallingConv.td``), the definition of assigned to the register ``ST0`` or ``ST1``, the ``RetCC_X86Common`` is invoked. -.. code-block:: llvm +.. 
code-block:: text def RetCC_X86_32_C : CallingConv<[ CCIfType<[f32], CCAssignToReg<[ST0, ST1]>>, @@ -1516,7 +1514,7 @@ then a specified action is invoked. In the following example (in ``RetCC_X86_32_Fast`` is invoked. If the ``SSECall`` calling convention is in use, then ``RetCC_X86_32_SSE`` is invoked. -.. code-block:: llvm +.. code-block:: text def RetCC_X86_32 : CallingConv<[ CCIfCC<"CallingConv::Fast", CCDelegateTo<RetCC_X86_32_Fast>>, @@ -1684,7 +1682,7 @@ feature, the value of the attribute, and a description of the feature. (The fifth parameter is a list of features whose presence is implied, and its default value is an empty array.) -.. code-block:: llvm +.. code-block:: text class SubtargetFeature<string n, string a, string v, string d, list<SubtargetFeature> i = []> { @@ -1698,7 +1696,7 @@ default value is an empty array.) In the ``Sparc.td`` file, the ``SubtargetFeature`` is used to define the following features. -.. code-block:: llvm +.. code-block:: text def FeatureV9 : SubtargetFeature<"v9", "IsV9", "true", "Enable SPARC-V9 instructions">; @@ -1712,7 +1710,7 @@ Elsewhere in ``Sparc.td``, the ``Proc`` class is defined and then is used to define particular SPARC processor subtypes that may have the previously described features. -.. code-block:: llvm +.. code-block:: text class Proc<string Name, list<SubtargetFeature> Features> : Processor<Name, NoItineraries, Features>; diff --git a/gnu/llvm/docs/WritingAnLLVMPass.rst b/gnu/llvm/docs/WritingAnLLVMPass.rst index 241066842b7..537bbbc19d2 100644 --- a/gnu/llvm/docs/WritingAnLLVMPass.rst +++ b/gnu/llvm/docs/WritingAnLLVMPass.rst @@ -525,6 +525,14 @@ interface. Implementing a loop pass is usually straightforward. these methods should return ``true`` if they modified the program, or ``false`` if they didn't. +A ``LoopPass`` subclass which is intended to run as part of the main loop pass +pipeline needs to preserve all of the same *function* analyses that the other +loop passes in its pipeline require. 
To make that easier, +a ``getLoopAnalysisUsage`` function is provided by ``LoopUtils.h``. It can be +called within the subclass's ``getAnalysisUsage`` override to get consistent +and correct behavior. Analogously, ``INITIALIZE_PASS_DEPENDENCY(LoopPass)`` +will initialize this set of function analyses. + The ``doInitialization(Loop *, LPPassManager &)`` method ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -739,7 +747,7 @@ template parameter is the name of the pass that is to be used on the command line to specify that the pass should be added to a program (for example, with :program:`opt` or :program:`bugpoint`). The first argument is the name of the pass, which is to be used for the :option:`-help` output of programs, as well -as for debug output generated by the :option:`--debug-pass` option. +as for debug output generated by the `--debug-pass` option. If you want your pass to be easily dumpable, you should implement the virtual print method: @@ -1392,7 +1400,7 @@ some with solutions, some without. * Restarting the program breaks breakpoints. After following the information above, you have succeeded in getting some breakpoints planted in your pass. - Nex thing you know, you restart the program (i.e., you type "``run``" again), + Next thing you know, you restart the program (i.e., you type "``run``" again), and you start getting errors about breakpoints being unsettable. 
The only way I have found to "fix" this problem is to delete the breakpoints that are already set in your pass, run the program, and re-set the breakpoints once diff --git a/gnu/llvm/docs/YamlIO.rst b/gnu/llvm/docs/YamlIO.rst index f0baeb4c69d..04e63fac6a4 100644 --- a/gnu/llvm/docs/YamlIO.rst +++ b/gnu/llvm/docs/YamlIO.rst @@ -456,10 +456,11 @@ looks like: template <> struct ScalarTraits<MyCustomType> { - static void output(const T &value, void*, llvm::raw_ostream &out) { + static void output(const MyCustomType &value, void*, + llvm::raw_ostream &out) { out << value; // do custom formatting here } - static StringRef input(StringRef scalar, void*, T &value) { + static StringRef input(StringRef scalar, void*, MyCustomType &value) { // do custom parsing here. Return the empty string on success, // or an error message on failure. return StringRef(); diff --git a/gnu/llvm/docs/conf.py b/gnu/llvm/docs/conf.py index 6e3f16ceef1..224cca14288 100644 --- a/gnu/llvm/docs/conf.py +++ b/gnu/llvm/docs/conf.py @@ -48,9 +48,9 @@ copyright = u'2003-%d, LLVM Project' % date.today().year # built documents. # # The short X.Y version. -version = '3.8' +version = '3.9' # The full version, including alpha/beta/rc tags. -release = '3.8' +release = '3.9' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/gnu/llvm/docs/doxygen-mainpage.dox b/gnu/llvm/docs/doxygen-mainpage.dox new file mode 100644 index 00000000000..02a74799bff --- /dev/null +++ b/gnu/llvm/docs/doxygen-mainpage.dox @@ -0,0 +1,18 @@ +/// \mainpage LLVM +/// +/// \section main_intro Introduction +/// Welcome to LLVM. +/// +/// This documentation describes the **internal** software that makes +/// up LLVM, not the **external** use of LLVM. There are no instructions +/// here on how to use LLVM, only the APIs that make up the software. For usage +/// instructions, please see the programmer's guide or reference manual. 
+/// +/// \section main_caveat Caveat +/// This documentation is generated directly from the source code with doxygen. +/// Since LLVM is constantly under active development, what you're about to +/// read is out of date! However, it may still be useful since certain portions +/// of LLVM are very stable. +/// +/// \section main_changelog Change Log +/// - Original content written 12/30/2003 by Reid Spencer diff --git a/gnu/llvm/docs/doxygen.cfg.in b/gnu/llvm/docs/doxygen.cfg.in index 5a74cecc8aa..7699711adce 100644 --- a/gnu/llvm/docs/doxygen.cfg.in +++ b/gnu/llvm/docs/doxygen.cfg.in @@ -745,7 +745,7 @@ WARN_LOGFILE = INPUT = @abs_top_srcdir@/include \ @abs_top_srcdir@/lib \ - @abs_top_srcdir@/docs/doxygen.intro + @abs_top_srcdir@/docs/doxygen-mainpage.dox # This tag can be used to specify the character encoding of the source files # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses @@ -1791,18 +1791,6 @@ GENERATE_XML = NO XML_OUTPUT = xml -# The XML_SCHEMA tag can be used to specify a XML schema, which can be used by a -# validating XML parser to check the syntax of the XML files. -# This tag requires that the tag GENERATE_XML is set to YES. - -XML_SCHEMA = - -# The XML_DTD tag can be used to specify a XML DTD, which can be used by a -# validating XML parser to check the syntax of the XML files. -# This tag requires that the tag GENERATE_XML is set to YES. - -XML_DTD = - # If the XML_PROGRAMLISTING tag is set to YES doxygen will dump the program # listings (including syntax highlighting and cross-referencing information) to # the XML output. Note that enabling this will significantly increase the size @@ -2071,7 +2059,7 @@ DOT_NUM_THREADS = 0 # The default value is: Helvetica. # This tag requires that the tag HAVE_DOT is set to YES. -DOT_FONTNAME = FreeSans +DOT_FONTNAME = Helvetica # The DOT_FONTSIZE tag can be used to set the size (in points) of the font of # dot graphs. 
diff --git a/gnu/llvm/docs/index.rst b/gnu/llvm/docs/index.rst index 6cbce632164..a68dd1b8c73 100644 --- a/gnu/llvm/docs/index.rst +++ b/gnu/llvm/docs/index.rst @@ -60,12 +60,13 @@ representation. :hidden: CMake + CMakePrimer + AdvancedBuilds HowToBuildOnARM HowToCrossCompileLLVM CommandGuide/index GettingStarted GettingStartedVS - BuildingLLVMWithAutotools FAQ Lexicon HowToAddABuilder @@ -81,7 +82,9 @@ representation. GetElementPtr Frontend/PerformanceTips MCJITDesignAndImplementation + CodeOfConduct CompileCudaWithLLVM + ReportingGuide :doc:`GettingStarted` Discusses how to get up and running quickly with the LLVM infrastructure. @@ -102,10 +105,6 @@ representation. An addendum to the main Getting Started guide for those using Visual Studio on Windows. -:doc:`BuildingLLVMWithAutotools` - An addendum to the Getting Started guide with instructions for building LLVM - with the Autotools build system. - :doc:`tutorial/index` Tutorials about using LLVM. Includes a tutorial about making a custom language with LLVM. @@ -174,6 +173,7 @@ For developers of applications which use LLVM as a library. ProgrammersManual Extensions LibFuzzer + ScudoHardenedAllocator :doc:`LLVM Language Reference Manual <LangRef>` Defines the LLVM intermediate representation and the assembly form of the @@ -218,6 +218,9 @@ For developers of applications which use LLVM as a library. :doc:`LibFuzzer` A library for writing in-process guided fuzzers. +:doc:`ScudoHardenedAllocator` + A library that implements a security-hardened `malloc()`. + Subsystem Documentation ======================= @@ -255,7 +258,7 @@ For API clients and LLVM developers. CoverageMappingFormat Statepoints MergeFunctions - BitSets + TypeMetadata FaultMaps MIRLangRef @@ -379,7 +382,6 @@ Information about LLVM's development process. :hidden: DeveloperPolicy - MakefileGuide Projects LLVMBuild HowToReleaseLLVM @@ -400,9 +402,6 @@ Information about LLVM's development process. 
Describes the LLVMBuild organization and files used by LLVM to specify component descriptions. -:doc:`MakefileGuide` - Describes how the LLVM makefiles work and how to use them. - :doc:`HowToReleaseLLVM` This is a guide to preparing LLVM releases. Most developers can ignore it. diff --git a/gnu/llvm/docs/tutorial/BuildingAJIT1.rst b/gnu/llvm/docs/tutorial/BuildingAJIT1.rst new file mode 100644 index 00000000000..f30b979579d --- /dev/null +++ b/gnu/llvm/docs/tutorial/BuildingAJIT1.rst @@ -0,0 +1,375 @@ +======================================================= +Building a JIT: Starting out with KaleidoscopeJIT +======================================================= + +.. contents:: + :local: + +Chapter 1 Introduction +====================== + +Welcome to Chapter 1 of the "Building an ORC-based JIT in LLVM" tutorial. This +tutorial runs through the implementation of a JIT compiler using LLVM's +On-Request-Compilation (ORC) APIs. It begins with a simplified version of the +KaleidoscopeJIT class used in the +`Implementing a language with LLVM <LangImpl1.html>`_ tutorials and then +introduces new features like optimization, lazy compilation and remote +execution. + +The goal of this tutorial is to introduce you to LLVM's ORC JIT APIs, show how +these APIs interact with other parts of LLVM, and to teach you how to recombine +them to build a custom JIT that is suited to your use-case. + +The structure of the tutorial is: + +- Chapter #1: Investigate the simple KaleidoscopeJIT class. This will + introduce some of the basic concepts of the ORC JIT APIs, including the + idea of an ORC *Layer*. + +- `Chapter #2 <BuildingAJIT2.html>`_: Extend the basic KaleidoscopeJIT by adding + a new layer that will optimize IR and generated code. + +- `Chapter #3 <BuildingAJIT3.html>`_: Further extend the JIT by adding a + Compile-On-Demand layer to lazily compile IR. 
+ +- `Chapter #4 <BuildingAJIT4.html>`_: Improve the laziness of our JIT by + replacing the Compile-On-Demand layer with a custom layer that uses the ORC + Compile Callbacks API directly to defer IR-generation until functions are + called. + +- `Chapter #5 <BuildingAJIT5.html>`_: Add process isolation by JITing code into + a remote process with reduced privileges using the JIT Remote APIs. + +To provide input for our JIT we will use the Kaleidoscope REPL from +`Chapter 7 <LangImpl7.html>`_ of the "Implementing a language in LLVM" tutorial, +with one minor modification: We will remove the FunctionPassManager from the +code for that chapter and replace it with optimization support in our JIT class +in Chapter #2. + +Finally, a word on API generations: ORC is the 3rd generation of LLVM JIT API. +It was preceded by MCJIT, and before that by the (now deleted) legacy JIT. +These tutorials don't assume any experience with these earlier APIs, but +readers acquainted with them will see many familiar elements. Where appropriate +we will make this connection with the earlier APIs explicit to help people who +are transitioning from them to ORC. + +JIT API Basics +============== + +The purpose of a JIT compiler is to compile code "on-the-fly" as it is needed, +rather than compiling whole programs to disk ahead of time as a traditional +compiler does. To support that aim our initial, bare-bones JIT API will be: + +1. Handle addModule(Module &M) -- Make the given IR module available for + execution. +2. JITSymbol findSymbol(const std::string &Name) -- Search for pointers to + symbols (functions or variables) that have been added to the JIT. +3. void removeModule(Handle H) -- Remove a module from the JIT, releasing any + memory that had been used for the compiled code. + +A basic use-case for this API, executing the 'main' function from a module, +will look like: + +..
code-block:: c++ + + std::unique_ptr<Module> M = buildModule(); + JIT J; + Handle H = J.addModule(*M); + int (*Main)(int, char*[]) = + (int(*)(int, char*[]))J.findSymbol("main").getAddress(); + int Result = Main(0, nullptr); + J.removeModule(H); + +The APIs that we build in these tutorials will all be variations on this simple +theme. Behind the API we will refine the implementation of the JIT to add +support for optimization and lazy compilation. Eventually we will extend the +API itself to allow higher-level program representations (e.g. ASTs) to be +added to the JIT. + +KaleidoscopeJIT +=============== + +In the previous section we described our API; now we examine a simple +implementation of it: The KaleidoscopeJIT class [1]_ that was used in the +`Implementing a language with LLVM <LangImpl1.html>`_ tutorials. We will use +the REPL code from `Chapter 7 <LangImpl7.html>`_ of that tutorial to supply the +input for our JIT: Each time the user enters an expression the REPL will add a +new IR module containing the code for that expression to the JIT. If the +expression is a top-level expression like '1+1' or 'sin(x)', the REPL will also +use the findSymbol method of our JIT class to find and execute the code for the +expression, and then use the removeModule method to remove the code again +(since there's no way to re-invoke an anonymous expression). In later chapters +of this tutorial we'll modify the REPL to enable new interactions with our JIT +class, but for now we will take this setup for granted and focus our attention on +the implementation of our JIT itself. + +Our KaleidoscopeJIT class is defined in the KaleidoscopeJIT.h header. After the +usual include guards and #includes [2]_, we get to the definition of our class: + +..
code-block:: c++ + + #ifndef LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H + #define LLVM_EXECUTIONENGINE_ORC_KALEIDOSCOPEJIT_H + + #include "llvm/ExecutionEngine/ExecutionEngine.h" + #include "llvm/ExecutionEngine/RTDyldMemoryManager.h" + #include "llvm/ExecutionEngine/Orc/CompileUtils.h" + #include "llvm/ExecutionEngine/Orc/IRCompileLayer.h" + #include "llvm/ExecutionEngine/Orc/LambdaResolver.h" + #include "llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h" + #include "llvm/IR/Mangler.h" + #include "llvm/Support/DynamicLibrary.h" + + namespace llvm { + namespace orc { + + class KaleidoscopeJIT { + private: + + std::unique_ptr<TargetMachine> TM; + const DataLayout DL; + ObjectLinkingLayer<> ObjectLayer; + IRCompileLayer<decltype(ObjectLayer)> CompileLayer; + + public: + + typedef decltype(CompileLayer)::ModuleSetHandleT ModuleHandle; + +Our class begins with four members: a TargetMachine, TM, which will be used +to build our LLVM compiler instance; a DataLayout, DL, which will be used for +symbol mangling (more on that later); and two ORC *layers*: an +ObjectLinkingLayer and an IRCompileLayer. We'll be talking more about layers in +the next chapter, but for now you can think of them as analogous to LLVM +Passes: they wrap up useful JIT utilities behind an easy-to-compose interface. +The first layer, ObjectLinkingLayer, is the foundation of our JIT: it takes +in-memory object files produced by a compiler and links them on the fly to make +them executable. This JIT-on-top-of-a-linker design was introduced in MCJIT; +however, the linker was hidden inside the MCJIT class. In ORC we expose the +linker so that clients can access and configure it directly if they need to. In +this tutorial our ObjectLinkingLayer will just be used to support the next layer +in our stack: the IRCompileLayer, which will be responsible for taking LLVM IR, +compiling it, and passing the resulting in-memory object files down to the +object linking layer below.
+ +That's it for member variables; after that we have a single typedef: +ModuleHandle. This is the handle type that will be returned from our JIT's +addModule method, and can be passed to the removeModule method to remove a +module. The IRCompileLayer class already provides a convenient handle type +(IRCompileLayer::ModuleSetHandleT), so we just alias our ModuleHandle to this. + +.. code-block:: c++ + + KaleidoscopeJIT() + : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()), + CompileLayer(ObjectLayer, SimpleCompiler(*TM)) { + llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr); + } + + TargetMachine &getTargetMachine() { return *TM; } + +Next up we have our class constructor. We begin by initializing TM using the +EngineBuilder::selectTarget helper method, which constructs a TargetMachine for +the current process. Next we use our newly created TargetMachine to initialize +DL, our DataLayout. Then we initialize our IRCompileLayer. Our IRCompileLayer +needs two things: (1) a reference to our object linking layer, and (2) a +compiler instance to use to perform the actual compilation from IR to object +files. We use the off-the-shelf SimpleCompiler instance for now. Finally, in +the body of the constructor, we call the DynamicLibrary::LoadLibraryPermanently +method with a nullptr argument. Normally the LoadLibraryPermanently method is +called with the path of a dynamic library to load, but when passed a null +pointer it will 'load' the host process itself, making its exported symbols +available for execution. + +.. code-block:: c++ + + ModuleHandle addModule(std::unique_ptr<Module> M) { + // Build our symbol resolver: + // Lambda 1: Look back into the JIT itself to find symbols that are part of + // the same "logical dylib". + // Lambda 2: Search for external symbols in the host process.
+ auto Resolver = createLambdaResolver( + [&](const std::string &Name) { + if (auto Sym = CompileLayer.findSymbol(Name, false)) + return Sym.toRuntimeDyldSymbol(); + return RuntimeDyld::SymbolInfo(nullptr); + }, + [](const std::string &Name) { + if (auto SymAddr = + RTDyldMemoryManager::getSymbolAddressInProcess(Name)) + return RuntimeDyld::SymbolInfo(SymAddr, JITSymbolFlags::Exported); + return RuntimeDyld::SymbolInfo(nullptr); + }); + + // Build a singleton module set to hold our module. + std::vector<std::unique_ptr<Module>> Ms; + Ms.push_back(std::move(M)); + + // Add the set to the JIT with the resolver we created above and a newly + // created SectionMemoryManager. + return CompileLayer.addModuleSet(std::move(Ms), + make_unique<SectionMemoryManager>(), + std::move(Resolver)); + } + +Now we come to the first of our JIT API methods: addModule. This method is +responsible for adding IR to the JIT and making it available for execution. In +this initial implementation of our JIT we will make our modules "available for +execution" by adding them straight to the IRCompileLayer, which will +immediately compile them. In later chapters we will teach our JIT to be lazier +and instead add the Modules to a "pending" list to be compiled if and when they +are first executed. + +To add our module to the IRCompileLayer we need to supply two auxiliary objects +(as well as the module itself): a memory manager and a symbol resolver. The +memory manager will be responsible for managing the memory allocated to JIT'd +machine code, setting memory permissions, and registering exception handling +tables (if the JIT'd code uses exceptions). For our memory manager we will use +the SectionMemoryManager class: another off-the-shelf utility that provides all +the basic functionality we need. The second auxiliary class, the symbol +resolver, is more interesting for us. It exists to tell the JIT where to look +when it encounters an *external symbol* in the module we are adding.
External +symbols are any symbol not defined within the module itself, including calls to +functions outside the JIT and calls to functions defined in other modules that +have already been added to the JIT. It may seem as though modules added to the +JIT should "know about one another" by default, but since we would still have to +supply a symbol resolver for references to code outside the JIT it turns out to +be easier to just re-use this one mechanism for all symbol resolution. This has +the added benefit that the user has full control over the symbol resolution +process. Should we search for definitions within the JIT first, then fall back +on external definitions? Or should we prefer external definitions where +available and only JIT code if we don't already have an available +implementation? By using a single symbol resolution scheme we are free to choose +whatever makes the most sense for any given use case. + +Building a symbol resolver is made especially easy by the *createLambdaResolver* +function. This function takes two lambdas [3]_ and returns a +RuntimeDyld::SymbolResolver instance. The first lambda is used as the +implementation of the resolver's findSymbolInLogicalDylib method, which searches +for symbol definitions that should be thought of as being part of the same +"logical" dynamic library as this Module. If you are familiar with static +linking: this means that findSymbolInLogicalDylib should expose symbols with +common linkage and hidden visibility. If all this sounds foreign you can ignore +the details and just remember that this is the first method that the linker will +use to try to find a symbol definition. If the findSymbolInLogicalDylib method +returns a null result then the linker will call the second symbol resolver +method, called findSymbol, which searches for symbols that should be thought of +as external to (but visible from) the module and its logical dylib.
In this +tutorial we will adopt the following simple scheme: All modules added to the JIT +will behave as if they were linked into a single, ever-growing logical dylib. To +implement this, our first lambda (the one defining findSymbolInLogicalDylib) will +just search for JIT'd code by calling the CompileLayer's findSymbol method. If +we don't find a symbol in the JIT itself we'll fall back to our second lambda, +which implements findSymbol. This will use the +RTDyldMemoryManager::getSymbolAddressInProcess method to search for the symbol +within the program itself. If we can't find a symbol definition via either of +these paths the JIT will refuse to accept our module, returning a "symbol not +found" error. + +Now that we've built our symbol resolver we're ready to add our module to the +JIT. We do this by calling the CompileLayer's addModuleSet method [4]_. Since +we only have a single Module and addModuleSet expects a collection, we will +create a vector of modules and add our module as the only member. Since we +have already typedef'd our ModuleHandle type to be the same as the +CompileLayer's handle type, we can return the handle from addModuleSet +directly from our addModule method. + +.. code-block:: c++ + + JITSymbol findSymbol(const std::string Name) { + std::string MangledName; + raw_string_ostream MangledNameStream(MangledName); + Mangler::getNameWithPrefix(MangledNameStream, Name, DL); + return CompileLayer.findSymbol(MangledNameStream.str(), true); + } + + void removeModule(ModuleHandle H) { + CompileLayer.removeModuleSet(H); + } + +Now that we can add code to our JIT, we need a way to find the symbols we've +added to it. To do that we call the findSymbol method on our IRCompileLayer, +but with a twist: We have to *mangle* the name of the symbol we're searching +for first. The reason for this is that the ORC JIT components use mangled +symbols internally the same way a static compiler and linker would, rather +than using plain IR symbol names.
The kind of mangling will depend on the +DataLayout, which in turn depends on the target platform. To allow us to +remain portable and search based on the un-mangled name, we just reproduce +this mangling ourselves. + +We now come to the last method in our JIT API: removeModule. This method is +responsible for destructing the MemoryManager and SymbolResolver that were +added with a given module, freeing any resources they were using in the +process. In our Kaleidoscope demo we rely on this method to remove the module +representing the most recent top-level expression, preventing it from being +treated as a duplicate definition when the next top-level expression is +entered. It is generally good to free any module that you know you won't need +to call further, just to free up the resources dedicated to it. However, you +don't strictly need to do this: All resources will be cleaned up when your +JIT class is destructed, if they haven't been freed before then. + +This brings us to the end of Chapter 1 of Building a JIT. You now have a basic +but fully functioning JIT stack that you can use to take LLVM IR and make it +executable within the context of your JIT process. In the next chapter we'll +look at how to extend this JIT to produce better quality code, and in the +process take a deeper look at the ORC layer concept. + +`Next: Extending the KaleidoscopeJIT <BuildingAJIT2.html>`_ + +Full Code Listing +================= + +Here is the complete code listing for our running example. To build this +example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy + # Run + ./toy + +Here is the code: + +.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter1/KaleidoscopeJIT.h + :language: c++ + +.. [1] Actually we use a cut-down version of KaleidoscopeJIT that makes a + simplifying assumption: symbols cannot be re-defined.
This will make it + impossible to re-define symbols in the REPL, but will make our symbol + lookup logic simpler. Re-introducing support for symbol redefinition is + left as an exercise for the reader. (The KaleidoscopeJIT.h used in the + original tutorials will be a helpful reference). + +.. [2] +-----------------------+-----------------------------------------------+ + | File | Reason for inclusion | + +=======================+===============================================+ + | ExecutionEngine.h | Access to the EngineBuilder::selectTarget | + | | method. | + +-----------------------+-----------------------------------------------+ + | | Access to the | + | RTDyldMemoryManager.h | RTDyldMemoryManager::getSymbolAddressInProcess| + | | method. | + +-----------------------+-----------------------------------------------+ + | CompileUtils.h | Provides the SimpleCompiler class. | + +-----------------------+-----------------------------------------------+ + | IRCompileLayer.h | Provides the IRCompileLayer class. | + +-----------------------+-----------------------------------------------+ + | | Access the createLambdaResolver function, | + | LambdaResolver.h | which provides easy construction of symbol | + | | resolvers. | + +-----------------------+-----------------------------------------------+ + | ObjectLinkingLayer.h | Provides the ObjectLinkingLayer class. | + +-----------------------+-----------------------------------------------+ + | Mangler.h | Provides the Mangler class for platform | + | | specific name-mangling. | + +-----------------------+-----------------------------------------------+ + | DynamicLibrary.h | Provides the DynamicLibrary class, which | + | | makes symbols in the host process searchable. | + +-----------------------+-----------------------------------------------+ + +.. [3] Actually they don't have to be lambdas, any object with a call operator + will do, including plain old functions or std::functions. + +.. 
[4] ORC layers accept sets of Modules, rather than individual ones, so that
+   all Modules in the set could be co-located by the memory manager, though
+   this feature is not yet implemented.
diff --git a/gnu/llvm/docs/tutorial/BuildingAJIT2.rst b/gnu/llvm/docs/tutorial/BuildingAJIT2.rst
new file mode 100644
index 00000000000..8fa92317f54
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/BuildingAJIT2.rst
@@ -0,0 +1,336 @@
+=====================================================================
+Building a JIT: Adding Optimizations -- An introduction to ORC Layers
+=====================================================================
+
+.. contents::
+   :local:
+
+**This tutorial is under active development. It is incomplete and details may
+change frequently.** Nonetheless we invite you to try it out as it stands, and
+we welcome any feedback.
+
+Chapter 2 Introduction
+======================
+
+Welcome to Chapter 2 of the "Building an ORC-based JIT in LLVM" tutorial. In
+`Chapter 1 <BuildingAJIT1.html>`_ of this series we examined a basic JIT
+class, KaleidoscopeJIT, that could take LLVM IR modules as input and produce
+executable code in memory. KaleidoscopeJIT was able to do this with relatively
+little code by composing two off-the-shelf *ORC layers*: IRCompileLayer and
+ObjectLinkingLayer, to do much of the heavy lifting.
+
+In this chapter we'll learn more about the ORC layer concept by using a new
+layer, IRTransformLayer, to add IR optimization support to KaleidoscopeJIT.
+
+Optimizing Modules using the IRTransformLayer
+=============================================
+
+In `Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM"
+tutorial series the llvm *FunctionPassManager* is introduced as a means for
+optimizing LLVM IR. 
Interested readers may read that chapter for details, but +in short: to optimize a Module we create an llvm::FunctionPassManager +instance, configure it with a set of optimizations, then run the PassManager on +a Module to mutate it into a (hopefully) more optimized but semantically +equivalent form. In the original tutorial series the FunctionPassManager was +created outside the KaleidoscopeJIT and modules were optimized before being +added to it. In this Chapter we will make optimization a phase of our JIT +instead. For now this will provide us a motivation to learn more about ORC +layers, but in the long term making optimization part of our JIT will yield an +important benefit: When we begin lazily compiling code (i.e. deferring +compilation of each function until the first time it's run), having +optimization managed by our JIT will allow us to optimize lazily too, rather +than having to do all our optimization up-front. + +To add optimization support to our JIT we will take the KaleidoscopeJIT from +Chapter 1 and compose an ORC *IRTransformLayer* on top. We will look at how the +IRTransformLayer works in more detail below, but the interface is simple: the +constructor for this layer takes a reference to the layer below (as all layers +do) plus an *IR optimization function* that it will apply to each Module that +is added via addModuleSet: + +.. 
code-block:: c++ + + class KaleidoscopeJIT { + private: + std::unique_ptr<TargetMachine> TM; + const DataLayout DL; + ObjectLinkingLayer<> ObjectLayer; + IRCompileLayer<decltype(ObjectLayer)> CompileLayer; + + typedef std::function<std::unique_ptr<Module>(std::unique_ptr<Module>)> + OptimizeFunction; + + IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer; + + public: + typedef decltype(OptimizeLayer)::ModuleSetHandleT ModuleHandle; + + KaleidoscopeJIT() + : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()), + CompileLayer(ObjectLayer, SimpleCompiler(*TM)), + OptimizeLayer(CompileLayer, + [this](std::unique_ptr<Module> M) { + return optimizeModule(std::move(M)); + }) { + llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr); + } + +Our extended KaleidoscopeJIT class starts out the same as it did in Chapter 1, +but after the CompileLayer we introduce a typedef for our optimization function. +In this case we use a std::function (a handy wrapper for "function-like" things) +from a single unique_ptr<Module> input to a std::unique_ptr<Module> output. With +our optimization function typedef in place we can declare our OptimizeLayer, +which sits on top of our CompileLayer. + +To initialize our OptimizeLayer we pass it a reference to the CompileLayer +below (standard practice for layers), and we initialize the OptimizeFunction +using a lambda that calls out to an "optimizeModule" function that we will +define below. + +.. code-block:: c++ + + // ... + auto Resolver = createLambdaResolver( + [&](const std::string &Name) { + if (auto Sym = OptimizeLayer.findSymbol(Name, false)) + return Sym.toRuntimeDyldSymbol(); + return RuntimeDyld::SymbolInfo(nullptr); + }, + // ... + +.. code-block:: c++ + + // ... + return OptimizeLayer.addModuleSet(std::move(Ms), + make_unique<SectionMemoryManager>(), + std::move(Resolver)); + // ... + +.. code-block:: c++ + + // ... + return OptimizeLayer.findSymbol(MangledNameStream.str(), true); + // ... + +.. 
code-block:: c++ + + // ... + OptimizeLayer.removeModuleSet(H); + // ... + +Next we need to replace references to 'CompileLayer' with references to +OptimizeLayer in our key methods: addModule, findSymbol, and removeModule. In +addModule we need to be careful to replace both references: the findSymbol call +inside our resolver, and the call through to addModuleSet. + +.. code-block:: c++ + + std::unique_ptr<Module> optimizeModule(std::unique_ptr<Module> M) { + // Create a function pass manager. + auto FPM = llvm::make_unique<legacy::FunctionPassManager>(M.get()); + + // Add some optimizations. + FPM->add(createInstructionCombiningPass()); + FPM->add(createReassociatePass()); + FPM->add(createGVNPass()); + FPM->add(createCFGSimplificationPass()); + FPM->doInitialization(); + + // Run the optimizations over all functions in the module being added to + // the JIT. + for (auto &F : *M) + FPM->run(F); + + return M; + } + +At the bottom of our JIT we add a private method to do the actual optimization: +*optimizeModule*. This function sets up a FunctionPassManager, adds some passes +to it, runs it over every function in the module, and then returns the mutated +module. The specific optimizations are the same ones used in +`Chapter 4 <LangImpl4.html>`_ of the "Implementing a language with LLVM" +tutorial series. Readers may visit that chapter for a more in-depth +discussion of these, and of IR optimization in general. + +And that's it in terms of changes to KaleidoscopeJIT: When a module is added via +addModule the OptimizeLayer will call our optimizeModule function before passing +the transformed module on to the CompileLayer below. Of course, we could have +called optimizeModule directly in our addModule function and not gone to the +bother of using the IRTransformLayer, but doing so gives us another opportunity +to see how layers compose. 
It also provides a neat entry point to the *layer*
+concept itself, because IRTransformLayer turns out to be one of the simplest
+implementations of the layer concept that can be devised:
+
+.. code-block:: c++
+
+  template <typename BaseLayerT, typename TransformFtor>
+  class IRTransformLayer {
+  public:
+    typedef typename BaseLayerT::ModuleSetHandleT ModuleSetHandleT;
+
+    IRTransformLayer(BaseLayerT &BaseLayer,
+                     TransformFtor Transform = TransformFtor())
+      : BaseLayer(BaseLayer), Transform(std::move(Transform)) {}
+
+    template <typename ModuleSetT, typename MemoryManagerPtrT,
+              typename SymbolResolverPtrT>
+    ModuleSetHandleT addModuleSet(ModuleSetT Ms,
+                                  MemoryManagerPtrT MemMgr,
+                                  SymbolResolverPtrT Resolver) {
+
+      for (auto I = Ms.begin(), E = Ms.end(); I != E; ++I)
+        *I = Transform(std::move(*I));
+
+      return BaseLayer.addModuleSet(std::move(Ms), std::move(MemMgr),
+                                    std::move(Resolver));
+    }
+
+    void removeModuleSet(ModuleSetHandleT H) { BaseLayer.removeModuleSet(H); }
+
+    JITSymbol findSymbol(const std::string &Name, bool ExportedSymbolsOnly) {
+      return BaseLayer.findSymbol(Name, ExportedSymbolsOnly);
+    }
+
+    JITSymbol findSymbolIn(ModuleSetHandleT H, const std::string &Name,
+                           bool ExportedSymbolsOnly) {
+      return BaseLayer.findSymbolIn(H, Name, ExportedSymbolsOnly);
+    }
+
+    void emitAndFinalize(ModuleSetHandleT H) {
+      BaseLayer.emitAndFinalize(H);
+    }
+
+    TransformFtor& getTransform() { return Transform; }
+
+    const TransformFtor& getTransform() const { return Transform; }
+
+  private:
+    BaseLayerT &BaseLayer;
+    TransformFtor Transform;
+  };
+
+This is the whole definition of IRTransformLayer, from
+``llvm/include/llvm/ExecutionEngine/Orc/IRTransformLayer.h``, stripped of its
+comments. It is a template class with two template arguments: ``BaseLayerT`` and
+``TransformFtor`` that provide the type of the base layer and the type of the
+"transform functor" (in our case a std::function) respectively. 
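The transform-functor idea is easy to study in isolation. Below is a minimal standalone sketch of the same pattern in plain C++, with no LLVM types; ``RecordingLayer``, ``TransformLayer``, and ``addSet`` are invented stand-ins for a base layer, the transform layer, and addModuleSet. The point it demonstrates is the one from the code above: the transform layer exposes the same interface as the layer below, but runs each element of a set through a caller-supplied functor before passing the set down.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a base layer: it simply records every item it receives.
struct RecordingLayer {
  std::vector<std::string> Added;
  void addSet(std::vector<std::string> Items) { // addModuleSet analogue
    for (auto &I : Items)
      Added.push_back(std::move(I));
  }
};

// Transform-layer analogue: same interface as the base layer, but each
// item is run through a caller-supplied functor before being passed down.
template <typename BaseLayerT, typename TransformFtor>
struct TransformLayer {
  TransformLayer(BaseLayerT &Base, TransformFtor Transform)
      : Base(Base), Transform(std::move(Transform)) {}

  void addSet(std::vector<std::string> Items) {
    for (auto &I : Items)
      I = Transform(std::move(I)); // mutate in place, as IRTransformLayer does
    Base.addSet(std::move(Items));
  }

  BaseLayerT &Base;
  TransformFtor Transform;
};

// Build a two-layer stack whose "transform" upper-cases each item.
RecordingLayer runExample() {
  RecordingLayer Base;
  auto Upper = [](std::string S) {
    for (char &C : S)
      C = static_cast<char>(std::toupper(static_cast<unsigned char>(C)));
    return S;
  };
  TransformLayer<RecordingLayer, decltype(Upper)> Layer(Base, Upper);
  Layer.addSet({"foo", "bar"});
  return Base;
}
```

As in IRTransformLayer, the transform is supplied by the client at construction time, so the same layer template can express any Module-to-Module (here, string-to-string) rewrite.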
This class is +concerned with two very simple jobs: (1) Running every IR Module that is added +with addModuleSet through the transform functor, and (2) conforming to the ORC +layer interface. The interface consists of one typedef and five methods: + ++------------------+-----------------------------------------------------------+ +| Interface | Description | ++==================+===========================================================+ +| | Provides a handle that can be used to identify a module | +| ModuleSetHandleT | set when calling findSymbolIn, removeModuleSet, or | +| | emitAndFinalize. | ++------------------+-----------------------------------------------------------+ +| | Takes a given set of Modules and makes them "available | +| | for execution. This means that symbols in those modules | +| | should be searchable via findSymbol and findSymbolIn, and | +| | the address of the symbols should be read/writable (for | +| | data symbols), or executable (for function symbols) after | +| | JITSymbol::getAddress() is called. Note: This means that | +| addModuleSet | addModuleSet doesn't have to compile (or do any other | +| | work) up-front. It *can*, like IRCompileLayer, act | +| | eagerly, but it can also simply record the module and | +| | take no further action until somebody calls | +| | JITSymbol::getAddress(). In IRTransformLayer's case | +| | addModuleSet eagerly applies the transform functor to | +| | each module in the set, then passes the resulting set | +| | of mutated modules down to the layer below. | ++------------------+-----------------------------------------------------------+ +| | Removes a set of modules from the JIT. Code or data | +| removeModuleSet | defined in these modules will no longer be available, and | +| | the memory holding the JIT'd definitions will be freed. 
| ++------------------+-----------------------------------------------------------+ +| | Searches for the named symbol in all modules that have | +| | previously been added via addModuleSet (and not yet | +| findSymbol | removed by a call to removeModuleSet). In | +| | IRTransformLayer we just pass the query on to the layer | +| | below. In our REPL this is our default way to search for | +| | function definitions. | ++------------------+-----------------------------------------------------------+ +| | Searches for the named symbol in the module set indicated | +| | by the given ModuleSetHandleT. This is just an optimized | +| | search, better for lookup-speed when you know exactly | +| | a symbol definition should be found. In IRTransformLayer | +| findSymbolIn | we just pass this query on to the layer below. In our | +| | REPL we use this method to search for functions | +| | representing top-level expressions, since we know exactly | +| | where we'll find them: in the top-level expression module | +| | we just added. | ++------------------+-----------------------------------------------------------+ +| | Forces all of the actions required to make the code and | +| | data in a module set (represented by a ModuleSetHandleT) | +| | accessible. Behaves as if some symbol in the set had been | +| | searched for and JITSymbol::getSymbolAddress called. This | +| emitAndFinalize | is rarely needed, but can be useful when dealing with | +| | layers that usually behave lazily if the user wants to | +| | trigger early compilation (for example, to use idle CPU | +| | time to eagerly compile code in the background). | ++------------------+-----------------------------------------------------------+ + +This interface attempts to capture the natural operations of a JIT (with some +wrinkles like emitAndFinalize for performance), similar to the basic JIT API +operations we identified in Chapter 1. 
Conforming to the layer concept allows
+classes to compose neatly by implementing their behaviors in terms of these
+same operations, carried out on the layer below. For example, an eager layer
+(like IRTransformLayer) can implement addModuleSet by running each module in the
+set through its transform up-front and immediately passing the result to the
+layer below. A lazy layer, by contrast, could implement addModuleSet by
+squirreling away the modules, doing no other up-front work, but applying the
+transform (and calling addModuleSet on the layer below) when the client calls
+findSymbol instead. The JIT'd program behavior will be the same either way, but
+these choices will have different performance characteristics: Doing work
+eagerly means the JIT takes longer up-front, but proceeds smoothly once this is
+done. Deferring work allows the JIT to get up-and-running quickly, but will
+force the JIT to pause and wait whenever some code or data is needed that hasn't
+already been processed.
+
+Our current REPL is eager: Each function definition is optimized and compiled as
+soon as it's typed in. If we were to make the transform layer lazy (but not
+change things otherwise) we could defer optimization until the first time we
+reference a function in a top-level expression (see if you can figure out why,
+then check out the answer below [1]_). In the next chapter, however, we'll
+introduce fully lazy compilation, in which functions aren't compiled until
+they're first called at run-time. At this point the trade-offs get much more
+interesting: the lazier we are, the quicker we can start executing the first
+function, but the more often we'll have to pause to compile newly encountered
+functions. If we only code-gen lazily, but optimize eagerly, we'll have a slow
+startup (while everything is optimized) but relatively short pauses as each
+function just passes through code-gen. 
If we both optimize and code-gen lazily
+we can start executing the first function more quickly, but we'll have longer
+pauses as each function has to be both optimized and code-gen'd when it's first
+executed. Things become even more interesting if we consider interprocedural
+optimizations like inlining, which must be performed eagerly. These are
+complex trade-offs, and there is no one-size-fits-all solution to them, but by
+providing composable layers we leave the decisions to the person implementing
+the JIT, and make it easy for them to experiment with different configurations.
+
+`Next: Adding Per-function Lazy Compilation <BuildingAJIT3.html>`_
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example with an
+IRTransformLayer added to enable optimization. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy
+    # Run
+    ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter2/KaleidoscopeJIT.h
+   :language: c++
+
+.. [1] When we add our top-level expression to the JIT, any calls to functions
+   that we defined earlier will appear to the ObjectLinkingLayer as
+   external symbols. The ObjectLinkingLayer will call the SymbolResolver
+   that we defined in addModuleSet, which in turn calls findSymbol on the
+   OptimizeLayer, at which point even a lazy transform layer will have to
+   do its work.
diff --git a/gnu/llvm/docs/tutorial/BuildingAJIT3.rst b/gnu/llvm/docs/tutorial/BuildingAJIT3.rst
new file mode 100644
index 00000000000..ba0dab91c4e
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/BuildingAJIT3.rst
@@ -0,0 +1,171 @@
+=============================================
+Building a JIT: Per-function Lazy Compilation
+=============================================
+
+.. contents::
+   :local:
+
+**This tutorial is under active development. 
It is incomplete and details may
+change frequently.** Nonetheless we invite you to try it out as it stands, and
+we welcome any feedback.
+
+Chapter 3 Introduction
+======================
+
+Welcome to Chapter 3 of the "Building an ORC-based JIT in LLVM" tutorial. This
+chapter discusses lazy JITing and shows you how to enable it by adding an ORC
+CompileOnDemand layer to the JIT from `Chapter 2 <BuildingAJIT2.html>`_.
+
+Lazy Compilation
+================
+
+When we add a module to the KaleidoscopeJIT class described in Chapter 2 it is
+immediately optimized, compiled and linked for us by the IRTransformLayer,
+IRCompileLayer and ObjectLinkingLayer respectively. This scheme, where all the
+work to make a Module executable is done up front, is relatively simple to
+understand, and its performance characteristics are easy to reason about.
+However, it will lead to very high startup times if the amount of code to be
+compiled is large, and may also do a lot of unnecessary compilation if only a
+few compiled functions are ever called at runtime. A truly "just-in-time"
+compiler should allow us to defer the compilation of any given function until
+the moment that function is first called, improving launch times and
+eliminating redundant work. In fact, the ORC APIs provide us with a layer to
+lazily compile LLVM IR: *CompileOnDemandLayer*.
+
+The CompileOnDemandLayer conforms to the layer interface described in Chapter 2,
+but the addModuleSet method behaves quite differently from the layers we have
+seen so far: rather than doing any work up front, it just constructs a *stub*
+for each function in the module and arranges for the stub to trigger compilation
+of the actual function the first time it is called. Because stub functions are
+very cheap to produce, CompileOnDemand's addModuleSet method runs very quickly,
+reducing the time required to launch the first function to be executed, and
+saving us from doing any redundant compilation. 
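The stub mechanism has a simple language-level analogue. The sketch below is plain C++ with no LLVM APIs (``FunctionStub`` and its members are invented for illustration): a callable holds a "compile" thunk, runs it on the first call to obtain the real body, caches that body, and dispatches every later call to it directly.

```cpp
#include <functional>
#include <utility>

// Toy analogue of a compile-on-demand stub: compilation is deferred
// until the first call, then the compiled body is cached and reused.
class FunctionStub {
public:
  using Body = std::function<int(int)>;
  using Compiler = std::function<Body()>;

  explicit FunctionStub(Compiler C) : Compile(std::move(C)) {}

  int operator()(int Arg) {
    if (!Compiled)
      Compiled = Compile(); // "compile" exactly once, on first call
    return Compiled(Arg);
  }

private:
  Compiler Compile;
  Body Compiled; // empty until the first call
};
```

A real CompileOnDemandLayer works one level down from this: the stub emitted for each function triggers a compile callback that generates and compiles the function body, after which the stub is updated to point at the compiled code.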
By conforming to the layer
+interface, CompileOnDemand can be easily added on top of our existing JIT class.
+We just need a few changes:
+
+.. code-block:: c++
+
+  ...
+  #include "llvm/ExecutionEngine/SectionMemoryManager.h"
+  #include "llvm/ExecutionEngine/Orc/CompileOnDemandLayer.h"
+  #include "llvm/ExecutionEngine/Orc/CompileUtils.h"
+  ...
+
+  ...
+  class KaleidoscopeJIT {
+  private:
+    std::unique_ptr<TargetMachine> TM;
+    const DataLayout DL;
+    std::unique_ptr<JITCompileCallbackManager> CompileCallbackManager;
+    ObjectLinkingLayer<> ObjectLayer;
+    IRCompileLayer<decltype(ObjectLayer)> CompileLayer;
+
+    typedef std::function<std::unique_ptr<Module>(std::unique_ptr<Module>)>
+      OptimizeFunction;
+
+    IRTransformLayer<decltype(CompileLayer), OptimizeFunction> OptimizeLayer;
+    CompileOnDemandLayer<decltype(OptimizeLayer)> CODLayer;
+
+  public:
+    typedef decltype(CODLayer)::ModuleSetHandleT ModuleHandle;
+
+First we need to include the CompileOnDemandLayer.h header, then add two new
+members: a std::unique_ptr<JITCompileCallbackManager> and a
+CompileOnDemandLayer, to our class. The CompileCallbackManager is a utility
+that enables us to create re-entry points into the compiler for functions that
+we want to lazily compile. In the next chapter we'll be looking at this class
+in detail, but for now we'll be treating it as an opaque utility: We just need
+to pass a reference to it into our new CompileOnDemandLayer, and the layer
+will do all the work of setting up the callbacks using the callback manager
+we gave it.
+
+.. 
code-block:: c++
+
+  KaleidoscopeJIT()
+      : TM(EngineBuilder().selectTarget()), DL(TM->createDataLayout()),
+        CompileLayer(ObjectLayer, SimpleCompiler(*TM)),
+        OptimizeLayer(CompileLayer,
+                      [this](std::unique_ptr<Module> M) {
+                        return optimizeModule(std::move(M));
+                      }),
+        CompileCallbackManager(
+            orc::createLocalCompileCallbackManager(TM->getTargetTriple(), 0)),
+        CODLayer(OptimizeLayer,
+                 [this](Function &F) { return std::set<Function*>({&F}); },
+                 *CompileCallbackManager,
+                 orc::createLocalIndirectStubsManagerBuilder(
+                   TM->getTargetTriple())) {
+    llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr);
+  }
+
+Next we have to update our constructor to initialize the new members. To create
+an appropriate compile callback manager we use the
+createLocalCompileCallbackManager function, which takes the target triple and a
+TargetAddress to call if it receives a request to compile an unknown function.
+In our simple JIT this situation is unlikely to come up, so we'll cheat and
+just pass '0' here. In a production-quality JIT you could give the address of a
+function that throws an exception in order to unwind the JIT'd code stack.
+
+Now we can construct our CompileOnDemandLayer. Following the pattern from
+previous layers we start by passing a reference to the next layer down in our
+stack -- the OptimizeLayer. Next we need to supply a 'partitioning function':
+when a not-yet-compiled function is called, the CompileOnDemandLayer will call
+this function to ask us what we would like to compile. At a minimum we need to
+compile the function being called (given by the argument to the partitioning
+function), but we could also request that the CompileOnDemandLayer compile other
+functions that are unconditionally called (or highly likely to be called) from
+the function being called. For KaleidoscopeJIT we'll keep it simple and just
+request compilation of the function that was called. Next we pass a reference to
+our CompileCallbackManager. 
Finally, we need to supply an "indirect stubs
+manager builder". This is a function that constructs IndirectStubsManagers,
+which are in turn used to build the stubs for each module. The
+CompileOnDemandLayer will call the indirect stubs manager builder once for each
+call to addModuleSet, and use the resulting indirect stubs manager to create
+stubs for all functions in all modules added. If/when the module set is removed
+from the JIT the indirect stubs manager will be deleted, freeing any memory
+allocated to the stubs. We supply this function by using the
+createLocalIndirectStubsManagerBuilder utility.
+
+.. code-block:: c++
+
+  // ...
+        if (auto Sym = CODLayer.findSymbol(Name, false))
+  // ...
+    return CODLayer.addModuleSet(std::move(Ms),
+                                 make_unique<SectionMemoryManager>(),
+                                 std::move(Resolver));
+  // ...
+
+  // ...
+    return CODLayer.findSymbol(MangledNameStream.str(), true);
+  // ...
+
+  // ...
+    CODLayer.removeModuleSet(H);
+  // ...
+
+Finally, we need to replace the references to OptimizeLayer in our addModule,
+findSymbol, and removeModule methods. With that, we're up and running.
+
+**To be done:**
+
+**Discuss CompileCallbackManagers and IndirectStubsManagers in more detail.**
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example with a CompileOnDemand
+layer added to enable lazy function-at-a-time compilation. To build this
+example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy
+    # Run
+    ./toy
+
+Here is the code:
+
+.. 
literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter3/KaleidoscopeJIT.h
+   :language: c++
+
+`Next: Extreme Laziness -- Using Compile Callbacks to JIT directly from ASTs <BuildingAJIT4.html>`_
diff --git a/gnu/llvm/docs/tutorial/BuildingAJIT4.rst b/gnu/llvm/docs/tutorial/BuildingAJIT4.rst
new file mode 100644
index 00000000000..39d9198a85c
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/BuildingAJIT4.rst
@@ -0,0 +1,48 @@
+===========================================================================
+Building a JIT: Extreme Laziness - Using Compile Callbacks to JIT from ASTs
+===========================================================================
+
+.. contents::
+   :local:
+
+**This tutorial is under active development. It is incomplete and details may
+change frequently.** Nonetheless we invite you to try it out as it stands, and
+we welcome any feedback.
+
+Chapter 4 Introduction
+======================
+
+Welcome to Chapter 4 of the "Building an ORC-based JIT in LLVM" tutorial. This
+chapter introduces the Compile Callbacks and Indirect Stubs APIs and shows how
+they can be used to replace the CompileOnDemand layer from
+`Chapter 3 <BuildingAJIT3.html>`_ with a custom lazy-JITing scheme that JITs
+directly from Kaleidoscope ASTs.
+
+**To be done:**
+
+**(1) Describe the drawbacks of JITing from IR (have to compile to IR first,
+which reduces the benefits of laziness).**
+
+**(2) Describe CompileCallbackManagers and IndirectStubsManagers in detail.**
+
+**(3) Run through the implementation of addFunctionAST.**
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example that JITs lazily from
+Kaleidoscope ASTs. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy
+    # Run
+    ./toy
+
+Here is the code:
+
+.. 
literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter4/KaleidoscopeJIT.h
+   :language: c++
+
+`Next: Remote-JITing -- Process-isolation and laziness-at-a-distance <BuildingAJIT5.html>`_
diff --git a/gnu/llvm/docs/tutorial/BuildingAJIT5.rst b/gnu/llvm/docs/tutorial/BuildingAJIT5.rst
new file mode 100644
index 00000000000..94ea92ce5ad
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/BuildingAJIT5.rst
@@ -0,0 +1,55 @@
+=============================================================================
+Building a JIT: Remote-JITing -- Process Isolation and Laziness at a Distance
+=============================================================================
+
+.. contents::
+   :local:
+
+**This tutorial is under active development. It is incomplete and details may
+change frequently.** Nonetheless we invite you to try it out as it stands, and
+we welcome any feedback.
+
+Chapter 5 Introduction
+======================
+
+Welcome to Chapter 5 of the "Building an ORC-based JIT in LLVM" tutorial. This
+chapter introduces the ORC RemoteJIT Client/Server APIs and shows how to use
+them to build a JIT stack that will execute its code via a communications
+channel with a different process. This can be a separate process on the same
+machine, a process on a different machine, or even a process on a different
+platform/architecture. The code builds on top of the lazy-AST-compiling JIT
+stack from `Chapter 4 <BuildingAJIT4.html>`_.
+
+**To be done -- this is going to be a long one:**
+
+**(1) Introduce channels, RPC, RemoteJIT Client and Server APIs**
+
+**(2) Describe the client code in greater detail. Discuss modifications of the
+KaleidoscopeJIT class, and the REPL itself.**
+
+**(3) Describe the server code.**
+
+**(4) Describe how to run the demo.**
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example that JITs lazily from
+Kaleidoscope ASTs. To build this example, use:
+
+.. 
code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orc native` -O3 -o toy
+    # Run
+    ./toy
+
+Here is the code for the modified KaleidoscopeJIT:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter5/KaleidoscopeJIT.h
+   :language: c++
+
+And the code for the JIT server:
+
+.. literalinclude:: ../../examples/Kaleidoscope/BuildingAJIT/Chapter5/Server/server.cpp
+   :language: c++
diff --git a/gnu/llvm/docs/tutorial/LangImpl01.rst b/gnu/llvm/docs/tutorial/LangImpl01.rst
new file mode 100644
index 00000000000..f7fbd150ef1
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/LangImpl01.rst
@@ -0,0 +1,293 @@
+=================================================
+Kaleidoscope: Tutorial Introduction and the Lexer
+=================================================
+
+.. contents::
+   :local:
+
+Tutorial Introduction
+=====================
+
+Welcome to the "Implementing a language with LLVM" tutorial. This
+tutorial runs through the implementation of a simple language, showing
+how fun and easy it can be. This tutorial will get you up and running as
+well as help to build a framework you can extend to other languages. The
+code in this tutorial can also be used as a playground to hack on other
+LLVM specific things.
+
+The goal of this tutorial is to progressively unveil our language,
+describing how it is built up over time. This will let us cover a fairly
+broad range of language design and LLVM-specific usage issues, showing
+and explaining the code for it all along the way, without overwhelming
+you with tons of details up front.
+
+It is useful to point out ahead of time that this tutorial is really
+about teaching compiler techniques and LLVM specifically, *not* about
+teaching modern and sane software engineering principles. In practice,
+this means that we'll take a number of shortcuts to simplify the
+exposition. 
For example, the code uses global variables +all over the place, doesn't use nice design patterns like +`visitors <http://en.wikipedia.org/wiki/Visitor_pattern>`_, etc... but +it is very simple. If you dig in and use the code as a basis for future +projects, fixing these deficiencies shouldn't be hard. + +I've tried to put this tutorial together in a way that makes chapters +easy to skip over if you are already familiar with or are uninterested +in the various pieces. The structure of the tutorial is: + +- `Chapter #1 <#language>`_: Introduction to the Kaleidoscope + language, and the definition of its Lexer - This shows where we are + going and the basic functionality that we want it to do. In order to + make this tutorial maximally understandable and hackable, we choose + to implement everything in C++ instead of using lexer and parser + generators. LLVM obviously works just fine with such tools, feel free + to use one if you prefer. +- `Chapter #2 <LangImpl02.html>`_: Implementing a Parser and AST - + With the lexer in place, we can talk about parsing techniques and + basic AST construction. This tutorial describes recursive descent + parsing and operator precedence parsing. Nothing in Chapters 1 or 2 + is LLVM-specific, the code doesn't even link in LLVM at this point. + :) +- `Chapter #3 <LangImpl03.html>`_: Code generation to LLVM IR - With + the AST ready, we can show off how easy generation of LLVM IR really + is. +- `Chapter #4 <LangImpl04.html>`_: Adding JIT and Optimizer Support + - Because a lot of people are interested in using LLVM as a JIT, + we'll dive right into it and show you the 3 lines it takes to add JIT + support. LLVM is also useful in many other ways, but this is one + simple and "sexy" way to show off its power. :) +- `Chapter #5 <LangImpl05.html>`_: Extending the Language: Control + Flow - With the language up and running, we show how to extend it + with control flow operations (if/then/else and a 'for' loop). 
This + gives us a chance to talk about simple SSA construction and control + flow. +- `Chapter #6 <LangImpl06.html>`_: Extending the Language: + User-defined Operators - This is a silly but fun chapter that talks + about extending the language to let the user program define their own + arbitrary unary and binary operators (with assignable precedence!). + This lets us build a significant piece of the "language" as library + routines. +- `Chapter #7 <LangImpl07.html>`_: Extending the Language: Mutable + Variables - This chapter talks about adding user-defined local + variables along with an assignment operator. The interesting part + about this is how easy and trivial it is to construct SSA form in + LLVM: no, LLVM does *not* require your front-end to construct SSA + form! +- `Chapter #8 <LangImpl08.html>`_: Compiling to Object Files - This + chapter explains how to take LLVM IR and compile it down to object + files. +- `Chapter #9 <LangImpl09.html>`_: Extending the Language: Debug + Information - Having built a decent little programming language with + control flow, functions and mutable variables, we consider what it + takes to add debug information to standalone executables. This debug + information will allow you to set breakpoints in Kaleidoscope + functions, print out argument variables, and call functions - all + from within the debugger! +- `Chapter #10 <LangImpl10.html>`_: Conclusion and other useful LLVM + tidbits - This chapter wraps up the series by talking about + potential ways to extend the language, but also includes a bunch of + pointers to info about "special topics" like adding garbage + collection support, exceptions, debugging, support for "spaghetti + stacks", and a bunch of other tips and tricks. + +By the end of the tutorial, we'll have written a bit less than 1000 lines +of non-comment, non-blank, lines of code. 
With this small amount of +code, we'll have built up a very reasonable compiler for a non-trivial +language including a hand-written lexer, parser, AST, as well as code +generation support with a JIT compiler. While other systems may have +interesting "hello world" tutorials, I think the breadth of this +tutorial is a great testament to the strengths of LLVM and why you +should consider it if you're interested in language or compiler design. + +A note about this tutorial: we expect you to extend the language and +play with it on your own. Take the code and go crazy hacking away at it, +compilers don't need to be scary creatures - it can be a lot of fun to +play with languages! + +The Basic Language +================== + +This tutorial will be illustrated with a toy language that we'll call +"`Kaleidoscope <http://en.wikipedia.org/wiki/Kaleidoscope>`_" (derived +from "meaning beautiful, form, and view"). Kaleidoscope is a procedural +language that allows you to define functions, use conditionals, math, +etc. Over the course of the tutorial, we'll extend Kaleidoscope to +support the if/then/else construct, a for loop, user defined operators, +JIT compilation with a simple command line interface, etc. + +Because we want to keep things simple, the only datatype in Kaleidoscope +is a 64-bit floating point type (aka 'double' in C parlance). As such, +all values are implicitly double precision and the language doesn't +require type declarations. This gives the language a very nice and +simple syntax. For example, the following simple example computes +`Fibonacci numbers: <http://en.wikipedia.org/wiki/Fibonacci_number>`_ + +:: + + # Compute the x'th fibonacci number. + def fib(x) + if x < 3 then + 1 + else + fib(x-1)+fib(x-2) + + # This expression will compute the 40th number. + fib(40) + +We also allow Kaleidoscope to call into standard library functions (the +LLVM JIT makes this completely trivial). 
This means that you can use the
+'extern' keyword to define a function before you use it (this is also
+useful for mutually recursive functions). For example:
+
+::
+
+    extern sin(arg);
+    extern cos(arg);
+    extern atan2(arg1 arg2);
+
+    atan2(sin(.4), cos(42))
+
+A more interesting example is included in Chapter 6 where we write a
+little Kaleidoscope application that `displays a Mandelbrot
+Set <LangImpl06.html#kicking-the-tires>`_ at various levels of magnification.
+
+Let's dive into the implementation of this language!
+
+The Lexer
+=========
+
+When it comes to implementing a language, the first thing needed is the
+ability to process a text file and recognize what it says. The
+traditional way to do this is to use a
+"`lexer <http://en.wikipedia.org/wiki/Lexical_analysis>`_" (aka
+'scanner') to break the input up into "tokens". Each token returned by
+the lexer includes a token code and potentially some metadata (e.g. the
+numeric value of a number). First, we define the possibilities:
+
+.. code-block:: c++
+
+  // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
+  // of these for known things.
+  enum Token {
+    tok_eof = -1,
+
+    // commands
+    tok_def = -2,
+    tok_extern = -3,
+
+    // primary
+    tok_identifier = -4,
+    tok_number = -5,
+  };
+
+  static std::string IdentifierStr; // Filled in if tok_identifier
+  static double NumVal;             // Filled in if tok_number
+
+Each token returned by our lexer will either be one of the Token enum
+values or it will be an 'unknown' character like '+', which is returned
+as its ASCII value. If the current token is an identifier, the
+``IdentifierStr`` global variable holds the name of the identifier. If
+the current token is a numeric literal (like 1.0), ``NumVal`` holds its
+value. Note that we use global variables for simplicity; this is not the
+best choice for a real language implementation :).
+
+The actual implementation of the lexer is a single function named
+``gettok``.
The ``gettok`` function is called to return the next token +from standard input. Its definition starts as: + +.. code-block:: c++ + + /// gettok - Return the next token from standard input. + static int gettok() { + static int LastChar = ' '; + + // Skip any whitespace. + while (isspace(LastChar)) + LastChar = getchar(); + +``gettok`` works by calling the C ``getchar()`` function to read +characters one at a time from standard input. It eats them as it +recognizes them and stores the last character read, but not processed, +in LastChar. The first thing that it has to do is ignore whitespace +between tokens. This is accomplished with the loop above. + +The next thing ``gettok`` needs to do is recognize identifiers and +specific keywords like "def". Kaleidoscope does this with this simple +loop: + +.. code-block:: c++ + + if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]* + IdentifierStr = LastChar; + while (isalnum((LastChar = getchar()))) + IdentifierStr += LastChar; + + if (IdentifierStr == "def") + return tok_def; + if (IdentifierStr == "extern") + return tok_extern; + return tok_identifier; + } + +Note that this code sets the '``IdentifierStr``' global whenever it +lexes an identifier. Also, since language keywords are matched by the +same loop, we handle them here inline. Numeric values are similar: + +.. code-block:: c++ + + if (isdigit(LastChar) || LastChar == '.') { // Number: [0-9.]+ + std::string NumStr; + do { + NumStr += LastChar; + LastChar = getchar(); + } while (isdigit(LastChar) || LastChar == '.'); + + NumVal = strtod(NumStr.c_str(), 0); + return tok_number; + } + +This is all pretty straight-forward code for processing input. When +reading a numeric value from input, we use the C ``strtod`` function to +convert it to a numeric value that we store in ``NumVal``. Note that +this isn't doing sufficient error checking: it will incorrectly read +"1.23.45.67" and handle it as if you typed in "1.23". Feel free to +extend it :). 
Next we handle comments:
+
+.. code-block:: c++
+
+  if (LastChar == '#') {
+    // Comment until end of line.
+    do
+      LastChar = getchar();
+    while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
+
+    if (LastChar != EOF)
+      return gettok();
+  }
+
+We handle comments by skipping to the end of the line and then returning
+the next token. Finally, if the input doesn't match one of the above
+cases, it is either an operator character like '+' or the end of the
+file. These are handled with this code:
+
+.. code-block:: c++
+
+    // Check for end of file. Don't eat the EOF.
+    if (LastChar == EOF)
+      return tok_eof;
+
+    // Otherwise, just return the character as its ascii value.
+    int ThisChar = LastChar;
+    LastChar = getchar();
+    return ThisChar;
+  }
+
+With this, we have the complete lexer for the basic Kaleidoscope
+language (the `full code listing <LangImpl02.html#full-code-listing>`_ for the Lexer
+is available in the `next chapter <LangImpl02.html>`_ of the tutorial).
+Next we'll `build a simple parser that uses this to build an Abstract
+Syntax Tree <LangImpl02.html>`_. When we have that, we'll include a
+driver so that you can use the lexer and parser together.
+
+`Next: Implementing a Parser and AST <LangImpl02.html>`_
+
diff --git a/gnu/llvm/docs/tutorial/LangImpl02.rst b/gnu/llvm/docs/tutorial/LangImpl02.rst
new file mode 100644
index 00000000000..701cbc96113
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/LangImpl02.rst
@@ -0,0 +1,735 @@
+===========================================
+Kaleidoscope: Implementing a Parser and AST
+===========================================
+
+.. contents::
+   :local:
+
+Chapter 2 Introduction
+======================
+
+Welcome to Chapter 2 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. This chapter shows you how to use the
+lexer, built in `Chapter 1 <LangImpl01.html>`_, to build a full
+`parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
+language.
Once we have a parser, we'll define and build an `Abstract +Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST). + +The parser we will build uses a combination of `Recursive Descent +Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and +`Operator-Precedence +Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to +parse the Kaleidoscope language (the latter for binary expressions and +the former for everything else). Before we get to parsing though, lets +talk about the output of the parser: the Abstract Syntax Tree. + +The Abstract Syntax Tree (AST) +============================== + +The AST for a program captures its behavior in such a way that it is +easy for later stages of the compiler (e.g. code generation) to +interpret. We basically want one object for each construct in the +language, and the AST should closely model the language. In +Kaleidoscope, we have expressions, a prototype, and a function object. +We'll start with expressions first: + +.. code-block:: c++ + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + + public: + NumberExprAST(double Val) : Val(Val) {} + }; + +The code above shows the definition of the base ExprAST class and one +subclass which we use for numeric literals. The important thing to note +about this code is that the NumberExprAST class captures the numeric +value of the literal as an instance variable. This allows later phases +of the compiler to know what the stored numeric value is. + +Right now we only create the AST, so there are no useful accessor +methods on them. It would be very easy to add a virtual method to pretty +print the code, for example. Here are the other expression AST node +definitions that we'll use in the basic form of the Kaleidoscope +language: + +.. 
code-block:: c++ + + /// VariableExprAST - Expression class for referencing a variable, like "a". + class VariableExprAST : public ExprAST { + std::string Name; + + public: + VariableExprAST(const std::string &Name) : Name(Name) {} + }; + + /// BinaryExprAST - Expression class for a binary operator. + class BinaryExprAST : public ExprAST { + char Op; + std::unique_ptr<ExprAST> LHS, RHS; + + public: + BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS, + std::unique_ptr<ExprAST> RHS) + : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {} + }; + + /// CallExprAST - Expression class for function calls. + class CallExprAST : public ExprAST { + std::string Callee; + std::vector<std::unique_ptr<ExprAST>> Args; + + public: + CallExprAST(const std::string &Callee, + std::vector<std::unique_ptr<ExprAST>> Args) + : Callee(Callee), Args(std::move(Args)) {} + }; + +This is all (intentionally) rather straight-forward: variables capture +the variable name, binary operators capture their opcode (e.g. '+'), and +calls capture a function name as well as a list of any argument +expressions. One thing that is nice about our AST is that it captures +the language features without talking about the syntax of the language. +Note that there is no discussion about precedence of binary operators, +lexical structure, etc. + +For our basic language, these are all of the expression nodes we'll +define. Because it doesn't have conditional control flow, it isn't +Turing-complete; we'll fix that in a later installment. The two things +we need next are a way to talk about the interface to a function, and a +way to talk about functions themselves: + +.. code-block:: c++ + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its name, and its argument names (thus implicitly the number + /// of arguments the function takes). 
+  class PrototypeAST {
+    std::string Name;
+    std::vector<std::string> Args;
+
+  public:
+    PrototypeAST(const std::string &name, std::vector<std::string> Args)
+      : Name(name), Args(std::move(Args)) {}
+  };
+
+  /// FunctionAST - This class represents a function definition itself.
+  class FunctionAST {
+    std::unique_ptr<PrototypeAST> Proto;
+    std::unique_ptr<ExprAST> Body;
+
+  public:
+    FunctionAST(std::unique_ptr<PrototypeAST> Proto,
+                std::unique_ptr<ExprAST> Body)
+      : Proto(std::move(Proto)), Body(std::move(Body)) {}
+  };
+
+In Kaleidoscope, functions are typed with just a count of their
+arguments. Since all values are double precision floating point, the
+type of each argument doesn't need to be stored anywhere. In a more
+aggressive and realistic language, the "ExprAST" class would probably
+have a type field.
+
+With this scaffolding, we can now talk about parsing expressions and
+function bodies in Kaleidoscope.
+
+Parser Basics
+=============
+
+Now that we have an AST to build, we need to define the parser code to
+build it. The idea here is that we want to parse something like "x+y"
+(which is returned as three tokens by the lexer) into an AST that could
+be generated with calls like this:
+
+.. code-block:: c++
+
+  auto LHS = llvm::make_unique<VariableExprAST>("x");
+  auto RHS = llvm::make_unique<VariableExprAST>("y");
+  auto Result = llvm::make_unique<BinaryExprAST>('+', std::move(LHS),
+                                                 std::move(RHS));
+
+In order to do this, we'll start by defining some basic helper routines:
+
+.. code-block:: c++
+
+  /// CurTok/getNextToken - Provide a simple token buffer. CurTok is the current
+  /// token the parser is looking at. getNextToken reads another token from the
+  /// lexer and updates CurTok with its results.
+  static int CurTok;
+  static int getNextToken() {
+    return CurTok = gettok();
+  }
+
+This implements a simple token buffer around the lexer. This allows us
+to look one token ahead at what the lexer is returning.
Every function
+in our parser will assume that CurTok is the current token that needs to
+be parsed.
+
+.. code-block:: c++
+
+
+  /// LogError* - These are little helper functions for error handling.
+  std::unique_ptr<ExprAST> LogError(const char *Str) {
+    fprintf(stderr, "LogError: %s\n", Str);
+    return nullptr;
+  }
+  std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
+    LogError(Str);
+    return nullptr;
+  }
+
+The ``LogError`` routines are simple helper routines that our parser will
+use to handle errors. The error recovery in our parser will not be the
+best and is not particularly user-friendly, but it will be enough for our
+tutorial. These routines make it easier to handle errors in routines
+that have various return types: they always return null.
+
+With these basic helper functions, we can implement the first piece of
+our grammar: numeric literals.
+
+Basic Expression Parsing
+========================
+
+We start with numeric literals, because they are the simplest to
+process. For each production in our grammar, we'll define a function
+which parses that production. For numeric literals, we have:
+
+.. code-block:: c++
+
+  /// numberexpr ::= number
+  static std::unique_ptr<ExprAST> ParseNumberExpr() {
+    auto Result = llvm::make_unique<NumberExprAST>(NumVal);
+    getNextToken(); // consume the number
+    return std::move(Result);
+  }
+
+This routine is very simple: it expects to be called when the current
+token is a ``tok_number`` token. It takes the current number value,
+creates a ``NumberExprAST`` node, advances the lexer to the next token,
+and finally returns.
+
+There are some interesting aspects to this. The most important one is
+that this routine eats all of the tokens that correspond to the
+production and returns the lexer buffer with the next token (which is
+not part of the grammar production) ready to go. This is a fairly
+standard way to go for recursive descent parsers.
For a better example,
+the parenthesis operator is defined like this:
+
+.. code-block:: c++
+
+  /// parenexpr ::= '(' expression ')'
+  static std::unique_ptr<ExprAST> ParseParenExpr() {
+    getNextToken(); // eat (.
+    auto V = ParseExpression();
+    if (!V)
+      return nullptr;
+
+    if (CurTok != ')')
+      return LogError("expected ')'");
+    getNextToken(); // eat ).
+    return V;
+  }
+
+This function illustrates a number of interesting things about the
+parser:
+
+1) It shows how we use the LogError routines. When called, this function
+expects that the current token is a '(' token, but after parsing the
+subexpression, it is possible that there is no ')' waiting. For example,
+if the user types in "(4 x" instead of "(4)", the parser should emit an
+error. Because errors can occur, the parser needs a way to indicate that
+they happened: in our parser, we return null on an error.
+
+2) Another interesting aspect of this function is that it uses recursion
+by calling ``ParseExpression`` (we will soon see that
+``ParseExpression`` can call ``ParseParenExpr``). This is powerful
+because it allows us to handle recursive grammars, and keeps each
+production very simple. Note that parentheses do not cause construction
+of AST nodes themselves. While we could do it this way, the most
+important role of parentheses is to guide the parser and provide
+grouping. Once the parser constructs the AST, parentheses are not
+needed.
+
+The next simple production is for handling variable references and
+function calls:
+
+.. code-block:: c++
+
+  /// identifierexpr
+  ///   ::= identifier
+  ///   ::= identifier '(' expression* ')'
+  static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
+    std::string IdName = IdentifierStr;
+
+    getNextToken(); // eat identifier.
+
+    if (CurTok != '(') // Simple variable ref.
+      return llvm::make_unique<VariableExprAST>(IdName);
+
+    // Call.
+    getNextToken(); // eat (
+    std::vector<std::unique_ptr<ExprAST>> Args;
+    if (CurTok != ')') {
+      while (1) {
+        if (auto Arg = ParseExpression())
+          Args.push_back(std::move(Arg));
+        else
+          return nullptr;
+
+        if (CurTok == ')')
+          break;
+
+        if (CurTok != ',')
+          return LogError("Expected ')' or ',' in argument list");
+        getNextToken();
+      }
+    }
+
+    // Eat the ')'.
+    getNextToken();
+
+    return llvm::make_unique<CallExprAST>(IdName, std::move(Args));
+  }
+
+This routine follows the same style as the other routines. (It expects
+to be called if the current token is a ``tok_identifier`` token). It
+also has recursion and error handling. One interesting aspect of this is
+that it uses *look-ahead* to determine if the current identifier is a
+standalone variable reference or if it is a function call expression.
+It handles this by checking to see if the token after the identifier is
+a '(' token, constructing either a ``VariableExprAST`` or
+``CallExprAST`` node as appropriate.
+
+Now that we have all of our simple expression-parsing logic in place, we
+can define a helper function to wrap it together into one entry point.
+We call this class of expressions "primary" expressions, for reasons
+that will become more clear `later in the
+tutorial <LangImpl06.html#user-defined-unary-operators>`_. In order to parse an arbitrary
+primary expression, we need to determine what sort of expression it is:
+
+.. code-block:: c++
+
+  /// primary
+  ///   ::= identifierexpr
+  ///   ::= numberexpr
+  ///   ::= parenexpr
+  static std::unique_ptr<ExprAST> ParsePrimary() {
+    switch (CurTok) {
+    default:
+      return LogError("unknown token when expecting an expression");
+    case tok_identifier:
+      return ParseIdentifierExpr();
+    case tok_number:
+      return ParseNumberExpr();
+    case '(':
+      return ParseParenExpr();
+    }
+  }
+
+Now that you see the definition of this function, it is more obvious why
+we can assume the state of CurTok in the various functions.
This uses
+look-ahead to determine which sort of expression is being inspected, and
+then parses it with a function call.
+
+Now that basic expressions are handled, we need to handle binary
+expressions. They are a bit more complex.
+
+Binary Expression Parsing
+=========================
+
+Binary expressions are significantly harder to parse because they are
+often ambiguous. For example, when given the string "x+y\*z", the parser
+can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
+definitions from mathematics, we expect the latter parse, because "\*"
+(multiplication) has higher *precedence* than "+" (addition).
+
+There are many ways to handle this, but an elegant and efficient way is
+to use `Operator-Precedence
+Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
+This parsing technique uses the precedence of binary operators to guide
+recursion. To start with, we need a table of precedences:
+
+.. code-block:: c++
+
+  /// BinopPrecedence - This holds the precedence for each binary operator that is
+  /// defined.
+  static std::map<char, int> BinopPrecedence;
+
+  /// GetTokPrecedence - Get the precedence of the pending binary operator token.
+  static int GetTokPrecedence() {
+    if (!isascii(CurTok))
+      return -1;
+
+    // Make sure it's a declared binop.
+    int TokPrec = BinopPrecedence[CurTok];
+    if (TokPrec <= 0) return -1;
+    return TokPrec;
+  }
+
+  int main() {
+    // Install standard binary operators.
+    // 1 is lowest precedence.
+    BinopPrecedence['<'] = 10;
+    BinopPrecedence['+'] = 20;
+    BinopPrecedence['-'] = 20;
+    BinopPrecedence['*'] = 40; // highest.
+    ...
+  }
+
+For the basic form of Kaleidoscope, we will only support 4 binary
+operators (this can obviously be extended by you, our brave and intrepid
+reader). The ``GetTokPrecedence`` function returns the precedence for
+the current token, or -1 if the token is not a binary operator.
Having a +map makes it easy to add new operators and makes it clear that the +algorithm doesn't depend on the specific operators involved, but it +would be easy enough to eliminate the map and do the comparisons in the +``GetTokPrecedence`` function. (Or just use a fixed-size array). + +With the helper above defined, we can now start parsing binary +expressions. The basic idea of operator precedence parsing is to break +down an expression with potentially ambiguous binary operators into +pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g". +Operator precedence parsing considers this as a stream of primary +expressions separated by binary operators. As such, it will first parse +the leading primary expression "a", then it will see the pairs [+, b] +[+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are +primary expressions, the binary expression parser doesn't need to worry +about nested subexpressions like (c+d) at all. + +To start, an expression is a primary expression potentially followed by +a sequence of [binop,primaryexpr] pairs: + +.. code-block:: c++ + + /// expression + /// ::= primary binoprhs + /// + static std::unique_ptr<ExprAST> ParseExpression() { + auto LHS = ParsePrimary(); + if (!LHS) + return nullptr; + + return ParseBinOpRHS(0, std::move(LHS)); + } + +``ParseBinOpRHS`` is the function that parses the sequence of pairs for +us. It takes a precedence and a pointer to an expression for the part +that has been parsed so far. Note that "x" is a perfectly valid +expression: As such, "binoprhs" is allowed to be empty, in which case it +returns the expression that is passed into it. In our example above, the +code passes the expression for "a" into ``ParseBinOpRHS`` and the +current token is "+". + +The precedence value passed into ``ParseBinOpRHS`` indicates the +*minimal operator precedence* that the function is allowed to eat. 
For
+example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
+passed in a precedence of 40, it will not consume any tokens (because
+the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
+starts with:
+
+.. code-block:: c++
+
+  /// binoprhs
+  ///   ::= ('+' primary)*
+  static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
+                                                std::unique_ptr<ExprAST> LHS) {
+    // If this is a binop, find its precedence.
+    while (1) {
+      int TokPrec = GetTokPrecedence();
+
+      // If this is a binop that binds at least as tightly as the current binop,
+      // consume it, otherwise we are done.
+      if (TokPrec < ExprPrec)
+        return LHS;
+
+This code gets the precedence of the current token and checks to see if
+it is too low. Because we defined invalid tokens to have a precedence of
+-1, this check implicitly knows that the pair-stream ends when the token
+stream runs out of binary operators. If this check succeeds, we know
+that the token is a binary operator and that it will be included in this
+expression:
+
+.. code-block:: c++
+
+      // Okay, we know this is a binop.
+      int BinOp = CurTok;
+      getNextToken(); // eat binop
+
+      // Parse the primary expression after the binary operator.
+      auto RHS = ParsePrimary();
+      if (!RHS)
+        return nullptr;
+
+As such, this code eats (and remembers) the binary operator and then
+parses the primary expression that follows. This builds up the whole
+pair, the first of which is [+, b] for the running example.
+
+Now that we parsed the left-hand side of an expression and one pair of
+the RHS sequence, we have to decide which way the expression associates.
+In particular, we could have "(a+b) binop unparsed" or "a + (b binop
+unparsed)". To determine this, we look ahead at "binop" to determine its
+precedence and compare it to BinOp's precedence (which is '+' in this
+case):
+
+.. code-block:: c++
+
+      // If BinOp binds less tightly with RHS than the operator after RHS, let
+      // the pending operator take RHS as its LHS.
+ int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + +If the precedence of the binop to the right of "RHS" is lower or equal +to the precedence of our current operator, then we know that the +parentheses associate as "(a+b) binop ...". In our example, the current +operator is "+" and the next operator is "+", we know that they have the +same precedence. In this case we'll create the AST node for "a+b", and +then continue parsing: + +.. code-block:: c++ + + ... if body omitted ... + } + + // Merge LHS/RHS. + LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), + std::move(RHS)); + } // loop around to the top of the while loop. + } + +In our example above, this will turn "a+b+" into "(a+b)" and execute the +next iteration of the loop, with "+" as the current token. The code +above will eat, remember, and parse "(c+d)" as the primary expression, +which makes the current pair equal to [+, (c+d)]. It will then evaluate +the 'if' conditional above with "\*" as the binop to the right of the +primary. In this case, the precedence of "\*" is higher than the +precedence of "+" so the if condition will be entered. + +The critical question left here is "how can the if condition parse the +right hand side in full"? In particular, to build the AST correctly for +our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression +variable. The code to do this is surprisingly simple (code from the +above two blocks duplicated for context): + +.. code-block:: c++ + + // If BinOp binds less tightly with RHS than the operator after RHS, let + // the pending operator take RHS as its LHS. + int NextPrec = GetTokPrecedence(); + if (TokPrec < NextPrec) { + RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS)); + if (!RHS) + return nullptr; + } + // Merge LHS/RHS. + LHS = llvm::make_unique<BinaryExprAST>(BinOp, std::move(LHS), + std::move(RHS)); + } // loop around to the top of the while loop. 
+ } + +At this point, we know that the binary operator to the RHS of our +primary has higher precedence than the binop we are currently parsing. +As such, we know that any sequence of pairs whose operators are all +higher precedence than "+" should be parsed together and returned as +"RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function +specifying "TokPrec+1" as the minimum precedence required for it to +continue. In our example above, this will cause it to return the AST +node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+' +expression. + +Finally, on the next iteration of the while loop, the "+g" piece is +parsed and added to the AST. With this little bit of code (14 +non-trivial lines), we correctly handle fully general binary expression +parsing in a very elegant way. This was a whirlwind tour of this code, +and it is somewhat subtle. I recommend running through it with a few +tough examples to see how it works. + +This wraps up handling of expressions. At this point, we can point the +parser at an arbitrary token stream and build an expression from it, +stopping at the first token that is not part of the expression. Next up +we need to handle function definitions, etc. + +Parsing the Rest +================ + +The next thing missing is handling of function prototypes. In +Kaleidoscope, these are used both for 'extern' function declarations as +well as function body definitions. The code to do this is +straight-forward and not very interesting (once you've survived +expressions): + +.. code-block:: c++ + + /// prototype + /// ::= id '(' id* ')' + static std::unique_ptr<PrototypeAST> ParsePrototype() { + if (CurTok != tok_identifier) + return LogErrorP("Expected function name in prototype"); + + std::string FnName = IdentifierStr; + getNextToken(); + + if (CurTok != '(') + return LogErrorP("Expected '(' in prototype"); + + // Read the list of argument names. 
+ std::vector<std::string> ArgNames; + while (getNextToken() == tok_identifier) + ArgNames.push_back(IdentifierStr); + if (CurTok != ')') + return LogErrorP("Expected ')' in prototype"); + + // success. + getNextToken(); // eat ')'. + + return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames)); + } + +Given this, a function definition is very simple, just a prototype plus +an expression to implement the body: + +.. code-block:: c++ + + /// definition ::= 'def' prototype expression + static std::unique_ptr<FunctionAST> ParseDefinition() { + getNextToken(); // eat def. + auto Proto = ParsePrototype(); + if (!Proto) return nullptr; + + if (auto E = ParseExpression()) + return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); + return nullptr; + } + +In addition, we support 'extern' to declare functions like 'sin' and +'cos' as well as to support forward declaration of user functions. These +'extern's are just prototypes with no body: + +.. code-block:: c++ + + /// external ::= 'extern' prototype + static std::unique_ptr<PrototypeAST> ParseExtern() { + getNextToken(); // eat extern. + return ParsePrototype(); + } + +Finally, we'll also let the user type in arbitrary top-level expressions +and evaluate them on the fly. We will handle this by defining anonymous +nullary (zero argument) functions for them: + +.. code-block:: c++ + + /// toplevelexpr ::= expression + static std::unique_ptr<FunctionAST> ParseTopLevelExpr() { + if (auto E = ParseExpression()) { + // Make an anonymous proto. + auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>()); + return llvm::make_unique<FunctionAST>(std::move(Proto), std::move(E)); + } + return nullptr; + } + +Now that we have all the pieces, let's build a little driver that will +let us actually *execute* this code we've built! + +The Driver +========== + +The driver for this simply invokes all of the parsing pieces with a +top-level dispatch loop. 
There isn't much interesting here, so I'll just +include the top-level loop. See `below <#full-code-listing>`_ for full code in the +"Top-Level Parsing" section. + +.. code-block:: c++ + + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: + return; + case ';': // ignore top-level semicolons. + getNextToken(); + break; + case tok_def: + HandleDefinition(); + break; + case tok_extern: + HandleExtern(); + break; + default: + HandleTopLevelExpression(); + break; + } + } + } + +The most interesting part of this is that we ignore top-level +semicolons. Why is this, you ask? The basic reason is that if you type +"4 + 5" at the command line, the parser doesn't know whether that is the +end of what you will type or not. For example, on the next line you +could type "def foo..." in which case 4+5 is the end of a top-level +expression. Alternatively you could type "\* 6", which would continue +the expression. Having top-level semicolons allows you to type "4+5;", +and the parser will know you are done. + +Conclusions +=========== + +With just under 400 lines of commented code (240 lines of non-comment, +non-blank code), we fully defined our minimal language, including a +lexer, parser, and AST builder. With this done, the executable will +validate Kaleidoscope code and tell us if it is grammatically invalid. +For example, here is a sample interaction: + +.. code-block:: bash + + $ ./a.out + ready> def foo(x y) x+foo(y, 4.0); + Parsed a function definition. + ready> def foo(x y) x+y y; + Parsed a function definition. + Parsed a top-level expr + ready> def foo(x y) x+y ); + Parsed a function definition. + Error: unknown token when expecting an expression + ready> extern sin(a); + ready> Parsed an extern + ready> ^D + $ + +There is a lot of room for extension here. You can define new AST nodes, +extend the language in many ways, etc. 
In the `next +installment <LangImpl3.html>`_, we will describe how to generate LLVM +Intermediate Representation (IR) from the AST. + +Full Code Listing +================= + +Here is the complete code listing for this and the previous chapter. +Note that it is fully self-contained: you don't need LLVM or any +external libraries at all for this. (Besides the C and C++ standard +libraries, of course.) To build this, just compile with: + +.. code-block:: bash + + # Compile + clang++ -g -O3 toy.cpp + # Run + ./a.out + +Here is the code: + +.. literalinclude:: ../../examples/Kaleidoscope/Chapter2/toy.cpp + :language: c++ + +`Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_ + diff --git a/gnu/llvm/docs/tutorial/LangImpl03.rst b/gnu/llvm/docs/tutorial/LangImpl03.rst new file mode 100644 index 00000000000..2bb3a300026 --- /dev/null +++ b/gnu/llvm/docs/tutorial/LangImpl03.rst @@ -0,0 +1,567 @@ +======================================== +Kaleidoscope: Code generation to LLVM IR +======================================== + +.. contents:: + :local: + +Chapter 3 Introduction +====================== + +Welcome to Chapter 3 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. This chapter shows you how to transform +the `Abstract Syntax Tree <LangImpl2.html>`_, built in Chapter 2, into +LLVM IR. This will teach you a little bit about how LLVM does things, as +well as demonstrate how easy it is to use. It's much more work to build +a lexer and parser than it is to generate LLVM IR code. :) + +**Please note**: the code in this chapter and later requires LLVM 3.7 or +later. LLVM 3.6 and before will not work with it. Also note that you +need to use a version of this tutorial that matches your LLVM release: +If you are using an official LLVM release, use the version of the +documentation included with your release or on the `llvm.org releases +page <http://llvm.org/releases/>`_.
+ +Code Generation Setup +===================== + +In order to generate LLVM IR, we want some simple setup to get started. +First we define virtual code generation (codegen) methods in each AST +class: + +.. code-block:: c++ + + /// ExprAST - Base class for all expression nodes. + class ExprAST { + public: + virtual ~ExprAST() {} + virtual Value *codegen() = 0; + }; + + /// NumberExprAST - Expression class for numeric literals like "1.0". + class NumberExprAST : public ExprAST { + double Val; + + public: + NumberExprAST(double Val) : Val(Val) {} + virtual Value *codegen(); + }; + ... + +The codegen() method says to emit IR for that AST node along with all +the things it depends on, and they all return an LLVM Value object. +"Value" is the class used to represent a "`Static Single Assignment +(SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ +register" or "SSA value" in LLVM. The most distinct aspect of SSA values +is that their value is computed as the related instruction executes, and +it does not get a new value until (and if) the instruction re-executes. +In other words, there is no way to "change" an SSA value. For more +information, please read up on `Static Single +Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_ +- the concepts are really quite natural once you grok them. + +Note that instead of adding virtual methods to the ExprAST class +hierarchy, it could also make sense to use a `visitor +pattern <http://en.wikipedia.org/wiki/Visitor_pattern>`_ or some other +way to model this. Again, this tutorial won't dwell on good software +engineering practices: for our purposes, adding a virtual method is +simplest. + +The second thing we want is a "LogError" method like we used for the +parser, which will be used to report errors found during code generation +(for example, use of an undeclared parameter): + +..
code-block:: c++ + + static LLVMContext TheContext; + static IRBuilder<> Builder(TheContext); + static std::unique_ptr<Module> TheModule; + static std::map<std::string, Value *> NamedValues; + + Value *LogErrorV(const char *Str) { + LogError(Str); + return nullptr; + } + +The static variables will be used during code generation. ``TheContext`` +is an opaque object that owns a lot of core LLVM data structures, such as +the type and constant value tables. We don't need to understand it in +detail; we just need a single instance to pass into APIs that require it. + +The ``Builder`` object is a helper object that makes it easy to generate +LLVM instructions. Instances of the +`IRBuilder <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_ +class template keep track of the current place to insert instructions +and have methods to create new instructions. + +``TheModule`` is an LLVM construct that contains functions and global +variables. In many ways, it is the top-level structure that the LLVM IR +uses to contain code. It will own the memory for all of the IR that we +generate, which is why the codegen() method returns a raw Value\*, +rather than a unique_ptr<Value>. + +The ``NamedValues`` map keeps track of which values are defined in the +current scope and what their LLVM representation is. (In other words, it +is a symbol table for the code). In this form of Kaleidoscope, the only +things that can be referenced are function parameters. As such, function +parameters will be in this map when generating code for their function +body. + +With these basics in place, we can start talking about how to generate +code for each expression. Note that this assumes that the ``Builder`` +has been set up to generate code *into* something. For now, we'll assume +that this has already been done, and we'll just use it to emit code.
+ +Expression Code Generation +========================== + +Generating LLVM code for expression nodes is very straightforward: less +than 45 lines of commented code for all four of our expression nodes. +First we'll do numeric literals: + +.. code-block:: c++ + + Value *NumberExprAST::codegen() { + return ConstantFP::get(TheContext, APFloat(Val)); + } + +In the LLVM IR, numeric constants are represented with the +``ConstantFP`` class, which holds the numeric value in an ``APFloat`` +internally (``APFloat`` has the capability of holding floating point +constants of Arbitrary Precision). This code basically just creates +and returns a ``ConstantFP``. Note that in the LLVM IR, constants are +all uniqued together and shared. For this reason, the API uses the +"foo::get(...)" idiom instead of "new foo(..)" or "foo::Create(..)". + +.. code-block:: c++ + + Value *VariableExprAST::codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + if (!V) + LogErrorV("Unknown variable name"); + return V; + } + +References to variables are also quite simple using LLVM. In the simple +version of Kaleidoscope, we assume that the variable has already been +emitted somewhere and its value is available. In practice, the only +values that can be in the ``NamedValues`` map are function arguments. +This code simply checks to see that the specified name is in the map (if +not, an unknown variable is being referenced) and returns the value for +it. In future chapters, we'll add support for `loop induction +variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for `local +variables <LangImpl7.html#user-defined-local-variables>`_. + +..
code-block:: c++ + + Value *BinaryExprAST::codegen() { + Value *L = LHS->codegen(); + Value *R = RHS->codegen(); + if (!L || !R) + return nullptr; + + switch (Op) { + case '+': + return Builder.CreateFAdd(L, R, "addtmp"); + case '-': + return Builder.CreateFSub(L, R, "subtmp"); + case '*': + return Builder.CreateFMul(L, R, "multmp"); + case '<': + L = Builder.CreateFCmpULT(L, R, "cmptmp"); + // Convert bool 0/1 to double 0.0 or 1.0 + return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext), + "booltmp"); + default: + return LogErrorV("invalid binary operator"); + } + } + +Binary operators start to get more interesting. The basic idea here is +that we recursively emit code for the left-hand side of the expression, +then the right-hand side, then we compute the result of the binary +expression. In this code, we do a simple switch on the opcode to create +the right LLVM instruction. + +In the example above, the LLVM builder class is starting to show its +value. IRBuilder knows where to insert the newly created instruction; +all you have to do is specify what instruction to create (e.g. with +``CreateFAdd``), which operands to use (``L`` and ``R`` here) and +optionally provide a name for the generated instruction. + +One nice thing about LLVM is that the name is just a hint. For instance, +if the code above emits multiple "addtmp" variables, LLVM will +automatically provide each one with an increasing, unique numeric +suffix. Local value names for instructions are purely optional, but they +make it much easier to read the IR dumps. + +`LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict +rules: for example, the Left and Right operands of an `add +instruction <../LangRef.html#add-instruction>`_ must have the same type, and the +result type of the add must match the operand types. Because all values +in Kaleidoscope are doubles, this makes for very simple code for add, +sub and mul.
+ +On the other hand, LLVM specifies that the `fcmp +instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a +one bit integer). The problem with this is that Kaleidoscope wants the +value to be a 0.0 or 1.0 value. In order to get these semantics, we +combine the fcmp instruction with a `uitofp +instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its +input integer into a floating point value by treating the input as an +unsigned value. In contrast, if we used the `sitofp +instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator +would return 0.0 and -1.0, depending on the input value. + +.. code-block:: c++ + + Value *CallExprAST::codegen() { + // Look up the name in the global module table. + Function *CalleeF = TheModule->getFunction(Callee); + if (!CalleeF) + return LogErrorV("Unknown function referenced"); + + // If argument mismatch error. + if (CalleeF->arg_size() != Args.size()) + return LogErrorV("Incorrect # arguments passed"); + + std::vector<Value *> ArgsV; + for (unsigned i = 0, e = Args.size(); i != e; ++i) { + ArgsV.push_back(Args[i]->codegen()); + if (!ArgsV.back()) + return nullptr; + } + + return Builder.CreateCall(CalleeF, ArgsV, "calltmp"); + } + +Code generation for function calls is quite straightforward with LLVM. The code +above initially does a function name lookup in the LLVM Module's symbol table. +Recall that the LLVM Module is the container that holds the functions we are +JIT'ing. By giving each function the same name as what the user specifies, we +can use the LLVM symbol table to resolve function names for us. + +Once we have the function to call, we recursively codegen each argument +that is to be passed in, and create an LLVM `call +instruction <../LangRef.html#call-instruction>`_. 
Note that LLVM uses the native C +calling conventions by default, allowing these calls to also call into +standard library functions like "sin" and "cos", with no additional +effort. + +This wraps up our handling of the four basic expressions that we have so +far in Kaleidoscope. Feel free to go in and add some more. For example, +by browsing the `LLVM language reference <../LangRef.html>`_ you'll find +several other interesting instructions that are really easy to plug into +our basic framework. + +Function Code Generation +======================== + +Code generation for prototypes and functions must handle a number of +details, which make their code less beautiful than expression code +generation, but allow us to illustrate some important points. First, +let's talk about code generation for prototypes: they are used both for +function bodies and external function declarations. The code starts +with: + +.. code-block:: c++ + + Function *PrototypeAST::codegen() { + // Make the function type: double(double,double) etc. + std::vector<Type*> Doubles(Args.size(), + Type::getDoubleTy(TheContext)); + FunctionType *FT = + FunctionType::get(Type::getDoubleTy(TheContext), Doubles, false); + + Function *F = + Function::Create(FT, Function::ExternalLinkage, Name, TheModule.get()); + +This code packs a lot of power into a few lines. Note first that this +function returns a "Function\*" instead of a "Value\*". Because a +"prototype" really talks about the external interface for a function +(not the value computed by an expression), it makes sense for it to +return the LLVM Function it corresponds to when codegen'd. + +The call to ``FunctionType::get`` creates the ``FunctionType`` that +should be used for a given Prototype. Since all function arguments in +Kaleidoscope are of type double, the first line creates a vector of "N" +LLVM double types.
It then uses the ``FunctionType::get`` method to +create a function type that takes "N" doubles as arguments, returns one +double as a result, and that is not vararg (the false parameter +indicates this). Note that Types in LLVM are uniqued just like Constants +are, so you don't "new" a type, you "get" it. + +The final line above actually creates the IR Function corresponding to +the Prototype. This indicates the type, linkage and name to use, as +well as which module to insert into. "`external +linkage <../LangRef.html#linkage>`_" means that the function may be +defined outside the current module and/or that it is callable by +functions outside the module. The Name passed in is the name the user +specified: since "``TheModule``" is specified, this name is registered +in "``TheModule``"s symbol table. + +.. code-block:: c++ + + // Set names for all arguments. + unsigned Idx = 0; + for (auto &Arg : F->args()) + Arg.setName(Args[Idx++]); + + return F; + +Finally, we set the name of each of the function's arguments according to the +names given in the Prototype. This step isn't strictly necessary, but keeping +the names consistent makes the IR more readable, and allows subsequent code to +refer directly to the arguments for their names, rather than having to look +them up in the Prototype AST. + +At this point we have a function prototype with no body. This is how LLVM IR +represents function declarations. For extern statements in Kaleidoscope, this +is as far as we need to go. For function definitions however, we need to +codegen and attach a function body. + +.. code-block:: c++ + + Function *FunctionAST::codegen() { + // First, check for an existing function from a previous 'extern' declaration.
+ Function *TheFunction = TheModule->getFunction(Proto->getName()); + + if (!TheFunction) + TheFunction = Proto->codegen(); + + if (!TheFunction) + return nullptr; + + if (!TheFunction->empty()) + return (Function*)LogErrorV("Function cannot be redefined."); + + +For function definitions, we start by searching TheModule's symbol table for an +existing version of this function, in case one has already been created using an +'extern' statement. If Module::getFunction returns null then no previous version +exists, so we'll codegen one from the Prototype. In either case, we want to +assert that the function is empty (i.e. has no body yet) before we start. + +.. code-block:: c++ + + // Create a new basic block to start insertion into. + BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction); + Builder.SetInsertPoint(BB); + + // Record the function arguments in the NamedValues map. + NamedValues.clear(); + for (auto &Arg : TheFunction->args()) + NamedValues[Arg.getName()] = &Arg; + +Now we get to the point where the ``Builder`` is set up. The first line +creates a new `basic block <http://en.wikipedia.org/wiki/Basic_block>`_ +(named "entry"), which is inserted into ``TheFunction``. The second line +then tells the builder that new instructions should be inserted into the +end of the new basic block. Basic blocks in LLVM are an important part +of functions that define the `Control Flow +Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we +don't have any control flow, our functions will only contain one block +at this point. We'll fix this in `Chapter 5 <LangImpl5.html>`_ :). + +Next we add the function arguments to the NamedValues map (after first clearing +it out) so that they're accessible to ``VariableExprAST`` nodes. + +.. code-block:: c++ + + if (Value *RetVal = Body->codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency.
+ verifyFunction(*TheFunction); + + return TheFunction; + } + +Once the insertion point has been set up and the NamedValues map populated, +we call the ``codegen()`` method for the root expression of the function. If no +error happens, this emits code to compute the expression into the entry block +and returns the value that was computed. Assuming no error, we then create an +LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the function. +Once the function is built, we call ``verifyFunction``, which is +provided by LLVM. This function does a variety of consistency checks on +the generated code, to determine if our compiler is doing everything +right. Using this is important: it can catch a lot of bugs. Once the +function is finished and validated, we return it. + +.. code-block:: c++ + + // Error reading body, remove function. + TheFunction->eraseFromParent(); + return nullptr; + } + +The only piece left here is handling of the error case. For simplicity, +we handle this by merely deleting the function we produced with the +``eraseFromParent`` method. This allows the user to redefine a function +that they incorrectly typed in before: if we didn't delete it, it would +live in the symbol table, with a body, preventing future redefinition. + +This code does have a bug, though: If the ``FunctionAST::codegen()`` method +finds an existing IR Function, it does not validate its signature against the +definition's own prototype. This means that an earlier 'extern' declaration will +take precedence over the function definition's signature, which can cause +codegen to fail, for instance if the function arguments are named differently. +There are a number of ways to fix this bug, see what you can come up with! Here +is a testcase: + +:: + + extern foo(a); # ok, defines foo. + def foo(b) b; # Error: Unknown variable name. (decl using 'a' takes precedence). 
+ +Driver Changes and Closing Thoughts +=================================== + +For now, code generation to LLVM doesn't really get us much, except that +we can look at the pretty IR calls. The sample code inserts calls to +codegen into the "``HandleDefinition``", "``HandleExtern``" etc +functions, and then dumps out the LLVM IR. This gives a nice way to look +at the LLVM IR for simple functions. For example: + +:: + + ready> 4+5; + Read top-level expression: + define double @0() { + entry: + ret double 9.000000e+00 + } + +Note how the parser turns the top-level expression into anonymous +functions for us. This will be handy when we add `JIT +support <LangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that the +code is very literally transcribed, no optimizations are being performed +except simple constant folding done by IRBuilder. We will `add +optimizations <LangImpl4.html#trivial-constant-folding>`_ explicitly in the next +chapter. + +:: + + ready> def foo(a b) a*a + 2*a*b + b*b; + Read function definition: + define double @foo(double %a, double %b) { + entry: + %multmp = fmul double %a, %a + %multmp1 = fmul double 2.000000e+00, %a + %multmp2 = fmul double %multmp1, %b + %addtmp = fadd double %multmp, %multmp2 + %multmp3 = fmul double %b, %b + %addtmp4 = fadd double %addtmp, %multmp3 + ret double %addtmp4 + } + +This shows some simple arithmetic. Notice the striking similarity to the +LLVM builder calls that we use to create the instructions. + +:: + + ready> def bar(a) foo(a, 4.0) + bar(31337); + Read function definition: + define double @bar(double %a) { + entry: + %calltmp = call double @foo(double %a, double 4.000000e+00) + %calltmp1 = call double @bar(double 3.133700e+04) + %addtmp = fadd double %calltmp, %calltmp1 + ret double %addtmp + } + +This shows some function calls. Note that this function will take a long +time to execute if you call it. In the future we'll add conditional +control flow to actually make recursion useful :). 
+ +:: + + ready> extern cos(x); + Read extern: + declare double @cos(double) + + ready> cos(1.234); + Read top-level expression: + define double @1() { + entry: + %calltmp = call double @cos(double 1.234000e+00) + ret double %calltmp + } + +This shows an extern for the libm "cos" function, and a call to it. + +.. TODO:: Abandon Pygments' horrible `llvm` lexer. It just totally gives up + on highlighting this due to the first line. + +:: + + ready> ^D + ; ModuleID = 'my cool jit' + + define double @0() { + entry: + %addtmp = fadd double 4.000000e+00, 5.000000e+00 + ret double %addtmp + } + + define double @foo(double %a, double %b) { + entry: + %multmp = fmul double %a, %a + %multmp1 = fmul double 2.000000e+00, %a + %multmp2 = fmul double %multmp1, %b + %addtmp = fadd double %multmp, %multmp2 + %multmp3 = fmul double %b, %b + %addtmp4 = fadd double %addtmp, %multmp3 + ret double %addtmp4 + } + + define double @bar(double %a) { + entry: + %calltmp = call double @foo(double %a, double 4.000000e+00) + %calltmp1 = call double @bar(double 3.133700e+04) + %addtmp = fadd double %calltmp, %calltmp1 + ret double %addtmp + } + + declare double @cos(double) + + define double @1() { + entry: + %calltmp = call double @cos(double 1.234000e+00) + ret double %calltmp + } + +When you quit the current demo, it dumps out the IR for the entire +module generated. Here you can see the big picture with all the +functions referencing each other. + +This wraps up the third chapter of the Kaleidoscope tutorial. Up next, +we'll describe how to `add JIT codegen and optimizer +support <LangImpl4.html>`_ to this so we can actually start running +code! + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the LLVM code generator. Because this uses the LLVM libraries, we need +to link them in. 
To do this, we use the +`llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform +our makefile/command line about which options to use: + +.. code-block:: bash + + # Compile + clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core` -o toy + # Run + ./toy + +Here is the code: + +.. literalinclude:: ../../examples/Kaleidoscope/Chapter3/toy.cpp + :language: c++ + +`Next: Adding JIT and Optimizer Support <LangImpl04.html>`_ + diff --git a/gnu/llvm/docs/tutorial/LangImpl04.rst b/gnu/llvm/docs/tutorial/LangImpl04.rst new file mode 100644 index 00000000000..78596cd8eee --- /dev/null +++ b/gnu/llvm/docs/tutorial/LangImpl04.rst @@ -0,0 +1,610 @@ +============================================== +Kaleidoscope: Adding JIT and Optimizer Support +============================================== + +.. contents:: + :local: + +Chapter 4 Introduction +====================== + +Welcome to Chapter 4 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation +of a simple language and added support for generating LLVM IR. This +chapter describes two new techniques: adding optimizer support to your +language, and adding JIT compiler support. These additions will +demonstrate how to get nice, efficient code for the Kaleidoscope +language. + +Trivial Constant Folding +======================== + +Our demonstration for Chapter 3 is elegant and easy to extend. +Unfortunately, it does not produce wonderful code. The IRBuilder, +however, does give us obvious optimizations when compiling simple code: + +:: + + ready> def test(x) 1+2+x; + Read function definition: + define double @test(double %x) { + entry: + %addtmp = fadd double 3.000000e+00, %x + ret double %addtmp + } + +This code is not a literal transcription of the AST built by parsing the +input. 
That would be: + +:: + + ready> def test(x) 1+2+x; + Read function definition: + define double @test(double %x) { + entry: + %addtmp = fadd double 2.000000e+00, 1.000000e+00 + %addtmp1 = fadd double %addtmp, %x + ret double %addtmp1 + } + +Constant folding, as seen above, is a very common and +very important optimization: so much so that many language implementors +implement constant folding support in their AST representation. + +With LLVM, you don't need this support in the AST. Since all calls to +build LLVM IR go through the LLVM IR builder, the builder itself checks +to see if there is a constant folding opportunity when you call it. If +so, it just does the constant fold and returns the constant instead of +creating an instruction. + +Well, that was easy :). In practice, we recommend always using +``IRBuilder`` when generating code like this. It has no "syntactic +overhead" for its use (you don't have to uglify your compiler with +constant checks everywhere) and it can dramatically reduce the amount of +LLVM IR that is generated in some cases (particularly for languages with a +macro preprocessor or that use a lot of constants). + +On the other hand, the ``IRBuilder`` is limited by the fact that it does +all of its analysis inline with the code as it is built. If you take a +slightly more complex example: + +:: + + ready> def test(x) (1+2+x)*(x+(1+2)); + ready> Read function definition: + define double @test(double %x) { + entry: + %addtmp = fadd double 3.000000e+00, %x + %addtmp1 = fadd double %x, 3.000000e+00 + %multmp = fmul double %addtmp, %addtmp1 + ret double %multmp + } + +In this case, the LHS and RHS of the multiplication are the same value. +We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``" +instead of computing "``x+3``" twice. + +Unfortunately, no amount of local analysis will be able to detect and +correct this.
This requires two transformations: reassociation of +expressions (to make the add's lexically identical) and Common +Subexpression Elimination (CSE) to delete the redundant add instruction. +Fortunately, LLVM provides a broad range of optimizations that you can +use, in the form of "passes". + +LLVM Optimization Passes +======================== + +LLVM provides many optimization passes, which do many different sorts of +things and have different tradeoffs. Unlike other systems, LLVM doesn't +hold to the mistaken notion that one set of optimizations is right for +all languages and for all situations. LLVM allows a compiler implementor +to make complete decisions about what optimizations to use, in which +order, and in what situation. + +As a concrete example, LLVM supports both "whole module" passes, which +look across as large a body of code as they can (often a whole file, +but if run at link time, this can be a substantial portion of the whole +program). It also supports and includes "per-function" passes which just +operate on a single function at a time, without looking at other +functions. For more information on passes and how they are run, see the +`How to Write a Pass <../WritingAnLLVMPass.html>`_ document and the +`List of LLVM Passes <../Passes.html>`_. + +For Kaleidoscope, we are currently generating functions on the fly, one +at a time, as the user types them in. We aren't shooting for the +ultimate optimization experience in this setting, but we also want to +catch the easy and quick stuff where possible. As such, we will choose +to run a few per-function optimizations as the user types the function +in. If we wanted to make a "static Kaleidoscope compiler", we would use +exactly the code we have now, except that we would defer running the +optimizer until the entire file has been parsed.
+ +In order to get per-function optimizations going, we need to set up a +`FunctionPassManager <../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold +and organize the LLVM optimizations that we want to run. Once we have +that, we can add a set of optimizations to run. We'll need a new +FunctionPassManager for each module that we want to optimize, so we'll +write a function to create and initialize both the module and pass manager +for us: + +.. code-block:: c++ + + void InitializeModuleAndPassManager(void) { + // Open a new module. + TheModule = llvm::make_unique<Module>("my cool jit", TheContext); + TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout()); + + // Create a new pass manager attached to it. + TheFPM = llvm::make_unique<FunctionPassManager>(TheModule.get()); + + // Provide basic AliasAnalysis support for GVN. + TheFPM->add(createBasicAliasAnalysisPass()); + // Do simple "peephole" optimizations and bit-twiddling optzns. + TheFPM->add(createInstructionCombiningPass()); + // Reassociate expressions. + TheFPM->add(createReassociatePass()); + // Eliminate Common SubExpressions. + TheFPM->add(createGVNPass()); + // Simplify the control flow graph (deleting unreachable blocks, etc). + TheFPM->add(createCFGSimplificationPass()); + + TheFPM->doInitialization(); + } + +This code initializes the global module ``TheModule``, and the function pass +manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is +set up, we use a series of "add" calls to add a bunch of LLVM passes. + +In this case, we choose to add five passes: one analysis pass (alias analysis), +and four optimization passes. The passes we choose here are a pretty standard set +of "cleanup" optimizations that are useful for a wide variety of code. I won't +delve into what they do but, believe me, they are a good starting place :). + +Once the PassManager is set up, we need to make use of it.
We do this by +running it after our newly created function is constructed (in +``FunctionAST::codegen()``), but before it is returned to the client: + +.. code-block:: c++ + + if (Value *RetVal = Body->codegen()) { + // Finish off the function. + Builder.CreateRet(RetVal); + + // Validate the generated code, checking for consistency. + verifyFunction(*TheFunction); + + // Optimize the function. + TheFPM->run(*TheFunction); + + return TheFunction; + } + +As you can see, this is pretty straightforward. The +``FunctionPassManager`` optimizes and updates the LLVM Function\* in +place, improving (hopefully) its body. With this in place, we can try +our test above again: + +:: + + ready> def test(x) (1+2+x)*(x+(1+2)); + ready> Read function definition: + define double @test(double %x) { + entry: + %addtmp = fadd double %x, 3.000000e+00 + %multmp = fmul double %addtmp, %addtmp + ret double %multmp + } + +As expected, we now get our nicely optimized code, saving a floating +point add instruction from every execution of this function. + +LLVM provides a wide variety of optimizations that can be used in +certain circumstances. Some `documentation about the various +passes <../Passes.html>`_ is available, but it isn't very complete. +Another good source of ideas can come from looking at the passes that +``Clang`` runs to get started. The "``opt``" tool allows you to +experiment with passes from the command line, so you can see if they do +anything. + +Now that we have reasonable code coming out of our front-end, let's talk +about executing it! + +Adding a JIT Compiler +===================== + +Code that is available in LLVM IR can have a wide variety of tools +applied to it. For example, you can run optimizations on it (as we did +above), you can dump it out in textual or binary forms, you can compile +the code to an assembly file (.s) for some target, or you can JIT +compile it.
The nice thing about the LLVM IR representation is that it
+is the "common currency" between many different parts of the compiler.
+
+In this section, we'll add JIT compiler support to our interpreter. The
+basic idea that we want for Kaleidoscope is to have the user enter
+function bodies as they do now, but immediately evaluate the top-level
+expressions they type in. For example, if they type in "1 + 2;", we
+should evaluate and print out 3. If they define a function, they should
+be able to call it from the command line.
+
+In order to do this, we first declare and initialize the JIT. This is
+done by adding a global variable ``TheJIT``, and initializing it in
+``main``:
+
+.. code-block:: c++
+
+  static std::unique_ptr<KaleidoscopeJIT> TheJIT;
+  ...
+  int main() {
+    ..
+    TheJIT = llvm::make_unique<KaleidoscopeJIT>();
+
+    // Run the main "interpreter loop" now.
+    MainLoop();
+
+    return 0;
+  }
+
+The KaleidoscopeJIT class is a simple JIT built specifically for these
+tutorials. In later chapters we will look at how it works and extend it with
+new features, but for now we will take it as given. Its API is very simple:
+``addModule`` adds an LLVM IR module to the JIT, making its functions
+available for execution; ``removeModule`` removes a module, freeing any
+memory associated with the code in that module; and ``findSymbol`` allows us
+to look up pointers to the compiled code.
+
+We can take this simple API and change our code that parses top-level
+expressions to look like this:
+
+.. code-block:: c++
+
+  static void HandleTopLevelExpression() {
+    // Evaluate a top-level expression into an anonymous function.
+    if (auto FnAST = ParseTopLevelExpr()) {
+      if (FnAST->codegen()) {
+
+        // JIT the module containing the anonymous expression, keeping a
+        // handle so we can free it later.
+        auto H = TheJIT->addModule(std::move(TheModule));
+        InitializeModuleAndPassManager();
+
+        // Search the JIT for the __anon_expr symbol.
+        auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
+        assert(ExprSymbol && "Function not found");
+
+        // Get the symbol's address and cast it to the right type (takes no
+        // arguments, returns a double) so we can call it as a native function.
+        double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
+        fprintf(stderr, "Evaluated to %f\n", FP());
+
+        // Delete the anonymous expression module from the JIT.
+        TheJIT->removeModule(H);
+      }
+
+If parsing and codegen succeed, the next step is to add the module containing
+the top-level expression to the JIT. We do this by calling addModule, which
+triggers code generation for all the functions in the module, and returns a
+handle that can be used to remove the module from the JIT later. Once the module
+has been added to the JIT it can no longer be modified, so we also open a new
+module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.
+
+Once we've added the module to the JIT we need to get a pointer to the final
+generated code. We do this by calling the JIT's findSymbol method, and passing
+the name of the top-level expression function: ``__anon_expr``. Since we just
+added this function, we assert that findSymbol returned a result.
+
+Next, we get the in-memory address of the ``__anon_expr`` function by calling
+``getAddress()`` on the symbol. Recall that we compile top-level expressions
+into a self-contained LLVM function that takes no arguments and returns the
+computed double. Because the LLVM JIT compiler matches the native platform ABI,
+this means that you can just cast the result pointer to a function pointer of
+that type and call it directly. This means there is no difference between JIT
+compiled code and native machine code that is statically linked into your
+application.
+
+Finally, since we don't support re-evaluation of top-level expressions, we
+remove the module from the JIT when we're done to free the associated memory.
+Recall, however, that the module we created a few lines earlier (via
+``InitializeModuleAndPassManager``) is still open and waiting for new code to be
+added.
+
+With just these two changes, let's see how Kaleidoscope works now!
+
+::
+
+    ready> 4+5;
+    Read top-level expression:
+    define double @0() {
+    entry:
+      ret double 9.000000e+00
+    }
+
+    Evaluated to 9.000000
+
+Well this looks like it is basically working. The dump of the function
+shows the "no argument function that always returns double" that we
+synthesize for each top-level expression that is typed in. This
+demonstrates very basic functionality, but can we do more?
+
+::
+
+    ready> def testfunc(x y) x + y*2;
+    Read function definition:
+    define double @testfunc(double %x, double %y) {
+    entry:
+      %multmp = fmul double %y, 2.000000e+00
+      %addtmp = fadd double %multmp, %x
+      ret double %addtmp
+    }
+
+    ready> testfunc(4, 10);
+    Read top-level expression:
+    define double @1() {
+    entry:
+      %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
+      ret double %calltmp
+    }
+
+    Evaluated to 24.000000
+
+    ready> testfunc(5, 10);
+    ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!
+
+
+Function definitions and calls also work, but something went very wrong on that
+last line. The call looks valid, so what happened? As you may have guessed from
+the API, a Module is a unit of allocation for the JIT, and testfunc was part
+of the same module that contained the anonymous expression. When we removed that
+module from the JIT to free the memory for the anonymous expression, we deleted
+the definition of ``testfunc`` along with it. Then, when we tried to call
+testfunc a second time, the JIT could no longer find it.
+
+The easiest way to fix this is to put the anonymous expression in a separate
+module from the rest of the function definitions.
The JIT will happily resolve +function calls across module boundaries, as long as each of the functions called +has a prototype, and is added to the JIT before it is called. By putting the +anonymous expression in a different module we can delete it without affecting +the rest of the functions. + +In fact, we're going to go a step further and put every function in its own +module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT +that will make our environment more REPL-like: Functions can be added to the +JIT more than once (unlike a module where every function must have a unique +definition). When you look up a symbol in KaleidoscopeJIT it will always return +the most recent definition: + +:: + + ready> def foo(x) x + 1; + Read function definition: + define double @foo(double %x) { + entry: + %addtmp = fadd double %x, 1.000000e+00 + ret double %addtmp + } + + ready> foo(2); + Evaluated to 3.000000 + + ready> def foo(x) x + 2; + define double @foo(double %x) { + entry: + %addtmp = fadd double %x, 2.000000e+00 + ret double %addtmp + } + + ready> foo(2); + Evaluated to 4.000000 + + +To allow each function to live in its own module we'll need a way to +re-generate previous function declarations into each new module we open: + +.. code-block:: c++ + + static std::unique_ptr<KaleidoscopeJIT> TheJIT; + + ... + + Function *getFunction(std::string Name) { + // First, see if the function has already been added to the current module. + if (auto *F = TheModule->getFunction(Name)) + return F; + + // If not, check whether we can codegen the declaration from some existing + // prototype. + auto FI = FunctionProtos.find(Name); + if (FI != FunctionProtos.end()) + return FI->second->codegen(); + + // If no existing prototype exists, return null. + return nullptr; + } + + ... + + Value *CallExprAST::codegen() { + // Look up the name in the global module table. + Function *CalleeF = getFunction(Callee); + + ... 
+ + Function *FunctionAST::codegen() { + // Transfer ownership of the prototype to the FunctionProtos map, but keep a + // reference to it for use below. + auto &P = *Proto; + FunctionProtos[Proto->getName()] = std::move(Proto); + Function *TheFunction = getFunction(P.getName()); + if (!TheFunction) + return nullptr; + + +To enable this, we'll start by adding a new global, ``FunctionProtos``, that +holds the most recent prototype for each function. We'll also add a convenience +method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``. +Our convenience method searches ``TheModule`` for an existing function +declaration, falling back to generating a new declaration from FunctionProtos if +it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the +call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to +update the FunctionProtos map first, then call ``getFunction()``. With this +done, we can always obtain a function declaration in the current module for any +previously declared function. + +We also need to update HandleDefinition and HandleExtern: + +.. code-block:: c++ + + static void HandleDefinition() { + if (auto FnAST = ParseDefinition()) { + if (auto *FnIR = FnAST->codegen()) { + fprintf(stderr, "Read function definition:"); + FnIR->dump(); + TheJIT->addModule(std::move(TheModule)); + InitializeModuleAndPassManager(); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + + static void HandleExtern() { + if (auto ProtoAST = ParseExtern()) { + if (auto *FnIR = ProtoAST->codegen()) { + fprintf(stderr, "Read extern: "); + FnIR->dump(); + FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST); + } + } else { + // Skip token for error recovery. + getNextToken(); + } + } + +In HandleDefinition, we add two lines to transfer the newly defined function to +the JIT and open a new module. In HandleExtern, we just need to add one line to +add the prototype to FunctionProtos. 
+
+With these changes made, let's try our REPL again (I removed the dump of the
+anonymous functions this time, you should get the idea by now :) :
+
+::
+
+    ready> def foo(x) x + 1;
+    ready> foo(2);
+    Evaluated to 3.000000
+
+    ready> def foo(x) x + 2;
+    ready> foo(2);
+    Evaluated to 4.000000
+
+It works!
+
+Even with this simple code, we get some surprisingly powerful capabilities -
+check this out:
+
+::
+
+    ready> extern sin(x);
+    Read extern:
+    declare double @sin(double)
+
+    ready> extern cos(x);
+    Read extern:
+    declare double @cos(double)
+
+    ready> sin(1.0);
+    Read top-level expression:
+    define double @2() {
+    entry:
+      ret double 0x3FEAED548F090CEE
+    }
+
+    Evaluated to 0.841471
+
+    ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
+    Read function definition:
+    define double @foo(double %x) {
+    entry:
+      %calltmp = call double @sin(double %x)
+      %multmp = fmul double %calltmp, %calltmp
+      %calltmp2 = call double @cos(double %x)
+      %multmp4 = fmul double %calltmp2, %calltmp2
+      %addtmp = fadd double %multmp, %multmp4
+      ret double %addtmp
+    }
+
+    ready> foo(4.0);
+    Read top-level expression:
+    define double @3() {
+    entry:
+      %calltmp = call double @foo(double 4.000000e+00)
+      ret double %calltmp
+    }
+
+    Evaluated to 1.000000
+
+Whoa, how does the JIT know about sin and cos? The answer is surprisingly
+simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that
+it uses to find symbols that aren't available in any given module: First
+it searches all the modules that have already been added to the JIT, from the
+most recent to the oldest, to find the newest definition. If no definition is
+found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
+Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
+address space, it simply patches up calls in the module to call the libm
+version of ``sin`` directly.
+
+In the future we'll see how tweaking this symbol resolution rule can be used to
+enable all sorts of useful features, from security (restricting the set of
+symbols available to JIT'd code), to dynamic code generation based on symbol
+names, and even lazy compilation.
+
+One immediate benefit of the symbol resolution rule is that we can now extend
+the language by writing arbitrary C++ code to implement operations. For example,
+if we add:
+
+.. code-block:: c++
+
+  /// putchard - putchar that takes a double and returns 0.
+  extern "C" double putchard(double X) {
+    fputc((char)X, stderr);
+    return 0;
+  }
+
+Now we can produce simple output to the console by using things like:
+"``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
+on the console (120 is the ASCII code for 'x'). Similar code could be
+used to implement file I/O, console input, and many other capabilities
+in Kaleidoscope.
+
+This completes the JIT and optimizer chapter of the Kaleidoscope
+tutorial. At this point, we can compile a non-Turing-complete
+programming language, optimize and JIT compile it in a user-driven way.
+Next up we'll look into `extending the language with control flow
+constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues
+along the way.
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+the LLVM JIT and optimizer. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
+    # Run
+    ./toy
+
+If you are compiling this on Linux, make sure to add the "-rdynamic"
+option as well. This makes sure that the external functions are resolved
+properly at runtime.
+
+Here is the code:
+
+.. 
literalinclude:: ../../examples/Kaleidoscope/Chapter4/toy.cpp
+   :language: c++
+
+`Next: Extending the language: control flow <LangImpl05.html>`_
+
diff --git a/gnu/llvm/docs/tutorial/LangImpl05-cfg.png b/gnu/llvm/docs/tutorial/LangImpl05-cfg.png
Binary files differ
new file mode 100644
index 00000000000..cdba92ff6c5
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/LangImpl05-cfg.png
diff --git a/gnu/llvm/docs/tutorial/LangImpl05.rst b/gnu/llvm/docs/tutorial/LangImpl05.rst
new file mode 100644
index 00000000000..ae0935d9ba1
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/LangImpl05.rst
@@ -0,0 +1,790 @@
+==================================================
+Kaleidoscope: Extending the Language: Control Flow
+==================================================
+
+.. contents::
+   :local:
+
+Chapter 5 Introduction
+======================
+
+Welcome to Chapter 5 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. Parts 1-4 described the implementation of
+the simple Kaleidoscope language and included support for generating
+LLVM IR, followed by optimizations and a JIT compiler. Unfortunately, as
+presented, Kaleidoscope is mostly useless: it has no control flow other
+than call and return. This means that you can't have conditional
+branches in the code, significantly limiting its power. In this episode
+of "build that compiler", we'll extend Kaleidoscope to have an
+if/then/else expression plus a simple 'for' loop.
+
+If/Then/Else
+============
+
+Extending Kaleidoscope to support if/then/else is quite straightforward.
+It basically requires adding support for this "new" concept to the
+lexer, parser, AST, and LLVM code emitter. This example is nice, because
+it shows how easy it is to "grow" a language over time, incrementally
+extending it as new ideas are discovered.
+
+Before we get going on "how" we add this extension, let's talk about
+"what" we want. 
The basic idea is that we want to be able to write this
+sort of thing:
+
+::
+
+    def fib(x)
+      if x < 3 then
+        1
+      else
+        fib(x-1)+fib(x-2);
+
+In Kaleidoscope, every construct is an expression: there are no
+statements. As such, the if/then/else expression needs to return a value
+like any other. Since we're using a mostly functional form, we'll have
+it evaluate its conditional, then return the 'then' or 'else' value
+based on how the condition was resolved. This is very similar to the C
+"?:" expression.
+
+The semantics of the if/then/else expression is that it evaluates the
+condition to a boolean equality value: 0.0 is considered to be false and
+everything else is considered to be true. If the condition is true, the
+first subexpression is evaluated and returned; if the condition is
+false, the second subexpression is evaluated and returned. Since
+Kaleidoscope allows side-effects, this behavior is important to nail
+down.
+
+Now that we know what we "want", let's break this down into its
+constituent pieces.
+
+Lexer Extensions for If/Then/Else
+---------------------------------
+
+The lexer extensions are straightforward. First we add new enum values
+for the relevant tokens:
+
+.. code-block:: c++
+
+  // control
+  tok_if = -6,
+  tok_then = -7,
+  tok_else = -8,
+
+Once we have that, we recognize the new keywords in the lexer. This is
+pretty simple stuff:
+
+.. code-block:: c++
+
+  ...
+  if (IdentifierStr == "def")
+    return tok_def;
+  if (IdentifierStr == "extern")
+    return tok_extern;
+  if (IdentifierStr == "if")
+    return tok_if;
+  if (IdentifierStr == "then")
+    return tok_then;
+  if (IdentifierStr == "else")
+    return tok_else;
+  return tok_identifier;
+
+AST Extensions for If/Then/Else
+-------------------------------
+
+To represent the new expression we add a new AST node for it:
+
+.. code-block:: c++
+
+  /// IfExprAST - Expression class for if/then/else.
+ class IfExprAST : public ExprAST { + std::unique_ptr<ExprAST> Cond, Then, Else; + + public: + IfExprAST(std::unique_ptr<ExprAST> Cond, std::unique_ptr<ExprAST> Then, + std::unique_ptr<ExprAST> Else) + : Cond(std::move(Cond)), Then(std::move(Then)), Else(std::move(Else)) {} + virtual Value *codegen(); + }; + +The AST node just has pointers to the various subexpressions. + +Parser Extensions for If/Then/Else +---------------------------------- + +Now that we have the relevant tokens coming from the lexer and we have +the AST node to build, our parsing logic is relatively straightforward. +First we define a new parsing function: + +.. code-block:: c++ + + /// ifexpr ::= 'if' expression 'then' expression 'else' expression + static std::unique_ptr<ExprAST> ParseIfExpr() { + getNextToken(); // eat the if. + + // condition. + auto Cond = ParseExpression(); + if (!Cond) + return nullptr; + + if (CurTok != tok_then) + return LogError("expected then"); + getNextToken(); // eat the then + + auto Then = ParseExpression(); + if (!Then) + return nullptr; + + if (CurTok != tok_else) + return LogError("expected else"); + + getNextToken(); + + auto Else = ParseExpression(); + if (!Else) + return nullptr; + + return llvm::make_unique<IfExprAST>(std::move(Cond), std::move(Then), + std::move(Else)); + } + +Next we hook it up as a primary expression: + +.. code-block:: c++ + + static std::unique_ptr<ExprAST> ParsePrimary() { + switch (CurTok) { + default: + return LogError("unknown token when expecting an expression"); + case tok_identifier: + return ParseIdentifierExpr(); + case tok_number: + return ParseNumberExpr(); + case '(': + return ParseParenExpr(); + case tok_if: + return ParseIfExpr(); + } + } + +LLVM IR for If/Then/Else +------------------------ + +Now that we have it parsing and building the AST, the final piece is +adding LLVM code generation support. 
This is the most interesting part
+of the if/then/else example, because this is where it starts to
+introduce new concepts. All of the code above has been thoroughly
+described in previous chapters.
+
+To motivate the code we want to produce, let's take a look at a simple
+example. Consider:
+
+::
+
+    extern foo();
+    extern bar();
+    def baz(x) if x then foo() else bar();
+
+If you disable optimizations, the code you'll (soon) get from
+Kaleidoscope looks like this:
+
+.. code-block:: llvm
+
+    declare double @foo()
+
+    declare double @bar()
+
+    define double @baz(double %x) {
+    entry:
+      %ifcond = fcmp one double %x, 0.000000e+00
+      br i1 %ifcond, label %then, label %else
+
+    then:       ; preds = %entry
+      %calltmp = call double @foo()
+      br label %ifcont
+
+    else:       ; preds = %entry
+      %calltmp1 = call double @bar()
+      br label %ifcont
+
+    ifcont:     ; preds = %else, %then
+      %iftmp = phi double [ %calltmp, %then ], [ %calltmp1, %else ]
+      ret double %iftmp
+    }
+
+To visualize the control flow graph, you can use a nifty feature of the
+LLVM '`opt <http://llvm.org/cmds/opt.html>`_' tool. If you put this LLVM
+IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a
+window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll
+see this graph:
+
+.. figure:: LangImpl05-cfg.png
+   :align: center
+   :alt: Example CFG
+
+   Example CFG
+
+Another way to get this is to call "``F->viewCFG()``" or
+"``F->viewCFGOnly()``" (where F is a "``Function*``") either by
+inserting actual calls into the code and recompiling or by calling these
+in the debugger. LLVM has many nice features for visualizing various
+graphs.
+
+Getting back to the generated code, it is fairly simple: the entry block
+evaluates the conditional expression ("x" in our case here) and compares
+the result to 0.0 with the "``fcmp one``" instruction ('one' is "Ordered
+and Not Equal"). 
Based on the result of this expression, the code jumps
+to either the "then" or "else" blocks, which contain the expressions for
+the true/false cases.
+
+Once the then/else blocks are finished executing, they both branch back
+to the 'ifcont' block to execute the code that happens after the
+if/then/else. In this case the only thing left to do is to return to the
+caller of the function. The question then becomes: how does the code
+know which expression to return?
+
+The answer to this question involves an important SSA operation: the
+`Phi
+operation <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
+If you're not familiar with SSA, `the wikipedia
+article <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
+is a good introduction and there are various other introductions to it
+available on your favorite search engine. The short version is that
+"execution" of the Phi operation requires "remembering" which block
+control came from. The Phi operation takes on the value corresponding to
+the input control block. In this case, if control comes in from the
+"then" block, it gets the value of "calltmp". If control comes from the
+"else" block, it gets the value of "calltmp1".
+
+At this point, you are probably starting to think "Oh no! This means my
+simple and elegant front-end will have to start generating SSA form in
+order to use LLVM!". Fortunately, this is not the case, and we strongly
+advise *not* implementing an SSA construction algorithm in your
+front-end unless there is an amazingly good reason to do so. In
+practice, there are two sorts of values that float around in code
+written for your average imperative programming language that might need
+Phi nodes:
+
+#. Code that involves user variables: ``x = 1; x = x + 1;``
+#. Values that are implicit in the structure of your AST, such as the
+   Phi node in this case.
+
+In `Chapter 7 <LangImpl07.html>`_ of this tutorial ("mutable variables"),
+we'll talk about #1 in depth. 
For now, just believe me that you don't
+need SSA construction to handle this case. For #2, you have the choice
+of using the techniques that we will describe for #1, or you can insert
+Phi nodes directly, if convenient. In this case, it is really
+easy to generate the Phi node, so we choose to do it directly.
+
+Okay, enough of the motivation and overview, let's generate code!
+
+Code Generation for If/Then/Else
+--------------------------------
+
+In order to generate code for this, we implement the ``codegen`` method
+for ``IfExprAST``:
+
+.. code-block:: c++
+
+  Value *IfExprAST::codegen() {
+    Value *CondV = Cond->codegen();
+    if (!CondV)
+      return nullptr;
+
+    // Convert condition to a bool by comparing equal to 0.0.
+    CondV = Builder.CreateFCmpONE(
+        CondV, ConstantFP::get(LLVMContext, APFloat(0.0)), "ifcond");
+
+This code is straightforward and similar to what we saw before. We emit
+the expression for the condition, then compare that value to zero to get
+a truth value as a 1-bit (bool) value.
+
+.. code-block:: c++
+
+  Function *TheFunction = Builder.GetInsertBlock()->getParent();
+
+  // Create blocks for the then and else cases.  Insert the 'then' block at the
+  // end of the function.
+  BasicBlock *ThenBB =
+      BasicBlock::Create(LLVMContext, "then", TheFunction);
+  BasicBlock *ElseBB = BasicBlock::Create(LLVMContext, "else");
+  BasicBlock *MergeBB = BasicBlock::Create(LLVMContext, "ifcont");
+
+  Builder.CreateCondBr(CondV, ThenBB, ElseBB);
+
+This code creates the basic blocks that are related to the if/then/else
+statement, and correspond directly to the blocks in the example above.
+The first line gets the current Function object that is being built. It
+gets this by asking the builder for the current BasicBlock, and asking
+that block for its "parent" (the function it is currently embedded
+into).
+
+Once it has that, it creates three blocks. Note that it passes
+"TheFunction" into the constructor for the "then" block. 
This causes the +constructor to automatically insert the new block into the end of the +specified function. The other two blocks are created, but aren't yet +inserted into the function. + +Once the blocks are created, we can emit the conditional branch that +chooses between them. Note that creating new blocks does not implicitly +affect the IRBuilder, so it is still inserting into the block that the +condition went into. Also note that it is creating a branch to the +"then" block and the "else" block, even though the "else" block isn't +inserted into the function yet. This is all ok: it is the standard way +that LLVM supports forward references. + +.. code-block:: c++ + + // Emit then value. + Builder.SetInsertPoint(ThenBB); + + Value *ThenV = Then->codegen(); + if (!ThenV) + return nullptr; + + Builder.CreateBr(MergeBB); + // Codegen of 'Then' can change the current block, update ThenBB for the PHI. + ThenBB = Builder.GetInsertBlock(); + +After the conditional branch is inserted, we move the builder to start +inserting into the "then" block. Strictly speaking, this call moves the +insertion point to be at the end of the specified block. However, since +the "then" block is empty, it also starts out by inserting at the +beginning of the block. :) + +Once the insertion point is set, we recursively codegen the "then" +expression from the AST. To finish off the "then" block, we create an +unconditional branch to the merge block. One interesting (and very +important) aspect of the LLVM IR is that it `requires all basic blocks +to be "terminated" <../LangRef.html#functionstructure>`_ with a `control +flow instruction <../LangRef.html#terminators>`_ such as return or +branch. This means that all control flow, *including fall throughs* must +be made explicit in the LLVM IR. If you violate this rule, the verifier +will emit an error. + +The final line here is quite subtle, but is very important. 
The basic +issue is that when we create the Phi node in the merge block, we need to +set up the block/value pairs that indicate how the Phi will work. +Importantly, the Phi node expects to have an entry for each predecessor +of the block in the CFG. Why then, are we getting the current block when +we just set it to ThenBB 5 lines above? The problem is that the "Then" +expression may actually itself change the block that the Builder is +emitting into if, for example, it contains a nested "if/then/else" +expression. Because calling ``codegen()`` recursively could arbitrarily change +the notion of the current block, we are required to get an up-to-date +value for code that will set up the Phi node. + +.. code-block:: c++ + + // Emit else block. + TheFunction->getBasicBlockList().push_back(ElseBB); + Builder.SetInsertPoint(ElseBB); + + Value *ElseV = Else->codegen(); + if (!ElseV) + return nullptr; + + Builder.CreateBr(MergeBB); + // codegen of 'Else' can change the current block, update ElseBB for the PHI. + ElseBB = Builder.GetInsertBlock(); + +Code generation for the 'else' block is basically identical to codegen +for the 'then' block. The only significant difference is the first line, +which adds the 'else' block to the function. Recall previously that the +'else' block was created, but not added to the function. Now that the +'then' and 'else' blocks are emitted, we can finish up with the merge +code: + +.. code-block:: c++ + + // Emit merge block. + TheFunction->getBasicBlockList().push_back(MergeBB); + Builder.SetInsertPoint(MergeBB); + PHINode *PN = + Builder.CreatePHI(Type::getDoubleTy(LLVMContext), 2, "iftmp"); + + PN->addIncoming(ThenV, ThenBB); + PN->addIncoming(ElseV, ElseBB); + return PN; + } + +The first two lines here are now familiar: the first adds the "merge" +block to the Function object (it was previously floating, like the else +block above). The second changes the insertion point so that newly +created code will go into the "merge" block. 
Once that is done, we need
+to create the PHI node and set up the block/value pairs for the PHI.
+
+Finally, the CodeGen function returns the phi node as the value computed
+by the if/then/else expression. In our example above, this returned
+value will feed into the code for the top-level function, which will
+create the return instruction.
+
+Overall, we now have the ability to execute conditional code in
+Kaleidoscope. With this extension, Kaleidoscope is a fairly complete
+language that can calculate a wide variety of numeric functions. Next up
+we'll add another useful expression that is familiar from non-functional
+languages...
+
+'for' Loop Expression
+=====================
+
+Now that we know how to add basic control flow constructs to the
+language, we have the tools to add more powerful things. Let's add
+something more aggressive, a 'for' expression:
+
+::
+
+    extern putchard(char)
+    def printstar(n)
+      for i = 1, i < n, 1.0 in
+        putchard(42);  # ascii 42 = '*'
+
+    # print 100 '*' characters
+    printstar(100);
+
+This expression defines a new variable ("i" in this case) which iterates
+from a starting value, while the condition ("i < n" in this case) is
+true, incrementing by an optional step value ("1.0" in this case). If
+the step value is omitted, it defaults to 1.0. While the loop is true,
+it executes its body expression. Because we don't have anything better
+to return, we'll just define the loop as always returning 0.0. In the
+future when we have mutable variables, it will get more useful.
+
+As before, let's talk about the changes that we need to make to
+Kaleidoscope to support this.
+
+Lexer Extensions for the 'for' Loop
+-----------------------------------
+
+The lexer extensions are the same sort of thing as for if/then/else:
+
+.. code-block:: c++
+
+  ... in enum Token ...
+  // control
+  tok_if = -6, tok_then = -7, tok_else = -8,
+  tok_for = -9, tok_in = -10
+
+  ... in gettok ... 
+ if (IdentifierStr == "def") + return tok_def; + if (IdentifierStr == "extern") + return tok_extern; + if (IdentifierStr == "if") + return tok_if; + if (IdentifierStr == "then") + return tok_then; + if (IdentifierStr == "else") + return tok_else; + if (IdentifierStr == "for") + return tok_for; + if (IdentifierStr == "in") + return tok_in; + return tok_identifier; + +AST Extensions for the 'for' Loop +--------------------------------- + +The AST node is just as simple. It basically boils down to capturing the +variable name and the constituent expressions in the node. + +.. code-block:: c++ + + /// ForExprAST - Expression class for for/in. + class ForExprAST : public ExprAST { + std::string VarName; + std::unique_ptr<ExprAST> Start, End, Step, Body; + + public: + ForExprAST(const std::string &VarName, std::unique_ptr<ExprAST> Start, + std::unique_ptr<ExprAST> End, std::unique_ptr<ExprAST> Step, + std::unique_ptr<ExprAST> Body) + : VarName(VarName), Start(std::move(Start)), End(std::move(End)), + Step(std::move(Step)), Body(std::move(Body)) {} + virtual Value *codegen(); + }; + +Parser Extensions for the 'for' Loop +------------------------------------ + +The parser code is also fairly standard. The only interesting thing here +is handling of the optional step value. The parser code handles it by +checking to see if the second comma is present. If not, it sets the step +value to null in the AST node: + +.. code-block:: c++ + + /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression + static std::unique_ptr<ExprAST> ParseForExpr() { + getNextToken(); // eat the for. + + if (CurTok != tok_identifier) + return LogError("expected identifier after for"); + + std::string IdName = IdentifierStr; + getNextToken(); // eat identifier. + + if (CurTok != '=') + return LogError("expected '=' after for"); + getNextToken(); // eat '='. 
+
+      auto Start = ParseExpression();
+      if (!Start)
+        return nullptr;
+      if (CurTok != ',')
+        return LogError("expected ',' after for start value");
+      getNextToken();
+
+      auto End = ParseExpression();
+      if (!End)
+        return nullptr;
+
+      // The step value is optional.
+      std::unique_ptr<ExprAST> Step;
+      if (CurTok == ',') {
+        getNextToken();
+        Step = ParseExpression();
+        if (!Step)
+          return nullptr;
+      }
+
+      if (CurTok != tok_in)
+        return LogError("expected 'in' after for");
+      getNextToken(); // eat 'in'.
+
+      auto Body = ParseExpression();
+      if (!Body)
+        return nullptr;
+
+      return llvm::make_unique<ForExprAST>(IdName, std::move(Start),
+                                           std::move(End), std::move(Step),
+                                           std::move(Body));
+    }
+
+LLVM IR for the 'for' Loop
+--------------------------
+
+Now we get to the good part: the LLVM IR we want to generate for this
+thing. With the simple example above, we get this LLVM IR (note that
+this dump is generated with optimizations disabled for clarity):
+
+.. code-block:: llvm
+
+    declare double @putchard(double)
+
+    define double @printstar(double %n) {
+    entry:
+      ; initial value = 1.0 (inlined into phi)
+      br label %loop
+
+    loop:                       ; preds = %loop, %entry
+      %i = phi double [ 1.000000e+00, %entry ], [ %nextvar, %loop ]
+      ; body
+      %calltmp = call double @putchard(double 4.200000e+01)
+      ; increment
+      %nextvar = fadd double %i, 1.000000e+00
+
+      ; termination test
+      %cmptmp = fcmp ult double %i, %n
+      %booltmp = uitofp i1 %cmptmp to double
+      %loopcond = fcmp one double %booltmp, 0.000000e+00
+      br i1 %loopcond, label %loop, label %afterloop
+
+    afterloop:                  ; preds = %loop
+      ; loop always returns 0.0
+      ret double 0.000000e+00
+    }
+
+This loop contains all the same constructs we saw before: a phi node,
+several expressions, and some basic blocks. Let's see how this fits
+together.
+
+Code Generation for the 'for' Loop
+----------------------------------
+
+The first part of codegen is very simple: we just output the start
+expression for the loop value:
+
+.. 
code-block:: c++
+
+    Value *ForExprAST::codegen() {
+      // Emit the start code first, without 'variable' in scope.
+      Value *StartVal = Start->codegen();
+      if (!StartVal)
+        return nullptr;
+
+With this out of the way, the next step is to set up the LLVM basic
+block for the start of the loop body. In the case above, the whole loop
+body is one block, but remember that the body code itself could consist
+of multiple blocks (e.g. if it contains an if/then/else or a for/in
+expression).
+
+.. code-block:: c++
+
+      // Make the new basic block for the loop header, inserting after current
+      // block.
+      Function *TheFunction = Builder.GetInsertBlock()->getParent();
+      BasicBlock *PreheaderBB = Builder.GetInsertBlock();
+      BasicBlock *LoopBB =
+          BasicBlock::Create(LLVMContext, "loop", TheFunction);
+
+      // Insert an explicit fall through from the current block to the LoopBB.
+      Builder.CreateBr(LoopBB);
+
+This code is similar to what we saw for if/then/else. Because we will
+need it to create the Phi node, we remember the block that falls through
+into the loop. Once we have that, we create the actual block that starts
+the loop and create an unconditional branch for the fall-through between
+the two blocks.
+
+.. code-block:: c++
+
+      // Start insertion in LoopBB.
+      Builder.SetInsertPoint(LoopBB);
+
+      // Start the PHI node with an entry for Start.
+      PHINode *Variable = Builder.CreatePHI(Type::getDoubleTy(LLVMContext),
+                                            2, VarName.c_str());
+      Variable->addIncoming(StartVal, PreheaderBB);
+
+Now that the "preheader" for the loop is set up, we switch to emitting
+code for the loop body. To begin with, we move the insertion point and
+create the PHI node for the loop induction variable. Since we already
+know the incoming value for the starting value, we add it to the Phi
+node. Note that the Phi will eventually get a second value for the
+backedge, but we can't set it up yet (because it doesn't exist!).
+
+.. 
code-block:: c++ + + // Within the loop, the variable is defined equal to the PHI node. If it + // shadows an existing variable, we have to restore it, so save it now. + Value *OldVal = NamedValues[VarName]; + NamedValues[VarName] = Variable; + + // Emit the body of the loop. This, like any other expr, can change the + // current BB. Note that we ignore the value computed by the body, but don't + // allow an error. + if (!Body->codegen()) + return nullptr; + +Now the code starts to get more interesting. Our 'for' loop introduces a +new variable to the symbol table. This means that our symbol table can +now contain either function arguments or loop variables. To handle this, +before we codegen the body of the loop, we add the loop variable as the +current value for its name. Note that it is possible that there is a +variable of the same name in the outer scope. It would be easy to make +this an error (emit an error and return null if there is already an +entry for VarName) but we choose to allow shadowing of variables. In +order to handle this correctly, we remember the Value that we are +potentially shadowing in ``OldVal`` (which will be null if there is no +shadowed variable). + +Once the loop variable is set into the symbol table, the code +recursively codegen's the body. This allows the body to use the loop +variable: any references to it will naturally find it in the symbol +table. + +.. code-block:: c++ + + // Emit the step value. + Value *StepVal = nullptr; + if (Step) { + StepVal = Step->codegen(); + if (!StepVal) + return nullptr; + } else { + // If not specified, use 1.0. + StepVal = ConstantFP::get(LLVMContext, APFloat(1.0)); + } + + Value *NextVar = Builder.CreateFAdd(Variable, StepVal, "nextvar"); + +Now that the body is emitted, we compute the next value of the iteration +variable by adding the step value, or 1.0 if it isn't present. +'``NextVar``' will be the value of the loop variable on the next +iteration of the loop. + +.. 
code-block:: c++ + + // Compute the end condition. + Value *EndCond = End->codegen(); + if (!EndCond) + return nullptr; + + // Convert condition to a bool by comparing equal to 0.0. + EndCond = Builder.CreateFCmpONE( + EndCond, ConstantFP::get(LLVMContext, APFloat(0.0)), "loopcond"); + +Finally, we evaluate the exit value of the loop, to determine whether +the loop should exit. This mirrors the condition evaluation for the +if/then/else statement. + +.. code-block:: c++ + + // Create the "after loop" block and insert it. + BasicBlock *LoopEndBB = Builder.GetInsertBlock(); + BasicBlock *AfterBB = + BasicBlock::Create(LLVMContext, "afterloop", TheFunction); + + // Insert the conditional branch into the end of LoopEndBB. + Builder.CreateCondBr(EndCond, LoopBB, AfterBB); + + // Any new code will be inserted in AfterBB. + Builder.SetInsertPoint(AfterBB); + +With the code for the body of the loop complete, we just need to finish +up the control flow for it. This code remembers the end block (for the +phi node), then creates the block for the loop exit ("afterloop"). Based +on the value of the exit condition, it creates a conditional branch that +chooses between executing the loop again and exiting the loop. Any +future code is emitted in the "afterloop" block, so it sets the +insertion position to it. + +.. code-block:: c++ + + // Add a new entry to the PHI node for the backedge. + Variable->addIncoming(NextVar, LoopEndBB); + + // Restore the unshadowed variable. + if (OldVal) + NamedValues[VarName] = OldVal; + else + NamedValues.erase(VarName); + + // for expr always returns 0.0. + return Constant::getNullValue(Type::getDoubleTy(LLVMContext)); + } + +The final code handles various cleanups: now that we have the "NextVar" +value, we can add the incoming value to the loop PHI node. After that, +we remove the loop variable from the symbol table, so that it isn't in +scope after the for loop. 
Finally, code generation of the for loop
+always returns 0.0, so that is what we return from
+``ForExprAST::codegen()``.
+
+With this, we conclude the "adding control flow to Kaleidoscope" chapter
+of the tutorial. In this chapter we added two control flow constructs,
+and used them to motivate a couple of aspects of the LLVM IR that are
+important for front-end implementors to know. In the next chapter of our
+saga, we will get a bit crazier and add `user-defined
+operators <LangImpl06.html>`_ to our poor innocent language.
+
+Full Code Listing
+=================
+
+Here is the complete code listing for our running example, enhanced with
+the if/then/else and for expressions. To build this example, use:
+
+.. code-block:: bash
+
+    # Compile
+    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
+    # Run
+    ./toy
+
+Here is the code:
+
+.. literalinclude:: ../../examples/Kaleidoscope/Chapter5/toy.cpp
+   :language: c++
+
+`Next: Extending the language: user-defined operators <LangImpl06.html>`_
+
diff --git a/gnu/llvm/docs/tutorial/LangImpl06.rst b/gnu/llvm/docs/tutorial/LangImpl06.rst
new file mode 100644
index 00000000000..7c9a2123e8f
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/LangImpl06.rst
@@ -0,0 +1,768 @@
+============================================================
+Kaleidoscope: Extending the Language: User-defined Operators
+============================================================
+
+.. contents::
+   :local:
+
+Chapter 6 Introduction
+======================
+
+Welcome to Chapter 6 of the "`Implementing a language with
+LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
+have a fully functional language that is fairly minimal, but also
+useful. There is still one big problem with it, however. Our language
+doesn't have many useful operators (like division, logical negation, or
+even any comparisons besides less-than).
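+
+To see why the approach taken in this chapter is even possible, recall
+from Chapter 2 that Kaleidoscope parses expressions with an
+operator-precedence parser driven by a precedence table consulted at
+parse time. The standalone C++ sketch below is not the tutorial's actual
+code; single-letter operands and a fully parenthesized output string are
+simplifications. It illustrates the key property we will exploit: adding
+one entry to the table at runtime is enough to teach the parser a
+brand-new binary operator.
+
+.. code-block:: c++
+
+    #include <cstddef>
+    #include <map>
+    #include <string>
+
+    // Mutable precedence table, like the tutorial's BinopPrecedence map.
+    static std::map<char, int> BinopPrecedence = {
+        {'<', 10}, {'+', 20}, {'*', 40}};
+
+    static int GetTokPrecedence(char Op) {
+      auto It = BinopPrecedence.find(Op);
+      return It == BinopPrecedence.end() ? -1 : It->second;
+    }
+
+    // Parse a string of single-letter operands (e.g. "a+b*c") into a fully
+    // parenthesized form, using precedence climbing.
+    static std::string ParseExpr(const std::string &S, std::size_t &I,
+                                 int MinPrec) {
+      std::string LHS(1, S[I++]); // an operand is one letter
+      while (I < S.size()) {
+        int Prec = GetTokPrecedence(S[I]);
+        if (Prec < MinPrec)
+          break; // next token binds less tightly; let the caller have it
+        char Op = S[I++];
+        std::string RHS = ParseExpr(S, I, Prec + 1);
+        LHS = "(" + LHS + Op + RHS + ")";
+      }
+      return LHS;
+    }
+
+    static std::string Parse(const std::string &S) {
+      std::size_t I = 0;
+      return ParseExpr(S, I, 0);
+    }
+
+With this in hand, ``Parse("a+b*c")`` yields ``"(a+(b*c))"``, and after
+``BinopPrecedence['|'] = 5;`` the previously unknown ``'|'`` parses too:
+``Parse("a|b+c")`` yields ``"(a|(b+c))"``. No grammar rules change; only
+the table does, which is exactly the trick the rest of this chapter
+builds on.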
+
+This chapter of the tutorial takes a wild digression into adding
+user-defined operators to the simple and beautiful Kaleidoscope
+language. This digression now gives us a simple and ugly language in
+some ways, but also a powerful one at the same time. One of the great
+things about creating your own language is that you get to decide what
+is good or bad. In this tutorial we'll assume that it is okay to use
+this as a way to show some interesting parsing techniques.
+
+At the end of this tutorial, we'll run through an example Kaleidoscope
+application that `renders the Mandelbrot set <#kicking-the-tires>`_. This
+gives an example of what you can build with Kaleidoscope and its feature
+set.
+
+User-defined Operators: the Idea
+================================
+
+The "operator overloading" that we will add to Kaleidoscope is more
+general than in languages like C++. In C++, you are only allowed to
+redefine existing operators: you can't programmatically change the
+grammar, introduce new operators, change precedence levels, etc. In this
+chapter, we will add this capability to Kaleidoscope, which will let the
+user round out the set of operators that are supported.
+
+The point of going into user-defined operators in a tutorial like this
+is to show the power and flexibility of using a hand-written parser.
+Thus far, the parser we have been implementing uses recursive descent
+for most parts of the grammar and operator precedence parsing for the
+expressions. See `Chapter 2 <LangImpl02.html>`_ for details. Without
+using operator precedence parsing, it would be very difficult to allow
+the programmer to introduce new operators into the grammar: the grammar
+is dynamically extensible as the JIT runs.
+
+The two specific features we'll add are programmable unary operators
+(right now, Kaleidoscope has no unary operators at all) as well as
+binary operators. An example of this is:
+
+::
+
+    # Logical unary not.
+    def unary!(v)
+      if v then
+        0
+      else
+        1;
+
+    # Define > with the same precedence as <.
+    def binary> 10 (LHS RHS)
+      RHS < LHS;
+
+    # Binary "logical or" (note that it does not "short circuit")
+    def binary| 5 (LHS RHS)
+      if LHS then
+        1
+      else if RHS then
+        1
+      else
+        0;
+
+    # Define = with slightly lower precedence than relationals.
+    def binary= 9 (LHS RHS)
+      !(LHS < RHS | LHS > RHS);
+
+Many languages aspire to being able to implement their standard runtime
+library in the language itself. In Kaleidoscope, we can implement
+significant parts of the language in the library!
+
+We will break down implementation of these features into two parts:
+implementing support for user-defined binary operators and adding unary
+operators.
+
+User-defined Binary Operators
+=============================
+
+Adding support for user-defined binary operators is pretty simple with
+our current framework. We'll first add support for the unary/binary
+keywords:
+
+.. code-block:: c++
+
+    enum Token {
+      ...
+      // operators
+      tok_binary = -11,
+      tok_unary = -12
+    };
+    ...
+    static int gettok() {
+      ...
+      if (IdentifierStr == "for")
+        return tok_for;
+      if (IdentifierStr == "in")
+        return tok_in;
+      if (IdentifierStr == "binary")
+        return tok_binary;
+      if (IdentifierStr == "unary")
+        return tok_unary;
+      return tok_identifier;
+
+This just adds lexer support for the unary and binary keywords, like we
+did in `previous chapters <LangImpl05.html#lexer-extensions-for-if-then-else>`_.
+One nice thing about our current AST is that we represent binary
+operators with full generalisation by using their ASCII code as the
+opcode. For our extended operators, we'll use this same representation,
+so we don't need any new AST or parser support.
+
+On the other hand, we have to be able to represent the definitions of
+these new operators, in the "def binary\| 5" part of the function
+definition.
In our grammar so far, the "name" for the function +definition is parsed as the "prototype" production and into the +``PrototypeAST`` AST node. To represent our new user-defined operators +as prototypes, we have to extend the ``PrototypeAST`` AST node like +this: + +.. code-block:: c++ + + /// PrototypeAST - This class represents the "prototype" for a function, + /// which captures its argument names as well as if it is an operator. + class PrototypeAST { + std::string Name; + std::vector<std::string> Args; + bool IsOperator; + unsigned Precedence; // Precedence if a binary op. + + public: + PrototypeAST(const std::string &name, std::vector<std::string> Args, + bool IsOperator = false, unsigned Prec = 0) + : Name(name), Args(std::move(Args)), IsOperator(IsOperator), + Precedence(Prec) {} + + bool isUnaryOp() const { return IsOperator && Args.size() == 1; } + bool isBinaryOp() const { return IsOperator && Args.size() == 2; } + + char getOperatorName() const { + assert(isUnaryOp() || isBinaryOp()); + return Name[Name.size()-1]; + } + + unsigned getBinaryPrecedence() const { return Precedence; } + + Function *codegen(); + }; + +Basically, in addition to knowing a name for the prototype, we now keep +track of whether it was an operator, and if it was, what precedence +level the operator is at. The precedence is only used for binary +operators (as you'll see below, it just doesn't apply for unary +operators). Now that we have a way to represent the prototype for a +user-defined operator, we need to parse it: + +.. code-block:: c++ + + /// prototype + /// ::= id '(' id* ')' + /// ::= binary LETTER number? (id, id) + static std::unique_ptr<PrototypeAST> ParsePrototype() { + std::string FnName; + + unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary. 
+      unsigned BinaryPrecedence = 30;
+
+      switch (CurTok) {
+      default:
+        return LogErrorP("Expected function name in prototype");
+      case tok_identifier:
+        FnName = IdentifierStr;
+        Kind = 0;
+        getNextToken();
+        break;
+      case tok_binary:
+        getNextToken();
+        if (!isascii(CurTok))
+          return LogErrorP("Expected binary operator");
+        FnName = "binary";
+        FnName += (char)CurTok;
+        Kind = 2;
+        getNextToken();
+
+        // Read the precedence if present.
+        if (CurTok == tok_number) {
+          if (NumVal < 1 || NumVal > 100)
+            return LogErrorP("Invalid precedence: must be 1..100");
+          BinaryPrecedence = (unsigned)NumVal;
+          getNextToken();
+        }
+        break;
+      }
+
+      if (CurTok != '(')
+        return LogErrorP("Expected '(' in prototype");
+
+      std::vector<std::string> ArgNames;
+      while (getNextToken() == tok_identifier)
+        ArgNames.push_back(IdentifierStr);
+      if (CurTok != ')')
+        return LogErrorP("Expected ')' in prototype");
+
+      // success.
+      getNextToken(); // eat ')'.
+
+      // Verify right number of names for operator.
+      if (Kind && ArgNames.size() != Kind)
+        return LogErrorP("Invalid number of operands for operator");
+
+      return llvm::make_unique<PrototypeAST>(FnName, std::move(ArgNames),
+                                             Kind != 0, BinaryPrecedence);
+    }
+
+This is all fairly straightforward parsing code, and we have already
+seen a lot of similar code in the past. One interesting part about the
+code above is the couple of lines that set up ``FnName`` for binary
+operators. This builds names like "binary@" for a newly defined "@"
+operator. It then takes advantage of the fact that symbol names in the
+LLVM symbol table are allowed to have any character in them, including
+embedded nul characters.
+
+The next interesting thing to add is codegen support for these binary
+operators. Given our current structure, this is a simple addition of a
+default case for our existing binary operator node:
+
+.. 
code-block:: c++
+
+    Value *BinaryExprAST::codegen() {
+      Value *L = LHS->codegen();
+      Value *R = RHS->codegen();
+      if (!L || !R)
+        return nullptr;
+
+      switch (Op) {
+      case '+':
+        return Builder.CreateFAdd(L, R, "addtmp");
+      case '-':
+        return Builder.CreateFSub(L, R, "subtmp");
+      case '*':
+        return Builder.CreateFMul(L, R, "multmp");
+      case '<':
+        L = Builder.CreateFCmpULT(L, R, "cmptmp");
+        // Convert bool 0/1 to double 0.0 or 1.0
+        return Builder.CreateUIToFP(L, Type::getDoubleTy(LLVMContext),
+                                    "booltmp");
+      default:
+        break;
+      }
+
+      // If it wasn't a builtin binary operator, it must be a user defined
+      // one. Emit a call to it.
+      Function *F = TheModule->getFunction(std::string("binary") + Op);
+      assert(F && "binary operator not found!");
+
+      Value *Ops[2] = { L, R };
+      return Builder.CreateCall(F, Ops, "binop");
+    }
+
+As you can see above, the new code is actually really simple. It just
+does a lookup for the appropriate operator in the symbol table and
+generates a function call to it. Since user-defined operators are just
+built as normal functions (because the "prototype" boils down to a
+function with the right name) everything falls into place.
+
+The final piece of code we are missing is a bit of top-level magic:
+
+.. code-block:: c++
+
+    Function *FunctionAST::codegen() {
+      NamedValues.clear();
+
+      Function *TheFunction = Proto->codegen();
+      if (!TheFunction)
+        return nullptr;
+
+      // If this is an operator, install it.
+      if (Proto->isBinaryOp())
+        BinopPrecedence[Proto->getOperatorName()] = Proto->getBinaryPrecedence();
+
+      // Create a new basic block to start insertion into.
+      BasicBlock *BB = BasicBlock::Create(LLVMContext, "entry", TheFunction);
+      Builder.SetInsertPoint(BB);
+
+      if (Value *RetVal = Body->codegen()) {
+        ...
+
+Basically, before codegening a function, if it is a user-defined
+operator, we register it in the precedence table. This allows the binary
+operator parsing logic we already have in place to handle it.
Since we
+are working on a fully-general operator precedence parser, this is all
+we need to do to "extend the grammar".
+
+Now we have useful user-defined binary operators. This builds a lot on
+the previous framework we built for other operators. Adding unary
+operators is a bit more challenging, because we don't have any framework
+for it yet - let's see what it takes.
+
+User-defined Unary Operators
+============================
+
+Since we don't currently support unary operators in the Kaleidoscope
+language, we'll need to add everything to support them. Above, we added
+simple support for the 'unary' keyword to the lexer. In addition to
+that, we need an AST node:
+
+.. code-block:: c++
+
+    /// UnaryExprAST - Expression class for a unary operator.
+    class UnaryExprAST : public ExprAST {
+      char Opcode;
+      std::unique_ptr<ExprAST> Operand;
+
+    public:
+      UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
+          : Opcode(Opcode), Operand(std::move(Operand)) {}
+      virtual Value *codegen();
+    };
+
+This AST node is very simple and obvious by now. It directly mirrors the
+binary operator AST node, except that it only has one child. With this,
+we need to add the parsing logic. Parsing a unary operator is pretty
+simple: we'll add a new function to do it:
+
+.. code-block:: c++
+
+    /// unary
+    ///   ::= primary
+    ///   ::= '!' unary
+    static std::unique_ptr<ExprAST> ParseUnary() {
+      // If the current token is not an operator, it must be a primary expr.
+      if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
+        return ParsePrimary();
+
+      // If this is a unary operator, read it.
+      int Opc = CurTok;
+      getNextToken();
+      if (auto Operand = ParseUnary())
+        return llvm::make_unique<UnaryExprAST>(Opc, std::move(Operand));
+      return nullptr;
+    }
+
+The grammar we add is pretty straightforward here. If we see a unary
+operator when parsing a primary operator, we eat the operator as a
+prefix and parse the remaining piece as another unary operator.
This
+allows us to handle multiple unary operators (e.g. "!!x"). Note that
+unary operators can't have ambiguous parses like binary operators can,
+so there is no need for precedence information.
+
+The problem with this function is that we need to call ParseUnary from
+somewhere. To do this, we change previous callers of ParsePrimary to
+call ParseUnary instead:
+
+.. code-block:: c++
+
+    /// binoprhs
+    ///   ::= ('+' unary)*
+    static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
+                                                  std::unique_ptr<ExprAST> LHS) {
+      ...
+      // Parse the unary expression after the binary operator.
+      auto RHS = ParseUnary();
+      if (!RHS)
+        return nullptr;
+      ...
+    }
+
+    /// expression
+    ///   ::= unary binoprhs
+    ///
+    static std::unique_ptr<ExprAST> ParseExpression() {
+      auto LHS = ParseUnary();
+      if (!LHS)
+        return nullptr;
+
+      return ParseBinOpRHS(0, std::move(LHS));
+    }
+
+With these two simple changes, we are now able to parse unary operators
+and build the AST for them. Next up, we need to add parser support for
+prototypes, to parse the unary operator prototype. We extend the binary
+operator code above with:
+
+.. code-block:: c++
+
+    /// prototype
+    ///   ::= id '(' id* ')'
+    ///   ::= binary LETTER number? (id, id)
+    ///   ::= unary LETTER (id)
+    static std::unique_ptr<PrototypeAST> ParsePrototype() {
+      std::string FnName;
+
+      unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
+      unsigned BinaryPrecedence = 30;
+
+      switch (CurTok) {
+      default:
+        return LogErrorP("Expected function name in prototype");
+      case tok_identifier:
+        FnName = IdentifierStr;
+        Kind = 0;
+        getNextToken();
+        break;
+      case tok_unary:
+        getNextToken();
+        if (!isascii(CurTok))
+          return LogErrorP("Expected unary operator");
+        FnName = "unary";
+        FnName += (char)CurTok;
+        Kind = 1;
+        getNextToken();
+        break;
+      case tok_binary:
+        ...
+
+As with binary operators, we name unary operators with a name that
+includes the operator character. This assists us at code generation
+time.
Speaking of, the final piece we need to add is codegen support for +unary operators. It looks like this: + +.. code-block:: c++ + + Value *UnaryExprAST::codegen() { + Value *OperandV = Operand->codegen(); + if (!OperandV) + return nullptr; + + Function *F = TheModule->getFunction(std::string("unary")+Opcode); + if (!F) + return LogErrorV("Unknown unary operator"); + + return Builder.CreateCall(F, OperandV, "unop"); + } + +This code is similar to, but simpler than, the code for binary +operators. It is simpler primarily because it doesn't need to handle any +predefined operators. + +Kicking the Tires +================= + +It is somewhat hard to believe, but with a few simple extensions we've +covered in the last chapters, we have grown a real-ish language. With +this, we can do a lot of interesting things, including I/O, math, and a +bunch of other things. For example, we can now add a nice sequencing +operator (printd is defined to print out the specified value and a +newline): + +:: + + ready> extern printd(x); + Read extern: + declare double @printd(double) + + ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands. + .. + ready> printd(123) : printd(456) : printd(789); + 123.000000 + 456.000000 + 789.000000 + Evaluated to 0.000000 + +We can also define a bunch of other "primitive" operations, such as: + +:: + + # Logical unary not. + def unary!(v) + if v then + 0 + else + 1; + + # Unary negate. + def unary-(v) + 0-v; + + # Define > with the same precedence as <. + def binary> 10 (LHS RHS) + RHS < LHS; + + # Binary logical or, which does not short circuit. + def binary| 5 (LHS RHS) + if LHS then + 1 + else if RHS then + 1 + else + 0; + + # Binary logical and, which does not short circuit. + def binary& 6 (LHS RHS) + if !LHS then + 0 + else + !!RHS; + + # Define = with slightly lower precedence than relationals. 
+    def binary= 9 (LHS RHS)
+      !(LHS < RHS | LHS > RHS);
+
+    # Define ':' for sequencing: as a low-precedence operator that ignores
+    # operands and just returns the RHS.
+    def binary : 1 (x y) y;
+
+Given the previous if/then/else support, we can also define interesting
+functions for I/O. For example, the following prints out a character
+whose "density" reflects the value passed in: the lower the value, the
+denser the character:
+
+::
+
+    ready>
+
+    extern putchard(char)
+    def printdensity(d)
+      if d > 8 then
+        putchard(32)  # ' '
+      else if d > 4 then
+        putchard(46)  # '.'
+      else if d > 2 then
+        putchard(43)  # '+'
+      else
+        putchard(42); # '*'
+    ...
+    ready> printdensity(1): printdensity(2): printdensity(3):
+           printdensity(4): printdensity(5): printdensity(9):
+           putchard(10);
+    **++.
+    Evaluated to 0.000000
+
+Based on these simple primitive operations, we can start to define more
+interesting things. For example, here's a little function that determines
+the number of iterations it takes for a point in the complex plane to
+diverge:
+
+::
+
+    # Determine whether the specific location diverges.
+    # Solve for z = z^2 + c in the complex plane.
+    def mandelconverger(real imag iters creal cimag)
+      if iters > 255 | (real*real + imag*imag > 4) then
+        iters
+      else
+        mandelconverger(real*real - imag*imag + creal,
+                        2*real*imag + cimag,
+                        iters+1, creal, cimag);
+
+    # Return the number of iterations required for the iteration to escape
+    def mandelconverge(real imag)
+      mandelconverger(real, imag, 0, real, imag);
+
+This "``z = z^2 + c``" function is a beautiful little creature that is
+the basis for computation of the `Mandelbrot
+Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
+``mandelconverge`` function returns the number of iterations that it
+takes for a complex orbit to escape, saturating to 255. This is not a
+very useful function by itself, but if you plot its value over a
+two-dimensional plane, you can see the Mandelbrot set.
Given that we are +limited to using putchard here, our amazing graphical output is limited, +but we can whip together something using the density plotter above: + +:: + + # Compute and plot the mandelbrot set with the specified 2 dimensional range + # info. + def mandelhelp(xmin xmax xstep ymin ymax ystep) + for y = ymin, y < ymax, ystep in ( + (for x = xmin, x < xmax, xstep in + printdensity(mandelconverge(x,y))) + : putchard(10) + ) + + # mandel - This is a convenient helper function for plotting the mandelbrot set + # from the specified position with the specified Magnification. + def mandel(realstart imagstart realmag imagmag) + mandelhelp(realstart, realstart+realmag*78, realmag, + imagstart, imagstart+imagmag*40, imagmag); + +Given this, we can try plotting out the mandelbrot set! Lets try it out: + +:: + + ready> mandel(-2.3, -1.3, 0.05, 0.07); + *******************************+++++++++++************************************* + *************************+++++++++++++++++++++++******************************* + **********************+++++++++++++++++++++++++++++**************************** + *******************+++++++++++++++++++++.. ...++++++++************************* + *****************++++++++++++++++++++++.... ...+++++++++*********************** + ***************+++++++++++++++++++++++..... ...+++++++++********************* + **************+++++++++++++++++++++++.... ....+++++++++******************** + *************++++++++++++++++++++++...... .....++++++++******************* + ************+++++++++++++++++++++....... .......+++++++****************** + ***********+++++++++++++++++++.... ... .+++++++***************** + **********+++++++++++++++++....... .+++++++**************** + *********++++++++++++++........... ...+++++++*************** + ********++++++++++++............ ...++++++++************** + ********++++++++++... .......... .++++++++************** + *******+++++++++..... .+++++++++************* + *******++++++++...... 
..+++++++++************* + *******++++++....... ..+++++++++************* + *******+++++...... ..+++++++++************* + *******.... .... ...+++++++++************* + *******.... . ...+++++++++************* + *******+++++...... ...+++++++++************* + *******++++++....... ..+++++++++************* + *******++++++++...... .+++++++++************* + *******+++++++++..... ..+++++++++************* + ********++++++++++... .......... .++++++++************** + ********++++++++++++............ ...++++++++************** + *********++++++++++++++.......... ...+++++++*************** + **********++++++++++++++++........ .+++++++**************** + **********++++++++++++++++++++.... ... ..+++++++**************** + ***********++++++++++++++++++++++....... .......++++++++***************** + ************+++++++++++++++++++++++...... ......++++++++****************** + **************+++++++++++++++++++++++.... ....++++++++******************** + ***************+++++++++++++++++++++++..... ...+++++++++********************* + *****************++++++++++++++++++++++.... ...++++++++*********************** + *******************+++++++++++++++++++++......++++++++************************* + *********************++++++++++++++++++++++.++++++++*************************** + *************************+++++++++++++++++++++++******************************* + ******************************+++++++++++++************************************ + ******************************************************************************* + ******************************************************************************* + ******************************************************************************* + Evaluated to 0.000000 + ready> mandel(-2, -1, 0.02, 0.04); + **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++ + ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++. 
+ *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++... + *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++..... + ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........ + **************++++++++++++++++++++++++++++++++++++++++++++++++++++++........... + ************+++++++++++++++++++++++++++++++++++++++++++++++++++++.............. + ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ . + **********++++++++++++++++++++++++++++++++++++++++++++++............. + ********+++++++++++++++++++++++++++++++++++++++++++.................. + *******+++++++++++++++++++++++++++++++++++++++....................... + ******+++++++++++++++++++++++++++++++++++........................... + *****++++++++++++++++++++++++++++++++............................ + *****++++++++++++++++++++++++++++............................... + ****++++++++++++++++++++++++++...... ......................... + ***++++++++++++++++++++++++......... ...... ........... + ***++++++++++++++++++++++............ + **+++++++++++++++++++++.............. + **+++++++++++++++++++................ + *++++++++++++++++++................. + *++++++++++++++++............ ... + *++++++++++++++.............. + *+++....++++................ + *.......... ........... + * + *.......... ........... + *+++....++++................ + *++++++++++++++.............. + *++++++++++++++++............ ... + *++++++++++++++++++................. + **+++++++++++++++++++................ + **+++++++++++++++++++++.............. + ***++++++++++++++++++++++............ + ***++++++++++++++++++++++++......... ...... ........... + ****++++++++++++++++++++++++++...... ......................... + *****++++++++++++++++++++++++++++............................... + *****++++++++++++++++++++++++++++++++............................ + ******+++++++++++++++++++++++++++++++++++........................... 
+ *******+++++++++++++++++++++++++++++++++++++++....................... + ********+++++++++++++++++++++++++++++++++++++++++++.................. + Evaluated to 0.000000 + ready> mandel(-0.9, -1.4, 0.02, 0.03); + ******************************************************************************* + ******************************************************************************* + ******************************************************************************* + **********+++++++++++++++++++++************************************************ + *+++++++++++++++++++++++++++++++++++++++*************************************** + +++++++++++++++++++++++++++++++++++++++++++++********************************** + ++++++++++++++++++++++++++++++++++++++++++++++++++***************************** + ++++++++++++++++++++++++++++++++++++++++++++++++++++++************************* + +++++++++++++++++++++++++++++++++++++++++++++++++++++++++********************** + +++++++++++++++++++++++++++++++++.........++++++++++++++++++******************* + +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++**************** + +++++++++++++++++++++++++++++....... ........+++++++++++++++++++************** + ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************ + +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++********** + ++++++++++++++++++++++++++........... ....++++++++++++++++++++++******** + ++++++++++++++++++++++++............. .......++++++++++++++++++++++****** + +++++++++++++++++++++++............. ........+++++++++++++++++++++++**** + ++++++++++++++++++++++........... ..........++++++++++++++++++++++*** + ++++++++++++++++++++........... .........++++++++++++++++++++++* + ++++++++++++++++++............ ...........++++++++++++++++++++ + ++++++++++++++++............... .............++++++++++++++++++ + ++++++++++++++................. ...............++++++++++++++++ + ++++++++++++.................. 
.................++++++++++++++ + +++++++++.................. .................+++++++++++++ + ++++++........ . ......... ..++++++++++++ + ++............ ...... ....++++++++++ + .............. ...++++++++++ + .............. ....+++++++++ + .............. .....++++++++ + ............. ......++++++++ + ........... .......++++++++ + ......... ........+++++++ + ......... ........+++++++ + ......... ....+++++++ + ........ ...+++++++ + ....... ...+++++++ + ....+++++++ + .....+++++++ + ....+++++++ + ....+++++++ + ....+++++++ + Evaluated to 0.000000 + ready> ^D + +At this point, you may be starting to realize that Kaleidoscope is a +real and powerful language. It may not be self-similar :), but it can be +used to plot things that are! + +With this, we conclude the "adding user-defined operators" chapter of +the tutorial. We have successfully augmented our language, adding the +ability to extend the language in the library, and we have shown how +this can be used to build a simple but interesting end-user application +in Kaleidoscope. At this point, Kaleidoscope can build a variety of +applications that are functional and can call functions with +side-effects, but it can't actually define and mutate a variable itself. + +Strikingly, variable mutation is an important feature of some languages, +and it is not at all obvious how to `add support for mutable +variables <LangImpl7.html>`_ without having to add an "SSA construction" +phase to your front-end. In the next chapter, we will describe how you +can add variable mutation without building SSA in your front-end. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +the if/then/else and for expressions.. To build this example, use: + +.. 
code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy + # Run + ./toy + +On some platforms, you will need to specify -rdynamic or +-Wl,--export-dynamic when linking. This ensures that symbols defined in +the main executable are exported to the dynamic linker and so are +available for symbol resolution at run time. This is not needed if you +compile your support code into a shared library, although doing that +will cause problems on Windows. + +Here is the code: + +.. literalinclude:: ../../examples/Kaleidoscope/Chapter6/toy.cpp + :language: c++ + +`Next: Extending the language: mutable variables / SSA +construction <LangImpl07.html>`_ + diff --git a/gnu/llvm/docs/tutorial/LangImpl07.rst b/gnu/llvm/docs/tutorial/LangImpl07.rst new file mode 100644 index 00000000000..4d86ecad38a --- /dev/null +++ b/gnu/llvm/docs/tutorial/LangImpl07.rst @@ -0,0 +1,881 @@ +======================================================= +Kaleidoscope: Extending the Language: Mutable Variables +======================================================= + +.. contents:: + :local: + +Chapter 7 Introduction +====================== + +Welcome to Chapter 7 of the "`Implementing a language with +LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a +very respectable, albeit simple, `functional programming +language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our +journey, we learned some parsing techniques, how to build and represent +an AST, how to build LLVM IR, and how to optimize the resultant code as +well as JIT compile it. + +While Kaleidoscope is interesting as a functional language, the fact +that it is functional makes it "too easy" to generate LLVM IR for it. In +particular, a functional language makes it very easy to build LLVM IR +directly in `SSA +form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. 
+Since LLVM requires that the input code be in SSA form, this is a very +nice property and it is often unclear to newcomers how to generate code +for an imperative language with mutable variables. + +The short (and happy) summary of this chapter is that there is no need +for your front-end to build SSA form: LLVM provides highly tuned and +well tested support for this, though the way it works is a bit +unexpected for some. + +Why is this a hard problem? +=========================== + +To understand why mutable variables cause complexities in SSA +construction, consider this extremely simple C example: + +.. code-block:: c + + int G, H; + int test(_Bool Condition) { + int X; + if (Condition) + X = G; + else + X = H; + return X; + } + +In this case, we have the variable "X", whose value depends on the path +executed in the program. Because there are two different possible values +for X before the return instruction, a PHI node is inserted to merge the +two values. The LLVM IR that we want for this example looks like this: + +.. code-block:: llvm + + @G = weak global i32 0 ; type of @G is i32* + @H = weak global i32 0 ; type of @H is i32* + + define i32 @test(i1 %Condition) { + entry: + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + br label %cond_next + + cond_false: + %X.1 = load i32* @H + br label %cond_next + + cond_next: + %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] + ret i32 %X.2 + } + +In this example, the loads from the G and H global variables are +explicit in the LLVM IR, and they live in the then/else branches of the +if statement (cond\_true/cond\_false). In order to merge the incoming +values, the X.2 phi node in the cond\_next block selects the right value +to use based on where control flow is coming from: if control flow comes +from the cond\_false block, X.2 gets the value of X.1. Alternatively, if +control flow comes from cond\_true, it gets the value of X.0. 
The intent +of this chapter is not to explain the details of SSA form. For more +information, see one of the many `online +references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_. + +The question for this article is "who places the phi nodes when lowering +assignments to mutable variables?". The issue here is that LLVM +*requires* that its IR be in SSA form: there is no "non-ssa" mode for +it. However, SSA construction requires non-trivial algorithms and data +structures, so it is inconvenient and wasteful for every front-end to +have to reproduce this logic. + +Memory in LLVM +============== + +The 'trick' here is that while LLVM does require all register values to +be in SSA form, it does not require (or permit) memory objects to be in +SSA form. In the example above, note that the loads from G and H are +direct accesses to G and H: they are not renamed or versioned. This +differs from some other compiler systems, which do try to version memory +objects. In LLVM, instead of encoding dataflow analysis of memory into +the LLVM IR, it is handled with `Analysis +Passes <../WritingAnLLVMPass.html>`_ which are computed on demand. + +With this in mind, the high-level idea is that we want to make a stack +variable (which lives in memory, because it is on the stack) for each +mutable object in a function. To take advantage of this trick, we need +to talk about how LLVM represents stack variables. + +In LLVM, all memory accesses are explicit with load/store instructions, +and it is carefully designed not to have (or need) an "address-of" +operator. Notice how the type of the @G/@H global variables is actually +"i32\*" even though the variable is defined as "i32". What this means is +that @G defines *space* for an i32 in the global data area, but its +*name* actually refers to the address for that space. 
Stack variables +work the same way, except that instead of being declared with global +variable definitions, they are declared with the `LLVM alloca +instruction <../LangRef.html#alloca-instruction>`_: + +.. code-block:: llvm + + define i32 @example() { + entry: + %X = alloca i32 ; type of %X is i32*. + ... + %tmp = load i32* %X ; load the stack value %X from the stack. + %tmp2 = add i32 %tmp, 1 ; increment it + store i32 %tmp2, i32* %X ; store it back + ... + +This code shows an example of how you can declare and manipulate a stack +variable in the LLVM IR. Stack memory allocated with the alloca +instruction is fully general: you can pass the address of the stack slot +to functions, you can store it in other variables, etc. In our example +above, we could rewrite the example to use the alloca technique to avoid +using a PHI node: + +.. code-block:: llvm + + @G = weak global i32 0 ; type of @G is i32* + @H = weak global i32 0 ; type of @H is i32* + + define i32 @test(i1 %Condition) { + entry: + %X = alloca i32 ; type of %X is i32*. + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + store i32 %X.0, i32* %X ; Update X + br label %cond_next + + cond_false: + %X.1 = load i32* @H + store i32 %X.1, i32* %X ; Update X + br label %cond_next + + cond_next: + %X.2 = load i32* %X ; Read X + ret i32 %X.2 + } + +With this, we have discovered a way to handle arbitrary mutable +variables without the need to create Phi nodes at all: + +#. Each mutable variable becomes a stack allocation. +#. Each read of the variable becomes a load from the stack. +#. Each update of the variable becomes a store to the stack. +#. Taking the address of a variable just uses the stack address + directly. + +While this solution has solved our immediate problem, it introduced +another one: we have now apparently introduced a lot of stack traffic +for very simple and common operations, a major performance problem. 
+Fortunately for us, the LLVM optimizer has a highly-tuned optimization +pass named "mem2reg" that handles this case, promoting allocas like this +into SSA registers, inserting Phi nodes as appropriate. If you run this +example through the pass, for example, you'll get: + +.. code-block:: bash + + $ llvm-as < example.ll | opt -mem2reg | llvm-dis + @G = weak global i32 0 + @H = weak global i32 0 + + define i32 @test(i1 %Condition) { + entry: + br i1 %Condition, label %cond_true, label %cond_false + + cond_true: + %X.0 = load i32* @G + br label %cond_next + + cond_false: + %X.1 = load i32* @H + br label %cond_next + + cond_next: + %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ] + ret i32 %X.01 + } + +The mem2reg pass implements the standard "iterated dominance frontier" +algorithm for constructing SSA form and has a number of optimizations +that speed up (very common) degenerate cases. The mem2reg optimization +pass is the answer to dealing with mutable variables, and we highly +recommend that you depend on it. Note that mem2reg only works on +variables in certain circumstances: + +#. mem2reg is alloca-driven: it looks for allocas and if it can handle + them, it promotes them. It does not apply to global variables or heap + allocations. +#. mem2reg only looks for alloca instructions in the entry block of the + function. Being in the entry block guarantees that the alloca is only + executed once, which makes analysis simpler. +#. mem2reg only promotes allocas whose uses are direct loads and stores. + If the address of the stack object is passed to a function, or if any + funny pointer arithmetic is involved, the alloca will not be + promoted. +#. mem2reg only works on allocas of `first + class <../LangRef.html#first-class-types>`_ values (such as pointers, + scalars and vectors), and only if the array size of the allocation is + 1 (or missing in the .ll file). mem2reg is not capable of promoting + structs or arrays to registers. 
Note that the "sroa" pass is + more powerful and can promote structs, "unions", and arrays in many + cases. + +All of these properties are easy to satisfy for most imperative +languages, and we'll illustrate it below with Kaleidoscope. The final +question you may be asking is: should I bother with this nonsense for my +front-end? Wouldn't it be better if I just did SSA construction +directly, avoiding use of the mem2reg optimization pass? In short, we +strongly recommend that you use this technique for building SSA form, +unless there is an extremely good reason not to. Using this technique +is: + +- Proven and well tested: clang uses this technique + for local mutable variables. As such, the most common clients of LLVM + are using this to handle a bulk of their variables. You can be sure + that bugs are found fast and fixed early. +- Extremely Fast: mem2reg has a number of special cases that make it + fast in common cases as well as fully general. For example, it has + fast-paths for variables that are only used in a single block, + variables that only have one assignment point, good heuristics to + avoid insertion of unneeded phi nodes, etc. +- Needed for debug info generation: `Debug information in + LLVM <../SourceLevelDebugging.html>`_ relies on having the address of + the variable exposed so that debug info can be attached to it. This + technique dovetails very naturally with this style of debug info. + +If nothing else, this makes it much easier to get your front-end up and +running, and is very simple to implement. Let's extend Kaleidoscope with +mutable variables now! + +Mutable Variables in Kaleidoscope +================================= + +Now that we know the sort of problem we want to tackle, let's see what +this looks like in the context of our little Kaleidoscope language. +We're going to add two features: + +#. The ability to mutate variables with the '=' operator. +#. The ability to define new variables. 
+ +While the first item is really what this is about, we only have +variables for incoming arguments as well as for induction variables, and +redefining those only goes so far :). Also, the ability to define new +variables is a useful thing regardless of whether you will be mutating +them. Here's a motivating example that shows how we could use these: + +:: + + # Define ':' for sequencing: as a low-precedence operator that ignores operands + # and just returns the RHS. + def binary : 1 (x y) y; + + # Recursive fib, we could do this before. + def fib(x) + if (x < 3) then + 1 + else + fib(x-1)+fib(x-2); + + # Iterative fib. + def fibi(x) + var a = 1, b = 1, c in + (for i = 3, i < x in + c = a + b : + a = b : + b = c) : + b; + + # Call it. + fibi(10); + +In order to mutate variables, we have to change our existing variables +to use the "alloca trick". Once we have that, we'll add our new +operator, then extend Kaleidoscope to support new variable definitions. + +Adjusting Existing Variables for Mutation +========================================= + +The symbol table in Kaleidoscope is managed at code generation time by +the '``NamedValues``' map. This map currently keeps track of the LLVM +"Value\*" that holds the double value for the named variable. In order +to support mutation, we need to change this slightly, so that +``NamedValues`` holds the *memory location* of the variable in question. +Note that this change is a refactoring: it changes the structure of the +code, but does not (by itself) change the behavior of the compiler. All +of these changes are isolated in the Kaleidoscope code generator. + +At this point in Kaleidoscope's development, it only supports variables +for two things: incoming arguments to functions and the induction +variable of 'for' loops. For consistency, we'll allow mutation of these +variables in addition to other user-defined variables. This means that +these will both need memory locations. 
+ +To start our transformation of Kaleidoscope, we'll change the +NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once +we do this, the C++ compiler will tell us what parts of the code we need +to update: + +.. code-block:: c++ + + static std::map<std::string, AllocaInst*> NamedValues; + +Also, since we will need to create these alloca's, we'll use a helper +function that ensures that the allocas are created in the entry block of +the function: + +.. code-block:: c++ + + /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of + /// the function. This is used for mutable variables etc. + static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction, + const std::string &VarName) { + IRBuilder<> TmpB(&TheFunction->getEntryBlock(), + TheFunction->getEntryBlock().begin()); + return TmpB.CreateAlloca(Type::getDoubleTy(LLVMContext), 0, + VarName.c_str()); + } + +This funny looking code creates an IRBuilder object that is pointing at +the first instruction (.begin()) of the entry block. It then creates an +alloca with the expected name and returns it. Because all values in +Kaleidoscope are doubles, there is no need to pass in a type to use. + +With this in place, the first functionality change we want to make is to +variable references. In our new scheme, variables live on the stack, so +code generating a reference to them actually needs to produce a load +from the stack slot: + +.. code-block:: c++ + + Value *VariableExprAST::codegen() { + // Look this variable up in the function. + Value *V = NamedValues[Name]; + if (!V) + return LogErrorV("Unknown variable name"); + + // Load the value. + return Builder.CreateLoad(V, Name.c_str()); + } + +As you can see, this is pretty straightforward. Now we need to update +the things that define the variables to set up the alloca. We'll start +with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for +the unabridged code): + +.. 
code-block:: c++ + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Create an alloca for the variable in the entry block. + AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); + + // Emit the start code first, without 'variable' in scope. + Value *StartVal = Start->codegen(); + if (!StartVal) + return nullptr; + + // Store the value into the alloca. + Builder.CreateStore(StartVal, Alloca); + ... + + // Compute the end condition. + Value *EndCond = End->codegen(); + if (!EndCond) + return nullptr; + + // Reload, increment, and restore the alloca. This handles the case where + // the body of the loop mutates the variable. + Value *CurVar = Builder.CreateLoad(Alloca); + Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar"); + Builder.CreateStore(NextVar, Alloca); + ... + +This code is virtually identical to the code `before we allowed mutable +variables <LangImpl5.html#code-generation-for-the-for-loop>`_. The big difference is that we +no longer have to construct a PHI node, and we use load/store to access +the variable as needed. + +To support mutable argument variables, we need to also make allocas for +them. The code for this is also pretty simple: + +.. code-block:: c++ + + /// CreateArgumentAllocas - Create an alloca for each argument and register the + /// argument in the symbol table so that references to it will succeed. + void PrototypeAST::CreateArgumentAllocas(Function *F) { + Function::arg_iterator AI = F->arg_begin(); + for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) { + // Create an alloca for this variable. + AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]); + + // Store the initial value into the alloca. + Builder.CreateStore(AI, Alloca); + + // Add arguments to variable symbol table. 
+ NamedValues[Args[Idx]] = Alloca; + } + } + +For each argument, we make an alloca, store the input value to the +function into the alloca, and register the alloca as the memory location +for the argument. This method gets invoked by ``FunctionAST::codegen()`` +right after it sets up the entry block for the function. + +The final missing piece is adding the mem2reg pass, which allows us to +get good codegen once again: + +.. code-block:: c++ + + // Set up the optimizer pipeline. Start with registering info about how the + // target lays out data structures. + OurFPM.add(new DataLayout(*TheExecutionEngine->getDataLayout())); + // Promote allocas to registers. + OurFPM.add(createPromoteMemoryToRegisterPass()); + // Do simple "peephole" optimizations and bit-twiddling optzns. + OurFPM.add(createInstructionCombiningPass()); + // Reassociate expressions. + OurFPM.add(createReassociatePass()); + +It is interesting to see what the code looks like before and after the +mem2reg optimization runs. For example, this is the before/after code +for our recursive fib function. Before the optimization: + +.. 
code-block:: llvm + + define double @fib(double %x) { + entry: + %x1 = alloca double + store double %x, double* %x1 + %x2 = load double* %x1 + %cmptmp = fcmp ult double %x2, 3.000000e+00 + %booltmp = uitofp i1 %cmptmp to double + %ifcond = fcmp one double %booltmp, 0.000000e+00 + br i1 %ifcond, label %then, label %else + + then: ; preds = %entry + br label %ifcont + + else: ; preds = %entry + %x3 = load double* %x1 + %subtmp = fsub double %x3, 1.000000e+00 + %calltmp = call double @fib(double %subtmp) + %x4 = load double* %x1 + %subtmp5 = fsub double %x4, 2.000000e+00 + %calltmp6 = call double @fib(double %subtmp5) + %addtmp = fadd double %calltmp, %calltmp6 + br label %ifcont + + ifcont: ; preds = %else, %then + %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ] + ret double %iftmp + } + +Here there is only one variable (x, the input argument) but you can +still see the extremely simple-minded code generation strategy we are +using. In the entry block, an alloca is created, and the initial input +value is stored into it. Each reference to the variable does a reload +from the stack. Also, note that we didn't modify the if/then/else +expression, so it still inserts a PHI node. While we could make an +alloca for it, it is actually easier to create a PHI node for it, so we +still just make the PHI. + +Here is the code after the mem2reg pass runs: + +.. 
code-block:: llvm
+
+    define double @fib(double %x) {
+    entry:
+      %cmptmp = fcmp ult double %x, 3.000000e+00
+      %booltmp = uitofp i1 %cmptmp to double
+      %ifcond = fcmp one double %booltmp, 0.000000e+00
+      br i1 %ifcond, label %then, label %else
+
+    then:
+      br label %ifcont
+
+    else:
+      %subtmp = fsub double %x, 1.000000e+00
+      %calltmp = call double @fib(double %subtmp)
+      %subtmp5 = fsub double %x, 2.000000e+00
+      %calltmp6 = call double @fib(double %subtmp5)
+      %addtmp = fadd double %calltmp, %calltmp6
+      br label %ifcont
+
+    ifcont:         ; preds = %else, %then
+      %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
+      ret double %iftmp
+    }
+
+This is a trivial case for mem2reg, since there are no redefinitions of
+the variable. The point of showing this is to calm your tension about
+inserting such blatant inefficiencies :).
+
+After the rest of the optimizers run, we get:
+
+.. code-block:: llvm
+
+    define double @fib(double %x) {
+    entry:
+      %cmptmp = fcmp ult double %x, 3.000000e+00
+      %booltmp = uitofp i1 %cmptmp to double
+      %ifcond = fcmp ueq double %booltmp, 0.000000e+00
+      br i1 %ifcond, label %else, label %ifcont
+
+    else:
+      %subtmp = fsub double %x, 1.000000e+00
+      %calltmp = call double @fib(double %subtmp)
+      %subtmp5 = fsub double %x, 2.000000e+00
+      %calltmp6 = call double @fib(double %subtmp5)
+      %addtmp = fadd double %calltmp, %calltmp6
+      ret double %addtmp
+
+    ifcont:
+      ret double 1.000000e+00
+    }
+
+Here we see that the simplifycfg pass decided to clone the return
+instruction into the end of the 'else' block. This allowed it to
+eliminate some branches and the PHI node.
+
+Now that all symbol table references are updated to use stack variables,
+we'll add the assignment operator.
+
+New Assignment Operator
+=======================
+
+With our current framework, adding a new assignment operator is really
+simple. We will parse it just like any other binary operator, but handle
+it internally (instead of allowing the user to define it).
The first +step is to set a precedence: + +.. code-block:: c++ + + int main() { + // Install standard binary operators. + // 1 is lowest precedence. + BinopPrecedence['='] = 2; + BinopPrecedence['<'] = 10; + BinopPrecedence['+'] = 20; + BinopPrecedence['-'] = 20; + +Now that the parser knows the precedence of the binary operator, it +takes care of all the parsing and AST generation. We just need to +implement codegen for the assignment operator. This looks like: + +.. code-block:: c++ + + Value *BinaryExprAST::codegen() { + // Special case '=' because we don't want to emit the LHS as an expression. + if (Op == '=') { + // Assignment requires the LHS to be an identifier. + VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get()); + if (!LHSE) + return LogErrorV("destination of '=' must be a variable"); + +Unlike the rest of the binary operators, our assignment operator doesn't +follow the "emit LHS, emit RHS, do computation" model. As such, it is +handled as a special case before the other binary operators are handled. +The other strange thing is that it requires the LHS to be a variable. It +is invalid to have "(x+1) = expr" - only things like "x = expr" are +allowed. + +.. code-block:: c++ + + // Codegen the RHS. + Value *Val = RHS->codegen(); + if (!Val) + return nullptr; + + // Look up the name. + Value *Variable = NamedValues[LHSE->getName()]; + if (!Variable) + return LogErrorV("Unknown variable name"); + + Builder.CreateStore(Val, Variable); + return Val; + } + ... + +Once we have the variable, codegen'ing the assignment is +straightforward: we emit the RHS of the assignment, create a store, and +return the computed value. Returning a value allows for chained +assignments like "X = (Y = Z)". + +Now that we have an assignment operator, we can mutate loop variables +and arguments. For example, we can now run code like this: + +:: + + # Function to print a double. 
+    extern printd(x);
+
+    # Define ':' for sequencing: as a low-precedence operator that ignores operands
+    # and just returns the RHS.
+    def binary : 1 (x y) y;
+
+    def test(x)
+      printd(x) :
+      x = 4 :
+      printd(x);
+
+    test(123);
+
+When run, this example prints "123" and then "4", showing that we did
+actually mutate the value! Okay, we have now officially implemented our
+goal: getting this to work requires SSA construction in the general
+case. However, to be really useful, we want the ability to define our
+own local variables; let's add this next!
+
+User-defined Local Variables
+============================
+
+Adding var/in is just like any other extension we made to
+Kaleidoscope: we extend the lexer, the parser, the AST and the code
+generator. The first step for adding our new 'var/in' construct is to
+extend the lexer. As before, this is pretty trivial; the code looks like
+this:
+
+.. code-block:: c++
+
+    enum Token {
+      ...
+      // var definition
+      tok_var = -13
+      ...
+    }
+    ...
+    static int gettok() {
+    ...
+        if (IdentifierStr == "in")
+          return tok_in;
+        if (IdentifierStr == "binary")
+          return tok_binary;
+        if (IdentifierStr == "unary")
+          return tok_unary;
+        if (IdentifierStr == "var")
+          return tok_var;
+        return tok_identifier;
+    ...
+
+The next step is to define the AST node that we will construct. For
+var/in, it looks like this:
+
+.. code-block:: c++
+
+    /// VarExprAST - Expression class for var/in
+    class VarExprAST : public ExprAST {
+      std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
+      std::unique_ptr<ExprAST> Body;
+
+    public:
+      VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,
+                 std::unique_ptr<ExprAST> Body)
+          : VarNames(std::move(VarNames)), Body(std::move(Body)) {}
+
+      virtual Value *codegen();
+    };
+
+var/in allows a list of names to be defined all at once, and each name
+can optionally have an initializer value. As such, we capture this
+information in the VarNames vector.
Also, var/in has a body, this body +is allowed to access the variables defined by the var/in. + +With this in place, we can define the parser pieces. The first thing we +do is add it as a primary expression: + +.. code-block:: c++ + + /// primary + /// ::= identifierexpr + /// ::= numberexpr + /// ::= parenexpr + /// ::= ifexpr + /// ::= forexpr + /// ::= varexpr + static std::unique_ptr<ExprAST> ParsePrimary() { + switch (CurTok) { + default: + return LogError("unknown token when expecting an expression"); + case tok_identifier: + return ParseIdentifierExpr(); + case tok_number: + return ParseNumberExpr(); + case '(': + return ParseParenExpr(); + case tok_if: + return ParseIfExpr(); + case tok_for: + return ParseForExpr(); + case tok_var: + return ParseVarExpr(); + } + } + +Next we define ParseVarExpr: + +.. code-block:: c++ + + /// varexpr ::= 'var' identifier ('=' expression)? + // (',' identifier ('=' expression)?)* 'in' expression + static std::unique_ptr<ExprAST> ParseVarExpr() { + getNextToken(); // eat the var. + + std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames; + + // At least one variable name is required. + if (CurTok != tok_identifier) + return LogError("expected identifier after var"); + +The first part of this code parses the list of identifier/expr pairs +into the local ``VarNames`` vector. + +.. code-block:: c++ + + while (1) { + std::string Name = IdentifierStr; + getNextToken(); // eat identifier. + + // Read the optional initializer. + std::unique_ptr<ExprAST> Init; + if (CurTok == '=') { + getNextToken(); // eat the '='. + + Init = ParseExpression(); + if (!Init) return nullptr; + } + + VarNames.push_back(std::make_pair(Name, std::move(Init))); + + // End of var list, exit loop. + if (CurTok != ',') break; + getNextToken(); // eat the ','. 
+ + if (CurTok != tok_identifier) + return LogError("expected identifier list after var"); + } + +Once all the variables are parsed, we then parse the body and create the +AST node: + +.. code-block:: c++ + + // At this point, we have to have 'in'. + if (CurTok != tok_in) + return LogError("expected 'in' keyword after 'var'"); + getNextToken(); // eat 'in'. + + auto Body = ParseExpression(); + if (!Body) + return nullptr; + + return llvm::make_unique<VarExprAST>(std::move(VarNames), + std::move(Body)); + } + +Now that we can parse and represent the code, we need to support +emission of LLVM IR for it. This code starts out with: + +.. code-block:: c++ + + Value *VarExprAST::codegen() { + std::vector<AllocaInst *> OldBindings; + + Function *TheFunction = Builder.GetInsertBlock()->getParent(); + + // Register all variables and emit their initializer. + for (unsigned i = 0, e = VarNames.size(); i != e; ++i) { + const std::string &VarName = VarNames[i].first; + ExprAST *Init = VarNames[i].second.get(); + +Basically it loops over all the variables, installing them one at a +time. For each variable we put into the symbol table, we remember the +previous value that we replace in OldBindings. + +.. code-block:: c++ + + // Emit the initializer before adding the variable to scope, this prevents + // the initializer from referencing the variable itself, and permits stuff + // like this: + // var a = 1 in + // var a = a in ... # refers to outer 'a'. + Value *InitVal; + if (Init) { + InitVal = Init->codegen(); + if (!InitVal) + return nullptr; + } else { // If not specified, use 0.0. + InitVal = ConstantFP::get(LLVMContext, APFloat(0.0)); + } + + AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName); + Builder.CreateStore(InitVal, Alloca); + + // Remember the old variable binding so that we can restore the binding when + // we unrecurse. + OldBindings.push_back(NamedValues[VarName]); + + // Remember this binding. 
+ NamedValues[VarName] = Alloca; + } + +There are more comments here than code. The basic idea is that we emit +the initializer, create the alloca, then update the symbol table to +point to it. Once all the variables are installed in the symbol table, +we evaluate the body of the var/in expression: + +.. code-block:: c++ + + // Codegen the body, now that all vars are in scope. + Value *BodyVal = Body->codegen(); + if (!BodyVal) + return nullptr; + +Finally, before returning, we restore the previous variable bindings: + +.. code-block:: c++ + + // Pop all our variables from scope. + for (unsigned i = 0, e = VarNames.size(); i != e; ++i) + NamedValues[VarNames[i].first] = OldBindings[i]; + + // Return the body computation. + return BodyVal; + } + +The end result of all of this is that we get properly scoped variable +definitions, and we even (trivially) allow mutation of them :). + +With this, we completed what we set out to do. Our nice iterative fib +example from the intro compiles and runs just fine. The mem2reg pass +optimizes all of our stack variables into SSA registers, inserting PHI +nodes where needed, and our front-end remains simple: no "iterated +dominance frontier" computation anywhere in sight. + +Full Code Listing +================= + +Here is the complete code listing for our running example, enhanced with +mutable variables and var/in support. To build this example, use: + +.. code-block:: bash + + # Compile + clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy + # Run + ./toy + +Here is the code: + +.. 
literalinclude:: ../../examples/Kaleidoscope/Chapter7/toy.cpp
   :language: c++

`Next: Compiling to Object Code <LangImpl08.html>`_

diff --git a/gnu/llvm/docs/tutorial/LangImpl08.rst b/gnu/llvm/docs/tutorial/LangImpl08.rst
new file mode 100644
index 00000000000..96eccaebd32
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/LangImpl08.rst
@@ -0,0 +1,218 @@
========================================
 Kaleidoscope: Compiling to Object Code
========================================

.. contents::
   :local:

Chapter 8 Introduction
======================

Welcome to Chapter 8 of the "`Implementing a language with LLVM
<index.html>`_" tutorial. This chapter describes how to compile our
language down to object files.

Choosing a target
=================

LLVM has native support for cross-compilation. You can compile to the
architecture of your current machine, or just as easily compile for
other architectures. In this tutorial, we'll target the current
machine.

To specify the architecture that you want to target, we use a string
called a "target triple". This takes the form
``<arch><sub>-<vendor>-<sys>-<abi>`` (see the `cross compilation docs
<http://clang.llvm.org/docs/CrossCompilation.html#target-triple>`_).

As an example, we can see what clang thinks is our current target
triple:

::

    $ clang --version | grep Target
    Target: x86_64-unknown-linux-gnu

Running this command may show something different on your machine, as
you might be using a different architecture or operating system from
mine.

Fortunately, we don't need to hard-code a target triple to target the
current machine. LLVM provides ``sys::getDefaultTargetTriple``, which
returns the target triple of the current machine.

.. code-block:: c++

    auto TargetTriple = sys::getDefaultTargetTriple();

LLVM doesn't require us to link in all the target
functionality. For example, if we're just using the JIT, we don't need
the assembly printers. Similarly, if we're only targeting certain
architectures, we can link in only the functionality for those
architectures.

For this example, we'll initialize all the targets for emitting object
code.

.. code-block:: c++

    InitializeAllTargetInfos();
    InitializeAllTargets();
    InitializeAllTargetMCs();
    InitializeAllAsmParsers();
    InitializeAllAsmPrinters();

We can now use our target triple to get a ``Target``:

.. code-block:: c++

    std::string Error;
    auto Target = TargetRegistry::lookupTarget(TargetTriple, Error);

    // Print an error and exit if we couldn't find the requested target.
    // This generally occurs if we've forgotten to initialise the
    // TargetRegistry or we have a bogus target triple.
    if (!Target) {
      errs() << Error;
      return 1;
    }

Target Machine
==============

We will also need a ``TargetMachine``. This class provides a complete
machine description of the machine we're targeting. If we want to
target a specific feature (such as SSE) or a specific CPU (such as
Intel's Skylake), we do so now.

To see which features and CPUs LLVM knows about, we can use
``llc``. For example, let's look at x86:

::

    $ llvm-as < /dev/null | llc -march=x86 -mattr=help
    Available CPUs for this target:

      amdfam10 - Select the amdfam10 processor.
      athlon - Select the athlon processor.
      athlon-4 - Select the athlon-4 processor.
      ...

    Available features for this target:

      16bit-mode - 16-bit mode (i8086).
      32bit-mode - 32-bit mode (80386).
      3dnow - Enable 3DNow! instructions.
      3dnowa - Enable 3DNow! Athlon instructions.
      ...

For our example, we'll use the generic CPU without any additional
features, options or relocation model.

.. code-block:: c++

    auto CPU = "generic";
    auto Features = "";

    TargetOptions opt;
    auto RM = Optional<Reloc::Model>();
    auto TargetMachine = Target->createTargetMachine(TargetTriple, CPU, Features, opt, RM);


Configuring the Module
======================

We're now ready to configure our module to specify the target and
data layout. This isn't strictly necessary, but the `frontend
performance guide <../Frontend/PerformanceTips.html>`_ recommends
this. Optimizations benefit from knowing about the target and data
layout.

.. code-block:: c++

    TheModule->setDataLayout(TargetMachine->createDataLayout());
    TheModule->setTargetTriple(TargetTriple);

Emit Object Code
================

We're ready to emit object code! Let's define where we want to write
our file to:

.. code-block:: c++

    auto Filename = "output.o";
    std::error_code EC;
    raw_fd_ostream dest(Filename, EC, sys::fs::F_None);

    if (EC) {
      errs() << "Could not open file: " << EC.message();
      return 1;
    }

Finally, we define a pass that emits object code, then we run that
pass:

.. code-block:: c++

    legacy::PassManager pass;
    auto FileType = TargetMachine::CGFT_ObjectFile;

    if (TargetMachine->addPassesToEmitFile(pass, dest, FileType)) {
      errs() << "TargetMachine can't emit a file of this type";
      return 1;
    }

    pass.run(*TheModule);
    dest.flush();

Putting It All Together
=======================

Does it work? Let's give it a try. We need to compile our code, but
note that the arguments to ``llvm-config`` are different from those in
the previous chapters.

::

    $ clang++ -g -O3 toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs all` -o toy

Let's run it, and define a simple ``average`` function. Press Ctrl-D
when you're done.

::

    $ ./toy
    ready> def average(x y) (x + y) * 0.5;
    ^D
    Wrote output.o

We have an object file! To test it, let's write a simple program and
link it with our output. Here's the source code:

.. code-block:: c++

    #include <iostream>

    extern "C" {
        double average(double, double);
    }

    int main() {
        std::cout << "average of 3.0 and 4.0: " << average(3.0, 4.0) << std::endl;
    }

We link our program to output.o and check the result is what we
expected:

::

    $ clang++ main.cpp output.o -o main
    $ ./main
    average of 3.0 and 4.0: 3.5

Full Code Listing
=================

.. literalinclude:: ../../examples/Kaleidoscope/Chapter8/toy.cpp
   :language: c++

`Next: Adding Debug Information <LangImpl09.html>`_

diff --git a/gnu/llvm/docs/tutorial/LangImpl09.rst b/gnu/llvm/docs/tutorial/LangImpl09.rst
new file mode 100644
index 00000000000..0053960756d
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/LangImpl09.rst
@@ -0,0 +1,462 @@
======================================
Kaleidoscope: Adding Debug Information
======================================

.. contents::
   :local:

Chapter 9 Introduction
======================

Welcome to Chapter 9 of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In chapters 1 through 8, we've built a
decent little programming language with functions and variables.
What happens if something goes wrong, though? How do you debug your
program?

Source level debugging uses formatted data that helps a debugger
translate from binary and the state of the machine back to the
source that the programmer wrote. In LLVM we generally use a format
called `DWARF <http://dwarfstd.org>`_. DWARF is a compact encoding
that represents types, source locations, and variable locations.

The short summary of this chapter is that we'll go through the
various things you have to add to a programming language to
support debug info, and how you translate that into DWARF.

Caveat: For now we can't debug via the JIT, so we'll need to compile
our program down to something small and standalone. As part of this
we'll make a few modifications to the running of the language and
how programs are compiled. 
This means that we'll have a source file
with a simple program written in Kaleidoscope rather than the
interactive JIT. It does involve a limitation that we can only
have one "top level" command at a time, to reduce the number of
changes necessary.

Here's the sample program we'll be compiling:

.. code-block:: python

    def fib(x)
      if x < 3 then
        1
      else
        fib(x-1)+fib(x-2);

    fib(10)


Why is this a hard problem?
===========================

Debug information is a hard problem for a few different reasons - mostly
centered around optimized code. First, optimization makes keeping source
locations more difficult. In LLVM IR we keep the original source location
for each IR level instruction on the instruction. Optimization passes
should keep the source locations for newly created instructions, but merged
instructions only get to keep a single location - this can cause jumping
around when stepping through optimized programs. Secondly, optimization
can affect variables in ways that make them hard to debug: they can be
optimized out entirely, share memory with other variables, or simply
become difficult to track. For the purposes of this tutorial we're going
to avoid optimization (as you'll see with one of the next sets of
patches).

Ahead-of-Time Compilation Mode
==============================

To highlight only the aspects of adding debug information to a source
language, without needing to worry about the complexities of JIT debugging,
we're going to make a few changes to Kaleidoscope to support compiling
the IR emitted by the front end into a simple standalone program that
you can execute, debug, and see results.

First we make our anonymous function that contains our top level
statement be our "main":

.. code-block:: udiff

    -    auto Proto = llvm::make_unique<PrototypeAST>("", std::vector<std::string>());
    +    auto Proto = llvm::make_unique<PrototypeAST>("main", std::vector<std::string>());

just with the simple change of giving it a name. 
+ +Then we're going to remove the command line code wherever it exists: + +.. code-block:: udiff + + @@ -1129,7 +1129,6 @@ static void HandleTopLevelExpression() { + /// top ::= definition | external | expression | ';' + static void MainLoop() { + while (1) { + - fprintf(stderr, "ready> "); + switch (CurTok) { + case tok_eof: + return; + @@ -1184,7 +1183,6 @@ int main() { + BinopPrecedence['*'] = 40; // highest. + + // Prime the first token. + - fprintf(stderr, "ready> "); + getNextToken(); + +Lastly we're going to disable all of the optimization passes and the JIT so +that the only thing that happens after we're done parsing and generating +code is that the llvm IR goes to standard error: + +.. code-block:: udiff + + @@ -1108,17 +1108,8 @@ static void HandleExtern() { + static void HandleTopLevelExpression() { + // Evaluate a top-level expression into an anonymous function. + if (auto FnAST = ParseTopLevelExpr()) { + - if (auto *FnIR = FnAST->codegen()) { + - // We're just doing this to make sure it executes. + - TheExecutionEngine->finalizeObject(); + - // JIT the function, returning a function pointer. + - void *FPtr = TheExecutionEngine->getPointerToFunction(FnIR); + - + - // Cast it to the right type (takes no arguments, returns a double) so we + - // can call it as a native function. + - double (*FP)() = (double (*)())(intptr_t)FPtr; + - // Ignore the return value for this. + - (void)FP; + + if (!F->codegen()) { + + fprintf(stderr, "Error generating code for top level expr"); + } + } else { + // Skip token for error recovery. + @@ -1439,11 +1459,11 @@ int main() { + // target lays out data structures. + TheModule->setDataLayout(TheExecutionEngine->getDataLayout()); + OurFPM.add(new DataLayoutPass()); + +#if 0 + OurFPM.add(createBasicAliasAnalysisPass()); + // Promote allocas to registers. 
     OurFPM.add(createPromoteMemoryToRegisterPass());
    @@ -1218,7 +1210,7 @@ int main() {
      OurFPM.add(createGVNPass());
      // Simplify the control flow graph (deleting unreachable blocks, etc).
      OurFPM.add(createCFGSimplificationPass());
    -
    + #endif
      OurFPM.doInitialization();

      // Set the global so the code gen can use this.

This relatively small set of changes gets us to the point that we can compile
our piece of Kaleidoscope language down to an executable program via this
command line:

.. code-block:: bash

    Kaleidoscope-Ch9 < fib.ks |& clang -x ir -

which gives an a.out/a.exe in the current working directory.

Compile Unit
============

The top level container for a section of code in DWARF is a compile unit.
This contains the type and function data for an individual translation unit
(read: one file of source code). So the first thing we need to do is
construct one for our fib.ks file.

DWARF Emission Setup
====================

Similar to the ``IRBuilder`` class, we have a
`DIBuilder <http://llvm.org/doxygen/classllvm_1_1DIBuilder.html>`_ class
that helps in constructing debug metadata for an LLVM IR file. It
corresponds 1:1 with ``IRBuilder`` and LLVM IR, but with nicer names.
Using it does require that you be more familiar with DWARF terminology than
you needed to be with ``IRBuilder`` and ``Instruction`` names, but if you
read through the general documentation on the
`Metadata Format <http://llvm.org/docs/SourceLevelDebugging.html>`_ it
should be a little more clear. We'll be using this class to construct all
of our IR level descriptions. Construction for it takes a module, so we
need to construct it shortly after we construct our module. We've left it
as a global static variable to make it a bit easier to use.

Next we're going to create a small container to cache some of our frequent
data. 
The first will be our compile unit, but we'll also write a bit of +code for our one type since we won't have to worry about multiple typed +expressions: + +.. code-block:: c++ + + static DIBuilder *DBuilder; + + struct DebugInfo { + DICompileUnit *TheCU; + DIType *DblTy; + + DIType *getDoubleTy(); + } KSDbgInfo; + + DIType *DebugInfo::getDoubleTy() { + if (DblTy.isValid()) + return DblTy; + + DblTy = DBuilder->createBasicType("double", 64, 64, dwarf::DW_ATE_float); + return DblTy; + } + +And then later on in ``main`` when we're constructing our module: + +.. code-block:: c++ + + DBuilder = new DIBuilder(*TheModule); + + KSDbgInfo.TheCU = DBuilder->createCompileUnit( + dwarf::DW_LANG_C, "fib.ks", ".", "Kaleidoscope Compiler", 0, "", 0); + +There are a couple of things to note here. First, while we're producing a +compile unit for a language called Kaleidoscope we used the language +constant for C. This is because a debugger wouldn't necessarily understand +the calling conventions or default ABI for a language it doesn't recognize +and we follow the C ABI in our llvm code generation so it's the closest +thing to accurate. This ensures we can actually call functions from the +debugger and have them execute. Secondly, you'll see the "fib.ks" in the +call to ``createCompileUnit``. This is a default hard coded value since +we're using shell redirection to put our source into the Kaleidoscope +compiler. In a usual front end you'd have an input file name and it would +go there. + +One last thing as part of emitting debug information via DIBuilder is that +we need to "finalize" the debug information. The reasons are part of the +underlying API for DIBuilder, but make sure you do this near the end of +main: + +.. code-block:: c++ + + DBuilder->finalize(); + +before you dump out the module. + +Functions +========= + +Now that we have our ``Compile Unit`` and our source locations, we can add +function definitions to the debug info. 
So in ``PrototypeAST::codegen()`` we
add a few lines of code to describe a context for our subprogram, in this
case the "File", and the actual definition of the function itself.

So the context:

.. code-block:: c++

    DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                        KSDbgInfo.TheCU->getDirectory());

giving us a DIFile and asking the ``Compile Unit`` we created above for the
directory and filename where we are currently. Then, for now, we use some
source locations of 0 (since our AST doesn't currently have source location
information) and construct our function definition:

.. code-block:: c++

    DIScope *FContext = Unit;
    unsigned LineNo = 0;
    unsigned ScopeLine = 0;
    DISubprogram *SP = DBuilder->createFunction(
        FContext, Name, StringRef(), Unit, LineNo,
        CreateFunctionType(Args.size(), Unit), false /* internal linkage */,
        true /* definition */, ScopeLine, DINode::FlagPrototyped, false);
    F->setSubprogram(SP);

and we now have a DISubprogram that contains a reference to all of our
metadata for the function.

Source Locations
================

The most important thing for debug information is accurate source location -
this makes it possible to map your source code back. We have a problem,
though: Kaleidoscope really doesn't have any source location information in
the lexer or parser, so we'll need to add it.

.. code-block:: c++

    struct SourceLocation {
      int Line;
      int Col;
    };
    static SourceLocation CurLoc;
    static SourceLocation LexLoc = {1, 0};

    static int advance() {
      int LastChar = getchar();

      if (LastChar == '\n' || LastChar == '\r') {
        LexLoc.Line++;
        LexLoc.Col = 0;
      } else
        LexLoc.Col++;
      return LastChar;
    }

In this set of code we've added some functionality on how to keep track of the
line and column of the "source file". As we lex every token we set our
current "lexical location" to the associated line and column for the beginning
of the token. We do this by overriding all of the previous calls to
``getchar()`` with our new ``advance()``, which keeps track of this
information, and then we add a source location to all of our AST classes:

.. code-block:: c++

    class ExprAST {
      SourceLocation Loc;

      public:
        ExprAST(SourceLocation Loc = CurLoc) : Loc(Loc) {}
        virtual ~ExprAST() {}
        virtual Value* codegen() = 0;
        int getLine() const { return Loc.Line; }
        int getCol() const { return Loc.Col; }
        virtual raw_ostream &dump(raw_ostream &out, int ind) {
          return out << ':' << getLine() << ':' << getCol() << '\n';
        }

that we pass down through when we create a new expression:

.. code-block:: c++

    LHS = llvm::make_unique<BinaryExprAST>(BinLoc, BinOp, std::move(LHS),
                                           std::move(RHS));

giving us locations for each of our expressions and variables.

From this we can make sure to tell ``DIBuilder`` when we're at a new source
location so it can use that when we generate the rest of our code and make
sure that each instruction has source location information. We do this
by constructing another small function:

.. code-block:: c++

    void DebugInfo::emitLocation(ExprAST *AST) {
      DIScope *Scope;
      if (LexicalBlocks.empty())
        Scope = TheCU;
      else
        Scope = LexicalBlocks.back();
      Builder.SetCurrentDebugLocation(
          DebugLoc::get(AST->getLine(), AST->getCol(), Scope));
    }

This tells the main ``IRBuilder`` both where we are and what scope
we're in. Since we've just created a function above, we can either be in
the main file scope (like when we created our function), or now we can be
in the function scope we just created. To represent this we create a stack
of scopes:

.. code-block:: c++

    std::vector<DIScope *> LexicalBlocks;
    std::map<const PrototypeAST *, DIScope *> FnScopeMap;

and keep a map of each function to the scope that it represents (a
DISubprogram is also a DIScope).

Then we make sure to:

.. code-block:: c++

    KSDbgInfo.emitLocation(this);

emit the location every time we start to generate code for a new AST, and
also:

.. code-block:: c++

    KSDbgInfo.FnScopeMap[this] = SP;

store the scope (function) when we create it and use it:

.. code-block:: c++

    KSDbgInfo.LexicalBlocks.push_back(KSDbgInfo.FnScopeMap[Proto]);

when we start generating the code for each function.

Also, don't forget to pop the scope back off of your scope stack at the
end of the code generation for the function:

.. code-block:: c++

    // Pop off the lexical block for the function since we added it
    // unconditionally.
    KSDbgInfo.LexicalBlocks.pop_back();

Variables
=========

Now that we have functions, we need to be able to print out the variables
we have in scope. Let's get our function arguments set up so we can get
decent backtraces and see how our functions are being called. It isn't
a lot of code, and we generally handle it when we're creating the
argument allocas in ``PrototypeAST::CreateArgumentAllocas``.

.. code-block:: c++

    DIScope *Scope = KSDbgInfo.LexicalBlocks.back();
    DIFile *Unit = DBuilder->createFile(KSDbgInfo.TheCU->getFilename(),
                                        KSDbgInfo.TheCU->getDirectory());
    DILocalVariable *D = DBuilder->createParameterVariable(
        Scope, Args[Idx], Idx + 1, Unit, Line, KSDbgInfo.getDoubleTy(), true);

    DBuilder->insertDeclare(Alloca, D, DBuilder->createExpression(),
                            DebugLoc::get(Line, 0, Scope),
                            Builder.GetInsertBlock());

Here we're doing a few things. First, we're grabbing our current scope
for the variable so we can say what range of code our variable is valid
through. Second, we're creating the variable, giving it the scope,
the name, source location, type, and since it's an argument, the argument
index. Third, we create an ``llvm.dbg.declare`` call to indicate at the IR
level that we've got a variable in an alloca (and it gives a starting
location for the variable), and we set a source location for the
beginning of the scope on the declare.

One interesting thing to note at this point is that various debuggers have
assumptions based on how code and debug information was generated for them
in the past. In this case we need to do a little bit of a hack to avoid
generating line information for the function prologue, so that the debugger
knows to skip over those instructions when setting a breakpoint. So in
``FunctionAST::CodeGen`` we add a couple of lines:

.. code-block:: c++

    // Unset the location for the prologue emission (leading instructions with no
    // location in a function are considered part of the prologue and the debugger
    // will run past them when breaking on a function)
    KSDbgInfo.emitLocation(nullptr);

and then emit a new location when we actually start generating code for the
body of the function:

.. code-block:: c++

    KSDbgInfo.emitLocation(Body);

With this we have enough debug information to set breakpoints in functions,
print out argument variables, and call functions. Not too bad for just a
few simple lines of code!

Full Code Listing
=================

Here is the complete code listing for our running example, enhanced with
debug information. To build this example, use:

.. code-block:: bash

    # Compile
    clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
    # Run
    ./toy

Here is the code:

.. literalinclude:: ../../examples/Kaleidoscope/Chapter9/toy.cpp
   :language: c++

`Next: Conclusion and other useful LLVM tidbits <LangImpl10.html>`_

diff --git a/gnu/llvm/docs/tutorial/LangImpl10.rst b/gnu/llvm/docs/tutorial/LangImpl10.rst
new file mode 100644
index 00000000000..5799c99402c
--- /dev/null
+++ b/gnu/llvm/docs/tutorial/LangImpl10.rst
@@ -0,0 +1,259 @@
======================================================
Kaleidoscope: Conclusion and other useful LLVM tidbits
======================================================

.. contents::
   :local:

Tutorial Conclusion
===================

Welcome to the final chapter of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
grown our little Kaleidoscope language from being a useless toy to
being a semi-interesting (but probably still useless) toy. :)

It is interesting to see how far we've come, and how little code it has
taken. We built the entire lexer, parser, AST, code generator, and an
interactive run-loop (with a JIT!), and emitted debug information in
standalone executables - all in under 1000 lines of (non-comment/non-blank)
code.

Our little language supports a couple of interesting features: it
supports user defined binary and unary operators, it uses JIT
compilation for immediate evaluation, and it supports a few control flow
constructs with SSA construction.

Part of the idea of this tutorial was to show you how easy and fun it
can be to define, build, and play with languages. Building a compiler
need not be a scary or mystical process! Now that you've seen some of
the basics, I strongly encourage you to take the code and hack on it.
For example, try adding:

- **global variables** - While global variables have questionable value
  in modern software engineering, they are often useful when putting
  together quick little hacks like the Kaleidoscope compiler itself. 
+ Fortunately, our current setup makes it very easy to add global + variables: just have value lookup check to see if an unresolved + variable is in the global variable symbol table before rejecting it. + To create a new global variable, make an instance of the LLVM + ``GlobalVariable`` class. +- **typed variables** - Kaleidoscope currently only supports variables + of type double. This gives the language a very nice elegance, because + only supporting one type means that you never have to specify types. + Different languages have different ways of handling this. The easiest + way is to require the user to specify types for every variable + definition, and record the type of the variable in the symbol table + along with its Value\*. +- **arrays, structs, vectors, etc** - Once you add types, you can start + extending the type system in all sorts of interesting ways. Simple + arrays are very easy and are quite useful for many different + applications. Adding them is mostly an exercise in learning how the + LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction + works: it is so nifty/unconventional, it `has its own + FAQ <../GetElementPtr.html>`_! +- **standard runtime** - Our current language allows the user to access + arbitrary external functions, and we use it for things like "printd" + and "putchard". As you extend the language to add higher-level + constructs, often these constructs make the most sense if they are + lowered to calls into a language-supplied runtime. For example, if + you add hash tables to the language, it would probably make sense to + add the routines to a runtime, instead of inlining them all the way. +- **memory management** - Currently we can only access the stack in + Kaleidoscope. It would also be useful to be able to allocate heap + memory, either with calls to the standard libc malloc/free interface + or with a garbage collector. 
If you would like to use garbage + collection, note that LLVM fully supports `Accurate Garbage + Collection <../GarbageCollection.html>`_ including algorithms that + move objects and need to scan/update the stack. +- **exception handling support** - LLVM supports generation of `zero + cost exceptions <../ExceptionHandling.html>`_ which interoperate with + code compiled in other languages. You could also generate code by + implicitly making every function return an error value and checking + it. You could also make explicit use of setjmp/longjmp. There are + many different ways to go here. +- **object orientation, generics, database access, complex numbers, + geometric programming, ...** - Really, there is no end of crazy + features that you can add to the language. +- **unusual domains** - We've been talking about applying LLVM to a + domain that many people are interested in: building a compiler for a + specific language. However, there are many other domains that can use + compiler technology that are not typically considered. For example, + LLVM has been used to implement OpenGL graphics acceleration, + translate C++ code to ActionScript, and many other cute and clever + things. Maybe you will be the first to JIT compile a regular + expression interpreter into native code with LLVM? + +Have fun - try doing something crazy and unusual. Building a language +like everyone else always has, is much less fun than trying something a +little crazy or off the wall and seeing how it turns out. If you get +stuck or want to talk about it, feel free to email the `llvm-dev mailing +list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_: it has lots +of people who are interested in languages and are often willing to help +out. + +Before we end this tutorial, I want to talk about some "tips and tricks" +for generating LLVM IR. These are some of the more subtle things that +may not be obvious, but are very useful if you want to take advantage of +LLVM's capabilities. 
+ +Properties of the LLVM IR +========================= + +We have a couple of common questions about code in the LLVM IR form - +let's just get these out of the way right now, shall we? + +Target Independence +------------------- + +Kaleidoscope is an example of a "portable language": any program written +in Kaleidoscope will work the same way on any target that it runs on. +Many other languages have this property, e.g. lisp, java, haskell, +javascript, python, etc (note that while these languages are portable, +not all their libraries are). + +One nice aspect of LLVM is that it is often capable of preserving target +independence in the IR: you can take the LLVM IR for a +Kaleidoscope-compiled program and run it on any target that LLVM +supports, even emitting C code and compiling that on targets that LLVM +doesn't support natively. You can trivially tell that the Kaleidoscope +compiler generates target-independent code because it never queries for +any target-specific information when generating code. + +The fact that LLVM provides a compact, target-independent, +representation for code gets a lot of people excited. Unfortunately, +these people are usually thinking about C or a language from the C +family when they are asking questions about language portability. I say +"unfortunately", because there is really no way to make (fully general) +C code portable, other than shipping the source code around (and of +course, C source code is not actually portable in general either - ever +port a really old application from 32- to 64-bits?). + +The problem with C (again, in its full generality) is that it is heavily +laden with target specific assumptions. As one simple example, the +preprocessor often destructively removes target-independence from the +code when it processes the input text: + +.. 
code-block:: c + + #ifdef __i386__ + int X = 1; + #else + int X = 42; + #endif + +While it is possible to engineer more and more complex solutions to +problems like this, it cannot be solved in full generality in a way that +is better than shipping the actual source code. + +That said, there are interesting subsets of C that can be made portable. +If you are willing to fix primitive types to a fixed size (say int = +32-bits, and long = 64-bits), don't care about ABI compatibility with +existing binaries, and are willing to give up some other minor features, +you can have portable code. This can make sense for specialized domains +such as an in-kernel language. + +Safety Guarantees +----------------- + +Many of the languages above are also "safe" languages: it is impossible +for a program written in Java to corrupt its address space and crash the +process (assuming the JVM has no bugs). Safety is an interesting +property that requires a combination of language design, runtime +support, and often operating system support. + +It is certainly possible to implement a safe language in LLVM, but LLVM +IR does not itself guarantee safety. The LLVM IR allows unsafe pointer +casts, use after free bugs, buffer over-runs, and a variety of other +problems. Safety needs to be implemented as a layer on top of LLVM and, +conveniently, several groups have investigated this. Ask on the `llvm-dev +mailing list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ if +you are interested in more details. + +Language-Specific Optimizations +------------------------------- + +One thing about LLVM that turns off many people is that it does not +solve all the world's problems in one system (sorry 'world hunger', +someone else will have to solve you some other day). One specific +complaint is that people perceive LLVM as being incapable of performing +high-level language-specific optimization: LLVM "loses too much +information". 
+ +Unfortunately, this is really not the place to give you a full and +unified version of "Chris Lattner's theory of compiler design". Instead, +I'll make a few observations: + +First, you're right that LLVM does lose information. For example, as of +this writing, there is no way to distinguish in the LLVM IR whether an +SSA-value came from a C "int" or a C "long" on an ILP32 machine (other +than debug info). Both get compiled down to an 'i32' value and the +information about what it came from is lost. The more general issue +here, is that the LLVM type system uses "structural equivalence" instead +of "name equivalence". Another place this surprises people is if you +have two types in a high-level language that have the same structure +(e.g. two different structs that have a single int field): these types +will compile down into a single LLVM type and it will be impossible to +tell what it came from. + +Second, while LLVM does lose information, LLVM is not a fixed target: we +continue to enhance and improve it in many different ways. In addition +to adding new features (LLVM did not always support exceptions or debug +info), we also extend the IR to capture important information for +optimization (e.g. whether an argument is sign or zero extended, +information about pointers aliasing, etc). Many of the enhancements are +user-driven: people want LLVM to include some specific feature, so they +go ahead and extend it. + +Third, it is *possible and easy* to add language-specific optimizations, +and you have a number of choices in how to do it. As one trivial +example, it is easy to add language-specific optimization passes that +"know" things about code compiled for a language. In the case of the C +family, there is an optimization pass that "knows" about the standard C +library functions. If you call "exit(0)" in main(), it knows that it is +safe to optimize that into "return 0;" because C specifies what the +'exit' function does. 
+
+In addition to simple library knowledge, it is possible to embed a
+variety of other language-specific information into the LLVM IR. If you
+have a specific need and run into a wall, please bring the topic up on
+the llvm-dev list. At the very worst, you can always treat LLVM as if it
+were a "dumb code generator" and implement the high-level optimizations
+you desire in your front-end, on the language-specific AST.
+
+Tips and Tricks
+===============
+
+There are a variety of useful tips and tricks that you come to know after
+working on/with LLVM that aren't obvious at first glance. Instead of
+letting everyone rediscover them, this section talks about some of these
+issues.
+
+Implementing portable offsetof/sizeof
+-------------------------------------
+
+One interesting thing that comes up, if you are trying to keep the code
+generated by your compiler "target independent", is that you often need
+to know the size of some LLVM type or the offset of some field in an
+LLVM structure. For example, you might need to pass the size of a type
+into a function that allocates memory.
+
+Unfortunately, this can vary widely across targets: for example the
+width of a pointer is trivially target-specific. However, there is a
+`clever way to use the getelementptr
+instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
+that allows you to compute this in a portable way.
+
+Garbage Collected Stack Frames
+------------------------------
+
+Some languages want to explicitly manage their stack frames, often so
+that they are garbage collected or to allow easy implementation of
+closures. There are often better ways to implement these features than
+explicit stack frames, but `LLVM does support
+them, <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_
+if you want. 
It requires your front-end to convert the code into +`Continuation Passing +Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and +the use of tail calls (which LLVM also supports). + diff --git a/gnu/llvm/docs/tutorial/OCamlLangImpl1.rst b/gnu/llvm/docs/tutorial/OCamlLangImpl1.rst index cf968b5ae89..9de92305a1c 100644 --- a/gnu/llvm/docs/tutorial/OCamlLangImpl1.rst +++ b/gnu/llvm/docs/tutorial/OCamlLangImpl1.rst @@ -106,7 +106,7 @@ support the if/then/else construct, a for loop, user defined operators, JIT compilation with a simple command line interface, etc. Because we want to keep things simple, the only datatype in Kaleidoscope -is a 64-bit floating point type (aka 'float' in O'Caml parlance). As +is a 64-bit floating point type (aka 'float' in OCaml parlance). As such, all values are implicitly double precision and the language doesn't require type declarations. This gives the language a very nice and simple syntax. For example, the following simple example computes diff --git a/gnu/llvm/docs/tutorial/OCamlLangImpl5.rst b/gnu/llvm/docs/tutorial/OCamlLangImpl5.rst index 675b9bc1978..3a135b23337 100644 --- a/gnu/llvm/docs/tutorial/OCamlLangImpl5.rst +++ b/gnu/llvm/docs/tutorial/OCamlLangImpl5.rst @@ -178,7 +178,7 @@ IR into "t.ll" and run "``llvm-as < t.ll | opt -analyze -view-cfg``", `a window will pop up <../ProgrammersManual.html#viewing-graphs-while-debugging-code>`_ and you'll see this graph: -.. figure:: LangImpl5-cfg.png +.. figure:: LangImpl05-cfg.png :align: center :alt: Example CFG diff --git a/gnu/llvm/docs/tutorial/OCamlLangImpl6.rst b/gnu/llvm/docs/tutorial/OCamlLangImpl6.rst index a3ae11fd7e5..2fa25f5c22f 100644 --- a/gnu/llvm/docs/tutorial/OCamlLangImpl6.rst +++ b/gnu/llvm/docs/tutorial/OCamlLangImpl6.rst @@ -496,17 +496,17 @@ converge: # determine whether the specific location diverges. # Solve for z = z^2 + c in the complex plane. 
- def mandleconverger(real imag iters creal cimag) + def mandelconverger(real imag iters creal cimag) if iters > 255 | (real*real + imag*imag > 4) then iters else - mandleconverger(real*real - imag*imag + creal, + mandelconverger(real*real - imag*imag + creal, 2*real*imag + cimag, iters+1, creal, cimag); # return the number of iterations required for the iteration to escape - def mandleconverge(real imag) - mandleconverger(real, imag, 0, real, imag); + def mandelconverge(real imag) + mandelconverger(real, imag, 0, real, imag); This "z = z\ :sup:`2`\ + c" function is a beautiful little creature that is the basis for computation of the `Mandelbrot @@ -520,12 +520,12 @@ but we can whip together something using the density plotter above: :: - # compute and plot the mandlebrot set with the specified 2 dimensional range + # compute and plot the mandelbrot set with the specified 2 dimensional range # info. def mandelhelp(xmin xmax xstep ymin ymax ystep) for y = ymin, y < ymax, ystep in ( (for x = xmin, x < xmax, xstep in - printdensity(mandleconverge(x,y))) + printdensity(mandelconverge(x,y))) : putchard(10) ) @@ -535,7 +535,7 @@ but we can whip together something using the density plotter above: mandelhelp(realstart, realstart+realmag*78, realmag, imagstart, imagstart+imagmag*40, imagmag); -Given this, we can try plotting out the mandlebrot set! Lets try it out: +Given this, we can try plotting out the mandelbrot set! Lets try it out: :: diff --git a/gnu/llvm/docs/tutorial/OCamlLangImpl7.rst b/gnu/llvm/docs/tutorial/OCamlLangImpl7.rst index c8c701b9101..f36845c5234 100644 --- a/gnu/llvm/docs/tutorial/OCamlLangImpl7.rst +++ b/gnu/llvm/docs/tutorial/OCamlLangImpl7.rst @@ -224,7 +224,7 @@ variables in certain circumstances: class <../LangRef.html#first-class-types>`_ values (such as pointers, scalars and vectors), and only if the array size of the allocation is 1 (or missing in the .ll file). mem2reg is not capable of promoting - structs or arrays to registers. 
Note that the "scalarrepl" pass is + structs or arrays to registers. Note that the "sroa" pass is more powerful and can promote structs, "unions", and arrays in many cases. diff --git a/gnu/llvm/docs/tutorial/index.rst b/gnu/llvm/docs/tutorial/index.rst index dde53badd3a..494cfd0a33a 100644 --- a/gnu/llvm/docs/tutorial/index.rst +++ b/gnu/llvm/docs/tutorial/index.rst @@ -22,6 +22,16 @@ Kaleidoscope: Implementing a Language with LLVM in Objective Caml OCamlLangImpl* +Building a JIT in LLVM +=============================================== + +.. toctree:: + :titlesonly: + :glob: + :numbered: + + BuildingAJIT* + External Tutorials ================== |
