| author | 2016-09-03 22:46:54 +0000 | |
|---|---|---|
| committer | 2016-09-03 22:46:54 +0000 | |
| commit | b5500b9ca0102f1ccaf32f0e77e96d0739aded9b (patch) | |
| tree | e1b7ebb5a0231f9e6d8d3f6f719582cebd64dc98 /gnu/llvm/docs/SourceLevelDebugging.rst | |
| parent | clarify purpose of src/gnu/ directory. (diff) | |
Use the space freed up by sparc and zaurus to import LLVM.
ok hackroom@
Diffstat (limited to 'gnu/llvm/docs/SourceLevelDebugging.rst')
| -rw-r--r-- | gnu/llvm/docs/SourceLevelDebugging.rst | 1335 |
1 files changed, 1335 insertions, 0 deletions
diff --git a/gnu/llvm/docs/SourceLevelDebugging.rst b/gnu/llvm/docs/SourceLevelDebugging.rst new file mode 100644 index 00000000000..270c44eb50b --- /dev/null +++ b/gnu/llvm/docs/SourceLevelDebugging.rst @@ -0,0 +1,1335 @@ +================================ +Source Level Debugging with LLVM +================================ + +.. contents:: + :local: + +Introduction +============ + +This document is the central repository for all information pertaining to debug +information in LLVM. It describes the :ref:`actual format that the LLVM debug +information takes <format>`, which is useful for those interested in creating +front-ends or dealing directly with the information. Further, this document +provides specific examples of what debug information for C/C++ looks like. + +Philosophy behind LLVM debugging information +-------------------------------------------- + +The idea of the LLVM debugging information is to capture how the important +pieces of the source-language's Abstract Syntax Tree map onto LLVM code. +Several design aspects have shaped the solution that appears here. The +important ones are: + +* Debugging information should have very little impact on the rest of the + compiler. No transformations, analyses, or code generators should need to + be modified because of debugging information. + +* LLVM optimizations should interact in :ref:`well-defined and easily described + ways <intro_debugopt>` with the debugging information. + +* Because LLVM is designed to support arbitrary programming languages, + LLVM-to-LLVM tools should not need to know anything about the semantics of + the source-level-language. + +* Source-level languages are often **widely** different from one another. + LLVM should not put any restrictions of the flavor of the source-language, + and the debugging information should work with any language. 
+ +* With code generator support, it should be possible to use an LLVM compiler + to compile a program to native machine code and standard debugging + formats. This allows compatibility with traditional machine-code level + debuggers, like GDB or DBX. + +The approach used by the LLVM implementation is to use a small set of +:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping +between LLVM program objects and the source-level objects. The description of +the source-level program is maintained in LLVM metadata in an +:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end +currently uses working draft 7 of the `DWARF 3 standard +<http://www.eagercon.com/dwarf/dwarf3std.htm>`_). + +When a program is being debugged, a debugger interacts with the user and turns +the stored debug information into source-language specific information. As +such, a debugger must be aware of the source-language, and is thus tied to a +specific language or family of languages. + +Debug information consumers +--------------------------- + +The role of debug information is to provide meta information normally stripped +away during the compilation process. This meta information provides an LLVM +user a relationship between generated code and the original program source +code. + +Currently, debug information is consumed by DwarfDebug to produce dwarf +information used by the gdb debugger. Other targets could use the same +information to produce stabs or other debug forms. + +It would also be reasonable to use debug information to feed profiling tools +for analysis of generated code, or, tools for reconstructing the original +source from generated code. + +TODO - expound a bit more. + +.. _intro_debugopt: + +Debugging optimized code +------------------------ + +An extremely high priority of LLVM debugging information is to make it interact +well with optimizations and analysis. 
In particular, the LLVM debug +information provides the following guarantees: + +* LLVM debug information **always provides information to accurately read + the source-level state of the program**, regardless of which LLVM + optimizations have been run, and without any modification to the + optimizations themselves. However, some optimizations may impact the + ability to modify the current state of the program with a debugger, such + as setting program variables, or calling functions that have been + deleted. + +* As desired, LLVM optimizations can be upgraded to be aware of the LLVM + debugging information, allowing them to update the debugging information + as they perform aggressive optimizations. This means that, with effort, + the LLVM optimizers could optimize debug code just as well as non-debug + code. + +* LLVM debug information does not prevent optimizations from + happening (for example inlining, basic block reordering/merging/cleanup, + tail duplication, etc). + +* LLVM debug information is automatically optimized along with the rest of + the program, using existing facilities. For example, duplicate + information is automatically merged by the linker, and unused information + is automatically removed. + +Basically, the debug information allows you to compile a program with +"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify +the program as it executes from a debugger. Compiling a program with +"``-O3 -g``" gives you full debug information that is always available and +accurate for reading (e.g., you get accurate stack traces despite tail call +elimination and inlining), but you might lose the ability to modify the program +and call functions which were optimized out of the program, or inlined away +completely. + +The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to test +the optimizer's handling of debugging information. It can be run like this: + +.. 
code-block:: bash + + % cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level + % make TEST=dbgopt + +This will test impact of debugging information on optimization passes. If +debugging information influences optimization passes then it will be reported +as a failure. See :doc:`TestingGuide` for more information on LLVM test +infrastructure and how to run various tests. + +.. _format: + +Debugging information format +============================ + +LLVM debugging information has been carefully designed to make it possible for +the optimizer to optimize the program and debugging information without +necessarily having to know anything about debugging information. In +particular, the use of metadata avoids duplicated debugging information from +the beginning, and the global dead code elimination pass automatically deletes +debugging information for a function if it decides to delete the function. + +To do this, most of the debugging information (descriptors for types, +variables, functions, source files, etc) is inserted by the language front-end +in the form of LLVM metadata. + +Debug information is designed to be agnostic about the target debugger and +debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic +pass to decode the information that represents variables, types, functions, +namespaces, etc: this allows for arbitrary source-language semantics and +type-systems to be used, as long as there is a module written for the target +debugger to interpret the information. + +To provide basic functionality, the LLVM debugger does have to make some +assumptions about the source-level language being debugged, though it keeps +these to a minimum. The only common features that the LLVM debugger assumes +exist are `source files <LangRef.html#difile>`_, and `program objects +<LangRef.html#diglobalvariable>`_. These abstract objects are used by a +debugger to form stack traces, show information about local variables, etc. 
+ +This section of the documentation first describes the representation aspects +common to any source-language. :ref:`ccxx_frontend` describes the data layout +conventions used by the C and C++ front-ends. + +Debug information descriptors are `specialized metadata nodes +<LangRef.html#specialized-metadata>`_, first-class subclasses of ``Metadata``. + +.. _format_common_intrinsics: + +Debugger intrinsic functions +---------------------------- + +LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to +provide debug information at various points in generated code. + +``llvm.dbg.declare`` +^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + void @llvm.dbg.declare(metadata, metadata, metadata) + +This intrinsic provides information about a local element (e.g., variable). +The first argument is metadata holding the alloca for the variable. The second +argument is a `local variable <LangRef.html#dilocalvariable>`_ containing a +description of the variable. The third argument is a `complex expression +<LangRef.html#diexpression>`_. + +``llvm.dbg.value`` +^^^^^^^^^^^^^^^^^^ + +.. code-block:: llvm + + void @llvm.dbg.value(metadata, i64, metadata, metadata) + +This intrinsic provides information when a user source variable is set to a new +value. The first argument is the new value (wrapped as metadata). The second +argument is the offset in the user source variable where the new value is +written. The third argument is a `local variable +<LangRef.html#dilocalvariable>`_ containing a description of the variable. The +fourth argument is a `complex expression <LangRef.html#diexpression>`_. + +Object lifetimes and scoping +============================ + +In many languages, the local variables in functions can have their lifetimes or +scopes limited to a subset of a function. In the C family of languages, for +example, variables are only live (readable and writable) within the source +block that they are defined in. 
In functional languages, values are only +readable after they have been defined. Though this is a very obvious concept, +it is non-trivial to model in LLVM, because it has no notion of scoping in this +sense, and does not want to be tied to a language's scoping rules. + +In order to handle this, the LLVM debug format uses the metadata attached to +llvm instructions to encode line number and scoping information. Consider the +following C fragment, for example: + +.. code-block:: c + + 1. void foo() { + 2. int X = 21; + 3. int Y = 22; + 4. { + 5. int Z = 23; + 6. Z = X; + 7. } + 8. X = Y; + 9. } + +Compiled to LLVM, this function would be represented like this: + +.. code-block:: llvm + + ; Function Attrs: nounwind ssp uwtable + define void @foo() #0 !dbg !4 { + entry: + %X = alloca i32, align 4 + %Y = alloca i32, align 4 + %Z = alloca i32, align 4 + call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14 + store i32 21, i32* %X, align 4, !dbg !14 + call void @llvm.dbg.declare(metadata i32* %Y, metadata !15, metadata !13), !dbg !16 + store i32 22, i32* %Y, align 4, !dbg !16 + call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19 + store i32 23, i32* %Z, align 4, !dbg !19 + %0 = load i32, i32* %X, align 4, !dbg !20 + store i32 %0, i32* %Z, align 4, !dbg !21 + %1 = load i32, i32* %Y, align 4, !dbg !22 + store i32 %1, i32* %X, align 4, !dbg !23 + ret void, !dbg !24 + } + + ; Function Attrs: nounwind readnone + declare void @llvm.dbg.declare(metadata, metadata, metadata) #1 + + attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" } + attributes #1 = { nounwind readnone } + + !llvm.dbg.cu = !{!0} + !llvm.module.flags = !{!7, !8, !9} + !llvm.ident = !{!10} + + !0 = !DICompileUnit(language: DW_LANG_C99, 
file: !1, producer: "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", isOptimized: false, runtimeVersion: 0, emissionKind: 1, enums: !2, retainedTypes: !2, subprograms: !3, globals: !2, imports: !2) + !1 = !DIFile(filename: "/dev/stdin", directory: "/Users/dexonsmith/data/llvm/debug-info") + !2 = !{} + !3 = !{!4} + !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: false, variables: !2) + !5 = !DISubroutineType(types: !6) + !6 = !{null} + !7 = !{i32 2, !"Dwarf Version", i32 2} + !8 = !{i32 2, !"Debug Info Version", i32 3} + !9 = !{i32 1, !"PIC Level", i32 2} + !10 = !{!"clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)"} + !11 = !DILocalVariable(name: "X", scope: !4, file: !1, line: 2, type: !12) + !12 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed) + !13 = !DIExpression() + !14 = !DILocation(line: 2, column: 9, scope: !4) + !15 = !DILocalVariable(name: "Y", scope: !4, file: !1, line: 3, type: !12) + !16 = !DILocation(line: 3, column: 9, scope: !4) + !17 = !DILocalVariable(name: "Z", scope: !18, file: !1, line: 5, type: !12) + !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5) + !19 = !DILocation(line: 5, column: 11, scope: !18) + !20 = !DILocation(line: 6, column: 11, scope: !18) + !21 = !DILocation(line: 6, column: 9, scope: !18) + !22 = !DILocation(line: 8, column: 9, scope: !4) + !23 = !DILocation(line: 8, column: 7, scope: !4) + !24 = !DILocation(line: 9, column: 3, scope: !4) + + +This example illustrates a few important details about LLVM debugging +information. In particular, it shows how the ``llvm.dbg.declare`` intrinsic and +location information, which are attached to an instruction, are applied +together to allow a debugger to analyze the relationship between statements, +variable definitions, and the code used to implement the function. + +.. 
code-block:: llvm + + call void @llvm.dbg.declare(metadata i32* %X, metadata !11, metadata !13), !dbg !14 + ; [debug line = 2:9] [debug variable = X] + +The first intrinsic ``@llvm.dbg.declare`` encodes debugging information for the +variable ``X``. The metadata ``!dbg !14`` attached to the intrinsic provides +scope information for the variable ``X``. + +.. code-block:: llvm + + !14 = !DILocation(line: 2, column: 9, scope: !4) + !4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 1, type: !5, + isLocal: false, isDefinition: true, scopeLine: 1, + isOptimized: false, variables: !2) + +Here ``!14`` is metadata providing `location information +<LangRef.html#dilocation>`_. In this example, scope is encoded by ``!4``, a +`subprogram descriptor <LangRef.html#disubprogram>`_. This way the location +information attached to the intrinsics indicates that the variable ``X`` is +declared at line number 2 at a function level scope in function ``foo``. + +Now let's take another example. + +.. code-block:: llvm + + call void @llvm.dbg.declare(metadata i32* %Z, metadata !17, metadata !13), !dbg !19 + ; [debug line = 5:11] [debug variable = Z] + +The third intrinsic ``@llvm.dbg.declare`` encodes debugging information for +variable ``Z``. The metadata ``!dbg !19`` attached to the intrinsic provides +scope information for the variable ``Z``. + +.. code-block:: llvm + + !18 = distinct !DILexicalBlock(scope: !4, file: !1, line: 4, column: 5) + !19 = !DILocation(line: 5, column: 11, scope: !18) + +Here ``!19`` indicates that ``Z`` is declared at line number 5 and column +number 11 inside of lexical scope ``!18``. The lexical scope itself resides +inside of subprogram ``!4`` described above. + +The scope information attached to each instruction provides a straightforward +way to find instructions covered by a scope. + +.. 
_ccxx_frontend: + +C/C++ front-end specific debug information +========================================== + +The C and C++ front-ends represent information about the program in a format +that is effectively identical to `DWARF 3.0 +<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information +content. This allows code generators to trivially support native debuggers by +generating standard dwarf information, and contains enough information for +non-dwarf targets to translate it as needed. + +This section describes the forms used to represent C and C++ programs. Other +languages could pattern themselves after this (which itself is tuned to +representing programs in the same way that DWARF 3 does), or they could choose +to provide completely different forms if they don't fit into the DWARF model. +As support for debugging information gets added to the various LLVM +source-language front-ends, the information used should be documented here. + +The following sections provide examples of a few C/C++ constructs and the debug +information that would best describe those constructs. The canonical +references are the ``DIDescriptor`` classes defined in +``include/llvm/IR/DebugInfo.h`` and the implementations of the helper functions +in ``lib/IR/DIBuilder.cpp``. + +C/C++ source file information +----------------------------- + +``llvm::Instruction`` provides easy access to metadata attached with an +instruction. One can extract line number information encoded in LLVM IR using +``Instruction::getDebugLoc()`` and ``DILocation::getLine()``. + +.. code-block:: c++ + + if (DILocation *Loc = I->getDebugLoc()) { // Here I is an LLVM instruction + unsigned Line = Loc->getLine(); + StringRef File = Loc->getFilename(); + StringRef Dir = Loc->getDirectory(); + } + +C/C++ global variable information +--------------------------------- + +Given an integer global variable declared as follows: + +.. 
code-block:: c + + int MyGlobal = 100; + +a C/C++ front-end would generate the following descriptors: + +.. code-block:: llvm + + ;; + ;; Define the global itself. + ;; + @MyGlobal = global i32 100, align 4 + + ;; + ;; List of debug info of globals + ;; + !llvm.dbg.cu = !{!0} + + ;; Some unrelated metadata. + !llvm.module.flags = !{!6, !7} + + ;; Define the compile unit. + !0 = !DICompileUnit(language: DW_LANG_C99, file: !1, + producer: + "clang version 3.7.0 (trunk 231150) (llvm/trunk 231154)", + isOptimized: false, runtimeVersion: 0, emissionKind: 1, + enums: !2, retainedTypes: !2, subprograms: !2, globals: + !3, imports: !2) + + ;; + ;; Define the file + ;; + !1 = !DIFile(filename: "/dev/stdin", + directory: "/Users/dexonsmith/data/llvm/debug-info") + + ;; An empty array. + !2 = !{} + + ;; The Array of Global Variables + !3 = !{!4} + + ;; + ;; Define the global variable itself. + ;; + !4 = !DIGlobalVariable(name: "MyGlobal", scope: !0, file: !1, line: 1, + type: !5, isLocal: false, isDefinition: true, + variable: i32* @MyGlobal) + + ;; + ;; Define the type + ;; + !5 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed) + + ;; Dwarf version to output. + !6 = !{i32 2, !"Dwarf Version", i32 2} + + ;; Debug info schema version. + !7 = !{i32 2, !"Debug Info Version", i32 3} + +C/C++ function information +-------------------------- + +Given a function declared as follows: + +.. code-block:: c + + int main(int argc, char *argv[]) { + return 0; + } + +a C/C++ front-end would generate the following descriptors: + +.. code-block:: llvm + + ;; + ;; Define the anchor for subprograms. + ;; + !4 = !DISubprogram(name: "main", scope: !1, file: !1, line: 1, type: !5, + isLocal: false, isDefinition: true, scopeLine: 1, + flags: DIFlagPrototyped, isOptimized: false, + variables: !2) + + ;; + ;; Define the subprogram itself. + ;; + define i32 @main(i32 %argc, i8** %argv) !dbg !4 { + ... 
+ } + +Debugging information format +============================ + +Debugging Information Extension for Objective C Properties +---------------------------------------------------------- + +Introduction +^^^^^^^^^^^^ + +Objective C provides a simpler way to declare and define accessor methods using +declared properties. The language provides features to declare a property and +to let the compiler synthesize accessor methods. + +The debugger lets developers inspect Objective C interfaces and their instance +variables and class variables. However, the debugger does not know anything +about the properties defined in Objective C interfaces. The debugger consumes +information generated by the compiler in DWARF format. The format does not support +encoding of Objective C properties. This proposal describes DWARF extensions to +encode Objective C properties, which the debugger can use to let developers +inspect Objective C properties. + +Proposal +^^^^^^^^ + +Objective C properties exist separately from class members. A property can be +defined only by "setter" and "getter" selectors, and be calculated anew on each +access. Or a property can just be a direct access to some declared ivar. +Finally it can have an ivar "automatically synthesized" for it by the compiler, +in which case the property can be referred to in user code directly using the +standard C dereference syntax as well as through the property "dot" syntax, but +there is no entry in the ``@interface`` declaration corresponding to this ivar. + +To facilitate debugging these properties, we will add a new DWARF TAG into the +``DW_TAG_structure_type`` definition for the class to hold the description of a +given property, and a set of DWARF attributes that provide said description. +The property tag will also contain the name and declared type of the property. 
+ +If there is a related ivar, there will also be a DWARF property attribute placed +in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG +for that property. And in the case where the compiler synthesizes the ivar +directly, the compiler is expected to generate a ``DW_TAG_member`` for that +ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used +to access this ivar directly in code, and with the property attribute pointing +back to the property it is backing. + +The following examples will serve as illustration for our discussion: + +.. code-block:: objc + + @interface I1 { + int n2; + } + + @property int p1; + @property int p2; + @end + + @implementation I1 + @synthesize p1; + @synthesize p2 = n2; + @end + +This produces the following DWARF (this is a "pseudo dwarfdump" output): + +.. code-block:: none + + 0x00000100: TAG_structure_type [7] * + AT_APPLE_runtime_class( 0x10 ) + AT_name( "I1" ) + AT_decl_file( "Objc_Property.m" ) + AT_decl_line( 3 ) + + 0x00000110 TAG_APPLE_property + AT_name ( "p1" ) + AT_type ( {0x00000150} ( int ) ) + + 0x00000120: TAG_APPLE_property + AT_name ( "p2" ) + AT_type ( {0x00000150} ( int ) ) + + 0x00000130: TAG_member [8] + AT_name( "_p1" ) + AT_APPLE_property ( {0x00000110} "p1" ) + AT_type( {0x00000150} ( int ) ) + AT_artificial ( 0x1 ) + + 0x00000140: TAG_member [8] + AT_name( "n2" ) + AT_APPLE_property ( {0x00000120} "p2" ) + AT_type( {0x00000150} ( int ) ) + + 0x00000150: AT_type( ( int ) ) + +Note, the current convention is that the name of the ivar for an +auto-synthesized property is the name of the property from which it derives +with an underscore prepended, as is shown in the example. But we actually +don't need to know this convention, since we are given the name of the ivar +directly. + +Also, it is common practice in ObjC to have different property declarations in +the @interface and @implementation - e.g. 
to provide a read-only property in +the interface, and a read-write interface in the implementation. In that case, +the compiler should emit whichever property declaration will be in force in the +current translation unit. + +Developers can decorate a property with attributes which are encoded using +``DW_AT_APPLE_property_attribute``. + +.. code-block:: objc + + @property (readonly, nonatomic) int pr; + +.. code-block:: none + + TAG_APPLE_property [8] + AT_name( "pr" ) + AT_type ( {0x00000147} (int) ) + AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic) + +The setter and getter method names are attached to the property using +``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes. + +.. code-block:: objc + + @interface I1 + @property (setter=myOwnP3Setter:) int p3; + -(void)myOwnP3Setter:(int)a; + @end + + @implementation I1 + @synthesize p3; + -(void)myOwnP3Setter:(int)a{ } + @end + +The DWARF for this would be: + +.. code-block:: none + + 0x000003bd: TAG_structure_type [7] * + AT_APPLE_runtime_class( 0x10 ) + AT_name( "I1" ) + AT_decl_file( "Objc_Property.m" ) + AT_decl_line( 3 ) + + 0x000003cd: TAG_APPLE_property + AT_name ( "p3" ) + AT_APPLE_property_setter ( "myOwnP3Setter:" ) + AT_type( {0x00000147} ( int ) ) + + 0x000003f3: TAG_member [8] + AT_name( "_p3" ) + AT_type ( {0x00000147} ( int ) ) + AT_APPLE_property ( {0x000003cd} ) + AT_artificial ( 0x1 ) + +New DWARF Tags +^^^^^^^^^^^^^^ + ++-----------------------+--------+ +| TAG | Value | ++=======================+========+ +| DW_TAG_APPLE_property | 0x4200 | ++-----------------------+--------+ + +New DWARF Attributes +^^^^^^^^^^^^^^^^^^^^ + ++--------------------------------+--------+-----------+ +| Attribute | Value | Classes | ++================================+========+===========+ +| DW_AT_APPLE_property | 0x3fed | Reference | ++--------------------------------+--------+-----------+ +| DW_AT_APPLE_property_getter | 0x3fe9 | String | 
++--------------------------------+--------+-----------+ +| DW_AT_APPLE_property_setter | 0x3fea | String | ++--------------------------------+--------+-----------+ +| DW_AT_APPLE_property_attribute | 0x3feb | Constant | ++--------------------------------+--------+-----------+ + +New DWARF Constants +^^^^^^^^^^^^^^^^^^^ + ++--------------------------------------+-------+ +| Name | Value | ++======================================+=======+ +| DW_APPLE_PROPERTY_readonly | 0x01 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_getter | 0x02 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_assign | 0x04 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_readwrite | 0x08 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_retain | 0x10 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_copy | 0x20 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_nonatomic | 0x40 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_setter | 0x80 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_atomic | 0x100 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_weak | 0x200 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_strong | 0x400 | ++--------------------------------------+-------+ +| DW_APPLE_PROPERTY_unsafe_unretained | 0x800 | ++--------------------------------------+-------+ + +Name Accelerator Tables +----------------------- + +Introduction +^^^^^^^^^^^^ + +The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a +debugger needs. The "``pub``" in the section name indicates that the entries +in the table are publicly visible names only. This means no static or hidden +functions show up in the "``.debug_pubnames``". No static variables or private +class variables are in the "``.debug_pubtypes``". 
Many compilers add different +things to these tables, so we can't rely on the contents being consistent +across gcc, icc, and clang. + +The typical query given by users tends not to match up with the contents of +these tables. For example, the DWARF spec states that "In the case of the name +of a function member or static data member of a C++ structure, class or union, +the name presented in the "``.debug_pubnames``" section is not the simple name +given by the ``DW_AT_name`` attribute of the referenced debugging information +entry, but rather the fully qualified name of the data or function member." +So the only name in these tables for a complex C++ entry is its fully +qualified name. Debugger users tend not to enter their search strings as +"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or +"``a::b::c``". So the name entered in the name table must be demangled in +order to chop it up appropriately and additional names must be manually entered +into the table to make it effective as a name lookup table for debuggers to +use. + +All debuggers currently ignore the "``.debug_pubnames``" table as a result of +its inconsistent and useless public-only name content making it a waste of +space in the object file. These tables, when they are written to disk, are not +sorted in any way, leaving every debugger to do its own parsing and sorting. +These tables also include an inlined copy of the string values in the table +itself making the tables much larger than they need to be on disk, especially +for large C++ programs. + +Can't we just fix the sections by adding all of the names we need to this +table? No, because that is not what the tables are defined to contain and we +won't know the difference between the old bad tables and the new good tables. +At best we could make our own renamed sections that contain all of the data we +need. + +These tables are also insufficient for what a debugger like LLDB needs. 
LLDB +uses clang for its expression parsing where LLDB acts as a PCH. LLDB is then +often asked to look for type "``foo``" or namespace "``bar``", or list items in +namespace "``baz``". Namespaces are not included in the pubnames or pubtypes +tables. Since clang asks a lot of questions when it is parsing an expression, +we need to be very fast when looking up names, as it happens a lot. Having new +accelerator tables that are optimized for very quick lookups will benefit this +type of debugging experience greatly. + +We would like to generate name lookup tables that can be mapped into memory +from disk, and used as is, with little or no up-front parsing. We would also +be able to control the exact content of these different tables so they contain +exactly what we need. The Name Accelerator Tables were designed to fix these +issues. In order to solve these issues we need to: + +* Have a format that can be mapped into memory from disk and used as is +* Lookups should be very fast +* Extensible table format so these tables can be made by many producers +* Contain all of the names needed for typical lookups out of the box +* Strict rules for the contents of tables + +Table size is important and the accelerator table format should allow the reuse +of strings from common string tables so the strings for the names are not +duplicated. We also want to make sure the table is ready to be used as-is by +simply mapping the table into memory with minimal header parsing. + +The name lookups need to be fast and optimized for the kinds of lookups that +debuggers tend to do. Optimally we would like to touch as few parts of the +mapped table as possible when doing a name lookup and be able to quickly find +the name entry we are looking for, or discover there are no matches. In the +case of debuggers we optimized for lookups that fail most of the time. 
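A lookup with these properties can be sketched as follows. This is a minimal illustration under stated assumptions, not the on-disk format: the `NameTable` struct, its field names, `buildTable`, and `lookup` are hypothetical, the Bernstein (DJB) hash is an assumed choice of hash function, and offsets are simplified to indexes into an in-memory string list. The idea it tries to show is that keeping the full 32 bit hashes contiguous lets a failed lookup stop after reading one bucket slot and a short run of hash values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Marker for a bucket with no entries (an assumption of this sketch).
constexpr uint32_t EmptyBucket = UINT32_MAX;

// Hypothetical in-memory model; the names are illustrative only.
struct NameTable {
  std::vector<uint32_t> Buckets;  // index of the first hash in each bucket
  std::vector<uint32_t> Hashes;   // full 32 bit hashes, grouped by bucket
  std::vector<uint32_t> Offsets;  // Offsets[i] locates the data for Hashes[i]
  std::vector<std::string> Data;  // stand-in for the string/data section
};

// Bernstein (DJB) hash, used here only as an example hash function.
uint32_t hashName(const std::string &Name) {
  uint32_t H = 5381;
  for (unsigned char C : Name)
    H = H * 33 + C;
  return H;
}

// Builds the table so that all hashes belonging to one bucket are adjacent.
NameTable buildTable(const std::vector<std::string> &Names,
                     uint32_t NumBuckets) {
  assert(NumBuckets > 0 && "need at least one bucket");
  NameTable T;
  T.Data = Names;
  T.Buckets.assign(NumBuckets, EmptyBucket);
  for (uint32_t B = 0; B < NumBuckets; ++B) {
    for (uint32_t I = 0; I < Names.size(); ++I) {
      uint32_t H = hashName(Names[I]);
      if (H % NumBuckets != B)
        continue;
      if (T.Buckets[B] == EmptyBucket)
        T.Buckets[B] = static_cast<uint32_t>(T.Hashes.size());
      T.Hashes.push_back(H);
      T.Offsets.push_back(I);
    }
  }
  return T;
}

// Returns true and sets Offset when Name is present. A miss normally reads
// only one bucket slot and a few adjacent hash values; the string data is
// consulted only after a full 32 bit hash match.
bool lookup(const NameTable &T, const std::string &Name, uint32_t &Offset) {
  if (T.Buckets.empty())
    return false;
  const uint32_t NumBuckets = static_cast<uint32_t>(T.Buckets.size());
  const uint32_t H = hashName(Name);
  const uint32_t B = H % NumBuckets;
  uint32_t I = T.Buckets[B];
  if (I == EmptyBucket)
    return false;
  for (; I < T.Hashes.size() && T.Hashes[I] % NumBuckets == B; ++I) {
    if (T.Hashes[I] != H)
      continue;
    if (T.Data[T.Offsets[I]] == Name) {
      Offset = T.Offsets[I];
      return true;
    }
  }
  return false;
}
```

With a table built from the names "erase", "dump", and "main", a query for "main" succeeds and a query for "printf" fails. In the failing case the string section is read only if the two 32 bit hashes happen to collide, which is the behavior a debugger wants when most lookups miss.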
+ +The contents of each accelerator table should be strictly defined and +documented so clients can rely on them. + +Hash Tables +^^^^^^^^^^^ + +Standard Hash Tables +"""""""""""""""""""" + +Typical hash tables have a header, buckets, and each bucket points to the +bucket contents: + +.. code-block:: none + + .------------. + | HEADER | + |------------| + | BUCKETS | + |------------| + | DATA | + `------------' + +The BUCKETS are an array of offsets to DATA for each hash: + +.. code-block:: none + + .------------. + | 0x00001000 | BUCKETS[0] + | 0x00002000 | BUCKETS[1] + | 0x00002200 | BUCKETS[2] + | 0x000034f0 | BUCKETS[3] + | | ... + | 0xXXXXXXXX | BUCKETS[n_buckets] + '------------' + +So for ``bucket[3]`` in the example above, we have an offset into the table +0x000034f0 which points to a chain of entries for the bucket. Each entry in the +chain must contain a next pointer, full 32 bit hash value, the string itself, +and the data for the current string value. + +.. code-block:: none + + .------------. + 0x000034f0: | 0x00003500 | next pointer + | 0x12345678 | 32 bit hash + | "erase" | string value + | data[n] | HashData for this bucket + |------------| + 0x00003500: | 0x00003550 | next pointer + | 0x29273623 | 32 bit hash + | "dump" | string value + | data[n] | HashData for this bucket + |------------| + 0x00003550: | 0x00000000 | next pointer + | 0x82638293 | 32 bit hash + | "main" | string value + | data[n] | HashData for this bucket + `------------' + +The problem with this layout for debuggers is that we need to optimize for the +negative lookup case where the symbol we're searching for is not present. So +if we were to look up "``printf``" in the table above, we would compute a 32 bit +hash for "``printf``", which might map to ``bucket[3]``. We would need to go to the +offset 0x000034f0 and start looking to see if our 32 bit hash matches. 
To do +so, we need to read the next pointer, then read the hash, compare it, and skip +to the next bucket. Each time we are skipping many bytes in memory and +touching new cache pages just to do the compare on the full 32 bit hash. All +of these accesses then tell us that we didn't have a match. + +Name Hash Tables +"""""""""""""""" + +To solve the issues mentioned above we have structured the hash tables a bit +differently: a header, buckets, an array of all unique 32 bit hash values, +followed by an array of hash value data offsets, one for each hash value, then +the data for all hash values: + +.. code-block:: none + + .-------------. + | HEADER | + |-------------| + | BUCKETS | + |-------------| + | HASHES | + |-------------| + | OFFSETS | + |-------------| + | DATA | + `-------------' + +The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By +making all of the full 32 bit hash values contiguous in memory, we allow +ourselves to efficiently check for a match while touching as little memory as +possible. Most often checking the 32 bit hash values is as far as the lookup +goes. If it does match, it usually is a match with no collisions. So for a +table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash +values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and +``OFFSETS`` as: + +.. code-block:: none + + .-------------------------. 
+ | HEADER.magic | uint32_t + | HEADER.version | uint16_t + | HEADER.hash_function | uint16_t + | HEADER.bucket_count | uint32_t + | HEADER.hashes_count | uint32_t + | HEADER.header_data_len | uint32_t + | HEADER_DATA | HeaderData + |-------------------------| + | BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes + |-------------------------| + | HASHES | uint32_t[n_hashes] // 32 bit hash values + |-------------------------| + | OFFSETS | uint32_t[n_hashes] // 32 bit offsets to hash value data + |-------------------------| + | ALL HASH DATA | + `-------------------------' + +So taking the exact same data from the standard hash example above we end up +with: + +.. code-block:: none + + .------------. + | HEADER | + |------------| + | 0 | BUCKETS[0] + | 2 | BUCKETS[1] + | 5 | BUCKETS[2] + | 6 | BUCKETS[3] + | | ... + | ... | BUCKETS[n_buckets] + |------------| + | 0x........ | HASHES[0] + | 0x........ | HASHES[1] + | 0x........ | HASHES[2] + | 0x........ | HASHES[3] + | 0x........ | HASHES[4] + | 0x........ | HASHES[5] + | 0x12345678 | HASHES[6] hash for BUCKETS[3] + | 0x29273623 | HASHES[7] hash for BUCKETS[3] + | 0x82638293 | HASHES[8] hash for BUCKETS[3] + | 0x........ | HASHES[9] + | 0x........ | HASHES[10] + | 0x........ | HASHES[11] + | 0x........ | HASHES[12] + | 0x........ | HASHES[13] + | 0x........ | HASHES[n_hashes] + |------------| + | 0x........ | OFFSETS[0] + | 0x........ | OFFSETS[1] + | 0x........ | OFFSETS[2] + | 0x........ | OFFSETS[3] + | 0x........ | OFFSETS[4] + | 0x........ | OFFSETS[5] + | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3] + | 0x00003500 | OFFSETS[7] offset for BUCKETS[3] + | 0x00003550 | OFFSETS[8] offset for BUCKETS[3] + | 0x........ | OFFSETS[9] + | 0x........ | OFFSETS[10] + | 0x........ | OFFSETS[11] + | 0x........ | OFFSETS[12] + | 0x........ | OFFSETS[13] + | 0x........ 
| OFFSETS[n_hashes] + |------------| + | | + | | + | | + | | + | | + |------------| + 0x000034f0: | 0x00001203 | String offset into .debug_str ("erase") + | 0x00000004 | A 32 bit array count - number of HashData with name "erase" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x........ | HashData[3] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + |------------| + 0x00003500: | 0x00001203 | String offset into .debug_str ("collision") + | 0x00000002 | A 32 bit array count - number of HashData with name "collision" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x00001203 | String offset into .debug_str ("dump") + | 0x00000003 | A 32 bit array count - number of HashData with name "dump" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + |------------| + 0x00003550: | 0x00001203 | String offset into .debug_str ("main") + | 0x00000009 | A 32 bit array count - number of HashData with name "main" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x........ | HashData[3] + | 0x........ | HashData[4] + | 0x........ | HashData[5] + | 0x........ | HashData[6] + | 0x........ | HashData[7] + | 0x........ | HashData[8] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + `------------' + +So we still have all of the same data, we just organize it more efficiently for +debugger lookup. If we repeat the same "``printf``" lookup from above, we +would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit +hash value modulo ``n_buckets``. ``BUCKETS[3]`` contains "6" which +is the index into the ``HASHES`` table. We would then compare any consecutive +32 bit hash values in the ``HASHES`` array as long as the hashes would be in +``BUCKETS[3]``. 
We do this by verifying that each subsequent hash value modulo +``n_buckets`` is still 3. In the case of a failed lookup we would access the +memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes +before we know that we have no match. We don't end up marching through +multiple words of memory and we really keep the number of processor data cache +lines being accessed as small as possible. + +The string hash that is used for these lookup tables is the Daniel J. +Bernstein hash which is also used in the ELF ``GNU_HASH`` sections. It is a +very good hash for all kinds of names in programs with very few hash +collisions. + +Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``. + +Details +^^^^^^^ + +These name hash tables are designed to be generic where specializations of the +table get to define additional data that goes into the header ("``HeaderData``"), +how the string value is stored ("``KeyType``") and the content of the data for each +hash value. + +Header Layout +""""""""""""" + +The header has a fixed part, and the specialized part. The exact format of the +header is: + +.. code-block:: c + + struct Header + { + uint32_t magic; // 'HASH' magic value to allow endian detection + uint16_t version; // Version number + uint16_t hash_function; // The hash function enumeration that was used + uint32_t bucket_count; // The number of buckets in this hash table + uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table + uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment + // Specifically the length of the following HeaderData field - this does not + // include the size of the preceding fields + HeaderData header_data; // Implementation specific header data + }; + +The header starts with a 32 bit "``magic``" value which must be ``'HASH'`` +encoded as an ASCII integer. 
This allows the detection of the start of the +hash table and also allows the table's byte order to be determined so the table +can be correctly extracted. The "``magic``" value is followed by a 16 bit +``version`` number which allows the table to be revised and modified in the +future. The current version number is 1. ``hash_function`` is a ``uint16_t`` +enumeration that specifies which hash function was used to produce this table. +The current values for the hash function enumerations include: + +.. code-block:: c + + enum HashFunctionType + { + eHashFunctionDJB = 0u, // Daniel J Bernstein hash function + }; + +``bucket_count`` is a 32 bit unsigned integer that represents how many buckets +are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit +hash values that are in the ``HASHES`` array, which is also the number of offsets +contained in the ``OFFSETS`` array. ``header_data_len`` specifies the size +in bytes of the ``HeaderData`` that is filled in by specialized versions of +this table. + +Fixed Lookup +"""""""""""" + +The header is followed by the buckets, hashes, offsets, and hash value data. + +.. code-block:: c + + struct FixedTable + { + uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below + uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table + uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above + }; + +``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The +``hashes`` array contains all of the 32 bit hash values for all names in the +hash table. Each hash in the ``hashes`` table has an offset in the ``offsets`` +array that points to the data for the hash value. + +This table setup makes it very easy to repurpose these tables to contain +different data, while keeping the lookup mechanism the same for all tables. 
+This layout also makes it possible to save the table to disk and map it in +later and do very efficient name lookups with little or no parsing. + +DWARF lookup tables can be implemented in a variety of ways and can store a lot +of information for each name. We want to make the DWARF tables extensible and +able to store the data efficiently so we have used some of the DWARF features +that enable efficient data storage to define exactly what kind of data we store +for each name. + +The ``HeaderData`` contains a definition of the contents of each HashData chunk. +We might want to store an offset to all of the debug information entries (DIEs) +for each name. To keep things extensible, we create a list of items, or +Atoms, that are contained in the data for each name. First comes the type of +the data in each atom: + +.. code-block:: c + + enum AtomType + { + eAtomTypeNULL = 0u, + eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding + eAtomTypeCUOffset = 2u, // DIE offset of the compile unit header that contains the item in question + eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2 + eAtomTypeNameFlags = 4u, // Flags from enum NameFlags + eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags + }; + +The enumeration values and their meanings are: + +.. code-block:: none + + eAtomTypeNULL - a termination atom that specifies the end of the atom list + eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name + eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE + eAtomTypeTag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is + eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...) + eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...) 
+ +Then each atom declares its type and how the data for that +atom type is encoded: + +.. code-block:: c + + struct Atom + { + uint16_t type; // AtomType enum value + uint16_t form; // DWARF DW_FORM_XXX defines + }; + +The ``form`` type above is from the DWARF specification and defines the exact +encoding of the data for the Atom type. See the DWARF specification for the +``DW_FORM_`` definitions. + +.. code-block:: c + + struct HeaderData + { + uint32_t die_offset_base; + uint32_t atom_count; + Atom atoms[atom_count]; + }; + +``HeaderData`` defines the base DIE offset that should be added to any atoms +that are encoded using the ``DW_FORM_ref1``, ``DW_FORM_ref2``, +``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata`` forms. It also defines +what is contained in each ``HashData`` object -- ``Atom.form`` tells us how large +each field will be in the ``HashData`` and the ``Atom.type`` tells us how this data +should be interpreted. + +For the current implementations of the "``.apple_names``" (all functions + +globals), the "``.apple_types``" (names of all types that are defined), and +the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom`` +array to be: + +.. code-block:: c + + HeaderData.atom_count = 1; + HeaderData.atoms[0].type = eAtomTypeDIEOffset; + HeaderData.atoms[0].form = DW_FORM_data4; + +This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is +encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have +multiple matching DIEs in a single file, which could come up with an inlined +function for instance. Future tables could include more information about the +DIE such as flags indicating if the DIE is a function, method, block, +or inlined. + +The KeyType for the DWARF table is a 32 bit string table offset into the +".debug_str" table. The ".debug_str" is the string table for the DWARF which +may already contain copies of all of the strings. 
This helps make sure, with +help from the compiler, that we reuse the strings between all of the DWARF +sections and keeps the hash table size down. Another benefit to having the +compiler generate all strings as DW_FORM_strp in the debug info, is that +DWARF parsing can be made much faster. + +After a lookup is made, we get an offset into the hash data. The hash data +needs to be able to deal with 32 bit hash collisions, so the chunk of data +at the offset in the hash data consists of a triple: + +.. code-block:: c + + uint32_t str_offset + uint32_t hash_data_count + HashData[hash_data_count] + +If "str_offset" is zero, then the bucket contents are done. 99.9% of the +hash data chunks contain a single item (no 32 bit hash collision): + +.. code-block:: none + + .------------. + | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main") + | 0x00000004 | uint32_t HashData count + | 0x........ | uint32_t HashData[0] DIE offset + | 0x........ | uint32_t HashData[1] DIE offset + | 0x........ | uint32_t HashData[2] DIE offset + | 0x........ | uint32_t HashData[3] DIE offset + | 0x00000000 | uint32_t KeyType (end of hash chain) + `------------' + +If there are collisions, you will have multiple valid string offsets: + +.. code-block:: none + + .------------. + | 0x00001023 | uint32_t KeyType (.debug_str[0x0001023] => "main") + | 0x00000004 | uint32_t HashData count + | 0x........ | uint32_t HashData[0] DIE offset + | 0x........ | uint32_t HashData[1] DIE offset + | 0x........ | uint32_t HashData[2] DIE offset + | 0x........ | uint32_t HashData[3] DIE offset + | 0x00002023 | uint32_t KeyType (.debug_str[0x0002023] => "print") + | 0x00000002 | uint32_t HashData count + | 0x........ | uint32_t HashData[0] DIE offset + | 0x........ | uint32_t HashData[1] DIE offset + | 0x00000000 | uint32_t KeyType (end of hash chain) + `------------' + +Current testing with real world C++ binaries has shown that there is around 1 +32 bit hash collision per 100,000 name entries. 
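
The lookup described in this section can be sketched in C as follows. This is a minimal illustration and not LLDB's or LLVM's implementation: ``NameTableView``, ``djb_hash`` and ``name_table_find`` are hypothetical names, the table is assumed to be already mapped and in host byte order, ``OFFSETS`` are assumed to be byte offsets relative to a ``data`` base pointer, and each ``HashData`` is assumed to be a single ``DW_FORM_data4`` DIE offset as in the atom list shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* DJB string hash: start at 5381, then hash = hash * 33 + byte. */
static uint32_t djb_hash(const char *s)
{
    uint32_t h = 5381u;
    for (; *s != '\0'; ++s)
        h = h * 33u + (unsigned char)*s;
    return h;
}

/* Hypothetical view of an already mapped table (names are assumptions). */
struct NameTableView {
    uint32_t bucket_count;
    uint32_t hashes_count;
    const uint32_t *buckets;  /* bucket_count indexes into hashes[] */
    const uint32_t *hashes;   /* hashes_count unique 32 bit hashes */
    const uint32_t *offsets;  /* hashes_count byte offsets into data */
    const uint32_t *data;     /* base of the hash data (assumption) */
    const char *debug_str;    /* string table the KeyType points into */
};

/* Returns the HashData array for name and stores its length in *count,
 * or returns NULL when the name is not in the table. */
static const uint32_t *name_table_find(const struct NameTableView *t,
                                       const char *name, uint32_t *count)
{
    assert(t->bucket_count != 0);
    const uint32_t hash = djb_hash(name);
    const uint32_t bucket = hash % t->bucket_count;
    uint32_t i = t->buckets[bucket];
    if (i == UINT32_MAX)
        return NULL;                  /* empty bucket */
    for (; i < t->hashes_count; ++i) {
        const uint32_t h = t->hashes[i];
        if (h % t->bucket_count != bucket)
            break;                    /* walked past this bucket's hashes */
        if (h != hash)
            continue;                 /* only the HASHES array was touched */
        /* Full hash matched: walk the chunk to resolve 32 bit collisions. */
        const uint32_t *p = t->data + t->offsets[i] / 4;
        while (p[0] != 0) {           /* a zero str_offset ends the chunk */
            const uint32_t n = p[1];  /* hash_data_count */
            if (strcmp(t->debug_str + p[0], name) == 0) {
                *count = n;
                return p + 2;
            }
            p += 2 + n;               /* skip to the next colliding name */
        }
        break;                        /* hashes are unique: no other match */
    }
    return NULL;
}
```

A failed lookup only reads one ``BUCKETS`` entry and a short run of ``HASHES`` entries; the string table and the hash data are touched only after a full 32 bit hash match.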
+ +Contents +^^^^^^^^ + +As we said, we want to strictly define exactly what is included in the +different tables. For DWARF, we have 3 tables: "``.apple_names``", +"``.apple_types``", and "``.apple_namespaces``". + +"``.apple_names``" sections should contain an entry for each DWARF DIE whose +``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or +``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``, +``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``. It also contains +``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and +static variables). All global and static variables should be included, +including those scoped within functions and classes. For example using the +following code: + +.. code-block:: c + + static int var = 0; + + void f () + { + static int var = 0; + } + +Both of the static ``var`` variables would be included in the table. All +functions should emit both their full names and their basenames. For C or C++, +the full name is the mangled name (if available) which is usually in the +``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the +function basename. If global or static variables have a mangled name in a +``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the +simple name found in the ``DW_AT_name`` attribute. 
+ +"``.apple_types``" sections should contain an entry for each DWARF DIE whose +tag is one of: + +* DW_TAG_array_type +* DW_TAG_class_type +* DW_TAG_enumeration_type +* DW_TAG_pointer_type +* DW_TAG_reference_type +* DW_TAG_string_type +* DW_TAG_structure_type +* DW_TAG_subroutine_type +* DW_TAG_typedef +* DW_TAG_union_type +* DW_TAG_ptr_to_member_type +* DW_TAG_set_type +* DW_TAG_subrange_type +* DW_TAG_base_type +* DW_TAG_const_type +* DW_TAG_file_type +* DW_TAG_namelist +* DW_TAG_packed_type +* DW_TAG_volatile_type +* DW_TAG_restrict_type +* DW_TAG_interface_type +* DW_TAG_unspecified_type +* DW_TAG_shared_type + +Only entries with a ``DW_AT_name`` attribute are included, and the entry must +not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero +value). For example, using the following code: + +.. code-block:: c + + int main () + { + int *b = 0; + return *b; + } + +We get a few type DIEs: + +.. code-block:: none + + 0x00000067: TAG_base_type [5] + AT_encoding( DW_ATE_signed ) + AT_name( "int" ) + AT_byte_size( 0x04 ) + + 0x0000006e: TAG_pointer_type [6] + AT_type( {0x00000067} ( int ) ) + AT_byte_size( 0x08 ) + +The ``DW_TAG_pointer_type`` is not included because it does not have a ``DW_AT_name``. + +The "``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs. +If we run into a namespace that has no name, it is an anonymous namespace, and +the name should be output as "``(anonymous namespace)``" (without the quotes). +Why? This matches the output of ``abi::__cxa_demangle()``, the function in the +standard C++ library that demangles mangled names. + + +Language Extensions and File Format Changes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Objective-C Extensions +"""""""""""""""""""""" + +The "``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an +Objective-C class. The name used in the hash table is the name of the +Objective-C class itself. 
If the Objective-C class has a category, then an +entry is made for both the class name without the category, and for the class +name with the category. So if we have a DIE at offset 0x1234 with a name of +method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add +an entry for "``NSString``" that points to DIE 0x1234, and an entry for +"``NSString(my_additions)``" that points to 0x1234. This allows us to quickly +track down all Objective-C methods for an Objective-C class when doing +expressions. It is needed because of the dynamic nature of Objective-C where +anyone can add methods to a class. The DWARF for Objective-C methods is also +emitted differently from C++ classes: the methods are not usually +contained in the class definition, they are scattered about across one or more +compile units. Categories can also be defined in different shared libraries. +So we need to be able to quickly find all of the methods and class functions +given the Objective-C class name, or quickly find all methods and class +functions for a class + category name. This table does not contain any +selector names, it just maps Objective-C class names (or class names + +category) to all of the methods and class functions. The selectors are added +as function basenames in the "``.apple_names``" section. + +In the "``.apple_names``" section for Objective-C functions, the full name is +the entire function name with the brackets ("``-[NSString +stringWithCString:]``") and the basename is the selector only +("``stringWithCString:``"). + +Mach-O Changes +"""""""""""""" + +The section names given above for the Apple hash tables are for non-Mach-O files. For +Mach-O files, the sections should be contained in the ``__DWARF`` segment with +names as follows: + +* "``.apple_names``" -> "``__apple_names``" +* "``.apple_types``" -> "``__apple_types``" +* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit) +* "``.apple_objc``" -> "``__apple_objc``" +
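
The renaming in the list above follows a simple pattern: the leading "``.``" becomes "``__``" and the result is cut to 16 characters. A small C sketch of that rule, assuming nothing beyond the four mappings listed; ``macho_section_name`` and ``MACHO_SECTNAME_MAX`` are hypothetical names and not part of any toolchain API:

```c
#include <stdio.h>
#include <string.h>

#define MACHO_SECTNAME_MAX 16 /* the 16 character limit noted in the list */

/* Writes the Mach-O spelling of a ".apple_*" section name into out as a
 * NUL-terminated string, truncated to MACHO_SECTNAME_MAX characters. */
static void macho_section_name(const char *name,
                               char out[MACHO_SECTNAME_MAX + 1])
{
    if (name[0] == '.')
        ++name; /* drop the leading '.', it is replaced by "__" */
    /* snprintf writes at most MACHO_SECTNAME_MAX characters plus the NUL. */
    snprintf(out, MACHO_SECTNAME_MAX + 1, "__%s", name);
}
```

Calling it with "``.apple_namespaces``" yields "``__apple_namespac``", which matches the truncated name in the list.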
