diff options
Diffstat (limited to 'Documentation/bpf/standardization/instruction-set.rst')
-rw-r--r-- | Documentation/bpf/standardization/instruction-set.rst | 790 |
1 files changed, 790 insertions, 0 deletions
diff --git a/Documentation/bpf/standardization/instruction-set.rst b/Documentation/bpf/standardization/instruction-set.rst new file mode 100644 index 000000000000..fbe975585236 --- /dev/null +++ b/Documentation/bpf/standardization/instruction-set.rst @@ -0,0 +1,790 @@ +.. contents:: +.. sectnum:: + +====================================== +BPF Instruction Set Architecture (ISA) +====================================== + +eBPF, also commonly +referred to as BPF, is a technology with origins in the Linux kernel +that can run untrusted programs in a privileged context such as an +operating system kernel. This document specifies the BPF instruction +set architecture (ISA). + +As a historical note, BPF originally stood for Berkeley Packet Filter, +but now that it can do so much more than packet filtering, the acronym +no longer makes sense. BPF is now considered a standalone term that +does not stand for anything. The original BPF is sometimes referred to +as cBPF (classic BPF) to distinguish it from the now widely deployed +eBPF (extended BPF). + +Documentation conventions +========================= + +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", +"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and +"OPTIONAL" in this document are to be interpreted as described in +BCP 14 `<https://www.rfc-editor.org/info/rfc2119>`_ +`<https://www.rfc-editor.org/info/rfc8174>`_ +when, and only when, they appear in all capitals, as shown here. + +For brevity and consistency, this document refers to families +of types using a shorthand syntax and refers to several expository, +mnemonic functions when describing the semantics of instructions. +The range of valid values for those types and the semantics of those +functions are defined in the following subsections. + +Types +----- +This document refers to integer types with the notation `SN` to specify +a type's signedness (`S`) and bit width (`N`), respectively. + +.. table:: Meaning of signedness notation + + ==== ========= + S Meaning + ==== ========= + u unsigned + s signed + ==== ========= + +.. table:: Meaning of bit-width notation + + ===== ========= + N Bit width + ===== ========= + 8 8 bits + 16 16 bits + 32 32 bits + 64 64 bits + 128 128 bits + ===== ========= + +For example, `u32` is a type whose valid values are all the 32-bit unsigned +numbers and `s16` is a type whose valid values are all the 16-bit signed +numbers. + +Functions +--------- + +The following byteswap functions are direction-agnostic. That is, +the same function is used for conversion in either direction discussed +below. + +* be16: Takes an unsigned 16-bit number and converts it between + host byte order and big-endian + (`IEN137 <https://www.rfc-editor.org/ien/ien137.txt>`_) byte order. +* be32: Takes an unsigned 32-bit number and converts it between + host byte order and big-endian byte order. +* be64: Takes an unsigned 64-bit number and converts it between + host byte order and big-endian byte order. +* bswap16: Takes an unsigned 16-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. +* bswap32: Takes an unsigned 32-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. +* bswap64: Takes an unsigned 64-bit number in either big- or little-endian + format and returns the equivalent number with the same bit width but + opposite endianness. +* le16: Takes an unsigned 16-bit number and converts it between + host byte order and little-endian byte order. +* le32: Takes an unsigned 32-bit number and converts it between + host byte order and little-endian byte order. +* le64: Takes an unsigned 64-bit number and converts it between + host byte order and little-endian byte order. + +Definitions +----------- + +.. glossary:: + + Sign Extend + To `sign extend an` ``X`` `-bit number, A, to a` ``Y`` `-bit number, B ,` means to + + #. Copy all ``X`` bits from `A` to the lower ``X`` bits of `B`. + #. Set the value of the remaining ``Y`` - ``X`` bits of `B` to the value of + the most-significant bit of `A`. + +.. admonition:: Example + + Sign extend an 8-bit number ``A`` to a 16-bit number ``B`` on a big-endian platform: + :: + + A: 10000110 + B: 11111111 10000110 + +Conformance groups +------------------ + +An implementation does not need to support all instructions specified in this +document (e.g., deprecated instructions). Instead, a number of conformance +groups are specified. An implementation MUST support the base32 conformance +group and MAY support additional conformance groups, where supporting a +conformance group means it MUST support all instructions in that conformance +group. + +The use of named conformance groups enables interoperability between a runtime +that executes instructions, and tools such as compilers that generate +instructions for the runtime. Thus, capability discovery in terms of +conformance groups might be done manually by users or automatically by tools. + +Each conformance group has a short ASCII label (e.g., "base32") that +corresponds to a set of instructions that are mandatory. That is, each +instruction has one or more conformance groups of which it is a member. + +This document defines the following conformance groups: + +* base32: includes all instructions defined in this + specification unless otherwise noted. +* base64: includes base32, plus instructions explicitly noted + as being in the base64 conformance group. +* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_). +* atomic64: includes atomic32, plus 64-bit atomic operation instructions. +* divmul32: includes 32-bit division, multiplication, and modulo instructions. +* divmul64: includes divmul32, plus 64-bit division, multiplication, + and modulo instructions. +* packet: deprecated packet access instructions. + +Instruction encoding +==================== + +BPF has two instruction encodings: + +* the basic instruction encoding, which uses 64 bits to encode an instruction +* the wide instruction encoding, which appends a second 64 bits + after the basic instruction for a total of 128 bits. + +Basic instruction encoding +-------------------------- + +A basic instruction is encoded as follows:: + + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | opcode | regs | offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | imm | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +**opcode** + operation to perform, encoded as follows:: + + +-+-+-+-+-+-+-+-+ + |specific |class| + +-+-+-+-+-+-+-+-+ + + **specific** + The format of these bits varies by instruction class + + **class** + The instruction class (see `Instruction classes`_) + +**regs** + The source and destination register numbers, encoded as follows + on a little-endian host:: + + +-+-+-+-+-+-+-+-+ + |src_reg|dst_reg| + +-+-+-+-+-+-+-+-+ + + and as follows on a big-endian host:: + + +-+-+-+-+-+-+-+-+ + |dst_reg|src_reg| + +-+-+-+-+-+-+-+-+ + + **src_reg** + the source register number (0-10), except where otherwise specified + (`64-bit immediate instructions`_ reuse this field for other purposes) + + **dst_reg** + destination register number (0-10), unless otherwise specified + (future instructions might reuse this field for other purposes) + +**offset** + signed integer offset used with pointer arithmetic, except where + otherwise specified (some arithmetic instructions reuse this field + for other purposes) + +**imm** + signed integer immediate value + +Note that the contents of multi-byte fields ('offset' and 'imm') are +stored using big-endian byte ordering on big-endian hosts and +little-endian byte ordering on little-endian hosts. + +For example:: + + opcode offset imm assembly + src_reg dst_reg + 07 0 1 00 00 44 33 22 11 r1 += 0x11223344 // little + dst_reg src_reg + 07 1 0 00 00 11 22 33 44 r1 += 0x11223344 // big + +Note that most instructions do not use all of the fields. +Unused fields SHALL be cleared to zero. + +Wide instruction encoding +-------------------------- + +Some instructions are defined to use the wide instruction encoding, +which uses two 32-bit immediate values. The 64 bits following +the basic instruction format contain a pseudo instruction +with 'opcode', 'dst_reg', 'src_reg', and 'offset' all set to zero. + +This is depicted in the following figure:: + + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | opcode | regs | offset | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | imm | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | next_imm | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +**opcode** + operation to perform, encoded as explained above + +**regs** + The source and destination register numbers (unless otherwise + specified), encoded as explained above + +**offset** + signed integer offset used with pointer arithmetic, unless + otherwise specified + +**imm** + signed integer immediate value + +**reserved** + unused, set to zero + +**next_imm** + second signed integer immediate value + +Instruction classes +------------------- + +The three least significant bits of the 'opcode' field store the instruction class: + +.. table:: Instruction class + + ===== ===== =============================== =================================== + class value description reference + ===== ===== =============================== =================================== + LD 0x0 non-standard load operations `Load and store instructions`_ + LDX 0x1 load into register operations `Load and store instructions`_ + ST 0x2 store from immediate operations `Load and store instructions`_ + STX 0x3 store from register operations `Load and store instructions`_ + ALU 0x4 32-bit arithmetic operations `Arithmetic and jump instructions`_ + JMP 0x5 64-bit jump operations `Arithmetic and jump instructions`_ + JMP32 0x6 32-bit jump operations `Arithmetic and jump instructions`_ + ALU64 0x7 64-bit arithmetic operations `Arithmetic and jump instructions`_ + ===== ===== =============================== =================================== + +Arithmetic and jump instructions +================================ + +For arithmetic and jump instructions (``ALU``, ``ALU64``, ``JMP`` and +``JMP32``), the 8-bit 'opcode' field is divided into three parts:: + + +-+-+-+-+-+-+-+-+ + | code |s|class| + +-+-+-+-+-+-+-+-+ + +**code** + the operation code, whose meaning varies by instruction class + +**s (source)** + the source operand location, which unless otherwise specified is one of: + + .. table:: Source operand location + + ====== ===== ============================================== + source value description + ====== ===== ============================================== + K 0 use 32-bit 'imm' value as source operand + X 1 use 'src_reg' register value as source operand + ====== ===== ============================================== + +**instruction class** + the instruction class (see `Instruction classes`_) + +Arithmetic instructions +----------------------- + +``ALU`` uses 32-bit wide operands while ``ALU64`` uses 64-bit wide operands for +otherwise identical operations. ``ALU64`` instructions belong to the +base64 conformance group unless noted otherwise. +The 'code' field encodes the operation as below, where 'src' refers to the +the source operand and 'dst' refers to the value of the destination +register. + +.. table:: Arithmetic instructions + + ===== ===== ======= =================================================================================== + name code offset description + ===== ===== ======= =================================================================================== + ADD 0x0 0 dst += src + SUB 0x1 0 dst -= src + MUL 0x2 0 dst \*= src + DIV 0x3 0 dst = (src != 0) ? (dst / src) : 0 + SDIV 0x3 1 dst = (src == 0) ? 0 : ((src == -1 && dst == LLONG_MIN) ? LLONG_MIN : (dst s/ src)) + OR 0x4 0 dst \|= src + AND 0x5 0 dst &= src + LSH 0x6 0 dst <<= (src & mask) + RSH 0x7 0 dst >>= (src & mask) + NEG 0x8 0 dst = -dst + MOD 0x9 0 dst = (src != 0) ? (dst % src) : dst + SMOD 0x9 1 dst = (src == 0) ? dst : ((src == -1 && dst == LLONG_MIN) ? 0: (dst s% src)) + XOR 0xa 0 dst ^= src + MOV 0xb 0 dst = src + MOVSX 0xb 8/16/32 dst = (s8,s16,s32)src + ARSH 0xc 0 :term:`sign extending<Sign Extend>` dst >>= (src & mask) + END 0xd 0 byte swap operations (see `Byte swap instructions`_ below) + ===== ===== ======= =================================================================================== + +Underflow and overflow are allowed during arithmetic operations, meaning +the 64-bit or 32-bit value will wrap. If BPF program execution would +result in division by zero, the destination register is instead set to zero. +Otherwise, for ``ALU64``, if execution would result in ``LLONG_MIN`` +dividing -1, the desination register is instead set to ``LLONG_MIN``. For +``ALU``, if execution would result in ``INT_MIN`` dividing -1, the +desination register is instead set to ``INT_MIN``. + +If execution would result in modulo by zero, for ``ALU64`` the value of +the destination register is unchanged whereas for ``ALU`` the upper +32 bits of the destination register are zeroed. Otherwise, for ``ALU64``, +if execution would resuslt in ``LLONG_MIN`` modulo -1, the destination +register is instead set to 0. For ``ALU``, if execution would result in +``INT_MIN`` modulo -1, the destination register is instead set to 0. + +``{ADD, X, ALU}``, where 'code' = ``ADD``, 'source' = ``X``, and 'class' = ``ALU``, means:: + + dst = (u32) ((u32) dst + (u32) src) + +where '(u32)' indicates that the upper 32 bits are zeroed. + +``{ADD, X, ALU64}`` means:: + + dst = dst + src + +``{XOR, K, ALU}`` means:: + + dst = (u32) dst ^ (u32) imm + +``{XOR, K, ALU64}`` means:: + + dst = dst ^ imm + +Note that most arithmetic instructions have 'offset' set to 0. Only three instructions +(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero 'offset'. + +Division, multiplication, and modulo operations for ``ALU`` are part +of the "divmul32" conformance group, and division, multiplication, and +modulo operations for ``ALU64`` are part of the "divmul64" conformance +group. +The division and modulo operations support both unsigned and signed flavors. + +For unsigned operations (``DIV`` and ``MOD``), for ``ALU``, +'imm' is interpreted as a 32-bit unsigned value. For ``ALU64``, +'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then +interpreted as a 64-bit unsigned value. + +For signed operations (``SDIV`` and ``SMOD``), for ``ALU``, +'imm' is interpreted as a 32-bit signed value. For ``ALU64``, 'imm' +is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then +interpreted as a 64-bit signed value. + +Note that there are varying definitions of the signed modulo operation +when the dividend or divisor are negative, where implementations often +vary by language such that Python, Ruby, etc. differ from C, Go, Java, +etc. This specification requires that signed modulo MUST use truncated division +(where -13 % 3 == -1) as implemented in C, Go, etc.:: + + a % n = a - n * trunc(a / n) + +The ``MOVSX`` instruction does a move operation with sign extension. +``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into +32-bit operands, and zeroes the remaining upper 32 bits. +``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit +operands into 64-bit operands. Unlike other arithmetic instructions, +``MOVSX`` is only defined for register source operands (``X``). + +``{MOV, K, ALU64}`` means:: + + dst = (s64)imm + +``{MOV, X, ALU}`` means:: + + dst = (u32)src + +``{MOVSX, X, ALU}`` with 'offset' 8 means:: + + dst = (u32)(s32)(s8)src + + +The ``NEG`` instruction is only defined when the source bit is clear +(``K``). + +Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31) +for 32-bit operations. + +Byte swap instructions +---------------------- + +The byte swap instructions use instruction classes of ``ALU`` and ``ALU64`` +and a 4-bit 'code' field of ``END``. + +The byte swap instructions operate on the destination register +only and do not use a separate source register or immediate value. + +For ``ALU``, the 1-bit source operand field in the opcode is used to +select what byte order the operation converts from or to. For +``ALU64``, the 1-bit source operand field in the opcode is reserved +and MUST be set to 0. + +.. table:: Byte swap instructions + + ===== ======== ===== ================================================= + class source value description + ===== ======== ===== ================================================= + ALU LE 0 convert between host byte order and little endian + ALU BE 1 convert between host byte order and big endian + ALU64 Reserved 0 do byte swap unconditionally + ===== ======== ===== ================================================= + +The 'imm' field encodes the width of the swap operations. The following widths +are supported: 16, 32 and 64. Width 64 operations belong to the base64 +conformance group and other swap operations belong to the base32 +conformance group. + +Examples: + +``{END, LE, ALU}`` with 'imm' = 16/32/64 means:: + + dst = le16(dst) + dst = le32(dst) + dst = le64(dst) + +``{END, BE, ALU}`` with 'imm' = 16/32/64 means:: + + dst = be16(dst) + dst = be32(dst) + dst = be64(dst) + +``{END, TO, ALU64}`` with 'imm' = 16/32/64 means:: + + dst = bswap16(dst) + dst = bswap32(dst) + dst = bswap64(dst) + +Jump instructions +----------------- + +``JMP32`` uses 32-bit wide operands and indicates the base32 +conformance group, while ``JMP`` uses 64-bit wide operands for +otherwise identical operations, and indicates the base64 conformance +group unless otherwise specified. +The 'code' field encodes the operation as below: + +.. table:: Jump instructions + + ======== ===== ======= ================================= =================================================== + code value src_reg description notes + ======== ===== ======= ================================= =================================================== + JA 0x0 0x0 PC += offset {JA, K, JMP} only + JA 0x0 0x0 PC += imm {JA, K, JMP32} only + JEQ 0x1 any PC += offset if dst == src + JGT 0x2 any PC += offset if dst > src unsigned + JGE 0x3 any PC += offset if dst >= src unsigned + JSET 0x4 any PC += offset if dst & src + JNE 0x5 any PC += offset if dst != src + JSGT 0x6 any PC += offset if dst > src signed + JSGE 0x7 any PC += offset if dst >= src signed + CALL 0x8 0x0 call helper function by static ID {CALL, K, JMP} only, see `Helper functions`_ + CALL 0x8 0x1 call PC += imm {CALL, K, JMP} only, see `Program-local functions`_ + CALL 0x8 0x2 call helper function by BTF ID {CALL, K, JMP} only, see `Helper functions`_ + EXIT 0x9 0x0 return {CALL, K, JMP} only + JLT 0xa any PC += offset if dst < src unsigned + JLE 0xb any PC += offset if dst <= src unsigned + JSLT 0xc any PC += offset if dst < src signed + JSLE 0xd any PC += offset if dst <= src signed + ======== ===== ======= ================================= =================================================== + +where 'PC' denotes the program counter, and the offset to increment by +is in units of 64-bit instructions relative to the instruction following +the jump instruction. Thus 'PC += 1' skips execution of the next +instruction if it's a basic instruction or results in undefined behavior +if the next instruction is a 128-bit wide instruction. + +Example: + +``{JSGE, X, JMP32}`` means:: + + if (s32)dst s>= (s32)src goto +offset + +where 's>=' indicates a signed '>=' comparison. + +``{JLE, K, JMP}`` means:: + + if dst <= (u64)(s64)imm goto +offset + +``{JA, K, JMP32}`` means:: + + gotol +imm + +where 'imm' means the branch offset comes from the 'imm' field. + +Note that there are two flavors of ``JA`` instructions. The +``JMP`` class permits a 16-bit jump offset specified by the 'offset' +field, whereas the ``JMP32`` class permits a 32-bit jump offset +specified by the 'imm' field. A > 16-bit conditional jump may be +converted to a < 16-bit conditional jump plus a 32-bit unconditional +jump. + +All ``CALL`` and ``JA`` instructions belong to the +base32 conformance group. + +Helper functions +~~~~~~~~~~~~~~~~ + +Helper functions are a concept whereby BPF programs can call into a +set of function calls exposed by the underlying platform. + +Historically, each helper function was identified by a static ID +encoded in the 'imm' field. Further documentation of helper functions +is outside the scope of this document and standardization is left for +future work, but use is widely deployed and more information can be +found in platform-specific documentation (e.g., Linux kernel documentation). + +Platforms that support the BPF Type Format (BTF) support identifying +a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID +identifies the helper name and type. Further documentation of BTF +is outside the scope of this document and standardization is left for +future work, but use is widely deployed and more information can be +found in platform-specific documentation (e.g., Linux kernel documentation). + +Program-local functions +~~~~~~~~~~~~~~~~~~~~~~~ +Program-local functions are functions exposed by the same BPF program as the +caller, and are referenced by offset from the instruction following the call +instruction, similar to ``JA``. The offset is encoded in the 'imm' field of +the call instruction. An ``EXIT`` within the program-local function will +return to the caller. + +Load and store instructions +=========================== + +For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the +8-bit 'opcode' field is divided as follows:: + + +-+-+-+-+-+-+-+-+ + |mode |sz |class| + +-+-+-+-+-+-+-+-+ + +**mode** + The mode modifier is one of: + + .. table:: Mode modifier + + ============= ===== ==================================== ============= + mode modifier value description reference + ============= ===== ==================================== ============= + IMM 0 64-bit immediate instructions `64-bit immediate instructions`_ + ABS 1 legacy BPF packet access (absolute) `Legacy BPF Packet access instructions`_ + IND 2 legacy BPF packet access (indirect) `Legacy BPF Packet access instructions`_ + MEM 3 regular load and store operations `Regular load and store operations`_ + MEMSX 4 sign-extension load operations `Sign-extension load operations`_ + ATOMIC 6 atomic operations `Atomic operations`_ + ============= ===== ==================================== ============= + +**sz (size)** + The size modifier is one of: + + .. table:: Size modifier + + ==== ===== ===================== + size value description + ==== ===== ===================== + W 0 word (4 bytes) + H 1 half word (2 bytes) + B 2 byte + DW 3 double word (8 bytes) + ==== ===== ===================== + + Instructions using ``DW`` belong to the base64 conformance group. + +**class** + The instruction class (see `Instruction classes`_) + +Regular load and store operations +--------------------------------- + +The ``MEM`` mode modifier is used to encode regular load and store +instructions that transfer data between a register and memory. + +``{MEM, <size>, STX}`` means:: + + *(size *) (dst + offset) = src + +``{MEM, <size>, ST}`` means:: + + *(size *) (dst + offset) = imm + +``{MEM, <size>, LDX}`` means:: + + dst = *(unsigned size *) (src + offset) + +Where '<size>' is one of: ``B``, ``H``, ``W``, or ``DW``, and +'unsigned size' is one of: u8, u16, u32, or u64. + +Sign-extension load operations +------------------------------ + +The ``MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load +instructions that transfer data between a register and memory. + +``{MEMSX, <size>, LDX}`` means:: + + dst = *(signed size *) (src + offset) + +Where '<size>' is one of: ``B``, ``H``, or ``W``, and +'signed size' is one of: s8, s16, or s32. + +Atomic operations +----------------- + +Atomic operations are operations that operate on memory and can not be +interrupted or corrupted by other access to the same memory region +by other BPF programs or means outside of this specification. + +All atomic operations supported by BPF are encoded as store operations +that use the ``ATOMIC`` mode modifier as follows: + +* ``{ATOMIC, W, STX}`` for 32-bit operations, which are + part of the "atomic32" conformance group. +* ``{ATOMIC, DW, STX}`` for 64-bit operations, which are + part of the "atomic64" conformance group. +* 8-bit and 16-bit wide atomic operations are not supported. + +The 'imm' field is used to encode the actual atomic operation. +Simple atomic operation use a subset of the values defined to encode +arithmetic operations in the 'imm' field to encode the atomic operation: + +.. table:: Simple atomic operations + + ======== ===== =========== + imm value description + ======== ===== =========== + ADD 0x00 atomic add + OR 0x40 atomic or + AND 0x50 atomic and + XOR 0xa0 atomic xor + ======== ===== =========== + + +``{ATOMIC, W, STX}`` with 'imm' = ADD means:: + + *(u32 *)(dst + offset) += src + +``{ATOMIC, DW, STX}`` with 'imm' = ADD means:: + + *(u64 *)(dst + offset) += src + +In addition to the simple atomic operations, there also is a modifier and +two complex atomic operations: + +.. table:: Complex atomic operations + + =========== ================ =========================== + imm value description + =========== ================ =========================== + FETCH 0x01 modifier: return old value + XCHG 0xe0 | FETCH atomic exchange + CMPXCHG 0xf0 | FETCH atomic compare and exchange + =========== ================ =========================== + +The ``FETCH`` modifier is optional for simple atomic operations, and +always set for the complex atomic operations. If the ``FETCH`` flag +is set, then the operation also overwrites ``src`` with the value that +was in memory before it was modified. + +The ``XCHG`` operation atomically exchanges ``src`` with the value +addressed by ``dst + offset``. + +The ``CMPXCHG`` operation atomically compares the value addressed by +``dst + offset`` with ``R0``. If they match, the value addressed by +``dst + offset`` is replaced with ``src``. In either case, the +value that was at ``dst + offset`` before the operation is zero-extended +and loaded back to ``R0``. + +64-bit immediate instructions +----------------------------- + +Instructions with the ``IMM`` 'mode' modifier use the wide instruction +encoding defined in `Instruction encoding`_, and use the 'src_reg' field of the +basic instruction to hold an opcode subtype. + +The following table defines a set of ``{IMM, DW, LD}`` instructions +with opcode subtypes in the 'src_reg' field, using new terms such as "map" +defined further below: + +.. table:: 64-bit immediate instructions + + ======= ========================================= =========== ============== + src_reg pseudocode imm type dst type + ======= ========================================= =========== ============== + 0x0 dst = (next_imm << 32) | imm integer integer + 0x1 dst = map_by_fd(imm) map fd map + 0x2 dst = map_val(map_by_fd(imm)) + next_imm map fd data address + 0x3 dst = var_addr(imm) variable id data address + 0x4 dst = code_addr(imm) integer code address + 0x5 dst = map_by_idx(imm) map index map + 0x6 dst = map_val(map_by_idx(imm)) + next_imm map index data address + ======= ========================================= =========== ============== + +where + +* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_) +* map_by_idx(imm) means to convert a 32-bit index into an address of a map +* map_val(map) gets the address of the first value in a given map +* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id +* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions +* the 'imm type' can be used by disassemblers for display +* the 'dst type' can be used for verification and JIT compilation purposes + +Maps +~~~~ + +Maps are shared memory regions accessible by BPF programs on some platforms. +A map can have various semantics as defined in a separate document, and may or +may not have a single contiguous memory region, but the 'map_val(map)' is +currently only defined for maps that do have a single contiguous memory region. + +Each map can have a file descriptor (fd) if supported by the platform, where +'map_by_fd(imm)' means to get the map with the specified file descriptor. Each +BPF program can also be defined to use a set of maps associated with the +program at load time, and 'map_by_idx(imm)' means to get the map with the given +index in the set associated with the BPF program containing the instruction. + +Platform Variables +~~~~~~~~~~~~~~~~~~ + +Platform variables are memory regions, identified by integer ids, exposed by +the runtime and accessible by BPF programs on some platforms. The +'var_addr(imm)' operation means to get the address of the memory region +identified by the given id. + +Legacy BPF Packet access instructions +------------------------------------- + +BPF previously introduced special instructions for access to packet data that were +carried over from classic BPF. These instructions used an instruction +class of ``LD``, a size modifier of ``W``, ``H``, or ``B``, and a +mode modifier of ``ABS`` or ``IND``. The 'dst_reg' and 'offset' fields were +set to zero, and 'src_reg' was set to zero for ``ABS``. However, these +instructions are deprecated and SHOULD no longer be used. All legacy packet +access instructions belong to the "packet" conformance group. |