aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking/bpf_flow_dissector.rst
diff options
context:
space:
mode:
authorOlof Johansson <olof@lixom.net>2019-05-15 22:51:48 -0700
committerOlof Johansson <olof@lixom.net>2019-05-15 22:51:48 -0700
commit7a0c4c17089a8aff52f516f0f52002be52950aae (patch)
tree474dc451d62a1ba788ed52c4cecf93f9e50cc39a /Documentation/networking/bpf_flow_dissector.rst
parentMerge tag 'v5.2-rockchip-soc32-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/soc (diff)
parentMerge tag 'davinci-fixes-for-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into fixes (diff)
downloadlinux-dev-7a0c4c17089a8aff52f516f0f52002be52950aae.tar.xz
linux-dev-7a0c4c17089a8aff52f516f0f52002be52950aae.zip
Merge branch 'fixes' into arm/soc
Merge in a few pending fixes from pre-5.1 that didn't get sent in: MAINTAINERS: update arch/arm/mach-davinci ARM: dts: ls1021: Fix SGMII PCS link remaining down after PHY disconnect ARM: dts: imx6q-logicpd: Reduce inrush current on USBH1 ARM: dts: imx6q-logicpd: Reduce inrush current on start ARM: dts: imx: Fix the AR803X phy-mode ARM: dts: sun8i: a33: Reintroduce default pinctrl muxing arm64: dts: allwinner: a64: Rename hpvcc-supply to cpvdd-supply ARM: sunxi: fix a leaked reference by adding missing of_node_put ARM: sunxi: fix a leaked reference by adding missing of_node_put Signed-off-by: Olof Johansson <olof@lixom.net>
Diffstat (limited to 'Documentation/networking/bpf_flow_dissector.rst')
-rw-r--r--Documentation/networking/bpf_flow_dissector.rst126
1 files changed, 126 insertions, 0 deletions
diff --git a/Documentation/networking/bpf_flow_dissector.rst b/Documentation/networking/bpf_flow_dissector.rst
new file mode 100644
index 000000000000..b375ae2ec2c4
--- /dev/null
+++ b/Documentation/networking/bpf_flow_dissector.rst
@@ -0,0 +1,126 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+BPF Flow Dissector
+==================
+
+Overview
+========
+
+Flow dissector is a routine that parses metadata out of the packets. It's
+used in the various places in the networking subsystem (RFS, flow hash, etc).
+
+BPF flow dissector is an attempt to reimplement C-based flow dissector logic
+in BPF to gain all the benefits of BPF verifier (namely, limits on the
+number of instructions and tail calls).
+
+API
+===
+
+BPF flow dissector programs operate on an ``__sk_buff``. However, only the
+limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
+``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
+and output arguments.
+
+The inputs are:
+ * ``nhoff`` - initial offset of the networking header
+ * ``thoff`` - initial offset of the transport header, initialized to nhoff
+ * ``n_proto`` - L3 protocol type, parsed out of L2 header
+
+Flow dissector BPF program should fill out the rest of the ``struct
+bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
+also adjusted accordingly.
+
+The return code of the BPF program is either BPF_OK to indicate successful
+dissection, or BPF_DROP to indicate parsing error.
+
+__sk_buff->data
+===============
+
+In the VLAN-less case, this is what the initial state of the BPF flow
+dissector looks like::
+
+ +------+------+------------+-----------+
+ | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
+ +------+------+------------+-----------+
+ ^
+ |
+ +-- flow dissector starts here
+
+
+.. code:: c
+
+ skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
+ flow_keys->thoff = nhoff
+ flow_keys->n_proto = ETHER_TYPE
+
+In case of VLAN, flow dissector can be called with the two different states.
+
+Pre-VLAN parsing::
+
+ +------+------+------+-----+-----------+-----------+
+ | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
+ +------+------+------+-----+-----------+-----------+
+ ^
+ |
+ +-- flow dissector starts here
+
+.. code:: c
+
+ skb->data + flow_keys->nhoff point the to first byte of TCI
+ flow_keys->thoff = nhoff
+ flow_keys->n_proto = TPID
+
+Please note that TPID can be 802.1AD and, hence, BPF program would
+have to parse VLAN information twice for double tagged packets.
+
+
+Post-VLAN parsing::
+
+ +------+------+------+-----+-----------+-----------+
+ | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
+ +------+------+------+-----+-----------+-----------+
+ ^
+ |
+ +-- flow dissector starts here
+
+.. code:: c
+
+ skb->data + flow_keys->nhoff point the to first byte of L3_HEADER
+ flow_keys->thoff = nhoff
+ flow_keys->n_proto = ETHER_TYPE
+
+In this case VLAN information has been processed before the flow dissector
+and BPF flow dissector is not required to handle it.
+
+
+The takeaway here is as follows: BPF flow dissector program can be called with
+the optional VLAN header and should gracefully handle both cases: when single
+or double VLAN is present and when it is not present. The same program
+can be called for both cases and would have to be written carefully to
+handle both cases.
+
+
+Reference Implementation
+========================
+
+See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
+implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
+for the loader. bpftool can be used to load BPF flow dissector program as well.
+
+The reference implementation is organized as follows:
+ * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
+ * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
+ does ``bpf_tail_call`` to the appropriate L3 handler
+
+Since BPF at this point doesn't support looping (or any jumping back),
+jmp_table is used instead to handle multiple levels of encapsulation (and
+IPv6 options).
+
+
+Current Limitations
+===================
+BPF flow dissector doesn't support exporting all the metadata that in-kernel
+C-based implementation can export. Notable example is single VLAN (802.1Q)
+and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
+for a set of information that's currently can be exported from the BPF context.