path: root/Documentation/networking/filter.txt (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-11-15bpf, doc: Change right arguments for JIT example codeMao Wenan1-4/+4
The example code for the x86_64 JIT uses the wrong arguments when calling function bar(). Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191114034351.162740-1-maowenan@huawei.com
2019-03-02docs/bpf: minor casing/punctuation fixesAndrii Nakryiko1-1/+1
Fix few casing and punctuation glitches. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-05bpf, doc: add RISC-V JIT to BPF documentationBjörn Töpel1-7/+9
Update Documentation/networking/filter.txt and Documentation/sysctl/net.txt to mention RISC-V. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-01-26bpf: allocate 0x06 to new eBPF instruction class JMP32Jiong Wang1-7/+8
The new eBPF instruction class JMP32 uses the reserved class number 0x6. Kernel BPF ISA documentation updated accordingly. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-08bpf, doc: Document Jump X addressing modeArthur Fabre1-14/+16
bpf_asm and the other classic BPF tools support jump conditions comparing register A to register X, in addition to comparing register A with constant K. Only the latter was documented in filter.txt, add two new addressing modes that describe the former. Signed-off-by: Arthur Fabre <arthur@arthurfabre.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-03Documentation: Describe bpf reference trackingJoe Stringer1-0/+64
Document the new pointer types in the verifier and how the pointer ID tracking works to ensure that references which are taken are later released. Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-11bpf, doc: clarification for the meaning of 'id'Wang YanQing1-6/+9
For me, as a reader whose mother language isn't English, the old words bring a little difficulty to catch the meaning, this patch rewords the subsection in a more clarificatory way. This patch also add blank lines as separator at two places to improve readability. Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-27bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ONLeo Yan1-0/+6
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2 for JIT opcode dumping; this patch is to update the doc for it. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-15filter.txt: update 'tools/net/' to 'tools/bpf/'Wang Sheng-Hui1-3/+3
The tools are located at tootls/bpf/ instead of tools/net/. Update the filter.txt doc. Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24bpf, doc: Correct one wrong value in "Register value tracking"Wang YanQing1-1/+1
If we then OR this with 0x40, then the value of 6th bit (0th is first bit) become known, so the right mask is 0xbf instead of 0xcf. Signed-off-by: Wang YanQing <udknight@gmail.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-09-16Documentation: link in networking docsPavel Machek1-1/+1
Fix link in filter.txt. Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-23bpf, doc: Add arm32 as arch supporting eBPF JITShubham Bansal1-2/+2
As eBPF JIT support for arm32 was added recently with commit 39c13c204bb1150d401e27d41a9d8b332be47c49, it seems appropriate to add arm32 as arch with support for eBPF JIT in bpf and sysctl docs as well. Signed-off-by: Shubham Bansal <illusionist.neo@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14seccomp: Rename SECCOMP_RET_KILL to SECCOMP_RET_KILL_THREADKees Cook1-1/+1
In preparation for adding SECCOMP_RET_KILL_PROCESS, rename SECCOMP_RET_KILL to the more accurate SECCOMP_RET_KILL_THREAD. The existing selftest values are intentionally left as SECCOMP_RET_KILL just to be sure we're exercising the alias. Signed-off-by: Kees Cook <keescook@chromium.org>
2017-08-09bpf: add BPF_J{LT,LE,SLT,SLE} instructionsDaniel Borkmann1-0/+4
Currently, eBPF only understands BPF_JGT (>), BPF_JGE (>=), BPF_JSGT (s>), BPF_JSGE (s>=) instructions, this means that particularly *JLT/*JLE counterparts involving immediates need to be rewritten from e.g. X < [IMM] by swapping arguments into [IMM] > X, meaning the immediate first is required to be loaded into a register Y := [IMM], such that then we can compare with Y > X. Note that the destination operand is always required to be a register. This has the downside of having unnecessarily increased register pressure, meaning complex program would need to spill other registers temporarily to stack in order to obtain an unused register for the [IMM]. Loading to registers will thus also affect state pruning since we need to account for that register use and potentially those registers that had to be spilled/filled again. As a consequence slightly more stack space might have been used due to spilling, and BPF programs are a bit longer due to extra code involving the register load and potentially required spill/fills. Thus, add BPF_JLT (<), BPF_JLE (<=), BPF_JSLT (s<), BPF_JSLE (s<=) counterparts to the eBPF instruction set. Modifying LLVM to remove the NegateCC() workaround in a PoC patch at [1] and allowing it to also emit the new instructions resulted in cilium's BPF programs that are injected into the fast-path to have a reduced program length in the range of 2-3% (e.g. accumulated main and tail call sections from one of the object file reduced from 4864 to 4729 insns), reduced complexity in the range of 10-30% (e.g. accumulated sections reduced in one of the cases from 116432 to 88428 insns), and reduced stack usage in the range of 1-5% (e.g. accumulated sections from one of the object files reduced from 824 to 784b). The modification for LLVM will be incorporated in a backwards compatible way. Plan is for LLVM to have i) a target specific option to offer a possibility to explicitly enable the extension by the user (as we have with -m target specific extensions today for various CPU insns), and ii) have the kernel checked for presence of the extensions and enable them transparently when the user is selecting more aggressive options such as -march=native in a bpf target context. (Other frontends generating BPF byte code, e.g. ply can probe the kernel directly for its code generation.) [1] https://github.com/borkmann/llvm/tree/bpf-insns Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08Documentation: describe the new eBPF verifier value tracking behaviourEdward Cree1-18/+104
Also bring the eBPF documentation up to date in other ways. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-23bpf, doc: update list of architectures that do eBPF JITAlexei Starovoitov1-4/+3
update the list and remove 'in the future' statement, since all still alive 64-bit architectures now do eBPF JIT. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16bpf, doc: fix typo on bpf_asm descriptionsDaniel Borkmann1-8/+8
Fix description of some of the bpf_asm tool related jump instructions and generally move them to format A <op> k. Reported-by: Sebastian Amend <sebastian.amend@googlemail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06bpf: add documentation for 'direct packet access'Alexei Starovoitov1-2/+83
explain how verifier checks safety of packet access and update email addresses. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-31bpf: doc: "neg" opcode has no operandsDave Anderson1-1/+1
Fixes a copy-paste-o in the BPF opcode table: "neg" takes no arguments and thus has no addressing modes. Signed-off-by: Dave Anderson <danderson@google.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2015-11-08bpf: doc: correct arch list for supported eBPF JITYang Shi1-3/+3
aarch64 and s390x support eBPF JIT too, correct document to reflect this and avoid any confusion. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24filter: introduce SKF_AD_VLAN_TPID BPF extensionMichal Sekletar1-1/+2
If vlan offloading takes place then vlan header is removed from frame and its contents, both vlan_tci and vlan_proto, is available to user space via TPACKET interface. However, only vlan_tci can be used in BPF filters. This commit introduces a new BPF extension. It makes possible to load the value of vlan_proto (vlan TPID) to register A. Support for classic BPF and eBPF is being added, analogous to skb->protocol. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Michal Sekletar <msekleta@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13net: rename vlan_tx_* helpers since "tx" is misleading thereJiri Pirko1-2/+2
The same macros are used for rx as well. So rename it. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10Documentation: replace __sk_run_filter with __bpf_prog_runLi RongQing1-2/+2
__sk_run_filter has been renamed as __bpf_prog_run, so replace them in comments Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26bpf: verifier (add docs)Alexei Starovoitov1-0/+224
this patch adds all of eBPF verfier documentation and empty bpf_check() The end goal for the verifier is to statically check safety of the program. Verifier will catch: - loops - out of range jumps - unreachable instructions - invalid instructions - uninitialized register access - uninitialized stack access - misaligned stack access - out of range stack access - invalid calling convention More details in Documentation/networking/filter.txt Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-26bpf: introduce BPF syscall and mapsAlexei Starovoitov1-0/+39
BPF syscall is a multiplexor for a range of different operations on eBPF. This patch introduces syscall with single command to create a map. Next patch adds commands to access maps. 'maps' is a generic storage of different types for sharing data between kernel and userspace. Userspace example: /* this syscall wrapper creates a map with given type and attributes * and returns map_fd on success. * use close(map_fd) to delete the map */ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, int max_entries) { union bpf_attr attr = { .map_type = map_type, .key_size = key_size, .value_size = value_size, .max_entries = max_entries }; return bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); } 'union bpf_attr' is backwards compatible with future extensions. More details in Documentation/networking/filter.txt and in manpage Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-10Documentation: filter: Add MIPS to architectures with BPF JITMarkos Chandras1-3/+3
MIPS supports BPF JIT since v3.16-rc1 Cc: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Markos Chandras <markos.chandras@imgtec.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09net: filter: add "load 64-bit immediate" eBPF instructionAlexei Starovoitov1-1/+7
add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register. All previous instructions were 8-byte. This is first 16-byte instruction. Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction: insn[0].code = BPF_LD | BPF_DW | BPF_IMM insn[0].dst_reg = destination register insn[0].imm = lower 32-bit insn[1].code = 0 insn[1].imm = upper 32-bit All unused fields must be zero. Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads 32-bit immediate value into a register. x64 JITs it as single 'movabsq %rax, imm64' arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn Note that old eBPF programs are binary compatible with new interpreter. It helps eBPF programs load 64-bit constant into a register with one instruction instead of using two registers and 4 instructions: BPF_MOV32_IMM(R1, imm32) BPF_ALU64_IMM(BPF_LSH, R1, 32) BPF_MOV32_IMM(R2, imm32) BPF_ALU64_REG(BPF_OR, R1, R2) User space generated programs will use this instruction to load constants only. To tell kernel that user space needs a pointer the _pseudo_ variant of this instruction may be added later, which will use extra bits of encoding to indicate what type of pointer user space is asking kernel to provide. For example 'off' or 'src_reg' fields can be used for such purpose. src_reg = 1 could mean that user space is asking kernel to validate and load in-kernel map pointer. src_reg = 2 could mean that user space needs readonly data section pointer src_reg = 3 could mean that user space needs a pointer to per-cpu local data All such future pseudo instructions will not be carrying the actual pointer as part of the instruction, but rather will be treated as a request to kernel to provide one. The kernel will verify the request_for_a_pointer, then will drop _pseudo_ marking and will store actual internal pointer inside the instruction, so the end result is the interpreter and JITs never see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that loads 64-bit immediate into a register. User space never operates on direct pointers and verifier can easily recognize request_for_pointer vs other instructions. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-08arm64: eBPF JIT compilerZi Shen Lim1-3/+3
The JIT compiler emits A64 instructions. It supports eBPF only. Legacy BPF is supported thanks to conversion by BPF core. JIT is enabled in the same way as for other architectures: echo 1 > /proc/sys/net/core/bpf_jit_enable Or for additional compiler output: echo 2 > /proc/sys/net/core/bpf_jit_enable See Documentation/networking/filter.txt for more information. The implementation passes all 57 tests in lib/test_bpf.c on ARMv8 Foundation Model :) Also tested by Will on Juno platform. Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-08-02net: filter: split 'struct sk_filter' into socket and bpf partsAlexei Starovoitov1-5/+5
clean up names related to socket filtering and bpf in the following way: - everything that deals with sockets keeps 'sk_*' prefix - everything that is pure BPF is changed to 'bpf_*' prefix split 'struct sk_filter' into struct sk_filter { atomic_t refcnt; struct rcu_head rcu; struct bpf_prog *prog; }; and struct bpf_prog { u32 jited:1, len:31; struct sock_fprog_kern *orig_prog; unsigned int (*bpf_func)(const struct sk_buff *skb, const struct bpf_insn *filter); union { struct sock_filter insns[0]; struct bpf_insn insnsi[0]; struct work_struct work; }; }; so that 'struct bpf_prog' can be used independent of sockets and cleans up 'unattached' bpf use cases split SK_RUN_FILTER macro into: SK_RUN_FILTER to be used with 'struct sk_filter *' and BPF_PROG_RUN to be used with 'struct bpf_prog *' __sk_filter_release(struct sk_filter *) gains __bpf_prog_release(struct bpf_prog *) helper function also perform related renames for the functions that work with 'struct bpf_prog *', since they're on the same lines: sk_filter_size -> bpf_prog_size sk_filter_select_runtime -> bpf_prog_select_runtime sk_filter_free -> bpf_prog_free sk_unattached_filter_create -> bpf_prog_create sk_unattached_filter_destroy -> bpf_prog_destroy sk_store_orig_filter -> bpf_prog_store_orig_filter sk_release_orig_filter -> bpf_release_orig_filter __sk_migrate_filter -> bpf_migrate_filter __sk_prepare_filter -> bpf_prepare_filter API for attaching classic BPF to a socket stays the same: sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *) and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program which is used by sockets, tun, af_packet API for 'unattached' BPF programs becomes: bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *) and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02net: filter: rename sk_chk_filter() -> bpf_check_classic()Alexei Starovoitov1-1/+1
trivial rename to indicate that this functions performs classic BPF checking Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: filter: document internal instruction encodingAlexei Starovoitov1-0/+161
This patch adds a description of eBPFs instruction encoding in order to bring the documentation in line with the implementation. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: filter: mention eBPF terminology as wellAlexei Starovoitov1-42/+43
Since the term eBPF is used anyway on mailing list discussions, lets also document that in the main BPF documentation file and replace a couple of occurrences with eBPF terminology to be more clear. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11net: filter: cleanup A/X name usageAlexei Starovoitov1-1/+1
The macro 'A' used in internal BPF interpreter: #define A regs[insn->a_reg] was easily confused with the name of classic BPF register 'A', since 'A' would mean two different things depending on context. This patch is trying to clean up the naming and clarify its usage in the following way: - A and X are names of two classic BPF registers - BPF_REG_A denotes internal BPF register R0 used to map classic register A in internal BPF programs generated from classic - BPF_REG_X denotes internal BPF register R7 used to map classic register X in internal BPF programs generated from classic - internal BPF instruction format: struct sock_filter_int { __u8 code; /* opcode */ __u8 dst_reg:4; /* dest register */ __u8 src_reg:4; /* source register */ __s16 off; /* signed offset */ __s32 imm; /* signed immediate constant */ }; - BPF_X/BPF_K is 1 bit used to encode source operand of instruction In classic: BPF_X - means use register X as source operand BPF_K - means use 32-bit immediate as source operand In internal: BPF_X - means use 'src_reg' register as source operand BPF_K - means use 32-bit immediate as source operand Suggested-by: Chema Gonzalez <chema@google.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Chema Gonzalez <chema@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23net: filter: doc: add section for BPF test suiteDaniel Borkmann1-0/+14
Mention the recently added test suite in the documentation file. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22net: doc: Update references to skb->rxhashTobias Klauser1-1/+1
In commit 61b905da33 ("net: Rename skb->rxhash to skb->hash"), skb->rxhash was renamed to skb->hash. Update references in Documentation accordingly. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-04net: filter: doc: expand and improve BPF documentationAlexei Starovoitov1-4/+154
In particular, this patch tries to clarify internal BPF calling convention and adds internal BPF examples, JIT guide, use cases. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22filter: added BPF random opcodeChema Gonzalez1-0/+13
Added a new ancillary load (bpf call in eBPF parlance) that produces a 32-bit random number. We are implementing it as an ancillary load (instead of an ISA opcode) because (a) it is simpler, (b) allows easy JITing, and (c) seems more in line with generic ISAs that do not have "get a random number" as a instruction, but as an OS call. The main use for this ancillary load is to perform random packet sampling. Signed-off-by: Chema Gonzalez <chema@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31doc: filter: extend BPF documentation to document new internalsAlexei Starovoitov1-0/+125
Further extend the current BPF documentation to document new BPF engine internals. Joint work with Daniel Borkmann. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11filter: doc: improve BPF documentationDaniel Borkmann1-47/+561
This patch significantly updates the BPF documentation and describes its internal architecture, Linux extensions, and handling of the kernel's BPF and JIT engine, plus documents how development can be facilitated with the help of bpf_dbg, bpf_asm, bpf_jit_disasm. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-17sk-filter: Add ability to lock a socket filter programVincent Bernat1-2/+9
While a privileged program can open a raw socket, attach some restrictive filter and drop its privileges (or send the socket to an unprivileged program through some Unix socket), the filter can still be removed or modified by the unprivileged program. This commit adds a socket option to lock the filter (SO_LOCK_FILTER) preventing any modification of a socket filter program. This is similar to OpenBSD BIOCLOCK ioctl on bpf sockets, except even root is not allowed change/drop the filter. The state of the lock can be read with getsockopt(). No error is triggered if the state is not changed. -EPERM is returned when a user tries to remove the lock or to change/remove the filter while the lock is active. The check is done directly in sk_attach_filter() and sk_detach_filter() and does not affect only setsockopt() syscall. Signed-off-by: Vincent Bernat <bernat@luffy.cx> Signed-off-by: David S. Miller <davem@davemloft.net>