linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-11-20	tools/bpf: sync kernel uapi bpf.h header to tools directory	Yonghong Song	1	-0/+13
	The kernel uapi bpf.h is synced to tools directory. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20	bpf: Introduce bpf_func_info	Yonghong Song	8	-8/+209
	This patch added interface to load a program with the following additional information: . prog_btf_fd . func_info, func_info_rec_size and func_info_cnt where func_info will provide function range and type_id corresponding to each function. The func_info_rec_size is introduced in the UAPI to specify struct bpf_func_info size passed from user space. This intends to make bpf_func_info structure growable in the future. If the kernel gets a different bpf_func_info size from userspace, it will try to handle user request with part of bpf_func_info it can understand. In this patch, kernel can understand struct bpf_func_info { __u32 insn_offset; __u32 type_id; }; If user passed a bpf func_info record size of 16 bytes, the kernel can still handle part of records with the above definition. If verifier agrees with function range provided by the user, the bpf_prog ksym for each function will use the func name provided in the type_id, which is supposed to provide better encoding as it is not limited by 16 bytes program name limitation and this is better for bpf program which contains multiple subprograms. The bpf_prog_info interface is also extended to return btf_id, func_info, func_info_rec_size and func_info_cnt to userspace, so userspace can print out the function prototype for each xlated function. The insn_offset in the returned func_info corresponds to the insn offset for xlated functions. With other jit related fields in bpf_prog_info, userspace can also print out function prototypes for each jited function. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20	tools/bpf: Add tests for BTF_KIND_FUNC_PROTO and BTF_KIND_FUNC	Martin KaFai Lau	2	-2/+476
	This patch adds unit tests for BTF_KIND_FUNC_PROTO and BTF_KIND_FUNC to test_btf. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20	tools/bpf: Sync kernel btf.h header	Martin KaFai Lau	1	-3/+15
	The kernel uapi btf.h is synced to the tools directory. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20	bpf: btf: Add BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO	Martin KaFai Lau	2	-53/+354
	This patch adds BTF_KIND_FUNC and BTF_KIND_FUNC_PROTO to support the function debug info. BTF_KIND_FUNC_PROTO must not have a name (i.e. !t->name_off) and it is followed by >= 0 'struct bpf_param' objects to describe the function arguments. The BTF_KIND_FUNC must have a valid name and it must refer back to a BTF_KIND_FUNC_PROTO. The above is the conclusion after the discussion between Edward Cree, Alexei, Daniel, Yonghong and Martin. By combining BTF_KIND_FUNC and BTF_LIND_FUNC_PROTO, a complete function signature can be obtained. It will be used in the later patches to learn the function signature of a running bpf program. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20	bpf: btf: Break up btf_type_is_void()	Martin KaFai Lau	1	-15/+22
	This patch breaks up btf_type_is_void() into btf_type_is_void() and btf_type_is_fwd(). It also adds btf_type_nosize() to better describe it is testing a type has nosize info. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-20	tools: add selftest for BPF_F_ZERO_SEED	Lorenz Bauer	1	-9/+55
	Check that iterating two separate hash maps produces the same order of keys if BPF_F_ZERO_SEED is used. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20	tools: sync linux/bpf.h	Lorenz Bauer	1	-3/+10
	Synchronize changes to linux/bpf.h from * "bpf: allow zero-initializing hash map seed" * "bpf: move BPF_F_QUERY_EFFECTIVE after map flags" Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20	bpf: move BPF_F_QUERY_EFFECTIVE after map flags	Lorenz Bauer	1	-3/+3
	BPF_F_QUERY_EFFECTIVE is in the middle of the flags valid for BPF_MAP_CREATE. Move it to its own section to reduce confusion. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20	bpf: allow zero-initializing hash map seed	Lorenz Bauer	2	-2/+14
	Add a new flag BPF_F_ZERO_SEED, which forces a hash map to initialize the seed to zero. This is useful when doing performance analysis both on individual BPF programs, as well as the kernel's hash table implementation. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-20	bpf: libbpf: retry map creation without the name	Stanislav Fomichev	1	-1/+10
	Since commit 88cda1c9da02 ("bpf: libbpf: Provide basic API support to specify BPF obj name"), libbpf unconditionally sets bpf_attr->name for maps. Pre v4.14 kernels don't know about map names and return an error about unexpected non-zero data. Retry sys_bpf without a map name to cover older kernels. v2 changes: * check for errno == EINVAL as suggested by Daniel Borkmann Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-16	bpf: fix null pointer dereference on pointer offload	Colin Ian King	1	-2/+3
	Pointer offload is being null checked however the following statement dereferences the potentially null pointer offload when assigning offload->dev_state. Fix this by only assigning it if offload is not null. Detected by CoverityScan, CID#1475437 ("Dereference after null check") Fixes: 00db12c3d141 ("bpf: call verifier_prep from its callback in struct bpf_offload_dev") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16	bpftool: make libbfd optional	Stanislav Fomichev	5	-6/+35
	Make it possible to build bpftool without libbfd. libbfd and libopcodes are typically provided in dev/dbg packages (binutils-dev in debian) which we usually don't have installed on the fleet machines and we'd like a way to have bpftool version that works without installing any additional packages. This excludes support for disassembling jit-ted code and prints an error if the user tries to use these features. Tested by: cat > FEATURES_DUMP.bpftool <<EOF feature-libbfd=0 feature-disassembler-four-args=1 feature-reallocarray=0 feature-libelf=1 feature-libelf-mmap=1 feature-bpf=1 EOF FEATURES_DUMP=$PWD/FEATURES_DUMP.bpftool make ldd bpftool \| grep libbfd Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16	selftest/bpf: Use bpf_sk_lookup_{tcp, udp} in test_sock_addr	Andrey Ignatov	2	-21/+78
	Use bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers from test_sock_addr programs to make sure they're available and can lookup and release socket properly for IPv4/IPv4, TCP/UDP. Reading from a few fields of returned struct bpf_sock is also tested. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16	bpf: Support socket lookup in CGROUP_SOCK_ADDR progs	Andrey Ignatov	1	-0/+45
	Make bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR. Such programs operate on sockets and have access to socket and struct sockaddr passed by user to system calls such as sys_bind, sys_connect, sys_sendmsg. It's useful to be able to lookup other sockets from these programs. E.g. sys_connect may lookup IP:port endpoint and if there is a server socket bound to that endpoint ("server" can be defined by saddr & sport being zero), redirect client connection to it by rewriting IP:port in sockaddr passed to sys_connect. Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16	bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp	Andrey Ignatov	1	-3/+2
	Lookup functions in sk_lookup have different expectations about byte order of provided arguments. Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup expect dport to be in network byte order and do ntohs(dport) internally. At the same time __inet6_lookup expects dport to be in host byte order and correspondingly name the argument hnum. sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and __inet6_lookup with regard to dport. But in __udp6_lib_lookup case it uses host instead of expected network byte order. It makes result returned by bpf_sk_lookup_udp for IPv6 incorrect. The patch fixes byte order of dport passed to __udp6_lib_lookup. Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84f02a fixes TCPv6 but breaks UDPv6. Fixes: 5ef0ae84f02a ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup") Signed-off-by: Andrey Ignatov <rdna@fb.com> Acked-by: Joe Stringer <joe@wand.net.nz> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16	bpf: Remove unused variable in nsim_bpf	Nathan Chancellor	1	-1/+0
	Clang warns: drivers/net/netdevsim/bpf.c:557:30: error: unused variable 'state' [-Werror,-Wunused-variable] struct nsim_bpf_bound_prog *state; ^ 1 error generated. The declaration should have been removed in commit b07ade27e933 ("bpf: pass translate() as a callback and remove its ndo_bpf subcommand"). Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16	bpf: libbpf: Fix bpf_program__next() API	Martin KaFai Lau	1	-14/+11
	This patch restores the behavior in commit eac7d84519a3 ("tools: libbpf: don't return '.text' as a program for multi-function programs") such that bpf_program__next() does not return pseudo programs in ".text". Fixes: 0c19a9fbc9cd ("libbpf: cleanup after partial failure in bpf_object__pin") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-16	selftests/bpf: Fix uninitialized duration warning	Joe Stringer	1	-1/+1
	Daniel Borkmann reports: test_progs.c: In function ‘main’: test_progs.c:81:3: warning: ‘duration’ may be used uninitialized in this function [-Wmaybe-uninitialized] printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\ ^~~~~~ test_progs.c:1706:8: note: ‘duration’ was declared here __u32 duration; ^~~~~~~~ Signed-off-by: Joe Stringer <joe@wand.net.nz> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr	Andrey Ignatov	1	-4/+24
	Add more test cases for context bpf_sock_addr to test narrow loads with offset > 0 for ctx->user_ip4 field (__u32): * off=1, size=1; * off=2, size=1; * off=3, size=1; * off=2, size=2. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	selftests/bpf: Test narrow loads with off > 0 in test_verifier	Andrey Ignatov	1	-10/+38
	Test the following narrow loads in test_verifier for context __sk_buff: * off=1, size=1 - ok; * off=2, size=1 - ok; * off=3, size=1 - ok; * off=0, size=2 - ok; * off=1, size=2 - fail; * off=0, size=2 - ok; * off=3, size=2 - fail. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: Allow narrow loads with offset > 0	Andrey Ignatov	2	-20/+17
	Currently BPF verifier allows narrow loads for a context field only with offset zero. E.g. if there is a __u32 field then only the following loads are permitted: * off=0, size=1 (narrow); * off=0, size=2 (narrow); * off=0, size=4 (full). On the other hand LLVM can generate a load with offset different than zero that make sense from program logic point of view, but verifier doesn't accept it. E.g. tools/testing/selftests/bpf/sendmsg4_prog.c has code: #define DST_IP4 0xC0A801FEU /* 192.168.1.254 / ... if ((ctx->user_ip4 >> 24) == (bpf_htonl(DST_IP4) >> 24) && where ctx is struct bpf_sock_addr. Some versions of LLVM can produce the following byte code for it: 8: 71 12 07 00 00 00 00 00 r2 = (u8 )(r1 + 7) 9: 67 02 00 00 18 00 00 00 r2 <<= 24 10: 18 03 00 00 00 00 00 fe 00 00 00 00 00 00 00 00 r3 = 4261412864 ll 12: 5d 32 07 00 00 00 00 00 if r2 != r3 goto +7 <LBB0_6> where `(u8 )(r1 + 7)` means narrow load for ctx->user_ip4 with size=1 and offset=3 (7 - sizeof(ctx->user_family) = 3). This load is currently rejected by verifier. Verifier code that rejects such loads is in bpf_ctx_narrow_access_ok() what means any is_valid_access implementation, that uses the function, works this way, e.g. bpf_skb_is_valid_access() for __sk_buff or sock_addr_is_valid_access() for bpf_sock_addr. The patch makes such loads supported. Offset can be in [0; size_default) but has to be multiple of load size. E.g. for __u32 field the following loads are supported now: off=0, size=1 (narrow); * off=1, size=1 (narrow); * off=2, size=1 (narrow); * off=3, size=1 (narrow); * off=0, size=2 (narrow); * off=2, size=2 (narrow); * off=0, size=4 (full). Reported-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpftool: support loading flow dissector	Stanislav Fomichev	3	-51/+74
	This commit adds support for loading/attaching/detaching flow dissector program. When `bpftool loadall` is called with a flow_dissector prog (i.e. when the 'type flow_dissector' argument is passed), we load and pin all programs. User is responsible to construct the jump table for the tail calls. The last argument of `bpftool attach` is made optional for this use case. Example: bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \ /sys/fs/bpf/flow type flow_dissector \ pinmaps /sys/fs/bpf/flow bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 0 0 0 0 \ value pinned /sys/fs/bpf/flow/IP bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 1 0 0 0 \ value pinned /sys/fs/bpf/flow/IPV6 bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 2 0 0 0 \ value pinned /sys/fs/bpf/flow/IPV6OP bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 3 0 0 0 \ value pinned /sys/fs/bpf/flow/IPV6FR bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 4 0 0 0 \ value pinned /sys/fs/bpf/flow/MPLS bpftool map update pinned /sys/fs/bpf/flow/jmp_table \ key 5 0 0 0 \ value pinned /sys/fs/bpf/flow/VLAN bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector Tested by using the above lines to load the prog in the test_flow_dissector.sh selftest. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpftool: add pinmaps argument to the load/loadall	Stanislav Fomichev	3	-3/+28
	This new additional argument lets users pin all maps from the object at specified path. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpftool: add loadall command	Stanislav Fomichev	5	-43/+81
	This patch adds new loadall command which slightly differs from the existing load. load command loads all programs from the obj file, but pins only the first programs. loadall pins all programs from the obj file under specified directory. The intended usecase is flow_dissector, where we want to load a bunch of progs, pin them all and after that construct a jump table. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	libbpf: add internal pin_name	Stanislav Fomichev	1	-3/+26
	pin_name is the same as section_name where '/' is replaced by '_'. bpf_object__pin_programs is converted to use pin_name to avoid the situation where section_name would require creating another subdirectory for a pin (as, for example, when calling bpf_object__pin_programs for programs in sections like "cgroup/connect6"). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	libbpf: bpf_program__pin: add special case for instances.nr == 1	Stanislav Fomichev	1	-0/+10
	When bpf_program has only one instance, don't create a subdirectory with per-instance pin files (<prog>/0). Instead, just create a single pin file for that single instance. This simplifies object pinning by not creating unnecessary subdirectories. This can potentially break existing users that depend on the case where '/0' is always created. However, I couldn't find any serious usage of bpf_program__pin inside the kernel tree and I suppose there should be none outside. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	libbpf: cleanup after partial failure in bpf_object__pin	Stanislav Fomichev	2	-23/+319
	bpftool will use bpf_object__pin in the next commits to pin all programs and maps from the file; in case of a partial failure, we need to get back to the clean state (undo previous program/map pins). As part of a cleanup, I've added and exported separate routines to pin all maps (bpf_object__pin_maps) and progs (bpf_object__pin_programs) of an object. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	selftests/bpf: rename flow dissector section to flow_dissector	Stanislav Fomichev	2	-2/+2
	Makes it compatible with the logic that derives program type from section name in libbpf_prog_type_by_name. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: do not pass netdev to translate() and prepare() offload callbacks	Quentin Monnet	4	-13/+11
	The kernel functions to prepare verifier and translate for offloaded program retrieve "offload" from "prog", and "netdev" from "offload". Then both "prog" and "netdev" are passed to the callbacks. Simplify this by letting the drivers retrieve the net device themselves from the offload object attached to prog - if they need it at all. There is currently no need to pass the netdev as an argument to those functions. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: pass prog instead of env to bpf_prog_offload_verifier_prep()	Quentin Monnet	6	-10/+9
	Function bpf_prog_offload_verifier_prep(), called from the kernel BPF verifier to run a driver-specific callback for preparing for the verification step for offloaded programs, takes a pointer to a struct bpf_verifier_env object. However, no driver callback needs the whole structure at this time: the two drivers supporting this, nfp and netdevsim, only need a pointer to the struct bpf_prog instance held by env. Update the callback accordingly, on kernel side and in these two drivers. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: pass destroy() as a callback and remove its ndo_bpf subcommand	Quentin Monnet	5	-36/+5
	As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to program destruction to the struct and remove the subcommand that was used to call them through the NDO. Remove function __bpf_offload_ndo(), which is no longer used. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: pass translate() as a callback and remove its ndo_bpf subcommand	Quentin Monnet	5	-22/+21
	As part of the transition from ndo_bpf() to callbacks attached to struct bpf_offload_dev for some of the eBPF offload operations, move the functions related to code translation to the struct and remove the subcommand that was used to call them through the NDO. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: call verifier_prep from its callback in struct bpf_offload_dev	Quentin Monnet	5	-41/+32
	In a way similar to the change previously brought to the verify_insn hook and to the finalize callback, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to prepare driver verifiers. Since the dev_ops pointer in struct bpf_prog_offload is no longer used by any callback, we can now remove it from struct bpf_prog_offload. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: call finalize() from its callback in struct bpf_offload_dev	Quentin Monnet	1	-2/+2
	In a way similar to the change previously brought to the verify_insn hook, switch to the newly added ops in struct bpf_prog_offload for calling the functions used to perform final verification steps for offloaded programs. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: call verify_insn from its callback in struct bpf_offload_dev	Quentin Monnet	2	-1/+4
	We intend to remove the dev_ops in struct bpf_prog_offload, and to only keep the ops in struct bpf_offload_dev instead, which is accessible from more locations for passing function pointers. But dev_ops is used for calling the verify_insn hook. Switch to the newly added ops in struct bpf_prog_offload instead. To avoid table lookups for each eBPF instruction to verify, we remember the offdev attached to a netdev and modify bpf_offload_find_netdev() to avoid performing more than once a lookup for a given offload object. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	bpf: pass a struct with offload callbacks to bpf_offload_dev_create()	Quentin Monnet	6	-9/+13
	For passing device functions for offloaded eBPF programs, there used to be no place where to store the pointer without making the non-offloaded programs pay a memory price. As a consequence, three functions were called with ndo_bpf() through specific commands. Now that we have struct bpf_offload_dev, and since none of those operations rely on RTNL, we can turn these three commands into hooks inside the struct bpf_prog_offload_ops, and pass them as part of bpf_offload_dev_create(). This commit effectively passes a pointer to the struct to bpf_offload_dev_create(). We temporarily have two struct bpf_prog_offload_ops instances, one under offdev->ops and one under offload->dev_ops. The next patches will make the transition towards the former, so that offload->dev_ops can be removed, and callbacks relying on ndo_bpf() added to offdev->ops as well. While at it, rename "nfp_bpf_analyzer_ops" as "nfp_bpf_dev_ops" (and similarly for netdevsim). Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10	nfp: bpf: move nfp_bpf_analyzer_ops from verifier.c to offload.c	Quentin Monnet	3	-8/+12
	We are about to add several new callbacks to the struct, all of them defined in offload.c. Move the struct bpf_prog_offload_ops object in that file. As a consequence, nfp_verify_insn() and nfp_finalize() can no longer be static. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-09	bpf: Extend the sk_lookup() helper to XDP hookpoint.	Nitin Hande	2	-19/+90
	This patch proposes to extend the sk_lookup() BPF API to the XDP hookpoint. The sk_lookup() helper supports a lookup on incoming packet to find the corresponding socket that will receive this packet. Current support for this BPF API is at the tc hookpoint. This patch will extend this API at XDP hookpoint. A XDP program can map the incoming packet to the 5-tuple parameter and invoke the API to find the corresponding socket structure. Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09	bpftool: Improve handling of ENOENT on map dumps	David Ahern	1	-4/+14
	bpftool output is not user friendly when dumping a map with only a few populated entries: $ bpftool map 1: devmap name tx_devmap flags 0x0 key 4B value 4B max_entries 64 memlock 4096B 2: array name tx_idxmap flags 0x0 key 4B value 4B max_entries 64 memlock 4096B $ bpftool map dump id 1 key: 00 00 00 00 value: No such file or directory key: 01 00 00 00 value: No such file or directory key: 02 00 00 00 value: No such file or directory key: 03 00 00 00 value: 03 00 00 00 Handle ENOENT by keeping the line format sane and dumping "<no entry>" for the value $ bpftool map dump id 1 key: 00 00 00 00 value: <no entry> key: 01 00 00 00 value: <no entry> key: 02 00 00 00 value: <no entry> key: 03 00 00 00 value: 03 00 00 00 ... Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09	selftests/bpf: add a test case for sock_ops perf-event notification	Sowmini Varadhan	4	-1/+303
	This patch provides a tcp_bpf based eBPF sample. The test - ncat(1) as the TCP client program to connect() to a port with the intention of triggerring SYN retransmissions: we first install an iptables DROP rule to make sure ncat SYNs are resent (instead of aborting instantly after a TCP RST) - has a bpf kernel module that sends a perf-event notification for each TCP retransmit, and also tracks the number of such notifications sent in the global_map The test passes when the number of event notifications intercepted in user-space matches the value in the global_map. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09	bpf: add perf event notificaton support for sock_ops	Sowmini Varadhan	1	-0/+22
	This patch allows eBPF programs that use sock_ops to send perf based event notifications using bpf_perf_event_output(). Our main use case for this is the following: We would like to monitor some subset of TCP sockets in user-space, (the monitoring application would define 4-tuples it wants to monitor) using TCP_INFO stats to analyze reported problems. The idea is to use those stats to see where the bottlenecks are likely to be ("is it application-limited?" or "is there evidence of BufferBloat in the path?" etc). Today we can do this by periodically polling for tcp_info, but this could be made more efficient if the kernel would asynchronously notify the application via tcp_info when some "interesting" thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc) are reached. And to make this effective, it is better if we could apply the threshold check before constructing the tcp_info netlink notification, so that we don't waste resources constructing notifications that will be discarded by the filter. This work solves the problem by adding perf event based notification support for sock_ops. The eBPF program can thus be designed to apply any desired filters to the bpf_sock_ops and trigger a perf event notification based on the evaluation from the filter. The user space component can use these perf event notifications to either read any state managed by the eBPF program, or issue a TCP_INFO netlink call if desired. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09	nfp: bpf: relax prog rejection through max_pkt_offset	Jiong Wang	1	-4/+5
	NFP is refusing to offload programs whenever the MTU is set to a value larger than the max packet bytes that fits in NFP Cluster Target Memory (CTM). However, a eBPF program doesn't always need to access the whole packet data. Verifier has always calculated maximum direct packet access (DPA) offset, and kept it in max_pkt_offset inside prog auxiliar information. This patch relax prog rejection based on max_pkt_offset. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09	bpf: let verifier to calculate and record max_pkt_offset	Jiong Wang	2	-0/+13
	In check_packet_access, update max_pkt_offset after the offset has passed __check_packet_access. It should be safe to use u32 for max_pkt_offset as explained in code comment. Also, when there is tail call, the max_pkt_offset of the called program is unknown, so conservatively set max_pkt_offset to MAX_PACKET_OFF for such case. Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-07	bpf_load: add map name to load_maps error message	Shannon Nelson	1	-2/+2
	To help when debugging bpf/xdp load issues, have the load_map() error message include the number and name of the map that failed. Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-07	tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps	Quentin Monnet	4	-0/+14
	The limit for memory locked in the kernel by a process is usually set to 64 kbytes by default. This can be an issue when creating large BPF maps and/or loading many programs. A workaround is to raise this limit for the current process before trying to create a new BPF map. Changing the hard limit requires the CAP_SYS_RESOURCE and can usually only be done by root user (for non-root users, a call to setrlimit fails (and sets errno) and the program simply goes on with its rlimit unchanged). There is no API to get the current amount of memory locked for a user, therefore we cannot raise the limit only when required. One solution, used by bcc, is to try to create the map, and on getting a EPERM error, raising the limit to infinity before giving another try. Another approach, used in iproute2, is to raise the limit in all cases, before trying to create the map. Here we do the same as in iproute2: the rlimit is raised to infinity before trying to load programs or to create maps with bpftool. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-07	selftests/bpf: enable (uncomment) all tests in test_libbpf.sh	Quentin Monnet	2	-10/+14
	libbpf is now able to load successfully test_l4lb_noinline.o and samples/bpf/tracex3_kern.o. For the test_l4lb_noinline, uncomment related tests from test_libbpf.c and remove the associated "TODO". For tracex3_kern.o, instead of loading a program from samples/bpf/ that might not have been compiled at this stage, try loading a program from BPF selftests. Since this test case is about loading a program compiled without the "-target bpf" flag, change the Makefile to compile one program accordingly (instead of passing the flag for compiling all programs). Regarding test_xdp_noinline.o: in its current shape the program fails to load because it provides no version section, but the loader needs one. The test was added to make sure that libbpf could load XDP programs even if they do not provide a version number in a dedicated section. But libbpf is already capable of doing that: in our case loading fails because the loader does not know that this is an XDP program (it does not need to, since it does not attach the program). So trying to load test_xdp_noinline.o does not bring much here: just delete this subtest. For the record, the error message obtained with tracex3_kern.o was fixed by commit e3d91b0ca523 ("tools/libbpf: handle issues with bpf ELF objects containing .eh_frames") I have not been abled to reproduce the "libbpf: incorrect bpf_call opcode" error for test_l4lb_noinline.o, even with the version of libbpf present at the time when test_libbpf.sh and test_libbpf_open.c were created. RFC -> v1: - Compile test_xdp without the "-target bpf" flag, and try to load it instead of ../../samples/bpf/tracex3_kern.o. - Delete test_xdp_noinline.o subtest. Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-07	net: hns3: Remove set but not used variable 'reset_level'	YueHaibing	1	-10/+3
	Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c: In function 'hclge_log_and_clear_ppp_error': drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_err.c:821:24: warning: variable 'reset_level' set but not used [-Wunused-but-set-variable] enum hnae3_reset_type reset_level = HNAE3_NONE_RESET; It never used since introduction in commit 01865a50d78f ("net: hns3: Add enable and process hw errors of TM scheduler") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	nfp: flower: use the common netdev notifier	Jakub Kicinski	4	-55/+30
	Use driver's common notifier for LAG and tunnel configuration. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07	nfp: register a notifier handler in a central location for the device	Jakub Kicinski	2	-15/+57
	Code interested in networking events registers its own notifier handlers. Create one device-wide notifier instance. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>