linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-08-28	selftests/bpf: add precision tracking test	Alexei Starovoitov	1	-0/+25
	Copy-paste of existing test "calls: cross frame pruning - liveness propagation" but ran with different parentage chain heuristic which stresses different path in precision tracking logic. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-28	selftests/bpf: verifier precise tests	Alexei Starovoitov	2	-11/+174
	Use BPF_F_TEST_STATE_FREQ flag to check that precision tracking works as expected by comparing every step it takes. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-28	tools/bpf: sync bpf.h	Alexei Starovoitov	1	-0/+3
	sync bpf.h from kernel/ to tools/ Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-28	bpf: introduce verifier internal test flag	Alexei Starovoitov	4	-1/+9
	Introduce BPF_F_TEST_STATE_FREQ flag to stress test parentage chain and state pruning. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-21	tools: bpftool: add "bpftool map freeze" subcommand	Quentin Monnet	3	-3/+44
	Add a new subcommand to freeze maps from user space. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-21	tools: bpftool: show frozen status for maps	Quentin Monnet	1	-3/+27
	When listing maps, read their "frozen" status from procfs, and tell if maps are frozen. As commit log for map freezing command mentions that the feature might be extended with flags (e.g. for write-only instead of read-only) in the future, use an integer and not a boolean for JSON output. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-21	bpf: sync bpf.h to tools/	Peter Wu	1	-3/+5
	Fix a 'struct pt_reg' typo and clarify when bpf_trace_printk discards lines. Affects documentation only. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-21	bpf: clarify when bpf_trace_printk discards lines	Peter Wu	1	-0/+2
	I opened /sys/kernel/tracing/trace once and kept reading from it. bpf_trace_printk somehow did not seem to work, no entries were appended to that trace file. It turns out that tracing is disabled when that file is open. Save the next person some time and document this. The trace file is described in Documentation/trace/ftrace.rst, however the implication "tracing is disabled" did not immediate translate to "bpf_trace_printk silently discards entries". Signed-off-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-21	bpf: fix 'struct pt_reg' typo in documentation	Peter Wu	1	-3/+3
	There is no 'struct pt_reg'. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-21	bpf: clarify description for CONFIG_BPF_EVENTS	Peter Wu	1	-1/+2
	PERF_EVENT_IOC_SET_BPF supports uprobes since v4.3, and tracepoints since v4.7 via commit 04a22fae4cbc ("tracing, perf: Implement BPF programs attached to uprobes"), and commit 98b5c2c65c29 ("perf, bpf: allow bpf programs attach to tracepoints") respectively. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-21	btf: do not use CONFIG_OUTPUT_FORMAT	Ilya Leoshkevich	1	-2/+4
	Building s390 kernel with CONFIG_DEBUG_INFO_BTF fails, because CONFIG_OUTPUT_FORMAT is not defined. As a matter of fact, this variable appears to be x86-only, so other arches might be affected as well. Fix by obtaining this value from objdump output, just like it's already done for bin_arch. The exact objdump invocation is "inspired" by arch/powerpc/boot/wrapper. Also, use LANG=C for the existing bin_arch objdump invocation to avoid potential build issues on systems with non-English locale. Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-21	samples: bpf: syscall_nrs: use mmap2 if defined	Ivan Khoronzhuk	2	-0/+19
	For arm32 xdp sockets mmap2 is preferred, so use it if it's defined. Declaration of __NR_mmap can be skipped and it breaks build. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-21	xdp: xdp_umem: replace kmap on vmap for umem map	Ivan Khoronzhuk	1	-6/+30
	For 64-bit there is no reason to use vmap/vunmap, so use page_address as it was initially. For 32 bits, in some apps, like in samples xdpsock_user.c when number of pgs in use is quite big, the kmap memory can be not enough, despite on this, kmap looks like is deprecated in such cases as it can block and should be used rather for dynamic mm. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-21	libbpf: use LFS (_FILE_OFFSET_BITS) instead of direct mmap2 syscall	Ivan Khoronzhuk	2	-35/+15
	Drop __NR_mmap2 fork in flavor of LFS, that is _FILE_OFFSET_BITS=64 (glibc & bionic) / LARGEFILE64_SOURCE (for musl) decision. It allows mmap() to use 64bit offset that is passed to mmap2 syscall. As result pgoff is not truncated and no need to use direct access to mmap2 for 32 bits systems. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-20	tools: bpftool: implement "bpftool btf show\|list"	Quentin Monnet	3	-6/+363
	Add a "btf list" (alias: "btf show") subcommand to bpftool in order to dump all BTF objects loaded on a system. When running the command, hash tables are built in bpftool to retrieve all the associations between BTF objects and BPF maps and programs. This allows for printing all such associations when listing the BTF objects. The command is added at the top of the subcommands for "bpftool btf", so that typing only "bpftool btf" also comes down to listing the programs. We could not have this with the previous command ("dump"), which required a BTF object id, so it should not break any previous behaviour. This also makes the "btf" command behaviour consistent with "prog" or "map". Bash completion is updated to use "bpftool btf" instead of "bpftool prog" to list the BTF ids, as it looks more consistent. Example output (plain): # bpftool btf show 9: size 2989B prog_ids 21 map_ids 15 17: size 2847B prog_ids 36 map_ids 30,29,28 26: size 2847B Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-20	libbpf: add bpf_btf_get_next_id() to cycle through BTF objects	Quentin Monnet	3	-0/+8
	Add an API function taking a BTF object id and providing the id of the next BTF object in the kernel. This can be used to list all BTF objects loaded on the system. v2: - Rebase on top of Andrii's changes regarding libbpf versioning. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-20	libbpf: refactor bpf_*_get_next_id() functions	Quentin Monnet	1	-13/+8
	In preparation for the introduction of a similar function for retrieving the id of the next BTF object, consolidate the code from bpf_prog_get_next_id() and bpf_map_get_next_id() in libbpf. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-20	tools: bpf: synchronise BPF UAPI header with tools	Quentin Monnet	1	-0/+1
	Synchronise the bpf.h header under tools, to report the addition of the new BPF_BTF_GET_NEXT_ID syscall command for bpf(). Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-20	bpf: add new BPF_BTF_GET_NEXT_ID syscall command	Quentin Monnet	4	-2/+10
	Add a new command for the bpf() system call: BPF_BTF_GET_NEXT_ID is used to cycle through all BTF objects loaded on the system. The motivation is to be able to inspect (list) all BTF objects presents on the system. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-20	test_bpf: Fix a new clang warning about xor-ing two numbers	Nathan Chancellor	1	-1/+1
	r369217 in clang added a new warning about potential misuse of the xor operator as an exponentiation operator: ../lib/test_bpf.c:870:13: warning: result of '10 ^ 300' is 294; did you mean '1e300'? [-Wxor-used-as-pow] { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } }, ~~~^~~~~ 1e300 ../lib/test_bpf.c:870:13: note: replace expression with '0xA ^ 300' to silence this warning ../lib/test_bpf.c:870:31: warning: result of '10 ^ 300' is 294; did you mean '1e300'? [-Wxor-used-as-pow] { { 4, 10 ^ 300 }, { 20, 10 ^ 300 } }, ~~~^~~~~ 1e300 ../lib/test_bpf.c:870:31: note: replace expression with '0xA ^ 300' to silence this warning The commit link for this new warning has some good logic behind wanting to add it but this instance appears to be a false positive. Adopt its suggestion to silence the warning but not change the code. According to the differential review link in the clang commit, GCC may eventually adopt this warning as well. Link: https://github.com/ClangBuiltLinux/linux/issues/643 Link: https://github.com/llvm/llvm-project/commit/920890e26812f808a74c60ebc14cc636dac661c1 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-20	bpf: add include guard to tnum.h	Masahiro Yamada	1	-0/+6
	Add a header include guard just in case. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-20	bpf: add BTF ids in procfs for file descriptors to BTF objects	Quentin Monnet	1	-0/+12
	Implement the show_fdinfo hook for BTF FDs file operations, and make it print the id of the BTF object. This allows for a quick retrieval of the BTF id from its FD; or it can help understanding what type of object (BTF) the file descriptor points to. v2: - Do not expose data_size, only btf_id, in FD info. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-20	bpf: Use PTR_ERR_OR_ZERO in xsk_map_inc()	YueHaibing	1	-1/+1
	Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	xsk: support BPF_EXIST and BPF_NOEXIST flags in XSKMAP	Björn Töpel	1	-2/+6
	The XSKMAP did not honor the BPF_EXIST/BPF_NOEXIST flags when updating an entry. This patch addresses that. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	xsk: remove AF_XDP socket from map when the socket is released	Björn Töpel	3	-20/+173
	When an AF_XDP socket is released/closed the XSKMAP still holds a reference to the socket in a "released" state. The socket will still use the netdev queue resource, and block newly created sockets from attaching to that queue, but no user application can access the fill/complete/rx/tx queues. This results in that all applications need to explicitly clear the map entry from the old "zombie state" socket. This should be done automatically. In this patch, the sockets tracks, and have a reference to, which maps it resides in. When the socket is released, it will remove itself from all maps. Suggested-by: Bruce Richardson <bruce.richardson@intel.com> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	selftests/bpf: add sockopt clone/inheritance test	Stanislav Fomichev	4	-1/+353
	Add a test that calls setsockopt on the listener socket which triggers BPF program. This BPF program writes to the sk storage and sets clone flag. Make sure that sk storage is cloned for a newly accepted connection. We have two cloned maps in the tests to make sure we hit both cases in bpf_sk_storage_clone: first element (sk_storage_alloc) and non-first element(s) (selem_link_map). Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	bpf: sync bpf.h to tools/	Stanislav Fomichev	1	-0/+3
	Sync new sk storage clone flag. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	bpf: support cloning sk storage on accept()	Stanislav Fomichev	4	-6/+120
	Add new helper bpf_sk_storage_clone which optionally clones sk storage and call it from sk_clone_lock. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	bpf: export bpf_map_inc_not_zero	Stanislav Fomichev	2	-3/+15
	Rename existing bpf_map_inc_not_zero to __bpf_map_inc_not_zero to indicate that it's caller's responsibility to do proper locking. Create and export bpf_map_inc_not_zero wrapper that properly locks map_idr_lock. Will be used in the next commit to hold a map while cloning a socket. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Yonghong Song <yhs@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	selftests/bpf: fix race in test_tcp_rtt test	Petar Penkov	1	-0/+31
	There is a race in this test between receiving the ACK for the single-byte packet sent in the test, and reading the values from the map. This patch fixes this by having the client wait until there are no more unacknowledged packets. Before: for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \ done \| grep -c PASSED < trimmed error messages > 993 After: for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \ done \| grep -c PASSED 10000 Fixes: b55873984dab ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB") Signed-off-by: Petar Penkov <ppenkov@google.com> Reviewed-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	libbpf: relicense bpf_helpers.h and bpf_endian.h	Andrii Nakryiko	2	-2/+2
	bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper definitions essential to almost every BPF program. Which makes them useful not just for selftests. To be able to expose them as part of libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR BSD-2-Clause. This patch updates licensing of those two files. Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Hechao Li <hechaol@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Adam Barth <arb@fb.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Josef Bacik <jbacik@fb.com> Acked-by: Joe Stringer <joe@wand.net.nz> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: Adrian Ratiu <adrian.ratiu@collabora.com> Acked-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Petar Penkov <ppenkov@google.com> Acked-by: Teng Qin <palmtenor@gmail.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Michal Rostecki <mrostecki@opensuse.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Sargun Dhillon <sargun@sargun.me> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	net: Don't call XDP_SETUP_PROG when nothing is changed	Maxim Mikityanskiy	1	-2/+13
	Don't uninstall an XDP program when none is installed, and don't install an XDP program that has the same ID as the one already installed. dev_change_xdp_fd doesn't perform any checks in case it uninstalls an XDP program. It means that the driver's ndo_bpf can be called with XDP_SETUP_PROG asking to set it to NULL even if it's already NULL. This case happens if the user runs `ip link set eth0 xdp off` when there is no XDP program attached. The symmetrical case is possible when the user tries to set the program that is already set. The drivers typically perform some heavy operations on XDP_SETUP_PROG, so they all have to handle these cases internally to return early if they happen. This patch puts this check into the kernel code, so that all drivers will benefit from it. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	net/mlx5e: Add AF_XDP need_wakeup support	Maxim Mikityanskiy	4	-5/+48
	This commit adds support for the new need_wakeup feature of AF_XDP. The applications can opt-in by using the XDP_USE_NEED_WAKEUP bind() flag. When this feature is enabled, some behavior changes: RX side: If the Fill Ring is empty, instead of busy-polling, set the flag to tell the application to kick the driver when it refills the Fill Ring. TX side: If there are pending completions or packets queued for transmission, set the flag to tell the application that it can skip the sendto() syscall and save time. The performance testing was performed on a machine with the following configuration: - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz - Mellanox ConnectX-5 Ex with 100 Gbit/s link The results with retpoline disabled: \| without need_wakeup \| with need_wakeup \| \|----------------------\|----------------------\| \| one core \| two cores \| one core \| two cores \| -------\|----------\|-----------\|----------\|-----------\| txonly \| 20.1 \| 33.5 \| 29.0 \| 34.2 \| rxdrop \| 0.065 \| 14.1 \| 12.0 \| 14.1 \| l2fwd \| 0.032 \| 7.3 \| 6.6 \| 7.2 \| "One core" means the application and NAPI run on the same core. "Two cores" means they are pinned to different cores. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	net/mlx5e: Move the SW XSK code from NAPI poll to a separate function	Maxim Mikityanskiy	1	-2/+11
	Two XSK tasks are performed during NAPI polling, that are not bound to hardware interrupts: TXing packets and polling for frames in the Fill Ring. They are special in a way that the hardware doesn't know about these tasks, so it doesn't trigger interrupts if there is still some work to be done, it's our driver's responsibility to ensure NAPI will be rescheduled if needed. Create a new function to handle these tasks and move the corresponding code from mlx5e_napi_poll to the new function to improve modularity and prepare for the changes in the following patch. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	samples/bpf: add use of need_wakeup flag in xdpsock	Magnus Karlsson	1	-72/+120
	This commit adds using the need_wakeup flag to the xdpsock sample application. It is turned on by default as we think it is a feature that seems to always produce a performance benefit, if the application has been written taking advantage of it. It can be turned off in the sample app by using the '-m' command line option. The txpush and l2fwd sub applications have also been updated to support poll() with multiple sockets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	libbpf: add support for need_wakeup flag in AF_XDP part	Magnus Karlsson	3	-0/+23
	This commit adds support for the new need_wakeup flag in AF_XDP. The xsk_socket__create function is updated to handle this and a new function is introduced called xsk_ring_prod__needs_wakeup(). This function can be used by the application to check if Rx and/or Tx processing needs to be explicitly woken up. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	ixgbe: add support for AF_XDP need_wakeup feature	Magnus Karlsson	1	-0/+18
	This patch adds support for the need_wakeup feature of AF_XDP. If the application has told the kernel that it might sleep using the new bind flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has no more buffers on the NIC Rx ring and yield to the application. For Tx, it will set the flag if it has no outstanding Tx completion interrupts and return to the application. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	i40e: add support for AF_XDP need_wakeup feature	Magnus Karlsson	1	-0/+18
	This patch adds support for the need_wakeup feature of AF_XDP. If the application has told the kernel that it might sleep using the new bind flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has no more buffers on the NIC Rx ring and yield to the application. For Tx, it will set the flag if it has no outstanding Tx completion interrupts and return to the application. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	xsk: add support for need_wakeup flag in AF_XDP rings	Magnus Karlsson	6	-20/+195
	This commit adds support for a new flag called need_wakeup in the AF_XDP Tx and fill rings. When this flag is set, it means that the application has to explicitly wake up the kernel Rx (for the bit in the fill ring) or kernel Tx (for bit in the Tx ring) processing by issuing a syscall. Poll() can wake up both depending on the flags submitted and sendto() will wake up tx processing only. The main reason for introducing this new flag is to be able to efficiently support the case when application and driver is executing on the same core. Previously, the driver was just busy-spinning on the fill ring if it ran out of buffers in the HW and there were none on the fill ring. This approach works when the application is running on another core as it can replenish the fill ring while the driver is busy-spinning. Though, this is a lousy approach if both of them are running on the same core as the probability of the fill ring getting more entries when the driver is busy-spinning is zero. With this new feature the driver now sets the need_wakeup flag and returns to the application. The application can then replenish the fill queue and then explicitly wake up the Rx processing in the kernel using the syscall poll(). For Tx, the flag is only set to one if the driver has no outstanding Tx completion interrupts. If it has some, the flag is zero as it will be woken up by a completion interrupt anyway. As a nice side effect, this new flag also improves the performance of the case where application and driver are running on two different cores as it reduces the number of syscalls to the kernel. The kernel tells user space if it needs to be woken up by a syscall, and this eliminates many of the syscalls. This flag needs some simple driver support. If the driver does not support this, the Rx flag is always zero and the Tx flag is always one. This makes any application relying on this feature default to the old behaviour of not requiring any syscalls in the Rx path and always having to call sendto() in the Tx path. For backwards compatibility reasons, this feature has to be explicitly turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend that you always turn it on as it so far always have had a positive performance impact. The name and inspiration of the flag has been taken from io_uring by Jens Axboe. Details about this feature in io_uring can be found in http://kernel.dk/io_uring.pdf, section 8.3. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-17	xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup	Magnus Karlsson	12	-19/+32
	This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new ndo provides the same functionality as before but with the addition of a new flags field that is used to specifiy if Rx, Tx or both should be woken up. The previous ndo only woke up Tx, as implied by the name. The i40e and ixgbe drivers (which are all the supported ones) are updated with this new interface. This new ndo will be used by the new need_wakeup functionality of XDP sockets that need to be able to wake up both Rx and Tx driver processing. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-15	btf: fix return value check in btf_vmlinux_init()	Wei Yongjun	1	-7/+2
	In case of error, the function kobject_create_and_add() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	tools: bpftool: move "__printf()" attributes to header file	Quentin Monnet	5	-10/+12
	Some functions in bpftool have a "__printf()" format attributes to tell the compiler they should expect printf()-like arguments. But because these attributes are not used for the function prototypes in the header files, the compiler does not run the checks everywhere the functions are used, and some mistakes on format string and corresponding arguments slipped in over time. Let's move the __printf() attributes to the correct places. Note: We add guards around the definition of GCC_VERSION in tools/include/linux/compiler-gcc.h to prevent a conflict in jit_disasm.c on GCC_VERSION from headers pulled via libbfd. Fixes: c101189bc968 ("tools: bpftool: fix -Wmissing declaration warnings") Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	tools: bpftool: fix format string for p_err() in detect_common_prefix()	Quentin Monnet	1	-1/+1
	There is one call to the p_err() function in detect_common_prefix() where the message to print is passed directly as the first argument, without using a format string. This is harmless, but may trigger warnings if the "__printf()" attribute is used correctly for the p_err() function. Let's fix it by using a "%s" format string. Fixes: ba95c7452439 ("tools: bpftool: add "prog run" subcommand to test-run programs") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	tools: bpftool: fix format string for p_err() in query_flow_dissector()	Quentin Monnet	1	-1/+1
	The format string passed to one call to the p_err() function in query_flow_dissector() does not match the value that should be printed, resulting in some garbage integer being printed instead of strerror(errno) if /proc/self/ns/net cannot be open. Let's fix the format string. Fixes: 7f0c57fec80f ("bpftool: show flow_dissector attachment status") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	tools: bpftool: fix argument for p_err() in BTF do_dump()	Quentin Monnet	1	-1/+1
	The last argument passed to one call to the p_err() function is not correct, it should be "argv" instead of "*argv". This may lead to a segmentation fault error if BTF id cannot be parsed correctly. Let's fix this. Fixes: c93cc69004dt ("bpftool: add ability to dump BTF types") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	tools: bpftool: fix format strings and arguments for jsonw_printf()	Quentin Monnet	1	-4/+4
	There are some mismatches between format strings and arguments passed to jsonw_printf() in the BTF dumper for bpftool, which seems harmless but may result in warnings if the "__printf()" attribute is used correctly for jsonw_printf(). Let's fix relevant format strings and type cast. Fixes: b12d6ec09730 ("bpf: btf: add btf print functionality") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	tools: bpftool: fix arguments for p_err() in do_event_pipe()	Quentin Monnet	1	-2/+2
	The last argument passed to some calls to the p_err() functions is not correct, it should be "argv" instead of "*argv". This may lead to a segmentation fault error if CPU IDs or indices from the command line cannot be parsed correctly. Let's fix this. Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	libbpf: make libbpf.map source of truth for libbpf version	Andrii Nakryiko	2	-12/+11
	Currently libbpf version is specified in 2 places: libbpf.map and Makefile. They easily get out of sync and it's very easy to update one, but forget to update another one. In addition, Github projection of libbpf has to maintain its own version which has to be remembered to be kept in sync manually, which is very error-prone approach. This patch makes libbpf.map a source of truth for libbpf version and uses shell invocation to parse out correct full and major libbpf version to use during build. Now we need to make sure that once new release cycle starts, we need to add (initially) empty section to libbpf.map with correct latest version. This also will make it possible to keep Github projection consistent with kernel sources version of libbpf by adopting similar parsing of version from libbpf.map. v2->v3: - grep -o + sort -rV (Andrey); v1->v2: - eager version vars evaluation (Jakub); - simplified version regex (Andrey); Cc: Andrey Ignatov <rdna@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	tools: bpftool: add documentation for net attach/detach	Daniel T. Lee	1	-3/+54
	Since, new sub-command 'net attach/detach' has been added for attaching XDP program on interface, this commit documents usage and sample output of `net attach/detach`. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-08-15	tools: bpftool: add bash-completion for net attach/detach	Daniel T. Lee	1	-10/+55
	This commit adds bash-completion for new "net attach/detach" subcommand for attaching XDP program on interface. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>