linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2020-01-16	selftests/bpf: Fix test_progs send_signal flakiness with nmi mode	Yonghong Song	1	-5/+1
	Alexei observed that test_progs send_signal may fail if run with command line "./test_progs" and the tests will pass if just run "./test_progs -n 40". I observed similar issue with nmi subtest failure and added a delay 100 us in Commit ab8b7f0cb358 ("tools/bpf: Add self tests for bpf_send_signal_thread()") and the problem is gone for me. But the issue still exists in Alexei's testing environment. The current code uses sample_freq = 50 (50 events/second), which may not be enough. But if the sample_freq value is larger than sysctl kernel/perf_event_max_sample_rate, the perf_event_open syscall will fail. This patch changed nmi perf testing to use sample_period = 1, which means trying to sampling every event. This seems fixing the issue. Fixes: ab8b7f0cb358 ("tools/bpf: Add self tests for bpf_send_signal_thread()") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200116174004.1522812-1-yhs@fb.com
2020-01-16	libbpf: Fix unneeded extra initialization in bpf_map_batch_common	Brian Vazquez	1	-1/+1
	bpf_attr doesn't required to be declared with '= {}' as memset is used in the code. Fixes: 2ab3d86ea1859 ("libbpf: Add libbpf support to batch ops") Reported-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200116045918.75597-1-brianvv@google.com
2020-01-15	selftests/bpf: Add whitelist/blacklist of test names to test_progs	Andrii Nakryiko	2	-11/+80
	Add ability to specify a list of test name substrings for selecting which tests to run. So now -t is accepting a comma-separated list of strings, similarly to how -n accepts a comma-separated list of test numbers. Additionally, add ability to blacklist tests by name. Blacklist takes precedence over whitelist. Blacklisting is important for cases where it's known that some tests can't pass (e.g., due to perf hardware events that are not available within VM). This is going to be used for libbpf testing in Travis CI in its Github repo. Example runs with just whitelist and whitelist + blacklist: $ sudo ./test_progs -tattach,core/existence #1 attach_probe:OK #6 cgroup_attach_autodetach:OK #7 cgroup_attach_multi:OK #8 cgroup_attach_override:OK #9 core_extern:OK #10/44 existence:OK #10/45 existence___minimal:OK #10/46 existence__err_int_sz:OK #10/47 existence__err_int_type:OK #10/48 existence__err_int_kind:OK #10/49 existence__err_arr_kind:OK #10/50 existence__err_arr_value_type:OK #10/51 existence__err_struct_type:OK #10 core_reloc:OK #19 flow_dissector_reattach:OK #60 tp_attach_query:OK Summary: 8/8 PASSED, 0 SKIPPED, 0 FAILED $ sudo ./test_progs -tattach,core/existence -bcgroup,flow/arr #1 attach_probe:OK #9 core_extern:OK #10/44 existence:OK #10/45 existence___minimal:OK #10/46 existence__err_int_sz:OK #10/47 existence__err_int_type:OK #10/48 existence__err_int_kind:OK #10/51 existence__err_struct_type:OK #10 core_reloc:OK #60 tp_attach_query:OK Summary: 4/6 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Julia Kartseva <hex@fb.com> Link: https://lore.kernel.org/bpf/20200116005549.3644118-1-andriin@fb.com
2020-01-15	Merge branch 'bpftool-improvements'	Alexei Starovoitov	5	-134/+166
	Martin Lau says: ==================== When a map is storing a kernel's struct, its map_info->btf_vmlinux_value_type_id is set. The first map type supporting it is BPF_MAP_TYPE_STRUCT_OPS. This series adds support to dump this kind of map with BTF. The first two patches are bug fixes which are only applicable to bpf-next. Please see individual patches for details. v3: - Remove unnecessary #include "libbpf_internal.h" from patch 5 v2: - Expose bpf_find_kernel_btf() as a LIBBPF_API in patch 3 (Andrii) - Cache btf_vmlinux in bpftool/map.c (Andrii) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-01-15	bpftool: Support dumping a map with btf_vmlinux_value_type_id	Martin KaFai Lau	1	-11/+50
	This patch makes bpftool support dumping a map's value properly when the map's value type is a type of the running kernel's btf. (i.e. map_info.btf_vmlinux_value_type_id is set instead of map_info.btf_value_type_id). The first usecase is for the BPF_MAP_TYPE_STRUCT_OPS. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200115230044.1103008-1-kafai@fb.com
2020-01-15	bpftool: Add struct_ops map name	Martin KaFai Lau	1	-0/+1
	This patch adds BPF_MAP_TYPE_STRUCT_OPS to "struct_ops" name mapping so that "bpftool map show" can print the "struct_ops" map type properly. [root@arch-fb-vm1 bpf]# ~/devshare/fb-kernel/linux/tools/bpf/bpftool/bpftool map show id 8 8: struct_ops name dctcp flags 0x0 key 4B value 256B max_entries 1 memlock 4096B btf_id 7 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200115230037.1102674-1-kafai@fb.com
2020-01-15	libbpf: Expose bpf_find_kernel_btf as a LIBBPF_API	Martin KaFai Lau	4	-96/+102
	This patch exposes bpf_find_kernel_btf() as a LIBBPF_API. It will be used in 'bpftool map dump' in a following patch to dump a map with btf_vmlinux_value_type_id set. bpf_find_kernel_btf() is renamed to libbpf_find_kernel_btf() and moved to btf.c. As <linux/kernel.h> is included, some of the max/min type casting needs to be fixed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200115230031.1102305-1-kafai@fb.com
2020-01-15	bpftool: Fix missing BTF output for json during map dump	Martin KaFai Lau	1	-22/+20
	The btf availability check is only done for plain text output. It causes the whole BTF output went missing when json_output is used. This patch simplifies the logic a little by avoiding passing "int btf" to map_dump(). For plain text output, the btf_wtr is only created when the map has BTF (i.e. info->btf_id != 0). The nullness of "json_writer_t *wtr" in map_dump() alone can decide if dumping BTF output is needed. As long as wtr is not NULL, map_dump() will print out the BTF-described data whenever a map has BTF available (i.e. info->btf_id != 0) regardless of json or plain-text output. In do_dump(), the "int btf" is also renamed to "int do_plain_btf". Fixes: 99f9863a0c45 ("bpftool: Match maps by name") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Paul Chaignon <paul.chaignon@orange.com> Link: https://lore.kernel.org/bpf/20200115230025.1101828-1-kafai@fb.com
2020-01-15	bpftool: Fix a leak of btf object	Martin KaFai Lau	1	-14/+2
	When testing a map has btf or not, maps_have_btf() tests it by actually getting a btf_fd from sys_bpf(BPF_BTF_GET_FD_BY_ID). However, it forgot to btf__free() it. In maps_have_btf() stage, there is no need to test it by really calling sys_bpf(BPF_BTF_GET_FD_BY_ID). Testing non zero info.btf_id is good enough. Also, the err_close case is unnecessary, and also causes double close() because the calling func do_dump() will close() all fds again. Fixes: 99f9863a0c45 ("bpftool: Match maps by name") Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Paul Chaignon <paul.chaignon@orange.com> Link: https://lore.kernel.org/bpf/20200115230019.1101352-1-kafai@fb.com
2020-01-15	Merge branch 'bpf-batch-ops'	Alexei Starovoitov	11	-128/+1248
	Brian Vazquez says: ==================== This patch series introduce batch ops that can be added to bpf maps to lookup/lookup_and_delete/update/delete more than 1 element at the time, this is specially useful when syscall overhead is a problem and in case of hmap it will provide a reliable way of traversing them. The implementation inclues a generic approach that could potentially be used by any bpf map and adds it to arraymap, it also includes the specific implementation of hashmaps which are traversed using buckets instead of keys. The bpf syscall subcommands introduced are: BPF_MAP_LOOKUP_BATCH BPF_MAP_LOOKUP_AND_DELETE_BATCH BPF_MAP_UPDATE_BATCH BPF_MAP_DELETE_BATCH The UAPI attribute is: struct { /* struct used by BPF_MAP__BATCH commands / __aligned_u64 in_batch; /* start batch, * NULL to start from beginning / __aligned_u64 out_batch; / output: next start batch / __aligned_u64 keys; __aligned_u64 values; __u32 count; / input/output: * input: # of key/value * elements * output: # of filled elements */ __u32 map_fd; __u64 elem_flags; __u64 flags; } batch; in_batch and out_batch are only used for lookup and lookup_and_delete since those are the only two operations that attempt to traverse the map. update/delete batch ops should provide the keys/values that user wants to modify. Here are the previous discussions on the batch processing: - https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/ - https://lore.kernel.org/bpf/20190829064502.2750303-1-yhs@fb.com/ - https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/ Changelog sinve v4: - Remove unnecessary checks from libbpf API (Andrii Nakryiko) - Move DECLARE_LIBBPF_OPTS with all var declarations (Andrii Nakryiko) - Change bucket internal buffer size to 5 entries (Yonghong Song) - Fix some minor bugs in hashtab batch ops implementation (Yonghong Song) Changelog sinve v3: - Do not use copy_to_user inside atomic region (Yonghong Song) - Use _opts approach on libbpf APIs (Andrii Nakryiko) - Drop generic_map_lookup_and_delete_batch support - Free malloc-ed memory in tests (Yonghong Song) - Reverse christmas tree (Yonghong Song) - Add acked labels Changelog sinve v2: - Add generic batch support for lpm_trie and test it (Yonghong Song) - Use define MAP_LOOKUP_RETRIES for retries (John Fastabend) - Return errors directly and remove labels (Yonghong Song) - Insert new API functions into libbpf alphabetically (Yonghong Song) - Change hlist_nulls_for_each_entry_rcu to hlist_nulls_for_each_entry_safe in htab batch ops (Yonghong Song) Changelog since v1: - Fix SOB ordering and remove Co-authored-by tag (Alexei Starovoitov) Changelog since RFC: - Change batch to in_batch and out_batch to support more flexible opaque values to iterate the bpf maps. - Remove update/delete specific batch ops for htab and use the generic implementations instead. ==================== Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-01-15	selftests/bpf: Add batch ops testing to array bpf map	Brian Vazquez	1	-0/+129
	Tested bpf_map_lookup_batch() and bpf_map_update_batch() functionality. $ ./test_maps ... test_array_map_batch_ops:PASS ... Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-10-brianvv@google.com
2020-01-15	selftests/bpf: Add batch ops testing for htab and htab_percpu map	Yonghong Song	1	-0/+283
	Tested bpf_map_lookup_batch(), bpf_map_lookup_and_delete_batch(), bpf_map_update_batch(), and bpf_map_delete_batch() functionality. $ ./test_maps ... test_htab_map_batch_ops:PASS test_htab_percpu_map_batch_ops:PASS ... Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-9-brianvv@google.com
2020-01-15	libbpf: Add libbpf support to batch ops	Yonghong Song	3	-0/+84
	Added four libbpf API functions to support map batch operations: . int bpf_map_delete_batch( ... ) . int bpf_map_lookup_batch( ... ) . int bpf_map_lookup_and_delete_batch( ... ) . int bpf_map_update_batch( ... ) Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-8-brianvv@google.com
2020-01-15	tools/bpf: Sync uapi header bpf.h	Yonghong Song	1	-0/+21
	sync uapi header include/uapi/linux/bpf.h to tools/include/uapi/linux/bpf.h Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-7-brianvv@google.com
2020-01-15	bpf: Add batch ops to all htab bpf map	Yonghong Song	4	-1/+276
	htab can't use generic batch support due some problematic behaviours inherent to the data structre, i.e. while iterating the bpf map a concurrent program might delete the next entry that batch was about to use, in that case there's no easy solution to retrieve the next entry, the issue has been discussed multiple times (see [1] and [2]). The only way hmap can be traversed without the problem previously exposed is by making sure that the map is traversing entire buckets. This commit implements those strict requirements for hmap, the implementation follows the same interaction that generic support with some exceptions: - If keys/values buffer are not big enough to traverse a bucket, ENOSPC will be returned. - out_batch contains the value of the next bucket in the iteration, not the next key, but this is transparent for the user since the user should never use out_batch for other than bpf batch syscalls. This commits implements BPF_MAP_LOOKUP_BATCH and adds support for new command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete batch ops it is possible to use the generic implementations. [1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/ [2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/ Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com
2020-01-15	bpf: Add lookup and update batch ops to arraymap	Brian Vazquez	1	-0/+2
	This adds the generic batch ops functionality to bpf arraymap, note that since deletion is not a valid operation for arraymap, only batch and lookup are added. Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200115184308.162644-5-brianvv@google.com
2020-01-15	bpf: Add generic support for update and delete batch ops	Brian Vazquez	3	-0/+127
	This commit adds generic support for update and delete batch ops that can be used for almost all the bpf maps. These commands share the same UAPI attr that lookup and lookup_and_delete batch ops use and the syscall commands are: BPF_MAP_UPDATE_BATCH BPF_MAP_DELETE_BATCH The main difference between update/delete and lookup batch ops is that for update/delete keys/values must be specified for userspace and because of that, neither in_batch nor out_batch are used. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com
2020-01-15	bpf: Add generic support for lookup batch op	Brian Vazquez	3	-4/+179
	This commit introduces generic support for the bpf_map_lookup_batch. This implementation can be used by almost all the bpf maps since its core implementation is relying on the existing map_get_next_key and map_lookup_elem. The bpf syscall subcommand introduced is: BPF_MAP_LOOKUP_BATCH The UAPI attribute is: struct { /* struct used by BPF_MAP__BATCH commands / __aligned_u64 in_batch; /* start batch, * NULL to start from beginning / __aligned_u64 out_batch; / output: next start batch / __aligned_u64 keys; __aligned_u64 values; __u32 count; / input/output: * input: # of key/value * elements * output: # of filled elements / __u32 map_fd; __u64 elem_flags; __u64 flags; } batch; in_batch/out_batch are opaque values use to communicate between user/kernel space, in_batch/out_batch must be of key_size length. To start iterating from the beginning in_batch must be null, count is the # of key/value elements to retrieve. Note that the 'keys' buffer must be a buffer of key_size count size and the 'values' buffer must be value_size * count, where value_size must be aligned to 8 bytes by userspace if it's dealing with percpu maps. 'count' will contain the number of keys/values successfully retrieved. Note that 'count' is an input/output variable and it can contain a lower value after a call. If there's no more entries to retrieve, ENOENT will be returned. If error is ENOENT, count might be > 0 in case it copied some values but there were no more entries to retrieve. Note that if the return code is an error and not -EFAULT, count indicates the number of elements successfully processed. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com
2020-01-15	bpf: Add bpf_map_{value_size, update_value, map_copy_value} functions	Brian Vazquez	1	-128/+152
	This commit moves reusable code from map_lookup_elem and map_update_elem to avoid code duplication in kernel/bpf/syscall.c. Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200115184308.162644-2-brianvv@google.com
2020-01-15	selftests/bpf: Add a test for attaching a bpf fentry/fexit trace to an XDP program	Eelco Chaudron	2	-0/+109
	Add a test that will attach a FENTRY and FEXIT program to the XDP test program. It will also verify data from the XDP context on FENTRY and verifies the return code on exit. Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/157909410480.47481.11202505690938004673.stgit@xdp-tutorial
2020-01-15	libbpf: Support .text sub-calls relocations	Andrii Nakryiko	1	-7/+22
	The LLVM patch https://reviews.llvm.org/D72197 makes LLVM emit function call relocations within the same section. This includes a default .text section, which contains any BPF sub-programs. This wasn't the case before and so libbpf was able to get a way with slightly simpler handling of subprogram call relocations. This patch adds support for .text section relocations. It needs to ensure correct order of relocations, so does two passes: - first, relocate .text instructions, if there are any relocations in it; - then process all the other programs and copy over patched .text instructions for all sub-program calls. v1->v2: - break early once .text program is processed. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115190856.2391325-1-andriin@fb.com
2020-01-15	Merge branch 'bpf_send_signal_thread'	Alexei Starovoitov	5	-113/+131
	Yonghong Song says: ==================== Commit 8b401f9ed244 ("bpf: implement bpf_send_signal() helper") added helper bpf_send_signal() which permits bpf program to send a signal to the current process. The signal may send to any thread of the process. This patch implemented a new helper bpf_send_signal_thread() to send a signal to the thread corresponding to the kernel current task. This helper can simplify user space code if the thread context of bpf sending signal is needed in user space. Please see Patch #1 for details of use case and kernel implementation. Patch #2 added some bpf self tests for the new helper. Changelogs: v2 -> v3: - More simplification for skeleton codes by removing not-needed mmap code and redundantly created tracepoint link. v1 -> v2: - More description for the difference between bpf_send_signal() and bpf_send_signal_thread() in the uapi header bpf.h. - Use skeleton and mmap for send_signal test. ==================== Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-01-15	tools/bpf: Add self tests for bpf_send_signal_thread()	Yonghong Song	2	-106/+73
	The test_progs send_signal() is amended to test bpf_send_signal_thread() as well. $ ./test_progs -n 40 #40/1 send_signal_tracepoint:OK #40/2 send_signal_perf:OK #40/3 send_signal_nmi:OK #40/4 send_signal_tracepoint_thread:OK #40/5 send_signal_perf_thread:OK #40/6 send_signal_nmi_thread:OK #40 send_signal:OK Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED Also took this opportunity to rewrite the send_signal test using skeleton framework and array mmap to make code simpler and more readable. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115035003.602425-1-yhs@fb.com
2020-01-15	bpf: Add bpf_send_signal_thread() helper	Yonghong Song	3	-7/+58
	Commit 8b401f9ed244 ("bpf: implement bpf_send_signal() helper") added helper bpf_send_signal() which permits bpf program to send a signal to the current process. The signal may be delivered to any threads in the process. We found a use case where sending the signal to the current thread is more preferable. - A bpf program will collect the stack trace and then send signal to the user application. - The user application will add some thread specific information to the just collected stack trace for later analysis. If bpf_send_signal() is used, user application will need to check whether the thread receiving the signal matches the thread collecting the stack by checking thread id. If not, it will need to send signal to another thread through pthread_kill(). This patch proposed a new helper bpf_send_signal_thread(), which sends the signal to the thread corresponding to the current kernel task. This way, user space is guaranteed that bpf_program execution context and user space signal handling context are the same thread. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com
2020-01-15	xsk: Support allocations of large umems	Magnus Karlsson	1	-3/+4
	When registering a umem area that is sufficiently large (>1G on an x86), kmalloc cannot be used to allocate one of the internal data structures, as the size requested gets too large. Use kvmalloc instead that falls back on vmalloc if the allocation is too large for kmalloc. Also add accounting for this structure as it is triggered by a user space action (the XDP_UMEM_REG setsockopt) and it is by far the largest structure of kernel allocated memory in xsk. Reported-by: Ryan Goodfellow <rgoodfel@isi.edu> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/bpf/1578995365-7050-1-git-send-email-magnus.karlsson@intel.com
2020-01-14	bpf: Return -EBADRQC for invalid map type in __bpf_tx_xdp_map	Li RongQing	1	-1/+1
	A negative value should be returned if map->map_type is invalid although that is impossible now, but if we run into such situation in future, then xdpbuff could be leaked. Daniel Borkmann suggested: -EBADRQC should be returned to stay consistent with generic XDP for the tracepoint output and not to be confused with -EOPNOTSUPP from other locations like dev_map_enqueue() when ndo_xdp_xmit is missing and such. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1578618277-18085-1-git-send-email-lirongqing@baidu.com
2020-01-14	bpf: Fix seq_show for BPF_MAP_TYPE_STRUCT_OPS	Martin KaFai Lau	1	-4/+10
	Instead of using bpf_struct_ops_map_lookup_elem() which is not implemented, bpf_struct_ops_map_seq_show_elem() should also use bpf_struct_ops_map_sys_lookup_elem() which does an inplace update to the value. The change allocates a value to pass to bpf_struct_ops_map_sys_lookup_elem(). [root@arch-fb-vm1 bpf]# cat /sys/fs/bpf/dctcp {{{1}},BPF_STRUCT_OPS_STATE_INUSE,{{00000000df93eebc,00000000df93eebc},0,2, ... Fixes: 85d33df357b6 ("bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200114072647.3188298-1-kafai@fb.com
2020-01-14	Merge branch 'runqslower'	Alexei Starovoitov	11	-31/+437
	Andrii Nakryiko says: ==================== Based on recent BPF CO-RE, tp_btf, and BPF skeleton changes, re-implement BCC-based runqslower tool as a portable pre-compiled BPF CO-RE-based tool. Make sure it's built as part of selftests to ensure it doesn't bit rot. As part of this patch set, augment `format c` output of `bpftool btf dump` sub-command with applying `preserve_access_index` attribute to all structs and unions. This makes all such structs and unions automatically relocatable under BPF CO-RE, which improves user experience of writing TRACING programs with direct kernel memory read access. Also, further clean up selftest/bpf Makefile output and make it conforming to libbpf and bpftool succinct output format. v1->v2: - build in-tree bpftool for runqslower (Yonghong); - drop `format core` and augment `format c` instead (Alexei); - move runqslower under tools/bpf (Daniel). ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-01-14	selftests/bpf: Build runqslower from selftests	Andrii Nakryiko	2	-2/+7
	Ensure runqslower tool is built as part of selftests to prevent it from bit rotting. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-7-andriin@fb.com
2020-01-13	tools/bpf: Add runqslower tool to tools/bpf	Andrii Nakryiko	6	-5/+396
	Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches its BCC-based counterpart 1-to-1, supporting all the same parameters and functionality. runqslower tool utilizes BPF skeleton, auto-generated from BPF object file, as well as memory-mapped interface to global (read-only, in this case) data. Its Makefile also ensures auto-generation of "relocatable" vmlinux.h, which is necessary for BTF-typed raw tracepoints with direct memory access. [0] https://github.com/iovisor/bcc/blob/11bf5d02c895df9646c117c713082eb192825293/tools/runqslower.py Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-6-andriin@fb.com
2020-01-13	bpftool: Apply preserve_access_index attribute to all types in BTF dump	Andrii Nakryiko	1	-0/+8
	This patch makes structs and unions, emitted through BTF dump, automatically CO-RE-relocatable (unless disabled with `#define BPF_NO_PRESERVE_ACCESS_INDEX`, specified before including generated header file). This effectivaly turns usual bpf_probe_read() call into equivalent of bpf_core_read(), by automatically applying builtin_preserve_access_index to any field accesses of types in generated C types header. This is especially useful for tp_btf/fentry/fexit BPF program types. They allow direct memory access, so BPF C code just uses straightfoward a->b->c access pattern to read data from kernel. But without kernel structs marked as CO-RE relocatable through preserve_access_index attribute, one has to enclose all the data reads into a special __builtin_preserve_access_index code block, like so: __builtin_preserve_access_index(({ x = p->pid; /* where p is struct task_struct , for example / })); This is very inconvenient and obscures the logic quite a bit. By marking all auto-generated types with preserve_access_index attribute the above code is reduced to just a clean and natural `x = p->pid;`. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-5-andriin@fb.com
2020-01-13	selftests/bpf: Conform selftests/bpf Makefile output to libbpf and bpftool	Andrii Nakryiko	1	-22/+25
	Bring selftest/bpf's Makefile output to the same format used by libbpf and bpftool: 2 spaces of padding on the left + 8-character left-aligned build step identifier. Also, hide feature detection output by default. Can be enabled back by setting V=1. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-4-andriin@fb.com
2020-01-13	libbpf: Clean up bpf_helper_defs.h generation output	Andrii Nakryiko	2	-3/+1
	bpf_helpers_doc.py script, used to generate bpf_helper_defs.h, unconditionally emits one informational message to stderr. Remove it and preserve stderr to contain only relevant errors. Also make sure script invocations command is muted by default in libbpf's Makefile. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-3-andriin@fb.com
2020-01-13	tools: Sync uapi/linux/if_link.h	Andrii Nakryiko	1	-0/+1
	Sync uapi/linux/if_link.h into tools to avoid out of sync warnings during libbpf build. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200113073143.1779940-2-andriin@fb.com
2020-01-10	selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros	Andrii Nakryiko	11	-121/+193
	Streamline BPF_TRACE_x macro by moving out return type and section attribute definition out of macro itself. That makes those function look in source code similar to other BPF programs. Additionally, simplify its usage by determining number of arguments automatically (so just single BPF_TRACE vs a family of BPF_TRACE_1, BPF_TRACE_2, etc). Also, allow more natural function argument syntax without commas inbetween argument type and name. Given this helper is useful not only for tracing tp_btf/fenty/fexit programs, but could be used for LSM programs and others following the same pattern, rename BPF_TRACE macro into more generic BPF_PROG. Existing BPF_TRACE_x usages in selftests are converted to new BPF_PROG macro. Following the same pattern, define BPF_KPROBE and BPF_KRETPROBE macros for nicer usage of kprobe/kretprobe arguments, respectively. BPF_KRETPROBE, adopts same convention used by fexit programs, that last defined argument is probed function's return result. v4->v5: - fix test_overhead test (__set_task_comm is void) (Alexei); v3->v4: - rebased and fixed one more BPF_TRACE_x occurence (Alexei); v2->v3: - rename to shorter and as generic BPF_PROG (Alexei); v1->v2: - verified GCC handles pragmas as expected; - added descriptions to macros; - converted new STRUCT_OPS selftest to BPF_HANDLER (worked as expected); - added original context as 'ctx' parameter, for cases where it has to be passed into BPF helpers. This might cause an accidental naming collision, unfortunately, but at least it's easy to work around. Fortunately, this situation produces quite legible compilation error: progs/bpf_dctcp.c:46:6: error: redefinition of 'ctx' with a different type: 'int' vs 'unsigned long long ' int ctx = 123; ^ progs/bpf_dctcp.c:42:6: note: previous definition is here void BPF_HANDLER(dctcp_init, struct sock sk) ^ ./bpf_trace_helpers.h:58:32: note: expanded from macro 'BPF_HANDLER' ____##name(unsigned long long *ctx, ##args) Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110211634.1614739-1-andriin@fb.com
2020-01-10	libbpf: Poison kernel-only integer types	Andrii Nakryiko	12	-1/+37
	It's been a recurring issue with types like u32 slipping into libbpf source code accidentally. This is not detected during builds inside kernel source tree, but becomes a compilation error in libbpf's Github repo. Libbpf is supposed to use only __{s,u}{8,16,32,64} typedefs, so poison {s,u}{8,16,32,64} explicitly in every .c file. Doing that in a bit more centralized way, e.g., inside libbpf_internal.h breaks selftests, which are both using kernel u32 and libbpf_internal.h. This patch also fixes a new u32 occurence in libbpf.c, added recently. Fixes: 590a00888250 ("bpf: libbpf: Add STRUCT_OPS support") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200110181916.271446-1-andriin@fb.com
2020-01-10	Merge branch 'bpf-global-funcs'	Daniel Borkmann	22	-89/+746
	Alexei Starovoitov says: ==================== Introduce static vs global functions and function by function verification. This is another step toward dynamic re-linking (or replacement) of global functions. See patch 2 for details. v2->v3: - cleaned up a check spotted by Song. - rebased and dropped patch 2 that was trying to improve BTF based on ELF. - added one more unit test for scalar return value from global func. v1->v2: - addressed review comments from Song, Andrii, Yonghong - fixed memory leak in error path - added modified ctx check - added more tests in patch 7 ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-01-10	selftests/bpf: Add unit tests for global functions	Alexei Starovoitov	8	-0/+280
	test_global_func[12] - check 512 stack limit. test_global_func[34] - check 8 frame call chain limit. test_global_func5 - check that non-ctx pointer cannot be passed into a function that expects context. test_global_func6 - check that ctx pointer is unmodified. test_global_func7 - check that global function returns scalar. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-7-ast@kernel.org
2020-01-10	selftests/bpf: Modify a test to check global functions	Alexei Starovoitov	1	-2/+2
	Make two static functions in test_xdp_noinline.c global: before: processed 2790 insns after: processed 2598 insns Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-6-ast@kernel.org
2020-01-10	selftests/bpf: Add a test for a large global function	Alexei Starovoitov	3	-2/+14
	test results: pyperf50 with always_inlined the same function five times: processed 46378 insns pyperf50 with global function: processed 6102 insns Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-5-ast@kernel.org
2020-01-10	selftests/bpf: Add fexit-to-skb test for global funcs	Alexei Starovoitov	3	-0/+44
	Add simple fexit prog type to skb prog type test when subprogram is a global function. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-4-ast@kernel.org
2020-01-10	bpf: Introduce function-by-function verification	Alexei Starovoitov	5	-84/+366
	New llvm and old llvm with libbpf help produce BTF that distinguish global and static functions. Unlike arguments of static function the arguments of global functions cannot be removed or optimized away by llvm. The compiler has to use exactly the arguments specified in a function prototype. The argument type information allows the verifier validate each global function independently. For now only supported argument types are pointer to context and scalars. In the future pointers to structures, sizes, pointer to packet data can be supported as well. Consider the following example: static int f1(int ...) { ... } int f3(int b); int f2(int a) { f1(a) + f3(a); } int f3(int b) { ... } int main(...) { f1(...) + f2(...) + f3(...); } The verifier will start its safety checks from the first global function f2(). It will recursively descend into f1() because it's static. Then it will check that arguments match for the f3() invocation inside f2(). It will not descend into f3(). It will finish f2() that has to be successfully verified for all possible values of 'a'. Then it will proceed with f3(). That function also has to be safe for all possible values of 'b'. Then it will start subprog 0 (which is main() function). It will recursively descend into f1() and will skip full check of f2() and f3(), since they are global. The order of processing global functions doesn't affect safety, since all global functions must be proven safe based on their arguments only. Such function by function verification can drastically improve speed of the verification and reduce complexity. Note that the stack limit of 512 still applies to the call chain regardless whether functions were static or global. The nested level of 8 also still applies. The same recursion prevention checks are in place as well. The type information and static/global kind is preserved after the verification hence in the above example global function f2() and f3() can be replaced later by equivalent functions with the same types that are loaded and verified later without affecting safety of this main() program. Such replacement (re-linking) of global functions is a subject of future patches. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org
2020-01-10	libbpf: Sanitize global functions	Alexei Starovoitov	2	-1/+40
	In case the kernel doesn't support BTF_FUNC_GLOBAL sanitize BTF produced by the compiler for global functions. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-2-ast@kernel.org
2020-01-09	Merge branch 'selftest-makefile-cleanup'	Alexei Starovoitov	4	-18/+22
	Andrii Nakryiko says: ==================== Fix issues with bpf_helper_defs.h usage in selftests/bpf. As part of that, fix the way clean up is performed for libbpf and selftests/bpf. Some for Makefile output clean ups as well. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-01-09	selftests/bpf: Further clean up Makefile output	Andrii Nakryiko	2	-5/+7
	Further clean up Makefile output: - hide "entering directory" messages; - silvence sub-Make command echoing; - succinct MKDIR messages. Also remove few test binaries that are not produced anymore from .gitignore. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110051716.1591485-4-andriin@fb.com
2020-01-09	selftests/bpf: Ensure bpf_helper_defs.h are taken from selftests dir	Andrii Nakryiko	2	-7/+7
	Reorder includes search path to ensure $(OUTPUT) and $(CURDIR) go before libbpf's directory. Also fix bpf_helpers.h to include bpf_helper_defs.h in such a way as to leverage includes search path. This allows selftests to not use libbpf's local and potentially stale bpf_helper_defs.h. It's important because selftests/bpf's Makefile only re-generates bpf_helper_defs.h in seltests' output directory, not the one in libbpf's directory. Also force regeneration of bpf_helper_defs.h when libbpf.a is updated to reduce staleness. Fixes: fa633a0f8919 ("libbpf: Fix build on read-only filesystems") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110051716.1591485-3-andriin@fb.com
2020-01-09	libbpf,selftests/bpf: Fix clean targets	Andrii Nakryiko	2	-6/+8
	Libbpf's clean target should clean out generated files in $(OUTPUT) directory and not make assumption that $(OUTPUT) directory is current working directory. Selftest's Makefile should delegate cleaning of libbpf-generated files to libbpf's Makefile. This ensures more robust clean up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110051716.1591485-2-andriin@fb.com
2020-01-09	libbpf: Make bpf_map order and indices stable	Andrii Nakryiko	1	-14/+0
	Currently, libbpf re-sorts bpf_map structs after all the maps are added and initialized, which might change their relative order and invalidate any bpf_map pointer or index taken before that. This is inconvenient and error-prone. For instance, it can cause .kconfig map index to point to a wrong map. Furthermore, libbpf itself doesn't rely on any specific ordering of bpf_maps, so it's just an unnecessary complication right now. This patch drops sorting of maps and makes their relative positions fixed. If efficient index is ever needed, it's better to have a separate array of pointers as a search index, instead of reordering bpf_map struct in-place. This will be less error-prone and will allow multiple independent orderings, if necessary (e.g., either by section index or by name). Fixes: 166750bc1dd2 ("libbpf: Support libbpf-provided extern variables") Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200110034247.1220142-1-andriin@fb.com
2020-01-09	bpf: Document BPF_F_QUERY_EFFECTIVE flag	Andrey Ignatov	2	-2/+12
	Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects attach_flags what may not be obvious and what may lead to confision. Specifically attach_flags is returned only for target_fd but if programs are inherited from an ancestor cgroup then returned attach_flags for current cgroup may be confusing. For example, two effective programs of same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in attach_flags. Simple repro: # bpftool c s /sys/fs/cgroup/path/to/task ID AttachType AttachFlags Name # bpftool c s /sys/fs/cgroup/path/to/task effective ID AttachType AttachFlags Name 95043 ingress tw_ipt_ingress 95048 ingress tw_ingress Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com
2020-01-09	Merge branch 'tcp-bpf-cc'	Alexei Starovoitov	32	-148/+2654
	Martin Lau says: ==================== This series introduces BPF STRUCT_OPS. It is an infra to allow implementing some specific kernel's function pointers in BPF. The first use case included in this series is to implement TCP congestion control algorithm in BPF (i.e. implement struct tcp_congestion_ops in BPF). There has been attempt to move the TCP CC to the user space (e.g. CCP in TCP). The common arguments are faster turn around, get away from long-tail kernel versions in production...etc, which are legit points. BPF has been the continuous effort to join both kernel and userspace upsides together (e.g. XDP to gain the performance advantage without bypassing the kernel). The recent BPF advancements (in particular BTF-aware verifier, BPF trampoline, BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc) possible in BPF. The idea is to allow implementing tcp_congestion_ops in bpf. It allows a faster turnaround for testing algorithm in the production while leveraging the existing (and continue growing) BPF feature/framework instead of building one specifically for userspace TCP CC. Please see individual patch for details. The bpftool support will be posted in follow-up patches. v4: - Expose tcp_ca_find() to tcp.h in patch 7. It is used to check the same bpf-tcp-cc does not exist to guarantee the register() will succeed. - set_memory_ro() and then set_memory_x() only after all trampolines are written to the image in patch 6. (Daniel) spinlock is replaced by mutex because set_memory_* requires sleepable context. v3: - Fix kbuild error by considering CONFIG_BPF_SYSCALL (kbuild) - Support anonymous bitfield in patch 4 (Andrii, Yonghong) - Push boundary safety check to a specific arch's trampoline function (in patch 6) (Yonghong). Reuse the WANR_ON_ONCE check in arch_prepare_bpf_trampoline() in x86. - Check module field is 0 in udata in patch 6 (Yonghong) - Check zero holes in patch 6 (Andrii) - s/_btf_vmlinux/btf/ in patch 5 and 7 (Andrii) - s/check_xxx/is_xxx/ in patch 7 (Andrii) - Use "struct_ops/" convention in patch 11 (Andrii) - Use the skel instead of bpf_object in patch 11 (Andrii) - libbpf: Decide BPF_PROG_TYPE_STRUCT_OPS at open phase by using find_sec_def() - libbpf: Avoid a debug message at open phase (Andrii) - libbpf: Add bpf_program__(is\|set)_struct_ops() for consistency (Andrii) - libbpf: Add "struct_ops" to section_defs (Andrii) - libbpf: Some code shuffling in init_kern_struct_ops() (Andrii) - libbpf: A few safety checks (Andrii) v2: - Dropped cubic for now. They will be reposted once there are more clarity in "jiffies" on both bpf side (about the helper) and tcp_cubic side (some of jiffies usages are being replaced by tp->tcp_mstamp) - Remove unnecssary check on bitfield support from btf_struct_access() (Yonghong) - BTF_TYPE_EMIT macro (Yonghong, Andrii) - value_name's length check to avoid an unlikely type match during truncation case (Yonghong) - BUILD_BUG_ON to ensure no trampoline-image overrun in the future (Yonghong) - Simplify get_next_key() (Yonghong) - Added comment to explain how to check mandatory func ptr in net/ipv4/bpf_tcp_ca.c (Yonghong) - Rename "__bpf_" to "bpf_struct_ops_" for value prefix (Andrii) - Add comment to highlight the bpf_dctcp.c is not necessarily the same as tcp_dctcp.c. (Alexei, Eric) - libbpf: Renmae "struct_ops" to ".struct_ops" for elf sec (Andrii) - libbpf: Expose struct_ops as a bpf_map (Andrii) - libbpf: Support multiple struct_ops in SEC(".struct_ops") (Andrii) - libbpf: Add bpf_map__attach_struct_ops() (Andrii) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>