wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2023-10-16	bpf: Disambiguate SCALAR register state output in verifier logs	Andrii Nakryiko	3	-32/+55
	Currently the way that verifier prints SCALAR_VALUE register state (and PTR_TO_PACKET, which can have var_off and ranges info as well) is very ambiguous. In the name of brevity we are trying to eliminate "unnecessary" output of umin/umax, smin/smax, u32_min/u32_max, and s32_min/s32_max values, if possible. Current rules are that if any of those have their default value (which for mins is the minimal value of its respective types: 0, S32_MIN, or S64_MIN, while for maxs it's U32_MAX, S32_MAX, S64_MAX, or U64_MAX) OR if there is another min/max value that as matching value. E.g., if smin=100 and umin=100, we'll emit only umin=10, omitting smin altogether. This approach has a few problems, being both ambiguous and sort-of incorrect in some cases. Ambiguity is due to missing value could be either default value or value of umin/umax or smin/smax. This is especially confusing when we mix signed and unsigned ranges. Quite often, umin=0 and smin=0, and so we'll have only `umin=0` leaving anyone reading verifier log to guess whether smin is actually 0 or it's actually -9223372036854775808 (S64_MIN). And often times it's important to know, especially when debugging tricky issues. "Sort-of incorrectness" comes from mixing negative and positive values. E.g., if umin is some large positive number, it can be equal to smin which is, interpreted as signed value, is actually some negative value. Currently, that smin will be omitted and only umin will be emitted with a large positive value, giving an impression that smin is also positive. Anyway, ambiguity is the biggest issue making it impossible to have an exact understanding of register state, preventing any sort of automated testing of verifier state based on verifier log. This patch is attempting to rectify the situation by removing ambiguity, while minimizing the verboseness of register state output. The rules are straightforward: - if some of the values are missing, then it definitely has a default value. I.e., `umin=0` means that umin is zero, but smin is actually S64_MIN; - all the various boundaries that happen to have the same value are emitted in one equality separated sequence. E.g., if umin and smin are both 100, we'll emit `smin=umin=100`, making this explicit; - we do not mix negative and positive values together, and even if they happen to have the same bit-level value, they will be emitted separately with proper sign. I.e., if both umax and smax happen to be 0xffffffffffffffff, we'll emit them both separately as `smax=-1,umax=18446744073709551615`; - in the name of a bit more uniformity and consistency, {u32,s32}_{min,max} are renamed to {s,u}{min,max}32, which seems to improve readability. The above means that in case of all 4 ranges being, say, [50, 100] range, we'd previously see hugely ambiguous: R1=scalar(umin=50,umax=100) Now, we'll be more explicit: R1=scalar(smin=umin=smin32=umin32=50,smax=umax=smax32=umax32=100) This is slightly more verbose, but distinct from the case when we don't know anything about signed boundaries and 32-bit boundaries, which under new rules will match the old case: R1=scalar(umin=50,umax=100) Also, in the name of simplicity of implementation and consistency, order for {s,u}32_{min,max} are emitted before var_off. Previously they were emitted afterwards, for unclear reasons. This patch also includes a few fixes to selftests that expect exact register state to accommodate slight changes to verifier format. You can see that the changes are pretty minimal in common cases. Note, the special case when SCALAR_VALUE register is a known constant isn't changed, we'll emit constant value once, interpreted as signed value. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20231011223728.3188086-5-andrii@kernel.org
2023-10-16	selftests/bpf: Make align selftests more robust	Andrii Nakryiko	1	-120/+121
	Align subtest is very specific and finicky about expected verifier log output and format. This is often completely unnecessary as in a bunch of situations test actually cares about var_off part of register state. But given how exact it is right now, any tiny verifier log changes can lead to align tests failures, requiring constant adjustment. This patch tries to make this a bit more robust by making logic first search for specified register and then allowing to match only portion of register state, not everything exactly. This will come handly with follow up changes to SCALAR register output disambiguation. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20231011223728.3188086-4-andrii@kernel.org
2023-10-16	selftests/bpf: Improve missed_kprobe_recursion test robustness	Andrii Nakryiko	1	-4/+4
	Given missed_kprobe_recursion is non-serial and uses common testing kfuncs to count number of recursion misses it's possible that some other parallel test can trigger extraneous recursion misses. So we can't expect exactly 1 miss. Relax conditions and expect at least one. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20231011223728.3188086-3-andrii@kernel.org
2023-10-16	selftests/bpf: Improve percpu_alloc test robustness	Andrii Nakryiko	3	-0/+14
	Make these non-serial tests filter BPF programs by intended PID of a test runner process. This makes it isolated from other parallel tests that might interfere accidentally. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20231011223728.3188086-2-andrii@kernel.org
2023-10-13	selftests/bpf: Add tests for open-coded task_vma iter	Dave Marchevsky	3	-0/+109
	The open-coded task_vma iter added earlier in this series allows for natural iteration over a task's vmas using existing open-coded iter infrastructure, specifically bpf_for_each. This patch adds a test demonstrating this pattern and validating correctness. The vma->vm_start and vma->vm_end addresses of the first 1000 vmas are recorded and compared to /proc/PID/maps output. As expected, both see the same vmas and addresses - with the exception of the [vsyscall] vma - which is explained in a comment in the prog_tests program. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-5-davemarchevsky@fb.com
2023-10-13	bpf: Introduce task_vma open-coded iterator kfuncs	Dave Marchevsky	2	-0/+94
	This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow creation and manipulation of struct bpf_iter_task_vma in open-coded iterator style. BPF programs can use these kfuncs directly or through bpf_for_each macro for natural-looking iteration of all task vmas. The implementation borrows heavily from bpf_find_vma helper's locking - differing only in that it holds the mmap_read lock for all iterations while the helper only executes its provided callback on a maximum of 1 vma. Aside from locking, struct vma_iterator and vma_next do all the heavy lifting. A pointer to an inner data struct, struct bpf_iter_task_vma_data, is the only field in struct bpf_iter_task_vma. This is because the inner data struct contains a struct vma_iterator (not ptr), whose size is likely to change under us. If bpf_iter_task_vma_kern contained vma_iterator directly such a change would require change in opaque bpf_iter_task_vma struct's size. So better to allocate vma_iterator using BPF allocator, and since that alloc must already succeed, might as well allocate all iter fields, thereby freezing struct bpf_iter_task_vma size. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-4-davemarchevsky@fb.com
2023-10-13	selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c	Dave Marchevsky	2	-13/+13
	Further patches in this series will add a struct bpf_iter_task_vma, which will result in a name collision with the selftest prog renamed in this patch. Rename the selftest to avoid the collision. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-3-davemarchevsky@fb.com
2023-10-13	bpf: Don't explicitly emit BTF for struct btf_iter_num	Dave Marchevsky	1	-2/+0
	Commit 6018e1f407cc ("bpf: implement numbers iterator") added the BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num doesn't exist, so only a forward declaration is emitted in BTF: FWD 'btf_iter_num' fwd_kind=struct That commit was probably hoping to ensure that struct bpf_iter_num is emitted in vmlinux BTF. A previous version of this patch changed the line to emit the correct type, but Yonghong confirmed that it would definitely be emitted regardless in [0], so this patch simply removes the line. This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't causing any issues that I noticed, aside from mild confusion when I looked through the code. [0]: https://lore.kernel.org/bpf/25d08207-43e6-36a8-5e0f-47a913d4cda5@linux.dev/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-2-davemarchevsky@fb.com
2023-10-13	bpf: Change syscall_nr type to int in struct syscall_tp_t	Artem Savkov	3	-5/+5
	linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support for lazy preemption")) that adds an extra member to struct trace_entry. This causes the offset of args field in struct trace_event_raw_sys_enter be different from the one in struct syscall_trace_enter: struct trace_event_raw_sys_enter { struct trace_entry ent; /* 0 12 / / XXX last struct has 3 bytes of padding / / XXX 4 bytes hole, try to pack / long int id; / 16 8 / long unsigned int args[6]; / 24 48 / / --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- / char __data[]; / 72 0 / / size: 72, cachelines: 2, members: 4 / / sum members: 68, holes: 1, sum holes: 4 / / paddings: 1, sum paddings: 3 / / last cacheline: 8 bytes / }; struct syscall_trace_enter { struct trace_entry ent; / 0 12 / / XXX last struct has 3 bytes of padding / int nr; / 12 4 / long unsigned int args[]; / 16 0 / / size: 16, cachelines: 1, members: 3 / / paddings: 1, sum paddings: 3 / / last cacheline: 16 bytes */ }; This, in turn, causes perf_event_set_bpf_prog() fail while running bpf test_profiler testcase because max_ctx_offset is calculated based on the former struct, while off on the latter: 10488 if (is_tracepoint \|\| is_syscall_tp) { 10489 int off = trace_event_get_offsets(event->tp_event); 10490 10491 if (prog->aux->max_ctx_offset > off) 10492 return -EACCES; 10493 } What bpf program is actually getting is a pointer to struct syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes the problem by aligning struct syscall_tp_t with struct syscall_trace_(enter\|exit) and changing the tests to use these structs to dereference context. Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20231013054219.172920-1-asavkov@redhat.com
2023-10-13	net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set	Martin KaFai Lau	1	-1/+1
	It was reported that there is a compiler warning on the unused variable "sin_addr_len" in af_inet.c when CONFIG_CGROUP_BPF is not set. This patch is to address it similar to the ipv6 counterpart in inet6_getname(). It is to "return sin_addr_len;" instead of "return sizeof(*sin);". Fixes: fefba7d1ae19 ("bpf: Propagate modified uaddrlen from cgroup sockaddr programs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/bpf/20231013185702.3993710-1-martin.lau@linux.dev Closes: https://lore.kernel.org/bpf/20231013114007.2fb09691@canb.auug.org.au/
2023-10-13	bpf: Avoid unnecessary audit log for CPU security mitigations	Yafang Shao	1	-2/+2
	Check cpu_mitigations_off() first to avoid calling capable() if it is off. This can avoid unnecessary audit log. Fixes: bc5bc309db45 ("bpf: Inherit system settings for CPU security mitigations") Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4Bza6UVUWqcWQ-66weZ-nMDr+TFU3Mtq=dumZFD-pSqU7Ow@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20231013083916.4199-1-laoar.shao@gmail.com
2023-10-11	selftests/bpf: Add tests for cgroup unix socket address hooks	Daan De Meyer	10	-0/+883
	These selftests are written in prog_tests style instead of adding them to the existing test_sock_addr tests. Migrating the existing sock addr tests to prog_tests style is left for future work. This commit adds support for testing bind() sockaddr hooks, even though there's no unix socket sockaddr hook for bind(). We leave this code intact for when the INET and INET6 tests are migrated in the future which do support intercepting bind(). Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-10-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	selftests/bpf: Make sure mount directory exists	Daan De Meyer	1	-0/+5
	The mount directory for the selftests cgroup tree might not exist so let's make sure it does exist by creating it ourselves if it doesn't exist. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-9-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	documentation/bpf: Document cgroup unix socket address hooks	Daan De Meyer	1	-0/+10
	Update the documentation to mention the new cgroup unix sockaddr hooks. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-8-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	bpftool: Add support for cgroup unix socket address hooks	Daan De Meyer	5	-23/+38
	Add the necessary plumbing to hook up the new cgroup unix sockaddr hooks into bpftool. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20231011185113.140426-7-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	libbpf: Add support for cgroup unix socket address hooks	Daan De Meyer	1	-0/+10
	Add the necessary plumbing to hook up the new cgroup unix sockaddr hooks into libbpf. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-6-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	bpf: Implement cgroup sockaddr hooks for unix sockets	Daan De Meyer	9	-14/+114
	These hooks allows intercepting connect(), getsockname(), getpeername(), sendmsg() and recvmsg() for unix sockets. The unix socket hooks get write access to the address length because the address length is not fixed when dealing with unix sockets and needs to be modified when a unix socket address is modified by the hook. Because abstract socket unix addresses start with a NUL byte, we cannot recalculate the socket address in kernelspace after running the hook by calculating the length of the unix socket path using strlen(). These hooks can be used when users want to multiplex syscall to a single unix socket to multiple different processes behind the scenes by redirecting the connect() and other syscalls to process specific sockets. We do not implement support for intercepting bind() because when using bind() with unix sockets with a pathname address, this creates an inode in the filesystem which must be cleaned up. If we rewrite the address, the user might try to clean up the wrong file, leaking the socket in the filesystem where it is never cleaned up. Until we figure out a solution for this (and a use case for intercepting bind()), we opt to not allow rewriting the sockaddr in bind() calls. We also implement recvmsg() support for connected streams so that after a connect() that is modified by a sockaddr hook, any corresponding recmvsg() on the connected socket can also be modified to make the connected program think it is connected to the "intended" remote. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf	Daan De Meyer	2	-1/+35
	As prep for adding unix socket support to the cgroup sockaddr hooks, let's add a kfunc bpf_sock_addr_set_sun_path() that allows modifying a unix sockaddr from bpf. While this is already possible for AF_INET and AF_INET6, we'll need this kfunc when we add unix socket support since modifying the address for those requires modifying both the address and the sockaddr length. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-4-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	bpf: Propagate modified uaddrlen from cgroup sockaddr programs	Daan De Meyer	11	-54/+76
	As prep for adding unix socket support to the cgroup sockaddr hooks, let's propagate the sockaddr length back to the caller after running a bpf cgroup sockaddr hook program. While not important for AF_INET or AF_INET6, the sockaddr length is important when working with AF_UNIX sockaddrs as the size of the sockaddr cannot be determined just from the address family or the sockaddr's contents. __cgroup_bpf_run_filter_sock_addr() is modified to take the uaddrlen as an input/output argument. After running the program, the modified sockaddr length is stored in the uaddrlen pointer. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-3-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-11	selftests/bpf: Add missing section name tests for getpeername/getsockname	Daan De Meyer	1	-0/+20
	These were missed when these hooks were first added so add them now instead to make sure every sockaddr hook has a matching section name test. Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-2-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-09	selftests/bpf: Add BPF_FIB_LOOKUP_SRC tests	Martynas Pumputis	1	-6/+77
	This patch extends the existing fib_lookup test suite by adding two test cases (for each IP family): * Test source IP selection from the egressing netdev. * Test source IP selection when an IP route has a preferred src IP addr. Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-3-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-09	bpf: Derive source IP addr via bpf_*_fib_lookup()	Martynas Pumputis	5	-1/+43
	Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address: struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC \| BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT; /* the p.ipv4_src now contains the source address */ The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-09	bpftool: Align bpf_load_and_run_opts insns and data	Ian Rogers	1	-20/+23
	A C string lacks alignment so use aligned arrays to avoid potential alignment problems. Switch to using sizeof (less 1 for the \0 terminator) rather than a hardcode size constant. Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20231007044439.25171-2-irogers@google.com
2023-10-09	bpftool: Align output skeleton ELF code	Ian Rogers	1	-6/+9
	libbpf accesses the ELF data requiring at least 8 byte alignment, however, the data is generated into a C string that doesn't guarantee alignment. Fix this by assigning to an aligned char array. Use sizeof on the array, less one for the \0 terminator, rather than generating a constant. Fixes: a6cc6b34b93e ("bpftool: Provide a helper method for accessing skeleton's embedded ELF data") Signed-off-by: Ian Rogers <irogers@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20231007044439.25171-1-irogers@google.com
2023-10-09	selftests/bpf: Test pinning bpf timer to a core	David Vernet	2	-1/+66
	Now that we support pinning a BPF timer to the current core, we should test it with some selftests. This patch adds two new testcases to the timer suite, which verifies that a BPF timer both with and without BPF_F_TIMER_ABS, can be pinned to the calling core with BPF_F_TIMER_CPU_PIN. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20231004162339.200702-3-void@manifault.com
2023-10-09	bpf: Add ability to pin bpf timer to calling CPU	David Vernet	3	-1/+12
	BPF supports creating high resolution timers using bpf_timer_* helper functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which specifies that the timeout should be interpreted as absolute time. It would also be useful to be able to pin that timer to a core. For example, if you wanted to make a subset of cores run without timer interrupts, and only have the timer be invoked on a single core. This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag. When specified, the HRTIMER_MODE_PINNED flag is passed to hrtimer_start(). A subsequent patch will update selftests to validate. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com
2023-10-06	bpf: Annotate struct bpf_stack_map with __counted_by	Kees Cook	1	-1/+1
	Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle [1], add __counted_by for struct bpf_stack_map. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci [1] Link: https://lore.kernel.org/bpf/20231006201657.work.531-kees@kernel.org
2023-10-06	selftests/bpf: Add pairs_redir_to_connected helper	Geliang Tang	1	-115/+31
	Extract duplicate code from these four functions unix_redir_to_connected() udp_redir_to_connected() inet_unix_redir_to_connected() unix_inet_redir_to_connected() to generate a new helper pairs_redir_to_connected(). Create the different socketpairs in these four functions, then pass the socketpairs info to the new common helper to do the connections. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/54bb28dcf764e7d4227ab160883931d2173f4f3d.1696588133.git.geliang.tang@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-06	selftests/bpf: Don't truncate #test/subtest field	Andrii Nakryiko	1	-1/+1
	We currently expect up to a three-digit number of tests and subtests, so: #999/999: some_test/some_subtest: ... Is the largest test/subtest we can see. If we happen to cross into 1000s, current logic will just truncate everything after 7th character. This patch fixes this truncate and allows to go way higher (up to 31 characters in total). We still nicely align test numbers: #60/66 core_reloc_btfgen/type_based___incompat:OK #60/67 core_reloc_btfgen/type_based___fn_wrong_args:OK #60/68 core_reloc_btfgen/type_id:OK #60/69 core_reloc_btfgen/type_id___missing_targets:OK #60/70 core_reloc_btfgen/enumval:OK Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20231006175744.3136675-3-andrii@kernel.org
2023-10-06	selftests/bpf: Support building selftests in optimized -O2 mode	Andrii Nakryiko	1	-6/+8
	Add support for building selftests with -O2 level of optimization, which allows more compiler warnings detection (like lots of potentially uninitialized usage), but also is useful to have a faster-running test for some CPU-intensive tests. One can build optimized versions of libbpf and selftests by running: $ make RELEASE=1 There is a measurable speed up of about 10 seconds for me locally, though it's mostly capped by non-parallelized serial tests. User CPU time goes down by total 40 seconds, from 1m10s to 0m28s. Unoptimized build (-O0) ======================= Summary: 430/3544 PASSED, 25 SKIPPED, 4 FAILED real 1m59.937s user 1m10.877s sys 3m14.880s Optimized build (-O2) ===================== Summary: 425/3543 PASSED, 25 SKIPPED, 9 FAILED real 1m50.540s user 0m28.406s sys 3m13.198s Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20231006175744.3136675-2-andrii@kernel.org
2023-10-06	selftests/bpf: Fix compiler warnings reported in -O2 mode	Andrii Nakryiko	15	-24/+27
	Fix a bunch of potentially unitialized variable usage warnings that are reported by GCC in -O2 mode. Also silence overzealous stringop-truncation class of warnings. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20231006175744.3136675-1-andrii@kernel.org
2023-10-06	bpf: Inherit system settings for CPU security mitigations	Yafang Shao	1	-2/+2
	Currently, there exists a system-wide setting related to CPU security mitigations, denoted as 'mitigations='. When set to 'mitigations=off', it deactivates all optional CPU mitigations. Therefore, if we implement a system-wide 'mitigations=off' setting, it should inherently bypass Spectre v1 and Spectre v4 in the BPF subsystem. Please note that there is also a more specific 'nospectre_v1' setting on x86 and ppc architectures, though it is not currently exported. For the time being, let's disregard more fine-grained options. This idea emerged during our discussion about potential Spectre v1 attacks with Luis [0]. [0] https://lore.kernel.org/bpf/b4fc15f7-b204-767e-ebb9-fdb4233961fb@iogearbox.net Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Song Liu <song@kernel.org> Acked-by: KP Singh <kpsingh@kernel.org> Cc: Luis Gerhorst <gerhorst@cs.fau.de> Link: https://lore.kernel.org/bpf/20231005084123.1338-1-laoar.shao@gmail.com
2023-10-05	bpf: Fix the comment for bpf_restore_data_end()	Akihiko Odaki	1	-1/+1
	The comment used to say: > Restore data saved by bpf_compute_data_pointers(). But bpf_compute_data_pointers() does not save the data; bpf_compute_and_save_data_end() does. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231005072137.29870-1-akihiko.odaki@daynix.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-05	selftests/bpf: Enable CONFIG_VSOCKETS in config	Geliang Tang	1	-0/+1
	CONFIG_VSOCKETS is required by BPF selftests, otherwise we get errors like this: ./test_progs:socket_loopback_reuseport:386: socket: Address family not supported by protocol socket_loopback_reuseport:FAIL:386 ./test_progs:vsock_unix_redir_connectible:1496: vsock_socketpair_connectible() failed vsock_unix_redir_connectible:FAIL:1496 So this patch enables it in tools/testing/selftests/bpf/config. Signed-off-by: Geliang Tang <geliang.tang@suse.com> Link: https://lore.kernel.org/r/472e73d285db2ea59aca9bbb95eb5d4048327588.1696490003.git.geliang.tang@suse.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-04	selftests/bpf: Add uprobe_multi to gen_tar target	Björn Töpel	1	-1/+1
	The uprobe_multi program was not picked up for the gen_tar target. Fix by adding it to TEST_GEN_FILES. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20231004122721.54525-4-bjorn@kernel.org
2023-10-04	selftests/bpf: Enable lld usage for RISC-V	Björn Töpel	1	-1/+1
	RISC-V has proper lld support. Use that, similar to what x86 does, for urandom_read et al. Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231004122721.54525-3-bjorn@kernel.org
2023-10-04	selftests/bpf: Add cross-build support for urandom_read et al	Björn Töpel	1	-3/+5
	Some userland programs in the BPF test suite, e.g. urandom_read, is missing cross-build support. Add cross-build support for these programs Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231004122721.54525-2-bjorn@kernel.org
2023-10-04	selftests/bpf: Define SYS_NANOSLEEP_KPROBE_NAME for riscv	Björn Töpel	1	-0/+2
	Add missing sys_nanosleep name for RISC-V, which is used by some tests (e.g. attach_probe). Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/bpf/20231004110905.49024-4-bjorn@kernel.org
2023-10-04	selftests/bpf: Define SYS_PREFIX for riscv	Björn Töpel	1	-0/+3
	SYS_PREFIX was missing for a RISC-V, which made a couple of kprobe tests fail. Add missing SYS_PREFIX for RISC-V. Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/bpf/20231004110905.49024-3-bjorn@kernel.org
2023-10-04	libbpf: Fix syscall access arguments on riscv	Alexandre Ghiti	1	-2/+0
	Since commit 08d0ce30e0e4 ("riscv: Implement syscall wrappers"), riscv selects ARCH_HAS_SYSCALL_WRAPPER so let's use the generic implementation of PT_REGS_SYSCALL_REGS(). Fixes: 08d0ce30e0e4 ("riscv: Implement syscall wrappers") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/bpf/20231004110905.49024-2-bjorn@kernel.org
2023-10-04	selftests/xsk: Add a test for shared umem feature	Tushar Vyavahare	4	-7/+72
	Add a new test for testing shared umem feature. This is accomplished by adding a new XDP program and using the multiple sockets. The new XDP program redirects the packets based on the destination MAC address. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-9-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Modify xsk_update_xskmap() to accept the index as an argument	Tushar Vyavahare	3	-6/+5
	Modify xsk_update_xskmap() to accept the index as an argument, enabling the addition of multiple sockets to xskmap. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-8-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Iterate over all the sockets in the send pkts function	Tushar Vyavahare	1	-19/+38
	Update send_pkts() to handle multiple sockets for sending packets. Multiple TX sockets are utilized alternately based on the batch size for improve packet transmission. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-7-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Remove unnecessary parameter from pkt_set() function call	Tushar Vyavahare	1	-16/+11
	The pkt_set() function no longer needs the umem parameter. This commit removes the umem parameter from the pkt_set() function. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-6-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Iterate over all the sockets in the receive pkts function	Tushar Vyavahare	3	-113/+178
	Improve the receive_pkt() function to enable it to receive packets from multiple sockets. Define a sock_num variable to iterate through all the sockets in the Rx path. Add nb_valid_entries to check that all the expected number of packets are received. Revise the function __receive_pkts() to only inspect the receive ring once, handle any received packets, and promptly return. Implement a bitmap to store the value of number of sockets. Update Makefile to include find_bit.c for compiling xskxceiver. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-5-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Move src_mac and dst_mac to the xsk_socket_info	Tushar Vyavahare	2	-20/+21
	Move the src_mac and dst_mac fields from the ifobject structure to the xsk_socket_info structure to achieve per-socket MAC address assignment. Require this in order to steer traffic to various sockets in subsequent patches. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-4-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Rename xsk_xdp_metadata.h to xsk_xdp_common.h	Tushar Vyavahare	3	-2/+7
	Rename the header file to a generic name so that it can be used by all future XDP programs. Ensure that the xsk_xdp_common.h header file includes include guards. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-3-tushar.vyavahare@intel.com
2023-10-04	selftests/xsk: Move pkt_stream to the xsk_socket_info	Tushar Vyavahare	2	-35/+38
	Move the packet stream from the ifobject struct to the xsk_socket_info struct to enable the use of different streams for different sockets. This will facilitate the sending and receiving of data from multiple sockets simultaneously using the SHARED_XDP_UMEM feature. Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20230927135241.2287547-2-tushar.vyavahare@intel.com
2023-09-29	libbpf: Allow Golang symbols in uprobe secdef	Hengqi Chen	1	-6/+16
	Golang symbols in ELF files are different from C/C++ which contains special characters like '', '(' and ')'. With generics, things get more complicated, there are symbols like: github.com/cilium/ebpf/internal.(Deque[go.shape.interface { Format(fmt.State, int32); TypeName() string;github.com/cilium/ebpf/btf.copy() github.com/cilium/ebpf/btf.Type}]).Grow Matching such symbols using `%m[^\n]` in sscanf, this excludes newline which typically does not appear in ELF symbols. This should work in most use-cases and also work for unicode letters in identifiers. If newline do show up in ELF symbols, users can still attach to such symbol by specifying bpf_uprobe_opts::func_name. A working example can be found at this repo ([0]). [0]: https://github.com/chenhengqi/libbpf-go-symbols Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230929155954.92448-1-hengqi.chen@gmail.com
2023-09-28	samples/bpf: Add -fsanitize=bounds to userspace programs	Ruowen Qin	1	-0/+3
	The sanitizer flag, which is supported by both clang and gcc, would make it easier to debug array index out-of-bounds problems in these programs. Make the Makfile smarter to detect ubsan support from the compiler and add the '-fsanitize=bounds' accordingly. Suggested-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Jinghao Jia <jinghao@linux.ibm.com> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu> Signed-off-by: Ruowen Qin <ruowenq2@illinois.edu> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20230927045030.224548-2-ruowenq2@illinois.edu