|
This patch removes the libcap usage from test_verifier.
The cap_*_effective() helpers added in the earlier patch are
used instead.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220316173829.2038682-1-kafai@fb.com
|
|
Newer libcap (>= 2.60) includes the libcap commit aca076443591
("Make cap_t operations thread safe."), which added a "__u8 mutex;" to
"struct _cap_struct". This shifts the layout by a few bytes and breaks
the assumption made in the "struct libcap" definition in
test_verifier.c.
The bpf selftests only need to enable and disable the effective
caps of the running task, so it is easier to call the capget and
capset syscalls directly. This also removes the libcap library
dependency.
cap_helpers.{c,h} is added. One __u64 is used for all CAP_*
bits instead of two __u32.
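As a rough illustration of the approach described above (not the actual cap_helpers.c code), the sketch below reads and updates the effective set through the raw capget/capset syscalls and packs both 32-bit halves into one 64-bit mask. The function names caps_join(), read_effective_caps() and update_effective_caps() are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Join the two 32-bit words used by the kernel ABI into one 64-bit mask. */
uint64_t caps_join(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Read the effective caps of the calling task; 0 on success, -1 on error. */
int read_effective_caps(uint64_t *caps)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 selects the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (syscall(SYS_capget, &hdr, data))
		return -1;

	*caps = caps_join(data[0].effective, data[1].effective);
	return 0;
}

/* Set or clear the given bits in the effective set only. */
int update_effective_caps(uint64_t caps, bool enable)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
	uint32_t lo = (uint32_t)caps;
	uint32_t hi = (uint32_t)(caps >> 32);

	if (syscall(SYS_capget, &hdr, data))
		return -1;

	if (enable) {
		data[0].effective |= lo;
		data[1].effective |= hi;
	} else {
		data[0].effective &= ~lo;
		data[1].effective &= ~hi;
	}

	return syscall(SYS_capset, &hdr, data) ? -1 : 0;
}
```

A caller would typically save the mask returned by read_effective_caps() before a test and restore it afterwards.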
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220316173823.2036955-1-kafai@fb.com
|
|
The top-level (bpftool.8) man page was missing docs for a few
subcommands and their respective sub-sub-commands.
This commit brings the top level man page up to date. Note that I've
kept the ordering of the subcommands the same as in `bpftool help`.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/3049ef5dc509c0d1832f0a8b2dba2ccaad0af688.1647213551.git.dxu@dxuuu.xyz
|
|
Commit 82e6b1eee6a8 ("bpf: Allow to specify user-provided bpf_cookie for
BPF perf links") introduced the concept of user specified bpf_cookie,
which could be accessed by BPF programs using bpf_get_attach_cookie().
For troubleshooting purposes it is convenient to expose bpf_cookie via
bpftool as well, so there is no need to meddle with the target BPF
program itself.
Implemented using the pid iterator BPF program to actually fetch
bpf_cookies, which allows constraining code changes only to bpftool.
  $ bpftool link
  1: type 7  prog 5
          bpf_cookie 123
          pids bootstrap(81)
Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220309163112.24141-1-9erthalion6@gmail.com
|
|
Clean up the array_size.cocci warnings under tools/testing/selftests/bpf/:
Use `ARRAY_SIZE(arr)` instead of forms like `sizeof(arr)/sizeof(arr[0])`.
tools/testing/selftests/bpf/test_cgroup_storage.c uses ARRAY_SIZE() defined
in tools/include/linux/kernel.h (sys/sysinfo.h -> linux/kernel.h), while
others use ARRAY_SIZE() in bpf_util.h.
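A minimal sketch of the pattern being cleaned up. The macro below is a simplified stand-in (the kernel variants add a compile-time check that rejects pointers), and toy_sizes, toy_count() and toy_sum_sizes() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Simplified stand-in for the ARRAY_SIZE() helpers mentioned above. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
#endif

static const int toy_sizes[] = { 64, 128, 256, 512, 1024 };

/* Number of elements, without spelling out sizeof(arr)/sizeof(arr[0]). */
size_t toy_count(void)
{
	return ARRAY_SIZE(toy_sizes);
}

/* Typical use as a loop bound. */
size_t toy_sum_sizes(void)
{
	size_t i, sum = 0;

	for (i = 0; i < ARRAY_SIZE(toy_sizes); i++)
		sum += (size_t)toy_sizes[i];

	return sum;
}
```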
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220315130143.2403-1-guozhengkui@vivo.com
|
|
When running xdpsock for a fixed duration of time before terminating
using --duration=<n>, there is a race condition that may cause xdpsock
to terminate immediately.
When running for a fixed duration of time the check to determine when to
terminate execution is in is_benchmark_done() and is being executed in
the context of the poller thread,
    if (opt_duration > 0) {
            unsigned long dt = (get_nsecs() - start_time);
            if (dt >= opt_duration)
                    benchmark_done = true;
    }
However, start_time is only set after the poller thread has been
created. This leaves a small window, when the poller thread is starting
and calls is_benchmark_done() for the first time, in which start_time is
not yet set. In that case start_time still has its initial value of 0,
so the computed duration is not relative to the application's start
time. The check then immediately sets benchmark_done, which in turn
terminates the xdpsock application.
Fix this by setting start_time before creating the poller thread.
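The effect of an unset start_time can be shown with a small sketch. The helper duration_elapsed() and its parameters are hypothetical and not part of xdpsock; it only mirrors the comparison quoted above on plain integers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Same comparison as the duration check quoted above, with the clock
 * reading passed in so the behaviour is easy to reason about.
 */
bool duration_elapsed(uint64_t now_ns, uint64_t start_ns, uint64_t duration_ns)
{
	return duration_ns > 0 && now_ns - start_ns >= duration_ns;
}
```

With start_ns still 0, any monotonic clock reading larger than the requested duration makes the check succeed on the first call. Recording the start time before the poller thread exists closes that window.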
Fixes: d3f11b018f6c ("samples/bpf: xdpsock: Add duration option to specify how long to run")
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220315102948.466436-1-niklas.soderlund@corigine.com
|
|
If tcp_bpf_sendmsg is running during a tear down operation, psock may be
freed.
  tcp_bpf_sendmsg()
   tcp_bpf_send_verdict()
    sk_msg_return()
    tcp_bpf_sendmsg_redir()
     unlikely(!psock))
      sk_msg_free()
The memory of msg has already been uncharged in tcp_bpf_send_verdict() by
sk_msg_return(), and would be uncharged again by sk_msg_free(). When psock
is NULL, we can simply return an error code. This then triggers
sk_msg_free_nocharge() in the error path of __SK_REDIRECT and has the side
effect of propagating an error up to user space. This is a slight change in
behavior from the user's side, but it looks the same as an error thrown by
the redirect on the socket.
This issue can cause the following info:
WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
worker_thread+0x30/0x350
? process_one_work+0x3c0/0x3c0
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220304081145.2037182-5-wangyufen@huawei.com
|
|
In tcp_bpf_send_verdict(), if msg has more data after
tcp_bpf_sendmsg_redir():
  tcp_bpf_send_verdict()
   tosend = msg->sg.size //msg->sg.size = 22220
   case __SK_REDIRECT:
    sk_msg_return() //uncharged msg->sg.size(22220) sk->sk_forward_alloc
    tcp_bpf_sendmsg_redir() //after tcp_bpf_sendmsg_redir, msg->sg.size=11000
    goto more_data;
   tosend = msg->sg.size //msg->sg.size = 11000
   case __SK_REDIRECT:
    sk_msg_return() //uncharged msg->sg.size(11000) to sk->sk_forward_alloc
The remaining msg->sg.size (11000) has been uncharged twice. To fix this,
charge the remaining msg->sg.size before the goto more_data.
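The accounting can be followed with a toy model. This is only a sketch under simplifying assumptions: struct toy_sock and its single "charged" counter are hypothetical stand-ins for the socket memory accounting, and the numbers are taken from the trace above.

```c
#include <stdbool.h>

/* Hypothetical stand-in: bytes the message currently has charged to sk. */
struct toy_sock {
	long charged;
};

static void toy_charge(struct toy_sock *sk, long bytes)
{
	sk->charged += bytes;
}

static void toy_uncharge(struct toy_sock *sk, long bytes)
{
	sk->charged -= bytes;
}

/*
 * Replay the two __SK_REDIRECT rounds from the trace. Returns the final
 * balance; anything other than 0 means the accounting went wrong.
 */
long toy_redirect_rounds(bool recharge_remainder)
{
	struct toy_sock sk = { .charged = 0 };
	long msg_size = 22220;

	toy_charge(&sk, msg_size);	/* message queued for sending */

	/* Round 1: uncharge everything, the redirect leaves 11000 bytes. */
	toy_uncharge(&sk, msg_size);
	msg_size = 11000;

	/* The fix: charge the remainder again before "goto more_data". */
	if (recharge_remainder)
		toy_charge(&sk, msg_size);

	/* Round 2: uncharge the remainder, the redirect sends the rest. */
	toy_uncharge(&sk, msg_size);

	return sk.charged;
}
```

Without the extra charge the balance ends 11000 bytes below zero, which is the kind of imbalance the warnings below report at socket teardown.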
This issue can cause the following info:
WARNING: CPU: 0 PID: 9860 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
Call Trace:
<TASK>
inet_csk_destroy_sock+0x55/0x110
__tcp_close+0x279/0x470
tcp_close+0x1f/0x60
inet_release+0x3f/0x80
__sock_release+0x3d/0xb0
sock_close+0x11/0x20
__fput+0x92/0x250
task_work_run+0x6a/0xa0
do_exit+0x33b/0xb60
do_group_exit+0x2f/0xa0
get_signal+0xb6/0x950
arch_do_signal_or_restart+0xac/0x2a0
? vfs_write+0x237/0x290
exit_to_user_mode_prepare+0xa9/0x200
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x46/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
WARNING: CPU: 0 PID: 2136 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
worker_thread+0x30/0x350
? process_one_work+0x3c0/0x3c0
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220304081145.2037182-4-wangyufen@huawei.com
|
|
If tcp_bpf_sendmsg() is running while the sk msg is full, sk_msg_alloc()
returns -ENOMEM and tcp_bpf_sendmsg() goes to wait_for_memory. If memory
has been partially allocated by sk_msg_alloc(), that is, msg_tx->sg.size is
greater than osize after sk_msg_alloc(), a memory leak occurs. To fix this,
use sk_msg_trim() to release the allocated memory, then go to
wait_for_memory.
Other call paths of sk_msg_alloc() have a similar issue, such as
tls_sw_sendmsg(), so handle the sk_msg_trim() logic inside sk_msg_alloc(),
as Cong Wang suggested.
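The idea of trimming back to the original size on failure can be sketched with a toy allocator. Everything here (struct toy_msg, toy_msg_alloc(), the chunk and budget parameters) is hypothetical and only models the bookkeeping, not the real sk_msg code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message that only tracks how many bytes it holds. */
struct toy_msg {
	size_t size;
};

/*
 * Grow msg up to len in chunk-sized steps taken from a limited budget.
 * Returns 0 on success and -1 when the budget runs out part-way. With
 * trim_on_error set, a failed call gives back what it took and restores
 * the size the message had on entry (osize).
 */
int toy_msg_alloc(struct toy_msg *msg, size_t len, size_t chunk,
		  size_t *budget, bool trim_on_error)
{
	size_t osize = msg->size;

	while (msg->size < len) {
		if (*budget < chunk) {
			if (trim_on_error) {
				*budget += msg->size - osize;
				msg->size = osize;
			}
			return -1;
		}
		*budget -= chunk;
		msg->size += chunk;
	}

	return 0;
}
```

Doing the rollback inside the allocator means every caller gets the same behaviour on failure, which matches the motivation given above for moving the trim into sk_msg_alloc().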
This issue can cause the following info:
WARNING: CPU: 3 PID: 7950 at net/core/stream.c:208 sk_stream_kill_queues+0xd4/0x1a0
Call Trace:
<TASK>
inet_csk_destroy_sock+0x55/0x110
__tcp_close+0x279/0x470
tcp_close+0x1f/0x60
inet_release+0x3f/0x80
__sock_release+0x3d/0xb0
sock_close+0x11/0x20
__fput+0x92/0x250
task_work_run+0x6a/0xa0
do_exit+0x33b/0xb60
do_group_exit+0x2f/0xa0
get_signal+0xb6/0x950
arch_do_signal_or_restart+0xac/0x2a0
exit_to_user_mode_prepare+0xa9/0x200
syscall_exit_to_user_mode+0x12/0x30
do_syscall_64+0x46/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
WARNING: CPU: 3 PID: 2094 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x13c/0x260
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
kthread+0xe6/0x110
ret_from_fork+0x22/0x30
</TASK>
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220304081145.2037182-3-wangyufen@huawei.com
|
|
If tcp_bpf_sendmsg is running during a tear down operation we may enqueue
data on the ingress msg queue while tear down is trying to free it.
 sk1 (redirect sk2)                         sk2
 -------------------                      ---------------
tcp_bpf_sendmsg()
 tcp_bpf_send_verdict()
  tcp_bpf_sendmsg_redir()
   bpf_tcp_ingress()
                                          sock_map_close()
                                           lock_sock()
    lock_sock() ... blocking
                                           sk_psock_stop
                                            sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
                                           release_sock(sk);
    lock_sock()
    sk_mem_charge()
    get_page()
    sk_psock_queue_msg()
     sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED);
      drop_sk_msg()
    release_sock()
When drop_sk_msg() is called, the msg has memory charged from sk by
sk_mem_charge() and has sg pages that need to be put. To fix this, use
sk_msg_free() and then kfree() the msg.
This issue can cause the following info:
WARNING: CPU: 0 PID: 9202 at net/core/stream.c:205 sk_stream_kill_queues+0xc8/0xe0
Call Trace:
<IRQ>
inet_csk_destroy_sock+0x55/0x110
tcp_rcv_state_process+0xe5f/0xe90
? sk_filter_trim_cap+0x10d/0x230
? tcp_v4_do_rcv+0x161/0x250
tcp_v4_do_rcv+0x161/0x250
tcp_v4_rcv+0xc3a/0xce0
ip_protocol_deliver_rcu+0x3d/0x230
ip_local_deliver_finish+0x54/0x60
ip_local_deliver+0xfd/0x110
? ip_protocol_deliver_rcu+0x230/0x230
ip_rcv+0xd6/0x100
? ip_local_deliver+0x110/0x110
__netif_receive_skb_one_core+0x85/0xa0
process_backlog+0xa4/0x160
__napi_poll+0x29/0x1b0
net_rx_action+0x287/0x300
__do_softirq+0xff/0x2fc
do_softirq+0x79/0x90
</IRQ>
WARNING: CPU: 0 PID: 531 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x175/0x1b0
Call Trace:
<TASK>
__sk_destruct+0x24/0x1f0
sk_psock_destroy+0x19b/0x1c0
process_one_work+0x1b3/0x3c0
? process_one_work+0x3c0/0x3c0
worker_thread+0x30/0x350
? process_one_work+0x3c0/0x3c0
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
Fixes: 9635720b7c88 ("bpf, sockmap: Fix memleak on ingress msg enqueue")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220304081145.2037182-2-wangyufen@huawei.com
|
|
Building selftests/bpf with the latest clang compiler (clang15 built
from source), I hit the following compilation error:
  /.../prog_tests/send_signal.c:43:16: error: variable 'j' set but not used [-Werror,-Wunused-but-set-variable]
          volatile int j = 0;
                       ^
  1 error generated.
The problem also exists with clang13 and clang14. clang12 is okay.
In send_signal.c, we have the following code ...
  volatile int j = 0;
  [...]
  for (int i = 0; i < 100000000 && !sigusr1_received; i++)
          j /= i + 1;
... to burn CPU cycles so bpf_send_signal() helper can be tested
in NMI mode.
Slightly changing 'j /= i + 1' to 'j /= i + j + 1' or 'j++' can
fix the problem. Further investigation indicated this should be
a clang bug ([1]). The upstream fix will be proposed later. But it
is a good idea to work around the issue to unblock people who build
kernel/selftests with clang.
[1] https://discourse.llvm.org/t/strange-clang-unused-but-set-variable-error-with-volatile-variables/60841
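A standalone sketch of the busy loop with one of the workarounds mentioned above ('j /= i + j + 1'). The function name burn_cpu() and its max_iters parameter are hypothetical; whether a given clang version warns on the original form is not something this sketch demonstrates.

```c
/* Set from a signal handler in the real test; stays 0 in this sketch. */
static volatile int sigusr1_received;

/*
 * Burn CPU cycles until the flag is set or max_iters is reached.
 * Reading j on the right-hand side is the workaround described above.
 * Returns the number of iterations executed.
 */
int burn_cpu(int max_iters)
{
	volatile int j = 0;
	int i;

	for (i = 0; i < max_iters && !sigusr1_received; i++)
		j /= i + j + 1;	/* j stays 0, the divisor is always >= 1 */

	return i;
}
```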
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220311003721.2177170-1-yhs@fb.com
|
|
This adds an extra test to the xdp_do_redirect selftest for XDP live packet
mode, which verifies that the maximum permissible packet size is accepted
without any errors, and that a too big packet is correctly rejected.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220310225621.53374-2-toke@redhat.com
|
|
The live packet mode uses some extra space at the start of each page to
cache data structures so they don't have to be rebuilt at every repetition.
This space wasn't correctly accounted for in the size checking of the
arguments supplied to userspace. In addition, the definition of the frame
size should include the size of the skb_shared_info (as there is other
logic that subtracts the size of this).
Together, these mistakes resulted in userspace being able to trip the
XDP_WARN() in xdp_update_frame_from_buff(), which syzbot discovered in
short order. Fix this by changing the frame size define and adding the
extra headroom to the bpf_prog_test_run_xdp() function. Also drop the
max_len parameter to the page_pool init, since this is related to DMA which
is not used for the page pool instance in PROG_TEST_RUN.
Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Reported-by: syzbot+0e91362d99386dc5de99@syzkaller.appspotmail.com
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220310225621.53374-1-toke@redhat.com
|
|
Previous patches have introduced the compiler attribute btf_type_tag for
__user and __percpu. The availability of this attribute depends on
some CONFIGs and compiler support. This patch refactors the use
of btf_type_tag by introducing BTF_TYPE_TAG, which hides all the
dependencies.
No functional change.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220310211655.3173786-1-haoluo@google.com
|
|
Check that bpf_kernel_read_file() denies the reading of an IMA policy, by
ensuring that ima_setup.sh exits with an error.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-10-roberto.sassu@huawei.com
|
|
Test the ability of bpf_lsm_kernel_read_file() to call the sleepable
functions bpf_ima_inode_hash() or bpf_ima_file_hash() to obtain a
measurement of a loaded IMA policy.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-9-roberto.sassu@huawei.com
|
|
Make bpf_lsm_kernel_read_file() sleepable, so that bpf_ima_inode_hash()
or bpf_ima_file_hash() can be called inside the implementation of this
hook.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-8-roberto.sassu@huawei.com
|
|
Verify that bpf_ima_inode_hash() returns a non-fresh digest after a file
write, and that bpf_ima_file_hash() returns a fresh digest. Verification is
done by requesting the digest from the bprm_creds_for_exec hook, called
before ima_bprm_check().
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-7-roberto.sassu@huawei.com
|
|
Add new test to ensure that bpf_ima_file_hash() returns the digest of the
executed files.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-6-roberto.sassu@huawei.com
|
|
Move sample generator code to ima_test_common() so that the new function
can be called by multiple LSM hooks.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-5-roberto.sassu@huawei.com
|
|
ima_file_hash() has been modified to calculate the measurement of a file on
demand, if it has not been already performed by IMA or the measurement is
not fresh. For compatibility reasons, ima_inode_hash() remains unchanged.
Keep the same approach in eBPF and introduce the new helper
bpf_ima_file_hash() to take advantage of the modified behavior of
ima_file_hash().
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220302111404.193900-4-roberto.sassu@huawei.com
|
|
__ima_inode_hash() checks if a digest has been already calculated by
looking for the integrity_iint_cache structure associated to the passed
inode.
Users of ima_file_hash() (e.g. eBPF) might be interested in obtaining the
information without having to setup an IMA policy so that the digest is
always available at the time they call this function.
In addition, they likely expect the digest to be fresh, e.g. recalculated
by IMA after a file write. Although getting the digest from the
bprm_committed_creds hook (as in the eBPF test) ensures that the digest is
fresh, as the IMA hook is executed before that hook, this is not always the
case (e.g. for the mmap_file hook).
Call ima_collect_measurement() in __ima_inode_hash(), if the file
descriptor is available (passed by ima_file_hash()) and the digest is not
available/not fresh, and store the file measurement in a temporary
integrity_iint_cache structure.
This change does not cause memory usage increase, due to using the
temporary integrity_iint_cache structure, and due to freeing the
ima_digest_data structure inside integrity_iint_cache before exiting from
__ima_inode_hash().
For compatibility reasons, the behavior of ima_inode_hash() remains
unchanged.
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20220302111404.193900-3-roberto.sassu@huawei.com
|
|
Fix the following warnings in ima_main.c, displayed with W=n make argument:
security/integrity/ima/ima_main.c:432: warning: Function parameter or
member 'vma' not described in 'ima_file_mprotect'
security/integrity/ima/ima_main.c:636: warning: Function parameter or
member 'inode' not described in 'ima_post_create_tmpfile'
security/integrity/ima/ima_main.c:636: warning: Excess function parameter
'file' description in 'ima_post_create_tmpfile'
security/integrity/ima/ima_main.c:843: warning: Function parameter or
member 'load_id' not described in 'ima_post_load_data'
security/integrity/ima/ima_main.c:843: warning: Excess function parameter
'id' description in 'ima_post_load_data'
Also, fix some style issues in the description of ima_post_create_tmpfile()
and ima_post_path_mknod().
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
Link: https://lore.kernel.org/bpf/20220302111404.193900-2-roberto.sassu@huawei.com
|
|
If a BPF map is created with a size over 2^32 bytes, the memlock value
displayed in JSON format will be incorrect. Use atoll() instead of atoi()
so that the correct number is displayed.
```
$ bpftool map create /sys/fs/bpf/test_bpfmap type hash key 4 \
value 1024 entries 4194304 name test_bpfmap
$ bpftool map list
1: hash name test_bpfmap flags 0x0
key 4B value 1024B max_entries 4194304 memlock 4328521728B
$ sudo bpftool map list -j | jq .[].bytes_memlock
33554432
```
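The two numbers in the output above are consistent with a 32-bit truncation: 4328521728 - 2^32 = 33554432. The sketch below shows this with an explicit cast instead of calling atoi() on an out-of-range string, since that call has undefined behaviour in C. The helper names parse_bytes() and low32() are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal byte count into a 64-bit value, as atoll() allows. */
long long parse_bytes(const char *s)
{
	return atoll(s);
}

/* Keep only the low 32 bits, which is what a 32-bit result would hold. */
uint32_t low32(long long value)
{
	return (uint32_t)value;
}
```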
Signed-off-by: Chris J Arges <carges@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/b6601087-0b11-33cc-904a-1133d1500a10@cloudflare.com
|
|
Use offsetofend() instead of offsetof() + sizeof() to simplify
MIN_BPF_LINEINFO_SIZE macro definition.
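For illustration, the two spellings can be compared on a small struct. The macro below follows the usual offsetof-plus-field-size definition, and struct toy_line_info is a hypothetical layout (four 32-bit fields plus a later extension), not a copy of the kernel header.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the first byte after MEMBER within TYPE. */
#ifndef offsetofend
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
#endif

/* Hypothetical record: the first four fields form the minimum size. */
struct toy_line_info {
	uint32_t insn_off;
	uint32_t file_name_off;
	uint32_t line_off;
	uint32_t line_col;
	uint32_t later_extension;
};

/* Old style: offsetof() + sizeof() written out by hand. */
size_t toy_min_size_old(void)
{
	return offsetof(struct toy_line_info, line_col) +
	       sizeof(((struct toy_line_info *)0)->line_col);
}

/* New style: the same value through offsetofend(). */
size_t toy_min_size_new(void)
{
	return offsetofend(struct toy_line_info, line_col);
}
```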
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/bpf/20220310161518.534544-1-ytcoode@gmail.com
|
|
Fix the descriptions of the return values of helper bpf_current_task_under_cgroup().
Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220310155335.1278783-1-hengqi.chen@gmail.com
|
|
The previous patch made the following changes:
- s/delivery_time_type/tstamp_type/
- s/bpf_skb_set_delivery_time/bpf_skb_set_tstamp/
- BPF_SKB_DELIVERY_TIME_* to BPF_SKB_TSTAMP_*
This patch changes test_tc_dtime.c to reflect the above.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090515.3712742-1-kafai@fb.com
|
|
This patch simplifies the uapi bpf.h with regard to the tstamp type
and uses a similar way to the kernel to describe the value stored
in __sk_buff->tstamp.
My earlier thought was to avoid describing the semantics and
clock base for the rcv timestamp until there is more clarity
on the use case, hence the __sk_buff->delivery_time_type naming instead
of __sk_buff->tstamp_type.
On further thought, it can reuse the UNSPEC naming. This patch first
removes BPF_SKB_DELIVERY_TIME_NONE and also
renames BPF_SKB_DELIVERY_TIME_UNSPEC to BPF_SKB_TSTAMP_UNSPEC
and BPF_SKB_DELIVERY_TIME_MONO to BPF_SKB_TSTAMP_DELIVERY_MONO.
The semantic of BPF_SKB_TSTAMP_DELIVERY_MONO is the same:
__sk_buff->tstamp has delivery time in mono clock base.
BPF_SKB_TSTAMP_UNSPEC means __sk_buff->tstamp has the (rcv)
tstamp at ingress and the delivery time at egress. At egress,
the clock base could be found from skb->sk->sk_clockid.
__sk_buff->tstamp == 0 naturally means NONE, so NONE is not needed.
With BPF_SKB_TSTAMP_UNSPEC for the rcv tstamp at ingress,
the __sk_buff->delivery_time_type is also renamed to __sk_buff->tstamp_type
which was also suggested in the earlier discussion:
https://lore.kernel.org/bpf/b181acbe-caf8-502d-4b7b-7d96b9fc5d55@iogearbox.net/
The above will then make __sk_buff->tstamp and __sk_buff->tstamp_type
the same as their kernel counterparts, skb->tstamp and
skb->mono_delivery_time.
The internal kernel function bpf_skb_convert_dtime_type_read() is then
renamed to bpf_skb_convert_tstamp_type_read() and it can be simplified
with the BPF_SKB_DELIVERY_TIME_NONE gone. A BPF_ALU32_IMM(BPF_AND)
insn is also saved by using BPF_JMP32_IMM(BPF_JSET).
The bpf helper bpf_skb_set_delivery_time() is also renamed to
bpf_skb_set_tstamp(). The arg name is also changed from dtime
to tstamp. It only allows setting tstamp 0 for
BPF_SKB_TSTAMP_UNSPEC, and this could be relaxed later
if there is a use case for changing mono delivery time to
non mono.
prog->delivery_time_access is also renamed to prog->tstamp_type_access.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090509.3712315-1-kafai@fb.com
|
|
BPF_JMP32_IMM(BPF_JSET) is used to save a BPF_ALU32_IMM(BPF_AND).
The skb->tc_at_ingress and skb->mono_delivery_time are at the same
offset, so only one BPF_LDX_MEM(BPF_B) is needed.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090502.3711982-1-kafai@fb.com
|
|
The skb->tc_at_ingress and skb->mono_delivery_time are at the same
byte offset. Thus, only one BPF_LDX_MEM(BPF_B) is needed
and both bits can be tested together.
  /* BPF_READ: a = __sk_buff->tstamp */
  if (skb->tc_at_ingress && skb->mono_delivery_time)
          a = 0;
  else
          a = skb->tstamp;
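Testing both bits from a single loaded byte can be sketched in plain C as follows. The mask values and the names TOY_TC_AT_INGRESS_MASK, TOY_MONO_DELIVERY_TIME_MASK and toy_read_tstamp() are hypothetical; the real bit positions depend on the struct sk_buff layout.

```c
#include <stdint.h>

/* Hypothetical bit positions inside the shared flags byte. */
#define TOY_TC_AT_INGRESS_MASK		(1u << 0)
#define TOY_MONO_DELIVERY_TIME_MASK	(1u << 1)
#define TOY_BOTH_MASK \
	(TOY_TC_AT_INGRESS_MASK | TOY_MONO_DELIVERY_TIME_MASK)

/*
 * One byte load, one combined test: return 0 only when both bits are
 * set, otherwise return the stored timestamp unchanged.
 */
uint64_t toy_read_tstamp(uint8_t flags, uint64_t tstamp)
{
	if ((flags & TOY_BOTH_MASK) == TOY_BOTH_MASK)
		return 0;

	return tstamp;
}
```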
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090456.3711530-1-kafai@fb.com
|
|
This patch removes the TC_AT_INGRESS_OFFSET and
SKB_MONO_DELIVERY_TIME_OFFSET macros. Instead, PKT_VLAN_PRESENT_OFFSET
is used because all of them are at the same offset. A comment is added to
make it clear that changing the position of tc_at_ingress or
mono_delivery_time will require adjusting the defined macros.
The earlier discussion can be found here:
https://lore.kernel.org/bpf/419d994e-ff61-7c11-0ec7-11fefcb0186e@iogearbox.net/
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220309090450.3710955-1-kafai@fb.com
|
|
The memory is allocated with kvmalloc(), so the corresponding release
function should be kvfree() rather than kfree().
Generated by: scripts/coccinelle/api/kfree_mismatch.cocci
Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Signed-off-by: Yihao Han <hanyihao@vivo.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20220310092828.13405-1-hanyihao@vivo.com
|
|
The kernel test robot pointed out that the newly added
bpf_test_run_xdp_live() runner doesn't set the retval in the caller (by
design), which means that the variable can be passed uninitialised to
bpf_test_finish(). Fix this by initialising the variable properly.
Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220310110228.161869-1-toke@redhat.com
|
|
Commit 1a56c18e6c2e4e74 ("bpftool: Stop supporting BPF offload-enabled
feature probing") removed the support to probe for BPF offload features.
This is still something that is useful for NFP NICs that can support
offloading of BPF programs.
The reason for the dropped support was that libbpf starting with v1.0
would drop support for passing the ifindex to the BPF prog/map/helper
feature probing APIs. In order to keep this useful feature for NFP,
restore the functionality by moving it directly into bpftool.
The code restored is a simplified version of the code that existed in
libbpf which supported passing the ifindex. The simplification is that it
only targets the cases where ifindex is given and calls into libbpf for
the cases where it's not.
Before restoring support for probing offload features:
# bpftool feature probe dev ens4np0
Scanning system call availability...
bpf() syscall is available
Scanning eBPF program types...
Scanning eBPF map types...
Scanning eBPF helper functions...
eBPF helpers supported for program type sched_cls:
eBPF helpers supported for program type xdp:
Scanning miscellaneous eBPF features...
Large program size limit is NOT available
Bounded loop support is NOT available
ISA extension v2 is NOT available
ISA extension v3 is NOT available
With support for probing offload features restored:
# bpftool feature probe dev ens4np0
Scanning system call availability...
bpf() syscall is available
Scanning eBPF program types...
eBPF program_type sched_cls is available
eBPF program_type xdp is available
Scanning eBPF map types...
eBPF map_type hash is available
eBPF map_type array is available
Scanning eBPF helper functions...
eBPF helpers supported for program type sched_cls:
- bpf_map_lookup_elem
- bpf_get_prandom_u32
- bpf_perf_event_output
eBPF helpers supported for program type xdp:
- bpf_map_lookup_elem
- bpf_get_prandom_u32
- bpf_perf_event_output
- bpf_xdp_adjust_head
- bpf_xdp_adjust_tail
Scanning miscellaneous eBPF features...
Large program size limit is NOT available
Bounded loop support is NOT available
ISA extension v2 is NOT available
ISA extension v3 is NOT available
Signed-off-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220310121846.921256-1-niklas.soderlund@corigine.com
|
|
This adds a selftest for the XDP_REDIRECT facility in BPF_PROG_RUN, that
redirects packets into a veth and counts them using an XDP program on the
other side of the veth pair and a TC program on the local side of the veth.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-6-toke@redhat.com
|
|
These will also be used by the xdp_do_redirect test being added in the next
commit.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-5-toke@redhat.com
|
|
Add support for setting the new batch_size parameter to BPF_PROG_TEST_RUN
to libbpf; just add it as an option and pass it through to the kernel.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-4-toke@redhat.com
|
|
This adds documentation for the BPF_PROG_RUN command; a short overview of
the command itself, and a more verbose description of the "live packet"
mode for XDP introduced in the previous commit.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-3-toke@redhat.com
|
|
This adds support for running XDP programs through BPF_PROG_RUN in a mode
that enables live packet processing of the resulting frames. Previous uses
of BPF_PROG_RUN for XDP returned the XDP program return code and the
modified packet data to userspace, which is useful for unit testing of XDP
programs.
The existing BPF_PROG_RUN for XDP allows userspace to set the ingress
ifindex and RXQ number as part of the context object being passed to the
kernel. This patch reuses that code, but adds a new mode with different
semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES
flag.
When running BPF_PROG_RUN in this mode, the XDP program return codes will
be honoured: returning XDP_PASS will result in the frame being injected
into the networking stack as if it came from the selected networking
interface, while returning XDP_TX and XDP_REDIRECT will result in the frame
being transmitted out that interface. XDP_TX is translated into an
XDP_REDIRECT operation to the same interface, since the real XDP_TX action
is only possible from within the network drivers themselves, not from the
process context where BPF_PROG_RUN is executed.
Internally, this new mode of operation creates a page pool instance while
setting up the test run, and feeds pages from that into the XDP program.
The setup cost of this is amortised over the number of repetitions
specified by userspace.
To support the performance testing use case, we further optimise the setup
step so that all pages in the pool are pre-initialised with the packet
data, and pre-computed context and xdp_frame objects stored at the start of
each page. This makes it possible to entirely avoid touching the page
content on each XDP program invocation, and enables sending up to 9
Mpps/core on my test box.
Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.
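
The effect of page recycling on packet contents can be illustrated with a toy model in plain C. Everything here is an assumption made for illustration: the sketch_* names, the pool size, the packet length and the round-robin reuse order are hypothetical and do not reflect how the page pool actually hands out pages.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_POOL_PAGES 4 /* hypothetical pool size */
#define SKETCH_PKT_LEN 8    /* hypothetical packet length */

struct sketch_pool {
	unsigned char page[SKETCH_POOL_PAGES][SKETCH_PKT_LEN];
};

/* Setup step: every page receives a copy of the packet data exactly once. */
void sketch_pool_init(struct sketch_pool *pool, const unsigned char *pkt)
{
	for (size_t i = 0; i < SKETCH_POOL_PAGES; i++)
		memcpy(pool->page[i], pkt, SKETCH_PKT_LEN);
}

/* Stand-in for an XDP program that modifies the packet before redirecting. */
static void sketch_prog(unsigned char *data)
{
	data[0]++;
}

/*
 * Run the stand-in program `repeat` times, reusing pages in round-robin
 * order without re-initialising them. Returns the first byte of page 0
 * after the last run.
 */
unsigned char sketch_run(struct sketch_pool *pool, unsigned int repeat)
{
	for (unsigned int i = 0; i < repeat; i++)
		sketch_prog(pool->page[i % SKETCH_POOL_PAGES]);
	return pool->page[0][0];
}
```

With a zeroed packet and eight repetitions, each of the four pages is visited twice, so the second visit to a page sees the byte already incremented by the first.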
Enabling the new flag is only allowed when not setting ctx_out and data_out
in the test specification, since using it means frames will be redirected
somewhere else, so they can't be returned.
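
The restriction on output buffers can be expressed as a simple argument check. The following is only a sketch under stated assumptions: sketch_check_live_args() and SKETCH_F_LIVE_FRAMES are hypothetical names, the bit value is a placeholder for BPF_F_TEST_XDP_LIVE_FRAMES, and returning -EINVAL is an assumed error code rather than something stated in this commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit standing in for BPF_F_TEST_XDP_LIVE_FRAMES (assumption). */
#define SKETCH_F_LIVE_FRAMES (1U << 1)

/*
 * Reject the live-frames flag when the caller also asks for output
 * buffers: the frames are sent elsewhere, so nothing can be copied back.
 * Returns 0 if the combination is acceptable, -EINVAL otherwise.
 */
int sketch_check_live_args(uint32_t flags, const void *data_out,
			   const void *ctx_out)
{
	if ((flags & SKETCH_F_LIVE_FRAMES) && (data_out || ctx_out))
		return -EINVAL;
	return 0;
}
```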
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
|
|
Remove the unneeded sleep and increase the length of the dummy
CPU-intensive computation to guarantee test process execution.
Also, complete the aforementioned computation as soon as the
test success criteria are met.
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-4-mykolal@fb.com
|
|
Substitute sleep with a dummy CPU-intensive computation.
Finish the aforementioned computation as soon as the signal is
delivered to the test process. Make the BPF code execute
only when the PID global variable is set.
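
The busy-loop pattern described here can be sketched in user-space C as follows. This is an illustration, not the selftest code: all sketch_* names are hypothetical, SIGUSR1 is an arbitrary choice, and raise() stands in for the BPF program delivering the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t sketch_sig_received;

static void sketch_handler(int signo)
{
	(void)signo;
	sketch_sig_received = 1;
}

/*
 * Burn CPU instead of sleeping, and stop as soon as the signal has been
 * seen or max_iters is reached. raise_at selects the iteration in which
 * the stand-in signal is sent. Returns the number of iterations executed.
 */
long sketch_spin_until_signal(long max_iters, long raise_at)
{
	volatile unsigned long sink = 0;
	long i;

	for (i = 0; i < max_iters && !sketch_sig_received; i++) {
		sink += (unsigned long)i * 2654435761UL; /* dummy work */
		if (i == raise_at)
			raise(SIGUSR1);
	}
	return i;
}

/* Install the handler, spin, and report 1 if the loop ended early. */
int sketch_run_demo(void)
{
	struct sigaction sa = {0};
	long iters;

	sa.sa_handler = sketch_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	sketch_sig_received = 0;
	iters = sketch_spin_until_signal(100000000L, 1000);
	return sketch_sig_received && iters == 1001;
}
```

Because the loop condition checks the flag on every iteration, the computation ends on the iteration after the signal arrives instead of running for its full length.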
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-3-mykolal@fb.com
|
|
The Linux kernel may automatically reduce the
kernel.perf_event_max_sample_rate value when running tests in parallel on
slow systems. The kernel checks against this limit when opening a perf
event with the freq=1 parameter set. The lower bound is 1000. This patch
reduces the sample_freq value to 1000 in all BPF tests that use
sample_freq to ensure they can always open a perf event.
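
A frequency-based perf event attribute with the reduced value might be filled in as in the sketch below. The helper name sketch_fill_sampling_attr() is hypothetical, and the choice of a software CPU-clock event is an assumption made only so the example is complete; the individual tests may use other event types.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Fill a perf_event_attr for frequency-based sampling at 1000 Hz, the
 * lower bound of kernel.perf_event_max_sample_rate mentioned above.
 */
void sketch_fill_sampling_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_SOFTWARE;        /* assumed event type */
	attr->config = PERF_COUNT_SW_CPU_CLOCK; /* assumed event */
	attr->freq = 1;           /* interpret sample_freq as a frequency */
	attr->sample_freq = 1000; /* never above the reduced limit */
}
```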
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220308200449.1757478-2-mykolal@fb.com
|
|
In ChromeOS and Gentoo we catch any unwanted mixed Clang/LLVM
and GCC/binutils usage via toolchain wrappers which fail builds.
This has revealed that GCC is called unconditionally in Clang
configured builds to populate GCC_TOOLCHAIN_DIR.
Allow the user to override CLANG_CROSS_FLAGS to avoid the GCC
call - in our case we set the var directly in the ebuild recipe.
In theory Clang should be able to autodetect these settings, so
this logic could be removed entirely, but in practice, as
commit cebdb7374577 ("tools: Help cross-building with clang")
mentions, this does not always work, so giving distributions
more control to specify their flags & sysroot is beneficial.
Suggested-by: Manoj Gupta <manojgupta@chromium.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/lkml/87czjk4osi.fsf@ryzen9.i-did-not-set--mail-host-address--so-tickle-me
Link: https://lore.kernel.org/bpf/20220308121428.81735-1-adrian.ratiu@collabora.com
|
|
In test_lwt_ip_encap, the ingress IPv6 encap test failed from time to
time. The failure occurred when an IPv4 ping through the IPv6 GRE
encapsulation did not receive a reply within the timeout. The IPv4 ping
and the IPv6 ping in the test used different timeouts (1 sec for IPv4
and 6 sec for IPv6), probably taking into account that IPv6 might need
longer to successfully complete. However, when IPv4 pings (with the
short timeout) are encapsulated into the IPv6 tunnel, the delays of IPv6
apply.
The actual reason for the long delays with IPv6 was that the IPv6
neighbor discovery sometimes did not complete in time. This was caused
by the outgoing interface only having a tentative link local address,
i.e., not having completed DAD for that lladdr. The ND was successfully
retried after 1 sec but that was too late for the ping timeout.
The IPv6 addresses for the test were already added with nodad. However,
for the lladdrs, DAD was still performed. We now disable DAD in the test
netns completely and just assume that the two lladdrs on each veth pair
do not collide. This removes all the delays for IPv6 traffic in the
test.
Without the delays, we can now also reduce the timeout of the IPv6 ping to
1 sec. This makes the whole test complete faster because we don't need
to wait for the excessive timeout for each IPv6 ping that is supposed
to fail.
Fixes: 0fde56e4385b0 ("selftests: bpf: add test_lwt_ip_encap selftest")
Signed-off-by: Felix Maurer <fmaurer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/4987d549d48b4e316cd5b3936de69c8d4bc75a4f.1646305899.git.fmaurer@redhat.com
|
|
Instead of determining buf_info string in the caller of check_buffer_access(),
we can determine whether the register type is read-only through
type_is_rdonly_mem() helper inside check_buffer_access() and construct
buf_info, making the code slightly cleaner.
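
The shape of this refactoring can be sketched outside the kernel as follows. All names and values here are hypothetical stand-ins: SKETCH_MEM_RDONLY is an arbitrary flag bit, sketch_type_is_rdonly_mem() only imitates the role of type_is_rdonly_mem(), and the "rdonly"/"rdwr" strings are assumed labels, not quoted from this commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical read-only flag bit carried in a register type. */
#define SKETCH_MEM_RDONLY (1U << 8)

/* Stand-in for the type_is_rdonly_mem() helper. */
static bool sketch_type_is_rdonly_mem(uint32_t type)
{
	return (type & SKETCH_MEM_RDONLY) != 0;
}

/*
 * The callee derives the label from the register type itself, so
 * callers no longer need to pass a buf_info string in.
 */
const char *sketch_buf_info(uint32_t reg_type)
{
	return sketch_type_is_rdonly_mem(reg_type) ? "rdonly" : "rdwr";
}
```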
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/YiWYLnAkEZXBP/gH@syu-laptop
|
|
vmtest.sh also supports s390x now.
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220307133048.1287644-2-kpsingh@kernel.org
|
|
Dynamic linking when compiling on the host can cause issues when the
libc version does not match the one in the VM image. Update the
docs to explain how to link statically to avoid this.
Before:
./vmtest.sh -- ./test_progs -t test_ima
./test_progs: /usr/lib/libc.so.6: version `GLIBC_2.33' not found (required by ./test_progs)
After:
LDLIBS=-static ./vmtest.sh -- ./test_progs -t test_ima
test_ima:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
Reported-by: "Geyslan G. Bem" <geyslan@gmail.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220307133048.1287644-1-kpsingh@kernel.org
|
|
The trailing slash of LIBBPF_SRCS is redundant; remove it. Also inline
it, as it's only used in LIBBPF_INCLUDE.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220305161013.361646-1-ytcoode@gmail.com
|
|
Fix the following coccicheck warnings:
tools/lib/bpf/bpf.c:114:31-32: WARNING: Use ARRAY_SIZE
tools/lib/bpf/xsk.c:484:34-35: WARNING: Use ARRAY_SIZE
tools/lib/bpf/xsk.c:485:35-36: WARNING: Use ARRAY_SIZE
It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64.
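
The kind of change coccicheck asks for can be shown with a simplified example. The macro below is a minimal user-space definition and omits the array type check that the kernel version adds; the sketch_* names and the sample array are hypothetical.

```c
#include <stddef.h>

/* Simplified definition; guarded in case a header already provides one. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
#endif

static const int sketch_values[] = { 3, 1, 4, 1, 5, 9 };

/* Open-coded element count, the form the warning flags. */
size_t sketch_count_open_coded(void)
{
	return sizeof(sketch_values) / sizeof(sketch_values[0]);
}

/* Same count expressed with the macro. */
size_t sketch_count_macro(void)
{
	return ARRAY_SIZE(sketch_values);
}
```

Both functions return the same value; the macro form just makes the intent explicit and avoids repeating the array name.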
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220306023426.19324-1-guozhengkui@vivo.com
|
|
Using strncpy() on NUL-terminated strings is considered deprecated[1].
Moreover, if the length of 'task->comm' is less than the destination buffer
size, strncpy() will NUL-pad the destination buffer, which is a needless
performance penalty.
Replacing strncpy() with strscpy() fixes all these issues.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
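
The difference in behaviour can be demonstrated in user space with a small approximation. strscpy() itself is a kernel function, so sketch_strscpy() below is a hypothetical stand-in that only mirrors its documented contract: always NUL-terminate, do not pad, and return the copied length or -E2BIG on truncation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/*
 * User-space approximation of strscpy(): copy at most size - 1 bytes,
 * always NUL-terminate, and leave the rest of dst untouched.
 * Returns the number of bytes copied (without the NUL), or -E2BIG if
 * src did not fit or size is 0.
 */
ssize_t sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

Copying a 4-byte name into a 16-byte buffer writes 5 bytes with this function, whereas strncpy() would write all 16 because it NUL-pads the remainder.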
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220304070408.233658-1-ytcoode@gmail.com
|