linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-12-27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next	David S. Miller	62	-1881/+5131
	Daniel Borkmann says: ==================== pull-request: bpf-next 2019-12-27 The following pull-request contains BPF updates for your net-next tree. We've added 127 non-merge commits during the last 17 day(s) which contain a total of 110 files changed, 6901 insertions(+), 2721 deletions(-). There are three merge conflicts. Conflicts and resolution looks as follows: 1) Merge conflict in net/bpf/test_run.c: There was a tree-wide cleanup c593642c8be0 ("treewide: Use sizeof_field() macro") which gets in the way with b590cb5f802d ("bpf: Switch to offsetofend in BPF_PROG_TEST_RUN"): <<<<<<< HEAD if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) + sizeof_field(struct __sk_buff, priority), ======= if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority), >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c There are a few occasions that look similar to this. Always take the chunk with offsetofend(). Note that there is one where the fields differ in here: <<<<<<< HEAD if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) + sizeof_field(struct __sk_buff, tstamp), ======= if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs), >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to 850a88cc4096 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN"). 2) Merge conflict in arch/riscv/net/bpf_jit_comp.c: (I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.) <<<<<<< HEAD if (is_13b_check(off, insn)) return -1; emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx); ======= emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx); >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Result should look like: emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx); 3) Merge conflict in arch/riscv/include/asm/pgtable.h: <<<<<<< HEAD ======= #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) #define VMALLOC_END (PAGE_OFFSET - 1) #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) /* * Roughly size the vmemmap space to be large enough to fit enough * struct pages to map half the virtual address space. Then * position vmemmap directly below the VMALLOC region. / #define VMEMMAP_SHIFT \ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT) #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT) #define VMEMMAP_END (VMALLOC_START - 1) #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE) #define vmemmap ((struct page )VMEMMAP_START) >>>>>>> 7c8dce4b166113743adad131b5a24c4acc12f92c Only take the BPF_* defines from there and move them higher up in the same file. Remove the rest from the chunk. The VMALLOC_* etc defines got moved via 01f52e16b868 ("riscv: define vmemmap before pfn_to_page calls"). Result: [...] #define __S101 PAGE_READ_EXEC #define __S110 PAGE_SHARED_EXEC #define __S111 PAGE_SHARED_EXEC #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1) #define VMALLOC_END (PAGE_OFFSET - 1) #define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE) #define BPF_JIT_REGION_SIZE (SZ_128M) #define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE) #define BPF_JIT_REGION_END (VMALLOC_END) /* * Roughly size the vmemmap space to be large enough to fit enough * struct pages to map half the virtual address space. Then * position vmemmap directly below the VMALLOC region. */ #define VMEMMAP_SHIFT \ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT) #define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT) #define VMEMMAP_END (VMALLOC_START - 1) #define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE) [...] Let me know if there are any other issues. Anyway, the main changes are: 1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific to a provided BPF object file. This provides an alternative, simplified API compared to standard libbpf interaction. Also, add libbpf extern variable resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko. 2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also, add various BPF riscv JIT improvements, from Björn Töpel. 3) Extend bpftool to allow matching BPF programs and maps by name, from Paul Chaignon. 4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI flag for allowing updates without service interruption, from Andrey Ignatov. 5) Cleanup and simplification of ring access functions for AF_XDP with a bonus of 0-5% performance improvement, from Magnus Karlsson. 6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa. 7) Move and extend test_select_reuseport into BPF program tests under BPF selftests, from Jakub Sitnicki. 8) Various BPF sample improvements for xdpsock for customizing parameters to set up and benchmark AF_XDP, from Jay Jayatheerthan. 9) Improve libbpf to provide a ulimit hint on permission denied errors. Also change XDP sample programs to attach in driver mode by default, from Toke Høiland-Jørgensen. 10) Extend BPF test infrastructure to allow changing skb mark from tc BPF programs, from Nikita V. Shirokov. 11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King. 12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after libbpf conversion, from Jesper Dangaard Brouer. 13) Minor misc improvements from various others. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27	bpftool: Make skeleton C code compilable with C++ compiler	Andrii Nakryiko	3	-6/+16
	When auto-generated BPF skeleton C code is included from C++ application, it triggers compilation error due to void * being implicitly casted to whatever target pointer type. This is supported by C, but not C++. To solve this problem, add explicit casts, where necessary. To ensure issues like this are captured going forward, add skeleton usage in test_cpp test. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191226210253.3132060-1-andriin@fb.com
2019-12-26	bpf: Print error message for bpftool cgroup show	Hechao Li	1	-17/+39
	Currently, when bpftool cgroup show <path> has an error, no error message is printed. This is confusing because the user may think the result is empty. Before the change: $ bpftool cgroup show /sys/fs/cgroup ID AttachType AttachFlags Name $ echo $? 255 After the change: $ ./bpftool cgroup show /sys/fs/cgroup Error: can't query bpf programs attached to /sys/fs/cgroup: Operation not permitted v2: Rename check_query_cgroup_progs to cgroup_has_attached_progs Signed-off-by: Hechao Li <hechaol@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191224011742.3714301-1-hechaol@fb.com
2019-12-26	libbpf: Support CO-RE relocations for LDX/ST/STX instructions	Andrii Nakryiko	1	-3/+28
	Clang patch [0] enables emitting relocatable generic ALU/ALU64 instructions (i.e, shifts and arithmetic operations), as well as generic load/store instructions. The former ones are already supported by libbpf as is. This patch adds further support for load/store instructions. Relocatable field offset is encoded in BPF instruction's 16-bit offset section and are adjusted by libbpf based on target kernel BTF. These Clang changes and corresponding libbpf changes allow for more succinct generated BPF code by encoding relocatable field reads as a single ST/LDX/STX instruction. It also enables relocatable access to BPF context. Previously, if context struct (e.g., __sk_buff) was accessed with CO-RE relocations (e.g., due to preserve_access_index attribute), it would be rejected by BPF verifier due to modified context pointer dereference. With Clang patch, such context accesses are both relocatable and have a fixed offset from the point of view of BPF verifier. [0] https://reviews.llvm.org/D71790 Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191223180305.86417-1-andriin@fb.com
2019-12-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	David S. Miller	96	-546/+1658
	Mere overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds	13	-237/+603
	Pull networking fixes from David Miller: 1) Several nf_flow_table_offload fixes from Pablo Neira Ayuso, including adding a missing ipv6 match description. 2) Several heap overflow fixes in mwifiex from qize wang and Ganapathi Bhat. 3) Fix uninit value in bond_neigh_init(), from Eric Dumazet. 4) Fix non-ACPI probing of nxp-nci, from Stephan Gerhold. 5) Fix use after free in tipc_disc_rcv(), from Tuong Lien. 6) Enforce limit of 33 tail calls in mips and riscv JIT, from Paul Chaignon. 7) Multicast MAC limit test is off by one in qede, from Manish Chopra. 8) Fix established socket lookup race when socket goes from TCP_ESTABLISHED to TCP_LISTEN, because there lacks an intervening RCU grace period. From Eric Dumazet. 9) Don't send empty SKBs from tcp_write_xmit(), also from Eric Dumazet. 10) Fix active backup transition after link failure in bonding, from Mahesh Bandewar. 11) Avoid zero sized hash table in gtp driver, from Taehee Yoo. 12) Fix wrong interface passed to ->mac_link_up(), from Russell King. 13) Fix DSA egress flooding settings in b53, from Florian Fainelli. 14) Memory leak in gmac_setup_txqs(), from Navid Emamdoost. 15) Fix double free in dpaa2-ptp code, from Ioana Ciornei. 16) Reject invalid MTU values in stmmac, from Jose Abreu. 17) Fix refcount leak in error path of u32 classifier, from Davide Caratti. 18) Fix regression causing iwlwifi firmware crashes on boot, from Anders Kaseorg. 19) Fix inverted return value logic in llc2 code, from Chan Shu Tak. 20) Disable hardware GRO when XDP is attached to qede, frm Manish Chopra. 21) Since we encode state in the low pointer bits, dst metrics must be at least 4 byte aligned, which is not necessarily true on m68k. Add annotations to fix this, from Geert Uytterhoeven. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (160 commits) sfc: Include XDP packet headroom in buffer step size. sfc: fix channel allocation with brute force net: dst: Force 4-byte alignment of dst_metrics selftests: pmtu: fix init mtu value in description hv_netvsc: Fix unwanted rx_table reset net: phy: ensure that phy IDs are correctly typed mod_devicetable: fix PHY module format qede: Disable hardware gro when xdp prog is installed net: ena: fix issues in setting interrupt moderation params in ethtool net: ena: fix default tx interrupt moderation interval net/smc: unregister ib devices in reboot_event net: stmmac: platform: Fix MDIO init for platforms without PHY llc2: Fix return statement of llc_stat_ev_rx_null_dsap_xid_c (and _test_c) net: hisilicon: Fix a BUG trigered by wrong bytes_compl net: dsa: ksz: use common define for tag len s390/qeth: don't return -ENOTSUPP to userspace s390/qeth: fix promiscuous mode after reset s390/qeth: handle error due to unsupported transport mode cxgb4: fix refcount init for TC-MQPRIO offload tc-testing: initial tdc selftests for cls_u32 ...
2019-12-21	Merge tag 'libnvdimm-fix-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm	Linus Torvalds	2	-0/+7
	Pull libnvdimm fix from Dan Williams: "A minor regression fix. The libnvdimm unit tests were expecting to mock calls to ioremap_nocache() which disappeared in v5.5-rc1. This fix has appeared in -next and collided with some cleanups that Christoph has planned for v5.6, but he will fix up his branch once this goes in. Summary: - Restore the operation of the libnvdimm unit tests after the removal of ioremap_nocache()" * tag 'libnvdimm-fix-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: tools/testing/nvdimm: Fix mock support for ioremap
2019-12-20	selftests: pmtu: fix init mtu value in description	Hangbin Liu	1	-3/+3
	There is no a_r3, a_r4 in the testing topology. It should be b_r1, b_r2. Also b_r1 mtu is 1400 and b_r2 mtu is 1500. Fixes: e44e428f59e4 ("selftests: pmtu: add basic IPv4 and IPv6 PMTU tests") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	vsock_test: add SOCK_STREAM MSG_PEEK test	Stefano Garzarella	1	-0/+34
	Test if the MSG_PEEK flags of recv(2) works as expected. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	testing/vsock: print list of options and description	Stefano Garzarella	2	-2/+24
	Since we now have several options, in the help we print the list of all supported options and a brief description of them. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	testing/vsock: add parameters to list and skip tests	Stefano Garzarella	6	-20/+116
	Some tests can fail with transports that have a slightly different behavior, so let's add the possibility to specify which tests to skip. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	vsock_test: wait for the remote to close the connection	Stefano Garzarella	3	-4/+48
	Before check if a send returns -EPIPE, we need to make sure the connection is closed. To do that, we use epoll API to wait EPOLLRDHUP or EPOLLHUP events on the socket. Reported-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	VSOCK: add AF_VSOCK test cases	Stefan Hajnoczi	4	-2/+317
	The vsock_test.c program runs a test suite of AF_VSOCK test cases. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	VSOCK: add send_byte()/recv_byte() test utilities	Stefan Hajnoczi	2	-0/+105
	Test cases will want to transfer data. This patch adds utility functions to do this. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	VSOCK: add full barrier between test cases	Stefan Hajnoczi	1	-2/+16
	See code comment for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	VSOCK: extract connect/accept functions from vsock_diag_test.c	Stefan Hajnoczi	3	-76/+119
	Many test cases will need to connect to the server or accept incoming connections. This patch extracts these operations into utility functions that can be reused. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	VSOCK: extract utility functions from vsock_diag_test.c	Stefan Hajnoczi	4	-71/+125
	Move useful functions into a separate file in preparation for more vsock test programs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	VSOCK: add SPDX identifiers to vsock tests	Stefan Hajnoczi	2	-0/+2
	Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	VSOCK: fix header include in vsock_diag_test	Stefan Hajnoczi	3	-5/+4
	The vsock_diag_test program directly included ../../../include/uapi/ headers from the source tree. Tests are supposed to use the usr/include/linux/ headers that have been prepared with make headers_install instead. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-20	selftests/bpf: Preserve errno in test_progs CHECK macros	Andrey Ignatov	1	-0/+4
	It's follow-up for discussion [1] CHECK and CHECK_FAIL macros in test_progs.h can affect errno in some circumstances, e.g. if some code accidentally closes stdout. It makes checking errno in patterns like this unreliable: if (CHECK(!bpf_prog_attach_xattr(...), "tag", "msg")) goto err; CHECK_FAIL(errno != ENOENT); , since by CHECK_FAIL time errno could be affected not only by bpf_prog_attach_xattr but by CHECK as well. Fix it by saving and restoring errno in the macros. There is no "Fixes" tag since no problems were discovered yet and it's rather precaution. test_progs was run with this change and no difference was identified. [1] https://lore.kernel.org/bpf/20191219210907.GD16266@rdna-mbp.dhcp.thefacebook.com/ Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191220000511.1684853-1-rdna@fb.com
2019-12-19	selftests/bpf: Test BPF_F_REPLACE in cgroup_attach_multi	Andrey Ignatov	1	-3/+50
	Test replacing a cgroup-bpf program attached with BPF_F_ALLOW_MULTI and possible failure modes: invalid combination of flags, invalid replace_bpf_fd, replacing a non-attachd to specified cgroup program. Example of program replacing: # gdb -q --args ./test_progs --name=cgroup_attach_multi ... Breakpoint 1, test_cgroup_attach_multi () at cgroup_attach_multi.c:227 (gdb) [1]+ Stopped gdb -q --args ./test_progs --name=cgroup_attach_multi # bpftool c s /mnt/cgroup2/cgroup-test-work-dir/cg1 ID AttachType AttachFlags Name 2133 egress multi 2134 egress multi # fg gdb -q --args ./test_progs --name=cgroup_attach_multi (gdb) c Continuing. Breakpoint 2, test_cgroup_attach_multi () at cgroup_attach_multi.c:233 (gdb) [1]+ Stopped gdb -q --args ./test_progs --name=cgroup_attach_multi # bpftool c s /mnt/cgroup2/cgroup-test-work-dir/cg1 ID AttachType AttachFlags Name 2139 egress multi 2134 egress multi Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/7b9b83e8d5fb82e15b034341bd40b6fb2431eeba.1576741281.git.rdna@fb.com
2019-12-19	selftests/bpf: Convert test_cgroup_attach to prog_tests	Andrey Ignatov	6	-574/+498
	Convert test_cgroup_attach to prog_tests. This change does a lot of things but in many cases it's pretty expensive to separate them, so they go in one commit. Nevertheless the logic is ketp as is and changes made are just moving things around, simplifying them (w/o changing the meaning of the tests) and making prog_tests compatible: * split the 3 tests in the file into 3 separate files in prog_tests/; * rename the test functions to test_<file_base_name>; * remove unused includes, constants, variables and functions from every test; * replace `if`-s with or `if (CHECK())` where additional context should be logged and with `if (CHECK_FAIL())` where line number is enough; * switch from `log_err()` to logging via `CHECK()`; * replace `assert`-s with `CHECK_FAIL()` to avoid crashing the whole test_progs if one assertion fails; * replace cgroup_helpers with test__join_cgroup() in cgroup_attach_override only, other tests need more fine-grained control for cgroup creation/deletion so cgroup_helpers are still used there; * simplify cgroup_attach_autodetach by switching to easiest possible program since this test doesn't really need such a complicated program as cgroup_attach_multi does; * remove test_cgroup_attach.c itself. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/0ff19cc64d2dc5cf404349f07131119480e10e32.1576741281.git.rdna@fb.com
2019-12-19	libbpf: Introduce bpf_prog_attach_xattr	Andrey Ignatov	3	-1/+28
	Introduce a new bpf_prog_attach_xattr function that, in addition to program fd, target fd and attach type, accepts an extendable struct bpf_prog_attach_opts. bpf_prog_attach_opts relies on DECLARE_LIBBPF_OPTS macro to maintain backward and forward compatibility and has the following "optional" attach attributes: * existing attach_flags, since it's not required when attaching in NONE mode. Even though it's quite often used in MULTI and OVERRIDE mode it seems to be a good idea to reduce number of arguments to bpf_prog_attach_xattr; * newly introduced attribute of BPF_PROG_ATTACH command: replace_prog_fd that is fd of previously attached cgroup-bpf program to replace if BPF_F_REPLACE flag is used. The new function is named to be consistent with other xattr-functions (bpf_prog_test_run_xattr, bpf_create_map_xattr, bpf_load_program_xattr). The struct bpf_prog_attach_opts is supposed to be used with DECLARE_LIBBPF_OPTS macro. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/bd6e0732303eb14e4b79cb128268d9e9ad6db208.1576741281.git.rdna@fb.com
2019-12-19	bpf: Support replacing cgroup-bpf program in MULTI mode	Andrey Ignatov	1	-0/+10
	The common use-case in production is to have multiple cgroup-bpf programs per attach type that cover multiple use-cases. Such programs are attached with BPF_F_ALLOW_MULTI and can be maintained by different people. Order of programs usually matters, for example imagine two egress programs: the first one drops packets and the second one counts packets. If they're swapped the result of counting program will be different. It brings operational challenges with updating cgroup-bpf program(s) attached with BPF_F_ALLOW_MULTI since there is no way to replace a program: * One way to update is to detach all programs first and then attach the new version(s) again in the right order. This introduces an interruption in the work a program is doing and may not be acceptable (e.g. if it's egress firewall); * Another way is attach the new version of a program first and only then detach the old version. This introduces the time interval when two versions of same program are working, what may not be acceptable if a program is not idempotent. It also imposes additional burden on program developers to make sure that two versions of their program can co-exist. Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to replace That way user can replace any program among those attached with BPF_F_ALLOW_MULTI flag without the problems described above. Details of the new API: * If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid descriptor of BPF program, BPF_PROG_ATTACH will return corresponding error (EINVAL or EBADF). * If replace_bpf_fd has valid descriptor of BPF program but such a program is not attached to specified cgroup, BPF_PROG_ATTACH will return ENOENT. BPF_F_REPLACE is introduced to make the user intent clear, since replace_bpf_fd alone can't be used for this (its default value, 0, is a valid fd). BPF_F_REPLACE also makes it possible to extend the API in the future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed). Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Andrii Narkyiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com
2019-12-19	tc-testing: initial tdc selftests for cls_u32	Davide Caratti	2	-22/+205
	- move test "e9a3 - Add u32 with source match" to u32.json, and change the match pattern to catch all hnodes - add testcases for relevant error paths of cls_u32 module Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	David S. Miller	3	-24/+176
	Daniel Borkmann says: ==================== pull-request: bpf 2019-12-19 The following pull-request contains BPF updates for your net tree. We've added 10 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 269 insertions(+), 108 deletions(-). The main changes are: 1) Fix lack of synchronization between xsk wakeup and destroying resources used by xsk wakeup, from Maxim Mikityanskiy. 2) Fix pruning with tail call patching, untrack programs in case of verifier error and fix a cgroup local storage tracking bug, from Daniel Borkmann. 3) Fix clearing skb->tstamp in bpf_redirect() when going from ingress to egress which otherwise cause issues e.g. on fq qdisc, from Lorenz Bauer. 4) Fix compile warning of unused proc_dointvec_minmax_bpf_restricted() when only cBPF is present, from Alexander Lobakin. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-19	bpf: Add further test_verifier cases for record_func_key	Daniel Borkmann	3	-24/+176
	Expand dummy prog generation such that we can easily check on return codes and add few more test cases to make sure we keep on tracking pruning behavior. # ./test_verifier [...] #1066/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK #1067/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK Summary: 1580 PASSED, 0 SKIPPED, 0 FAILED Also verified that JIT dump of added test cases looks good. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/df7200b6021444fd369376d227de917357285b65.1576789878.git.daniel@iogearbox.net
2019-12-19	libbpf: Fix another __u64 printf warning	Andrii Nakryiko	1	-2/+2
	Fix yet another printf warning for %llu specifier on ppc64le. This time size_t casting won't work, so cast to verbose `unsigned long long`. Fixes: 166750bc1dd2 ("libbpf: Support libbpf-provided extern variables") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191219052103.3515-1-andriin@fb.com
2019-12-19	libbpf: Fix printing of ulimit value	Toke Høiland-Jørgensen	1	-1/+1
	Naresh pointed out that libbpf builds fail on 32-bit architectures because rlimit.rlim_cur is defined as 'unsigned long long' on those architectures. Fix this by using %zu in printf and casting to size_t. Fixes: dc3a2d254782 ("libbpf: Print hint about ulimit when getting permission denied error") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191219090236.905059-1-toke@redhat.com
2019-12-19	selftests/bpf: Fix test_attach_probe	Alexei Starovoitov	1	-3/+4
	Fix two issues in test_attach_probe: 1. it was not able to parse /proc/self/maps beyond the first line, since %s means parse string until white space. 2. offset has to be accounted for otherwise uprobed address is incorrect. Fixes: 1e8611bbdfc9 ("selftests/bpf: add kprobe/uprobe selftests") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20191219020442.1922617-1-ast@kernel.org
2019-12-19	libbpf: Add missing newline in opts validation macro	Toke Høiland-Jørgensen	1	-1/+1
	The error log output in the opts validation macro was missing a newline. Fixes: 2ce8450ef5a3 ("libbpf: add bpf_object__open_{file, mem} w/ extensible opts") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191219120714.928380-1-toke@redhat.com
2019-12-19	riscv, bpf: Add missing uapi header for BPF_PROG_TYPE_PERF_EVENT programs	Björn Töpel	1	-0/+2
	Add missing uapi header the BPF_PROG_TYPE_PERF_EVENT programs by exporting struct user_regs_struct instead of struct pt_regs which is in-kernel only. Signed-off-by: Björn Töpel <bjorn.topel@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216091343.23260-9-bjorn.topel@gmail.com
2019-12-18	libbpf: BTF is required when externs are present	Andrii Nakryiko	1	-1/+2
	BTF is required to get type information about extern variables. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191219002837.3074619-4-andriin@fb.com
2019-12-18	libbpf: Allow to augment system Kconfig through extra optional config	Andrii Nakryiko	3	-112/+132
	Instead of all or nothing approach of overriding Kconfig file location, allow to extend it with extra values and override chosen subset of values though optional user-provided extra config, passed as a string through open options' .kconfig option. If same config key is present in both user-supplied config and Kconfig, user-supplied one wins. This allows applications to more easily test various conditions despite host kernel's real configuration. If all of BPF object's __kconfig externs are satisfied from user-supplied config, system Kconfig won't be read at all. Simplify selftests by not needing to create temporary Kconfig files. Suggested-by: Alexei Starovoitov <ast@fb.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191219002837.3074619-3-andriin@fb.com
2019-12-18	libbpf: Put Kconfig externs into .kconfig section	Andrii Nakryiko	6	-48/+60
	Move Kconfig-provided externs into custom .kconfig section. Add __kconfig into bpf_helpers.h for user convenience. Update selftests accordingly. Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191219002837.3074619-2-andriin@fb.com
2019-12-18	libbpf: Add bpf_link__disconnect() API to preserve underlying BPF resource	Andrii Nakryiko	3	-10/+32
	There are cases in which BPF resource (program, map, etc) has to outlive userspace program that "installed" it in the system in the first place. When BPF program is attached, libbpf returns bpf_link object, which is supposed to be destroyed after no longer necessary through bpf_link__destroy() API. Currently, bpf_link destruction causes both automatic detachment and frees up any resources allocated to for bpf_link in-memory representation. This is inconvenient for the case described above because of coupling of detachment and resource freeing. This patch introduces bpf_link__disconnect() API call, which marks bpf_link as disconnected from its underlying BPF resouces. This means that when bpf_link is destroyed later, all its memory resources will be freed, but BPF resource itself won't be detached. This design allows to follow strict and resource-leak-free design by default, while giving easy and straightforward way for user code to opt for keeping BPF resource attached beyond lifetime of a bpf_link. For some BPF programs (i.e., FS-based tracepoints, kprobes, raw tracepoint, etc), user has to make sure to pin BPF program to prevent kernel to automatically detach it on process exit. This should typically be achived by pinning BPF program (or map in some cases) in BPF FS. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191218225039.2668205-1-andriin@fb.com
2019-12-18	Merge tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd	Linus Torvalds	3	-2/+36
	Pull tpm fixes from Jarkko Sakkinen: "Bunch of fixes for rc3" * tag 'tpmdd-next-20191219' of git://git.infradead.org/users/jjs/linux-tpmdd: tpm/tpm_ftpm_tee: add shutdown call back tpm: selftest: cleanup after unseal with wrong auth/policy test tpm: selftest: add test covering async mode tpm: fix invalid locking in NONBLOCKING mode security: keys: trusted: fix lost handle flush tpm_tis: reserve chip for duration of tpm_tis_core_init KEYS: asymmetric: return ENOMEM if akcipher_request_alloc() fails KEYS: remove CONFIG_KEYS_COMPAT
2019-12-18	bpf: Allow to change skb mark in test_run	Nikita V. Shirokov	2	-0/+6
	allow to pass skb's mark field into bpf_prog_test_run ctx for BPF_PROG_TYPE_SCHED_CLS prog type. that would allow to test bpf programs which are doing decision based on this field Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-12-18	bpftool: Work-around rst2man conversion bug	Andrii Nakryiko	1	-7/+8
	Work-around what appears to be a bug in rst2man convertion tool, used to create man pages out of reStructureText-formatted documents. If text line starts with dot, rst2man will put it in resulting man file verbatim. This seems to cause man tool to interpret it as a directive/command (e.g., `.bs`), and subsequently not render entire line because it's unrecognized one. Enclose '.xxx' words in extra formatting to work around. Fixes: cb21ac588546 ("bpftool: Add gen subcommand manpage") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com Link: https://lore.kernel.org/bpf/20191218221707.2552199-1-andriin@fb.com
2019-12-18	bpftool: Simplify format string to not use positional args	Andrii Nakryiko	1	-2/+2
	Change format string referring to just single argument out of two available. Some versions of libc can reject such format string. Reported-by: Nikita Shirokov <tehnerd@tehnerd.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191218214314.2403729-1-andriin@fb.com
2019-12-18	selftests: qdiscs: Add test coverage for ETS Qdisc	Petr Machata	1	-0/+940
	Add TDC coverage for the new ETS Qdisc. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18	selftests: forwarding: sch_ets: Add test coverage for ETS Qdisc	Petr Machata	5	-0/+666
	This tests the newly-added ETS Qdisc. It runs two to three streams of traffic, each with a different priority. ETS Qdisc is supposed to allocate bandwidth according to the DRR algorithm and given weights. After running the traffic for a while, counters are compared for each stream to check that the expected ratio is in fact observed. In order for the DRR process to kick in, a traffic bottleneck must exist in the first place. In slow path, such bottleneck can be implemented by wrapping the ETS Qdisc inside a TBF or other shaper. This might however make the configuration unoffloadable. Instead, on HW datapath, the bottleneck would be set up by lowering port speed and configuring shared buffer suitably. Therefore the test is structured as a core component that implements the testing, with two wrapper scripts that implement the details of slow path resp. fast path configuration. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-18	selftests: forwarding: Move start_/stop_traffic from mlxsw to lib.sh	Petr Machata	2	-18/+18
	These two functions are used for starting several streams of traffic, and then stopping them later. They will be handy for the test coverage of ETS Qdisc. Move them from mlxsw-specific qos_lib.sh to the generic lib.sh. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-17	bpftool: Add gen subcommand manpage	Andrii Nakryiko	2	-1/+306
	Add bpftool-gen.rst describing skeleton on the high level. Also include a small, but complete, example BPF app (BPF side, userspace side, generated skeleton) in example section to demonstrate skeleton API and its usage. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191218052552.2915188-4-andriin@fb.com
2019-12-17	libbpf: Remove BPF_EMBED_OBJ macro from libbpf.h	Andrii Nakryiko	1	-35/+0
	Drop BPF_EMBED_OBJ and struct bpf_embed_data now that skeleton automatically embeds contents of its source object file. While BPF_EMBED_OBJ is useful independently of skeleton, we are currently don't have any use cases utilizing it, so let's remove them until/if we need it. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191218052552.2915188-3-andriin@fb.com
2019-12-17	bpftool, selftests/bpf: Embed object file inside skeleton	Andrii Nakryiko	9	-119/+154
	Embed contents of BPF object file used for BPF skeleton generation inside skeleton itself. This allows to keep BPF object file and its skeleton in sync at all times, and simpifies skeleton instantiation. Also switch existing selftests to not require BPF_EMBED_OBJ anymore. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191218052552.2915188-2-andriin@fb.com
2019-12-17	libbpf: Reduce log level for custom section names	Andrii Nakryiko	1	-3/+3
	Libbpf is trying to recognize BPF program type based on its section name during bpf_object__open() phase. This is not strictly enforced and user code has ability to specify/override correct BPF program type after open. But if BPF program is using custom section name, libbpf will still emit warnings, which can be quite annoying to users. This patch reduces log level of information messages emitted by libbpf if section name is not canonical. User can still get a list of all supported section names as debug-level message. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191217234228.1739308-1-andriin@fb.com
2019-12-18	libbpf: Fix libbpf_common.h when installing libbpf through 'make install'	Toke Høiland-Jørgensen	2	-0/+3
	This fixes two issues with the newly introduced libbpf_common.h file: - The header failed to include <string.h> for the definition of memset() - The new file was not included in the install_headers rule in the Makefile Both of these issues cause breakage when installing libbpf with 'make install' and trying to use it in applications. Fixes: 544402d4b493 ("libbpf: Extract common user-facing helpers") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191217112810.768078-1-toke@redhat.com
2019-12-18	selftests/bpf: More succinct Makefile output	Andrii Nakryiko	1	-0/+36
	Similarly to bpftool/libbpf output, make selftests/bpf output succinct per-item output line. Output is roughly as follows: $ make ... CLANG-LLC [test_maps] pyperf600.o CLANG-LLC [test_maps] strobemeta.o CLANG-LLC [test_maps] pyperf100.o EXTRA-OBJ [test_progs] cgroup_helpers.o EXTRA-OBJ [test_progs] trace_helpers.o BINARY test_align BINARY test_verifier_log GEN-SKEL [test_progs] fexit_bpf2bpf.skel.h GEN-SKEL [test_progs] test_global_data.skel.h GEN-SKEL [test_progs] sendmsg6_prog.skel.h ... To see the actual command invocation, verbose mode can be turned on with V=1 argument: $ make V=1 ... very verbose output ... Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191217061425.2346359-1-andriin@fb.com
2019-12-17	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	Linus Torvalds	65	-289/+976
	Pull perf tooling fixes from Ingo Molnar: "These are all perf tooling changes: most of them are fixes. Note that the large CPU count related fixes go beyond regression fixes, but the IPI-flood symptoms are severe enough that I think justifies their inclusion" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) perf vendor events s390: Remove name from L1D_RO_EXCL_WRITES description perf vendor events s390: Fix counter long description for DTLB1_GPAGE_WRITES libtraceevent: Allow custom libdir path perf header: Fix false warning when there are no duplicate cache entries perf metricgroup: Fix printing event names of metric group with multiple events perf/x86/pmu-events: Fix Kernel_Utilization metric perf top: Do not bail out when perf_env__read_cpuid() returns ENOSYS perf arch: Make the default get_cpuid() return compatible error tools headers kvm: Sync linux/kvm.h with the kernel sources tools headers UAPI: Update tools's copy of drm.h headers tools headers UAPI: Sync drm/i915_drm.h with the kernel sources perf inject: Fix processing of ID index for injected instruction tracing perf report: Bail out --mem-mode if mem info is not available perf report: Make -F more strict like -s perf report/top TUI: Replace pr_err() with ui__error() libtraceevent: Copy pkg-config file to output folder when using O= libtraceevent: Fix lib installation with O= perf kvm: Clarify the 'perf kvm' -i and -o command line options tools arch x86: Sync asm/cpufeatures.h with the kernel sources perf beauty: Add CLEAR_SIGHAND support for clone's flags arg ...