aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-08-13selftest/bpf: Fix compilation warnings in 32-bit modeAndrii Nakryiko9-19/+24
Fix compilation warnings emitted when compiling selftests for 32-bit platform (x86 in my case). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-3-andriin@fb.com
2020-08-13tools/bpftool: Fix compilation warnings in 32-bit modeAndrii Nakryiko4-12/+20
Fix few compilation warnings in bpftool when compiling in 32-bit mode. Abstract away u64 to pointer conversion into a helper function. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200813204945.1020225-2-andriin@fb.com
2020-08-13doc: Add link to bpf helpers man pageJoe Stringer1-0/+7
The bpf-helpers(7) man pages provide an invaluable description of the functions that an eBPF program can call at runtime. Link them here. Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200813180807.2821735-1-joe@wand.net.nz
2020-08-13bpf, selftests: Add tests to sock_ops for loading skJohn Fastabend1-0/+21
Add tests to directly accesse sock_ops sk field. Then use it to ensure a bad pointer access will fault if something goes wrong. We do three tests: The first test ensures when we read sock_ops sk pointer into the same register that we don't fault as described earlier. Here r9 is chosen as the temp register. The xlated code is, 36: (7b) *(u64 *)(r1 +32) = r9 37: (61) r9 = *(u32 *)(r1 +28) 38: (15) if r9 == 0x0 goto pc+3 39: (79) r9 = *(u64 *)(r1 +32) 40: (79) r1 = *(u64 *)(r1 +0) 41: (05) goto pc+1 42: (79) r9 = *(u64 *)(r1 +32) The second test ensures the temp register selection does not collide with in-use register r9. Shown here r8 is chosen because r9 is the sock_ops pointer. The xlated code is as follows, 46: (7b) *(u64 *)(r9 +32) = r8 47: (61) r8 = *(u32 *)(r9 +28) 48: (15) if r8 == 0x0 goto pc+3 49: (79) r8 = *(u64 *)(r9 +32) 50: (79) r9 = *(u64 *)(r9 +0) 51: (05) goto pc+1 52: (79) r8 = *(u64 *)(r9 +32) And finally, ensure we didn't break the base case where dst_reg does not equal the source register, 56: (61) r2 = *(u32 *)(r1 +28) 57: (15) if r2 == 0x0 goto pc+1 58: (79) r2 = *(u64 *)(r1 +0) Notice it takes us an extra four instructions when src reg is the same as dst reg. One to save the reg, two to restore depending on the branch taken and a goto to jump over the second restore. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718355325.4728.4163036953345999636.stgit@john-Precision-5820-Tower
2020-08-13bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registersJohn Fastabend1-0/+7
Loads in sock_ops case when using high registers requires extra logic to ensure the correct temporary value is used. We need to ensure the temp register does not use either the src_reg or dst_reg. Lets add an asm test to force the logic is triggered. The xlated code is here, 30: (7b) *(u64 *)(r9 +32) = r7 31: (61) r7 = *(u32 *)(r9 +28) 32: (15) if r7 == 0x0 goto pc+2 33: (79) r7 = *(u64 *)(r9 +0) 34: (63) *(u32 *)(r7 +916) = r8 35: (79) r7 = *(u64 *)(r9 +32) Notice r9 and r8 are not used for temp registers and r7 is chosen. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718353345.4728.8805043614257933227.stgit@john-Precision-5820-Tower
2020-08-13bpf, selftests: Add tests for ctx access in sock_ops with single registerJohn Fastabend1-0/+13
To verify fix ("bpf: sock_ops ctx access may stomp registers in corner case") we want to force compiler to generate the following code when accessing a field with BPF_TCP_SOCK_GET_COMMON, r1 = *(u32 *)(r1 + 96) // r1 is skops ptr Rather than depend on clang to do this we add the test with inline asm to the tcpbpf test. This saves us from having to create another runner and ensures that if we break this again test_tcpbpf will crash. With above code we get the xlated code, 11: (7b) *(u64 *)(r1 +32) = r9 12: (61) r9 = *(u32 *)(r1 +28) 13: (15) if r9 == 0x0 goto pc+4 14: (79) r9 = *(u64 *)(r1 +32) 15: (79) r1 = *(u64 *)(r1 +0) 16: (61) r1 = *(u32 *)(r1 +2348) 17: (05) goto pc+1 18: (79) r9 = *(u64 *)(r1 +32) We also add the normal case where src_reg != dst_reg so we can compare code generation easily from llvm-objdump and ensure that case continues to work correctly. The normal code is xlated to, 20: (b7) r1 = 0 21: (61) r1 = *(u32 *)(r3 +28) 22: (15) if r1 == 0x0 goto pc+2 23: (79) r1 = *(u64 *)(r3 +0) 24: (61) r1 = *(u32 *)(r1 +2348) Where the temp variable is not used. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718351457.4728.3295119261717842496.stgit@john-Precision-5820-Tower
2020-08-13bpf: sock_ops sk access may stomp registers when dst_reg = src_regJohn Fastabend1-11/+38
Similar to patch ("bpf: sock_ops ctx access may stomp registers") if the src_reg = dst_reg when reading the sk field of a sock_ops struct we generate xlated code, 53: (61) r9 = *(u32 *)(r9 +28) 54: (15) if r9 == 0x0 goto pc+3 56: (79) r9 = *(u64 *)(r9 +0) This stomps on the r9 reg to do the sk_fullsock check and then when reading the skops->sk field instead of the sk pointer we get the sk_fullsock. To fix use similar pattern noted in the previous fix and use the temp field to save/restore a register used to do sk_fullsock check. After the fix the generated xlated code reads, 52: (7b) *(u64 *)(r9 +32) = r8 53: (61) r8 = *(u32 *)(r9 +28) 54: (15) if r9 == 0x0 goto pc+3 55: (79) r8 = *(u64 *)(r9 +32) 56: (79) r9 = *(u64 *)(r9 +0) 57: (05) goto pc+1 58: (79) r8 = *(u64 *)(r9 +32) Here r9 register was in-use so r8 is chosen as the temporary register. In line 52 r8 is saved in temp variable and at line 54 restored in case fullsock != 0. Finally we handle fullsock == 0 case by restoring at line 58. This adds a new macro SOCK_OPS_GET_SK it is almost possible to merge this with SOCK_OPS_GET_FIELD, but I found the extra branch logic a bit more confusing than just adding a new macro despite a bit of duplicating code. Fixes: 1314ef561102e ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718349653.4728.6559437186853473612.stgit@john-Precision-5820-Tower
2020-08-13bpf: sock_ops ctx access may stomp registers in corner caseJohn Fastabend1-2/+24
I had a sockmap program that after doing some refactoring started spewing this splat at me: [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 [...] [18610.807359] Call Trace: [18610.807370] ? 0xffffffffc114d0d5 [18610.807382] __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0 [18610.807391] tcp_connect+0x895/0xd50 [18610.807400] tcp_v4_connect+0x465/0x4e0 [18610.807407] __inet_stream_connect+0xd6/0x3a0 [18610.807412] ? __inet_stream_connect+0x5/0x3a0 [18610.807417] inet_stream_connect+0x3b/0x60 [18610.807425] __sys_connect+0xed/0x120 After some debugging I was able to build this simple reproducer, __section("sockops/reproducer_bad") int bpf_reproducer_bad(struct bpf_sock_ops *skops) { volatile __maybe_unused __u32 i = skops->snd_ssthresh; return 0; } And along the way noticed that below program ran without splat, __section("sockops/reproducer_good") int bpf_reproducer_good(struct bpf_sock_ops *skops) { volatile __maybe_unused __u32 i = skops->snd_ssthresh; volatile __maybe_unused __u32 family; compiler_barrier(); family = skops->family; return 0; } So I decided to check out the code we generate for the above two programs and noticed each generates the BPF code you would expect, 0000000000000000 <bpf_reproducer_bad>: ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 0: r1 = *(u32 *)(r1 + 96) 1: *(u32 *)(r10 - 4) = r1 ; return 0; 2: r0 = 0 3: exit 0000000000000000 <bpf_reproducer_good>: ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 0: r2 = *(u32 *)(r1 + 96) 1: *(u32 *)(r10 - 4) = r2 ; family = skops->family; 2: r1 = *(u32 *)(r1 + 20) 3: *(u32 *)(r10 - 8) = r1 ; return 0; 4: r0 = 0 5: exit So we get reasonable assembly, but still something was causing the null pointer dereference. So, we load the programs and dump the xlated version observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be translated by the skops access helpers. int bpf_reproducer_bad(struct bpf_sock_ops * skops): ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 0: (61) r1 = *(u32 *)(r1 +28) 1: (15) if r1 == 0x0 goto pc+2 2: (79) r1 = *(u64 *)(r1 +0) 3: (61) r1 = *(u32 *)(r1 +2340) ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 4: (63) *(u32 *)(r10 -4) = r1 ; return 0; 5: (b7) r0 = 0 6: (95) exit int bpf_reproducer_good(struct bpf_sock_ops * skops): ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 0: (61) r2 = *(u32 *)(r1 +28) 1: (15) if r2 == 0x0 goto pc+2 2: (79) r2 = *(u64 *)(r1 +0) 3: (61) r2 = *(u32 *)(r2 +2340) ; volatile __maybe_unused __u32 i = skops->snd_ssthresh; 4: (63) *(u32 *)(r10 -4) = r2 ; family = skops->family; 5: (79) r1 = *(u64 *)(r1 +0) 6: (69) r1 = *(u16 *)(r1 +16) ; family = skops->family; 7: (63) *(u32 *)(r10 -8) = r1 ; return 0; 8: (b7) r0 = 0 9: (95) exit Then we look at lines 0 and 2 above. In the good case we do the zero check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check into the bpf_sock_ops check and we can confirm that is the 'struct sock *sk' pointer field. But, in the bad case, 0: (61) r1 = *(u32 *)(r1 +28) 1: (15) if r1 == 0x0 goto pc+2 2: (79) r1 = *(u64 *)(r1 +0) Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat, [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 The 0x01 makes sense because that is exactly the fullsock value. And its not a valid dereference so we splat. To fix we need to guard the case when a program is doing a sock_ops field access with src_reg == dst_reg. This is already handled in the load case where the ctx_access handler uses a tmp register being careful to store the old value and restore it. To fix the get case test if src_reg == dst_reg and in this case do the is_fullsock test in the temporary register. Remembering to restore the temporary register before writing to either dst_reg or src_reg to avoid smashing the pointer into the struct holding the tmp variable. Adding this inline code to test_tcpbpf_kern will now be generated correctly from, 9: r2 = *(u32 *)(r2 + 96) to xlated code, 12: (7b) *(u64 *)(r2 +32) = r9 13: (61) r9 = *(u32 *)(r2 +28) 14: (15) if r9 == 0x0 goto pc+4 15: (79) r9 = *(u64 *)(r2 +32) 16: (79) r2 = *(u64 *)(r2 +0) 17: (61) r2 = *(u32 *)(r2 +2348) 18: (05) goto pc+1 19: (79) r9 = *(u64 *)(r2 +32) And in the normal case we keep the original code, because really this is an edge case. From this, 9: r2 = *(u32 *)(r6 + 96) to xlated code, 22: (61) r2 = *(u32 *)(r6 +28) 23: (15) if r2 == 0x0 goto pc+2 24: (79) r2 = *(u64 *)(r6 +0) 25: (61) r2 = *(u32 *)(r2 +2348) So three additional instructions if dst == src register, but I scanned my current code base and did not see this pattern anywhere so should not be a big deal. Further, it seems no one else has hit this or at least reported it so it must a fairly rare pattern. Fixes: 9b1f3d6e5af29 ("bpf: Refactor sock_ops_convert_ctx_access") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/159718347772.4728.2781381670567919577.stgit@john-Precision-5820-Tower
2020-08-13libbpf: Prevent overriding errno when logging errorsToke Høiland-Jørgensen1-5/+7
Turns out there were a few more instances where libbpf didn't save the errno before writing an error message, causing errno to be overridden by the printf() return and the error disappearing if logging is enabled. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200813142905.160381-1-toke@redhat.com
2020-08-12bpf: Iterate through all PT_NOTE sections when looking for build idJiri Olsa1-10/+14
Currently when we look for build id within bpf_get_stackid helper call, we check the first NOTE section and we fail if build id is not there. However on some system (Fedora) there can be multiple NOTE sections in binaries and build id data is not always the first one, like: $ readelf -a /usr/bin/ls ... [ 2] .note.gnu.propert NOTE 0000000000000338 00000338 0000000000000020 0000000000000000 A 0 0 8358 [ 3] .note.gnu.build-i NOTE 0000000000000358 00000358 0000000000000024 0000000000000000 A 0 0 437c [ 4] .note.ABI-tag NOTE 000000000000037c 0000037c ... So the stack_map_get_build_id function will fail on build id retrieval and fallback to BPF_STACK_BUILD_ID_IP. This patch is changing the stack_map_get_build_id code to iterate through all the NOTE sections and try to get build id data from each of them. When tracing on sched_switch tracepoint that does bpf_get_stackid helper call kernel build, I can see about 60% increase of successful build id retrieval. The rest seems fails on -EFAULT. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200812123102.20032-1-jolsa@kernel.org
2020-08-12libbpf: Handle GCC built-in types for Arm NEONJean-Philippe Brucker1-1/+34
When building Arm NEON (SIMD) code from lib/raid6/neon.uc, GCC emits DWARF information using a base type "__Poly8_t", which is internal to GCC and not recognized by Clang. This causes build failures when building with Clang a vmlinux.h generated from an arm64 kernel that was built with GCC. vmlinux.h:47284:9: error: unknown type name '__Poly8_t' typedef __Poly8_t poly8x16_t[16]; ^~~~~~~~~ The polyX_t types are defined as unsigned integers in the "Arm C Language Extension" document (101028_Q220_00_en). Emit typedefs based on standard integer types for the GCC internal types, similar to those emitted by Clang. Including linux/kernel.h to use ARRAY_SIZE() incidentally redefined max(), causing a build bug due to different types, hence the seemingly unrelated change. Reported-by: Jakov Petrina <jakov.petrina@sartura.hr> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200812143909.3293280-1-jean-philippe@linaro.org
2020-08-12tools/bpftool: Make skeleton code C++17-friendly by dropping typeof()Andrii Nakryiko1-4/+4
Seems like C++17 standard mode doesn't recognize typeof() anymore. This can be tested by compiling test_cpp test with -std=c++17 or -std=c++1z options. The use of typeof in skeleton generated code is unnecessary, all types are well-known at the time of code generation, so remove all typeof()'s to make skeleton code more future-proof when interacting with C++ compilers. Fixes: 985ead416df3 ("bpftool: Add skeleton codegen command") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200812025907.1371956-1-andriin@fb.com
2020-08-12bpf: Fix XDP FD-based attach/detach logic around XDP_FLAGS_UPDATE_IF_NOEXISTAndrii Nakryiko1-4/+4
Enforce XDP_FLAGS_UPDATE_IF_NOEXIST only if new BPF program to be attached is non-NULL (i.e., we are not detaching a BPF program). Fixes: d4baa9368a5e ("bpf, xdp: Extract common XDP program attachment logic") Reported-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Stanislav Fomichev <sdf@google.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200812022923.1217922-1-andriin@fb.com
2020-08-11selftests/bpf: Fix v4_to_v6 in sk_lookupStanislav Fomichev1-0/+1
I'm getting some garbage in bytes 8 and 9 when doing conversion from sockaddr_in to sockaddr_in6 (leftover from AF_INET?). Let's explicitly clear the higher bytes. Fixes: 0ab5539f8584 ("selftests/bpf: Tests for BPF_SK_LOOKUP attach point") Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20200807223846.4190917-1-sdf@google.com
2020-08-11selftests/bpf: Fix segmentation fault in test_progsJianlin Lv1-6/+13
test_progs reports the segmentation fault as below: $ sudo ./test_progs -t mmap --verbose test_mmap:PASS:skel_open_and_load 0 nsec [...] test_mmap:PASS:adv_mmap1 0 nsec test_mmap:PASS:adv_mmap2 0 nsec test_mmap:PASS:adv_mmap3 0 nsec test_mmap:PASS:adv_mmap4 0 nsec Segmentation fault This issue was triggered because mmap() and munmap() used inconsistent length parameters; mmap() creates a new mapping of 3 * page_size, but the length parameter set in the subsequent re-map and munmap() functions is 4 * page_size; this leads to the destruction of the process space. To fix this issue, first create 4 pages of anonymous mapping, then do all the mmap() with MAP_FIXED. Another issue is that when unmap the second page fails, the length parameter to delete tmp1 mappings should be 4 * page_size. Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200810153940.125508-1-Jianlin.Lv@arm.com
2020-08-11libbpf: Do not use __builtin_offsetof for offsetofYonghong Song1-1/+1
Commit 5fbc220862fc ("tools/libpf: Add offsetof/container_of macro in bpf_helpers.h") added a macro offsetof() to get the offset of a structure member: #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) In certain use cases, size_t type may not be available so Commit da7a35062bcc ("libbpf bpf_helpers: Use __builtin_offsetof for offsetof") changed to use __builtin_offsetof which removed the dependency on type size_t, which I suggested. But using __builtin_offsetof will prevent CO-RE relocation generation in case that, e.g., TYPE is annotated with "preserve_access_info" where a relocation is desirable in case the member offset is changed in a different kernel version. So this patch reverted back to the original macro but using "unsigned long" instead of "site_t". Fixes: da7a35062bcc ("libbpf bpf_helpers: Use __builtin_offsetof for offsetof") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/bpf/20200811030852.3396929-1-yhs@fb.com
2020-08-10bitfield.h: don't compile-time validate _val in FIELD_FITJakub Kicinski1-1/+1
When ur_load_imm_any() is inlined into jeq_imm(), it's possible for the compiler to deduce a case where _val can only have the value of -1 at compile time. Specifically, /* struct bpf_insn: _s32 imm */ u64 imm = insn->imm; /* sign extend */ if (imm >> 32) { /* non-zero only if insn->imm is negative */ /* inlined from ur_load_imm_any */ u32 __imm = imm >> 32; /* therefore, always 0xffffffff */ if (__builtin_constant_p(__imm) && __imm > 255) compiletime_assert_XXX() This can result in tripping a BUILD_BUG_ON() in __BF_FIELD_CHECK() that checks that a given value is representable in one byte (interpreted as unsigned). FIELD_FIT() should return true or false at runtime for whether a value can fit for not. Don't break the build over a value that's too large for the mask. We'd prefer to keep the inlining and compiler optimizations though we know this case will always return false. Cc: stable@vger.kernel.org Fixes: 1697599ee301a ("bitfield.h: add FIELD_FIT() helper") Link: https://lore.kernel.org/kernel-hardening/CAK7LNASvb0UDJ0U5wkYYRzTAdnEs64HjXpEUL7d=V0CXiAXcNw@mail.gmail.com/ Reported-by: Masahiro Yamada <masahiroy@kernel.org> Debugged-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-10tcp: correct read of TFO keys on big endian systemsJason Baron4-24/+33
When TFO keys are read back on big endian systems either via the global sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values don't match what was written. For example, on s390x: # echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key # cat /proc/sys/net/ipv4/tcp_fastopen_key 02000000-01000000-04000000-03000000 Instead of: # cat /proc/sys/net/ipv4/tcp_fastopen_key 00000001-00000002-00000003-00000004 Fix this by converting to the correct endianness on read. This was reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net selftest on s390x, which depends on the read value matching what was written. I've confirmed that the test now passes on big and little endian systems. Signed-off-by: Jason Baron <jbaron@akamai.com> Fixes: 438ac88009bc ("net: fastopen: robustness and endianness fixes for SipHash") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Eric Dumazet <edumazet@google.com> Reported-and-tested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-10nfp: update maintainerJakub Kicinski1-1/+2
I'm not doing much work on the NFP driver any more. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-10vmxnet3: use correct tcp hdr length when packet is encapsulatedRonak Doshi1-1/+2
Commit dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, while calculating tcp hdr length, it does not take into account if the packet is encapsulated or not. This patch fixes this issue by using correct reference for inner tcp header. Fixes: dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi <doshir@vmware.com> Acked-by: Guolin Yang <gyang@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-10net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces"Christoph Hellwig3-37/+9
This reverts commits 6d04fe15f78acdf8e32329e208552e226f7a8ae6 and a31edb2059ed4e498f9aa8230c734b59d0ad797a. It turns out the idea to share a single pointer for both kernel and user space address causes various kinds of problems. So use the slightly less optimal version that uses an extra bit, but which is guaranteed to be safe everywhere. Fixes: 6d04fe15f78a ("net: optimize the sockptr_t for unified kernel/user address spaces") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Convert to use the fallthrough macroMiaohe Lin1-3/+3
Convert the uses of fallthrough comments to fallthrough macro. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Use helper function ip_is_fragment()Miaohe Lin1-1/+1
Use helper function ip_is_fragment() to check ip fragment. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Remove meaningless jump label out_fsMiaohe Lin1-2/+1
The out_fs jump label has nothing to do but goto out. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Set fput_needed iff FDPUT_FPUT is setMiaohe Lin1-1/+1
We should fput() file iff FDPUT_FPUT is set. So we should set fput_needed accordingly. Fixes: 00e188ef6a7e ("sockfd_lookup_light(): switch to fdget^W^Waway from fget_light") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: Use helper function fdput()Miaohe Lin1-4/+2
Use helper function fdput() to fput() the file iff FDPUT_FPUT is set. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08net: phy: fix memory leak in device-create error pathJohan Hovold1-4/+4
A recent commit introduced a late error path in phy_device_create() which fails to release the device name allocated by dev_set_name(). Fixes: 13d0ab6750b2 ("net: phy: check return code when requesting PHY driver module") Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07net/tls: allow MSG_CMSG_COMPAT in sendmsgRouven Czerwinski1-1/+2
Trying to use ktls on a system with 32-bit userspace and 64-bit kernel results in a EOPNOTSUPP message during sendmsg: setsockopt(3, SOL_TLS, TLS_TX, …, 40) = 0 sendmsg(3, …, msg_flags=0}, 0) = -1 EOPNOTSUPP (Operation not supported) The tls_sw implementation does strict flag checking and does not allow the MSG_CMSG_COMPAT flag, which is set if the message comes in through the compat syscall. This patch adds MSG_CMSG_COMPAT to the flag check to allow the usage of the TLS SW implementation on systems using the compat syscall path. Note that the same check is present in the sendmsg path for the TLS device implementation, however the flag hasn't been added there for lack of testing hardware. Signed-off-by: Rouven Czerwinski <r.czerwinski@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07mptcp: fix warn at shutdown time for unaccepted msk socketsPaolo Abeni1-3/+3
With commit b93df08ccda3 ("mptcp: explicitly track the fully established status"), the status of unaccepted mptcp closed in mptcp_sock_destruct() changes from TCP_SYN_RECV to TCP_ESTABLISHED. As a result mptcp_sock_destruct() does not perform the proper cleanup and inet_sock_destruct() will later emit a warn. Address the issue updating the condition tested in mptcp_sock_destruct(). Also update the related comment. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/66 Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com> Fixes: b93df08ccda3 ("mptcp: explicitly track the fully established status") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07mptcp: more stable diag self-testsPaolo Abeni1-4/+5
During diag self-tests we introduce long wait in the mptcp test program to give the script enough time to access the sockets dump. Such wait is introduced after shutting down one sockets end. Since commit 43b54c6ee382 ("mptcp: Use full MPTCP-level disconnect state machine") if both sides shutdown the socket is correctly transitioned into CLOSED status. As a side effect some sockets are not dumped via the diag interface, because the socket state (CLOSED) does not match the default filter, and this cause self-tests instability. Address the issue moving the above mentioned wait before shutting down the socket. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/68 Fixes: df62f2ec3df6 ("selftests/mptcp: add diag interface tests") Tested-and-acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07selftests: mptcp: fix dependeciesPaolo Abeni1-0/+2
Since commit df62f2ec3df6 ("selftests/mptcp: add diag interface tests") the MPTCP selftests relies on the MPTCP diag interface which is enabled by a specific kconfig knob: be sure to include it. Fixes: df62f2ec3df6 ("selftests/mptcp: add diag interface tests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07r8152: Use MAC address from correct device tree nodeThierry Reding1-1/+1
Query the USB device's device tree node when looking for a MAC address. The struct device embedded into the struct net_device does not have a device tree node attached at all. The reason why this went unnoticed is because the system where this was tested was one of the few development units that had its OTP programmed, as opposed to production systems where the MAC address is stored in a separate EEPROM and is passed via device tree by the firmware. Reported-by: EJ Hsu <ejh@nvidia.com> Fixes: acb6d3771a03 ("r8152: Use MAC address from device tree if available") Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: EJ Hsu <ejh@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07bpf: Delete repeated words in commentsRandy Dunlap2-2/+2
Drop repeated words in kernel/bpf/: {has, the} Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200807033141.10437-1-rdunlap@infradead.org
2020-08-07selftests/bpf: Fix silent Makefile outputAndrii Nakryiko1-22/+26
99aacebecb75 ("selftests: do not use .ONESHELL") removed .ONESHELL, which changes how Makefile "silences" multi-command target recipes. selftests/bpf's Makefile relied (a somewhat unknowingly) on .ONESHELL behavior of silencing all commands within the recipe if the first command contains @ symbol. Removing .ONESHELL exposed this hack. This patch fixes the issue by explicitly silencing each command with $(Q). Also explicitly define fallback rule for building *.o from *.c, instead of relying on non-silent inherited rule. This was causing a non-silent output for bench.o object file. Fixes: 92f7440ecc93 ("selftests/bpf: More succinct Makefile output") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200807033058.848677-1-andriin@fb.com
2020-08-07bpf, doc: Remove references to warning message when using bpf_trace_printk()Alan Maguire1-11/+0
The BPF helper bpf_trace_printk() no longer uses trace_printk(); it is now triggers a dedicated trace event. Hence the described warning is no longer present, so remove the discussion of it as it may confuse people. Fixes: ac5a72ea5c89 ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1596801029-32395-1-git-send-email-alan.maguire@oracle.com
2020-08-06drivers/net/wan/lapbether: Added needed_headroom and a skb->len checkXie He1-1/+9
1. Added a skb->len check This driver expects upper layers to include a pseudo header of 1 byte when passing down a skb for transmission. This driver will read this 1-byte header. This patch added a skb->len check before reading the header to make sure the header exists. 2. Changed to use needed_headroom instead of hard_header_len to request necessary headroom to be allocated In net/packet/af_packet.c, the function packet_snd first reserves a headroom of length (dev->hard_header_len + dev->needed_headroom). Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header, which calls dev->header_ops->create, to create the link layer header. If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of length (dev->hard_header_len), and assumes the user to provide the appropriate link layer header. So according to the logic of af_packet.c, dev->hard_header_len should be the length of the header that would be created by dev->header_ops->create. However, this driver doesn't provide dev->header_ops, so logically dev->hard_header_len should be 0. So we should use dev->needed_headroom instead of dev->hard_header_len to request necessary headroom to be allocated. This change fixes kernel panic when this driver is used with AF_PACKET SOCK_RAW sockets. Call stack when panic: [ 168.399197] skbuff: skb_under_panic: text:ffffffff819d95fb len:20 put:14 head:ffff8882704c0a00 data:ffff8882704c09fd tail:0x11 end:0xc0 dev:veth0 ... [ 168.399255] Call Trace: [ 168.399259] skb_push.cold+0x14/0x24 [ 168.399262] eth_header+0x2b/0xc0 [ 168.399267] lapbeth_data_transmit+0x9a/0xb0 [lapbether] [ 168.399275] lapb_data_transmit+0x22/0x2c [lapb] [ 168.399277] lapb_transmit_buffer+0x71/0xb0 [lapb] [ 168.399279] lapb_kick+0xe3/0x1c0 [lapb] [ 168.399281] lapb_data_request+0x76/0xc0 [lapb] [ 168.399283] lapbeth_xmit+0x56/0x90 [lapbether] [ 168.399286] dev_hard_start_xmit+0x91/0x1f0 [ 168.399289] ? irq_init_percpu_irqstack+0xc0/0x100 [ 168.399291] __dev_queue_xmit+0x721/0x8e0 [ 168.399295] ? packet_parse_headers.isra.0+0xd2/0x110 [ 168.399297] dev_queue_xmit+0x10/0x20 [ 168.399298] packet_sendmsg+0xbf0/0x19b0 ...... Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Martin Schiller <ms@dev.tdt.de> Cc: Brian Norris <briannorris@chromium.org> Signed-off-by: Xie He <xie.he.0141@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-06bpf: Fix compilation warning of selftestsJianlin Lv3-14/+21
Clang compiler version: 12.0.0 The following warning appears during the selftests/bpf compilation: prog_tests/send_signal.c:51:3: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result] 51 | write(pipe_c2p[1], buf, 1); | ^~~~~~~~~~~~~~~~~~~~~~~~~~ prog_tests/send_signal.c:54:3: warning: ignoring return value of ‘read’, declared with attribute warn_unused_result [-Wunused-result] 54 | read(pipe_p2c[0], buf, 1); | ^~~~~~~~~~~~~~~~~~~~~~~~~ ...... prog_tests/stacktrace_build_id_nmi.c:13:2: warning: ignoring return value of ‘fscanf’,declared with attribute warn_unused_result [-Wunused-resul] 13 | fscanf(f, "%llu", &sample_freq); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ test_tcpnotify_user.c:133:2: warning:ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result] 133 | system(test_script); | ^~~~~~~~~~~~~~~~~~~ test_tcpnotify_user.c:138:2: warning:ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result] 138 | system(test_script); | ^~~~~~~~~~~~~~~~~~~ test_tcpnotify_user.c:143:2: warning:ignoring return value of ‘system’, declared with attribute warn_unused_result [-Wunused-result] 143 | system(test_script); | ^~~~~~~~~~~~~~~~~~~ Add code that fix compilation warning about ignoring return value and handles any errors; Check return value of library`s API make the code more secure. Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200806104224.95306-1-Jianlin.Lv@arm.com