aboutsummaryrefslogtreecommitdiffstats
path: root/net/netfilter (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-03-08xsk: fix to reject invalid flags in xsk_bindBjörn Töpel1-1/+4
Passing a non-existing flag in the sxdp_flags member of struct sockaddr_xdp was, incorrectly, silently ignored. This patch addresses that behavior, and rejects any non-existing flags. We have examined existing user space code, and to our best knowledge, no one is relying on the current incorrect behavior. AF_XDP is still in its infancy, so from our perspective, the risk of breakage is very low, and addressing this problem now is important. Fixes: 965a99098443 ("xsk: add support for bind for Rx") Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-08bpf, libbpf: fixing leak when kernel does not support btfNikita V. Shirokov1-0/+2
We could end up in situation when we have object file w/ all btf info, but kernel does not support btf yet. In this situation currently libbpf just set obj->btf to NULL w/o freeing it first. This patch is fixing it by making sure to run btf__free first. Fixes: d29d87f7e612 ("btf: separate btf creation and loading") Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07bpf: fix replace_map_fd_with_map_ptr's ldimm64 second imm fieldDaniel Borkmann2-6/+19
Non-zero imm value in the second part of the ldimm64 instruction for BPF_PSEUDO_MAP_FD is invalid, and thus must be rejected. The map fd only ever sits in the first instructions' imm field. None of the BPF loaders known to us are using it, so risk of regression is minimal. For clarity and consistency, the few insn->{src_reg,imm} occurrences are rewritten into insn[0].{src_reg,imm}. Add a test case to the BPF selftest suite as well. Fixes: 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-07bpf: Stop the psock parser before canceling its workJakub Sitnicki1-0/+1
We might have never enabled (started) the psock's parser, in which case it will not get stopped when destroying the psock. This leads to a warning when trying to cancel parser's work from psock's deferred destructor: [ 405.325769] WARNING: CPU: 1 PID: 3216 at net/strparser/strparser.c:526 strp_done+0x3c/0x40 [ 405.326712] Modules linked in: [last unloaded: test_bpf] [ 405.327359] CPU: 1 PID: 3216 Comm: kworker/1:164 Tainted: G W 5.0.0 #42 [ 405.328294] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014 [ 405.329712] Workqueue: events sk_psock_destroy_deferred [ 405.330254] RIP: 0010:strp_done+0x3c/0x40 [ 405.330706] Code: 28 e8 b8 d5 6b ff 48 8d bb 80 00 00 00 e8 9c d5 6b ff 48 8b 7b 18 48 85 ff 74 0d e8 1e a5 e8 ff 48 c7 43 18 00 00 00 00 5b c3 <0f> 0b eb cf 66 66 66 66 90 55 89 f5 53 48 89 fb 48 83 c7 28 e8 0b [ 405.332862] RSP: 0018:ffffc900026bbe50 EFLAGS: 00010246 [ 405.333482] RAX: ffffffff819323e0 RBX: ffff88812cb83640 RCX: ffff88812cb829e8 [ 405.334228] RDX: 0000000000000001 RSI: ffff88812cb837e8 RDI: ffff88812cb83640 [ 405.335366] RBP: ffff88813fd22680 R08: 0000000000000000 R09: 000073746e657665 [ 405.336472] R10: 8080808080808080 R11: 0000000000000001 R12: ffff88812cb83600 [ 405.337760] R13: 0000000000000000 R14: ffff88811f401780 R15: ffff88812cb837e8 [ 405.338777] FS: 0000000000000000(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000 [ 405.339903] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 405.340821] CR2: 00007fb11489a6b8 CR3: 000000012d4d6000 CR4: 00000000000406e0 [ 405.341981] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 405.343131] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 405.344415] Call Trace: [ 405.344821] sk_psock_destroy_deferred+0x23/0x1b0 [ 405.345585] process_one_work+0x1ae/0x3e0 [ 405.346110] worker_thread+0x3c/0x3b0 [ 405.346576] ? pwq_unbound_release_workfn+0xd0/0xd0 [ 405.347187] kthread+0x11d/0x140 [ 405.347601] ? __kthread_parkme+0x80/0x80 [ 405.348108] ret_from_fork+0x35/0x40 [ 405.348566] ---[ end trace a4a3af4026a327d4 ]--- Stop psock's parser just before canceling its work. Fixes: 1d79895aef18 ("sk_msg: Always cancel strp work before freeing the psock") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07selftests: bpf: test_progs: initialize duration in singal_pending testStanislav Fomichev1-1/+1
CHECK macro implicitly uses duration. We call CHECK() a couple of times before duration is initialized from bpf_prog_test_run(). Explicitly set duration to 0 to avoid compiler warnings. Fixes: 740f8a657221 ("selftests/bpf: make sure signal interrupts BPF_PROG_TEST_RUN") Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07libbpf: force fixdep compilation at the start of the buildStanislav Fomichev1-1/+2
libbpf targets don't explicitly depend on fixdep target, so when we do 'make -j$(nproc)', there is a high probability, that some objects will be built before fixdep binary is available. Fix this by running sub-make; this makes sure that fixdep dependency is properly accounted for. For the same issue in perf, see commit abb26210a395 ("perf tools: Force fixdep compilation at the start of the build"). Before: $ rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld -C tools/lib/bpf/ Auto-detecting system features: ... libelf: [ on ] ... bpf: [ on ] HOSTCC /tmp/bld/fixdep.o CC /tmp/bld/libbpf.o CC /tmp/bld/bpf.o CC /tmp/bld/btf.o CC /tmp/bld/nlattr.o CC /tmp/bld/libbpf_errno.o CC /tmp/bld/str_error.o CC /tmp/bld/netlink.o CC /tmp/bld/bpf_prog_linfo.o CC /tmp/bld/libbpf_probes.o CC /tmp/bld/xsk.o HOSTLD /tmp/bld/fixdep-in.o LINK /tmp/bld/fixdep LD /tmp/bld/libbpf-in.o LINK /tmp/bld/libbpf.a LINK /tmp/bld/libbpf.so LINK /tmp/bld/test_libbpf $ head /tmp/bld/.libbpf.o.cmd # cannot find fixdep (/usr/local/google/home/sdf/src/linux/xxx//fixdep) # using basic dep data /tmp/bld/libbpf.o: libbpf.c /usr/include/stdc-predef.h \ /usr/include/stdlib.h /usr/include/features.h \ /usr/include/x86_64-linux-gnu/sys/cdefs.h \ /usr/include/x86_64-linux-gnu/bits/wordsize.h \ /usr/include/x86_64-linux-gnu/gnu/stubs.h \ /usr/include/x86_64-linux-gnu/gnu/stubs-64.h \ /usr/lib/gcc/x86_64-linux-gnu/7/include/stddef.h \ After: $ rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld -C tools/lib/bpf/ Auto-detecting system features: ... libelf: [ on ] ... bpf: [ on ] HOSTCC /tmp/bld/fixdep.o HOSTLD /tmp/bld/fixdep-in.o LINK /tmp/bld/fixdep CC /tmp/bld/libbpf.o CC /tmp/bld/bpf.o CC /tmp/bld/nlattr.o CC /tmp/bld/btf.o CC /tmp/bld/libbpf_errno.o CC /tmp/bld/str_error.o CC /tmp/bld/netlink.o CC /tmp/bld/bpf_prog_linfo.o CC /tmp/bld/libbpf_probes.o CC /tmp/bld/xsk.o LD /tmp/bld/libbpf-in.o LINK /tmp/bld/libbpf.a LINK /tmp/bld/libbpf.so LINK /tmp/bld/test_libbpf $ head /tmp/bld/.libbpf.o.cmd cmd_/tmp/bld/libbpf.o := gcc -Wp,-MD,/tmp/bld/.libbpf.o.d -Wp,-MT,/tmp/bld/libbpf.o -g -Wall -DHAVE_LIBELF_MMAP_SUPPORT -DCOMPAT_NEED_REALLOCARRAY -Wbad-function-cast -Wdeclaration-after-statement -Wformat-security -Wformat-y2k -Winit-self -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wno-system-headers -Wold-style-definition -Wpacked -Wredundant-decls -Wshadow -Wstrict-prototypes -Wswitch-default -Wswitch-enum -Wundef -Wwrite-strings -Wformat -Wstrict-aliasing=3 -Werror -Wall -fPIC -I. -I/usr/local/google/home/sdf/src/linux/tools/include -I/usr/local/google/home/sdf/src/linux/tools/arch/x86/include/uapi -I/usr/local/google/home/sdf/src/linux/tools/include/uapi -fvisibility=hidden -D"BUILD_STR(s)=$(pound)s" -c -o /tmp/bld/libbpf.o libbpf.c source_/tmp/bld/libbpf.o := libbpf.c deps_/tmp/bld/libbpf.o := \ /usr/include/stdc-predef.h \ /usr/include/stdlib.h \ /usr/include/features.h \ /usr/include/x86_64-linux-gnu/sys/cdefs.h \ /usr/include/x86_64-linux-gnu/bits/wordsize.h \ Fixes: 7c422f557266 ("tools build: Build fixdep helper from perf and basic libs") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07selftests: bpf: fix compilation with out-of-tree $(OUTPUT)Stanislav Fomichev1-10/+23
A bunch of related changes lumped together: * Create prog_tests and verifier output directories; these don't exist with out-of-tree $(OUTPUT) * Add missing -I (via separate TEST_{PROGS,VERIFIER}_CFLAGS) for the main tree ($(PWD) != $(OUTPUT) for out-of-tree) * Add libbpf.a dependency for test_progs_32 (parallel make fails otherwise) * Add missing "; \" after "cd" when generating test.h headers Tested by: $ alias m="make -s -j$(nproc)" $ m -C tools/testing/selftests/bpf/ clean $ m -C tools/lib/bpf/ clean $ rm -rf xxx; mkdir xxx; m -C tools/testing/selftests/bpf/ OUTPUT=$PWD/xxx $ m -C tools/testing/selftests/bpf/ Fixes: 3f30658830f3 ("selftests: bpf: break up test_progs - preparations") Fixes: 2dfb40121ee8 ("selftests: bpf: prepare for break up of verifier tests") Fixes: 3ef84346c561 ("selftests: bpf: makefile support sub-register code-gen test mode") Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07selftests/bpf: test that GSO works in lwt_ip_encapPeter Oskolkov1-2/+52
Add a test on egress that a large TCP packet successfully goes through the lwt+bpf encap tunnel. Although there is no direct evidence that GSO worked, as opposed to e.g. TCP segmentation or IP fragmentation (maybe a kernel stats counter should be added to track the number of failed GSO attempts?), without the previous patch in the patchset this test fails, and printk-debugging showed that software-based GSO succeeded here (veth is not compatible with SKB_GSO_DODGY, so GSO happens in the software stack). Also removed an unnecessary nodad and added a missed failed flag. Signed-off-by: Peter Oskolkov <posk@google.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07net: fix GSO in bpf_lwt_push_ip_encapPeter Oskolkov1-0/+2
GSO needs inner headers and inner protocol set properly to work. skb->inner_mac_header: skb_reset_inner_headers() assigns the current mac header value to inner_mac_header; but it is not set at the point, so we need to call skb_reset_inner_mac_header, otherwise gre_gso_segment fails: it does int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb); ... if (unlikely(!pskb_may_pull(skb, tnl_hlen))) ... skb->inner_protocol should also be correctly set. Fixes: ca78801a81e0 ("bpf: handle GSO in bpf_lwt_push_encap") Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07xsk: fix potential crash in xsk_diag_put_umem()Eric Dumazet1-2/+2
Fixes two typos in xsk_diag_put_umem() syzbot reported the following crash : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 7641 Comm: syz-executor946 Not tainted 5.0.0-rc7+ #95 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:xsk_diag_put_umem net/xdp/xsk_diag.c:71 [inline] RIP: 0010:xsk_diag_fill net/xdp/xsk_diag.c:113 [inline] RIP: 0010:xsk_diag_dump+0xdcb/0x13a0 net/xdp/xsk_diag.c:143 Code: 8d be c0 04 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 39 04 00 00 49 8b 96 c0 04 00 00 48 8d 7a 14 48 89 f8 48 c1 e8 03 <42> 0f b6 0c 20 48 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 RSP: 0018:ffff888090bcf2d8 EFLAGS: 00010203 RAX: 0000000000000002 RBX: ffff8880a0aacbc0 RCX: ffffffff86ffdc3c RDX: 0000000000000000 RSI: ffffffff86ffdc70 RDI: 0000000000000014 RBP: ffff888090bcf438 R08: ffff88808e04a700 R09: ffffed1011c74174 R10: ffffed1011c74173 R11: ffff88808e3a0b9f R12: dffffc0000000000 R13: ffff888093a6d818 R14: ffff88808e365240 R15: ffff88808e3a0b40 FS: 00000000011ea880(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000080 CR3: 000000008fa13000 CR4: 00000000001406e0 Call Trace: netlink_dump+0x55d/0xfb0 net/netlink/af_netlink.c:2252 __netlink_dump_start+0x5b4/0x7e0 net/netlink/af_netlink.c:2360 netlink_dump_start include/linux/netlink.h:226 [inline] xsk_diag_handler_dump+0x1b2/0x250 net/xdp/xsk_diag.c:170 __sock_diag_cmd net/core/sock_diag.c:232 [inline] sock_diag_rcv_msg+0x322/0x410 net/core/sock_diag.c:263 netlink_rcv_skb+0x17a/0x460 net/netlink/af_netlink.c:2485 sock_diag_rcv+0x2b/0x40 net/core/sock_diag.c:274 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x536/0x720 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1925 sock_sendmsg_nosec net/socket.c:622 [inline] sock_sendmsg+0xdd/0x130 net/socket.c:632 sock_write_iter+0x27c/0x3e0 net/socket.c:923 call_write_iter include/linux/fs.h:1863 [inline] do_iter_readv_writev+0x5e0/0x8e0 fs/read_write.c:680 do_iter_write fs/read_write.c:956 [inline] do_iter_write+0x184/0x610 fs/read_write.c:937 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1001 do_writev+0xf6/0x290 fs/read_write.c:1036 __do_sys_writev fs/read_write.c:1109 [inline] __se_sys_writev fs/read_write.c:1106 [inline] __x64_sys_writev+0x75/0xb0 fs/read_write.c:1106 do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x440139 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffcc966cc18 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440139 RDX: 0000000000000001 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8 R10: 0000000000000004 R11: 0000000000000246 R12: 00000000004019c0 R13: 0000000000401a50 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace 460a3c24d0a656c9 ]--- RIP: 0010:xsk_diag_put_umem net/xdp/xsk_diag.c:71 [inline] RIP: 0010:xsk_diag_fill net/xdp/xsk_diag.c:113 [inline] RIP: 0010:xsk_diag_dump+0xdcb/0x13a0 net/xdp/xsk_diag.c:143 Code: 8d be c0 04 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 39 04 00 00 49 8b 96 c0 04 00 00 48 8d 7a 14 48 89 f8 48 c1 e8 03 <42> 0f b6 0c 20 48 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 RSP: 0018:ffff888090bcf2d8 EFLAGS: 00010203 RAX: 0000000000000002 RBX: ffff8880a0aacbc0 RCX: ffffffff86ffdc3c RDX: 0000000000000000 RSI: ffffffff86ffdc70 RDI: 0000000000000014 RBP: ffff888090bcf438 R08: ffff88808e04a700 R09: ffffed1011c74174 R10: ffffed1011c74173 R11: ffff88808e3a0b9f R12: dffffc0000000000 R13: ffff888093a6d818 R14: ffff88808e365240 R15: ffff88808e3a0b40 FS: 00000000011ea880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001d22000 CR3: 000000008fa13000 CR4: 00000000001406f0 Fixes: a36b38aa2af6 ("xsk: add sock_diag interface for AF_XDP") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Björn Töpel <bjorn.topel@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07bpf: hbm: fix spelling mistake "deault" -> "default"Colin Ian King1-2/+2
There are a couple of typos, fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07bpf: only test gso type on gso packetsWillem de Bruijn2-6/+6
BPF can adjust gso only for tcp bytestreams. Fail on other gso types. But only on gso packets. It does not touch this field if !gso_size. Fixes: b90efd225874 ("bpf: only adjust gso_size on bytestream protocols") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-07bpf: fix sysctl.c warningArnd Bergmann1-1/+1
When CONFIG_BPF_SYSCALL or CONFIG_SYSCTL is disabled, we get a warning about an unused function: kernel/sysctl.c:3331:12: error: 'proc_dointvec_minmax_bpf_stats' defined but not used [-Werror=unused-function] static int proc_dointvec_minmax_bpf_stats(struct ctl_table *table, int write, The CONFIG_BPF_SYSCALL check was already handled, but the SYSCTL check is needed on top. Fixes: 492ecee892c2 ("bpf: enable program stats") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christian Brauner <christian@brauner.io> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-06tcp: detecting the misuse of .sendpage for Slab objectsVasily Averin1-0/+4
sendpage was not designed for processing of the Slab pages, in some situations it can trigger BUG_ON on receiving side. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-06appletalk: Add atalk.h header files to MAINTAINERS fileArnd Bergmann1-0/+2
Add the path names here so that git-send-email can pick up the netdev@vger.kernel.org Cc line automatically for a patch that only touches the headers. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-06appletalk: Fix compile regressionArnd Bergmann1-4/+14
A bugfix just broke compilation of appletalk when CONFIG_SYSCTL is disabled: In file included from net/appletalk/ddp.c:65: net/appletalk/ddp.c: In function 'atalk_init': include/linux/atalk.h:164:34: error: expected expression before 'do' #define atalk_register_sysctl() do { } while(0) ^~ net/appletalk/ddp.c:1934:7: note: in expansion of macro 'atalk_register_sysctl' rc = atalk_register_sysctl(); This is easier to avoid by using conventional inline functions as stubs rather than macros. The header already has inline functions for other purposes, so I'm changing over all the macros for consistency. Fixes: 6377f787aeb9 ("appletalk: Fix use-after-free in atalk_proc_exit") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-06iptunnel: NULL pointer deref for ip_md_tunnel_xmitAlan Maguire1-3/+6
Naresh Kamboju noted the following oops during execution of selftest tools/testing/selftests/bpf/test_tunnel.sh on x86_64: [ 274.120445] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 274.128285] #PF error: [INSTR] [ 274.131351] PGD 8000000414a0e067 P4D 8000000414a0e067 PUD 3b6334067 PMD 0 [ 274.138241] Oops: 0010 [#1] SMP PTI [ 274.141734] CPU: 1 PID: 11464 Comm: ping Not tainted 5.0.0-rc4-next-20190129 #1 [ 274.149046] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.0b 07/27/2017 [ 274.156526] RIP: 0010: (null) [ 274.160280] Code: Bad RIP value. [ 274.163509] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286 [ 274.168726] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000 [ 274.175851] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0 [ 274.182974] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c [ 274.190098] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000 [ 274.197222] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400 [ 274.204346] FS: 00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000) knlGS:0000000000000000 [ 274.212424] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 274.218162] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0 [ 274.225292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 274.232416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 274.239541] Call Trace: [ 274.241988] ? tnl_update_pmtu+0x296/0x3b0 [ 274.246085] ip_md_tunnel_xmit+0x1bc/0x520 [ 274.250176] gre_fb_xmit+0x330/0x390 [ 274.253754] gre_tap_xmit+0x128/0x180 [ 274.257414] dev_hard_start_xmit+0xb7/0x300 [ 274.261598] sch_direct_xmit+0xf6/0x290 [ 274.265430] __qdisc_run+0x15d/0x5e0 [ 274.269007] __dev_queue_xmit+0x2c5/0xc00 [ 274.273011] ? dev_queue_xmit+0x10/0x20 [ 274.276842] ? eth_header+0x2b/0xc0 [ 274.280326] dev_queue_xmit+0x10/0x20 [ 274.283984] ? dev_queue_xmit+0x10/0x20 [ 274.287813] arp_xmit+0x1a/0xf0 [ 274.290952] arp_send_dst.part.19+0x46/0x60 [ 274.295138] arp_solicit+0x177/0x6b0 [ 274.298708] ? mod_timer+0x18e/0x440 [ 274.302281] neigh_probe+0x57/0x70 [ 274.305684] __neigh_event_send+0x197/0x2d0 [ 274.309862] neigh_resolve_output+0x18c/0x210 [ 274.314212] ip_finish_output2+0x257/0x690 [ 274.318304] ip_finish_output+0x219/0x340 [ 274.322314] ? ip_finish_output+0x219/0x340 [ 274.326493] ip_output+0x76/0x240 [ 274.329805] ? ip_fragment.constprop.53+0x80/0x80 [ 274.334510] ip_local_out+0x3f/0x70 [ 274.337992] ip_send_skb+0x19/0x40 [ 274.341391] ip_push_pending_frames+0x33/0x40 [ 274.345740] raw_sendmsg+0xc15/0x11d0 [ 274.349403] ? __might_fault+0x85/0x90 [ 274.353151] ? _copy_from_user+0x6b/0xa0 [ 274.357070] ? rw_copy_check_uvector+0x54/0x130 [ 274.361604] inet_sendmsg+0x42/0x1c0 [ 274.365179] ? inet_sendmsg+0x42/0x1c0 [ 274.368937] sock_sendmsg+0x3e/0x50 [ 274.372460] ___sys_sendmsg+0x26f/0x2d0 [ 274.376293] ? lock_acquire+0x95/0x190 [ 274.380043] ? __handle_mm_fault+0x7ce/0xb70 [ 274.384307] ? lock_acquire+0x95/0x190 [ 274.388053] ? __audit_syscall_entry+0xdd/0x130 [ 274.392586] ? ktime_get_coarse_real_ts64+0x64/0xc0 [ 274.397461] ? __audit_syscall_entry+0xdd/0x130 [ 274.401989] ? trace_hardirqs_on+0x4c/0x100 [ 274.406173] __sys_sendmsg+0x63/0xa0 [ 274.409744] ? __sys_sendmsg+0x63/0xa0 [ 274.413488] __x64_sys_sendmsg+0x1f/0x30 [ 274.417405] do_syscall_64+0x55/0x190 [ 274.421064] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 274.426113] RIP: 0033:0x7ff4ae0e6e87 [ 274.429686] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 ca d9 2b 00 48 63 d2 48 63 ff 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 53 48 89 f3 48 83 ec 10 48 89 7c 24 08 [ 274.448422] RSP: 002b:00007ffcd9b76db8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 274.455978] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007ff4ae0e6e87 [ 274.463104] RDX: 0000000000000000 RSI: 00000000006092e0 RDI: 0000000000000003 [ 274.470228] RBP: 0000000000000000 R08: 00007ffcd9bc40a0 R09: 00007ffcd9bc4080 [ 274.477349] R10: 000000000000060a R11: 0000000000000246 R12: 0000000000000003 [ 274.484475] R13: 0000000000000016 R14: 00007ffcd9b77fa0 R15: 00007ffcd9b78da4 [ 274.491602] Modules linked in: cls_bpf sch_ingress iptable_filter ip_tables algif_hash af_alg x86_pkg_temp_thermal fuse [last unloaded: test_bpf] [ 274.504634] CR2: 0000000000000000 [ 274.507976] ---[ end trace 196d18386545eae1 ]--- [ 274.512588] RIP: 0010: (null) [ 274.516334] Code: Bad RIP value. [ 274.519557] RSP: 0018:ffffbc9681f83540 EFLAGS: 00010286 [ 274.524775] RAX: 0000000000000000 RBX: ffffdc967fa80a18 RCX: 0000000000000000 [ 274.531921] RDX: ffff9db2ee08b540 RSI: 000000000000000e RDI: ffffdc967fa809a0 [ 274.539082] RBP: ffffbc9681f83580 R08: ffff9db2c4d62690 R09: 000000000000000c [ 274.546205] R10: 0000000000000000 R11: ffff9db2ee08b540 R12: ffff9db31ce7c000 [ 274.553329] R13: 0000000000000001 R14: 000000000000000c R15: ffff9db3179cf400 [ 274.560456] FS: 00007ff4ae7c5740(0000) GS:ffff9db31fa80000(0000) knlGS:0000000000000000 [ 274.568541] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 274.574277] CR2: ffffffffffffffd6 CR3: 00000004574da004 CR4: 00000000003606e0 [ 274.581403] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 274.588535] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 274.595658] Kernel panic - not syncing: Fatal exception in interrupt [ 274.602046] Kernel Offset: 0x14400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 274.612827] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- [ 274.620387] ------------[ cut here ]------------ I'm also seeing the same failure on x86_64, and it reproduces consistently. >From poking around it looks like the skb's dst entry is being used to calculate the mtu in: mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu; ...but because that dst_entry has an "ops" value set to md_dst_ops, the various ops (including mtu) are not set: crash> struct sk_buff._skb_refdst ffff928f87447700 -x _skb_refdst = 0xffffcd6fbf5ea590 crash> struct dst_entry.ops 0xffffcd6fbf5ea590 ops = 0xffffffffa0193800 crash> struct dst_ops.mtu 0xffffffffa0193800 mtu = 0x0 crash> I confirmed that the dst entry also has dst->input set to dst_md_discard, so it looks like it's an entry that's been initialized via __metadata_dst_init alright. I think the fix here is to use skb_valid_dst(skb) - it checks for DST_METADATA also, and with that fix in place, the problem - which was previously 100% reproducible - disappears. The below patch resolves the panic and all bpf tunnel tests pass without incident. Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-06ipv4/route: fail early when inet dev is missingPaolo Abeni1-4/+5
If a non local multicast packet reaches ip_route_input_rcu() while the ingress device IPv4 private data (in_dev) is NULL, we end up doing a NULL pointer dereference in IN_DEV_MFORWARD(). Since the later call to ip_route_input_mc() is going to fail if !in_dev, we can fail early in such scenario and avoid the dangerous code path. v1 -> v2: - clarified the commit message, no code changes Reported-by: Tianhao Zhao <tizhao@redhat.com> Fixes: e58e41596811 ("net: Enable support for VRF with ipv4 multicast") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-06net: hns3: Fix a logical vs bitwise typoDan Carpenter1-2/+2
There were a couple logical ORs accidentally mixed in with the bitwise ORs. Fixes: e8149933b1fa ("net: hns3: remove hnae3_get_bit in data path") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-05net/sched: act_tunnel_key: Fix double free dst_cachewenxu1-16/+6
dst_cache_destroy will be called in dst_release dst_release-->dst_destroy_rcu-->dst_destroy-->metadata_dst_free -->dst_cache_destroy It should not call dst_cache_destroy before dst_release Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-05tipc: fix RDM/DGRAM connect() regressionErik Hugne1-1/+1
Fix regression bug introduced in commit 365ad353c256 ("tipc: reduce risk of user starvation during link congestion") Only signal -EDESTADDRREQ for RDM/DGRAM if we don't have a cached sockaddr. Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion") Signed-off-by: Erik Hugne <erik.hugne@gmail.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-05x86: Deprecate a.out supportBorislav Petkov2-2/+1
Linux supports ELF binaries for ~25 years now. a.out coredumping has bitrotten quite significantly and would need some fixing to get it into shape again but considering how even the toolchains cannot create a.out executables in its default configuration, let's deprecate a.out support and remove it a couple of releases later, instead. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Richard Weinberger <richard@nod.at> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Jann Horn <jannh@google.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-fsdevel@vger.kernel.org> Cc: lkml <linux-kernel@vger.kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <x86@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05a.out: remove core dumping supportLinus Torvalds6-485/+0
We're (finally) phasing out a.out support for good. As Borislav Petkov points out, we've supported ELF binaries for about 25 years by now, and coredumping in particular has bitrotted over the years. None of the tool chains even support generating a.out binaries any more, and the plan is to deprecate a.out support entirely for the kernel. But I want to start with just removing the core dumping code, because I can still imagine that somebody actually might want to support a.out as a simpler biinary format. Particularly if you generate some random binaries on the fly, ELF is a much more complicated format (admittedly ELF also does have a lot of toolchain support, mitigating that complexity a lot and you really should have moved over in the last 25 years). So it's at least somewhat possible that somebody out there has some workflow that still involves generating and running a.out executables. In contrast, it's very unlikely that anybody depends on debugging any legacy a.out core files. But regardless, I want this phase-out to be done in two steps, so that we can resurrect a.out support (if needed) without having to resurrect the core file dumping that is almost certainly not needed. Jann Horn pointed to the <asm/a.out-core.h> file that my first trivial cut at this had missed. And Alan Cox points out that the a.out binary loader _could_ be done in user space if somebody wants to, but we might keep just the loader in the kernel if somebody really wants it, since the loader isn't that big and has no really odd special cases like the core dumping does. Acked-by: Borislav Petkov <bp@alien8.de> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jannh@google.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05Revert "s390/cpum_cf: Add kernel message exaplanations"Martin Schwidefsky2-81/+0
This reverts commit fb3a0b61e0d4e435016cc91575d051f841791da0. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-03-04fs: Make splice() and tee() take into account O_NONBLOCK flag on pipesSlavomir Kaslev1-0/+12
The current implementation of splice() and tee() ignores O_NONBLOCK set on pipe file descriptors and checks only the SPLICE_F_NONBLOCK flag for blocking on pipe arguments. This is inconsistent since splice()-ing from/to non-pipe file descriptors does take O_NONBLOCK into consideration. Fix this by promoting O_NONBLOCK, when set on a pipe, to SPLICE_F_NONBLOCK. Some context for how the current implementation of splice() leads to inconsistent behavior. In the ongoing work[1] to add VM tracing capability to trace-cmd we stream tracing data over named FIFOs or vsockets from guests back to the host. When we receive SIGINT from user to stop tracing, we set O_NONBLOCK on the input file descriptor and set SPLICE_F_NONBLOCK for the next call to splice(). If splice() was blocked waiting on data from the input FIFO, after SIGINT splice() restarts with the same arguments (no SPLICE_F_NONBLOCK) and blocks again instead of returning -EAGAIN when no data is available. This differs from the splice() behavior when reading from a vsocket or when we're doing a traditional read()/write() loop (trace-cmd's --nosplice argument). With this patch applied we get the same behavior in all situations after setting O_NONBLOCK which also matches the behavior of doing a read()/write() loop instead of splice(). This change does have potential of breaking users who don't expect EAGAIN from splice() when SPLICE_F_NONBLOCK is not set. OTOH programs that set O_NONBLOCK and don't anticipate EAGAIN are arguably buggy[2]. [1] https://github.com/skaslev/trace-cmd/tree/vsock [2] https://github.com/torvalds/linux/blob/d47e3da1759230e394096fd742aad423c291ba48/fs/read_write.c#L1425 Signed-off-by: Slavomir Kaslev <kaslevs@vmware.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-04net/sched: avoid unused-label warningArnd Bergmann1-1/+1
The label is only used from inside the #ifdef and should be hidden the same way, to avoid this warning: net/sched/act_tunnel_key.c: In function 'tunnel_key_init': net/sched/act_tunnel_key.c:389:1: error: label 'release_tun_meta' defined but not used [-Werror=unused-label] release_tun_meta: Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04net: ignore sysctl_devconf_inherit_init_net without SYSCTLArnd Bergmann2-2/+5
When CONFIG_SYSCTL is turned off, we get a link failure for the newly introduced tuning knob. net/ipv6/addrconf.o: In function `addrconf_init_net': addrconf.c:(.text+0x31dc): undefined reference to `sysctl_devconf_inherit_init_net' Add an IS_ENABLED() check to fall back to the default behavior (sysctl_devconf_inherit_init_net=0) here. Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04phy: mdio-mux: fix Kconfig dependenciesArnd Bergmann1-1/+1
MDIO_BUS_MUX can only be selected if OF_MDIO is already turned on: WARNING: unmet direct dependencies detected for MDIO_BUS_MUX Depends on [n]: NETDEVICES [=y] && MDIO_BUS [=m] && OF_MDIO [=n] Selected by [m]: - MDIO_BUS_MUX_MULTIPLEXER [=m] && NETDEVICES [=y] && MDIO_BUS [=m] && OF [=y] Fixes: 7865ad6551c9 ("drivers: net: phy: mdio-mux: Add support for Generic Mux controls") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_anegHeiner Kallweit1-9/+8
As can be seen from the usage of the return value, we should use phy_modify_mmd_changed() here. Fixes: 9a5dc8af4416 ("net: phy: add genphy_c45_an_config_aneg") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA frameworkHeiner Kallweit1-0/+1
In the original patch I missed to add mv88e6xxx_ports_cmode_init() to the second probe function, the one for the new DSA framework. Fixes: ed8fe20205ac ("net: dsa: mv88e6xxx: prevent interrupt storm caused by mv88e6390x_port_set_cmode") Reported-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04selftest/net: Remove duplicate headerSouptick Joarder1-1/+0
Remove duplicate header which is included twice. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79Kai-Heng Feng1-1/+23
Some sky2 chips fire IRQ after S3, before the driver is fully resumed: [ 686.804877] do_IRQ: 1.37 No irq handler for vector This is likely a platform bug that device isn't fully quiesced during S3. Use MSI-X, maskable MSI or INTx can prevent this issue from happening. Since MSI-X and maskable MSI are not supported by this device, fallback to use INTx on affected platforms. BugLink: https://bugs.launchpad.net/bugs/1807259 BugLink: https://bugs.launchpad.net/bugs/1809843 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04net/mlx5e: Update tx reporter status in case channels were successfully openedEran Ben Elisha1-0/+4
Once channels were successfully opened, update tx reporter health state to healthy. This is needed for the following scenario: - SQ has an un-recovered error reported to the devlink health, resulting tx reporter state to be error. - Current channels (including this SQ) are closed - New channels are opened After that flow, the original error was "solved", and tx reporter state should be healthy. However, as it was resolved as a side effect, and not via tx reporter recover method, driver needs to inform devlink health about it. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04devlink: Add support for direct reporter health state updateEran Ben Elisha3-5/+62
It is possible that a reporter state will be updated due to a recover flow which is not triggered by a devlink health related operation, but as a side effect of some other operation in the system. Expose devlink health API for a direct update of a reporter status. Move devlink_health_reporter_state enum definition to devlink.h so it could be used from drivers as a parameter of devlink_health_reporter_state_update. In addition, add trace_devlink_health_reporter_state_update to provide user notification for reporter state change. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04devlink: Update reporter state to error even if recover abortedEran Ben Elisha1-1/+4
If devlink_health_report() aborted the recover flow due to grace period checker, it left the reporter status as DEVLINK_HEALTH_REPORTER_STATE_HEALTHY, which is a bug. Fix that by always setting the reporter state to DEVLINK_HEALTH_REPORTER_STATE_ERROR prior to running the checker mentioned above. In addition, save the previous health_state in a temporary variable, then use it in the abort check comparison instead of using reporter->health_state which might be already changed. Fixes: c8e1da0bf923 ("devlink: Add health report functionality") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04sctp: call iov_iter_revert() after sending ABORTXin Long1-0/+1
The user msg is also copied to the abort packet when doing SCTP_ABORT in sctp_sendmsg_check_sflags(). When SCTP_SENDALL is set, iov_iter_revert() should have been called for sending abort on the next asoc with copying this msg. Otherwise, memcpy_from_msg() in sctp_make_abort_user() will fail and return error. Fixes: 4910280503f3 ("sctp: add support for snd flag SCTP_SENDALL process in sendmsg") Reported-by: Ying Xu <yinxu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04team: Free BPF filter when unregistering netdevIdo Schimmel1-0/+15
When team is used in loadbalance mode a BPF filter can be used to provide a hash which will determine the Tx port. When the netdev is later unregistered the filter is not freed which results in memory leaks [1]. Fix by freeing the program and the corresponding filter when unregistering the netdev. [1] unreferenced object 0xffff8881dbc47cc8 (size 16): comm "teamd", pid 3068, jiffies 4294997779 (age 438.247s) hex dump (first 16 bytes): a3 00 6b 6b 6b 6b 6b 6b 88 a5 82 e1 81 88 ff ff ..kkkkkk........ backtrace: [<000000008a3b47e3>] team_nl_cmd_options_set+0x88f/0x11b0 [<00000000c4f4f27e>] genl_family_rcv_msg+0x78f/0x1080 [<00000000610ef838>] genl_rcv_msg+0xca/0x170 [<00000000a281df93>] netlink_rcv_skb+0x132/0x380 [<000000004d9448a2>] genl_rcv+0x29/0x40 [<000000000321b2f4>] netlink_unicast+0x4c0/0x690 [<000000008c25dffb>] netlink_sendmsg+0x929/0xe10 [<00000000068298c5>] sock_sendmsg+0xc8/0x110 [<0000000082a61ff0>] ___sys_sendmsg+0x77a/0x8f0 [<00000000663ae29d>] __sys_sendmsg+0xf7/0x250 [<0000000027c5f11a>] do_syscall_64+0x14d/0x610 [<000000006cfbc8d3>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<00000000e23197e2>] 0xffffffffffffffff unreferenced object 0xffff8881e182a588 (size 2048): comm "teamd", pid 3068, jiffies 4294997780 (age 438.247s) hex dump (first 32 bytes): 20 00 00 00 02 00 00 00 30 00 00 00 28 f0 ff ff .......0...(... 07 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 ........(....... backtrace: [<000000002daf01fb>] lb_bpf_func_set+0x45c/0x6d0 [<000000008a3b47e3>] team_nl_cmd_options_set+0x88f/0x11b0 [<00000000c4f4f27e>] genl_family_rcv_msg+0x78f/0x1080 [<00000000610ef838>] genl_rcv_msg+0xca/0x170 [<00000000a281df93>] netlink_rcv_skb+0x132/0x380 [<000000004d9448a2>] genl_rcv+0x29/0x40 [<000000000321b2f4>] netlink_unicast+0x4c0/0x690 [<000000008c25dffb>] netlink_sendmsg+0x929/0xe10 [<00000000068298c5>] sock_sendmsg+0xc8/0x110 [<0000000082a61ff0>] ___sys_sendmsg+0x77a/0x8f0 [<00000000663ae29d>] __sys_sendmsg+0xf7/0x250 [<0000000027c5f11a>] do_syscall_64+0x14d/0x610 [<000000006cfbc8d3>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<00000000e23197e2>] 0xffffffffffffffff Fixes: 01d7f30a9f96 ("team: add loadbalance mode") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Amit Cohen <amitc@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04ip6mr: Do not call __IP6_INC_STATS() from preemptible contextIdo Schimmel1-4/+4
Similar to commit 44f49dd8b5a6 ("ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context."), we cannot assume preemption is disabled when incrementing the counter and accessing a per-CPU variable. Preemption can be enabled when we add a route in process context that corresponds to packets stored in the unresolved queue, which are then forwarded using this route [1]. Fix this by using IP6_INC_STATS() which takes care of disabling preemption on architectures where it is needed. [1] [ 157.451447] BUG: using __this_cpu_add() in preemptible [00000000] code: smcrouted/2314 [ 157.460409] caller is ip6mr_forward2+0x73e/0x10e0 [ 157.460434] CPU: 3 PID: 2314 Comm: smcrouted Not tainted 5.0.0-rc7-custom-03635-g22f2712113f1 #1336 [ 157.460449] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ 157.460461] Call Trace: [ 157.460486] dump_stack+0xf9/0x1be [ 157.460553] check_preemption_disabled+0x1d6/0x200 [ 157.460576] ip6mr_forward2+0x73e/0x10e0 [ 157.460705] ip6_mr_forward+0x9a0/0x1510 [ 157.460771] ip6mr_mfc_add+0x16b3/0x1e00 [ 157.461155] ip6_mroute_setsockopt+0x3cb/0x13c0 [ 157.461384] do_ipv6_setsockopt.isra.8+0x348/0x4060 [ 157.462013] ipv6_setsockopt+0x90/0x110 [ 157.462036] rawv6_setsockopt+0x4a/0x120 [ 157.462058] __sys_setsockopt+0x16b/0x340 [ 157.462198] __x64_sys_setsockopt+0xbf/0x160 [ 157.462220] do_syscall_64+0x14d/0x610 [ 157.462349] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 0912ea38de61 ("[IPV6] MROUTE: Add stats in multicast routing module method ip6_mr_forward().") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04isdn: mISDN: Fix potential NULL pointer dereference of kzallocAditya Pakki1-0/+3
Allocating memory via kzalloc for phi may fail and causes a NULL pointer dereference. This patch avoids such a scenario. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYsHeiner Kallweit4-14/+52
If an external PHY is connected via SGMII and uses in-band signalling then the auto-negotiated values aren't propagated to the port, resulting in a broken link. See discussion in [0]. This patch adds this propagation. We need to call mv88e6xxx_port_setup_mac(), therefore export it from chip.c. Successfully tested on a ZII DTU with 88E6390 switch and an Aquantia AQCS109 PHY connected via SGMII to port 9. [0] https://marc.info/?t=155130287200001&r=1&w=2 Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04get rid of legacy 'get_ds()' functionLinus Torvalds33-52/+16
Every in-kernel use of this function defined it to KERNEL_DS (either as an actual define, or as an inline function). It's an entirely historical artifact, and long long long ago used to actually read the segment selector valueof '%ds' on x86. Which in the kernel is always KERNEL_DS. Inspired by a patch from Jann Horn that just did this for a very small subset of users (the ones in fs/), along with Al who suggested a script. I then just took it to the logical extreme and removed all the remaining gunk. Roughly scripted with git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/' git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d' plus manual fixups to remove a few unusual usage patterns, the couple of inline function cases and to fix up a comment that had become stale. The 'get_ds()' function remains in an x86 kvm selftest, since in user space it actually does something relevant. Inspired-by: Jann Horn <jannh@google.com> Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-04aio: simplify - and fix - fget/fput for io_submit()Linus Torvalds2-44/+36
Al Viro root-caused a race where the IOCB_CMD_POLL handling of fget/fput() could cause us to access the file pointer after it had already been freed: "In more details - normally IOCB_CMD_POLL handling looks so: 1) io_submit(2) allocates aio_kiocb instance and passes it to aio_poll() 2) aio_poll() resolves the descriptor to struct file by req->file = fget(iocb->aio_fildes) 3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that aio_kiocb to 2 (bumps by 1, that is). 4) aio_poll() calls vfs_poll(). After sanity checks (basically, "poll_wait() had been called and only once") it locks the queue. That's what the extra reference to iocb had been for - we know we can safely access it. 5) With queue locked, we check if ->woken has already been set to true (by aio_poll_wake()) and, if it had been, we unlock the queue, drop a reference to aio_kiocb and bugger off - at that point it's a responsibility to aio_poll_wake() and the stuff called/scheduled by it. That code will drop the reference to file in req->file, along with the other reference to our aio_kiocb. 6) otherwise, we see whether we need to wait. If we do, we unlock the queue, drop one reference to aio_kiocb and go away - eventual wakeup (or cancel) will deal with the reference to file and with the other reference to aio_kiocb 7) otherwise we remove ourselves from waitqueue (still under the queue lock), so that wakeup won't get us. No async activity will be happening, so we can safely drop req->file and iocb ourselves. If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb won't get freed under us, so we can do all the checks and locking safely. And we don't touch ->file if we detect that case. However, vfs_poll() most certainly *does* touch the file it had been given. So wakeup coming while we are still in ->poll() might end up doing fput() on that file. That case is not too rare, and usually we are saved by the still present reference from descriptor table - that fput() is not the final one. But if another thread closes that descriptor right after our fget() and wakeup does happen before ->poll() returns, we are in trouble - final fput() done while we are in the middle of a method: Al also wrote a patch to take an extra reference to the file descriptor to fix this, but I instead suggested we just streamline the whole file pointer handling by submit_io() so that the generic aio submission code simply keeps the file pointer around until the aio has completed. Fixes: bfe4037e722e ("aio: implement IOCB_CMD_POLL") Acked-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-04cxgb4/chtls: Prefix adapter flags with CXGB4Arjun Vynipadath12-105/+106
Some of these macros were conflicting with global namespace, hence prefixing them with CXGB4. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04net-sysfs: Switch to bitmap_zalloc()Andy Shevchenko1-7/+5
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04mellanox: Switch to bitmap_zalloc()Andy Shevchenko5-22/+16
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-04x86-64: add warning for non-canonical user access address dereferencesLinus Torvalds1-0/+1
This adds a warning (once) for any kernel dereference that has a user exception handler, but accesses a non-canonical address. It basically is a simpler - and more limited - version of commit 9da3f2b74054 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") that got reverted. Note that unlike that original commit, this only causes a warning, because there are real situations where we currently can do this (notably speculative argument fetching for uprobes etc). Also, unlike that original commit, this _only_ triggers for #GP accesses, so the cases of valid kernel pointers that cross into a non-mapped page aren't affected. The intent of this is two-fold: - the uprobe/tracing accesses really do need to be more careful. In particular, from a portability standpoint it's just wrong to think that "a pointer is a pointer", and use the same logic for any random pointer value you find on the stack. It may _work_ on x86-64, but it doesn't necessarily work on other architectures (where the same pointer value can be either a kernel pointer _or_ a user pointer, and you really need to be much more careful in how you try to access it) The warning can hopefully end up being a reminder that just any random pointer access won't do. - Kees in particular wanted a way to actually report invalid uses of wild pointers to user space accessors, instead of just silently failing them. Automated fuzzers want a way to get reports if the kernel ever uses invalid values that the fuzzer fed it. The non-canonical address range is a fair chunk of the address space, and with this you can teach syzkaller to feed in invalid pointer values and find cases where we do not properly validate user addresses (possibly due to bad uses of "set_fs()"). Acked-by: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-04bpf: add test cases for non-pointer sanitiation logicDaniel Borkmann1-1/+43
Add two additional tests for further asserting the BPF_ALU_NON_POINTER logic with cases that were missed previously. Cc: Marek Majkowski <marek@cloudflare.com> Cc: Arthur Fabre <afabre@cloudflare.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-03-03mlxsw: i2c: Extend initialization by querying resources dataVadim Pasternak2-6/+17
Extend initialization flow by query requests for chip resources data in order to obtain chip's specific capabilities, like the number of ports. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03mlxsw: i2c: Extend input parameters list of command APIVadim Pasternak1-19/+101
Extend input parameters list of command API in mlxsw_i2c_cmd() in order to support initialization commands. Up until now, only access commands were supported by I2C driver. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03mlxsw: i2c: Modify input parameter name in initialization APIVadim Pasternak1-1/+1
Change input parameter name "resource" to "res" in mlxsw_i2c_init() in order to align it with mlxsw_pci_init(). Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>