aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-03-16net: Fix vlan untag for bridge and vlan_dev with reorder_hdr offToshiaki Makita1-2/+5
When we have a bridge with vlan_filtering on and a vlan device on top of it, packets would be corrupted in skb_vlan_untag() called from br_dev_xmit(). The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(), which makes use of skb->mac_len. In this function mac_len is meant for handling rx path with vlan devices with reorder_header disabled, but in tx path mac_len is typically 0 and cannot be used, which is the problem in this case. The current code even does not properly handle rx path (skb_vlan_untag() called from __netif_receive_skb_core()) with reorder_header off actually. In rx path single tag case, it works as follows: - Before skb_reorder_vlan_header() mac_header data v v +-------------------+-------------+------+---- | ETH | VLAN | ETH | | ADDRS | TPID | TCI | TYPE | +-------------------+-------------+------+---- <-------- mac_len ---------> <-------------> to be removed - After skb_reorder_vlan_header() mac_header data v v +-------------------+------+---- | ETH | ETH | | ADDRS | TYPE | +-------------------+------+---- <-------- mac_len ---------> This is ok, but in rx double tag case, it corrupts packets: - Before skb_reorder_vlan_header() mac_header data v v +-------------------+-------------+-------------+------+---- | ETH | VLAN | VLAN | ETH | | ADDRS | TPID | TCI | TPID | TCI | TYPE | +-------------------+-------------+-------------+------+---- <--------------- mac_len ----------------> <-------------> should be removed <---------------------------> actually will be removed - After skb_reorder_vlan_header() mac_header data v v +-------------------+------+---- | ETH | ETH | | ADDRS | TYPE | +-------------------+------+---- <--------------- mac_len ----------------> So, two of vlan tags are both removed while only inner one should be removed and mac_header (and mac_len) is broken. skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2), so use skb->data and skb->mac_header to calculate the right offset. Reported-by: Brandon Carpenter <brandon.carpenter@cypherpath.com> Fixes: a6e18ff11170 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-15net sched actions: return explicit error when tunnel_key mode is not specifiedRoman Mashak1-0/+1
If set/unset mode of the tunnel_key action is not provided, ->init() still returns 0, and the caller proceeds with bogus 'struct tc_action *' object, this results in crash: % tc actions add action tunnel_key src_ip 1.1.1.1 dst_ip 2.2.2.1 id 7 index 1 [ 35.805515] general protection fault: 0000 [#1] SMP PTI [ 35.806161] Modules linked in: act_tunnel_key kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd serio_raw [ 35.808233] CPU: 1 PID: 428 Comm: tc Not tainted 4.16.0-rc4+ #286 [ 35.808929] RIP: 0010:tcf_action_init+0x90/0x190 [ 35.809457] RSP: 0018:ffffb8edc068b9a0 EFLAGS: 00010206 [ 35.810053] RAX: 1320c000000a0003 RBX: 0000000000000001 RCX: 0000000000000000 [ 35.810866] RDX: 0000000000000070 RSI: 0000000000007965 RDI: ffffb8edc068b910 [ 35.811660] RBP: ffffb8edc068b9d0 R08: 0000000000000000 R09: ffffb8edc068b808 [ 35.812463] R10: ffffffffc02bf040 R11: 0000000000000040 R12: ffffb8edc068bb38 [ 35.813235] R13: 0000000000000000 R14: 0000000000000000 R15: ffffb8edc068b910 [ 35.814006] FS: 00007f3d0d8556c0(0000) GS:ffff91d1dbc40000(0000) knlGS:0000000000000000 [ 35.814881] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 35.815540] CR2: 000000000043f720 CR3: 0000000019248001 CR4: 00000000001606a0 [ 35.816457] Call Trace: [ 35.817158] tc_ctl_action+0x11a/0x220 [ 35.817795] rtnetlink_rcv_msg+0x23d/0x2e0 [ 35.818457] ? __slab_alloc+0x1c/0x30 [ 35.819079] ? __kmalloc_node_track_caller+0xb1/0x2b0 [ 35.819544] ? rtnl_calcit.isra.30+0xe0/0xe0 [ 35.820231] netlink_rcv_skb+0xce/0x100 [ 35.820744] netlink_unicast+0x164/0x220 [ 35.821500] netlink_sendmsg+0x293/0x370 [ 35.822040] sock_sendmsg+0x30/0x40 [ 35.822508] ___sys_sendmsg+0x2c5/0x2e0 [ 35.823149] ? pagecache_get_page+0x27/0x220 [ 35.823714] ? filemap_fault+0xa2/0x640 [ 35.824423] ? page_add_file_rmap+0x108/0x200 [ 35.825065] ? alloc_set_pte+0x2aa/0x530 [ 35.825585] ? finish_fault+0x4e/0x70 [ 35.826140] ? __handle_mm_fault+0xbc1/0x10d0 [ 35.826723] ? __sys_sendmsg+0x41/0x70 [ 35.827230] __sys_sendmsg+0x41/0x70 [ 35.827710] do_syscall_64+0x68/0x120 [ 35.828195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 35.828859] RIP: 0033:0x7f3d0ca4da67 [ 35.829331] RSP: 002b:00007ffc9f284338 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 35.830304] RAX: ffffffffffffffda RBX: 00007ffc9f284460 RCX: 00007f3d0ca4da67 [ 35.831247] RDX: 0000000000000000 RSI: 00007ffc9f2843b0 RDI: 0000000000000003 [ 35.832167] RBP: 000000005aa6a7a9 R08: 0000000000000001 R09: 0000000000000000 [ 35.833075] R10: 00000000000005f1 R11: 0000000000000246 R12: 0000000000000000 [ 35.833997] R13: 00007ffc9f2884c0 R14: 0000000000000001 R15: 0000000000674640 [ 35.834923] Code: 24 30 bb 01 00 00 00 45 31 f6 eb 5e 8b 50 08 83 c2 07 83 e2 fc 83 c2 70 49 8b 07 48 8b 40 70 48 85 c0 74 10 48 89 14 24 4c 89 ff <ff> d0 48 8b 14 24 48 01 c2 49 01 d6 45 85 ed 74 05 41 83 47 2c [ 35.837442] RIP: tcf_action_init+0x90/0x190 RSP: ffffb8edc068b9a0 [ 35.838291] ---[ end trace a095c06ee4b97a26 ]--- Fixes: d0f6dd8a914f ("net/sched: Introduce act_tunnel_key") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-15net/smc: simplify wait when closing listen socketUrsula Braun2-26/+3
Closing of a listen socket wakes up kernel_accept() of smc_tcp_listen_worker(), and then has to wait till smc_tcp_listen_worker() gives up the internal clcsock. The wait logic introduced with commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") might wait longer than necessary. This patch implements the idea to implement the wait just with flush_work(), and gets rid of the extra smc_close_wait_listen_clcsock() function. Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Reported-by: Hans Wippel <hwippel@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtuSabrina Dubroca2-7/+20
Prior to the rework of PMTU information storage in commit 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer."), when a PMTU event advertising a PMTU smaller than net.ipv4.route.min_pmtu was received, we would disable setting the DF flag on packets by locking the MTU metric, and set the PMTU to net.ipv4.route.min_pmtu. Since then, we don't disable DF, and set PMTU to net.ipv4.route.min_pmtu, so the intermediate router that has this link with a small MTU will have to drop the packets. This patch reestablishes pre-2.6.39 behavior by splitting rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu. rt_mtu_locked indicates that we shouldn't set the DF bit on that path, and is checked in ip_dont_fragment(). One possible workaround is to set net.ipv4.route.min_pmtu to a value low enough to accommodate the lowest MTU encountered. Fixes: 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-14net: use skb_to_full_sk() in skb_update_prio()Eric Dumazet1-7/+15
Andrei Vagin reported a KASAN: slab-out-of-bounds error in skb_update_prio() Since SYNACK might be attached to a request socket, we need to get back to the listener socket. Since this listener is manipulated without locks, add const qualifiers to sock_cgroup_prioidx() so that the const can also be used in skb_update_prio() Also add the const qualifier to sock_cgroup_classid() for consistency. Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-13Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecDavid S. Miller11-33/+50
Steffen Klassert says: ==================== pull request (net): ipsec 2018-03-13 1) Refuse to insert 32 bit userspace socket policies on 64 bit systems like we do it for standard policies. We don't have a compat layer, so inserting socket policies from 32 bit userspace will lead to a broken configuration. 2) Make the policy hold queue work without the flowcache. Dummy bundles are not chached anymore, so we need to generate a new one on each lookup as long as the SAs are not yet in place. 3) Fix the validation of the esn replay attribute. The The sanity check in verify_replay() is bypassed if the XFRM_STATE_ESN flag is not set. Fix this by doing the sanity check uncoditionally. From Florian Westphal. 4) After most of the dst_entry garbage collection code is removed, we may leak xfrm_dst entries as they are neither cached nor tracked somewhere. Fix this by reusing the 'uncached_list' to track xfrm_dst entries too. From Xin Long. 5) Fix a rcu_read_lock/rcu_read_unlock imbalance in xfrm_get_tos() From Xin Long. 6) Fix an infinite loop in xfrm_get_dst_nexthop. On transport mode we fetch the child dst_entry after we continue, so this pointer is never updated. Fix this by fetching it before we continue. 7) Fix ESN sequence number gap after IPsec GSO packets. We accidentally increment the sequence number counter on the xfrm_state by one packet too much in the ESN case. Fix this by setting the sequence number to the correct value. 8) Reset the ethernet protocol after decapsulation only if a mac header was set. Otherwise it breaks configurations with TUN devices. From Yossi Kuperman. 9) Fix __this_cpu_read() usage in preemptible code. Use this_cpu_read() instead in ipcomp_alloc_tfms(). From Greg Hackmann. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-13net: xfrm: use preempt-safe this_cpu_read() in ipcomp_alloc_tfms()Greg Hackmann1-1/+1
f7c83bcbfaf5 ("net: xfrm: use __this_cpu_read per-cpu helper") added a __this_cpu_read() call inside ipcomp_alloc_tfms(). At the time, __this_cpu_read() required the caller to either not care about races or to handle preemption/interrupt issues. 3.15 tightened the rules around some per-cpu operations, and now __this_cpu_read() should never be used in a preemptible context. On 3.15 and later, we need to use this_cpu_read() instead. syzkaller reported this leading to the following kernel BUG while fuzzing sendmsg: BUG: using __this_cpu_read() in preemptible [00000000] code: repro/3101 caller is ipcomp_init_state+0x185/0x990 CPU: 3 PID: 3101 Comm: repro Not tainted 4.16.0-rc4-00123-g86f84779d8e9 #154 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0xb9/0x115 check_preemption_disabled+0x1cb/0x1f0 ipcomp_init_state+0x185/0x990 ? __xfrm_init_state+0x876/0xc20 ? lock_downgrade+0x5e0/0x5e0 ipcomp4_init_state+0xaa/0x7c0 __xfrm_init_state+0x3eb/0xc20 xfrm_init_state+0x19/0x60 pfkey_add+0x20df/0x36f0 ? pfkey_broadcast+0x3dd/0x600 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_seq_stop+0x80/0x80 ? __skb_clone+0x236/0x750 ? kmem_cache_alloc+0x1f6/0x260 ? pfkey_sock_destruct+0x340/0x340 ? pfkey_process+0x62a/0x6f0 pfkey_process+0x62a/0x6f0 ? pfkey_send_new_mapping+0x11c0/0x11c0 ? mutex_lock_io_nested+0x1390/0x1390 pfkey_sendmsg+0x383/0x750 ? dump_sp+0x430/0x430 sock_sendmsg+0xc0/0x100 ___sys_sendmsg+0x6c8/0x8b0 ? copy_msghdr_from_user+0x3b0/0x3b0 ? pagevec_lru_move_fn+0x144/0x1f0 ? find_held_lock+0x32/0x1c0 ? do_huge_pmd_anonymous_page+0xc43/0x11e0 ? lock_downgrade+0x5e0/0x5e0 ? get_kernel_page+0xb0/0xb0 ? _raw_spin_unlock+0x29/0x40 ? do_huge_pmd_anonymous_page+0x400/0x11e0 ? __handle_mm_fault+0x553/0x2460 ? __fget_light+0x163/0x1f0 ? __sys_sendmsg+0xc7/0x170 __sys_sendmsg+0xc7/0x170 ? SyS_shutdown+0x1a0/0x1a0 ? __do_page_fault+0x5a0/0xca0 ? lock_downgrade+0x5e0/0x5e0 SyS_sendmsg+0x27/0x40 ? __sys_sendmsg+0x170/0x170 do_syscall_64+0x19f/0x640 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x7f0ee73dfb79 RSP: 002b:00007ffe14fc15a8 EFLAGS: 00000207 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0ee73dfb79 RDX: 0000000000000000 RSI: 00000000208befc8 RDI: 0000000000000004 RBP: 00007ffe14fc15b0 R08: 00007ffe14fc15c0 R09: 00007ffe14fc15c0 R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000400440 R13: 00007ffe14fc16b0 R14: 0000000000000000 R15: 0000000000000000 Signed-off-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-12net: dsa: Fix dsa_is_user_port() test inversionFlorian Fainelli1-1/+1
During the conversion to dsa_is_user_port(), a condition ended up being reversed, which would prevent the creation of any user port when using the legacy binding and/or platform data, fix that. Fixes: 4a5b85ffe2a0 ("net: dsa: use dsa_is_user_port everywhere") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12l2tp: fix races with ipv4-mapped ipv6 addressesPaolo Abeni2-23/+18
The l2tp_tunnel_create() function checks for v4mapped ipv6 sockets and cache that flag, so that l2tp core code can reusing it at xmit time. If the socket is provided by the userspace, the connection status of the tunnel sockets can change between the tunnel creation and the xmit call, so that syzbot is able to trigger the following splat: BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:192 [inline] BUG: KASAN: use-after-free in ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264 Read of size 8 at addr ffff8801bd949318 by task syz-executor4/23448 CPU: 0 PID: 23448 Comm: syz-executor4 Not tainted 4.16.0-rc4+ #65 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23c/0x360 mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 ip6_dst_idev include/net/ip6_fib.h:192 [inline] ip6_xmit+0x1f76/0x2260 net/ipv6/ip6_output.c:264 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 l2tp_xmit_core net/l2tp/l2tp_core.c:1053 [inline] l2tp_xmit_skb+0x105f/0x1410 net/l2tp/l2tp_core.c:1148 pppol2tp_sendmsg+0x470/0x670 net/l2tp/l2tp_ppp.c:341 sock_sendmsg_nosec net/socket.c:630 [inline] sock_sendmsg+0xca/0x110 net/socket.c:640 ___sys_sendmsg+0x767/0x8b0 net/socket.c:2046 __sys_sendmsg+0xe5/0x210 net/socket.c:2080 SYSC_sendmsg net/socket.c:2091 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2087 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x453e69 RSP: 002b:00007f819593cc68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007f819593d6d4 RCX: 0000000000453e69 RDX: 0000000000000081 RSI: 000000002037ffc8 RDI: 0000000000000004 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000004c3 R14: 00000000006f72e8 R15: 0000000000000000 This change addresses the issues: * explicitly checking for TCP_ESTABLISHED for user space provided sockets * dropping the v4mapped flag usage - it can become outdated - and explicitly invoking ipv6_addr_v4mapped() instead The issue is apparently there since ancient times. v1 -> v2: (many thanks to Guillaume) - with csum issue introduced in v1 - replace pr_err with pr_debug - fix build issue with IPV6 disabled - move l2tp_sk_is_v4mapped in l2tp_core.c v2 -> v3: - don't update inet_daddr for v4mapped address, unneeded - drop rendundant check at creation time Reported-and-tested-by: syzbot+92fa328176eb07e4ac1a@syzkaller.appspotmail.com Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12net: ipv6: keep sk status consistent after datagram connect failurePaolo Abeni1-7/+14
On unsuccesful ip6_datagram_connect(), if the failure is caused by ip6_datagram_dst_update(), the sk peer information are cleared, but the sk->sk_state is preserved. If the socket was already in an established status, the overall sk status is inconsistent and fouls later checks in datagram code. Fix this saving the old peer information and restoring them in case of failure. This also aligns ipv6 datagram connect() behavior with ipv4. v1 -> v2: - added missing Fixes tag Fixes: 85cb73ff9b74 ("net: ipv6: reset daddr and dport in sk if connect() fails") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller7-11/+84
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Fixed hashtable representation doesn't support timeout flag, skip it otherwise rules to add elements from the packet fail bogusly fail with EOPNOTSUPP. 2) Fix bogus error with 32-bits ebtables userspace and 64-bits kernel, patch from Florian Westphal. 3) Sanitize proc names in several x_tables extensions, also from Florian. 4) Add sanitization to ebt_among wormhash logic, from Florian. 5) Missing release of hook array in flowtable. ====================
2018-03-12sock_diag: request _diag module only when the family or proto has been registeredXin Long4-10/+31
Now when using 'ss' in iproute, kernel would try to load all _diag modules, which also causes corresponding family and proto modules to be loaded as well due to module dependencies. Like after running 'ss', sctp, dccp, af_packet (if it works as a module) would be loaded. For example: $ lsmod|grep sctp $ ss $ lsmod|grep sctp sctp_diag 16384 0 sctp 323584 5 sctp_diag inet_diag 24576 4 raw_diag,tcp_diag,sctp_diag,udp_diag libcrc32c 16384 3 nf_conntrack,nf_nat,sctp As these family and proto modules are loaded unintentionally, it could cause some problems, like: - Some debug tools use 'ss' to collect the socket info, which loads all those diag and family and protocol modules. It's noisy for identifying issues. - Users usually expect to drop sctp init packet silently when they have no sense of sctp protocol instead of sending abort back. - It wastes resources (especially with multiple netns), and SCTP module can't be unloaded once it's loaded. ... In short, it's really inappropriate to have these family and proto modules loaded unexpectedly when just doing debugging with inet_diag. This patch is to introduce sock_load_diag_module() where it loads the _diag module only when it's corresponding family or proto has been already registered. Note that we can't just load _diag module without the family or proto loaded, as some symbols used in _diag module are from the family or proto module. v1->v2: - move inet proto check to inet_diag to avoid a compiling err. v2->v3: - define sock_load_diag_module in sock.c and export one symbol only. - improve the changelog. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Phil Sutter <phil@nwl.cc> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-11openvswitch: meter: fix the incorrect calculation of max delta_tzhangliping1-3/+9
Max delat_t should be the full_bucket/rate instead of the full_bucket. Also report EINVAL if the rate is zero. Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure") Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: zhangliping <zhangliping02@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-11netfilter: nf_tables: release flowtable hooksPablo Neira Ayuso1-0/+1
Otherwise we leak this array. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-11netfilter: bridge: ebt_among: add more missing match size checksFlorian Westphal1-0/+34
ebt_among is special, it has a dynamic match size and is exempt from the central size checks. commit c4585a2823edf ("bridge: ebt_among: add missing match size checks") added validation for pool size, but missed fact that the macros ebt_among_wh_src/dst can already return out-of-bound result because they do not check value of wh_src/dst_ofs (an offset) vs. the size of the match that userspace gave to us. v2: check that offset has correct alignment. Paolo Abeni points out that we should also check that src/dst wormhash arrays do not overlap, and src + length lines up with start of dst (or vice versa). v3: compact wormhash_sizes_valid() part NB: Fixes tag is intentionally wrong, this bug exists from day one when match was added for 2.6 kernel. Tag is there so stable maintainers will notice this one too. Tested with same rules from the earlier patch. Fixes: c4585a2823edf ("bridge: ebt_among: add missing match size checks") Reported-by: <syzbot+bdabab6f1983a03fc009@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-11netfilter: x_tables: add and use xt_check_proc_nameFlorian Westphal3-9/+43
recent and hashlimit both create /proc files, but only check that name is 0 terminated. This can trigger WARN() from procfs when name is "" or "/". Add helper for this and then use it for both. Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: <syzbot+0502b00edac2a0680b61@syzkaller.appspotmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-11netfilter: ebtables: fix erroneous reject of last ruleFlorian Westphal1-1/+5
The last rule in the blob has next_entry offset that is same as total size. This made "ebtables32 -A OUTPUT -d de:ad:be:ef:01:02" fail on 64 bit kernel. Fixes: b71812168571fa ("netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-09ip6erspan: make sure enough headroom at xmit.William Tu1-0/+3
The patch adds skb_cow_header() to ensure enough headroom at ip6erspan_tunnel_xmit before pushing the erspan header to the skb. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09ip6erspan: improve error handling for erspan version number.William Tu1-0/+2
When users fill in incorrect erspan version number through the struct erspan_metadata uapi, current code skips pushing the erspan header but continue pushing the gre header, which is incorrect. The patch fixes it by returning error. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09ip6gre: add erspan v2 to tunnel lookupWilliam Tu1-1/+2
The patch adds the erspan v2 proto in ip6gre_tunnel_lookup so the erspan v2 tunnel can be found correctly. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09net: use skb_is_gso_sctp() instead of open-codingDaniel Axtens5-8/+8
As well as the basic conversion, I noticed that a lot of the SCTP code checks gso_type without first checking skb_is_gso() so I have added that where appropriate. Also, document the helper. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event()Eric Dumazet1-4/+8
A tun device type can trivially be set to arbitrary value using TUNSETLINK ioctl(). Therefore, lowpan_device_event() must really check that ieee802154_ptr is not NULL. Fixes: 2c88b5283f60d ("ieee802154: 6lowpan: remove check on null") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-09ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option()Lorenzo Bianconi1-1/+2
Fix the following slab-out-of-bounds kasan report in ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not linear and the accessed data are not in the linear data region of orig_skb. [ 1503.122508] ================================================================== [ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990 [ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932 [ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124 [ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 [ 1503.123527] Call Trace: [ 1503.123579] <IRQ> [ 1503.123638] print_address_description+0x6e/0x280 [ 1503.123849] kasan_report+0x233/0x350 [ 1503.123946] memcpy+0x1f/0x50 [ 1503.124037] ndisc_send_redirect+0x94e/0x990 [ 1503.125150] ip6_forward+0x1242/0x13b0 [...] [ 1503.153890] Allocated by task 1932: [ 1503.153982] kasan_kmalloc+0x9f/0xd0 [ 1503.154074] __kmalloc_track_caller+0xb5/0x160 [ 1503.154198] __kmalloc_reserve.isra.41+0x24/0x70 [ 1503.154324] __alloc_skb+0x130/0x3e0 [ 1503.154415] sctp_packet_transmit+0x21a/0x1810 [ 1503.154533] sctp_outq_flush+0xc14/0x1db0 [ 1503.154624] sctp_do_sm+0x34e/0x2740 [ 1503.154715] sctp_primitive_SEND+0x57/0x70 [ 1503.154807] sctp_sendmsg+0xaa6/0x1b10 [ 1503.154897] sock_sendmsg+0x68/0x80 [ 1503.154987] ___sys_sendmsg+0x431/0x4b0 [ 1503.155078] __sys_sendmsg+0xa4/0x130 [ 1503.155168] do_syscall_64+0x171/0x3f0 [ 1503.155259] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 1503.155436] Freed by task 1932: [ 1503.155527] __kasan_slab_free+0x134/0x180 [ 1503.155618] kfree+0xbc/0x180 [ 1503.155709] skb_release_data+0x27f/0x2c0 [ 1503.155800] consume_skb+0x94/0xe0 [ 1503.155889] sctp_chunk_put+0x1aa/0x1f0 [ 1503.155979] sctp_inq_pop+0x2f8/0x6e0 [ 1503.156070] sctp_assoc_bh_rcv+0x6a/0x230 [ 1503.156164] sctp_inq_push+0x117/0x150 [ 1503.156255] sctp_backlog_rcv+0xdf/0x4a0 [ 1503.156346] __release_sock+0x142/0x250 [ 1503.156436] release_sock+0x80/0x180 [ 1503.156526] sctp_sendmsg+0xbb0/0x1b10 [ 1503.156617] sock_sendmsg+0x68/0x80 [ 1503.156708] ___sys_sendmsg+0x431/0x4b0 [ 1503.156799] __sys_sendmsg+0xa4/0x130 [ 1503.156889] do_syscall_64+0x171/0x3f0 [ 1503.156980] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 1503.157158] The buggy address belongs to the object at ffff8800298ab600 which belongs to the cache kmalloc-1024 of size 1024 [ 1503.157444] The buggy address is located 176 bytes inside of 1024-byte region [ffff8800298ab600, ffff8800298aba00) [ 1503.157702] The buggy address belongs to the page: [ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 1503.158053] flags: 0x4000000000008100(slab|head) [ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e [ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000 [ 1503.158523] page dumped because: kasan: bad access detected [ 1503.158698] Memory state around the buggy address: [ 1503.158816] ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1503.158988] ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1503.159338] ^ [ 1503.159436] ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1503.159610] ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1503.159785] ================================================================== [ 1503.159964] Disabling lock debugging due to kernel taint The test scenario to trigger the issue consists of 4 devices: - H0: data sender, connected to LAN0 - H1: data receiver, connected to LAN1 - GW0 and GW1: routers between LAN0 and LAN1. Both of them have an ethernet connection on LAN0 and LAN1 On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for data from LAN0 to LAN1. Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send buffer size is set to 16K). While data streams are active flush the route cache on HA multiple times. I have not been able to identify a given commit that introduced the issue since, using the reproducer described above, the kasan report has been triggered from 4.14 and I have not gone back further. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-18/+42
Daniel Borkmann says: ==================== pull-request: bpf 2018-03-08 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix various BPF helpers which adjust the skb and its GSO information with regards to SCTP GSO. The latter is a special case where gso_size is of value GSO_BY_FRAGS, so mangling that will end up corrupting the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s). 2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined due to too old kernel headers in the system, from Jiri. 3) Increase the number of x64 JIT passes in order to allow larger images to converge instead of punting them to interpreter or having them rejected when the interpreter is not built into the kernel, from Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07net: don't unnecessarily load kernel modules in dev_ioctl()Paul Moore1-2/+5
Starting with v4.16-rc1 we've been seeing a higher than usual number of requests for the kernel to load networking modules, even on events which shouldn't trigger a module load (e.g. ioctl(TCGETS)). Stephen Smalley suggested the problem may lie in commit 44c02a2c3dc5 ("dev_ioctl(): move copyin/copyout to callers") which moves changes the network dev_ioctl() function to always call dev_load(), regardless of the requested ioctl. This patch moves the dev_load() calls back into the individual ioctls while preserving the rest of the original patch. Reported-by: Dominick Grift <dac.override@gmail.com> Suggested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07tcp: purge write queue upon aborting the connectionSoheil Hassas Yeganeh2-0/+2
When the connection is aborted, there is no point in keeping the packets on the write queue until the connection is closed. Similar to a27fd7a8ed38 ('tcp: purge write queue upon RST'), this is essential for a correct MSG_ZEROCOPY implementation, because userspace cannot call close(fd) before receiving zerocopy signals even when the connection is aborted. Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07dccp: check sk for closed state in dccp_sendmsg()Alexey Kodanev1-0/+5
dccp_disconnect() sets 'dp->dccps_hc_tx_ccid' tx handler to NULL, therefore if DCCP socket is disconnected and dccp_sendmsg() is called after it, it will cause a NULL pointer dereference in dccp_write_xmit(). This crash and the reproducer was reported by syzbot. Looks like it is reproduced if commit 69c64866ce07 ("dccp: CVE-2017-8824: use-after-free in DCCP code") is applied. Reported-by: syzbot+f99ab3887ab65d70f816@syzkaller.appspotmail.com Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07l2tp: do not accept arbitrary socketsEric Dumazet1-2/+6
syzkaller found an issue caused by lack of sufficient checks in l2tp_tunnel_create() RAW sockets can not be considered as UDP ones for instance. In another patch, we shall replace all pr_err() by less intrusive pr_debug() so that syzkaller can find other bugs faster. Acked-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: James Chapman <jchapman@katalix.com> ================================================================== BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69 dst_release: dst:00000000d53d0d0f refcnt:-1 Write of size 1 at addr ffff8801d013b798 by task syz-executor3/6242 CPU: 1 PID: 6242 Comm: syz-executor3 Not tainted 4.16.0-rc2+ #253 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23b/0x360 mm/kasan/report.c:412 __asan_report_store1_noabort+0x17/0x20 mm/kasan/report.c:435 setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69 l2tp_tunnel_create+0x1354/0x17f0 net/l2tp/l2tp_core.c:1596 pppol2tp_connect+0x14b1/0x1dd0 net/l2tp/l2tp_ppp.c:707 SYSC_connect+0x213/0x4a0 net/socket.c:1640 SyS_connect+0x24/0x30 net/socket.c:1621 do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07net: Fix hlist corruptions in inet_evict_bucket()Kirill Tkhai1-0/+3
inet_evict_bucket() iterates global list, and several tasks may call it in parallel. All of them hash the same fq->list_evictor to different lists, which leads to list corruption. This patch makes fq be hashed to expired list only if this has not been made yet by another task. Since inet_frag_alloc() allocates fq using kmem_cache_zalloc(), we may rely on list_evictor is initially unhashed. The problem seems to exist before async pernet_operations, as there was possible to have exit method to be executed in parallel with inet_frags::frags_work, so I add two Fixes tags. This also may go to stable. Fixes: d1fe19444d82 "inet: frag: don't re-use chainlist for evictor" Fixes: f84c6821aa54 "net: Convert pernet_subsys, registered from inet_init()" Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routesStefano Brivio1-29/+42
Currently, administrative MTU changes on a given netdevice are not reflected on route exceptions for MTU-less routes, with a set PMTU value, for that device: # ip -6 route get 2001:db8::b 2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium # ping6 -c 1 -q -s10000 2001:db8::b > /dev/null # ip netns exec a ip -6 route get 2001:db8::b 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0 cache expires 571sec mtu 4926 pref medium # ip link set dev vti_a mtu 3000 # ip -6 route get 2001:db8::b 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0 cache expires 571sec mtu 4926 pref medium # ip link set dev vti_a mtu 9000 # ip -6 route get 2001:db8::b 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0 cache expires 571sec mtu 4926 pref medium The first issue is that since commit fb56be83e43d ("net-ipv6: on device mtu change do not add mtu to mtu-less routes") we don't call rt6_exceptions_update_pmtu() from rt6_mtu_change_route(), which handles administrative MTU changes, if the regular route is MTU-less. However, PMTU exceptions should be always updated, as long as RTAX_MTU is not locked. Keep the check for MTU-less main route, as introduced by that commit, but, for exceptions, call rt6_exceptions_update_pmtu() regardless of that check. Once that is fixed, one problem remains: MTU changes are not reflected if the new MTU is higher than the previous one, because rt6_exceptions_update_pmtu() doesn't allow that. We should instead allow PMTU increase if the old PMTU matches the local MTU, as that implies that the old MTU was the lowest in the path, and PMTU discovery might lead to different results. The existing check in rt6_mtu_change_route() correctly took that case into account (for regular routes only), so factor it out and re-use it also in rt6_exceptions_update_pmtu(). While at it, fix comments style and grammar, and try to be a bit more descriptive. Reported-by: Xiumei Mu <xmu@redhat.com> Fixes: fb56be83e43d ("net-ipv6: on device mtu change do not add mtu to mtu-less routes") Fixes: f5bbe7ee79c2 ("ipv6: prepare rt6_mtu_change() for exception table") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07sch_netem: fix skb leak in netem_enqueue()Alexey Kodanev1-1/+1
When we exceed current packets limit and we have more than one segment in the list returned by skb_gso_segment(), netem drops only the first one, skipping the rest, hence kmemleak reports: unreferenced object 0xffff880b5d23b600 (size 1024): comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s) hex dump (first 32 bytes): 00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00 ..#]............ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520 [<000000001709b32f>] skb_segment+0x8c8/0x3710 [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830 [<00000000c921cba1>] inet_gso_segment+0x476/0x1370 [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510 [<000000002182660a>] __skb_gso_segment+0x1dd/0x620 [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem] [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120 [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00 [<00000000d309e9d3>] ip_output+0x1aa/0x2c0 [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670 [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0 [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540 [<0000000013d06d02>] tasklet_action+0x1ca/0x250 [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3 [<00000000e7ed027c>] irq_exit+0x1e2/0x210 Fix it by adding the rest of the segments, if any, to skb 'to_free' list. Add new __qdisc_drop_all() and qdisc_drop_all() functions because they can be useful in the future if we need to drop segmented GSO packets in other places. Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetoothDavid S. Miller1-1/+7
Johan Hedberg says: ==================== pull request: bluetooth 2018-03-05 Here are a few more Bluetooth fixes for the 4.16 kernel: - btusb: reset/resume fixes for Yoga 920 and Dell OptiPlex 3060 - Fix for missing encryption refresh with the Security Manager protocol ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-07xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_protoYossi Kuperman2-2/+4
Artem Savkov reported that commit 5efec5c655dd leads to a packet loss under IPSec configuration. It appears that his setup consists of a TUN device, which does not have a MAC header. Make sure MAC header exists. Note: TUN device sets a MAC header pointer, although it does not have one. Fixes: 5efec5c655dd ("xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version") Reported-by: Artem Savkov <artem.savkov@gmail.com> Tested-by: Artem Savkov <artem.savkov@gmail.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-03-06net: Only honor ifindex in IP_PKTINFO if non-0David Ahern1-2/+4
Only allow ifindex from IP_PKTINFO to override SO_BINDTODEVICE settings if the index is actually set in the message. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06netfilter: nft_set_hash: skip fixed hash if timeout is specifiedPablo Neira Ayuso1-1/+1
Fixed hash supports to timeouts, so skip it. Otherwise, userspace hits EOPNOTSUPP. Fixes: 6c03ae210ce3 ("netfilter: nft_set_hash: add non-resizable hashtable implementation") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds58-357/+433
Pull networking fixes from David Miller: 1) Use an appropriate TSQ pacing shift in mac80211, from Toke Høiland-Jørgensen. 2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk in ip6_route_me_harder, from Eric Dumazet. 3) Fix several shutdown races and similar other problems in l2tp, from James Chapman. 4) Handle missing XDP flush properly in tuntap, for real this time. From Jason Wang. 5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel Borkmann. 6) Fix phy_resume() locking, from Andrew Lunn. 7) IFLA_MTU values are ignored on newlink for some tunnel types, fix from Xin Long. 8) Revert F-RTO middle box workarounds, they only handle one dimension of the problem. From Yuchung Cheng. 9) Fix socket refcounting in RDS, from Ka-Cheong Poon. 10) Don't allow ppp unit registration to an unregistered channel, from Guillaume Nault. 11) Various hv_netvsc fixes from Stephen Hemminger. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits) hv_netvsc: propagate rx filters to VF hv_netvsc: filter multicast/broadcast hv_netvsc: defer queue selection to VF hv_netvsc: use napi_schedule_irqoff hv_netvsc: fix race in napi poll when rescheduling hv_netvsc: cancel subchannel setup before halting device hv_netvsc: fix error unwind handling if vmbus_open fails hv_netvsc: only wake transmit queue if link is up hv_netvsc: avoid retry on send during shutdown virtio-net: re enable XDP_REDIRECT for mergeable buffer ppp: prevent unregistered channels from connecting to PPP units tc-testing: skbmod: fix match value of ethertype mlxsw: spectrum_switchdev: Check success of FDB add operation net: make skb_gso_*_seglen functions private net: xfrm: use skb_gso_validate_network_len() to check gso sizes net: sched: tbf: handle GSO_BY_FRAGS case in enqueue net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len rds: Incorrect reference counting in TCP socket creation net: ethtool: don't ignore return from driver get_fecparam method vrf: check forwarding on the original netdevice when generating ICMP dest unreachable ...
2018-03-04net: make skb_gso_*_seglen functions privateDaniel Axtens1-2/+35
They're very hard to use properly as they do not consider the GSO_BY_FRAGS case. Code should use skb_gso_validate_network_len and skb_gso_validate_mac_len as they do consider this case. Make the seglen functions static, which stops people using them outside of skbuff.c Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04net: xfrm: use skb_gso_validate_network_len() to check gso sizesDaniel Axtens2-2/+3
Replace skb_gso_network_seglen() with skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS case. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04net: sched: tbf: handle GSO_BY_FRAGS case in enqueueDaniel Axtens1-1/+2
tbf_enqueue() checks the size of a packet before enqueuing it. However, the GSO size check does not consider the GSO_BY_FRAGS case, and so will drop GSO SCTP packets, causing a massive drop in throughput. Use skb_gso_validate_mac_len() instead, as it does consider that case. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04net: rename skb_gso_validate_mtu -> skb_gso_validate_network_lenDaniel Axtens8-12/+13
If you take a GSO skb, and split it into packets, will the network length (L3 headers + L4 headers + payload) of those packets be small enough to fit within a given MTU? skb_gso_validate_mtu gives you the answer to that question. However, we recently added to add a way to validate the MAC length of a split GSO skb (L2+L3+L4+payload), and the names get confusing, so rename skb_gso_validate_mtu to skb_gso_validate_network_len Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-03Merge tag 'batadv-net-for-davem-20180302' of git://git.open-mesh.org/linux-mergeDavid S. Miller9-39/+50
Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - fix skb checksum issues, by Matthias Schiffer (2 patches) - fix exception handling when dumping data objects through netlink, by Sven Eckelmann (4 patches) - fix handling of interface indices, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-03bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbsDaniel Axtens1-18/+42
SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of unconditionally mangling of that will result in nonsense value and would corrupt the skb later on. Therefore, i) add two helpers skb_increase_gso_size() and skb_decrease_gso_size() that would throw a one time warning and bail out for such skbs and ii) refuse and return early with an error in those BPF helpers that are affected. We do need to bail out as early as possible from there before any changes on the skb have been performed. Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Co-authored-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Axtens <dja@axtens.net> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller11-39/+98
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Put back reference on CLUSTERIP configuration structure from the error path, patch from Florian Westphal. 2) Put reference on CLUSTERIP configuration instead of freeing it, another cpu may still be walking over it, also from Florian. 3) Refetch pointer to IPv6 header from nf_nat_ipv6_manip_pkt() given packet manipulation may reallocation the skbuff header, from Florian. 4) Missing match size sanity checks in ebt_among, from Florian. 5) Convert BUG_ON to WARN_ON in ebtables, from Florian. 6) Sanity check userspace offsets from ebtables kernel, from Florian. 7) Missing checksum replace call in flowtable IPv4 DNAT, from Felix Fietkau. 8) Bump the right stats on checksum error from bridge netfilter, from Taehee Yoo. 9) Unset interface flag in IPv6 fib lookups otherwise we get misleading routing lookup results, from Florian. 10) Missing sk_to_full_sk() in ip6_route_me_harder() from Eric Dumazet. 11) Don't allow devices to be part of multiple flowtables at the same time, this may break setups. 12) Missing netlink attribute validation in flowtable deletion. 13) Wrong array index in nf_unregister_net_hook() call from error path in flowtable addition path. 14) Fix FTP IPVS helper when NAT mangling is in place, patch from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-02Merge tag 'mac80211-for-davem-2018-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211David S. Miller3-9/+14
Johannes Berg says: ==================== Three more patches: * fix for a regression in 4-addr mode with fast-RX * fix for a Kconfig problem with the new regdb * fix for the long-standing TCP performance issue in wifi using the new sk_pacing_shift_update() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-02rds: Incorrect reference counting in TCP socket creationKa-Cheong Poon1-3/+11
Commit 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the accept socket") has a reference counting issue in TCP socket creation when accepting a new connection. The code uses sock_create_lite() to create a kernel socket. But it does not do __module_get() on the socket owner. When the connection is shutdown and sock_release() is called to free the socket, the owner's reference count is decremented and becomes incorrect. Note that this bug only shows up when the socket owner is configured as a kernel module. v2: Update comments Fixes: 0933a578cd55 ("rds: tcp: use sock_create_lite() to create the accept socket") Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net: ethtool: don't ignore return from driver get_fecparam methodEdward Cree1-1/+4
If ethtool_ops->get_fecparam returns an error, pass that error on to the user, rather than ignoring it. Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01vrf: check forwarding on the original netdevice when generating ICMP dest unreachableStephen Suryaputra1-1/+10
When ip_error() is called the device is the l3mdev master instead of the original device. So the forwarding check should be on the original one. Changes from v2: - Handle the original device disappearing (per David Ahern) - Minimize the change in code order Changes from v1: - Only need to reset the device on which __in_dev_get_rcu() is done (per David Ahern). Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01net: allow interface to be set into VRF if VLAN interface in same VRFMike Manning1-5/+9
Setting an interface into a VRF fails with 'RTNETLINK answers: File exists' if one of its VLAN interfaces is already in the same VRF. As the VRF is an upper device of the VLAN interface, it is also showing up as an upper device of the interface itself. The solution is to restrict this check to devices other than master. As only one master device can be linked to a device, the check in this case is that the upper device (VRF) being linked to is not the same as the master device instead of it not being any one of the upper devices. The following example shows an interface ens12 (with a VLAN interface ens12.10) being set into VRF green, which behaves as expected: # ip link add link ens12 ens12.10 type vlan id 10 # ip link set dev ens12 master vrfgreen # ip link show dev ens12 3: ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master vrfgreen state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff But if the VLAN interface has previously been set into the same VRF, then setting the interface into the VRF fails: # ip link set dev ens12 nomaster # ip link set dev ens12.10 master vrfgreen # ip link show dev ens12.10 39: ens12.10@ens12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vrfgreen state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:4c:a0:45 brd ff:ff:ff:ff:ff:ff # ip link set dev ens12 master vrfgreen RTNETLINK answers: File exists The workaround is to move the VLAN interface back into the default VRF beforehand, but it has to be shut first so as to avoid the risk of traffic leaking from the VRF. This fix avoids needing this workaround. Signed-off-by: Mike Manning <mmanning@att.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01Bluetooth: Fix missing encryption refresh on Security RequestSzymon Janc1-1/+7
If Security Request is received on connection that is already encrypted with sufficient security master should perform encryption key refresh procedure instead of just ignoring Slave Security Request (Core Spec 5.0 Vol 3 Part H 2.4.6). > ACL Data RX: Handle 3585 flags 0x02 dlen 6 SMP: Security Request (0x0b) len 1 Authentication requirement: Bonding, No MITM, SC, No Keypresses (0x09) < HCI Command: LE Start Encryption (0x08|0x0019) plen 28 Handle: 3585 Random number: 0x0000000000000000 Encrypted diversifier: 0x0000 Long term key: 44264272a5c426a9e868f034cf0e69f3 > HCI Event: Command Status (0x0f) plen 4 LE Start Encryption (0x08|0x0019) ncmd 1 Status: Success (0x00) > HCI Event: Encryption Key Refresh Complete (0x30) plen 3 Status: Success (0x00) Handle: 3585 Signed-off-by: Szymon Janc <szymon.janc@codecoup.pl> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-03-01net: ipv4: avoid unused variable warning for sysctlArnd Bergmann1-2/+1
The newly introudced ip_min_valid_pmtu variable is only used when CONFIG_SYSCTL is set: net/ipv4/route.c:135:12: error: 'ip_min_valid_pmtu' defined but not used [-Werror=unused-variable] This moves it to the other variables like it, to avoid the harmless warning. Fixes: c7272c2f1229 ("net: ipv4: don't allow setting net.ipv4.route.min_pmtu below 68") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>