aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-08-20ipv6: repair fib6 tree in failure caseWei Wang1-13/+11
In fib6_add(), it is possible that fib6_add_1() picks an intermediate node and sets the node's fn->leaf to NULL in order to add this new route. However, if fib6_add_rt2node() fails to add the new route for some reason, fn->leaf will be left as NULL and could potentially cause crash when fn->leaf is accessed in fib6_locate(). This patch makes sure fib6_repair_tree() is called to properly repair fn->leaf in the above failure case. Here is the syzkaller reported general protection fault in fib6_locate: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000 RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline] RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline] RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline] RIP: 0010:[<ffffffff82a3e0e1>] [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233 RSP: 0018:ffff8801d01a36a8 EFLAGS: 00010202 RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000 RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100 RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001 R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000 FS: 00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0 ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0 ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988 Call Trace: [<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109 [<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075 [<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450 [<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281 [<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456 [<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline] [<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232 [<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778 [<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline] [<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619 [<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834 [<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478 [<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491 [<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538 [<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline] [<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577 [<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17 Note: there is no "Fixes" tag as this seems to be a bug introduced very early. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18ipv6: reset fn->rr_ptr when replacing routeWei Wang1-0/+4
syzcaller reported the following use-after-free issue in rt6_select(): BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8 BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8 Read of size 4 by task syz-executor1/439628 CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00 ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0 ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380 Call Trace: [<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51 sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option. Use struct sctp_sack_info instead [<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158 [<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline] [<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285 [<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline] [<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325 [<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline] [<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 [<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203 [<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95 [<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223 [<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41 [<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224 [<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943 [<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079 [<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91 [<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline] [<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272 [<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284 [<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564 [<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582 [<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563 [<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17 Object at ffff8800bc699380, in cache ip6_dst_cache size: 384 The root cause of it is that in fib6_add_rt2node(), when it replaces an existing route with the new one, it does not update fn->rr_ptr. This commit resets fn->rr_ptr to NULL when it points to a route which is replaced in fib6_add_rt2node(). Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-18datagram: When peeking datagrams with offset < 0 don't skip empty skbsMatthew Dawson1-1/+2
Due to commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac ("udp: remove headers from UDP packets before queueing"), when udp packets are being peeked the requested extra offset is always 0 as there is no need to skip the udp header. However, when the offset is 0 and the next skb is of length 0, it is only returned once. The behaviour can be seen with the following python script: from socket import *; f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0); g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0); f.bind(('::', 0)); addr=('::1', f.getsockname()[1]); g.sendto(b'', addr) g.sendto(b'b', addr) print(f.recvfrom(10, MSG_PEEK)); print(f.recvfrom(10, MSG_PEEK)); Where the expected output should be the empty string twice. Instead, make sk_peek_offset return negative values, and pass those values to __skb_try_recv_datagram/__skb_try_recv_from_queue. If the passed offset to __skb_try_recv_from_queue is negative, the checked skb is never skipped. __skb_try_recv_from_queue will then ensure the offset is reset back to 0 if a peek is requested without an offset, unless no packets are found. Also simplify the if condition in __skb_try_recv_from_queue. If _off is greater then 0, and off is greater then or equal to skb->len, then (_off || skb->len) must always be true assuming skb->len >= 0 is always true. Also remove a redundant check around a call to sk_peek_offset in af_unix.c, as it double checked if MSG_PEEK was set in the flags. V2: - Moved the negative fixup into __skb_try_recv_from_queue, and remove now redundant checks - Fix peeking in udp{,v6}_recvmsg to report the right value when the offset is 0 V3: - Marked new branch in __skb_try_recv_from_queue as unlikely. Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-15ipv6: fix NULL dereference in ip6_route_dev_notify()Eric Dumazet1-3/+3
Based on a syzkaller report [1], I found that a per cpu allocation failure in snmp6_alloc_dev() would then lead to NULL dereference in ip6_route_dev_notify(). It seems this is a very old bug, thus no Fixes tag in this submission. Let's add in6_dev_put_clear() helper, as we will probably use it elsewhere (once available/present in net-next) [1] kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff88019f456680 task.stack: ffff8801c6e58000 RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline] RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline] RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202 RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000 RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001 RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37 R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8 FS: 00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0 DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211 in6_dev_put include/net/addrconf.h:335 [inline] ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678 call_netdevice_notifiers net/core/dev.c:1694 [inline] rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107 rollback_registered+0x1be/0x3c0 net/core/dev.c:7149 register_netdevice+0xbcd/0xee0 net/core/dev.c:7587 register_netdev+0x1a/0x30 net/core/dev.c:7669 loopback_net_init+0x76/0x160 drivers/net/loopback.c:214 ops_init+0x10a/0x570 net/core/net_namespace.c:118 setup_net+0x313/0x710 net/core/net_namespace.c:294 copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418 create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206 SYSC_unshare kernel/fork.c:2347 [inline] SyS_unshare+0x653/0xfa0 kernel/fork.c:2297 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512c9 RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3 f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6 0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP: ffff8801c6e5f1b0 RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP: ffff8801c6e5f1b0 RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP: ffff8801c6e5f1b0 ---[ end trace e441d046c6410d31 ]--- Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14tcp: fix possible deadlock in TCP stack vs BPF filterEric Dumazet1-2/+2
Filtering the ACK packet was not put at the right place. At this place, we already allocated a child and put it into accept queue. We absolutely need to call tcp_child_process() to release its spinlock, or we will deadlock at accept() or close() time. Found by syzkaller team (Thanks a lot !) Fixes: 8fac365f63c8 ("tcp: Add a tcp_filter hook before handle ack packet") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-14ipv6: release rt6->rt6i_idev properly during ifdownWei Wang1-8/+5
When a dst is created by addrconf_dst_alloc() for a host route or an anycast route, dst->dev points to loopback dev while rt6->rt6i_idev points to a real device. When the real device goes down, the current cleanup code only checks for dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the refcount leak on the real device in the above situation. This patch makes sure to always release the refcount taken on rt6->rt6i_idev during dst_dev_put(). Fixes: 587fea741134 ("ipv6: mark DST_NOGC and remove the operation of dst_free()") Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-10udp: consistently apply ufo or fragmentationWillem de Bruijn1-3/+4
When iteratively building a UDP datagram with MSG_MORE and that datagram exceeds MTU, consistently choose UFO or fragmentation. Once skb_is_gso, always apply ufo. Conversely, once a datagram is split across multiple skbs, do not consider ufo. Sendpage already maintains the first invariant, only add the second. IPv6 does not have a sendpage implementation to modify. A gso skb must have a partial checksum, do not follow sk_no_check_tx in udp_send_skb. Found by syzkaller. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08net: avoid skb_warn_bad_offload false positives on UFOWillem de Bruijn1-1/+1
skb_warn_bad_offload triggers a warning when an skb enters the GSO stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL checksum offload set. Commit b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") observed that SKB_GSO_DODGY producers can trigger the check and that passing those packets through the GSO handlers will fix it up. But, the software UFO handler will set ip_summed to CHECKSUM_NONE. When __skb_gso_segment is called from the receive path, this triggers the warning again. Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On Tx these two are equivalent. On Rx, this better matches the skb state (checksum computed), as CHECKSUM_NONE here means no checksum computed. See also this thread for context: http://patchwork.ozlabs.org/patch/799015/ Fixes: b2504a5dbef3 ("net: reduce skb_warn_bad_offload() noise") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-03ipv6: set rt6i_protocol properly in the route when it is installedXin Long1-8/+3
After commit c2ed1880fd61 ("net: ipv6: check route protocol when deleting routes"), ipv6 route checks rt protocol when trying to remove a rt entry. It introduced a side effect causing 'ip -6 route flush cache' not to work well. When flushing caches with iproute, all route caches get dumped from kernel then removed one by one by sending DELROUTE requests to kernel for each cache. The thing is iproute sends the request with the cache whose proto is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps it. But in kernel the rt_cache protocol is still 0, which causes the cache not to be matched and removed. So the real reason is rt6i_protocol in the route is not set when it is allocated. As David Ahern's suggestion, this patch is to set rt6i_protocol properly in the route when it is installed and remove the codes setting rtm_protocol according to rt6i_flags in rt6_fill_node. This is also an improvement to keep rt6i_protocol consistent with rtm_protocol. Fixes: c2ed1880fd61 ("net: ipv6: check route protocol when deleting routes") Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31udp6: fix jumbogram receptionPaolo Abeni2-1/+11
Since commit 67a51780aebb ("ipv6: udp: leverage scratch area helpers") udp6_recvmsg() read the skb len from the scratch area, to avoid a cache miss. But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their length exceeds the 16 bits available in the scratch area. As a side effect the length returned by recvmsg() is: <ingress datagram len> % (1<<16) This commit addresses the issue allocating one more bit in the IP6CB flags field and setting it for incoming jumbograms. Such field is still in the first cacheline, so at recvmsg() time we can check it and fallback to access skb->len if required, without a measurable overhead. Fixes: 67a51780aebb ("ipv6: udp: leverage scratch area helpers") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29udp6: fix socket leak on early demuxPaolo Abeni1-9/+18
When an early demuxed packet reaches __udp6_lib_lookup_skb(), the sk reference is retrieved and used, but the relevant reference count is leaked and the socket destructor is never called. Beyond leaking the sk memory, if there are pending UDP packets in the receive queue, even the related accounted memory is leaked. In the long run, this will cause persistent forward allocation errors and no UDP skbs (both ipv4 and ipv6) will be able to reach the user-space. Fix this by explicitly accessing the early demux reference before the lookup, and properly decreasing the socket reference count after usage. Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and the now obsoleted comment about "socket cache". The newly added code is derived from the current ipv4 code for the similar path. v1 -> v2: fixed the __udp6_lib_rcv() return code for resubmission, as suggested by Eric Reported-by: Sam Edwards <CFSworks@gmail.com> Reported-by: Marc Haber <mh+netdev@zugschlus.de> Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-25ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()Stefano Brivio1-4/+0
RFC 2465 defines ipv6IfStatsOutFragFails as: "The number of IPv6 datagrams that have been discarded because they needed to be fragmented at this output interface but could not be." The existing implementation, instead, would increase the counter twice in case we fail to allocate room for single fragments: once for the fragment, once for the datagram. This didn't look intentional though. In one of the two affected affected failure paths, the double increase was simply a result of a new 'goto fail' statement, introduced to avoid a skb leak. The other path appears to be affected since at least 2.6.12-rc2. Reported-by: Sabrina Dubroca <sdubroca@redhat.com> Fixes: 1d325d217c7f ("ipv6: ip6_fragment: fix headroom tests and skb leak") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-19ipv6: avoid overflow of offset in ip6_find_1stfragoptSabrina Dubroca1-2/+6
In some cases, offset can overflow and can cause an infinite loop in ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and cap it at IPV6_MAXPLEN, since packets larger than that should be invalid. This problem has been here since before the beginning of git history. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-18ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()Alexander Potapenko1-0/+1
KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(), which originated from the TCP request socket created in cookie_v6_check(): ================================================================== BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0 CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies. Check SNMP counters. Call Trace: <IRQ> __dump_stack lib/dump_stack.c:16 dump_stack+0x172/0x1c0 lib/dump_stack.c:52 kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927 __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469 skb_set_hash_from_sk ./include/net/sock.h:2011 tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983 tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493 tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284 tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309 call_timer_fn+0x240/0x520 kernel/time/timer.c:1268 expire_timers kernel/time/timer.c:1307 __run_timers+0xc13/0xf10 kernel/time/timer.c:1601 run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614 __do_softirq+0x485/0x942 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 irq_exit+0x1fa/0x230 kernel/softirq.c:405 exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657 smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966 apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489 RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36 RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77 RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440 RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005 RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770 RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004 R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810 R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4 </IRQ> poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293 SYSC_select+0x4b4/0x4e0 fs/select.c:653 SyS_select+0x76/0xa0 fs/select.c:634 entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204 RIP: 0033:0x4597e7 RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20 R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003 chained origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_save_stack mm/kmsan/kmsan.c:317 kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547 __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259 tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472 tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103 tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212 cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 origin: save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198 kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337 kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766 reqsk_alloc ./include/net/request_sock.h:87 inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200 cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989 tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298 tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487 ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279 NF_HOOK ./include/linux/netfilter.h:257 ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322 dst_input ./include/net/dst.h:492 ip6_rcv_finish net/ipv6/ip6_input.c:69 NF_HOOK ./include/linux/netfilter.h:257 ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208 __netif_receive_skb net/core/dev.c:4246 process_backlog+0x667/0xba0 net/core/dev.c:4866 napi_poll net/core/dev.c:5268 net_rx_action+0xc95/0x1590 net/core/dev.c:5333 __do_softirq+0x485/0x942 kernel/softirq.c:284 ================================================================== Similar error is reported for cookie_v4_check(). Fixes: 58d607d3e52f ("tcp: provide skb->hash to synack packets") Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-06net: ipv6: Compare lwstate in detecting duplicate nexthopsDavid Ahern2-11/+2
Lennert reported a failure to add different mpls encaps in a multipath route: $ ip -6 route add 1234::/16 \ nexthop encap mpls 10 via fe80::1 dev ens3 \ nexthop encap mpls 20 via fe80::1 dev ens3 RTNETLINK answers: File exists The problem is that the duplicate nexthop detection does not compare lwtunnel configuration. Add it. Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: David Ahern <dsahern@gmail.com> Reported-by: João Taveira Araújo <joao.taveira@gmail.com> Reported-by: Lennert Buytenhek <buytenh@wantstofly.org> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Tested-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05Merge branch 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-8/+3
Pull memdup_user() conversions from Al Viro: "A fairly self-contained series - hunting down open-coded memdup_user() and memdup_user_nul() instances" * 'work.memdup_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bpf: don't open-code memdup_user() kimage_file_prepare_segments(): don't open-code memdup_user() ethtool: don't open-code memdup_user() do_ip_setsockopt(): don't open-code memdup_user() do_ipv6_setsockopt(): don't open-code memdup_user() irda: don't open-code memdup_user() xfrm_user_policy(): don't open-code memdup_user() ima_write_policy(): don't open-code memdup_user_nul() sel_write_validatetrans(): don't open-code memdup_user_nul()
2017-07-04net, calipso: convert calipso_doi.refcount from atomic_t to refcount_tReshetova, Elena1-6/+6
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert ip6addrlbl_entry.refcnt from atomic_t to refcount_tReshetova, Elena1-4/+5
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert xfrm6_tunnel_spi.refcnt from atomic_t to refcount_tReshetova, Elena1-4/+4
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert ifacaddr6.aca_refcnt from atomic_t to refcount_tReshetova, Elena1-3/+3
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert ifmcaddr6.mca_refcnt from atomic_t to refcount_tReshetova, Elena1-9/+9
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert inet6_ifaddr.refcnt from atomic_t to refcount_tReshetova, Elena1-1/+1
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert inet6_dev.refcnt from atomic_t to refcount_tReshetova, Elena1-1/+1
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert ipv6_txoptions.refcnt from atomic_t to refcount_tReshetova, Elena2-3/+3
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03ipv6: dad: don't remove dynamic addresses if link is downSabrina Dubroca1-9/+9
Currently, when the link for $DEV is down, this command succeeds but the address is removed immediately by DAD (1): ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 In the same situation, this will succeed and not remove the address (2): ip addr add 1111::12/64 dev $DEV ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 The comment in addrconf_dad_begin() when !IF_READY makes it look like this is the intended behavior, but doesn't explain why: * If the device is not ready: * - keep it tentative if it is a permanent address. * - otherwise, kill it. We clearly cannot prevent userspace from doing (2), but we can make (1) work consistently with (2). addrconf_dad_stop() is only called in two cases: if DAD failed, or to skip DAD when the link is down. In that second case, the fix is to avoid deleting the address, like we already do for permanent addresses. Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_tReshetova, Elena1-2/+2
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert sock.sk_refcnt from atomic_t to refcount_tReshetova, Elena4-7/+7
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. This patch uses refcount_inc_not_zero() instead of atomic_inc_not_zero_hint() due to absense of a _hint() version of refcount API. If the hint() version must be used, we might need to revisit API. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert sock.sk_wmem_alloc from atomic_t to refcount_tReshetova, Elena2-3/+3
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert sk_buff.users from atomic_t to refcount_tReshetova, Elena2-2/+2
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller8-8/+45
A set of overlapping changes in macvlan and the rocker driver, nothing serious. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller1-5/+5
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. This batch contains connection tracking updates for the cleanup iteration path, patches from Florian Westphal: X) Skip unconfirmed conntracks in nf_ct_iterate_cleanup_net(), just set dying bit to let the CPU release them. X) Add nf_ct_iterate_destroy() to be used on module removal, to kill conntrack from all namespace. X) Restart iteration on hashtable resizing, since both may occur at the same time. X) Use the new nf_ct_iterate_destroy() to remove conntrack with NAT mapping on module removal. X) Use nf_ct_iterate_destroy() to remove conntrack entries helper module removal, from Liping Zhang. X) Use nf_ct_iterate_cleanup_net() to remove the timeout extension if user requests this, also from Liping. X) Add net_ns_barrier() and use it from FTP helper, so make sure no concurrent namespace removal happens at the same time while the helper module is being removed. X) Use NFPROTO_MAX in layer 3 conntrack protocol array, to reduce module size. Same thing in nf_tables. Updates for the nf_tables infrastructure: X) Prepare usage of the extended ACK reporting infrastructure for nf_tables. X) Remove unnecessary forward declaration in nf_tables hash set. X) Skip set size estimation if number of element is not specified. X) Changes to accomodate a (faster) unresizable hash set implementation, for anonymous sets and dynamic size fixed sets with no timeouts. X) Faster lookup function for unresizable hash table for 2 and 4 bytes key. And, finally, a bunch of asorted small updates and cleanups: X) Do not hold reference to netdev from ipt_CLUSTER, instead subscribe to device events and look up for index from the packet path, this is fixing an issue that is present since the very beginning, patch from Xin Long. X) Use nf_register_net_hook() in ipt_CLUSTER, from Florian Westphal. X) Use ebt_invalid_target() whenever possible in the ebtables tree, from Gao Feng. X) Calm down compilation warning in nf_dup infrastructure, patch from stephen hemminger. X) Statify functions in nftables rt expression, also from stephen. X) Update Makefile to use canonical method to specify nf_tables-objs. From Jike Song. X) Use nf_conntrack_helpers_register() in amanda and H323. X) Space cleanup for ctnetlink, from linzhang. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30do_ipv6_setsockopt(): don't open-code memdup_user()Al Viro1-8/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-27ipv6: udp: leverage scratch area helpersPaolo Abeni1-5/+9
The commit b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue") leveraged the scratched area helpers for UDP v4 but I forgot to update accordingly the IPv6 code path. This change extends the scratch area usage to the IPv6 code, synching the two implementations and giving some performance benefit. IPv6 is again almost on the same level of IPv4, performance-wide. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.validateMatthias Schiffer4-6/+11
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.changelinkMatthias Schiffer4-4/+8
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-26net: add netlink_ext_ack argument to rtnl_link_ops.newlinkMatthias Schiffer4-4/+8
Add support for extended error reporting. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-25net: ipv6: reset daddr and dport in sk if connect() failsWei Wang2-2/+9
In __ip6_datagram_connect(), reset sk->sk_v6_daddr and inet->dport if error occurs. In udp_v6_early_demux(), check for sk_state to make sure it is in TCP_ESTABLISHED state. Together, it makes sure unconnected UDP socket won't be considered as a valid candidate for early demux. v3: add TCP_ESTABLISHED state check in udp_v6_early_demux() v2: fix compilation error Fixes: 5425077d73e0 ("net: ipv6: Add early demux handler for UDP unicast") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-nextDavid S. Miller2-7/+3
Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2017-06-23 1) Use memdup_user to spmlify xfrm_user_policy. From Geliang Tang. 2) Make xfrm_dev_register static to silence a sparse warning. From Wei Yongjun. 3) Use crypto_memneq to check the ICV in the AH protocol. From Sabrina Dubroca. 4) Remove some unused variables in esp6. From Stephen Hemminger. 5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port. From Antony Antony. 6) Include the UDP encapsulation port to km_migrate announcements. From Antony Antony. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecDavid S. Miller2-1/+26
Steffen Klassert says: ==================== pull request (net): ipsec 2017-06-23 1) Fix xfrm garbage collecting when unregistering a netdevice. From Hangbin Liu. 2) Fix NULL pointer derefernce when exiting a network namespace. From Hangbin Liu. 3) Fix some error codes in pfkey to prevent a NULL pointer derefernce. From Dan Carpenter. 4) Fix NULL pointer derefernce on allocation failure in pfkey. From Dan Carpenter. 5) Adjust IPv6 payload_len to include extension headers. Otherwise we corrupt the packets when doing ESP GRO on transport mode. From Yossi Kuperman. 6) Set nhoff to the proper offset of the IPv6 nexthdr when doing ESP GRO. From Yossi Kuperman. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23sit: use __GFP_NOWARN for user controlled allocationWANG Cong1-1/+1
The memory allocation size is controlled by user-space, if it is too large just fail silently and return NULL, not to mention there is a fallback allocation later. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: account for current skb length when deciding about UFOMichal Kubeček1-1/+1
Our customer encountered stuck NFS writes for blocks starting at specific offsets w.r.t. page boundary caused by networking stack sending packets via UFO enabled device with wrong checksum. The problem can be reproduced by composing a long UDP datagram from multiple parts using MSG_MORE flag: sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 1000, MSG_MORE, ...); sendto(sd, buff, 3000, 0, ...); Assume this packet is to be routed via a device with MTU 1500 and NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(), this condition is tested (among others) to decide whether to call ip_ufo_append_data(): ((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb)) At the moment, we already have skb with 1028 bytes of data which is not marked for GSO so that the test is false (fragheaderlen is usually 20). Thus we append second 1000 bytes to this skb without invoking UFO. Third sendto(), however, has sufficient length to trigger the UFO path so that we end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb() uses udp_csum() to calculate the checksum but that assumes all fragments have correct checksum in skb->csum which is not true for UFO fragments. When checking against MTU, we need to add skb->len to length of new segment if we already have a partially filled skb and fragheaderlen only if there isn't one. In the IPv6 case, skb can only be null if this is the first segment so that we have to use headersize (length of the first IPv6 header) rather than fragheaderlen (length of IPv6 header of further fragments) for skb == NULL. Fixes: e89e9cf539a2 ("[IPv4/IPv6]: UFO Scatter-gather approach") Fixes: e4c5e13aa45c ("ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22udp/v6: prefetch rmem_alloc in udp6_queue_rcv_skb()Paolo Abeni1-0/+1
very similar to commit dd99e425be23 ("udp: prefetch rmem_alloc in udp_queue_rcv_skb()"), this allows saving a cache miss when the BH is bottle-neck for UDP over ipv6 packet processing, e.g. for small packets when a single RX NIC ingress queue is in use. Performances under flood when multiple NIC RX queues used are unaffected, but when a single NIC rx queue is in use, this gives ~8% performance improvement. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22ipv6: avoid unregistering inet6_dev for loopbackWANG Cong1-2/+3
The per netns loopback_dev->ip6_ptr is unregistered and set to NULL when its mtu is set to smaller than IPV6_MIN_MTU, this leads to that we could set rt->rt6i_idev NULL after a rt6_uncached_list_flush_dev() and then crash after another call. In this case we should just bring its inet6_dev down, rather than unregistering it, at least prior to commit 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") we always override the case for loopback. Thanks a lot to Andrey for finding a reliable reproducer. Fixes: 176c39af29bc ("netns: fix addrconf_ifdown kernel panic") Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22tcp: Add a tcp_filter hook before handle ack packetChenbo Feng1-0/+2
Currently in both ipv4 and ipv6 code path, the ack packet received when sk at TCP_NEW_SYN_RECV state is not filtered by socket filter or cgroup filter since it is handled from tcp_child_process and never reaches the tcp_filter inside tcp_v4_rcv or tcp_v6_rcv. Adding a tcp_filter hooks here can make sure all the ingress tcp packet can be correctly filtered. Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTERWANG Cong1-1/+5
In commit 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired, unfortunately, as reported by jeffy, netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event until all refs are gone. We have to add an additional check to avoid this corner case. For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED, for dev_change_net_namespace(), dev->reg_state is NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED. Fixes: 242d3a49a2a1 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf") Reported-by: jeffy <jeffy.chen@rock-chips.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22esp6_offload: Fix IP6CB(skb)->nhoff for ESP GROYossi Kuperman1-0/+25
IP6CB(skb)->nhoff is the offset of the nexthdr field in an IPv6 header, unless there are extension headers present, in which case nhoff points to the nexthdr field of the last extension header. In non-GRO code path, nhoff is set by ipv6_rcv before any XFRM code is executed. Conversely, in GRO code path (when esp6_offload is loaded), nhoff is not set. The following functions fail to read the correct value and eventually the packet is dropped: xfrm6_transport_finish xfrm6_tunnel_input xfrm6_rcv_tnl Set nhoff to the proper offset of nexthdr in esp6_gro_receive. Fixes: 7785bba299a8 ("esp: Add a software GRO codepath") Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-22xfrm6: Fix IPv6 payload_len in xfrm6_transport_finishYossi Kuperman1-1/+1
IPv6 payload length indicates the size of the payload, including any extension headers. In xfrm6_transport_finish, ipv6_hdr(skb)->payload_len is set to the payload size only, regardless of the presence of any extension headers. After ESP GRO transport mode decapsulation, ipv6_rcv trims the packet according to the wrong payload_len, thus corrupting the packet. Set payload_len to account for extension headers as well. Fixes: 7785bba299a8 ("esp: Add a software GRO codepath") Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-06-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-23/+14
Two entries being added at the same time to the IFLA policy table, whilst parallel bug fixes to decnet routing dst handling overlapping with the dst gc removal in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21ip6mr: add netlink notifications on mrt6msg cache reportsJulien Gomes1-2/+69
Add Netlink notifications on cache reports in ip6mr, in addition to the existing mrt6msg sent to mroute6_sk. Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R. MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the same data as their equivalent fields in the mrt6msg header. PKT attribute is the packet sent to mroute6_sk, without the added mrt6msg header. Suggested-by: Ryan Halbrook <halbrook@arista.com> Signed-off-by: Julien Gomes <julien@arista.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20ipv6: Do not leak throw route referencesSerhey Popovych2-18/+7
While commit 73ba57bfae4a ("ipv6: fix backtracking for throw routes") does good job on error propagation to the fib_rules_lookup() in fib rules core framework that also corrects throw routes handling, it does not solve route reference leakage problem happened when we return -EAGAIN to the fib_rules_lookup() and leave routing table entry referenced in arg->result. If rule with matched throw route isn't last matched in the list we overwrite arg->result losing reference on throw route stored previously forever. We also partially revert commit ab997ad40839 ("ipv6: fix the incorrect return value of throw route") since we never return routing table entry with dst.error == -EAGAIN when CONFIG_IPV6_MULTIPLE_TABLES is on. Also there is no point to check for RTF_REJECT flag since it is always set throw route. Fixes: 73ba57bfae4a ("ipv6: fix backtracking for throw routes") Signed-off-by: Serhey Popovych <serhe.popovych@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>