aboutsummaryrefslogtreecommitdiffstats
path: root/net (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-06-26net: sched: pass extack pointer to block binds and cb registrationJohn Hurley2-10/+17
Pass the extact struct from a tc qdisc add to the block bind function and, in turn, to the setup_tc ndo of binding device via the tc_block_offload struct. Pass this back to any block callback registrations to allow netlink logging of fails in the bind process. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26l2tp: make l2tp_xmit_core() return voidGuillaume Nault1-4/+2
It always returns 0, and nobody reads the return value anyway. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26l2tp: avoid duplicate l2tp_pernet() callsGuillaume Nault1-2/+1
Replace 'l2tp_pernet(tunnel->l2tp_net)' with 'pn', which has been set on the preceding line. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26l2tp: don't export l2tp_tunnel_closeall()Guillaume Nault2-3/+1
This function is only used in l2tp_core.c. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26l2tp: don't export l2tp_session_queue_purge()Guillaume Nault2-3/+1
This function is only used in l2tp_core.c. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26l2tp: remove l2tp_tunnel_priv()Guillaume Nault1-7/+0
This function, and the associated .priv field, are unused. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26l2tp: remove .show from struct l2tp_tunnelGuillaume Nault2-6/+0
This callback has never been implemented. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26l2tp: remove pppol2tp_session_close()Guillaume Nault1-7/+0
l2tp_core.c verifies that ->session_close() is defined before calling it. There's no need for a stub. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26tcp: add SNMP counter for zero-window dropsYafang Shao2-2/+7
It will be helpful if we could display the drops due to zero window or no enough window space. So a new SNMP MIB entry is added to track this behavior. This entry is named LINUX_MIB_TCPZEROWINDOWDROP and published in /proc/net/netstat in TcpExt line as TCPZeroWindowDrop. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26net: Convert NAPI gro list into a small hash table.David Miller1-26/+79
Improve the performance of GRO receive by splitting flows into multiple hash chains. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26net: Convert GRO SKB handling to list_head.David Miller14-103/+93
Manage pending per-NAPI GRO packets via list_head. Return an SKB pointer from the GRO receive handlers. When GRO receive handlers return non-NULL, it means that this SKB needs to be completed at this time and removed from the NAPI queue. Several operations are greatly simplified by this transformation, especially timing out the oldest SKB in the list when gro_count exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue in reverse order. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller4-15/+24
2018-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds9-32/+47
Pull networking fixes from David Miller: 1) Fix netpoll OOPS in r8169, from Ville Syrjälä. 2) Fix bpf instruction alignment on powerpc et al., from Eric Dumazet. 3) Don't ignore IFLA_MTU attribute when creating new ipvlan links. From Xin Long. 4) Fix use after free in AF_PACKET, from Eric Dumazet. 5) Mis-matched RTNL unlock in xen-netfront, from Ross Lagerwall. 6) Fix VSOCK loopback on big-endian, from Claudio Imbrenda. 7) Missing RX buffer offset correction when computing DMA addresses in mvneta driver, from Antoine Tenart. 8) Fix crashes in DCCP's ccid3_hc_rx_send_feedback, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits) sfc: make function efx_rps_hash_bucket static strparser: Corrected typo in documentation. qmi_wwan: add support for the Dell Wireless 5821e module cxgb4: when disabling dcb set txq dcb priority to 0 net_sched: remove a bogus warning in hfsc net: dccp: switch rx_tstamp_last_feedback to monotonic clock net: dccp: avoid crash in ccid3_hc_rx_send_feedback() net: Remove depends on HAS_DMA in case of platform dependency MAINTAINERS: Add file patterns for dsa device tree bindings net: mscc: make sparse happy net: mvneta: fix the Rx desc DMA address in the Rx path Documentation: e1000: Fix docs build error Documentation: e100: Fix docs build error Documentation: e1000: Use correct heading adornment Documentation: e100: Use correct heading adornment ipv6: mcast: fix unsolicited report interval after receiving querys vhost_net: validate sock before trying to put its fd VSOCK: fix loopback on big-endian systems net: ethernet: ti: davinci_cpdma: make function cpdma_desc_pool_create static xen-netfront: Update features after registering netdev ...
2018-06-24tls: Removed unused variableVakul Garg1-3/+0
Removed unused variable 'rxm' from tls_queue(). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-24net_sched: remove unused htb drop_listCong Wang1-13/+0
After commit a09ceb0e0814 ("sched: remove qdisc->drop"), it is no longer used. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23net_sched: remove a bogus warning in hfscCong Wang1-2/+2
In update_vf(): cftree_remove(cl); update_cfmin(cl->cl_parent); the cl_cfmin of cl->cl_parent is intentionally updated to 0 when that parent only has one child. And if this parent is root qdisc, we could end up, in hfsc_schedule_watchdog(), that we can't decide the next schedule time for qdisc watchdog. But it seems safe that we can just skip it, as this watchdog is not always scheduled anyway. Thanks to Marco for testing all the cases, nothing is broken. Reported-by: Marco Berizzi <pupilla@libero.it> Tested-by: Marco Berizzi <pupilla@libero.it> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23net: drivers/net: Convert random_ether_addr to eth_random_addrJoe Perches1-1/+1
random_ether_addr is a #define for eth_random_addr which is generally preferred in kernel code by ~3:1 Convert the uses of random_ether_addr to enable removing the #define Miscellanea: o Convert &vfmac[0] to equivalent vfmac and avoid unnecessary line wrap Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23net: dccp: switch rx_tstamp_last_feedback to monotonic clockEric Dumazet1-4/+7
To compute delays, better not use time of the day which can be changed by admins or malicious programs. Also change ccid3_first_li() to use s64 type for delta variable to avoid potential overflows. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk> Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23net: dccp: avoid crash in ccid3_hc_rx_send_feedback()Eric Dumazet1-3/+2
On fast hosts or malicious bots, we trigger a DCCP_BUG() which seems excessive. syzbot reported : BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback() CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline] ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline] dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654 sk_backlog_rcv include/net/sock.h:914 [inline] __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215 NF_HOOK include/linux/netfilter.h:287 [inline] ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256 dst_input include/net/dst.h:450 [inline] ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396 NF_HOOK include/linux/netfilter.h:287 [inline] ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 process_backlog+0x219/0x760 net/core/dev.c:5373 napi_poll net/core/dev.c:5771 [inline] net_rx_action+0x7da/0x1980 net/core/dev.c:5837 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk> Cc: dccp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23ipv6: mcast: fix unsolicited report interval after receiving querysHangbin Liu1-3/+6
After recieving MLD querys, we update idev->mc_maxdelay with max_delay from query header. This make the later unsolicited reports have the same interval with mc_maxdelay, which means we may send unsolicited reports with long interval time instead of default configured interval time. Also as we will not call ipv6_mc_reset() after device up. This issue will be there even after leave the group and join other groups. Fixes: fc4eba58b4c14 ("ipv6: make unsolicited report intervals configurable for mld") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22tcp_bbr: fix bbr pacing rate for internal pacingEric Dumazet2-15/+5
This commit makes BBR use only the MSS (without any headers) to calculate pacing rates when internal TCP-layer pacing is used. This is necessary to achieve the correct pacing behavior in this case, since tcp_internal_pacing() uses only the payload length to calculate pacing delays. Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22tcp: ignore rcv_rtt sample with old ts ecr valueWei Wang2-3/+12
When receiving multiple packets with the same ts ecr value, only try to compute rcv_rtt sample with the earliest received packet. This is because the rcv_rtt calculated by later received packets could possibly include long idle time or other types of delay. For example: (1) server sends last packet of reply with TS val V1 (2) client ACKs last packet of reply with TS ecr V1 (3) long idle time passes (4) client sends next request data packet with TS ecr V1 (again!) At this time, the rcv_rtt computed on server with TS ecr V1 will be inflated with the idle time and should get ignored. Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22rhashtable: remove nulls_base and related code.NeilBrown1-2/+2
This "feature" is unused, undocumented, and untested and so doesn't really belong. A patch is under development to properly implement support for detecting when a search gets diverted down a different chain, which the common purpose of nulls markers. This patch actually fixes a bug too. The table resizing allows a table to grow to 2^31 buckets, but the hash is truncated to 27 bits - any growth beyond 2^27 is wasteful an ineffective. This patch results in NULLS_MARKER(0) being used for all chains, and leaves the use of rht_is_a_null() to test for it. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22rhashtable: split rhashtable.hNeilBrown9-0/+9
Due to the use of rhashtables in net namespaces, rhashtable.h is included in lots of the kernel, so a small changes can required a large recompilation. This makes development painful. This patch splits out rhashtable-types.h which just includes the major type declarations, and does not include (non-trivial) inline code. rhashtable.h is no longer included by anything in the include/ directory. Common include files only include rhashtable-types.h so a large recompilation is only triggered when that changes. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22VSOCK: fix loopback on big-endian systemsClaudio Imbrenda1-1/+1
The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be used, and in fact in virtio_transport_common.c only 64 bit accessors are used. Using 32 bit accessors for 64 bit values breaks big endian systems. This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt. Fixes: b9116823189e85ccf384 ("VSOCK: add loopback to virtio_transport") Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22cls_flower: fix use after free in flower S/W pathPaolo Abeni1-4/+17
If flower filter is created without the skip_sw flag, fl_mask_put() can race with fl_classify() and we can destroy the mask rhashtable while a lookup operation is accessing it. BUG: unable to handle kernel paging request at 00000000000911d1 PGD 0 P4D 0 SMP PTI CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 RIP: 0010:rht_bucket_nested+0x20/0x60 Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0 RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206 RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12 RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001 RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2 R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000 R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68 FS: 0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0 Call Trace: fl_lookup+0x134/0x140 [cls_flower] fl_classify+0xf3/0x180 [cls_flower] tcf_classify+0x78/0x150 __netif_receive_skb_core+0x69e/0xa50 netif_receive_skb_internal+0x42/0xf0 tun_get_user+0xdd5/0xfd0 [tun] tun_sendmsg+0x52/0x70 [tun] handle_tx+0x2b3/0x5f0 [vhost_net] vhost_worker+0xab/0x100 [vhost] kthread+0xf8/0x130 ret_from_fork+0x35/0x40 Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress CR2: 00000000000911d1 Fix the above waiting for a RCU grace period before destroying the rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy() must run in process context, as pointed out by Cong Wang. v1 -> v2: use tcf_queue_work to run rhashtable_destroy(). Fixes: 05cd271fd61a ("cls_flower: Support multiple masks per priority") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22net/packet: fix use-after-freeEric Dumazet1-9/+7
We should put copy_skb in receive_queue only after a successful call to virtio_net_hdr_from_skb(). syzbot report : BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline] BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline] BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553 CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __skb_unlink include/linux/skbuff.h:1843 [inline] __skb_dequeue include/linux/skbuff.h:1863 [inline] skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815 skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852 packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331 packet_release+0x630/0xd90 net/packet/af_packet.c:2991 __sock_release+0xd7/0x260 net/socket.c:603 sock_close+0x19/0x20 net/socket.c:1186 __fput+0x35b/0x8b0 fs/file_table.c:209 ____fput+0x15/0x20 fs/file_table.c:243 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x1b08/0x2750 kernel/exit.c:865 do_group_exit+0x177/0x440 kernel/exit.c:968 __do_sys_exit_group kernel/exit.c:979 [inline] __se_sys_exit_group kernel/exit.c:977 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4448e9 Code: Bad RIP value. RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9 RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001 RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000 R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0 R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000 Allocated by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 skb_clone+0x1f5/0x500 net/core/skbuff.c:1282 tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 4553: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582 __kfree_skb net/core/skbuff.c:642 [inline] kfree_skb+0x1a5/0x580 net/core/skbuff.c:659 tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385 deliver_skb net/core/dev.c:1925 [inline] deliver_ptype_list_skb net/core/dev.c:1940 [inline] __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009 call_write_iter include/linux/fs.h:1795 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6c6/0x9f0 fs/read_write.c:487 vfs_write+0x1f8/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8801b044ecc0 which belongs to the cache skbuff_head_cache of size 232 The buggy address is located 0 bytes inside of 232-byte region [ffff8801b044ecc0, ffff8801b044eda8) The buggy address belongs to the page: page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0 raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ^ ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc Fixes: 58d19b19cd99 ("packet: vnet_hdr support for tpacket_rcv") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22Merge tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-3/+7
Pull NFS client bugfixes from Trond Myklebust: "Hightlights include: - fix an rcu deadlock in nfs_delegation_find_inode() - fix NFSv4 deadlocks due to not freeing the session slot in layoutget - don't send layoutreturn if the layout is already invalid - prevent duplicate XID allocation - flexfiles: Don't tie up all the rpciod threads in resends" * tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: pNFS/flexfiles: Process writeback resends from nfsiod context as well pNFS/flexfiles: Don't tie up all the rpciod threads in resends sunrpc: Prevent duplicate XID allocation pNFS: Don't send layoutreturn if the layout is already invalid pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception NFS: Fix an rcu deadlock in nfs_delegation_find_inode()
2018-06-21sctp: fix erroneous inc of snmp SctpFragUsrMsgsMarcelo Ricardo Leitner1-1/+3
Currently it is incrementing SctpFragUsrMsgs when the user message size is of the exactly same size as the maximum fragment size, which is wrong. The fix is to increment it only when user message is bigger than the maximum fragment size. Fixes: bfd2e4b8734d ("sctp: refactor sctp_datamsg_from_user") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-21strparser: Don't schedule in workqueue in paused stateVakul Garg1-4/+1
In function strp_data_ready(), it is useless to call queue_work if the state of strparser is already paused. The state checking should be done before calling queue_work. The change reduces the context switches and improves the ktls-rx throughput by approx 20% (measured on cortex-a53 based platform). Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-21bpfilter: fix user mode helper cross compilationMatteo Croce1-1/+1
Use $(OBJDUMP) instead of literal 'objdump' to avoid using host toolchain when cross compiling. Fixes: 421780fd4983 ("bpfilter: fix build error") Signed-off-by: Matteo Croce <mcroce@redhat.com> Reported-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20ip: limit use of gso_size to udpWillem de Bruijn2-2/+4
The ipcm(6)_cookie field gso_size is set only in the udp path. The ip layer copies this to cork only if sk_type is SOCK_DGRAM. This check proved too permissive. Ping and l2tp sockets have the same type. Limit to sockets of type SOCK_DGRAM and protocol IPPROTO_UDP to exclude ping sockets. v1 -> v2 - remove irrelevant whitespace changes Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT") Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20bpfilter: ignore binary filesMatteo Croce1-0/+1
net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is enabled, add it to .gitignore to avoid committing it. Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20bpfilter: fix build errorMatteo Croce1-2/+4
bpfilter Makefile assumes that the system locale is en_US, and the parsing of objdump output fails. Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns only 2 processes instead of 7. Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module") Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20net/sched: act_ife: preserve the action control in case of errorDavide Caratti1-2/+1
in the following script # tc actions add action ife encode allow prio pass index 42 # tc actions replace action ife encode allow tcindex drop index 42 the action control should remain equal to 'pass', if the kernel failed to replace the TC action. Pospone the assignment of the action control, to ensure it is not overwritten in the error path of tcf_ife_init(). Fixes: ef6980b6becb ("introduce IFE action") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20net/sched: act_ife: fix recursive lock and idr leakDavide Caratti1-5/+4
a recursive lock warning [1] can be observed with the following script, # $TC actions add action ife encode allow prio pass index 42 IFE type 0xED3E # $TC actions replace action ife encode allow tcindex pass index 42 in case the kernel was unable to run the last command (e.g. because of the impossibility to load 'act_meta_skbtcindex'). For a similar reason, the kernel can leak idr in the error path of tcf_ife_init(), because tcf_idr_release() is not called after successful idr reservation: # $TC actions add action ife encode allow tcindex index 47 IFE type 0xED3E RTNETLINK answers: No such file or directory We have an error talking to the kernel # $TC actions add action ife encode allow tcindex index 47 IFE type 0xED3E RTNETLINK answers: No space left on device We have an error talking to the kernel # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47 IFE type 0xFEFE RTNETLINK answers: No space left on device We have an error talking to the kernel Since tcfa_lock is already taken when the action is being edited, a call to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock again. On the other hand, tcf_idr_release() needs to be called in the error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation. Fix both problems in tcf_ife_init(). Since the cleanup() routine can now be called when ife->params is NULL, also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu). [1] ============================================ WARNING: possible recursive locking detected 4.17.0-rc4.kasan+ #417 Tainted: G E -------------------------------------------- tc/3932 is trying to acquire lock: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife] but task is already holding lock: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&p->tcfa_lock)->rlock); lock(&(&p->tcfa_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by tc/3932: #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife] #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife] stack backtrace: CPU: 3 PID: 3932 Comm: tc Tainted: G E 4.17.0-rc4.kasan+ #417 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: dump_stack+0x9a/0xeb __lock_acquire+0xf43/0x34a0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? __mutex_lock+0x62f/0x1240 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 ? lock_acquire+0x10b/0x330 lock_acquire+0x10b/0x330 ? tcf_ife_cleanup+0x19/0x80 [act_ife] _raw_spin_lock_bh+0x38/0x70 ? tcf_ife_cleanup+0x19/0x80 [act_ife] tcf_ife_cleanup+0x19/0x80 [act_ife] __tcf_idr_release+0xff/0x350 tcf_ife_init+0xdde/0x13c0 [act_ife] ? ife_exit_net+0x290/0x290 [act_ife] ? __lock_is_held+0xb4/0x140 tcf_action_init_1+0x67b/0xad0 ? tcf_action_dump_old+0xa0/0xa0 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? memset+0x1f/0x40 tcf_action_init+0x30f/0x590 ? tcf_action_init_1+0xad0/0xad0 ? memset+0x1f/0x40 tc_ctl_action+0x48e/0x5e0 ? mutex_lock_io_nested+0x1160/0x1160 ? tca_action_gd+0x990/0x990 ? sched_clock+0x5/0x10 ? find_held_lock+0x39/0x1d0 rtnetlink_rcv_msg+0x4da/0x990 ? validate_linkmsg+0x680/0x680 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 netlink_rcv_skb+0x127/0x350 ? validate_linkmsg+0x680/0x680 ? netlink_ack+0x970/0x970 ? __kmalloc_node_track_caller+0x304/0x3a0 netlink_unicast+0x40f/0x5d0 ? netlink_attachskb+0x580/0x580 ? _copy_from_iter_full+0x187/0x760 ? import_iovec+0x90/0x390 netlink_sendmsg+0x67f/0xb50 ? netlink_unicast+0x5d0/0x5d0 ? copy_msghdr_from_user+0x206/0x340 ? netlink_unicast+0x5d0/0x5d0 sock_sendmsg+0xb3/0xf0 ___sys_sendmsg+0x60a/0x8b0 ? copy_msghdr_from_user+0x340/0x340 ? lock_downgrade+0x5e0/0x5e0 ? tty_write_lock+0x18/0x50 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 ? lock_downgrade+0x5e0/0x5e0 ? lock_acquire+0x10b/0x330 ? __audit_syscall_entry+0x316/0x690 ? current_kernel_time64+0x6b/0xd0 ? __fget_light+0x55/0x1f0 ? __sys_sendmsg+0xd2/0x170 __sys_sendmsg+0xd2/0x170 ? __ia32_sys_shutdown+0x70/0x70 ? syscall_trace_enter+0x57a/0xd60 ? rcu_read_lock_sched_held+0xdc/0x110 ? __bpf_trace_sys_enter+0x10/0x10 ? do_syscall_64+0x22/0x480 do_syscall_64+0xa5/0x480 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fd646988ba0 RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0 RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003 RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000 R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100 Fixes: 4e8c86155010 ("net sched: net sched: ife action fix late binding") Fixes: ef6980b6becb ("introduce IFE action") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20net: propagate dev_get_valid_name return codeLi RongQing1-2/+2
if dev_get_valid_name failed, propagate its return code and remove the setting err to ENODEV, it will be set to 0 again before dev_change_net_namespace exits. Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20net/tcp: Fix socket lookups with SO_BINDTODEVICEDavid Ahern2-4/+4
Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups need to fail if dev_match is not true. Currently, a packet to a given port can match a socket bound to device when it should not. In the VRF case, this causes the lookup to hit a VRF socket and not a global socket resulting in a response trying to go through the VRF when it should not. Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups") Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups") Reported-by: Lou Berger <lberger@labn.net> Diagnosed-by: Renato Westphal <renato@opensourcerouting.org> Tested-by: Renato Westphal <renato@opensourcerouting.org> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20net/ipv6: respect rcu grace period before freeing fib6_infoEric Dumazet1-2/+3
syzbot reported use after free that is caused by fib6_info being freed without a proper RCU grace period. CPU: 0 PID: 1407 Comm: udevd Not tainted 4.17.0+ #39 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 __read_once_size include/linux/compiler.h:188 [inline] find_rr_leaf net/ipv6/route.c:705 [inline] rt6_select net/ipv6/route.c:761 [inline] fib6_table_lookup+0x12b7/0x14d0 net/ipv6/route.c:1823 ip6_pol_route+0x1c2/0x1020 net/ipv6/route.c:1856 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2082 fib6_rule_lookup+0x211/0x6d0 net/ipv6/fib6_rules.c:122 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2110 ip6_route_output include/net/ip6_route.h:82 [inline] icmpv6_xrlim_allow net/ipv6/icmp.c:211 [inline] icmp6_send+0x147c/0x2da0 net/ipv6/icmp.c:535 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43 ip6_link_failure+0xa5/0x790 net/ipv6/route.c:2244 dst_link_failure include/net/dst.h:427 [inline] ndisc_error_report+0xd1/0x1c0 net/ipv6/ndisc.c:695 neigh_invalidate+0x246/0x550 net/core/neighbour.c:892 neigh_timer_handler+0xaf9/0xde0 net/core/neighbour.c:978 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284 invoke_softirq kernel/softirq.c:364 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:404 exiting_irq arch/x86/include/asm/apic.h:527 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 </IRQ> RIP: 0010:strlen+0x5e/0xa0 lib/string.c:482 Code: 24 00 74 3b 48 bb 00 00 00 00 00 fc ff df 4c 89 e0 48 83 c0 01 48 89 c2 48 89 c1 48 c1 ea 03 83 e1 07 0f b6 14 1a 38 ca 7f 04 <84> d2 75 23 80 38 00 75 de 48 83 c4 08 4c 29 e0 5b 41 5c 5d c3 48 RSP: 0018:ffff8801af117850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: ffff880197f53bd0 RBX: dffffc0000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff81c5b06c RDI: ffff880197f53bc0 RBP: ffff8801af117868 R08: ffff88019a976540 R09: 0000000000000000 R10: ffff88019a976540 R11: 0000000000000000 R12: ffff880197f53bc0 R13: ffff880197f53bc0 R14: ffffffff899e4e90 R15: ffff8801d91c6a00 strlen include/linux/string.h:267 [inline] getname_kernel+0x24/0x370 fs/namei.c:218 open_exec+0x17/0x70 fs/exec.c:882 load_elf_binary+0x968/0x5610 fs/binfmt_elf.c:780 search_binary_handler+0x17d/0x570 fs/exec.c:1653 exec_binprm fs/exec.c:1695 [inline] __do_execve_file.isra.35+0x16fe/0x2710 fs/exec.c:1819 do_execveat_common fs/exec.c:1866 [inline] do_execve fs/exec.c:1883 [inline] __do_sys_execve fs/exec.c:1964 [inline] __se_sys_execve fs/exec.c:1959 [inline] __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f1576a46207 Code: 77 19 f4 48 89 d7 44 89 c0 0f 05 48 3d 00 f0 ff ff 76 e0 f7 d8 64 41 89 01 eb d8 f7 d8 64 41 89 01 eb df b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 00 8c 2d 00 f7 d8 64 89 02 RSP: 002b:00007ffff2784568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1576a46207 RDX: 0000000001215b10 RSI: 00007ffff2784660 RDI: 00007ffff2785670 RBP: 0000000000625500 R08: 000000000000589c R09: 000000000000589c R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001215b10 R13: 0000000000000007 R14: 0000000001204250 R15: 0000000000000005 Allocated by task 12188: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620 kmalloc include/linux/slab.h:513 [inline] kzalloc include/linux/slab.h:706 [inline] fib6_info_alloc+0xbb/0x280 net/ipv6/ip6_fib.c:152 ip6_route_info_create+0x782/0x2b50 net/ipv6/route.c:3013 ip6_route_add+0x23/0xb0 net/ipv6/route.c:3154 ipv6_route_ioctl+0x5a5/0x760 net/ipv6/route.c:3660 inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546 sock_do_ioctl+0xe4/0x3e0 net/socket.c:973 sock_ioctl+0x30d/0x680 net/socket.c:1097 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 1402: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kfree+0xd9/0x260 mm/slab.c:3813 fib6_info_destroy+0x29b/0x350 net/ipv6/ip6_fib.c:207 fib6_info_release include/net/ip6_fib.h:286 [inline] __ip6_del_rt_siblings net/ipv6/route.c:3235 [inline] ip6_route_del+0x11c4/0x13b0 net/ipv6/route.c:3316 ipv6_route_ioctl+0x616/0x760 net/ipv6/route.c:3663 inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546 sock_do_ioctl+0xe4/0x3e0 net/socket.c:973 sock_ioctl+0x30d/0x680 net/socket.c:1097 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8801b5df2580 which belongs to the cache kmalloc-256 of size 256 The buggy address is located 8 bytes inside of 256-byte region [ffff8801b5df2580, ffff8801b5df2680) The buggy address belongs to the page: page:ffffea0006d77c80 count:1 mapcount:0 mapping:ffff8801da8007c0 index:0xffff8801b5df2e40 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffffea0006c5cc48 ffffea0007363308 ffff8801da8007c0 raw: ffff8801b5df2e40 ffff8801b5df2080 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801b5df2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b5df2500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc > ffff8801b5df2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8801b5df2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8801b5df2680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Ahern <dsahern@gmail.com> Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20net/ncsi: Use netdev_dbg for debug messagesJoel Stanley2-21/+18
This moves all of the netdev_printk(KERN_DEBUG, ...) messages over to netdev_dbg. As Joe explains: > netdev_dbg is not included in object code unless > DEBUG is defined or CONFIG_DYNAMIC_DEBUG is set. > And then, it is not emitted into the log unless > DEBUG is set or this specific netdev_dbg is enabled > via the dynamic debug control file. Which is what we're after in this case. Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20net/ncsi: Drop no more channels messageJoel Stanley1-2/+0
This does not provide useful information. As the ncsi maintainer said: > either we get a channel or broadcom has gone out to lunch Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20net/ncsi: Silence debug messagesJoel Stanley2-9/+9
In normal operation we see this series of messages as the host drives the network device: ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0 ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config ftgmac100 1e660000.ethernet eth0: NCSI interface down ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0 ftgmac100 1e660000.ethernet eth0: NCSI interface up ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0 ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config ftgmac100 1e660000.ethernet eth0: NCSI interface down ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0 ftgmac100 1e660000.ethernet eth0: NCSI interface up This makes all of these messages netdev_dbg. They are still useful to debug eg. misbehaving network device firmware, but we do not need them filling up the kernel logs in normal operation. Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-19sunrpc: Prevent duplicate XID allocationChuck Lever1-3/+7
Krzysztof Kozlowski <krzk@kernel.org> reports that a heavy NFSv4 WRITE workload against a slow NFS server causes his Raspberry Pi clients to stall. Krzysztof bisected it to commit 37ac86c3a76c ("SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock") . I was able to reproduce similar behavior and it appears that rarely the RPC client layer is re-allocating an XID for an RPC that it has already partially sent. This results in the client ignoring the subsequent reply, which carries the original XID. For various reasons, checking !req->rq_xmit_bytes_sent in xprt_prepare_transmit is not a 100% reliable mechanism for determining when a fresh XID is needed. Trond's preference is to allocate the XID at the time each rpc_rqst slot is initialized. This patch should also address a gcc 4.1.2 complaint reported by Geert Uytterhoeven <geert@linux-m68k.org>. Reported-by: Krzysztof Kozlowski <krzk@kernel.org> Fixes: 37ac86c3a76c ("SUNRPC: Initialize rpc_rqst outside of ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-17net_sched: blackhole: tell upper qdisc about dropped packetsKonstantin Khlebnikov1-1/+1
When blackhole is used on top of classful qdisc like hfsc it breaks qlen and backlog counters because packets are disappear without notice. In HFSC non-zero qlen while all classes are inactive triggers warning: WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc] and schedules watchdog work endlessly. This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS, this flag tells upper layer: this packet is gone and isn't queued. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-17atm: Preserve value of skb->truesize when accounting to vccDavid Woodhouse7-14/+8
ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on which they are to be sent. But it doesn't take ownership of those packets from the sock (if any) which originally owned them. They should remain owned by their actual sender until they've left the box. There's a hack in pskb_expand_head() to avoid adjusting skb->truesize for certain skbs, precisely to avoid messing up sk_wmem_alloc accounting. Ideally that hack would cover the ATM use case too, but it doesn't — skbs which aren't owned by any sock, for example PPP control frames, still get their truesize adjusted when the low-level ATM driver adds headroom. This has always been an issue, it seems. The truesize of a packet increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't for normal traffic, only for control frames. So I think we just got away with it, and we probably needed to send 2GiB of LCP echo frames before the misaccounting would ever have caused a problem and caused atm_may_send() to start refusing packets. Commit 14afee4b609 ("net: convert sock.sk_wmem_alloc from atomic_t to refcount_t") did exactly what it was intended to do, and turned this mostly-theoretical problem into a real one, causing PPPoATM to fail immediately as sk_wmem_alloc underflows and atm_may_send() *immediately* starts refusing to allow new packets. The least intrusive solution to this problem is to stash the value of skb->truesize that was accounted to the VCC, in a new member of the ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that value instead of the then-current value of skb->truesize. Fixes: 158f323b9868 ("net: adjust skb->truesize in pskb_expand_head()") Signed-off-by: David Woodhouse <dwmw2@infradead.org> Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2-17/+7
Daniel Borkmann says: ==================== pull-request: bpf 2018-06-16 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a panic in devmap handling in generic XDP where return type of __devmap_lookup_elem() got changed recently but generic XDP code missed the related update, from Toshiaki. 2) Fix a freeze when BPF progs are loaded that include BPF to BPF calls when JIT is enabled where we would later bail out via error path w/o dropping kallsyms, and another one to silence syzkaller splats from locking prog read-only, from Daniel. 3) Fix a bug in test_offloads.py BPF selftest which must not assume that the underlying system have no BPF progs loaded prior to test, and one in bpftool to fix accuracy of program load time, from Jakub. 4) Fix a bug in bpftool's probe for availability of the bpf(2) BPF_TASK_FD_QUERY subcommand, from Yonghong. 5) Fix a regression in AF_XDP's XDP_SKB receive path where queue id check got erroneously removed, from Björn. 6) Fix missing state cleanup in BPF's xfrm tunnel test, from William. 7) Check tunnel type more accurately in BPF's tunnel collect metadata kselftest, from Jian. 8) Fix missing Kconfig fragments for BPF kselftests, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds42-119/+226
Pull networking fixes from David Miller: 1) Various netfilter fixlets from Pablo and the netfilter team. 2) Fix regression in IPVS caused by lack of PMTU exceptions on local routes in ipv6, from Julian Anastasov. 3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia. 4) Don't crash on poll in TLS, from Daniel Borkmann. 5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things including Avahi mDNS. From Bart Van Assche. 6) Missing of_node_put in qcom/emac driver, from Yue Haibing. 7) We lack checking of the TCP checking in one special case during SYN receive, from Frank van der Linden. 8) Fix module init error paths of mac80211 hwsim, from Johannes Berg. 9) Handle 802.1ad properly in stmmac driver, from Elad Nachman. 10) Must grab HW caps before doing quirk checks in stmmac driver, from Jose Abreu. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits) net: stmmac: Run HWIF Quirks after getting HW caps neighbour: skip NTF_EXT_LEARNED entries during forced gc net: cxgb3: add error handling for sysfs_create_group tls: fix waitall behavior in tls_sw_recvmsg tls: fix use-after-free in tls_push_record l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl() l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels mlxsw: spectrum_switchdev: Fix port_vlan refcounting mlxsw: spectrum_router: Align with new route replace logic mlxsw: spectrum_router: Allow appending to dev-only routes ipv6: Only emit append events for appended routes stmmac: added support for 802.1ad vlan stripping cfg80211: fix rcu in cfg80211_unregister_wdev mac80211: Move up init of TXQs mac80211_hwsim: fix module init error paths cfg80211: initialize sinfo in cfg80211_get_station nl80211: fix some kernel doc tag mistakes hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload rds: avoid unenecessary cong_update in loop transport l2tp: clean up stale tunnel or session in pppol2tp_connect's error path ...
2018-06-15xdp: Fix handling of devmap in generic XDPToshiaki Makita1-17/+4
Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed the return value type of __devmap_lookup_elem() from struct net_device * to struct bpf_dtab_netdev * but forgot to modify generic XDP code accordingly. Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct net_device is expected, then skb->dev was set to invalid value. v2: - Fix compiler warning without CONFIG_BPF_SYSCALL. Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-15neighbour: skip NTF_EXT_LEARNED entries during forced gcRoopa Prabhu1-4/+6
Commit 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag") added support for NTF_EXT_LEARNED for neighbour entries. NTF_EXT_LEARNED entries are neigh entries managed by control plane (eg: Ethernet VPN implementation in FRR routing suite). Periodic gc already excludes these entries. This patch extends it to forced gc which the earlier patch missed. Fixes: 9ce33e46531d ("neighbour: support for NTF_EXT_LEARNED flag") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15tls: fix waitall behavior in tls_sw_recvmsgDaniel Borkmann1-1/+5
Current behavior in tls_sw_recvmsg() is to wait for incoming tls messages and copy up to exactly len bytes of data that the user provided. This is problematic in the sense that i) if no packet is currently queued in strparser we keep waiting until one has been processed and pushed into tls receive layer for tls_wait_data() to wake up and push the decrypted bits to user space. Given after tls decryption, we're back at streaming data, use sock_rcvlowat() hint from tcp socket instead. Retain current behavior with MSG_WAITALL flag and otherwise use the hint target for breaking the loop and returning to application. This is done if currently no ctx->recv_pkt is ready, otherwise continue to process it from our strparser backlog. Fixes: c46234ebb4d1 ("tls: RX path for ktls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>