path: root/net
Age | Commit message | Author | Files | Lines
2017-02-01 | net/sched: act_sample: Fix error path in init | Yotam Gigi | 1 | -1/+4
Fix the error path in sample init by releasing the tc hash if psample_group creation fails. Fixes: 5c5670fae430 ("net/sched: Introduce sample tc action") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 | net: ipv6: add NLM_F_APPEND in notifications when applicable | David Ahern | 1 | -0/+3
IPv6 does not set the NLM_F_APPEND flag in notifications to signal that a NEWROUTE is an append versus a new route or a replaced one. Add the flag if the request has it. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
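The flag propagation described above can be pictured with a small user-space sketch. The helper name `notify_flags_from_request` is hypothetical and is not taken from the patch; only the `NLM_F_*` constants come from the real `<linux/netlink.h>` header. It assumes that the notification carries `NLM_F_APPEND` when, and only when, the triggering request had it.

```c
#include <linux/netlink.h>   /* NLM_F_APPEND, NLM_F_CREATE, NLM_F_EXCL */

/*
 * Hypothetical helper (not the kernel's code): derive the flags of a
 * NEWROUTE notification from the flags of the request that caused it.
 * Only NLM_F_APPEND is carried over; all other request flags are dropped.
 */
static unsigned int notify_flags_from_request(unsigned int req_flags)
{
	return req_flags & NLM_F_APPEND;
}
```

A listener can then test `nlmsg_flags & NLM_F_APPEND` on the notification to tell an appended route from a new or replaced one.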
2017-02-01 | net: reduce skb_warn_bad_offload() noise | Eric Dumazet | 1 | -3/+9
Dmitry reported warnings occurring in __skb_gso_segment() [1] All SKB_GSO_DODGY producers can allow user space to feed packets that trigger the current check. We could prevent them from doing so, rejecting packets, but this might add regressions to existing programs. It turns out our SKB_GSO_DODGY handlers properly set up checksum information that is needed anyway when packets needs to be segmented. By checking again skb_needs_check() after skb_mac_gso_segment(), we should remove these pesky warnings, at a very minor cost. With help from Willem de Bruijn [1] WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434 lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1 ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20 Call Trace: [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179 [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542 [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565 [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434 [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706 [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline] [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969 [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383 [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424 [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline] 
[<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955 [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline] [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631 [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954 [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988 [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline] [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995 [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 | rtnetlink: Handle IFLA_MASTER parameter when processing rtnl_newlink | Theuns Verwoerd | 1 | -1/+6
Allow a master interface to be specified as one of the parameters when creating a new interface via rtnl_newlink. Previously this would require invoking interface creation, waiting for it to complete, and then separately binding that new interface to a master. In particular, this is used when creating a macvlan child interface for VRRP in a VRF configuration, allowing the interface creator to specify directly what master interface should be inherited by the child, without having to deal with asynchronous complications and potential race conditions. Signed-off-by: Theuns Verwoerd <theuns.verwoerd@alliedtelesis.co.nz> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-01 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next | David S. Miller | 10 | -199/+575
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-02-01

1) Some typo fixes, from Alexander Alemayhu.
2) Don't acquire state lock in get_mtu functions. The only race against a dead state does not matter. From Florian Westphal.
3) Remove xfrm4_state_fini, it is unused for more than 10 years. From Florian Westphal.
4) Various rcu usage improvements. From Florian Westphal.
5) Properly handle crypto errors in ah4/ah6. From Gilad Ben-Yossef.
6) Try to avoid skb linearization in esp4 and esp6.
7) The esp trailer is now set up in different places, add a helper for this.
8) With the upcoming usage of gro_cells in IPsec, a gro merged skb can have a secpath. Drop it before freeing or reusing the skb.
9) Add a xfrm dummy network device for napi. With this we can use gro_cells from within xfrm, it allows IPsec GRO without impact on the generic networking code.

Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-31 | net: ethtool: convert large order kmalloc allocations to vzalloc | Alexei Starovoitov | 1 | -17/+22
under memory pressure 'ethtool -S' command may warn: [ 2374.385195] ethtool: page allocation failure: order:4, mode:0x242c0c0 [ 2374.405573] CPU: 12 PID: 40211 Comm: ethtool Not tainted [ 2374.423071] Call Trace: [ 2374.423076] [<ffffffff8148cb29>] dump_stack+0x4d/0x64 [ 2374.423080] [<ffffffff811667cb>] warn_alloc_failed+0xeb/0x150 [ 2374.423082] [<ffffffff81169cd3>] ? __alloc_pages_direct_compact+0x43/0xf0 [ 2374.423084] [<ffffffff8116a25c>] __alloc_pages_nodemask+0x4dc/0xbf0 [ 2374.423091] [<ffffffffa0023dc2>] ? cmd_exec+0x722/0xcd0 [mlx5_core] [ 2374.423095] [<ffffffff811b3dcc>] alloc_pages_current+0x8c/0x110 [ 2374.423097] [<ffffffff81168859>] alloc_kmem_pages+0x19/0x90 [ 2374.423099] [<ffffffff81186e5e>] kmalloc_order_trace+0x2e/0xe0 [ 2374.423101] [<ffffffff811c0084>] __kmalloc+0x204/0x220 [ 2374.423105] [<ffffffff816c269e>] dev_ethtool+0xe4e/0x1f80 [ 2374.423106] [<ffffffff816b967e>] ? dev_get_by_name_rcu+0x5e/0x80 [ 2374.423108] [<ffffffff816d6926>] dev_ioctl+0x156/0x560 [ 2374.423111] [<ffffffff811d4c68>] ? mem_cgroup_commit_charge+0x78/0x3c0 [ 2374.423117] [<ffffffff8169d542>] sock_do_ioctl+0x42/0x50 [ 2374.423119] [<ffffffff8169d9c3>] sock_ioctl+0x1b3/0x250 [ 2374.423121] [<ffffffff811f0f42>] do_vfs_ioctl+0x92/0x580 [ 2374.423123] [<ffffffff8100222b>] ? do_audit_syscall_entry+0x4b/0x70 [ 2374.423124] [<ffffffff8100287c>] ? syscall_trace_enter_phase1+0xfc/0x120 [ 2374.423126] [<ffffffff811f14a9>] SyS_ioctl+0x79/0x90 [ 2374.423127] [<ffffffff81002bb0>] do_syscall_64+0x50/0xa0 [ 2374.423129] [<ffffffff817e19bc>] entry_SYSCALL64_slow_path+0x25/0x25 ~1160 mlx5 counters ~= order 4 allocation which is unlikely to succeed under memory pressure. Convert them to vzalloc() as ethtool_get_regs() does. Also take care of drivers without counters similar to commit 67ae7cf1eeda ("ethtool: Allow zero-length register dumps again") and reduce warn_on to warn_on_once. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
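The "~1160 mlx5 counters ~= order 4 allocation" estimate above can be checked with a little arithmetic. The sketch below assumes 4 KiB pages and assumes that the failing buffer holds one 32-byte name per counter (the size of `ETH_GSTRING_LEN`); the message itself does not say which buffer failed. `alloc_order` and `SKETCH_PAGE_SIZE` are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) >= size. */
static unsigned int alloc_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	assert(size > 0);
	while (span < size) {
		span <<= 1;   /* each order doubles the contiguous span */
		order++;
	}
	return order;
}
```

Under these assumptions, 1160 names of 32 bytes need 37120 bytes, which is more than 8 pages and therefore rounds up to 16 physically contiguous pages (order 4). A vzalloc()-style allocation only needs virtually contiguous pages, which is why it is far more likely to succeed under memory pressure.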
2017-01-30 | smc: some potential use after free bugs | Dan Carpenter | 1 | -0/+5
Say we got really unlucky and these failed on the last iteration, then it could lead to a use after free bug. Fixes: cd6851f30386 ("smc: remote memory buffers (RMBs)") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 | net: dsa: Add plumbing for port mirroring | Florian Fainelli | 2 | -1/+139
Add necessary plumbing at the slave network device level to have switch drivers implement ndo_setup_tc() and most particularly the cls_matchall classifier. We add support for two switch operations, port_add_mirror() and port_del_mirror(), which configure, on a per-port basis, the mirror parameters requested from the cls_matchall classifier. Code is largely borrowed from the Mellanox Spectrum switch driver. Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 | lwtunnel: remove device arg to lwtunnel_build_state | David Ahern | 8 | -28/+17
Nothing about lwt state requires a device reference, so remove the input argument. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 | net: Avoid receiving packets with an l3mdev on unbound UDP sockets | Robert Shearman | 3 | -14/+51
Packets arriving in a VRF currently are delivered to UDP sockets that aren't bound to any interface. TCP defaults to not delivering packets arriving in a VRF to unbound sockets. IP route lookup and socket transmit both assume that unbound means using the default table, and UDP applications that haven't been changed to be aware of VRFs may not function correctly in this case, since they may not be able to handle overlapping IP address ranges or to send packets back to the original sender if required. So add a sysctl, udp_l3mdev_accept, to control this behaviour, analogous to the existing tcp_l3mdev_accept, namely to allow a process to have a VRF-global listen socket. Have this default to off as this is the behaviour that users will expect, given that there is no explicit mechanism to place unmodified VRF-unaware applications into a default VRF. Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 | net: dsa: Hook {get,set}_rxnfc ethtool operations | Florian Fainelli | 1 | -0/+26
In preparation for adding support for CFP/TCAMP in the bcm_sf2 driver add the plumbing to call into driver specific {get,set}_rxnfc operations. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 | xfrm: Add a dummy network device for napi. | Steffen Klassert | 1 | -1/+11
This patch adds a dummy network device so that we can use gro_cells for IPsec GRO. With this, we handle IPsec GRO with no impact on the generic networking code. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-30 | net: Drop secpath on free after gro merge. | Steffen Klassert | 1 | -0/+2
With a followup patch, a gro merged skb can have a secpath. So drop it before freeing or reusing the skb. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-29 | net: add devm version of alloc_etherdev_mqs function | Rafał Miłecki | 1 | -0/+28
This patch adds devm_alloc_etherdev_mqs function and devm_alloc_etherdev macro. These can be used for simpler netdev allocation without having to care about calling free_netdev. Thanks to this change drivers, their error paths and removal paths may get simpler by a bit. Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | Merge tag 'batadv-next-for-davem-20170128' of git://git.open-mesh.org/linux-merge | David S. Miller | 3 | -3/+5
Simon Wunderlich says: ==================== Here are two fixes for batman-adv for net-next: - fix double call of dev_queue_xmit(), caused by the recent introduction of net_xmit_eval(), by Sven Eckelmann - Fix includes for IS_ERR/ERR_PTR, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | tcp: include locally failed retries in retransmission stats | Yuchung Cheng | 1 | -9/+9
Currently the retransmission stats are not incremented if the retransmit fails locally. But we always increment the other packet counters that track total packet/bytes sent. Awkwardly, while we don't count these failed retransmits in RETRANSSEGS, we do count them in FAILEDRETRANS. If the qdisc is dropping many packets, this could substantially under-estimate the TCP retransmission rate in both SNMP and per-socket TCP_INFO stats. This patch changes this by always incrementing retransmission stats on retransmission attempts and failures. Another motivation is to properly track retransmits in SCM_TIMESTAMPING_OPT_STATS. Since SCM_TSTAMP_SCHED collection is triggered in tcp_transmit_skb(), if tp->total_retrans is incremented after the function, we'll always mis-count by the amount of the latest retransmission. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | tcp: record pkts sent and retransmistted | Yuchung Cheng | 1 | -1/+5
Add two stats in SCM_TIMESTAMPING_OPT_STATS: TCP_NLA_DATA_SEGS_OUT: total data packets sent including retransmission TCP_NLA_TOTAL_RETRANS: total data packets retransmitted The names are picked to be consistent with corresponding fields in TCP_INFO. This allows applications that are using the timestamping API to measure latency stats to also retrieve the retransmission rate of application writes. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
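As a rough illustration of how an application might combine the two counters, the sketch below computes a retransmission rate from them. The helper `retrans_rate` is hypothetical; parsing the counters out of the SCM_TIMESTAMPING_OPT_STATS control message is deliberately left out, and the two arguments are assumed to hold the TCP_NLA_DATA_SEGS_OUT and TCP_NLA_TOTAL_RETRANS values.

```c
#include <stdint.h>

/*
 * Hypothetical helper: fraction of the data segments sent that were
 * retransmissions. data_segs_out already includes retransmitted
 * segments, so the result lies between 0.0 and 1.0.
 */
static double retrans_rate(uint64_t data_segs_out, uint64_t total_retrans)
{
	if (data_segs_out == 0)
		return 0.0;   /* nothing sent yet, avoid dividing by zero */
	return (double)total_retrans / (double)data_segs_out;
}
```

For example, 200 data segments sent with 50 of them retransmitted gives a rate of 0.25.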
2017-01-29 | openvswitch: Simplify do_execute_actions(). | andy zhou | 1 | -22/+20
do_execute_actions() implements a worthwhile optimization: in case an output action is the last action in an action list, skb_clone() can be avoided by outputting the current skb. However, the implementation is more complicated than necessary. This patch simplifies this logic. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | net: dsa: pass bridge device when a port leaves | Vivien Didelot | 1 | -5/+5
Upon reception of the NETDEV_CHANGEUPPER, a leaving port is already unbridged, so reflect this by assigning the port's bridge_dev pointer to NULL before calling the port_bridge_leave DSA driver operation. Now that the bridge_dev pointer is exposed to the drivers, reflecting the current state of the DSA switch fabric is necessary for the drivers to adjust their port based VLANs correctly. Pass the bridge device pointer to the port_bridge_leave operation so that drivers have all information to re-program their chips properly, and do not need to cache it anymore. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | net: dsa: move bridge device in dsa_port | Vivien Didelot | 2 | -6/+5
Move the bridge_dev pointer from dsa_slave_priv to dsa_port so that DSA drivers can access this information and remove the need to cache it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | net: dsa: store a dsa_port in dsa_slave_priv | Vivien Didelot | 7 | -100/+96
Store a pointer to the dsa_port structure in the dsa_slave_priv structure, instead of the switch/port index. This will allow storing more information, such as the bridge device, needed in DSA drivers for multi-chip configuration. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | net: dsa: add ds and index to dsa_port | Vivien Didelot | 1 | -0/+6
Add to the dsa_port structure the physical switch instance and the port index that a DSA port belongs to. That can be used later to retrieve information about a physical port when configuring a switch fabric, or to lighten up struct dsa_slave_priv. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | net: dsa: use ds->num_ports when possible | Vivien Didelot | 7 | -19/+19
The dsa_switch structure contains the number of ports. Use it where the structure is valid instead of the DSA_MAX_PORTS value. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 | net: dsa: variable number of ports | Vivien Didelot | 2 | -3/+18
Change the ports[DSA_MAX_PORTS] array of the dsa_switch structure for a zero-length array, allocated at the same time as the dsa_switch structure itself. A dsa_switch_alloc() helper is provided for that. This commit brings no functional change yet since we pass DSA_MAX_PORTS as the number of ports for the moment. Future patches can update the DSA drivers separately to support dynamic number of ports. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
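The allocation pattern described above, a trailing array allocated together with its containing structure, can be sketched in user space as follows. This is not the kernel's dsa_switch_alloc(): the `sketch_*` names are hypothetical, calloc() stands in for the kernel allocator, and a C99 flexible array member is used as the standard spelling of the zero-length array the message mentions.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the DSA structures. */
struct sketch_port {
	int index;
};

struct sketch_switch {
	size_t num_ports;
	struct sketch_port ports[];   /* C99 flexible array member */
};

/* Allocate the switch and its n ports in one zeroed block. */
static struct sketch_switch *sketch_switch_alloc(size_t n)
{
	struct sketch_switch *sw;
	size_t i;

	sw = calloc(1, sizeof(*sw) + n * sizeof(sw->ports[0]));
	if (!sw)
		return NULL;

	sw->num_ports = n;
	for (i = 0; i < n; i++)
		sw->ports[i].index = (int)i;
	return sw;
}
```

Because the ports live in the same block as the switch, a single free() releases everything, and loops can run up to `num_ports` instead of a compile-time maximum.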
2017-01-28 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 53 | -269/+408
Two trivial overlapping-change conflicts in MPLS and mlx5. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 | batman-adv: Fix includes for IS_ERR/ERR_PTR | Sven Eckelmann | 2 | -2/+2
IS_ERR/ERR_PTR are not defined in linux/device.h but in linux/err.h. The files using these macros therefore have to include the correct one. Reported-by: Linus Luessing <linus.luessing@web.de> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-28 | batman-adv: Fix double call of dev_queue_xmit | Sven Eckelmann | 1 | -1/+3
The net_xmit_eval has side effects because it is not making sure that e isn't evaluated twice. #define net_xmit_eval(e) ((e) == NET_XMIT_CN ? 0 : (e)) The code requested by David Miller [1] return net_xmit_eval(dev_queue_xmit(skb)); will get transformed into return ((dev_queue_xmit(skb)) == NET_XMIT_CN ? 0 : (dev_queue_xmit(skb))) dev_queue_xmit will therefore be tried again (with an already consumed skb) whenever the return code is not NET_XMIT_CN. [1] https://lkml.kernel.org/r/20170125.225624.965229145391320056.davem@davemloft.net Fixes: c33705188c49 ("batman-adv: Treat NET_XMIT_CN as transmit successfully") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
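The double evaluation described above is easy to reproduce outside the kernel. In the sketch below, the macro is copied from the message and the NET_XMIT_* values match the kernel's definitions, but `fake_xmit`, `send_buggy` and `send_fixed` are hypothetical stand-ins: `fake_xmit` merely counts how often it is called. Storing the result in a local variable first is one straightforward way to avoid the second call; it is shown as an illustration, not as the exact patch.

```c
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

/* Macro as quoted in the commit message: (e) may be evaluated twice. */
#define net_xmit_eval(e) ((e) == NET_XMIT_CN ? 0 : (e))

static int xmit_calls;   /* how many times the "transmit" ran */

/* Hypothetical stand-in for dev_queue_xmit(): counts calls, returns rc. */
static int fake_xmit(int rc)
{
	xmit_calls++;
	return rc;
}

/* Buggy form: the call is the macro argument, so it can run twice. */
static int send_buggy(int rc)
{
	return net_xmit_eval(fake_xmit(rc));
}

/* Safe form: call once, then hand the stored result to the macro. */
static int send_fixed(int rc)
{
	int ret = fake_xmit(rc);

	return net_xmit_eval(ret);
}
```

With these definitions, `send_buggy(NET_XMIT_SUCCESS)` runs `fake_xmit` twice, while `send_fixed` always runs it exactly once.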
2017-01-27 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | Linus Torvalds | 52 | -268/+407
Pull networking fixes from David Miller:

1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP DF on transmit).
2) Netfilter needs to reflect the fwmark when sending resets, from Pau Espin Pedrol.
3) nftable dump OOPS fix from Liping Zhang.
4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit, from Rolf Neugebauer.
5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd Bergmann.
6) Fix regression in handling of NETIF_F_SG in harmonize_features(), from Eric Dumazet.
7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.
8) tcp_fastopen_create_child() needs to setup tp->max_window, from Alexey Kodanev.
9) Missing kmemdup() failure check in ipv6 segment routing code, from Eric Dumazet.
10) Don't execute unix_bind() under the bindlock, otherwise we deadlock with splice. From WANG Cong.
11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer, therefore callers must reload cached header pointers into that skb. Fix from Eric Dumazet.
12) Fix various bugs in legacy IRQ fallback handling in alx driver, from Tobias Regnery.
13) Do not allow lwtunnel drivers to be unloaded while they are referenced by active instances, from Robert Shearman.
14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.
15) Fix a few regressions from virtio_net XDP support, from John Fastabend and Jakub Kicinski.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits) ISDN: eicon: silence misleading array-bounds warning net: phy: micrel: add support for KSZ8795 gtp: fix cross netns recv on gtp socket gtp: clear DF bit on GTP packet tx gtp: add genl family modules alias tcp: don't annotate mark on control socket from tcp_v6_send_response() ravb: unmap descriptors when freeing rings virtio_net: reject XDP programs using header adjustment virtio_net: use dev_kfree_skb for small buffer XDP receive r8152: check rx after napi is enabled r8152: re-schedule napi for tx r8152: avoid start_xmit to schedule napi when napi is disabled r8152: avoid start_xmit to call napi_schedule during autosuspend net: dsa: Bring back device detaching in dsa_slave_suspend() net: phy: leds: Fix truncated LED trigger names net: phy: leds: Break dependency of phy.h on phy_led_triggers.h net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash net-next: ethernet: mediatek: change the compatible string Documentation: devicetree: change the mediatek ethernet compatible string bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status(). ...
2017-01-27 | net: adjust skb->truesize in pskb_expand_head() | Eric Dumazet | 3 | -10/+14
Slava Shwartsman reported a warning in skb_try_coalesce(), when we detect skb->truesize is completely wrong. In his case, issue came from IPv6 reassembly coping with malicious datagrams, that forced various pskb_may_pull() to reallocate a bigger skb->head than the one allocated by NIC driver before entering GRO layer. Current code does not change skb->truesize, leaving this burden to callers if they care enough. Blindly changing skb->truesize in pskb_expand_head() is not easy, as some producers might track skb->truesize, for example in xmit path for back pressure feedback (sk->sk_wmem_alloc) We can detect the cases where it should be safe to change skb->truesize : 1) skb is not attached to a socket. 2) If it is attached to a socket, destructor is sock_edemux() My audit gave only two callers doing their own skb->truesize manipulation. I had to remove skb parameter in sock_edemux macro when CONFIG_INET is not set to avoid a compile error. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 | tcp: don't annotate mark on control socket from tcp_v6_send_response() | Pablo Neira | 5 | -9/+9
Unlike ipv4, this control socket is shared by all cpus so we cannot use it as scratchpad area to annotate the mark that we pass to ip6_xmit(). Add a new parameter to ip6_xmit() to indicate the mark. The SCTP socket family caches the flowi6 structure in the sctp_transport structure, so we cannot use it to carry the mark unless we later on reset it back, which I discarded since it looks ugly to me. Fixes: bf99b4ded5f8 ("tcp: fix mark propagation with fwmark_reflect enabled") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 | net/ipv6: support more tunnel interfaces for EUI64 link-local generation | Felix Jia | 3 | -0/+12
Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 | net/ipv6: allow sysctl to change link-local address generation mode | Felix Jia | 1 | -20/+84
The address generation mode for IPv6 link-local can only be configured by netlink messages. This patch adds the ability to change the address generation mode via sysctl. v1 -> v2 Removed the rtnl lock and switched to using the RCU lock to iterate through the netdev list. v2 -> v3 Removed the addrgenmode variable from the idev structure and use the sysctl storage for the flag. Simplified the logic for sysctl handling by removing support for the all operation. Added support for more types of tunnel interfaces for link-local address generation. Based the patches on net-next. v3 -> v4 Removed unnecessary whitespace changes. Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: ipv6: ignore null_entry on route dumps | David Ahern | 1 | -1/+5
lkp-robot reported a BUG: [ 10.151226] BUG: unable to handle kernel NULL pointer dereference at 00000198 [ 10.152525] IP: rt6_fill_node+0x164/0x4b8 [ 10.153307] *pdpt = 0000000012ee5001 *pde = 0000000000000000 [ 10.153309] [ 10.154492] Oops: 0000 [#1] [ 10.154987] CPU: 0 PID: 909 Comm: netifd Not tainted 4.10.0-rc4-00722-g41e8c70ee162-dirty #10 [ 10.156482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 10.158254] task: d0deb000 task.stack: d0e0c000 [ 10.159059] EIP: rt6_fill_node+0x164/0x4b8 [ 10.159780] EFLAGS: 00010296 CPU: 0 [ 10.160404] EAX: 00000000 EBX: d10c2358 ECX: c1f7c6cc EDX: c1f6ff44 [ 10.161469] ESI: 00000000 EDI: c2059900 EBP: d0e0dc4c ESP: d0e0dbe4 [ 10.162534] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 [ 10.163482] CR0: 80050033 CR2: 00000198 CR3: 10d94660 CR4: 000006b0 [ 10.164535] Call Trace: [ 10.164993] ? paravirt_sched_clock+0x9/0xd [ 10.165727] ? sched_clock+0x9/0xc [ 10.166329] ? sched_clock_cpu+0x19/0xe9 [ 10.166991] ? lock_release+0x13e/0x36c [ 10.167652] rt6_dump_route+0x4c/0x56 [ 10.168276] fib6_dump_node+0x1d/0x3d [ 10.168913] fib6_walk_continue+0xab/0x167 [ 10.169611] fib6_walk+0x2a/0x40 [ 10.170182] inet6_dump_fib+0xfb/0x1e0 [ 10.170855] netlink_dump+0xcd/0x21f This happens when the loopback device is set down and a ipv6 fib route dump is requested. ip6_null_entry is the root of all ipv6 fib tables making it integrated into the table and hence passed to the ipv6 route dump code. The null_entry route uses the loopback device for dst.dev but may not have rt6i_idev set because of the order in which initializations are done -- ip6_route_net_init is run before addrconf_init has initialized the loopback device. Fixing the initialization order is a much bigger problem with no obvious solution thus far. The BUG is triggered when the loopback is set down and the netif_running check added by a1a22c1206 fails. 
The fill_node descends to checking rt->rt6i_idev for ignore_routes_with_linkdown and since rt6i_idev is NULL it faults. The null_entry route should not be processed in a dump request. Catch and ignore. This check is done in rt6_dump_route as it is the highest place in the callchain with knowledge of both the route and the network namespace. Fixes: a1a22c1206("net: ipv6: Keep nexthop of multipath route on admin down") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: ipv6: remove skb_reserve in getroute | David Ahern | 1 | -6/+0
Remove skb_reserve and skb_reset_mac_header from inet6_rtm_getroute. The allocated skb is not passed through the routing engine (like it is for IPv4) and has not since the beginning of git time. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: dsa: Move ports assignment closer to error checking | Florian Fainelli | 1 | -1/+2
Move the assignment of ports in _dsa_register_switch() closer to where it is checked, no functional change. Re-order declarations to preserve the inverted christmas tree style. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: dsa: Suffix function manipulating device_node with _dn | Florian Fainelli | 1 | -8/+8
Make it clear that these functions take a device_node structure pointer Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: dsa: Make most functions take a dsa_port argument | Florian Fainelli | 3 | -36/+44
In preparation for allowing platform data, and therefore no valid device_node pointer, make most DSA functions take a pointer to a dsa_port structure whenever possible. While at it, introduce a dsa_port_is_valid() helper function which checks whether port->dn is NULL or not at the moment. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | net: dsa: Pass device pointer to dsa_register_switch | Florian Fainelli | 1 | -3/+4
In preparation for allowing dsa_register_switch() to be supplied with device/platform data, pass down a struct device pointer instead of a struct device_node. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | Merge tag 'batadv-next-for-davem-20170126' of git://git.open-mesh.org/linux-merge | David S. Miller | 59 | -69/+85
Simon Wunderlich says:
====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich
 - ignore self-generated loop detect MAC addresses in translation table, by Simon Wunderlich
 - install uapi batman_adv.h header, by Sven Eckelmann
 - bump copyright years, by Sven Eckelmann
 - Remove an unused variable in translation table code, by Sven Eckelmann
 - Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to David's suggestion), and a follow up code clean up, by Gao Feng (2 patches)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf | David S. Miller | 19 | -88/+103
Pablo Neira Ayuso says:
====================
Netfilter fixes for net

The following patchset contains a large batch with Netfilter fixes for your net tree, they are:

1) Two patches to solve conntrack garbage collector cpu hogging, one to remove GC_MAX_EVICTS and another to look at the ratio (scanned entries vs. evicted entries) to make a decision on whether or not to reduce the scanning interval. From Florian Westphal.
2) Two patches to fix incorrect set element counting if NLM_F_EXCL is not set. Moreover, don't decrement set->nelems from the abort path if -ENFILE, which leaks a spare slot in the set. This includes a patch to deconstify the set walk callback to update set->ndeact.
3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to reply packets both from nf_reject and local stack, from Pau Espin Pedrol.
4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables fib expression, from Liping Zhang.
5) Fix oops on stateful objects netlink dump, when no filter is specified. Also from Liping Zhang.
6) Fix a build error if proc is not available in ipt_CLUSTERIP, related to fix that was applied in the previous batch for net. From Arnd Bergmann.
7) Fix lack of string validation in table, chain, set and stateful object names in nf_tables, from Liping Zhang. Moreover, restrict maximum log prefix length to 127 bytes, otherwise explicitly bail out.
8) Two patches to fix spelling and typos in nf_tables uapi header file and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 | batman-adv: Treat NET_XMIT_CN as transmit successfully | Gao Feng | 1 | -1/+1
The tc could return NET_XMIT_CN as a congestion notification, but it does not mean the packet is lost. Other modules like ipvlan, macvlan, and others treat NET_XMIT_CN as success too. So batman-adv should also handle NET_XMIT_CN as NET_XMIT_SUCCESS. Signed-off-by: Gao Feng <gfree.wind@gmail.com> [sven@narfation.org: Moved NET_XMIT_CN handling to batadv_send_skb_packet] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 | batman-adv: Remove one condition check in batadv_route_unicast_packet | Gao Feng | 1 | -5/+4
Collecting some statements in the first condition block removes one condition check. Signed-off-by: Gao Feng <gfree.wind@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 | batman-adv: Remove unused variable in batadv_tt_local_set_flags | Sven Eckelmann | 1 | -2/+0
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 | batman-adv: update copyright years for 2017 | Sven Eckelmann | 59 | -59/+59
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 | batman-adv: don't add loop detect macs to TT | Simon Wunderlich | 2 | -1/+20
The bridge loop avoidance (BLA) feature of batman-adv sends packets to probe for Mesh/LAN packet loops. Those packets are not sent by real clients and should therefore not be added to the translation table (TT). Signed-off-by: Simon Wunderlich <simon.wunderlich@open-mesh.com>
2017-01-25 | bridge: move maybe_deliver_addr() inside #ifdef | Arnd Bergmann | 1 | -25/+25
The only caller of this new function is inside of an #ifdef checking for CONFIG_BRIDGE_IGMP_SNOOPING, so we should move the implementation there too, in order to avoid this harmless warning: net/bridge/br_forward.c:177:13: error: 'maybe_deliver_addr' defined but not used [-Werror=unused-function] Fixes: 6db6f0eae605 ("bridge: multicast to unicast") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
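The pattern behind this fix can be shown with a small standalone sketch. The names `CONFIG_EXAMPLE_SNOOPING`, `maybe_deliver_example()` and `flood_example()` are hypothetical and stand in for the bridge code; the point is only that a static helper lives under the same guard as its sole caller, so no unused-function warning appears when the option is off.

```c
/* Leave CONFIG_EXAMPLE_SNOOPING undefined to model the option being off.
 * The name is hypothetical and only used for this illustration. */

#ifdef CONFIG_EXAMPLE_SNOOPING
/* Static helper placed under the same guard as its only caller, so it is
 * not compiled (and cannot trigger -Wunused-function) when the option is off. */
static int maybe_deliver_example(int port)
{
	return port + 1;
}
#endif

/* The caller is always built; only the guarded branch uses the helper. */
int flood_example(int port)
{
#ifdef CONFIG_EXAMPLE_SNOOPING
	return maybe_deliver_example(port);
#else
	return port;
#endif
}
```

If the helper were defined outside the `#ifdef` while its caller stayed inside, a build with the option disabled and `-Werror=unused-function` would fail as in the message quoted above.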
2017-01-25 | Merge tag 'batadv-net-for-davem-20170125' of git://git.open-mesh.org/linux-merge | David S. Miller | 1 | -5/+5
Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - fix reference count handling on fragmentation error, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net: dsa: Bring back device detaching in dsa_slave_suspend() | Florian Fainelli | 1 | -0/+2
Commit 448b4482c671 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat") removed the netif_device_detach() call done in dsa_slave_suspend() which is necessary, and paired with a corresponding netif_device_attach(), bring it back. Fixes: 448b4482c671 ("net: dsa: Add lockdep class to tx queues to avoid lockdep splat") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 | net/tcp-fastopen: make connect()'s return case more consistent with non-TFO | Willy Tarreau | 2 | -4/+4
Without TFO, any subsequent connect() call after a successful one returns -1 EISCONN. The last API update ensured that __inet_stream_connect() can return -1 EINPROGRESS in response to sendmsg() when TFO is in use to indicate that the connection is now in progress. Unfortunately since this function is used both for connect() and sendmsg(), it has the undesired side effect of making connect() now return -1 EINPROGRESS as well after a successful call, while at the same time poll() returns POLLOUT. This can confuse some applications which happen to call connect() and to check for -1 EISCONN to ensure the connection is usable, and for which EINPROGRESS indicates a need to poll, causing a loop. This problem was encountered in haproxy where a call to connect() is precisely used in certain cases to confirm a connection's readiness. While arguably haproxy's behaviour should be improved here, it seems important to aim at a more robust behaviour when the goal of the new API is to make it easier to implement TFO in existing applications. This patch simply ensures that we preserve the same semantics as in the non-TFO case on the connect() syscall when using TFO, while still returning -1 EINPROGRESS on sendmsg(). For this we simply tell __inet_stream_connect() whether we're doing a regular connect() or in fact connecting for a sendmsg() call. Cc: Wei Wang <weiwan@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
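The distinction this message describes can be modelled in a few lines of C. This is a simplified userspace model, not the kernel function: `deferred_connect_result()` is a hypothetical name, and it covers only the case of a TFO socket whose connect has already been accepted and deferred.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the result for a TFO socket with a deferred
 * connect pending. A repeated connect() reports -EISCONN, matching the
 * non-TFO behaviour, while sendmsg() still reports -EINPROGRESS. */
int deferred_connect_result(bool is_sendmsg)
{
	return is_sendmsg ? -EINPROGRESS : -EISCONN;
}
```

An application that calls connect() again to confirm readiness therefore sees EISCONN rather than EINPROGRESS, so it does not fall back into a poll loop.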
2017-01-25 | net/tcp-fastopen: Add new API support | Wei Wang | 5 | -9/+102
This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an alternative way to perform Fast Open on the active side (client). Prior to this patch, a client needs to replace the connect() call with sendto(MSG_FASTOPEN). This can be cumbersome for applications that want to use Fast Open: these socket operations are often done in lower layer libraries used by many other applications. Changing these libraries and/or the socket call sequences is not trivial. A more convenient approach is to perform Fast Open by simply enabling a socket option when the socket is created w/o changing other socket calls sequence:

  s = socket()
    create a new socket

  setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
    newly introduced sockopt
    If set, new functionality described below will be used.
    Return ENOTSUPP if TFO is not supported or not enabled in the kernel.

  connect()
    With cookie present, return 0 immediately.
    With no cookie, initiate 3WHS with TFO cookie-request option and return -1 with errno = EINPROGRESS.

  write()/sendmsg()
    With cookie present, send out SYN with data and return the number of bytes buffered.
    With no cookie, and 3WHS not yet completed, return -1 with errno = EINPROGRESS.
    No MSG_FASTOPEN flag is needed.

  read()
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but write() is not called yet.
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is established but no msg is received yet.
    Return number of bytes read if socket is established and there is msg received.

The new API simplifies life for applications that always perform a write() immediately after a successful connect(). Such applications can now take advantage of Fast Open by merely making one new setsockopt() call at the time of creating the socket. Nothing else about the application's socket call sequence needs to change.

Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
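The one extra call an application makes under the new API can be sketched as follows. This is a minimal sketch rather than code from the patch: `enable_tfo_connect()` is a hypothetical helper name, the option argument is assumed to be an int set to 1, and the fallback definition of `TCP_FASTOPEN_CONNECT` (30, the value in the Linux uapi header) is only there for libc headers that predate the option.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Fallback for older libc headers; 30 is the Linux uapi value (assumption
 * made explicit here so the sketch builds on older toolchains). */
#ifndef TCP_FASTOPEN_CONNECT
#define TCP_FASTOPEN_CONNECT 30
#endif

/* Hypothetical helper: enable Fast Open on an unconnected TCP socket.
 * Returns 0 on success, or -1 with errno set by setsockopt(). */
int enable_tfo_connect(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN_CONNECT,
			  &one, sizeof(one));
}
```

After a successful call the application keeps its usual connect() followed by write() sequence; if the call fails, it can simply continue without Fast Open.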