path: root/drivers/net/vxlan.c
Age | Commit message | Author | Files | Lines
2014-10-01 | vxlan: Set inner protocol before transmit | Tom Herbert | 1 | -0/+4
Call skb_set_inner_protocol to set inner Ethernet protocol to ETH_P_TEB before transmit. This is needed for GSO with UDP tunnels. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-23 | vxlan: Fix bug introduced by commit acbf74a76300 | Andy Zhou | 1 | -5/+5
Commit acbf74a76300 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.") introduced a bug in the vxlan_xmit_one() function, causing it to transmit VXLAN packets without a proper VXLAN header inserted. The change was not needed in the first place. Revert it. Reported-by: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 | vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions. | Andy Zhou | 1 | -83/+22
Simplify vxlan implementation using common UDP tunnel APIs. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-01 | vxlan: Enable checksum unnecessary conversions for vxlan/UDP sockets | Tom Herbert | 1 | -0/+2
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-29 | net: Clarification of CHECKSUM_UNNECESSARY | Tom Herbert | 1 | -2/+0
This patch:
 - Clarifies the specific requirements of devices returning CHECKSUM_UNNECESSARY (comments in skbuff.h).
 - Adds a csum_level field to skbuff. This is used to express how many checksums are covered by CHECKSUM_UNNECESSARY (it stores n - 1). This replaces the overloading of skb->encapsulation; that field is now only used to indicate that inner headers are valid.
 - Changes __skb_checksum_validate_needed to "consume" each checksum as indicated by csum_level as layers of the packet are parsed.
 - Removes skb_pop_rcv_encapsulation, which is no longer needed in the new csum_level model.
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
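The "stores n - 1" and "consume" wording above can be illustrated with a small user-space sketch. Everything here (`toy_skb`, `toy_mark_verified`, `toy_consume_checksum`, the `CSUM_*` values) is a hypothetical stand-in, not the kernel's definitions; it only models the bookkeeping the message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's ip_summed states. */
enum csum_state { CSUM_NONE = 0, CSUM_UNNECESSARY = 1 };

/* Toy packet descriptor holding only the two fields discussed above. */
struct toy_skb {
    enum csum_state ip_summed;
    uint8_t csum_level; /* number of verified checksums minus one */
};

/* The device reports that it verified n consecutive checksums (n >= 1). */
static void toy_mark_verified(struct toy_skb *skb, unsigned int n)
{
    assert(n >= 1);
    skb->ip_summed = CSUM_UNNECESSARY;
    skb->csum_level = (uint8_t)(n - 1);
}

/*
 * A protocol layer consumes one verified checksum while parsing.
 * Returns 1 if this layer's checksum was covered, 0 if the layer
 * has to validate the checksum itself.
 */
static int toy_consume_checksum(struct toy_skb *skb)
{
    if (skb->ip_summed != CSUM_UNNECESSARY)
        return 0;
    if (skb->csum_level == 0)
        skb->ip_summed = CSUM_NONE; /* last covered checksum used up */
    else
        skb->csum_level--;
    return 1;
}
```

With two verified checksums (for example outer UDP and inner TCP), the first two calls to `toy_consume_checksum()` return 1 and the third returns 0.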
2014-08-22 | vxlan: fix incorrect initializer in union vxlan_addr | Gerhard Stenzel | 1 | -4/+4
The first initializer in the following

    union vxlan_addr ipa = {
        .sin.sin_addr.s_addr = tip,
        .sa.sa_family = AF_INET,
    };

is optimised away by the compiler, due to the second initializer, therefore always initialising .sin.sin_addr.s_addr to 0. As a result, netlink messages indicating an L3 miss never contain the missed IP address. This was observed with GCC 4.8 and 4.9. I do not know about previous versions. The problem affects user space programs relying on an IP address being sent as part of a netlink message indicating an L3 miss.

Changing

    .sa.sa_family = AF_INET,

to

    .sin.sin_family = AF_INET,

fixes the problem.
Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
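A user-space sketch of the fixed initializer form follows. `toy_vxlan_addr` and `toy_make_ipv4` are hypothetical names, and the union layout is an assumption modelled on the message above rather than a copy of the driver's definition. Both designators name the same union member (`sin`), so neither replaces the other.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* User-space stand-in for the driver's address union (assumed layout). */
union toy_vxlan_addr {
    struct sockaddr_in sin;
    struct sockaddr sa;
};

/*
 * Build an IPv4 address using designators that all go through .sin,
 * which is the form the fix switches to. tip_be is in network byte order.
 */
static union toy_vxlan_addr toy_make_ipv4(uint32_t tip_be)
{
    union toy_vxlan_addr ipa = {
        .sin.sin_addr.s_addr = tip_be,
        .sin.sin_family = AF_INET,
    };

    /* The family field overlaps, so it is also visible through .sa. */
    assert(ipa.sa.sa_family == AF_INET);
    return ipa;
}
```

Calling `toy_make_ipv4(htonl(0x0a000002))` yields a union whose `sin.sin_addr.s_addr` still holds 10.0.0.2.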
2014-07-30 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -1/+1
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-28 | neighbour : fix ndm_type type error issue | Jun Zhao | 1 | -1/+1
ndm_type means the L3 address type; in neighbour proxy and vxlan, it is RTN_UNICAST. NDA_DST is a netlink TLV type, hence it is not the right value in this context. Signed-off-by: Jun Zhao <mypopydev@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14 | vxlan: Call udp_sock_create | Tom Herbert | 1 | -91/+24
In vxlan driver call common function udp_sock_create to create the listener UDP port. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-10 | bridge: fdb dumping takes a filter device | Jamal Hadi Salim | 1 | -1/+2
Dumping a bridge fdb dumps every fdb entry held. With this change we are going to filter on selected bridge port. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 | vxlan: Call udp_flow_src_port | Tom Herbert | 1 | -24/+2
In vxlan and OVS vport-vxlan call common function to get source port for a UDP tunnel. Removed vxlan_src_port since the functionality is now in udp_flow_src_port. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
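One common way to derive a tunnel source port from a flow hash is to scale the 32-bit hash into the configured port range. The sketch below shows that technique in user space; `toy_flow_src_port` is a hypothetical helper, and whether `udp_flow_src_port` computes the port in exactly this way is an assumption, since the log does not say.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a 32-bit flow hash onto the half-open port range [min, max).
 * Multiplying by the span and keeping the upper 32 bits avoids a
 * modulo and spreads hashes evenly over the range. The result is in
 * host byte order; a real sender would convert it with htons().
 */
static uint16_t toy_flow_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
    uint64_t span;

    assert(min < max);
    span = (uint64_t)(max - min);
    return (uint16_t)((((uint64_t)hash * span) >> 32) + min);
}
```

Packets of the same inner flow share a hash and therefore a source port, which keeps a flow on one path while different flows spread across the range.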
2014-06-15 | vxlan: Checksum fixes | Tom Herbert | 1 | -9/+2
Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet header to work properly with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-13 | vxlan: use dev->needed_headroom instead of dev->hard_header_len | Cong Wang | 1 | -4/+3
When we mirror packets from a vxlan tunnel to another device, the mirror device should see the same packets (that is, without the outer header). Because the vxlan tunnel sets dev->hard_header_len, tcf_mirred() resets the mac header back to the outer mac, so the mirror device actually sees packets with outer headers. The vxlan tunnel should set dev->needed_headroom instead of dev->hard_header_len, like other ip tunnels do. This fixes the above problem. Cc: "David S. Miller" <davem@davemloft.net> Cc: stephen hemminger <stephen@networkplumber.org> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 | net: Add skb_gro_postpull_rcsum to udp and vxlan | Tom Herbert | 1 | -0/+2
Need to call skb_gro_postpull_rcsum for GRO to work with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 | vxlan: Add support for UDP checksums (v4 sending, v6 zero csums) | Tom Herbert | 1 | -61/+59
Added VXLAN link configuration for sending UDP checksums, and allowing TX and RX of UDP6 checksums. Also, call common iptunnel_handle_offloads and added GSO support for checksums. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 | net: get rid of SET_ETHTOOL_OPS | Wilfried Klaebe | 1 | -1/+1
Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone. This does that. Mostly done via coccinelle script:

    @@
    struct ethtool_ops *ops;
    struct net_device *dev;
    @@
    - SET_ETHTOOL_OPS(dev, ops);
    + dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.
Suggested-by: Dave Miller <davem@davemloft.net> Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-24 | vxlan: add x-netns support | Nicolas Dichtel | 1 | -16/+47
This patch allows switching the netns when a packet is encapsulated or decapsulated. The vxlan socket is opened in the i/o netns, i.e. the netns where encapsulated packets are received. The socket lookup is done in this netns to find the corresponding vxlan tunnel. After decapsulation, the packet is injected into the corresponding interface, which may reside in another netns. When one of the two netns is removed, the tunnel is destroyed.

Configuration example:

    ip netns add netns1
    ip netns exec netns1 ip link set lo up
    ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
    ip link set vxlan10 netns netns1
    ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10
    ip netns exec netns1 ip link set vxlan10 up

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-23 | vxlan: ensure to advertise the right fdb remote | Nicolas Dichtel | 1 | -17/+21
The goal of this patch is to fix the rtnetlink notification. The main problem was the notification for an fdb entry with more than one remote. Before the patch, when a remote was added to an existing fdb entry, the kernel advertised the first remote instead of the added one. Also, when a remote was removed from an fdb entry with several remotes, the deleted remote was not advertised. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-15 | ipv4: add a sock pointer to dst->output() path. | Eric Dumazet | 1 | -2/+2
In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: 8f646c922d55 ("vxlan: keep original skb ownership") Reported-by: lucien xin <lucien.xin@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 | net: vxlan: fix crash when interface is created with no group | Mike Rapoport | 1 | -1/+5
If the vxlan interface is created without explicit group definition, there are corner cases which may cause kernel panic. For instance, in the following scenario: node A: $ ip link add dev vxlan42 address 2c:c2:60:00:10:20 type vxlan id 42 $ ip addr add dev vxlan42 10.0.0.1/24 $ ip link set up dev vxlan42 $ arp -i vxlan42 -s 10.0.0.2 2c:c2:60:00:01:02 $ bridge fdb add dev vxlan42 to 2c:c2:60:00:01:02 dst <IPv4 address> $ ping 10.0.0.2 node B: $ ip link add dev vxlan42 address 2c:c2:60:00:01:02 type vxlan id 42 $ ip addr add dev vxlan42 10.0.0.2/24 $ ip link set up dev vxlan42 $ arp -i vxlan42 -s 10.0.0.1 2c:c2:60:00:10:20 node B crashes: vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address) vxlan42: 2c:c2:60:00:10:20 migrated from 4011:eca4:c0a8:6466:c0a8:6415:8e09:2118 to (invalid address) BUG: unable to handle kernel NULL pointer dereference at 0000000000000046 IP: [<ffffffff8143c459>] ip6_route_output+0x58/0x82 PGD 7bd89067 PUD 7bd4e067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.0-rc8-hvx-xen-00019-g97a5221-dirty #154 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff88007c774f50 ti: ffff88007c79c000 task.ti: ffff88007c79c000 RIP: 0010:[<ffffffff8143c459>] [<ffffffff8143c459>] ip6_route_output+0x58/0x82 RSP: 0018:ffff88007fd03668 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff8186a000 RCX: 0000000000000040 RDX: 0000000000000000 RSI: ffff88007b0e4a80 RDI: ffff88007fd03754 RBP: ffff88007fd03688 R08: ffff88007b0e4a80 R09: 0000000000000000 R10: 0200000a0100000a R11: 0001002200000000 R12: ffff88007fd03740 R13: ffff88007b0e4a80 R14: ffff88007b0e4a80 R15: ffff88007bba0c50 FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000046 CR3: 000000007bb60000 CR4: 00000000000006e0 Stack: 0000000000000000 ffff88007fd037a0 ffffffff8186a000 ffff88007fd03740 ffff88007fd036c8 
ffffffff814320bb 0000000000006e49 ffff88007b8b7360 ffff88007bdbf200 ffff88007bcbc000 ffff88007b8b7000 ffff88007b8b7360 Call Trace: <IRQ> [<ffffffff814320bb>] ip6_dst_lookup_tail+0x2d/0xa4 [<ffffffff814322a5>] ip6_dst_lookup+0x10/0x12 [<ffffffff81323b4e>] vxlan_xmit_one+0x32a/0x68c [<ffffffff814a325a>] ? _raw_spin_unlock_irqrestore+0x12/0x14 [<ffffffff8104c551>] ? lock_timer_base.isra.23+0x26/0x4b [<ffffffff8132451a>] vxlan_xmit+0x66a/0x6a8 [<ffffffff8141a365>] ? ipt_do_table+0x35f/0x37e [<ffffffff81204ba2>] ? selinux_ip_postroute+0x41/0x26e [<ffffffff8139d0c1>] dev_hard_start_xmit+0x2ce/0x3ce [<ffffffff8139d491>] __dev_queue_xmit+0x2d0/0x392 [<ffffffff813b380f>] ? eth_header+0x28/0xb5 [<ffffffff8139d569>] dev_queue_xmit+0xb/0xd [<ffffffff813a5aa6>] neigh_resolve_output+0x134/0x152 [<ffffffff813db741>] ip_finish_output2+0x236/0x299 [<ffffffff813dc074>] ip_finish_output+0x98/0x9d [<ffffffff813dc749>] ip_output+0x62/0x67 [<ffffffff813da9f2>] dst_output+0xf/0x11 [<ffffffff813dc11c>] ip_local_out+0x1b/0x1f [<ffffffff813dcf1b>] ip_send_skb+0x11/0x37 [<ffffffff813dcf70>] ip_push_pending_frames+0x2f/0x33 [<ffffffff813ff732>] icmp_push_reply+0x106/0x115 [<ffffffff813ff9e4>] icmp_reply+0x142/0x164 [<ffffffff813ffb3b>] icmp_echo.part.16+0x46/0x48 [<ffffffff813c1d30>] ? nf_iterate+0x43/0x80 [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813ffb62>] icmp_echo+0x25/0x27 [<ffffffff814005f7>] icmp_rcv+0x1d2/0x20a [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813d810d>] ip_local_deliver_finish+0xd6/0x14f [<ffffffff813d8037>] ? xfrm4_policy_check.constprop.11+0x52/0x52 [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53 [<ffffffff813d82bf>] ip_local_deliver+0x4a/0x4f [<ffffffff813d7f7b>] ip_rcv_finish+0x253/0x26a [<ffffffff813d7d28>] ? 
inet_add_protocol+0x3e/0x3e [<ffffffff813d7fde>] NF_HOOK.constprop.10+0x4c/0x53 [<ffffffff813d856a>] ip_rcv+0x2a6/0x2ec [<ffffffff8139a9a0>] __netif_receive_skb_core+0x43e/0x478 [<ffffffff812a346f>] ? virtqueue_poll+0x16/0x27 [<ffffffff8139aa2f>] __netif_receive_skb+0x55/0x5a [<ffffffff8139aaaa>] process_backlog+0x76/0x12f [<ffffffff8139add8>] net_rx_action+0xa2/0x1ab [<ffffffff81047847>] __do_softirq+0xca/0x1d1 [<ffffffff81047ace>] irq_exit+0x3e/0x85 [<ffffffff8100b98b>] do_IRQ+0xa9/0xc4 [<ffffffff814a37ad>] common_interrupt+0x6d/0x6d <EOI> [<ffffffff810378db>] ? native_safe_halt+0x6/0x8 [<ffffffff810110c7>] default_idle+0x9/0xd [<ffffffff81011694>] arch_cpu_idle+0x13/0x1c [<ffffffff8107480d>] cpu_startup_entry+0xbc/0x137 [<ffffffff8102e741>] start_secondary+0x1a0/0x1a5 Code: 24 14 e8 f1 e5 01 00 31 d2 a8 32 0f 95 c2 49 8b 44 24 2c 49 0b 44 24 24 74 05 83 ca 04 eb 1c 4d 85 ed 74 17 49 8b 85 a8 02 00 00 <66> 8b 40 46 66 c1 e8 07 83 e0 07 c1 e0 03 09 c2 4c 89 e6 48 89 RIP [<ffffffff8143c459>] ip6_route_output+0x58/0x82 RSP <ffff88007fd03668> CR2: 0000000000000046 ---[ end trace 4612329caab37efd ]---

When a vxlan interface is created without an explicit group definition, the default_dst protocol family is initialized to AF_UNSPEC and the driver assumes an IPv4 configuration. On the other hand, the default_dst protocol family is used to differentiate between the IPv4 and IPv6 cases and, since AF_UNSPEC != AF_INET, the processing takes the IPv6 path. Making the IPv4 assumption explicit by setting the default_dst protocol family to AF_INET and preventing mixing of IPv4 and IPv6 addresses in snooped fdb entries fixes the corner case crashes.
Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: David S. Miller <davem@davemloft.net>
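The family mix-up described in this message can be reproduced in miniature. The sketch below is a user-space illustration with hypothetical names (`toy_pick_path`, `toy_default_family`); it is not the driver's code, only the shape of the "anything that is not AF_INET is IPv6" test and of the fix that makes the IPv4 default explicit.

```c
#include <assert.h>
#include <sys/socket.h>

enum toy_path { TOY_PATH_IPV4, TOY_PATH_IPV6 };

/* Dispatch as described above: every family other than AF_INET is
 * treated as IPv6, so AF_UNSPEC silently takes the IPv6 path. */
static enum toy_path toy_pick_path(sa_family_t family)
{
    return family == AF_INET ? TOY_PATH_IPV4 : TOY_PATH_IPV6;
}

/* The idea of the fix: when no group was configured, make the IPv4
 * assumption explicit instead of leaving the family as AF_UNSPEC. */
static sa_family_t toy_default_family(sa_family_t configured)
{
    sa_family_t fam = (configured == AF_UNSPEC) ? AF_INET : configured;

    assert(fam == AF_INET || fam == AF_INET6);
    return fam;
}
```

Without normalisation, `toy_pick_path(AF_UNSPEC)` selects the IPv6 path even though the device was meant to be IPv4; after `toy_default_family()` it selects IPv4.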
2014-03-25 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -14/+116
Conflicts: Documentation/devicetree/bindings/net/micrel-ks8851.txt net/core/netpoll.c The net/core/netpoll.c conflict is a bug fix in 'net' happening to code which is completely removed in 'net-next'. In micrel-ks8851.txt we simply have overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24 | vxlan: fix nonfunctional neigh_reduce() | David Stevens | 1 | -14/+113
The VXLAN neigh_reduce() code has been completely non-functional since check-in. Specific errors:

1) The original code drops all packets with a multicast destination address, even though neighbor solicitations are sent to the solicited-node address, a multicast address. The code after this check was never run.

2) The neighbor table lookup used the IPv6 header destination, which is the solicited-node address, rather than the target address from the neighbor solicitation. So neighbor lookups would always fail if it got this far. The same applies to L3MISSes.

3) The code calls ndisc_send_na(), which does a send on the tunnel device. The context for neigh_reduce() is the transmit path, vxlan_xmit(), where the host or a bridge-attached neighbor is trying to transmit a neighbor solicitation. To respond to it, the tunnel endpoint needs to do a *receive* of the appropriate neighbor advertisement. Doing a send would only try to send the advertisement, encapsulated, to the remote destinations in the fdb -- hosts that definitely did not do the corresponding solicitation.

4) The code uses the tunnel endpoint IPv6 forwarding flag to determine the isrouter flag in the advertisement. This has nothing to do with whether or not the target is a router, and generally won't be set since the tunnel endpoint is bridging, not routing, traffic.

The patch below creates a proxy neighbor advertisement to respond to neighbor solicitations as intended, providing proper IPv6 support for neighbor reduction.
Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
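Point 2 depends on the difference between a solicitation's target address and the solicited-node multicast address it is sent to. The user-space sketch below derives the latter from the former (`ff02::1:ffXX:XXXX`, built from the low 24 bits of the target, as defined in RFC 4291). `toy_solicited_node` is a hypothetical helper written for illustration, not a kernel function.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Build the solicited-node multicast address for a target:
 * the fixed prefix ff02:0:0:0:0:1:ff00::/104 followed by the
 * low 24 bits of the target address.
 */
static struct in6_addr toy_solicited_node(const struct in6_addr *target)
{
    struct in6_addr mc;

    assert(target != NULL);
    memset(&mc, 0, sizeof(mc));
    mc.s6_addr[0] = 0xff;
    mc.s6_addr[1] = 0x02;
    mc.s6_addr[11] = 0x01;
    mc.s6_addr[12] = 0xff;
    mc.s6_addr[13] = target->s6_addr[13];
    mc.s6_addr[14] = target->s6_addr[14];
    mc.s6_addr[15] = target->s6_addr[15];
    return mc;
}
```

Many different targets map to the same multicast address, and that address is never itself a neighbor entry, which is why a lookup keyed on the IPv6 header destination cannot find the neighbor being solicited.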
2014-03-18 | vxlan: fix potential NULL dereference in arp_reduce() | David Stevens | 1 | -0/+3
This patch fixes a NULL pointer dereference in the event of an skb allocation failure in arp_reduce(). Signed-Off-By: David L Stevens <dlstevens@us.ibm.com> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 | vxlan: remove unused port variable in vxlan_udp_encap_recv() | Pablo Neira Ayuso | 1 | -3/+0
Signed-off-by: Pablo Neira Ayuso <pablo@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 | net: introduce netdev_alloc_pcpu_stats() for drivers | WANG Cong | 1 | -9/+1
There are many drivers calling alloc_percpu() to allocate pcpu stats and then initializing ->syncp. So just introduce a helper function for them. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-01 | vxlan: remove extra newline after function definition | Daniel Baluta | 1 | -1/+0
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-30 | net/vxlan: Go over all candidate streams for GRO matching | Or Gerlitz | 1 | -2/+0
The loop in vxlan_gro_receive() over the current set of candidates for coalescing was wrongly aborted once a match was found. In rare cases, this can cause false-positive matches in the next layer's GRO checks. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-23 | net/vxlan: Share RX skb de-marking and checksum checks with ovs | Or Gerlitz | 1 | -11/+10
Make sure the practice set by commit 0afb166 "vxlan: Add capability of Rx checksum offload for inner packet" is applied when the skb goes through the portion of the RX code which is shared between vxlan netdevices and ovs vxlan port instances. Cc: Joseph Gasparakis <joseph.gasparakis@intel.com> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 | net: vxlan: convert to act as a pernet subsystem | Daniel Borkmann | 1 | -22/+6
As per suggestion from Eric W. Biederman, vxlan should be using {un,}register_pernet_subsys() instead of {un,}register_pernet_device() to ensure the vxlan_net structure is initialized before and cleaned up after all network devices in a given network namespace, i.e. when dealing with network notifiers. This is similarly handled already in commit 91e2ff3528ac ("net: Teach vlans to cleanup as a pernet subsystem") and, thus, improves upon fd27e0d44a89 ("net: vxlan: do not use vxlan_net before checking event type"). Just as in 91e2ff3528ac, we do not need to explicitly handle deletion of vxlan devices as network namespace exit calls dellink on all remaining virtual devices, and rtnl_link_unregister() calls dellink on all outstanding devices in that network namespace, so we can entirely drop the pernet exit operation as well. Moreover, on vxlan module exit, rcu_barrier() is called by netns since commit 3a765edadb28 ("netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}"), so this may be omitted. Tested with various scenarios and works well on my side. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 | net: Add GRO support for vxlan traffic | Or Gerlitz | 1 | -7/+110
Add GRO handlers for vxlan, by using the UDP GRO infrastructure. For a single TCP session that goes through vxlan tunneling I got a nice improvement from 6.8 Gb/s to 11.5 Gb/s.

--> UDP/VXLAN GRO disabled

$ netperf -H 192.168.52.147 -c -C
$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send   Send                          Utilization      Service Demand
Socket Socket Message  Elapsed              Send    Recv     Send    Recv
Size   Size   Size     Time     Throughput  local   remote   local   remote
bytes  bytes  bytes    secs.    10^6bits/s  % S     % S      us/KB   us/KB

87380  65536  65536    10.00    6799.75     12.54   24.79    0.604   1.195

--> UDP/VXLAN GRO enabled

$ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
Recv   Send   Send                          Utilization      Service Demand
Socket Socket Message  Elapsed              Send    Recv     Send    Recv
Size   Size   Size     Time     Throughput  local   remote   local   remote
bytes  bytes  bytes    secs.    10^6bits/s  % S     % S      us/KB   us/KB

87380  65536  65536    10.00    11562.72    24.90   20.34    0.706   0.577

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 | net: add vxlan description | Jesse Brandeburg | 1 | -0/+1
Add a description to the vxlan module, helping save the world from the minions of destruction and confusion. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 | net: vxlan: do not use vxlan_net before checking event type | Daniel Borkmann | 1 | -2/+4
Jesse Brandeburg reported that commit acaf4e70997f caused a panic when adding a network namespace while vxlan module was present in the system: [<ffffffff814d0865>] vxlan_lowerdev_event+0xf5/0x100 [<ffffffff816e9e5d>] notifier_call_chain+0x4d/0x70 [<ffffffff810912be>] __raw_notifier_call_chain+0xe/0x10 [<ffffffff810912d6>] raw_notifier_call_chain+0x16/0x20 [<ffffffff815d9610>] call_netdevice_notifiers_info+0x40/0x70 [<ffffffff815d9656>] call_netdevice_notifiers+0x16/0x20 [<ffffffff815e1bce>] register_netdevice+0x1be/0x3a0 [<ffffffff815e1dce>] register_netdev+0x1e/0x30 [<ffffffff814cb94a>] loopback_net_init+0x4a/0xb0 [<ffffffffa016ed6e>] ? lockd_init_net+0x6e/0xb0 [lockd] [<ffffffff815d6bac>] ops_init+0x4c/0x150 [<ffffffff815d6d23>] setup_net+0x73/0x110 [<ffffffff815d725b>] copy_net_ns+0x7b/0x100 [<ffffffff81090e11>] create_new_namespaces+0x101/0x1b0 [<ffffffff81090f45>] copy_namespaces+0x85/0xb0 [<ffffffff810693d5>] copy_process.part.26+0x935/0x1500 [<ffffffff811d5186>] ? mntput+0x26/0x40 [<ffffffff8106a15c>] do_fork+0xbc/0x2e0 [<ffffffff811b7f2e>] ? ____fput+0xe/0x10 [<ffffffff81089c5c>] ? task_work_run+0xac/0xe0 [<ffffffff8106a406>] SyS_clone+0x16/0x20 [<ffffffff816ee689>] stub_clone+0x69/0x90 [<ffffffff816ee329>] ? system_call_fastpath+0x16/0x1b Apparently loopback device is being registered first and thus we receive an event notification when vxlan_net is not ready. Hence, when we call net_generic() and request vxlan_net_id, we seem to access garbage at that point in time. In setup_net() where we set up a newly allocated network namespace, we traverse the list of pernet ops ... list_for_each_entry(ops, &pernet_list, list) { error = ops_init(ops, net); if (error < 0) goto out_undo; } ... and loopback_net_init() is invoked first here, so in the middle of setup_net() we get this notification in vxlan. As currently we only care about devices that unregister, move access through net_generic() there. 
Fix is based on Cong Wang's proposal, but only changes what is needed here. It sucks a bit as we only work around the actual cure: right now it seems the only way to check if a netns actually finished traversing all init ops would be to check if it's part of net_namespace_list. But that I find quite expensive each time we go through a notifier callback. Anyway, did a couple of tests and it seems good for now. Fixes: acaf4e70997f ("net: vxlan: when lower dev unregisters remove vxlan dev as well") Reported-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
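The ordering fix described above ("move access through net_generic() there") amounts to filtering on the event type before touching any per-namespace state. The user-space sketch below models that pattern; `toy_net`, `toy_vxlan_net`, `toy_lowerdev_event` and the `TOY_NETDEV_*` events are hypothetical stand-ins, and a NULL pointer plays the role of per-net data that has not been initialised yet.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_NETDEV_REGISTER, TOY_NETDEV_UNREGISTER };

/* Per-namespace driver state; absent until the pernet init op has run. */
struct toy_vxlan_net {
    int nr_devices;
};

struct toy_net {
    struct toy_vxlan_net *vxlan; /* NULL while the namespace is still being set up */
};

/*
 * Notifier-style callback. Events the driver does not care about return
 * before the per-net pointer is read, so a notification delivered during
 * namespace setup cannot touch uninitialised state. Returns the number of
 * devices torn down.
 */
static int toy_lowerdev_event(struct toy_net *net, enum toy_event event)
{
    int removed;

    if (event != TOY_NETDEV_UNREGISTER)
        return 0;

    /* Only now is it valid to look at the per-net data. */
    assert(net->vxlan != NULL);
    removed = net->vxlan->nr_devices;
    net->vxlan->nr_devices = 0;
    return removed;
}
```

A register event on a half-initialised namespace is ignored without dereferencing `net->vxlan`, which mirrors the loopback registration case in the trace above.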
2014-01-14 | net: vxlan: properly cleanup devs on module unload | Daniel Borkmann | 1 | -5/+5
We should use the vxlan_dellink() handler in vxlan_exit_net(), since i) we're not in the fast path and we should be consistent in dismantling, just as we would remove a device through rtnl ops, and more importantly, ii) in case future code will kfree() memory in vxlan_dellink(), we would leak it right here unnoticed. Therefore, do not do only half of the cleanup work, but do it properly. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 | net: vxlan: when lower dev unregisters remove vxlan dev as well | Daniel Borkmann | 1 | -2/+46
We can create a vxlan device with an explicit underlying carrier. In that case, when the carrier link is being deleted from the system (e.g. due to module unload) we should also clean up all created vxlan devices on top of it since otherwise we're in an inconsistent state in the vxlan device. In that case, the user needs to remove all such devices, while in case of other virtual devs that sit on top of physical ones, it is usually the case that these devices do unregister automatically as well and do not leave the burden on the user.

This work is not necessary when the vxlan device was not created with a real underlying device, as connections can resume in that case when the driver is plugged in again. But at least for the other cases, we should go ahead and do the cleanup on removal.

We don't register the notifier during vxlan_newlink() here since I consider this event rather rare, and therefore we should not bloat vxlan's core structure unnecessarily. Also, we can simply make use of unregister_netdevice_many() to batch that. fdb is flushed upon ndo_stop().

E.g. `ip -d link show vxlan13` after carrier removal before this patch:

    5: vxlan13: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default
        link/ether 1e:47:da:6d:4d:99 brd ff:ff:ff:ff:ff:ff promiscuity 0
        vxlan id 13 group 239.0.0.10 dev 2 port 32768 61000 ageing 300
                                     ^^^^^

Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 | vxlan: use __dev_get_by_index instead of dev_get_by_index to find interface | Ying Xue | 1 | -2/+1
The following call chains indicate that vxlan_fdb_parse() runs under rtnl_lock protection. So if we use __dev_get_by_index() instead of dev_get_by_index() to find the interface in it, we avoid changing the interface reference counter.

    rtnetlink_rcv()
      rtnl_lock()
      netlink_rcv_skb()
        rtnl_fdb_add()
          vxlan_fdb_add()
            vxlan_fdb_parse()
      rtnl_unlock()

    rtnetlink_rcv()
      rtnl_lock()
      netlink_rcv_skb()
        rtnl_fdb_del()
          vxlan_fdb_del()
            vxlan_fdb_parse()
      rtnl_unlock()

Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
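The trade-off between the two lookup styles can be modelled in user space. In the sketch below, `toy_dev`, `toy_dev_lookup_held`, `toy_dev_lookup_locked` and `toy_dev_put` are hypothetical names; the "held" variant takes a reference that the caller must drop, while the "locked" variant relies on the caller already holding a lock that keeps the device alive, which is the situation the call chains above describe.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev {
    int ifindex;
    int refcnt;
};

/* Lookup for callers that already hold the protecting lock:
 * no reference is taken, so nothing has to be released afterwards. */
static struct toy_dev *toy_dev_lookup_locked(struct toy_dev *tbl, size_t n, int ifindex)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tbl[i].ifindex == ifindex)
            return &tbl[i];
    return NULL;
}

/* Lookup for callers without the lock: takes a reference that
 * must later be dropped with toy_dev_put(). */
static struct toy_dev *toy_dev_lookup_held(struct toy_dev *tbl, size_t n, int ifindex)
{
    struct toy_dev *dev = toy_dev_lookup_locked(tbl, n, ifindex);

    if (dev)
        dev->refcnt++;
    return dev;
}

/* Drop a reference taken by toy_dev_lookup_held(). */
static void toy_dev_put(struct toy_dev *dev)
{
    assert(dev->refcnt > 0);
    dev->refcnt--;
}
```

Using the locked variant removes one increment/decrement pair per lookup and removes the risk of forgetting the matching put.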
2014-01-06 | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -1/+2
Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 | vxlan: keep original skb ownership | Eric Dumazet | 1 | -21/+10
Sathya Perla posted a patch trying to address the following problem:

<quote>
The vxlan driver sets itself as the socket owner for all the TX flows it encapsulates (using vxlan_set_owner()) and assigns its own skb destructor. This causes all tunneled traffic to land up on only one TXQ as all encapsulated skbs refer to the vxlan socket and not the original socket. Also, the vxlan skb destructor breaks some functionality for tunneled traffic like wmem accounting, TCP small queues and the FQ/pacing packet scheduler.
</quote>

I reworked Sathya's patch and added some explanations.

vxlan_xmit() can avoid one skb_clone()/dev_kfree_skb() pair and gain better drop monitor accuracy, by calling kfree_skb() when appropriate.

The UDP socket used by vxlan to perform encapsulation of xmit packets does not need to be alive while packets leave vxlan code. It's better to keep the original socket ownership to get proper feedback from the qdisc and NIC layers.

We use skb->sk for:

A) controlling the amount of bytes/packets queued on behalf of a socket, but prior vxlan code did the skb->sk transfer without any limit/control on the vxlan socket sk_sndbuf.

B) security purposes (such as selinux) or netfilter uses, and I do not think anything is prepared to handle the vxlan stacked case in this area.

By not changing ownership, vxlan tunnels behave like other tunnels. As Stephen mentioned, we might do the same change in L2TP.
Reported-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 | net: unify the pcpu_tstats and br_cpu_netstats as one | Li RongQing | 1 | -5/+6
They are the same, so unify them as one, pcpu_sw_netstats. Define pcpu_sw_netstats in netdevice.h, remove pcpu_tstats from if_tunnel and remove br_cpu_netstats from br_private.h. Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 | {vxlan, inet6} Mark vxlan_dev flags with VXLAN_F_IPV6 properly | fan.du | 1 | -1/+2
Even if the user doesn't supply a physical netdev to attach the vxlan dev to, when the user wants vxlan to sit on top of IPv6, mark the vxlan_dev flags with VXLAN_F_IPV6 to create an IPv6 based socket. Otherwise the kernel crashes every time, printing the messages below.

Steps to reproduce:

    ip link add vxlan0 type vxlan id 42 group ff0e::110
    ip link set vxlan0 up

[ 62.656266] BUG: unable to handle kernel NULL pointer dereference[ 62.656320] ip (3008) used greatest stack depth: 3912 bytes left at 0000000000000046 [ 62.656423] IP: [<ffffffff816d822d>] ip6_route_output+0xbd/0xe0 [ 62.656525] PGD 2c966067 PUD 2c9a2067 PMD 0 [ 62.656674] Oops: 0000 [#1] SMP [ 62.656781] Modules linked in: vxlan netconsole deflate zlib_deflate af_key [ 62.657083] CPU: 1 PID: 2128 Comm: whoopsie Not tainted 3.12.0+ #182 [ 62.657083] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006 [ 62.657083] task: ffff88002e2335d0 ti: ffff88002c94c000 task.ti: ffff88002c94c000 [ 62.657083] RIP: 0010:[<ffffffff816d822d>] [<ffffffff816d822d>] ip6_route_output+0xbd/0xe0 [ 62.657083] RSP: 0000:ffff88002fd038f8 EFLAGS: 00210296 [ 62.657083] RAX: 0000000000000000 RBX: ffff88002fd039e0 RCX: 0000000000000000 [ 62.657083] RDX: ffff88002fd0eb68 RSI: ffff88002fd0d278 RDI: ffff88002fd0d278 [ 62.657083] RBP: ffff88002fd03918 R08: 0000000002000000 R09: 0000000000000000 [ 62.657083] R10: 00000000000001ff R11: 0000000000000000 R12: 0000000000000001 [ 62.657083] R13: ffff88002d96b480 R14: ffffffff81c8e2c0 R15: 0000000000000001 [ 62.657083] FS: 0000000000000000(0000) GS:ffff88002fd00000(0063) knlGS:00000000f693b740 [ 62.657083] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 62.657083] CR2: 0000000000000046 CR3: 000000002c9d2000 CR4: 00000000000006e0 [ 62.657083] Stack: [ 62.657083] ffff88002fd03a40 ffffffff81c8e2c0 ffff88002fd039e0 ffff88002d96b480 [ 62.657083] ffff88002fd03958 ffffffff816cac8b ffff880019277cc0 ffff8800192b5d00 [ 62.657083] ffff88002d5bc000 ffff880019277cc0 0000000000001821 
0000000000000001 [ 62.657083] Call Trace: [ 62.657083] <IRQ> [ 62.657083] [<ffffffff816cac8b>] ip6_dst_lookup_tail+0xdb/0xf0 [ 62.657083] [<ffffffff816caea0>] ip6_dst_lookup+0x10/0x20 [ 62.657083] [<ffffffffa0020c13>] vxlan_xmit_one+0x193/0x9c0 [vxlan] [ 62.657083] [<ffffffff8137b3b7>] ? account+0xc7/0x1f0 [ 62.657083] [<ffffffffa0021513>] vxlan_xmit+0xd3/0x400 [vxlan] [ 62.657083] [<ffffffff8161390d>] dev_hard_start_xmit+0x49d/0x5e0 [ 62.657083] [<ffffffff81613d29>] dev_queue_xmit+0x2d9/0x480 [ 62.657083] [<ffffffff817cb854>] ? _raw_write_unlock_bh+0x14/0x20 [ 62.657083] [<ffffffff81630565>] ? eth_header+0x35/0xe0 [ 62.657083] [<ffffffff8161bc5e>] neigh_resolve_output+0x11e/0x1e0 [ 62.657083] [<ffffffff816ce0e0>] ? ip6_fragment+0xad0/0xad0 [ 62.657083] [<ffffffff816cb465>] ip6_finish_output2+0x2f5/0x470 [ 62.657083] [<ffffffff816ce166>] ip6_finish_output+0x86/0xc0 [ 62.657083] [<ffffffff816ce218>] ip6_output+0x78/0xb0 [ 62.657083] [<ffffffff816eadd6>] mld_sendpack+0x256/0x2a0 [ 62.657083] [<ffffffff816ebd8c>] mld_ifc_timer_expire+0x17c/0x290 [ 62.657083] [<ffffffff816ebc10>] ? igmp6_timer_handler+0x80/0x80 [ 62.657083] [<ffffffff816ebc10>] ? igmp6_timer_handler+0x80/0x80 [ 62.657083] [<ffffffff81051065>] call_timer_fn+0x45/0x150 [ 62.657083] [<ffffffff816ebc10>] ? igmp6_timer_handler+0x80/0x80 [ 62.657083] [<ffffffff81052353>] run_timer_softirq+0x1f3/0x2a0 [ 62.657083] [<ffffffff8102dfd8>] ? lapic_next_event+0x18/0x20 [ 62.657083] [<ffffffff8109e36f>] ? clockevents_program_event+0x6f/0x110 [ 62.657083] [<ffffffff8104a2f6>] __do_softirq+0xd6/0x2b0 [ 62.657083] [<ffffffff8104a75e>] irq_exit+0x7e/0xa0 [ 62.657083] [<ffffffff8102ea15>] smp_apic_timer_interrupt+0x45/0x60 [ 62.657083] [<ffffffff817d3eca>] apic_timer_interrupt+0x6a/0x70 [ 62.657083] <EOI> [ 62.657083] [<ffffffff817d4a35>] ? 
sysenter_dispatch+0x7/0x1a [ 62.657083] Code: 4d 8b 85 a8 02 00 00 4c 89 e9 ba 03 04 00 00 48 c7 c6 c0 be 8d 81 48 c7 c7 48 35 a3 81 31 c0 e8 db 68 0e 00 49 8b 85 a8 02 00 00 <0f> b6 40 46 c0 e8 05 0f b6 c0 c1 e0 03 41 09 c4 e9 77 ff ff ff [ 62.657083] RIP [<ffffffff816d822d>] ip6_route_output+0xbd/0xe0 [ 62.657083] RSP <ffff88002fd038f8> [ 62.657083] CR2: 0000000000000046 [ 62.657083] ---[ end trace ba8a9583d7cd1934 ]--- [ 62.657083] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Fan Du <fan.du@windriver.com> Reported-by: Ryan Whelan <rcwhelan@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-22net: vxlan: use custom ndo_change_mtu handlerDaniel Borkmann1-1/+24
When adding a new vxlan device to an "underlying carrier" (here: dst->remote_ifindex), the MTU assigned to the vxlan device is the carrier's MTU at setup time minus the needed headroom. When adding a vxlan device without an explicit carrier, the MTU defaults to 1500. With an explicit carrier that supports jumbo frames, we currently cannot raise the vxlan MTU above 1500 via ip(8) after setup, because the vxlan driver uses eth_change_mtu() as the default method for manually setting the MTU. Hence, use a custom implementation that falls back to eth_change_mtu() only when no dev parameter was given at device setup time, and otherwise allows a maximum MTU of the carrier's MTU adjusted for headroom. Reported-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
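The bounds check described in this commit message can be modelled in user space roughly as follows. This is only a sketch, not the driver code: the struct names, the `model_` prefix and the `ipv6` field are hypothetical, the headroom constants are assumed to be outer IP + UDP + VXLAN header + inner Ethernet header, and the no-carrier branch assumes the Ethernet default range of 68 to 1500 bytes.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed encapsulation overhead: outer IP + UDP + VXLAN + inner Ethernet. */
#define MODEL_VXLAN_HEADROOM  (20 + 8 + 8 + 14) /* IPv4 outer header */
#define MODEL_VXLAN6_HEADROOM (40 + 8 + 8 + 14) /* IPv6 outer header */
#define MODEL_MIN_MTU      68   /* assumed lower bound */
#define MODEL_ETH_DATA_LEN 1500 /* assumed Ethernet default upper bound */

/* Hypothetical stand-in for a network device. */
struct model_dev {
	int mtu;
};

/* Hypothetical stand-in for a vxlan device and its optional carrier. */
struct model_vxlan {
	struct model_dev dev;
	const struct model_dev *lower; /* NULL when no carrier was given at setup */
	int ipv6;                      /* non-zero when the outer header is IPv6 */
};

/* Returns 0 and updates the MTU on success, -EINVAL when out of range. */
static int model_vxlan_change_mtu(struct model_vxlan *vxlan, int new_mtu)
{
	int max_mtu;

	if (vxlan->lower == NULL) {
		/* No carrier: fall back to the plain Ethernet limits. */
		max_mtu = MODEL_ETH_DATA_LEN;
	} else {
		/* Carrier known: allow up to its MTU minus the headroom. */
		max_mtu = vxlan->lower->mtu -
			  (vxlan->ipv6 ? MODEL_VXLAN6_HEADROOM
				       : MODEL_VXLAN_HEADROOM);
	}

	if (new_mtu < MODEL_MIN_MTU || new_mtu > max_mtu)
		return -EINVAL;

	vxlan->dev.mtu = new_mtu;
	return 0;
}
```

Under these assumptions, a carrier with a 9000-byte MTU would accept a vxlan MTU of up to 8950 over IPv4, while a device created without a carrier stays capped at 1500.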
2013-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: Change skb_get_rxhash to skb_get_hashTom Herbert1-1/+1
Change the name of the function as part of making the hash in skbuff a generic property, not just one for the receive path. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11vxlan: leave multicast group when vxlan device downGao feng1-6/+21
vxlan_group_used only allows a device to leave the multicast group when the remote_ip of this vxlan device differs from the remote_ip of all other vxlan devices. As a result, the device does not leave the multicast group until the vn_sock of this vxlan device is released. The check in vxlan_group_used is not precise enough: even if the remote_ip is the same, these vxlan devices may use different lower devices, and they may use different vn_socks. Only when some vxlan devices use the same vn_sock, the same lower device and the same remote_ip should the mc_list of the vn_sock be left unchanged. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
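The stricter check described in this commit message can be illustrated with a small user-space model. It is a sketch under stated assumptions rather than the driver's implementation: `struct mc_vxlan`, its fields and `model_group_used` are hypothetical names, the remote address is reduced to a 32-bit value, and skipping devices that are not running is an assumption of the model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a vxlan device for this model. */
struct mc_vxlan {
	bool running;        /* device is up (assumed part of the check) */
	const void *vn_sock; /* identity of the shared UDP socket */
	int remote_ifindex;  /* lower device index */
	uint32_t remote_ip;  /* multicast group, reduced to IPv4 here */
};

/*
 * Returns true when another running device shares the socket, the lower
 * device and the group with 'dev', in which case the socket's multicast
 * membership must be kept.
 */
static bool model_group_used(const struct mc_vxlan *devs, size_t n,
			     const struct mc_vxlan *dev)
{
	for (size_t i = 0; i < n; i++) {
		const struct mc_vxlan *other = &devs[i];

		if (other == dev || !other->running)
			continue;
		if (other->vn_sock != dev->vn_sock)
			continue;
		if (other->remote_ifindex != dev->remote_ifindex)
			continue;
		if (other->remote_ip != dev->remote_ip)
			continue;
		return true;
	}
	return false;
}
```

In this model a device that shares only the group address with its neighbours, but sits on a different socket or lower device, is free to leave the group when it goes down.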
2013-12-11vxlan: remove vxlan_group_used in vxlan_openGao feng1-3/+1
In vxlan_open, vxlan_group_used always returns true, because the vxlan device we want to open is already in the running state and already in vxlan_list. Since ip_mc_join_group takes care of the reference counting of struct ip_mc_list, removing vxlan_group_used here is safe. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10vxlan: release rt when found circular routeFan Du1-1/+1
Otherwise the dst is leaked. Checked the transmit implementations of all other tunnel device types; no such leak occurs there. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+8
Pull core locking changes from Ingo Molnar: "The biggest changes: - add lockdep support for seqcount/seqlocks structures, this unearthed both bugs and required extra annotation. - move the various kernel locking primitives to the new kernel/locking/ directory" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) block: Use u64_stats_init() to initialize seqcounts locking/lockdep: Mark __lockdep_count_forward_deps() as static lockdep/proc: Fix lock-time avg computation locking/doc: Update references to kernel/mutex.c ipv6: Fix possible ipv6 seqlock deadlock cpuset: Fix potential deadlock w/ set_mems_allowed seqcount: Add lockdep functionality to seqcount/seqlock structures net: Explicitly initialize u64_stats_sync structures for lockdep locking: Move the percpu-rwsem code to kernel/locking/ locking: Move the lglocks code to kernel/locking/ locking: Move the rwsem code to kernel/locking/ locking: Move the rtmutex code to kernel/locking/ locking: Move the semaphore core to kernel/locking/ locking: Move the spinlock code to kernel/locking/ locking: Move the lockdep code to kernel/locking/ locking: Move the mutex code to kernel/locking/ hung_task debugging: Add tracepoint to report the hang x86/locking/kconfig: Update paravirt spinlock Kconfig description lockstat: Report avg wait and hold times lockdep, x86/alternatives: Drop ancient lockdep fixup message ...
2013-11-06net: Explicitly initialize u64_stats_sync structures for lockdepJohn Stultz1-0/+8
In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worthwhile. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers' thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-04vxlan: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))Duan Jiong1-1/+1
Trivial patch converting ERR_PTR(PTR_ERR()) into ERR_CAST(). No functional changes. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
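Why the conversion is behaviour-preserving can be seen in a user-space model of the error-pointer helpers. The `model_` functions below are hypothetical stand-ins written for illustration, not the kernel's definitions; they assume the common scheme in which small negative errno values are encoded in the top of the pointer range.

```c
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095 /* assumed size of the error range */

/* Encode a negative errno value as a pointer. */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode an error pointer back to its errno value. */
static inline long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer lies in the reserved error range. */
static inline int model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Re-type an error pointer without a decode/encode round trip. */
static inline void *model_err_cast(const void *ptr)
{
	return (void *)ptr;
}
```

In this model `model_err_cast(p)` and `model_err_ptr(model_ptr_err(p))` yield the same pointer value for any error pointer `p`; the cast form simply states the intent directly, which matches the "no functional changes" note in the commit message.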
2013-10-29vxlan: Have the NIC drivers do less work for offloadsJoseph Gasparakis1-4/+0
This patch removes the burden from the NIC drivers to check if the vxlan driver is enabled in the kernel and also makes available the vxlan headrooms to them. Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2013-10-29vxlan: silence one build warningZhi Yong Wu1-17/+14
drivers/net/vxlan.c: In function ‘vxlan_sock_add’: drivers/net/vxlan.c:2298:11: warning: ‘sock’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/net/vxlan.c:2275:17: note: ‘sock’ was declared here LD drivers/net/built-in.o Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>