path: root/drivers/net/vxlan.c
Age | Commit message | Author | Files | Lines
2016-11-18 | netns: make struct pernet_operations::id unsigned int | Alexey Dobriyan | 1 | -1/+1
Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1) This field is really an index into a zero-based array and thus is an unsigned entity. Using a negative value is an out-of-bounds access by definition.

2) On x86_64, unsigned 32-bit data which are mixed with pointers via array indexing, or via offsets added to or subtracted from pointers, are preferred to signed 32-bit data.

"int" being used as an array index needs to be sign-extended to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call	g

MOVSX is a 3-byte instruction which isn't necessary if the variable is unsigned, because x86_64 zero-extends by default.

Now, there is the net_generic() function which, you guessed it right, uses "int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger. This is a seemingly random artefact of code generation with the register allocator being used differently. gcc decides that some variable needs to live in the new r8+ registers and every access now requires a REX prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be used, which is longer than [r8].

However, the overall balance is in the negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                          old     new   delta
	nfsd4_lock                       3886    3959     +73
	tipc_link_build_proto_msg        1096    1140     +44
	mac80211_hwsim_new_radio         2776    2808     +32
	tipc_mon_rcv                     1032    1058     +26
	svcauth_gss_legacy_init          1413    1429     +16
	tipc_bcbase_select_primary        379     392     +13
	nfsd4_exchange_id                1247    1260     +13
	nfsd4_setclientid_confirm         782     793     +11
	...
	put_client_renew_locked           494     480     -14
	ip_set_sockfn_get                 730     716     -14
	geneve_sock_add                   829     813     -16
	nfsd4_sequence_done               721     703     -18
	nlmclnt_lookup_host               708     686     -22
	nfsd4_lockt                      1085    1063     -22
	nfs_get_client                   1077    1050     -27
	tcf_bpf_init                     1106    1076     -30
	nfsd4_encode_fattr               5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
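The index-type point can be illustrated with a minimal userspace sketch. The two functions below are hypothetical stand-ins for net_generic() (the names and the table layout are assumptions, not kernel code); they differ only in the type of the index, which is what lets a compiler targeting x86_64 drop the sign extension in the second one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for net_generic(): ids are 1-based, slots 0-based. */
static void *lookup_signed(void *const *ptr, int id)
{
	assert(ptr != NULL && id >= 1);
	/* "int" index: must be sign-extended to 64 bits before addressing. */
	return ptr[id - 1];
}

static void *lookup_unsigned(void *const *ptr, unsigned int id)
{
	assert(ptr != NULL && id >= 1);
	/* "unsigned int" index: zero extension is implicit on x86_64. */
	return ptr[id - 1];
}
```

Both functions return the same slot for any valid id; only the generated code is expected to differ, and how much depends on the compiler and flags.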
2016-11-15 | vxlan: Fix uninitialized variable warnings. | David S. Miller | 1 | -1/+4
drivers/net/vxlan.c: In function ‘vxlan_xmit_one’: drivers/net/vxlan.c:2141:10: warning: ‘err’ may be used uninitialized in this function [-Wmaybe-uninitialized] Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 | vxlan: simplify vxlan xmit | pravin shelar | 1 | -44/+34
The existing vxlan xmit function handles two distinct cases: 1. vxlan net device 2. vxlan lwt device. By separating the initialization of these two cases, the egress path looks better. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 | vxlan: simplify RTF_LOCAL handling. | pravin shelar | 1 | -34/+51
Avoid duplicate code for handling RTF_LOCAL routes. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 | vxlan: improve vxlan route lookup checks. | pravin shelar | 1 | -39/+38
Move the route sanity checks to the respective vxlan[4/6]_get_route functions. This allows us to perform all sanity checks before caching the dst so that we can avoid these checks on subsequent packets. This gives more accurate metadata information for packets from fill_metadata_dst(). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 | vxlan: simplify exception handling | pravin shelar | 1 | -27/+19
The vxlan egress path error handling has become complicated; it needs to handle both IPv4 and IPv6 tunnel cases. An earlier patch removes vlan handling from vxlan_build_skb(), so vxlan_build_skb() does not need to free the skb and we can simplify the xmit path by having a single error handling path for both types of tunnels. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 | vxlan: avoid checking socket multiple times. | pravin shelar | 1 | -7/+5
Check the vxlan socket in vxlan6_get_route(). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 | vxlan: avoid vlan processing in vxlan device. | pravin shelar | 1 | -8/+1
The VXLAN device does not have special handling for vlan tagging on egress. Therefore it does not make sense to expose the vlan offloading feature. This patch does not change vxlan functionality. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-15 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -1/+3
Several cases of bug fixes in 'net' overlapping other changes in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-09 | vxlan: hide unused local variable | Arnd Bergmann | 1 | -1/+3
A bugfix introduced a harmless warning in v4.9-rc4: drivers/net/vxlan.c: In function 'vxlan_group_used': drivers/net/vxlan.c:947:21: error: unused variable 'sock6' [-Werror=unused-variable] This hides the variable inside of the same #ifdef that is around its user. The extraneous initialization is removed at the same time, it was accidentally introduced in the same commit. Fixes: c6fcc4fc5f8b ("vxlan: avoid using stale vxlan socket.") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-30 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -31/+51
Mostly simple overlapping changes. For example, David Ahern's adjacency list revamp in 'net-next' conflicted with an adjacency list traversal bug fix in 'net'. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-29 | vxlan: avoid using stale vxlan socket. | pravin shelar | 1 | -30/+50
When vxlan device is closed vxlan socket is freed. This operation can race with vxlan-xmit function which dereferences vxlan socket. Following patch uses RCU mechanism to avoid this situation. Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-10-20 | net: use core MTU range checking in core net infra | Jarod Wilson | 1 | -30/+34
geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straight-forward as others, could use some closer inspection and testing

macvlan:
- set min/max_mtu

tun:
- set min/max_mtu, remove tun_net_change_mtu

vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function
- This one is also not as straight-forward and could use closer inspection and testing from vxlan folks

bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function

openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which is the largest possible size supported

sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)

macsec:
- min_mtu = 0, max_mtu = 65535

macvlan:
- min_mtu = 0, max_mtu = 65535

ntb_netdev:
- min_mtu = 0, max_mtu = 65535

veth:
- min_mtu = 68, max_mtu = 65535

8021q:
- min_mtu = 0, max_mtu = 65535

CC: netdev@vger.kernel.org
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Tom Herbert <tom@herbertland.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Jiri Benc <jbenc@redhat.com>
CC: WANG Cong <xiyou.wangcong@gmail.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Pravin B Shelar <pshelar@ovn.org>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Pravin Shelar <pshelar@nicira.com>
CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
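A centralized range check of the kind described for the core could look roughly like the sketch below. The struct and function names are hypothetical, and treating max_mtu == 0 as "no upper bound configured" is an assumption of this sketch rather than something the message above states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device min_mtu/max_mtu fields. */
struct sketch_mtu_limits {
	unsigned int min_mtu;
	unsigned int max_mtu;	/* assumption: 0 means no upper bound is set */
};

/* Returns 0 if new_mtu is acceptable, -EINVAL otherwise. */
static int sketch_check_mtu(const struct sketch_mtu_limits *lim, int new_mtu)
{
	assert(lim != NULL);
	if (new_mtu < 0 || (unsigned int)new_mtu < lim->min_mtu)
		return -EINVAL;
	if (lim->max_mtu > 0 && (unsigned int)new_mtu > lim->max_mtu)
		return -EINVAL;
	return 0;
}
```

With the veth limits listed above (min_mtu = 68, max_mtu = 65535), a request for 1500 passes while 67 and 65536 are rejected. Drivers such as vxlan that keep dynamic checks would still apply them in their own change_mtu function on top of this.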
2016-10-20 | net: add recursion limit to GRO | Sabrina Dubroca | 1 | -1/+1
Currently, GRO can do unlimited recursion through the gro_receive handlers. This was fixed for tunneling protocols by limiting tunnel GRO to one level with encap_mark, but both VLAN and TEB still have this problem. Thus, the kernel is vulnerable to a stack overflow, if we receive a packet composed entirely of VLAN headers. This patch adds a recursion counter to the GRO layer to prevent stack overflow. When a gro_receive function hits the recursion limit, GRO is aborted for this skb and it is processed normally. This recursion counter is put in the GRO CB, but could be turned into a percpu counter if we run out of space in the CB. Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report. Fixes: CVE-2016-7039 Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.") Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
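The idea of a recursion counter can be sketched in userspace as follows. This is not the GRO code: the function name, the limit value, and the simplified header layout (a 2-byte ethertype followed by a 2-byte tag for each VLAN level) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RECURSION_LIMIT 15	/* hypothetical limit for this sketch */
#define SKETCH_ETHERTYPE_VLAN 0x8100u

/*
 * Returns the innermost ethertype, or -1 if the buffer is truncated or the
 * nesting limit is reached (a caller would then skip aggregation and process
 * the packet normally).
 */
static int sketch_parse_ethertype(const uint8_t *p, size_t len, unsigned int depth)
{
	unsigned int type;

	assert(p != NULL);
	if (len < 2)
		return -1;
	type = ((unsigned int)p[0] << 8) | p[1];
	if (type != SKETCH_ETHERTYPE_VLAN)
		return (int)type;
	if (depth >= SKETCH_RECURSION_LIMIT || len < 4)
		return -1;
	/* Skip the ethertype and the 2-byte tag, then descend one level. */
	return sketch_parse_ethertype(p + 4, len - 4, depth + 1);
}
```

A packet with one VLAN header resolves to its inner ethertype, while a packet made of twenty stacked VLAN headers hits the limit and is refused instead of recursing further.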
2016-09-12 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -26/+12
Conflicts: drivers/net/ethernet/mediatek/mtk_eth_soc.c drivers/net/ethernet/qlogic/qed/qed_dcbx.c drivers/net/phy/Kconfig All conflicts were cases of overlapping commits. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-10 | net/ip_tunnels: Introduce tunnel_id_to_key32() and key32_to_tunnel_id() | Amir Vadai | 1 | -2/+2
Add utility functions to convert a 32-bit key into a 64-bit tunnel ID and vice versa. These functions will be used instead of cloning code in GRE and VXLAN, and in tc act_iptunnel which will be introduced in a following patch in this patchset. Signed-off-by: Amir Vadai <amir@vadai.me> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
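A byte-level sketch of such a conversion is shown below. It is an illustration under assumptions, not the kernel helpers: both values are held as raw network-order byte arrays, the function names carry a sketch_ prefix to mark them as hypothetical, and the key is assumed to occupy the low-order 32 bits of the 64-bit ID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a 4-byte network-order key into an 8-byte network-order tunnel ID. */
static void sketch_key32_to_tunnel_id(const uint8_t key[4], uint8_t id[8])
{
	assert(key != NULL && id != NULL);
	memset(id, 0, 4);		/* upper 32 bits are zero */
	memcpy(id + 4, key, 4);		/* key fills the low-order 32 bits */
}

/* Narrow an 8-byte tunnel ID back to its 4-byte key (upper half is dropped). */
static void sketch_tunnel_id_to_key32(const uint8_t id[8], uint8_t key[4])
{
	assert(key != NULL && id != NULL);
	memcpy(key, id + 4, 4);
}
```

Working on byte arrays sidesteps host endianness in the sketch; converting a key to an ID and back yields the original key.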
2016-09-06 | vxlan: Update tx_errors statistics if vxlan_build_skb return err. | Haishuang Yan | 1 | -0/+1
If vxlan_build_skb returns err < 0, tx_errors should also be increased. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04 | vxlan: fix duplicated and wrong error messages | Jiri Benc | 1 | -26/+9
vxlan_dev_configure outputs error messages before returning, so there is no need to print the same messages again in vxlan_newlink. Also, vxlan_dev_configure may return a particular error code for a different reason than vxlan_newlink thinks. Move the remaining error messages into vxlan_dev_configure and let vxlan_newlink just pass on the error code. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-04 | vxlan: reject multicast destination without an interface | Jiri Benc | 1 | -0/+3
Currently, kernel accepts configurations such as:

	ip l a type vxlan dstport 4789 id 1 group 239.192.0.1
	ip l a type vxlan dstport 4789 id 1 group ff0e::110

However, neither of those really works. In the IPv4 case, the interface cannot be brought up ("RTNETLINK answers: No such device"). This is because multicast join will be rejected without the interface being specified. In the IPv6 case, multicast will be joined on the first interface found. This is not what the user wants as it depends on random factors (order of interfaces).

Note that it's possible to add a local address but it doesn't solve anything. For IPv4, it's not considered in the multicast join (thus the same error as above is returned on ifup). This could be added but it wouldn't help for IPv6 anyway. For IPv6, we do need the interface.

Just reject a configuration that sets multicast address and does not provide an interface. Nobody can depend on the previous behavior as it never worked.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
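The rejection rule can be sketched as a pair of small validation helpers. The names are hypothetical, the IPv4 address is taken in host byte order for simplicity, an ifindex of 0 is assumed to mean "no interface given", and the -EINVAL return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* IPv4 multicast is 224.0.0.0/4 (address in host byte order here). */
static int sketch_is_ipv4_multicast(uint32_t addr)
{
	return (addr & 0xf0000000u) == 0xe0000000u;
}

/* IPv6 multicast is ff00::/8. */
static int sketch_is_ipv6_multicast(const uint8_t addr[16])
{
	assert(addr != NULL);
	return addr[0] == 0xff;
}

/* Reject a multicast remote when no lower interface was supplied. */
static int sketch_check_remote_v4(uint32_t remote, int ifindex)
{
	return (sketch_is_ipv4_multicast(remote) && ifindex == 0) ? -EINVAL : 0;
}

static int sketch_check_remote_v6(const uint8_t remote[16], int ifindex)
{
	return (sketch_is_ipv6_multicast(remote) && ifindex == 0) ? -EINVAL : 0;
}
```

With these helpers, the two example groups from the message (239.192.0.1 and ff0e::110) are refused when the interface is missing and accepted once one is provided; unicast remotes are unaffected.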
2016-09-04 | vxlan: call peernet2id() in fdb notification | WANG Cong | 1 | -1/+1
netns id should be already allocated each time we change netns, that is, in dev_change_net_namespace() (more precisely in rtnl_fill_ifinfo()). It is safe to just call peernet2id() here. Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-01 | rtnetlink: fdb dump: optimize by saving last interface markers | Roopa Prabhu | 1 | -8/+6
fdb dumps spanning multiple skb's currently restart from the first interface again for every skb. This results in unnecessary iterations on the already visited interfaces and their fdb entries. In large scale setups, we have seen this to slow down fdb dumps considerably. On a system with 30k macs we see fdb dumps spanning across more than 300 skbs.

To fix the problem, this patch replaces the existing single fdb marker with three markers: netdev hash entries, netdevs and fdb index to continue where we left off instead of restarting from the first netdev. This is consistent with link dumps.

In the process of fixing the performance issue, this patch also re-implements fix done by commit 472681d57a5d ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump") (with an internal fix from Wilson Kok) in the following ways:
- change ndo_fdb_dump handlers to return error code instead of the last fdb index
- use cb->args strictly for dump frag markers and not error codes. This is consistent with other dump functions.

Below results were taken on a system with 1000 netdevs and 35085 fdb entries:

before patch:
	$time bridge fdb show | wc -l
	15065

	real 1m11.791s
	user 0m0.070s
	sys 1m8.395s

(existing code does not return all macs)

after patch:
	$time bridge fdb show | wc -l
	35085

	real 0m2.017s
	user 0m0.113s
	sys 0m1.942s

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
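The effect of saving markers between buffers can be sketched with a resumable dump over a small in-memory table. This is a simplified illustration, not the rtnetlink code: it keeps two markers (device index and fdb index) instead of the three named above, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device with a flat array of fdb entries. */
struct sketch_dev {
	const int *fdb;
	size_t n_fdb;
};

/* Resume markers, kept by the caller between successive buffers. */
struct sketch_cursor {
	size_t dev_idx;
	size_t fdb_idx;
};

/* Copy up to cap entries into out, starting at *cur, and advance *cur. */
static size_t sketch_dump_chunk(const struct sketch_dev *devs, size_t n_devs,
				struct sketch_cursor *cur, int *out, size_t cap)
{
	size_t n = 0;

	assert(devs != NULL && cur != NULL && out != NULL);
	while (cur->dev_idx < n_devs) {
		const struct sketch_dev *d = &devs[cur->dev_idx];

		while (cur->fdb_idx < d->n_fdb) {
			if (n == cap)
				return n;	/* buffer full: cursor says where to resume */
			out[n++] = d->fdb[cur->fdb_idx++];
		}
		cur->dev_idx++;			/* device finished, move to the next one */
		cur->fdb_idx = 0;
	}
	return n;
}
```

Each call continues from the saved position, so already visited devices and entries are never walked again; the total work is proportional to the number of entries rather than to entries times buffers.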
2016-08-26 | vxlan: remove the useless header file protocol.h | Zhu Yanjun | 1 | -1/+0
This header file is not used in vxlan.c file. Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 | net: vxlan: lwt: Fix vxlan local traffic. | pravin shelar | 1 | -2/+2
The vxlan driver has a bypass for local vxlan traffic, but that depends on the vxlan driver having information about all VNIs on the local system. This is not available in the case of LWT. Therefore the following patch disables encap bypass for LWT vxlan traffic. Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Reported-by: Jakub Libosvar <jlibosva@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-08-08 | net: vxlan: lwt: Use source ip address during route lookup. | pravin shelar | 1 | -12/+18
An LWT user can specify the destination as well as the source IP address for a given tunnel endpoint, but vxlan is ignoring the given source IP address. The following patch uses both IP addresses to route the tunnel packet. This is consistent with other LWT implementations, like GENEVE and GRE. Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 | drivers/net: fixup comments after "Future-proof tunnel offload handlers" | Sabrina Dubroca | 1 | -2/+2
Some comments weren't updated to reflect the renaming of ndo's and the change of arguments. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -24/+34
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 | vxlan: Add new UDP encapsulation offload type for VXLAN-GPE | Alexander Duyck | 1 | -0/+6
The fact is VXLAN with Generic Protocol Extensions cannot be supported by the same hardware parsers that support VXLAN. The protocol extensions allow for things like a Next Protocol field which in turn allows for things other than Ethernet to be passed over the tunnel. Most existing parsers will not know how to interpret this. To resolve this I am giving VXLAN-GPE its own UDP encapsulation offload type. This way hardware that does support GPE can simply add this type to the switch statement for VXLAN, and if they don't support it then this will fix any issues where headers might be interpreted incorrectly. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
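How a driver might use a distinct type is sketched below. The enum, struct, and function are hypothetical names invented for this illustration and do not reproduce the kernel definitions; the sketch models hardware whose parser only understands plain VXLAN.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tunnel types; GPE gets its own value instead of sharing VXLAN's. */
enum sketch_tunnel_type {
	SKETCH_TUNNEL_VXLAN,
	SKETCH_TUNNEL_GENEVE,
	SKETCH_TUNNEL_VXLAN_GPE,
};

struct sketch_tunnel_info {
	enum sketch_tunnel_type type;
	unsigned short port;
};

/* Hypothetical driver hook for hardware that only parses plain VXLAN. */
static int sketch_hw_add_port(const struct sketch_tunnel_info *ti)
{
	assert(ti != NULL);
	switch (ti->type) {
	case SKETCH_TUNNEL_VXLAN:
		return 0;		/* port would be programmed into the parser */
	default:
		return -EOPNOTSUPP;	/* GPE and other types stay in software */
	}
}
```

Because VXLAN-GPE arrives with its own type, hardware that cannot parse it never sees the port, and a driver that can support it only needs one more case label.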
2016-06-17 | net: Merge VXLAN and GENEVE push notifiers into a single notifier | Alexander Duyck | 1 | -1/+1
This patch merges the notifiers for VXLAN and GENEVE into a single UDP tunnel notifier. The idea is that we will want to only have to make one notifier call to receive the list of ports for VXLAN and GENEVE tunnels that need to be offloaded. In addition we add a new set of ndo functions named ndo_udp_tunnel_add and ndo_udp_tunnel_del that are meant to allow us to track the tunnel meta-data such as port and address family as tunnels are added and removed. The tunnel meta-data is now transported in a structure named udp_tunnel_info which for now carries the type, address family, and port number. In the future this could be updated so that we can include a tuple of values including things such as the destination IP address and other fields. I also ended up going with a naming scheme that consisted of using the prefix udp_tunnel on function names. I applied this to the notifier and ndo ops as well so that it hopefully points to the fact that these are primarily used in the udp_tunnel functions. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 | net: Combine GENEVE and VXLAN port notifiers into single functions | Alexander Duyck | 1 | -51/+9
This patch merges the GENEVE and VXLAN code so that both functions pass through a shared code path. This way we can start the effort of using a single function on the network device drivers to handle both of these tunnel types. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 | vxlan/geneve: Include udp_tunnel.h in vxlan/geneve.h and fixup includes | Alexander Duyck | 1 | -17/+0
This patch makes it so that we add udp_tunnel.h to vxlan.h and geneve.h header files. This is useful as I plan to move the generic handlers for the port offloads into the udp_tunnel header file and leave the vxlan and geneve headers to be a bit more protocol specific. I also went through and cleaned out a number of redundant includes that were in the .h and .c files for these drivers. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14 | ovs/vxlan: fix rtnl notifications on iface deletion | Nicolas Dichtel | 1 | -24/+34
The function vxlan_dev_create() (only used by ovs) never calls rtnl_configure_link(). The consequence is that dev->rtnl_link_state is never set to RTNL_LINK_INITIALIZED. During the deletion phase, the function rollback_registered_many() sends a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED. Note that the function vxlan_dev_create() is moved after the rtnl stuff so that vxlan_dellink() can be called in this function. Fixes: dcc38c033b32 ("openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN") CC: Thomas Graf <tgraf@suug.ch> CC: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-31 | vxlan: Accept user specified MTU value when create new vxlan link | Chen Haiquan | 1 | -0/+3
When creating a new vxlan link, for example: ip link add vtap mtu 1440 type vxlan vni 1 dev eth0 the argument "mtu" has no effect, because it is not copied into conf->mtu. The default value is used in the vxlan_dev_configure function. This problem was introduced by commit 0dfbdf4102b9 (vxlan: Factor out device configuration). Fixes: 0dfbdf4102b9 (vxlan: Factor out device configuration) Signed-off-by: Chen Haiquan <oc@yunify.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 | udp: prevent skbs lingering in tunnel socket queues | Hannes Frederic Sowa | 1 | -2/+2
In case we find a socket with encapsulation enabled we should call the encap_recv function even if just a udp header without payload is available. The callbacks are responsible for correctly verifying and dropping the packets. Also, in case the header validation fails for geneve and vxlan we shouldn't put the skb back into the socket queue, no one will pick them up there. Instead we can simply discard them in the respective encap_recv functions. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-16 | vxlan: set mac_header correctly in GPE mode | Jiri Benc | 1 | -0/+1
For VXLAN-GPE, the interface is ARPHRD_NONE, thus we need to reset mac_header after pulling the outer header. v2: Put the code to the existing conditional block as suggested by Shmulik Ladkani. Fixes: e1e5314de08b ("vxlan: implement GPE") Signed-off-by: Jiri Benc <jbenc@redhat.com> Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -2/+3
In netdevice.h we removed the structure in net-next that is being changed in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 | udp_offload: Set encapsulation before inner completes. | Jarno Rajahalme | 1 | -0/+3
UDP tunnel segmentation code relies on the inner offsets being set for a UDP tunnel GSO packet, but the inner *_complete() functions will set the inner offsets only if 'encapsulation' is set before calling them. Currently, udp_gro_complete() sets 'encapsulation' only after the inner *_complete() functions are done. This leaves the inner offsets with invalid values after udp_gro_complete() returns, which in turn makes it impossible to properly segment the packet in case it needs to be forwarded, which would be visible to the user either as invalid packets being sent or as packet loss. This patch fixes this by setting skb's 'encapsulation' in udp_gro_complete() before calling into the inner complete functions, and by making each possible UDP tunnel gro_complete() callback set the inner_mac_header to the beginning of the tunnel payload. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 | udp_tunnel: Remove redundant udp_tunnel_gro_complete(). | Jarno Rajahalme | 1 | -2/+0
The setting of the UDP tunnel GSO type is already performed by udp[46]_gro_complete(). Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-29 | vxlan: fix initialization with custom link parameters | Jiri Benc | 1 | -4/+4
Commit 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate function") changed initialization order and as an unintended result, when the user specifies additional link parameters (such as IFLA_ADDRESS) while creating vxlan interface, those are overwritten by vxlan_ether_setup later. It's necessary to call ether_setup from within the ->setup callback. That way, the correct parameters are set by rtnl_create_link later. This is done also for VXLAN-GPE, as we don't know the interface type yet at that point, and changed to the correct interface type later. Fixes: 0c867c9bf84c ("vxlan: move Ethernet initialization to a separate function") Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 | vxlan: break dependency with netdev drivers | Hannes Frederic Sowa | 1 | -5/+9
Currently all drivers depend on and autoload the vxlan module because of how vxlan_get_rx_port is linked into them. Remove this dependency: by using a new event type in the netdevice notifier call chain, we proxy the request from the drivers to flush and set up the vxlan ports again, not directly via a function call but via the already existing netdevice notifier call chain. I added a separate new event type, NETDEV_OFFLOAD_PUSH_VXLAN, to do so. We don't need to save those ids, as the event type field is an unsigned long and using specialized event types for this purpose seemed to be a more elegant way. This will also be beneficial if in the future we want to add offloading knobs for vxlan. Cc: Jesse Gross <jesse@kernel.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
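The decoupling described here can be sketched with a minimal callback chain in plain C. All names below are hypothetical and the chain is a fixed-size array rather than the kernel's notifier infrastructure; the point is only that the caller references the chain, never a symbol of the tunnel module.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_EVENT_OFFLOAD_PUSH 1UL	/* hypothetical event id */
#define SKETCH_MAX_NOTIFIERS 8

typedef int (*sketch_notifier_fn)(unsigned long event, void *data);

static sketch_notifier_fn sketch_chain[SKETCH_MAX_NOTIFIERS];
static size_t sketch_chain_len;

/* Tunnel-module side: subscribe to events when the module is loaded. */
static int sketch_register_notifier(sketch_notifier_fn fn)
{
	assert(fn != NULL);
	if (sketch_chain_len == SKETCH_MAX_NOTIFIERS)
		return -1;
	sketch_chain[sketch_chain_len++] = fn;
	return 0;
}

/* Driver side: raise an event without naming any tunnel-module function. */
static void sketch_call_notifiers(unsigned long event, void *data)
{
	size_t i;

	for (i = 0; i < sketch_chain_len; i++)
		sketch_chain[i](event, data);
}

/* Tunnel-module handler: answers a push request by reporting a port. */
static int sketch_tunnel_event(unsigned long event, void *data)
{
	if (event == SKETCH_EVENT_OFFLOAD_PUSH)
		*(unsigned short *)data = 4789;	/* example port value */
	return 0;
}
```

If nothing has registered, raising the event is a no-op, which mirrors the goal of the patch: the driver works whether or not the tunnel module is present.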
2016-04-16 | ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb | Alexander Duyck | 1 | -3/+3
This patch updates the IP tunnel core function iptunnel_handle_offloads so that we return an int and do not free the skb inside the function. This actually allows us to clean up several paths in several tunnels so that we can free the skb at one point in the path without having to have a secondary path if we are supporting tunnel offloads. In addition it should resolve some double-free issues I have found in the tunnels paths as I believe it is possible for us to end up triggering such an event in the case of fou or gue. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16 | vxlan: reduce usage of synchronize_net in ndo_stop | Hannes Frederic Sowa | 1 | -8/+20
We only need to do the synchronize_net dance once for both, ipv4 and ipv6 sockets, thus removing one synchronize_net in case both sockets get dismantled. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16 | vxlan: synchronously and race-free destruction of vxlan sockets | Hannes Frederic Sowa | 1 | -17/+3
Due to the fact that the udp socket is destructed asynchronously in a work queue, we have some nondeterministic behavior during shutdown of vxlan tunnels and creating new ones. Fix this by keeping the destruction process synchronous in regards to the user space process so IFF_UP can be reliably set. udp_tunnel_sock_release destroys vs->sock->sk if reference counter indicates so. We expect to have the same lifetime of vxlan_sock and vxlan_sock->sock->sk even in fast paths with only rcu locks held. So only destruct the whole socket after we can be sure it cannot be found by searching vxlan_net->sock_list. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-11 | vxlan: fix incorrect type | Jiri Benc | 1 | -2/+2
The protocol is 16bit, not 32bit. Fixes: e1e5314de08ba ("vxlan: implement GPE") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07 | vxlan: change vxlan to use UDP socket GRO | Tom Herbert | 1 | -22/+8
Adapt vxlan_gro_receive, vxlan_gro_complete to take a socket argument. Set these functions in tunnel_config. Don't set udp_offloads any more. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 | vxlan: implement GPE | Jiri Benc | 1 | -17/+153
Implement VXLAN-GPE. Only COLLECT_METADATA is supported for now (it is possible to support static configuration, too, if there is demand for it). The GPE header parsing has to be moved before iptunnel_pull_header, as we need to know the protocol. v2: Removed what was called "L2 mode" in v1 of the patchset. Only "L3 mode" (now called "raw mode") is added by this patch. This mode does not allow Ethernet header to be encapsulated in VXLAN-GPE when using ip route to specify the encapsulation, IP header is encapsulated instead. The patch does support Ethernet to be encapsulated, though, using ETH_P_TEB in skb->protocol. This will be utilized by other COLLECT_METADATA users (openvswitch in particular). If there is ever demand for Ethernet encapsulation with VXLAN-GPE using ip route, it's easy to add a new flag switching the interface to "Ethernet mode" (called "L2 mode" in v1 of this patchset). For now, leave this out, it seems we don't need it. Disallowed more flag combinations, especially RCO with GPE. Added comment explaining that GBP and GPE cannot be set together. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 | vxlan: move fdb code to common location in vxlan_xmit | Jiri Benc | 1 | -11/+11
Handle VXLAN_F_COLLECT_METADATA before VXLAN_F_PROXY. The latter does not make sense with the former, as it needs populated fdb which does not happen in metadata mode. After this cleanup, the fdb code in vxlan_xmit is moved to a common location and can be later skipped for VXLAN-GPE which does not necessarily carry inner Ethernet header. v2: changed commit description to not reference L3 mode Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06 | vxlan: move Ethernet initialization to a separate function | Jiri Benc | 1 | -7/+13
This will allow initializing vxlan in ARPHRD_NONE mode based on the passed rtnl attributes. v2: renamed "l2mode" to "ether". Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21 | vxlan: fix too large pskb_may_pull with remote checksum | Jiri Benc | 1 | -4/+2
vxlan_remcsum is called after iptunnel_pull_header and thus the skb has vxlan header already pulled. Don't include vxlan header again in the calculation. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20 | vxlan: fix populating tclass in vxlan6_get_route | Daniel Borkmann | 1 | -2/+1
Jiri mentioned that flowi6_tos of struct flowi6 is never used/read anywhere. In fact, rest of the kernel uses the flowi6's flowlabel, where the traffic class _and_ the flowlabel (aka flowinfo) is encoded. For example, for policy routing, fib6_rule_match() uses ip6_tclass() that is applied on the flowlabel member for matching on tclass. Similar fix is needed for geneve, where flowi6_tos is set as well. Installing a v6 blackhole rule that f.e. matches on tos is now working with vxlan. Fixes: 1400615d64cf ("vxlan: allow setting ipv6 traffic class") Reported-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
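How the traffic class and the flow label share one 32-bit field can be sketched as below. The sketch works in host byte order for readability and uses hypothetical helper names; it follows the IPv6 header layout (4-bit version, 8-bit traffic class, 20-bit flow label) and is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TCLASS_SHIFT	20
#define SKETCH_FLOWLABEL_MASK	0x000fffffu

/* Pack an 8-bit traffic class and a 20-bit flow label into one word. */
static uint32_t sketch_make_flowinfo(uint8_t tclass, uint32_t flowlabel)
{
	assert(flowlabel <= SKETCH_FLOWLABEL_MASK);	/* label must fit in 20 bits */
	return ((uint32_t)tclass << SKETCH_TCLASS_SHIFT) | flowlabel;
}

/* Extract the traffic class that a rule matching on tclass would look at. */
static uint8_t sketch_tclass(uint32_t flowinfo)
{
	return (uint8_t)((flowinfo >> SKETCH_TCLASS_SHIFT) & 0xffu);
}
```

A route lookup that fills in only a separate tos field leaves these bits zero, so a rule that reads the traffic class out of the combined word would never match; encoding the class into the flow label member is what makes the match work.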
2016-03-13 | gro: Defer clearing of flush bit in tunnel paths | Alexander Duyck | 1 | -2/+1
This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that we do not clear the flush bit until after we have called the next level GRO handler. Previously this was being cleared before parsing through the list of frames, however this resulted in several paths where either the bit needed to be reset but wasn't as in the case of FOU, or cases where it was being set as in GENEVE. By just deferring the clearing of the bit until after the next level protocol has been parsed we can avoid any unnecessary bit twiddling and avoid bugs. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>