path: root/drivers/net/vxlan.c
Age | Commit message | Author | Files | Lines
2018-12-03 | vxlan: add support for underlay in non-default VRF | Alexis Bauvin | 1 | -8/+24
Creating a VXLAN device with its underlay in a non-default VRF makes the egress route lookup fail or resolve incorrectly, since it is done in the default VRF, and makes ingress fail because the socket listens in the default VRF. This patch binds the underlying UDP tunnel socket to the l3mdev of the lower device of the VXLAN device. The socket then listens in the proper VRF and outputs traffic from said l3mdev, matching l3mdev routing rules and looking up the correct routing table. When the VXLAN device does not have a lower device, or the lower device is in the default VRF, the socket will not be bound to any interface, keeping the previous behaviour. The underlay l3mdev is deduced from the VXLAN lower device (IFLA_VXLAN_LINK).

    +----------+                         +---------+
    |          |                         |         |
    | vrf-blue |                         | vrf-red |
    |          |                         |         |
    +----+-----+                         +----+----+
         |                                    |
         |                                    |
    +----+-----+                         +----+----+
    |          |                         |         |
    | br-blue  |                         | br-red  |
    |          |                         |         |
    +----+-----+                         +---+-+---+
         |                                   | |
         |                             +-----+ +-----+
         |                             |             |
    +----+-----+                +------+----+   +----+----+
    |          |  lower device  |           |   |         |
    |   eth0   | <- - - - - - - | vxlan-red |   | tap-red | (... more taps)
    |          |                |           |   |         |
    +----------+                +-----------+   +---------+

Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
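The binding rule described in this message can be sketched in plain userspace C. This is only an illustration, not the driver code: `struct lower_dev`, its fields and `underlay_bind_ifindex()` are hypothetical names, and the sketch assumes that a master ifindex of 0 stands for "lower device is in the default VRF".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of the VXLAN lower device. The field
 * l3mdev_master_ifindex is assumed to be 0 when the device is not
 * enslaved to a VRF, i.e. when it lives in the default VRF. */
struct lower_dev {
	int ifindex;
	int l3mdev_master_ifindex;
};

/* Return the ifindex the underlay UDP socket should be bound to,
 * or 0 to leave the socket unbound (the previous behaviour). */
static int underlay_bind_ifindex(const struct lower_dev *lower)
{
	if (lower == NULL)
		return 0;                    /* no lower device configured */
	return lower->l3mdev_master_ifindex; /* 0 for the default VRF */
}

/* Usage: the three cases named in the commit message. */
void underlay_bind_example(void)
{
	struct lower_dev in_vrf  = { .ifindex = 2, .l3mdev_master_ifindex = 7 };
	struct lower_dev default_vrf = { .ifindex = 3, .l3mdev_master_ifindex = 0 };

	assert(underlay_bind_ifindex(&in_vrf) == 7);      /* bound to vrf-red */
	assert(underlay_bind_ifindex(&default_vrf) == 0); /* unbound */
	assert(underlay_bind_ifindex(NULL) == 0);         /* unbound */
}
```

Returning 0 for both "no lower device" and "default VRF" mirrors the message's statement that both cases keep the socket unbound.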
2018-11-21 | vxlan: Allow changing ageing time | Ido Schimmel | 1 | -4/+6
In a similar fashion to the bridge device, allow changing the ageing time of the VxLAN device by scheduling its timer to fire if the ageing time changed. One use case is selftests where learning / ageing of VxLAN FDB entries is tested. The default ageing time is 5 minutes, which is too long for a simple selftest. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 | vxlan: Add hardware FDB learning | Petr Machata | 1 | -1/+72
In order to allow devices to signal learning events to VXLAN, introduce two new switchdev messages: SWITCHDEV_VXLAN_FDB_ADD_TO_BRIDGE and SWITCHDEV_VXLAN_FDB_DEL_TO_BRIDGE. Listen to these notifications in the vxlan driver. The FDB entries learned this way have an NTF_EXT_LEARNED flag, and only entries marked as such can be unlearned by the _DEL_ event. They are also immediately marked as offloaded. This is the same behavior that the bridge driver observes. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 | vxlan: Don't override user-added entries with ext-learned ones | Petr Machata | 1 | -9/+17
When an external learning event collides with a user-added entry, the user-added entry shouldn't be taken over. Otherwise on an unlearn event the entry would be completely lost, even though the user added it by hand. Therefore skip update of FDB flags and state for these cases. This is in accordance with the bridge behavior. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
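A minimal sketch of the "skip update" rule, written as standalone C rather than driver code. The struct, the function name `fdb_update_flags()` and the numeric flag values are assumptions made for illustration; only the idea (an external-learning update must not take over a user-added entry) comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; treat the numbers as assumptions. */
#define NTF_EXT_LEARNED         0x10u
#define NTF_VXLAN_ADDED_BY_USER 0x100u /* private mark for user-added entries */

/* Hypothetical, reduced FDB entry: only flags and state matter here. */
struct fdb_entry {
	uint16_t flags;
	uint16_t state;
};

/* Apply new flags/state to an existing entry and report whether the
 * entry changed. An update coming from external learning is skipped
 * when the entry was added by the user. */
static bool fdb_update_flags(struct fdb_entry *f, uint16_t new_flags,
			     uint16_t new_state)
{
	bool ext_update = (new_flags & NTF_EXT_LEARNED) != 0;

	if (ext_update && (f->flags & NTF_VXLAN_ADDED_BY_USER))
		return false; /* keep the user's entry untouched */
	if (f->flags == new_flags && f->state == new_state)
		return false; /* nothing to do */

	f->flags = new_flags;
	f->state = new_state;
	return true;
}

void fdb_update_example(void)
{
	struct fdb_entry user = { .flags = NTF_VXLAN_ADDED_BY_USER, .state = 1 };
	struct fdb_entry learned = { .flags = 0, .state = 1 };

	/* A colliding external-learning event leaves the user entry alone. */
	assert(!fdb_update_flags(&user, NTF_EXT_LEARNED, 2));
	assert(user.flags == NTF_VXLAN_ADDED_BY_USER && user.state == 1);

	/* An entry learned by the driver itself may be taken over. */
	assert(fdb_update_flags(&learned, NTF_EXT_LEARNED, 2));
	assert(learned.flags == NTF_EXT_LEARNED && learned.state == 2);
}
```

Because the user-added entry never gains the external-learned mark, a later unlearn event has nothing to remove, which is the loss the commit is guarding against.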
2018-11-21 | vxlan: Mark user-added FDB entries | Petr Machata | 1 | -6/+11
The VXLAN driver needs to differentiate between FDB entries learned by the VXLAN driver, and those added by the user. The latter ones shouldn't be taken over by external learning events. This is in accordance with bridge behavior. Therefore, extend the flags bitfield to 16 bits and add a new private NTF flag to mark the user-added entries. This seems preferable to adding a dedicated boolean, because passing the flag, unlike passing e.g. a true, makes it clear what the meaning of the bit is. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 | vxlan: vxlan_fdb_notify(): Make switchdev notification configurable | Petr Machata | 1 | -30/+41
In a following patch, vxlan is extended to allow hardware FDB learning. For FDB entries learned this way, switchdev notifications should not be sent again, because the driver already knows about these entries. To that end, add an argument to vxlan_fdb_notify() to determine whether the switchdev notifications should be sent. Propagate the argument to all call sites transitively, eventually passing true in all root calls. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-21 | vxlan: __vxlan_fdb_delete(): Drop unused argument vid | Petr Machata | 1 | -4/+3
This argument is necessary for vxlan_fdb_delete(), the API of which is prescribed by ndo_fdb_del, but __vxlan_fdb_delete() doesn't need it. Therefore drop it. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08 | vxlan: Allow configuration of DF behaviour | Stefano Brivio | 1 | -1/+28
Allow users to set the IPv4 DF bit in outgoing packets, or to inherit its value from the IPv4 inner header. If the encapsulated protocol is IPv6 and DF is configured to be inherited, always set it. For IPv4, inheriting DF from the inner header was probably intended from the very beginning judging by the comment to vxlan_xmit(), but it wasn't actually implemented -- also because it would have done more harm than good, without handling for ICMP Fragmentation Needed messages. According to RFC 7348, "Path MTU discovery MAY be used". An expired RFC draft, draft-saum-nvo3-pmtud-over-vxlan-05, whose purpose was to describe PMTUD implementation, says that "is a MUST that Vxlan gateways [...] SHOULD set the DF-bit [...]", whatever that means. Given this background, the only sane option is probably to let the user decide, and keep the current behaviour as default. This only applies to non-lwt tunnels: if an external control plane is used, tunnel key will still control the DF flag. v2: - DF behaviour configuration only applies for non-lwt tunnels, move DF setting to if (!info) block in vxlan_xmit_one() (Stephen Hemminger) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
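The DF selection described above can be illustrated with a small decision function. This is a sketch under assumptions: the enum names and `outer_df()` are hypothetical, and returning "DF not set" for inner protocols other than IPv4 and IPv6 is a guess the message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three configurable behaviours. */
enum df_mode { DF_UNSET, DF_SET, DF_INHERIT };

/* Protocol carried inside the tunnel. */
enum inner_proto { INNER_IPV4, INNER_IPV6, INNER_OTHER };

/* Decide whether the outer IPv4 header gets the DF bit.
 * inner_df is only meaningful when proto == INNER_IPV4. */
static bool outer_df(enum df_mode mode, enum inner_proto proto, bool inner_df)
{
	switch (mode) {
	case DF_SET:
		return true;
	case DF_INHERIT:
		if (proto == INNER_IPV6)
			return true;     /* IPv6 inner: always set DF */
		if (proto == INNER_IPV4)
			return inner_df; /* copy DF from the inner header */
		return false;            /* assumption for other protocols */
	case DF_UNSET:
	default:
		return false;            /* default: current behaviour kept */
	}
}

void outer_df_example(void)
{
	assert(outer_df(DF_SET, INNER_IPV4, false));
	assert(outer_df(DF_INHERIT, INNER_IPV4, true));
	assert(!outer_df(DF_INHERIT, INNER_IPV4, false));
	assert(outer_df(DF_INHERIT, INNER_IPV6, false));
	assert(!outer_df(DF_UNSET, INNER_IPV6, true));
}
```

As the message notes, this choice applies only to non-lwt tunnels; with an external control plane the tunnel key decides the DF flag instead.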
2018-11-08 | vxlan: ICMP error lookup handler | Stefano Brivio | 1 | -0/+29
Export an encap_err_lookup() operation to match an ICMP error against a valid VNI. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06 | net: Add extack argument to rtnl_create_link | David Ahern | 1 | -1/+1
Add extack arg to rtnl_create_link and add messages for invalid number of Tx or Rx queues. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -10/+2
net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 | geneve, vxlan: Don't set exceptions if skb->len < mtu | Stefano Brivio | 1 | -2/+2
We shouldn't abuse exceptions: if the destination MTU is already higher than what we're transmitting, no exception should be created. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
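The condition this fix introduces can be shown as a one-line predicate. The sketch below is illustrative only: `tunnel_needs_pmtu_update()` is a hypothetical name, and the 50-byte headroom used in the example is an assumed IPv4 VXLAN overhead.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true only when the packet does not fit into the underlay MTU
 * once the encapsulation headroom is subtracted. Only in that case is
 * a PMTU exception worth creating on the inner destination. */
static bool tunnel_needs_pmtu_update(int pkt_len, int encap_mtu, int headroom)
{
	return pkt_len > encap_mtu - headroom;
}

void pmtu_update_example(void)
{
	const int headroom = 50; /* assumed outer IPv4 + UDP + VXLAN + Ethernet */

	assert(!tunnel_needs_pmtu_update(1400, 1500, headroom)); /* fits */
	assert(!tunnel_needs_pmtu_update(1450, 1500, headroom)); /* exact fit */
	assert(tunnel_needs_pmtu_update(1451, 1500, headroom));  /* too big */
}
```

Before the fix the update was attempted for every packet with a destination, so small packets also produced exceptions.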
2018-10-17 | geneve, vxlan: Don't check skb_dst() twice | Stefano Brivio | 1 | -10/+2
Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids that we try updating PMTU for a non-existent destination, but didn't clean up cases where the check was already explicit. Drop those redundant checks. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 | vxlan: Notify for each remote of a removed FDB entry | Petr Machata | 1 | -1/+4
When notifications are sent about FDB activity, and an FDB entry with several remotes is removed, the notification is sent only for the first destination. That makes it impossible to distinguish between the case where only this first remote is removed, and the one where the FDB entry is removed as a whole. Therefore send one notification for each remote of a removed FDB entry. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 | vxlan: Support marking RDSTs as offloaded | Petr Machata | 1 | -1/+58
Offloaded bridge FDB entries are marked with NTF_OFFLOADED. Implement a similar mechanism for VXLAN, where a given remote destination can be marked as offloaded. To that end, introduce a new event, SWITCHDEV_VXLAN_FDB_OFFLOADED, through which the marking is communicated to the vxlan driver. To identify which RDST should be marked as offloaded, a switchdev_notifier_vxlan_fdb_info is passed to the listeners. The "offloaded" flag in that object determines whether the offloaded mark should be set or cleared. When sending offloaded FDB entries over netlink, mark them with NTF_OFFLOADED. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 | vxlan: Add vxlan_fdb_find_uc() for FDB querying | Petr Machata | 1 | -0/+40
A switchdev-capable driver that is aware of VXLAN may need to query VXLAN FDB. In the particular case of mlxsw, this functionality is limited to querying UC FDBs. Those being easier to deal with than the general case of RDST chain traversal, introduce an interface to query specifically UC FDBs: vxlan_fdb_find_uc(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 | vxlan: Add switchdev notifications | Petr Machata | 1 | -2/+44
When offloading VXLAN devices, drivers need to know about events in VXLAN FDB database. Since VXLAN models a bridge, it is natural to distribute the VXLAN FDB notifications using the pre-existing switchdev notification mechanism. To that end, introduce two new notification types: SWITCHDEV_VXLAN_FDB_ADD_TO_DEVICE and SWITCHDEV_VXLAN_FDB_DEL_TO_DEVICE. Introduce a new function, vxlan_fdb_switchdev_call_notifiers() to send the new notifier types, and a struct switchdev_notifier_vxlan_fdb_info to communicate the details of the FDB entry under consideration. Invoke the new function from vxlan_fdb_notify(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 | vxlan: Export address checking functions | Ido Schimmel | 1 | -26/+0
Drivers that support VxLAN offload need to be able to sanitize the configuration of the VxLAN device and accept / reject its offload. For example, mlxsw requires that the local IP of the VxLAN device be set and that packets be flooded to unicast IP(s) and not to a multicast group. Expose the functions that perform such checks. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 | vxlan: support NTF_USE refresh of fdb entries | Roopa Prabhu | 1 | -3/+7
This makes use of NTF_USE in vxlan driver consistent with bridge driver. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-03 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -0/+3
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net' overlapped the renaming of a netlink attribute in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26 | vxlan: fill ttl inherit info | Hangbin Liu | 1 | -0/+3
When adding vxlan ttl inherit support, I forgot to fill it in when dumping vxlan info. Fix it now. Fixes: 72f6d71e491e6 ("vxlan: add ttl inherit support") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-29 | vxlan: reduce dirty cache line in vxlan_find_mac | Li RongQing | 1 | -1/+1
vxlan_find_mac() unconditionally sets f->used for every packet. This causes a cache miss for every packet, since the remote, hlist and used fields of vxlan_fdb share the same cache line, which is accessed when sending every packet. So set f->used only if it is not equal to jiffies, to reduce how often the cache line is dirtied. This gives a 3% speed-up with small packets. Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
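The optimisation is a "compare before store" pattern. The userspace sketch below shows it with a hypothetical `struct fdb` and `fdb_touch()`; the `now` argument stands in for jiffies, and the return value exists only so the example can show when a store actually happens.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced FDB entry holding only the timestamp. */
struct fdb {
	unsigned long used; /* last time the entry was hit */
};

/* Refresh the timestamp, but store only when the value changes.
 * Skipping the redundant store keeps the cache line clean when many
 * packets hit the same entry within one tick. Returns true if a
 * store was performed. */
static bool fdb_touch(struct fdb *f, unsigned long now)
{
	if (f->used == now)
		return false; /* read-only access, line stays shared */
	f->used = now;
	return true;
}

void fdb_touch_example(void)
{
	struct fdb f = { .used = 100 };

	assert(fdb_touch(&f, 101));  /* first packet in tick 101 writes */
	assert(!fdb_touch(&f, 101)); /* later packets in the same tick do not */
	assert(f.used == 101);
}
```

The read is cheap when the line is already cached; the saving comes from not invalidating other CPUs' copies of that line on every packet.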
2018-07-07 | vxlan: fix default fdb entry netlink notify ordering during netdev create | Roopa Prabhu | 1 | -8/+21
Problem: In vxlan_newlink, a default fdb entry is added before register_netdev. The default fdb creation function also notifies user-space of the fdb entry on the vxlan device which user-space does not know about yet. (RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex). This patch fixes the user-space netlink notification ordering issue with the following changes: - decouple fdb notify from fdb create. - Move fdb notify after register_netdev. - Call rtnl_configure_link in vxlan newlink handler to notify userspace about the newlink before fdb notify and hence avoiding the user-space race. Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-07 | vxlan: make netlink notify in vxlan_fdb_destroy optional | Roopa Prabhu | 1 | -6/+8
Add a new option do_notify to vxlan_fdb_destroy to make sending netlink notify optional. Used by a later patch. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-07 | vxlan: add new fdb alloc and create helpers | Roopa Prabhu | 1 | -29/+62
- Add new vxlan_fdb_alloc helper - rename existing vxlan_fdb_create into vxlan_fdb_update: because it really creates or updates an existing fdb entry - move new fdb creation into a separate vxlan_fdb_create Main motivation for this change is to introduce the ability to decouple vxlan fdb creation and notify, used in a later patch. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-03 | Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -3/+1
Simple overlapping changes in stmmac driver. Adjust skb_gro_flush_final_remcsum function signature to make GRO list changes in net-next, as per Stephen Rothwell's example merge resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02 | net: fix use-after-free in GRO with ESP | Sabrina Dubroca | 1 | -3/+1
Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore. Commit 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE. Fixes: 5f114163f2f5 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 | net: check tunnel option type in tunnel flags | Pieter Jansen van Vuuren | 1 | -1/+2
Check the tunnel option type stored in tunnel flags when creating options for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel options on interfaces that are not associated with them. Make sure all users of the infrastructure set correct flags, for the BPF helper we have to set all bits to keep backward compatibility. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 | net: Convert GRO SKB handling to list_head. | David Miller | 1 | -5/+6
Manage pending per-NAPI GRO packets via list_head. Return an SKB pointer from the GRO receive handlers. When GRO receive handlers return non-NULL, it means that this SKB needs to be completed at this time and removed from the NAPI queue. Several operations are greatly simplified by this transformation, especially timing out the oldest SKB in the list when gro_count exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue in reverse order. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 | vxlan: add ttl inherit support | Hangbin Liu | 1 | -3/+14
Like tos inherit, ttl inherit should also mean inheriting the inner protocol's ttl value, which is actually not implemented in vxlan yet. But we could not treat ttl == 0 as "use the inner TTL", because that value is also used when the "ttl" option is not specified, and that would be a behavior change, breaking real use cases. So add a different attribute, IFLA_VXLAN_TTL_INHERIT, for when "ttl inherit" is specified with the ip command. Reported-by: Jianlin Shi <jishi@redhat.com> Suggested-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
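Why a separate attribute is needed can be seen in a small selection function. This is a sketch, not the driver's transmit path: `outer_ttl()` and its parameters are hypothetical, and the fallback to a route-provided hop limit when the configured TTL is 0 is an assumption about the "not specified" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick the TTL for the outer header.
 * ttl_inherit   : set when the user asked for "ttl inherit"
 * cfg_ttl       : configured TTL, 0 meaning "not specified"
 * inner_ttl     : TTL of the encapsulated packet
 * route_hoplimit: assumed default taken from the underlay route */
static uint8_t outer_ttl(bool ttl_inherit, uint8_t cfg_ttl,
			 uint8_t inner_ttl, uint8_t route_hoplimit)
{
	uint8_t ttl = ttl_inherit ? inner_ttl : cfg_ttl;

	/* cfg_ttl == 0 keeps its old meaning: use the default. */
	return ttl ? ttl : route_hoplimit;
}

void outer_ttl_example(void)
{
	assert(outer_ttl(true, 0, 17, 64) == 17);  /* inherit inner TTL */
	assert(outer_ttl(false, 0, 17, 64) == 64); /* unspecified: default */
	assert(outer_ttl(false, 32, 17, 64) == 32); /* explicit value */
}
```

Keeping the inherit request in its own flag leaves the meaning of a zero TTL unchanged for existing setups, which is the compatibility concern the message raises.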
2018-01-29 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -4/+2
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 | net: don't call update_pmtu unconditionally | Nicolas Dichtel | 1 | -4/+2
Some dst_ops (e.g. md_dst_ops) don't set this handler. It may result in: "BUG: unable to handle kernel NULL pointer dereference at (null)" Let's add a helper to check if update_pmtu is available before calling it. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") CC: Roman Kapl <code@rkapl.cz> CC: Xin Long <lucien.xin@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-22 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -0/+19
Lots of overlapping changes. Also on the net-next side the XDP state management is handled more in the generic layers so undo the 'net' nfp fix which isn't applicable in net-next. Include a necessary change by Jakub Kicinski, with log message: ==================== cls_bpf no longer takes care of offload tracking. Make sure netdevsim performs necessary checks. This fixes a warning caused by TC trying to remove a filter it has not added. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 | vxlan: update skb dst pmtu on tx path | Xin Long | 1 | -0/+14
Unlike ip tunnels, vxlan currently doesn't do any pmtu update for the upper dst pmtu, even if it doesn't match the lower dst pmtu any more. The problem can be reproduced by reducing the vxlan lower dev's pmtu while running netperf. In Jianlin's testing, the performance went to 1/7 of the previous. This patch updates the upper dst pmtu to match the lower dst pmtu on the tx path so that packets can be sent out even when the lower dev's pmtu has been changed. It also works for metadata dst. Note that this patch doesn't process any pmtu icmp packet. But even in the future, support for processing pmtu icmp packets of udp tunnels will also need this. The same thing will be done for geneve in another patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 | vxlan: speedup vxlan tunnels dismantle | Haishuang Yan | 1 | -9/+17
Since we now hold RTNL lock in vxlan_exit_net, it's better to batch them to speedup vxlan tunnels dismantle. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-16 | vxlan: restore dev->mtu setting based on lower device | Alexey Kodanev | 1 | -0/+5
Stefano Brivio says: Commit a985343ba906 ("vxlan: refactor verification and application of configuration") introduced a change in the behaviour of initial MTU setting: earlier, the MTU for a link created on top of a given lower device, without an initial MTU specification, was set to the MTU of the lower device minus headroom as a result of this path in vxlan_dev_configure(): if (!conf->mtu) dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM); which is now gone. Now, the initial MTU, in absence of a configured value, is simply set by ether_setup() to ETH_DATA_LEN (1500 bytes). This breaks userspace expectations in case the MTU of the lower device is higher than 1500 bytes minus headroom. This patch restores the previous behaviour on newlink operation. Since max_mtu can be negative and we update dev->mtu directly, also check it for valid minimum. Reported-by: Junhan Yan <juyan@redhat.com> Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
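The restored behaviour amounts to a small calculation, sketched here in userspace C. The function name `vxlan_initial_mtu()` is hypothetical, and the headroom and minimum-MTU values are assumptions (outer IP header + UDP + VXLAN + inner Ethernet header, and the IPv4 minimum of 68 bytes), so check them against the kernel headers before relying on the numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for this sketch. */
#define ETH_MIN_MTU     68
#define VXLAN_HEADROOM  (20 + 8 + 8 + 14) /* IPv4 underlay */
#define VXLAN6_HEADROOM (40 + 8 + 8 + 14) /* IPv6 underlay */

/* Initial MTU of a new VXLAN link created on top of a lower device.
 * conf_mtu == 0 means the user gave no MTU. */
static int vxlan_initial_mtu(int conf_mtu, int lower_mtu, bool use_ipv6)
{
	int max_mtu = lower_mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);

	/* The subtraction can go below the valid minimum (even negative)
	 * for a very small lower MTU, so clamp it. */
	if (max_mtu < ETH_MIN_MTU)
		max_mtu = ETH_MIN_MTU;

	return conf_mtu ? conf_mtu : max_mtu;
}

void vxlan_initial_mtu_example(void)
{
	assert(vxlan_initial_mtu(0, 9000, false) == 8950); /* jumbo lower dev */
	assert(vxlan_initial_mtu(0, 9000, true) == 8930);
	assert(vxlan_initial_mtu(0, 60, false) == 68);     /* clamped */
	assert(vxlan_initial_mtu(1400, 9000, false) == 1400);
}
```

With a 9000-byte lower device, the regression described above would have left the link at 1500 bytes instead of the 8950 shown here.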
2017-11-28 | vxlan: use __be32 type for the param vni in __vxlan_fdb_delete | Xin Long | 1 | -2/+2
All callers of __vxlan_fdb_delete pass vni with __be32 type, and this param should be declared as __be32 type. Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next | Linus Torvalds | 1 | -18/+17
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 
19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-11-14 | vxlan: fix the issue that neigh proxy blocks all icmpv6 packets | Xin Long | 1 | -18/+13
Commit f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset") removed the icmp6_code and icmp6_type check before calling neigh_reduce when doing neigh proxy. It means all icmpv6 packets would be blocked by this, not only ns packets. In Jianlin's env, even ping6 couldn't work through it. This patch brings the icmp6_code and icmp6_type check back and also removes the same check from neigh_reduce(). Fixes: f1fb08f6337c ("vxlan: fix ND proxy when skb doesn't have transport header offset") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Vincent Bernat <vincent@bernat.im> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 | vxlan: exit_net cleanup checks added | Vasily Averin | 1 | -0/+4
Be sure that the sock_list array initialized in the net_init hook was returned to its initial state. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 | timer: Remove init_timer_deferrable() in favor of timer_setup() | Kees Cook | 1 | -5/+3
This refactors the only users of init_timer_deferrable() to use the new timer_setup() and from_timer(). Removes definition of init_timer_deferrable(). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: David S. Miller <davem@davemloft.net> # for networking parts Acked-by: Sebastian Reichel <sre@kernel.org> # for drivers/hsi parts Cc: linux-mips@linux-mips.org Cc: Petr Mladek <pmladek@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kalle Valo <kvalo@qca.qualcomm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: linux1394-devel@lists.sourceforge.net Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-s390@vger.kernel.org Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ursula Braun <ubraun@linux.vnet.ibm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Harish Patil <harish.patil@cavium.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Manish Chopra <manish.chopra@cavium.com> Cc: Len Brown <len.brown@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-pm@vger.kernel.org Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Mark Gross <mark.gross@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: linux-watchdog@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: "Martin K. 
Petersen" <martin.petersen@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-wireless@vger.kernel.org Cc: Sebastian Reichel <sre@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Michael Reed <mdr@sgi.com> Cc: netdev@vger.kernel.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Link: https://lkml.kernel.org/r/1507159627-127660-6-git-send-email-keescook@chromium.org
2017-08-29 | vxlan: factor out VXLAN-GPE next protocol | Jiri Benc | 1 | -25/+7
The values are shared between VXLAN-GPE and NSH. Originally probably by coincidence but I notified both working groups about this last year and they seem to keep the values in sync since then. Hopefully they'll get a single IANA registry for the values, too. (I asked them for that.) Factor out the code to be shared by the NSH implementation. NSH and MPLS values are added in this patch, too. For MPLS, the drafts incorrectly assign only a single value, while we have two MPLS ethertypes. I raised the problem with both groups. For now, I assume the value is for unicast. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-13 | vxlan: change vxlan_[config_]validate() to use netlink_ext_ack for error reporting | Girish Moodalbail | 1 | -26/+73
The kernel log is not where users expect error messages for netlink requests; as we have extended acks now, we can replace pr_debug() with NL_SET_ERR_MSG_ATTR(). Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 1 | -0/+1
The UDP offload conflict is dealt with by simply taking what is in net-next where we have removed all of the UFO handling code entirely. The TCP conflict was a case of local variables in a function being removed from both net and net-next. In netvsc we had an assignment right next to where a missing set of u64 stats sync object inits were added. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 | vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP | K. Den | 1 | -0/+1
In the case that GRO is turned on and the original received packet is CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last csum-unnecessary point, which for instance could occur if the packet comes from another Linux guest on the same Linux host, we have to do either remcsum_adjust or set up CHECKSUM_PARTIAL again with its csum_start properly reset considering RCO. However, since b7fe10e5ebac("gro: Fix remcsum offload to deal with frags in GRO") that barrier in such case could be skipped if GRO turned on, hence we pass over it and the inner L4 validation mistakenly reckons it as a bad csum. This patch makes remcsum_offload being reset at the same time of GRO remcsum cleanup, so as to make it work in such case as before. Fixes: b7fe10e5ebac ("gro: Fix remcsum offload to deal with frags in GRO") Signed-off-by: Koichiro Den <den@klaipeden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 | geneve/vxlan: offload ports on register/unregister events | Sabrina Dubroca | 1 | -3/+7
This improves consistency of handling when moving a netdev to another netns. Most drivers currently do a full reset when the device goes up, so that will flush the offload state anyway. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-24 | geneve/vxlan: add support for NETDEV_UDP_TUNNEL_DROP_INFO | Sabrina Dubroca | 1 | -8/+17
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04 | net, vxlan: convert vxlan_sock.refcnt from atomic_t to refcount_t | Reshetova, Elena | 1 | -5/+5
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 | vxlan: fix hlist corruption | Jiri Benc | 1 | -11/+21
It's not a good idea to add the same hlist_node to two different hash lists. This leads to various hard to debug memory corruptions. Fixes: b1be00a6c39f ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device") Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03 | vxlan: correctly set vxlan->net when creating the device in a netns | Sabrina Dubroca | 1 | -3/+6
Commit a985343ba906 ("vxlan: refactor verification and application of configuration") modified vxlan device creation, and replaced the assignment of vxlan->net to src_net with dev_net(netdev) in ->setup(). But dev_net(netdev) is not the same as src_net. At the time ->setup() is called, dev_net hasn't been set yet, so we end up creating the socket for the vxlan device in init_net. Fix this by bringing back the assignment of vxlan->net during device creation. Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>