aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6/route.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+6
2019-03-01ipv4: Add ICMPv6 support when parse route ipprotoHangbin Liu1-1/+2
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers. But for ip -6 route, currently we only support tcp, udp and icmp. Add ICMPv6 support so we can match ipv6-icmp rules for route lookup. v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-26ipv6: Return error for RTA_VIA attributeDavid Ahern1-0/+4
IPv6 currently does not support nexthops outside of the AF_INET6 family. Specifically, it does not handle RTA_VIA attribute. If it is passed in a route add request, the actual route added only uses the device which is clearly not what the user intended: $ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0 $ ip ro ls ... 2001:db8:2::/64 dev eth0 metric 1024 pref medium Catch this and fail the route add: $ ip -6 ro add 2001:db8:2::/64 via inet 172.16.1.1 dev eth0 Error: IPv6 does not support RTA_VIA attribute. Fixes: 03c0566542f4c ("mpls: Netlink commands to add, remove, and dump routes") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-8/+24
Three conflicts, one of which, for marvell10g.c is non-trivial and requires some follow-up from Heiner or someone else. The issue is that Heiner converted the marvell10g driver over to use the generic c45 code as much as possible. However, in 'net' a bug fix appeared which makes sure that a new local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0 is cleared. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255Kalash Nainwal1-1/+1
Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to keep legacy software happy. This is similar to what was done for ipv4 in commit 709772e6e065 ("net: Fix routing tables with id > 255 for legacy software"). Signed-off-by: Kalash Nainwal <kalash@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22ipv6: route: purge exception on removalPaolo Abeni1-1/+12
When a netdevice is unregistered, we flush the relevant exception via rt6_sync_down_dev() -> fib6_ifdown() -> fib6_del() -> fib6_del_route(). Finally, we end-up calling rt6_remove_exception(), where we release the relevant dst, while we keep the references to the related fib6_info and dev. Such references should be released later when the dst will be destroyed. There are a number of caches that can keep the exception around for an unlimited amount of time - namely dst_cache, possibly even socket cache. As a result device registration may hang, as demonstrated by this script: ip netns add cl ip netns add rt ip netns add srv ip netns exec rt sysctl -w net.ipv6.conf.all.forwarding=1 ip link add name cl_veth type veth peer name cl_rt_veth ip link set dev cl_veth netns cl ip -n cl link set dev cl_veth up ip -n cl addr add dev cl_veth 2001::2/64 ip -n cl route add default via 2001::1 ip -n cl link add tunv6 type ip6tnl mode ip6ip6 local 2001::2 remote 2002::1 hoplimit 64 dev cl_veth ip -n cl link set tunv6 up ip -n cl addr add 2013::2/64 dev tunv6 ip link set dev cl_rt_veth netns rt ip -n rt link set dev cl_rt_veth up ip -n rt addr add dev cl_rt_veth 2001::1/64 ip link add name rt_srv_veth type veth peer name srv_veth ip link set dev srv_veth netns srv ip -n srv link set dev srv_veth up ip -n srv addr add dev srv_veth 2002::1/64 ip -n srv route add default via 2002::2 ip -n srv link add tunv6 type ip6tnl mode ip6ip6 local 2002::1 remote 2001::2 hoplimit 64 dev srv_veth ip -n srv link set tunv6 up ip -n srv addr add 2013::1/64 dev tunv6 ip link set dev rt_srv_veth netns rt ip -n rt link set dev rt_srv_veth up ip -n rt addr add dev rt_srv_veth 2002::2/64 ip netns exec srv netserver & sleep 0.1 ip netns exec cl ping6 -c 4 2013::1 ip netns exec cl netperf -H 2013::1 -t TCP_STREAM -l 3 & sleep 1 ip -n rt link set dev rt_srv_veth mtu 1400 wait %2 ip -n cl link del cl_veth This commit addresses the issue purging all the references held by the exception at time, as we currently do for e.g. ipv6 pcpu dst entries. v1 -> v2: - re-order the code to avoid accessing dst and net after dst_dev_put() Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink()Paolo Abeni1-1/+5
We need a RCU critical section around rt6_info->from deference, and proper annotation. Fixes: 4ed591c8ab44 ("net/ipv6: Allow onlink routes to have a device mismatch if it is the default route") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt()Paolo Abeni1-5/+6
We must access rt6_info->from under RCU read lock: move the dereference under such lock, with proper annotation. v1 -> v2: - avoid using multiple, racy, fetch operations for rt->from Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15net/ipv6: prefer rcu_access_pointer() over rcu_dereference()Paolo Abeni1-7/+1
rt6_cache_allowed_for_pmtu() checks for rt->from presence, but it does not access the RCU protected pointer. We can use rcu_access_pointer() and clean-up the code a bit. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-12/+2
Completely minor snmp doc conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-19net: ipv6: route: perform strict checks also for doit handlersJakub Kicinski1-2/+68
Make RTM_GETROUTE's doit handler use strict checks when NETLINK_F_STRICT_CHK is set. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16ipv6: route: place a warning with duplicated string with correct extackJakub Kicinski1-12/+2
"IPv6: " prefix is already added by pr_fmt, no need to include it again in the pr_warn() format. The message predates extack support, we can replace the whole thing with an extack message. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-02ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() errorStefano Brivio1-1/+3
In ip6_neigh_lookup(), we must not return errors coming from neigh_create(): if creation of a neighbour entry fails, the lookup should return NULL, in the same way as it's done in __neigh_lookup(). Otherwise, callers legitimately checking for a non-NULL return value of the lookup function might dereference an invalid pointer. For instance, on neighbour table overflow, ndisc_router_discovery() crashes ndisc_update() by passing ERR_PTR(-ENOBUFS) as 'neigh' argument. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: f8a1b43b709d ("net/ipv6: Create a neigh_lookup for FIB entries") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-27ipv6/route: Add a missing check on proc_dointvecAditya Pakki1-1/+5
While flushing the cache via ipv6_sysctl_rtcache_flush(), the call to proc_dointvec() may fail. The fix adds a check that returns the error, on failure. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-6/+8
2018-11-18ipv6: Fix PMTU updates for UDP/raw sockets in presence of VRFDavid Ahern1-2/+5
Preethi reported that PMTU discovery for UDP/raw applications is not working in the presence of VRF when the socket is not bound to a device. The problem is that ip6_sk_update_pmtu does not consider the L3 domain of the skb device if the socket is not bound. Update the function to set oif to the L3 master device if relevant. Fixes: ca254490c8df ("net: Add VRF support to IPv6 stack") Reported-by: Preethi Ramachandra <preethir@juniper.net> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-16ipv6: fix a dst leak when removing its exceptionXin Long1-4/+3
These is no need to hold dst before calling rt6_remove_exception_rt(). The call to dst_hold_safe() in ip6_link_failure() was for ip6_del_rt(), which has been removed in Commit 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes"). Otherwise, it will cause a dst leak. This patch is to simply remove the dst_hold_safe() call before calling rt6_remove_exception_rt() and also do the same in ip6_del_cached_rt(). It's safe, because the removal of the exception that holds its dst's refcnt is protected by rt6_exception_lock. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Fixes: 23fb93a4d3f1 ("net/ipv6: Cleanup exception and cache route handling") Reported-by: Li Shuang <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-06net: Add extack argument to ip_fib_metrics_initDavid Ahern1-2/+3
Add extack argument to ip_fib_metrics_init and add messages for invalid metrics. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-24net/ipv6: Allow onlink routes to have a device mismatch if it is the default routeDavid Ahern1-0/+2
The intent of ip6_route_check_nh_onlink is to make sure the gateway given for an onlink route is not actually on a connected route for a different interface (e.g., 2001:db8:1::/64 is on dev eth1 and then an onlink route has a via 2001:db8:1::1 dev eth2). If the gateway lookup hits the default route then it most likely will be a different interface than the onlink route which is ok. Update ip6_route_check_nh_onlink to disregard the device mismatch if the gateway lookup hits the default route. Turns out the existing onlink tests are passing because there is no default route or it is an unreachable default, so update the onlink tests to have a default route other than unreachable. Fixes: fc1e64e1092f6 ("net/ipv6: Add support for onlink flag") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-6/+6
net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16net/ipv6: Plumb support for filtering route dumpsDavid Ahern1-8/+32
Implement kernel side filtering of routes by table id, egress device index, protocol, and route type. If the table id is given in the filter, lookup the table and call fib6_dump_table directly for it. Move the existing route flags check for prefix only routes to the new filter. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15ipv6: rate-limit probes for neighbourless routesSabrina Dubroca1-6/+6
When commit 270972554c91 ("[IPV6]: ROUTE: Add Router Reachability Probing (RFC4191).") introduced router probing, the rt6_probe() function required that a neighbour entry existed. This neighbour entry is used to record the timestamp of the last probe via the ->updated field. Later, commit 2152caea7196 ("ipv6: Do not depend on rt->n in rt6_probe().") removed the requirement for a neighbour entry. Neighbourless routes skip the interval check and are not rate-limited. This patch adds rate-limiting for neighbourless routes, by recording the timestamp of the last probe in the fib6_info itself. Fixes: 2152caea7196 ("ipv6: Do not depend on rt->n in rt6_probe().") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12net/ipv6: Add knob to skip DELROUTE message on device downDavid Ahern1-1/+19
Another difference between IPv4 and IPv6 is the generation of RTM_DELROUTE notifications when a device is taken down (admin down) or deleted. IPv4 does not generate a message for routes evicted by the down or delete; IPv6 does. A NOS at scale really needs to avoid these messages and have IPv4 and IPv6 behave similarly, relying on userspace to handle link notifications and evict the routes. At this point existing user behavior needs to be preserved. Since notifications are a global action (not per app) the only way to preserve existing behavior and allow the messages to be skipped is to add a new sysctl (net/ipv6/route/skip_notify_on_dev_down) which can be set to disable the notificatioons. IPv6 route code already supports the option to skip the message (it is used for multipath routes for example). Besides the new sysctl we need to pass the skip_notify setting through the generic fib6_clean and fib6_walk functions to fib6_clean_node and to set skip_notify on calls to __ip_del_rt for the addrconf_ifdown path. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10net/ipv6: Make ipv6_route_table_template staticDavid Ahern1-1/+1
ipv6_route_table_template is exported but there are no users outside of route.c. Make it static. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08net: Add extack to nlmsg_parseDavid Ahern1-1/+1
Make sure extack is passed to nlmsg_parse where easy to do so. Most of these are dump handlers and leveraging the extack in the netlink_callback. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-05ipv6: do not leave garbage in rt->fib6_metricsEric Dumazet1-0/+2
In case ip_fib_metrics_init() returns an error, we better rewrite rt->fib6_metrics with &dst_default_metrics so that we do not crash later in ip_fib_metrics_put() Fixes: 767a2217533f ("net: common metrics init helper for FIB entries") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04net: Move free of dst_metrics to helperDavid Ahern1-4/+1
Move the refcounting and potential free of dst metrics associated for ipv4 and ipv6 to a common helper. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04net: common metrics init helper for dst_entryDavid Ahern1-5/+1
ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set. Move the common initialization code to a helper and use for both protocols. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04net: common metrics init helper for FIB entriesDavid Ahern1-22/+7
Consolidate initialization of ipv4 and ipv6 metrics when fib entries are created into a single helper, ip_fib_metrics_init, that handles the call to ip_metrics_convert. If no metrics are defined for the fib entry, then the metrics is set to dst_default_metrics. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-5/+0
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net' overlapped the renaming of a netlink attribute in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: inet6_rtm_getroute() - use new style struct initializer instead of memsetMaciej Żenczykowski1-2/+1
Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: rtm_to_fib6_config() - use new style struct initializer instead of memsetMaciej Żenczykowski1-11/+12
(allows for better compiler optimization) Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: rtmsg_to_fib6_config() - use new style struct initializer instead of memsetMaciej Żenczykowski1-16/+16
(allows for better compiler optimization) Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: ip6_update_pmtu() - use new style struct initializer instead of memsetMaciej Żenczykowski1-9/+8
(allows for better compiler optimization) Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: remove 1 always zero parameter from ip6_redirect_no_header()Maciej Żenczykowski1-3/+1
(the parameter in question is mark) Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: ip6_redirect_no_header() - use new style struct initializer instead of memsetMaciej Żenczykowski1-9/+8
(allows for better compiler optimization) Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02net: ip6_redirect() - use new style struct initializer instead of memsetMaciej Żenczykowski1-10/+9
(allows for better compiler optimization) Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-26net/ipv6: Remove extra call to ip6_convert_metrics for multipath caseDavid Ahern1-5/+0
The change to move metrics from the dst to rt6_info moved the call to ip6_convert_metrics from ip6_route_add to ip6_route_info_create. In doing so it makes the call in ip6_route_info_append redundant and actually leaks the metrics installed as part of the ip6_route_info_create. Remove the now unnecessary call. Fixes: d4ead6b34b67f ("net/ipv6: move metrics from dst to rt6_info") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+8
Version bump conflict in batman-adv, take what's in net-next. iavf conflict, adjustment of netdev_ops in net-next conflicting with poll controller method removal in net. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19ipv6: Allow the l3mdev to be a loopbackRobert Shearman1-1/+2
There is no way currently for an IPv6 client connect using a loopback address in a VRF, whereas for IPv4 the loopback address can be added: $ sudo ip addr add dev vrfred 127.0.0.1/8 $ sudo ip -6 addr add ::1/128 dev vrfred RTNETLINK answers: Cannot assign requested address So allow ::1 to be configured on an L3 master device. In order for this to be usable ip_route_output_flags needs to not consider ::1 to be a link scope address (since oif == l3mdev and so it would be dropped), and ipv6_rcv needs to consider the l3mdev to be a loopback device so that it doesn't drop the packets. Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com> Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18ipv6: fix memory leak on dst->_metricsWei Wang1-1/+4
When dst->_metrics and f6i->fib6_metrics share the same memory, both take reference count on the dst_metrics structure. However, when dst is destroyed, ip6_dst_destroy() only invokes dst_destroy_metrics_generic() which does not take care of READONLY metrics and does not release refcnt. This causes memory leak. Similar to ipv4 logic, the fix is to properly release refcnt and free the memory space pointed by dst->_metrics if refcnt becomes 0. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Revert "ipv6: fix double refcount of fib6_metrics"Wei Wang1-0/+4
This reverts commit e70a3aad44cc8b24986687ffc98c4a4f6ecf25ea. This change causes use-after-free on dst->_metrics. The crash trace looks like this: [ 97.763269] BUG: KASAN: use-after-free in ip6_mtu+0x116/0x140 [ 97.769038] Read of size 4 at addr ffff881781d2cf84 by task svw_NetThreadEv/8801 [ 97.777954] CPU: 76 PID: 8801 Comm: svw_NetThreadEv Not tainted 4.15.0-smp-DEV #11 [ 97.777956] Hardware name: Default string Default string/Indus_QC_02, BIOS 5.46.4 03/29/2018 [ 97.777957] Call Trace: [ 97.777971] [<ffffffff895709db>] dump_stack+0x4d/0x72 [ 97.777985] [<ffffffff881651df>] print_address_description+0x6f/0x260 [ 97.777997] [<ffffffff88165747>] kasan_report+0x257/0x370 [ 97.778001] [<ffffffff894488e6>] ? ip6_mtu+0x116/0x140 [ 97.778004] [<ffffffff881658b9>] __asan_report_load4_noabort+0x19/0x20 [ 97.778008] [<ffffffff894488e6>] ip6_mtu+0x116/0x140 [ 97.778013] [<ffffffff892bb91e>] tcp_current_mss+0x12e/0x280 [ 97.778016] [<ffffffff892bb7f0>] ? tcp_mtu_to_mss+0x2d0/0x2d0 [ 97.778022] [<ffffffff887b45b8>] ? depot_save_stack+0x138/0x4a0 [ 97.778037] [<ffffffff87c38985>] ? __mmdrop+0x145/0x1f0 [ 97.778040] [<ffffffff881643b1>] ? save_stack+0xb1/0xd0 [ 97.778046] [<ffffffff89264c82>] tcp_send_mss+0x22/0x220 [ 97.778059] [<ffffffff89273a49>] tcp_sendmsg_locked+0x4f9/0x39f0 [ 97.778062] [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20 [ 97.778066] [<ffffffff89273550>] ? tcp_sendpage+0x60/0x60 [ 97.778070] [<ffffffff881cb359>] ? rw_copy_check_uvector+0x69/0x280 [ 97.778075] [<ffffffff8873c65f>] ? import_iovec+0x9f/0x430 [ 97.778078] [<ffffffff88164be7>] ? kasan_slab_free+0x87/0xc0 [ 97.778082] [<ffffffff8873c5c0>] ? memzero_page+0x140/0x140 [ 97.778085] [<ffffffff881642b4>] ? kasan_check_write+0x14/0x20 [ 97.778088] [<ffffffff89276f6c>] tcp_sendmsg+0x2c/0x50 [ 97.778092] [<ffffffff89276f6c>] ? tcp_sendmsg+0x2c/0x50 [ 97.778098] [<ffffffff89352d43>] inet_sendmsg+0x103/0x480 [ 97.778102] [<ffffffff89352c40>] ? inet_gso_segment+0x15b0/0x15b0 [ 97.778105] [<ffffffff890294da>] sock_sendmsg+0xba/0xf0 [ 97.778108] [<ffffffff8902ab6a>] ___sys_sendmsg+0x6ca/0x8e0 [ 97.778113] [<ffffffff87dccac1>] ? hrtimer_try_to_cancel+0x71/0x3b0 [ 97.778116] [<ffffffff8902a4a0>] ? copy_msghdr_from_user+0x3d0/0x3d0 [ 97.778119] [<ffffffff881646d1>] ? memset+0x31/0x40 [ 97.778123] [<ffffffff87a0cff5>] ? schedule_hrtimeout_range_clock+0x165/0x380 [ 97.778127] [<ffffffff87a0ce90>] ? hrtimer_nanosleep_restart+0x250/0x250 [ 97.778130] [<ffffffff87dcc700>] ? __hrtimer_init+0x180/0x180 [ 97.778133] [<ffffffff87dd1f82>] ? ktime_get_ts64+0x172/0x200 [ 97.778137] [<ffffffff8822b8ec>] ? __fget_light+0x8c/0x2f0 [ 97.778141] [<ffffffff8902d5c6>] __sys_sendmsg+0xe6/0x190 [ 97.778144] [<ffffffff8902d5c6>] ? __sys_sendmsg+0xe6/0x190 [ 97.778147] [<ffffffff8902d4e0>] ? SyS_shutdown+0x20/0x20 [ 97.778152] [<ffffffff87cd4370>] ? wake_up_q+0xe0/0xe0 [ 97.778155] [<ffffffff8902d670>] ? __sys_sendmsg+0x190/0x190 [ 97.778158] [<ffffffff8902d683>] SyS_sendmsg+0x13/0x20 [ 97.778162] [<ffffffff87a1600c>] do_syscall_64+0x2ac/0x430 [ 97.778166] [<ffffffff87c17515>] ? do_page_fault+0x35/0x3d0 [ 97.778171] [<ffffffff8960131f>] ? page_fault+0x2f/0x50 [ 97.778174] [<ffffffff89600071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 97.778177] RIP: 0033:0x7f83fa36000d [ 97.778178] RSP: 002b:00007f83ef9229e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 97.778180] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f83fa36000d [ 97.778182] RDX: 0000000000004000 RSI: 00007f83ef922f00 RDI: 0000000000000036 [ 97.778183] RBP: 00007f83ef923040 R08: 00007f83ef9231f8 R09: 00007f83ef923168 [ 97.778184] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f83f69c5b40 [ 97.778185] R13: 000000000000001c R14: 0000000000000001 R15: 0000000000004000 [ 97.779684] Allocated by task 5919: [ 97.783185] save_stack+0x46/0xd0 [ 97.783187] kasan_kmalloc+0xad/0xe0 [ 97.783189] kmem_cache_alloc_trace+0xdf/0x580 [ 97.783190] ip6_convert_metrics.isra.79+0x7e/0x190 [ 97.783192] ip6_route_info_create+0x60a/0x2480 [ 97.783193] ip6_route_add+0x1d/0x80 [ 97.783195] inet6_rtm_newroute+0xdd/0xf0 [ 97.783198] rtnetlink_rcv_msg+0x641/0xb10 [ 97.783200] netlink_rcv_skb+0x27b/0x3e0 [ 97.783202] rtnetlink_rcv+0x15/0x20 [ 97.783203] netlink_unicast+0x4be/0x720 [ 97.783204] netlink_sendmsg+0x7bc/0xbf0 [ 97.783205] sock_sendmsg+0xba/0xf0 [ 97.783207] ___sys_sendmsg+0x6ca/0x8e0 [ 97.783208] __sys_sendmsg+0xe6/0x190 [ 97.783209] SyS_sendmsg+0x13/0x20 [ 97.783211] do_syscall_64+0x2ac/0x430 [ 97.783213] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 97.784709] Freed by task 0: [ 97.785056] knetbase: Error: /proc/sys/net/core/txcs_enable does not exist [ 97.794497] save_stack+0x46/0xd0 [ 97.794499] kasan_slab_free+0x71/0xc0 [ 97.794500] kfree+0x7c/0xf0 [ 97.794501] fib6_info_destroy_rcu+0x24f/0x310 [ 97.794504] rcu_process_callbacks+0x38b/0x1730 [ 97.794506] __do_softirq+0x1c8/0x5d0 Reported-by: John Sperbeck <jsperbeck@google.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-14/+30
Two new tls tests added in parallel in both net and net-next. Used Stephen Rothwell's linux-next resolution. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-17net/ipv6: do not copy dst flags on rt initPeter Oskolkov1-2/+0
DST_NOCOUNT in dst_entry::flags tracks whether the entry counts toward route cache size (net->ipv6.sysctl.ip6_rt_max_size). If the flag is NOT set, dst_ops::pcpuc_entries counter is incremented in dist_init() and decremented in dst_destroy(). This flag is tied to allocation/deallocation of dst_entry and should not be copied from another dst/route. Otherwise it can happen that dst_ops::pcpuc_entries counter grows until no new routes can be allocated because the counter reached ip6_rt_max_size due to DST_NOCOUNT not set and thus no counter decrements on gc-ed routes. Fixes: 3b6761d18bc1 ("net/ipv6: Move dst flags to booleans in fib entries") Cc: David Ahern <dsahern@gmail.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-13ipv6: use rt6_info members when dst is set in rt6_fill_nodeXin Long1-12/+30
In inet6_rtm_getroute, since Commit 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes"), it has used rt->from to dump route info instead of rt. However for some route like cache, some of its information like flags or gateway is not the same as that of the 'from' one. It caused 'ip route get' to dump the wrong route information. In Jianlin's testing, the output information even lost the expiration time for a pmtu route cache due to the wrong fib6_flags. So change to use rt6_info members for dst addr, src addr, flags and gateway when it tries to dump a route entry without fibmatch set. v1->v2: - not use rt6i_prefsrc. - also fix the gw dump issue. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-10net/ipv6: Remove rt6i_prefsrcDavid Ahern1-27/+0
After the conversion to fib6_info, rt6i_prefsrc has a single user that reads the value and otherwise it is only set. The one reader can be converted to use rt->from so rt6i_prefsrc can be removed, reducing rt6_info by another 20 bytes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+0
2018-09-01ipv6: don't get lwtstate twice in ip6_rt_copy_init()Alexey Kodanev1-1/+0
Commit 80f1a0f4e0cd ("net/ipv6: Put lwtstate when destroying fib6_info") partially fixed the kmemleak [1], lwtstate can be copied from fib6_info, with ip6_rt_copy_init(), and it should be done only once there. rt->dst.lwtstate is set by ip6_rt_init_dst(), at the start of the function ip6_rt_copy_init(), so there is no need to get it again at the end. With this patch, lwtstate also isn't copied from RTF_REJECT routes. [1]: unreferenced object 0xffff880b6aaa14e0 (size 64): comm "ip", pid 10577, jiffies 4295149341 (age 1273.903s) hex dump (first 32 bytes): 01 00 04 00 04 00 00 00 10 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000018664623>] lwtunnel_build_state+0x1bc/0x420 [<00000000b73aa29a>] ip6_route_info_create+0x9f7/0x1fd0 [<00000000ee2c5d1f>] ip6_route_add+0x14/0x70 [<000000008537b55c>] inet6_rtm_newroute+0xd9/0xe0 [<000000002acc50f5>] rtnetlink_rcv_msg+0x66f/0x8e0 [<000000008d9cd381>] netlink_rcv_skb+0x268/0x3b0 [<000000004c893c76>] netlink_unicast+0x417/0x5a0 [<00000000f2ab1afb>] netlink_sendmsg+0x70b/0xc30 [<00000000890ff0aa>] sock_sendmsg+0xb1/0xf0 [<00000000a2e7b66f>] ___sys_sendmsg+0x659/0x950 [<000000001e7426c8>] __sys_sendmsg+0xde/0x170 [<00000000fe411443>] do_syscall_64+0x9f/0x4a0 [<000000001be7b28b>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<000000006d21f353>] 0xffffffffffffffff Fixes: 6edb3c96a5f0 ("net/ipv6: Defer initialization of dst to data path") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31net/ipv6: Do not reset nl_net in ip6_route_info_createDavid Ahern1-2/+0
nl_net is set on entry to ip6_route_info_create. Only devices within that namespace are considered so no need to reset it before returning. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-22net/ipv6: init ip6 anycast rt->dst.input as ip6_inputHangbin Liu1-1/+1
Commit 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path") forgot to handle anycast route and init anycast rt->dst.input to ip6_forward. Fix it by setting anycast rt->dst.input back to ip6_input. Fixes: 6edb3c96a5f02 ("net/ipv6: Defer initialization of dst to data path") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>