aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6/route.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-02-16ipv6: Fix nlmsg_flags when splitting a multipath routeBenjamin Poirier1-0/+1
When splitting an RTA_MULTIPATH request into multiple routes and adding the second and later components, we must not simply remove NLM_F_REPLACE but instead replace it by NLM_F_CREATE. Otherwise, it may look like the netlink message was malformed. For example, ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 \ nexthop via fe80::30:2 dev dummy0 results in the following warnings: [ 1035.057019] IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE [ 1035.057517] IPv6: NLM_F_CREATE should be set when creating new route This patch makes the nlmsg sequence look equivalent for __ip6_ins_rt() to what it would get if the multipath route had been added in multiple netlink operations: ip route add 2001:db8::1/128 dev dummy0 ip route change 2001:db8::1/128 nexthop via fe80::30:1 dev dummy0 ip route append 2001:db8::1/128 nexthop via fe80::30:2 dev dummy0 Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14ipv6: Add "offload" and "trap" indications to routesIdo Schimmel1-0/+7
In a similar fashion to previous patch, add "offload" and "trap" indication to IPv6 routes. This is done by using two unused bits in 'struct fib6_info' to hold these indications. Capable drivers are expected to set these when processing the various in-kernel route notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-7/+15
Simple overlapping changes in bpf land wrt. bpf_helper_defs.h handling. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24ipv6: Remove old route notifications and convert listenersIdo Schimmel1-16/+2
Now that mlxsw is converted to use the new FIB notifications it is possible to delete the old ones and use the new replace / append / delete notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24ipv6: Handle multipath route deletion notificationIdo Schimmel1-0/+26
When an entire multipath route is deleted, only emit a notification if it is the first route in the node. Emit a replace notification in case the last sibling is followed by another route. Otherwise, emit a delete notification. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24ipv6: Notify multipath route if should be offloadedIdo Schimmel1-0/+48
In a similar fashion to previous patches, only notify the new multipath route if it is the first route in the node or if it was appended to such route. The type of the notification (replace vs. append) is determined based on the number of routes added ('nhn') and the number of sibling routes. If the two do not match, then an append notification should be sent. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24net: add bool confirm_neigh parameter for dst_ops.update_pmtuHangbin Liu1-7/+15
The MTU update code is supposed to be invoked in response to real networking events that update the PMTU. In IPv6 PMTU update function __ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor confirmed time. But for tunnel code, it will call pmtu before xmit, like: - tnl_update_pmtu() - skb_dst_update_pmtu() - ip6_rt_update_pmtu() - __ip6_rt_update_pmtu() - dst_confirm_neigh() If the tunnel remote dst mac address changed and we still do the neigh confirm, we will not be able to update neigh cache and ping6 remote will failed. So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we should not be invoking dst_confirm_neigh() as we have no evidence of successful two-way communication at this point. On the other hand it is also important to keep the neigh reachability fresh for TCP flows, so we cannot remove this dst_confirm_neigh() call. To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu to choose whether we should do neigh update or not. I will add the parameter in this patch and set all the callers to true to comply with the previous way, and fix the tunnel code one by one on later patches. v5: No change. v4: No change. v3: Do not remove dst_confirm_neigh, but add a new bool parameter in dst_ops.update_pmtu to control whether we should do neighbor confirm. Also split the big patch to small ones for each area. v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu. Suggested-by: David Miller <davem@davemloft.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock from commit c8183f548902 ("s390/qeth: fix potential deadlock on workqueue flush"), removed the code which was removed by commit 9897d583b015 ("s390/qeth: consolidate some duplicated HW cmd code"). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-11-21ipv6: keep track of routes using srcPaolo Abeni1-0/+3
Use a per namespace counter, increment it on successful creation of any route using the source address, decrement it on deletion of such routes. This allows us to check easily if the routing decision in the current namespace depends on the packet source. Will be used by the next patch. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-20ipv6/route: return if there is no fib_nh_gw_familyHangbin Liu1-1/+1
Previously we will return directly if (!rt || !rt->fib6_nh.fib_nh_gw_family) in function rt6_probe(), but after commit cc3a86c802f0 ("ipv6: Change rt6_probe to take a fib6_nh"), the logic changed to return if there is fib_nh_gw_family. Fixes: cc3a86c802f0 ("ipv6: Change rt6_probe to take a fib6_nh") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-3/+10
One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07ipv6: fixes rt6_probe() and fib6_nh->last_probe initEric Dumazet1-3/+10
While looking at a syzbot KCSAN report [1], I found multiple issues in this code : 1) fib6_nh->last_probe has an initial value of 0. While probably okay on 64bit kernels, this causes an issue on 32bit kernels since the time_after(jiffies, 0 + interval) might be false ~24 days after boot (for HZ=1000) 2) The data-race found by KCSAN I could use READ_ONCE() and WRITE_ONCE(), but we also can take the opportunity of not piling-up too many rt6_probe_deferred() works by using instead cmpxchg() so that only one cpu wins the race. [1] BUG: KCSAN: data-race in find_match / find_match write to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 1: rt6_probe net/ipv6/route.c:663 [inline] find_match net/ipv6/route.c:757 [inline] find_match+0x5bd/0x790 net/ipv6/route.c:733 __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831 find_rr_leaf net/ipv6/route.c:852 [inline] rt6_select net/ipv6/route.c:896 [inline] fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164 ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200 ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452 fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117 ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484 ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497 ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049 ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150 inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106 inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121 __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169 tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline] tcp_xmit_probe_skb+0x19b/0x1d0 net/ipv4/tcp_output.c:3735 read to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 0: rt6_probe net/ipv6/route.c:657 [inline] find_match net/ipv6/route.c:757 [inline] find_match+0x521/0x790 net/ipv6/route.c:733 __find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831 find_rr_leaf net/ipv6/route.c:852 [inline] rt6_select net/ipv6/route.c:896 [inline] fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164 ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200 ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452 fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117 ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484 ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497 ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049 ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150 inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106 inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121 __tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 18894 Comm: udevd Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: cc3a86c802f0 ("ipv6: Change rt6_probe to take a fib6_nh") Fixes: f547fac624be ("ipv6: rate-limit probes for neighbourless routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-05icmp: remove duplicate codeMatteo Croce1-4/+1
The same code which recognizes ICMP error packets is duplicated several times. Use the icmp_is_err() and icmpv6_is_err() helpers instead, which do the same thing. ip_multipath_l3_keys() and tcf_nat_act() didn't check for all the error types, assume that they should instead. Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04ipv6: use jhash2() in rt6_exception_hash()Eric Dumazet1-2/+2
Faster jhash2() can be used instead of jhash(), since IPv6 addresses have the needed alignment requirement. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-8/+13
Minor overlapping changes in the btusb and ixgbe drivers. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-11ipv6: Don't use dst gateway directly in ip6_confirm_neigh()Stefano Brivio1-1/+1
This is the equivalent of commit 2c6b55f45d53 ("ipv6: fix neighbour resolution with raw socket") for ip6_confirm_neigh(): we can send a packet with MSG_CONFIRM on a raw socket for a connected route, so the gateway would be :: here, and we should pick the next hop using rt6_nexthop() instead. This was found by code review and, to the best of my knowledge, doesn't actually fix a practical issue: the destination address from the packet is not considered while confirming a neighbour, as ip6_confirm_neigh() calls choose_neigh_daddr() without passing the packet, so there are no similar issues as the one fixed by said commit. A possible source of issues with the existing implementation might come from the fact that, if we have a cached dst, we won't consider it, while rt6_nexthop() takes care of that. I might just not be creative enough to find a practical problem here: the only way to affect this with cached routes is to have one coming from an ICMPv6 redirect, but if the next hop is a directly connected host, there should be no topology for which a redirect applies here, and tests with redirected routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on raw sockets destined to a directly connected host. However, directly using the dst gateway here is not consistent anymore with neighbour resolution, and, in general, as we want the next hop, using rt6_nexthop() looks like the only sane way to fetch it. Reported-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-07ipv6: addrconf_f6i_alloc - fix non-null pointer check to !IS_ERR()Maciej Żenczykowski1-1/+1
Fixes a stupid bug I recently introduced... ip6_route_info_create() returns an ERR_PTR(err) and not a NULL on error. Fixes: d55a2e374a94 ("net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)'") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net: Properly update v4 routes with v6 nexthopDonald Sharp1-5/+6
When creating a v4 route that uses a v6 nexthop from a nexthop group. Allow the kernel to properly send the nexthop as v6 via the RTA_VIA attribute. Broken behavior: $ ip nexthop add via fe80::9 dev eth0 $ ip nexthop show id 1 via fe80::9 dev eth0 scope link $ ip route add 4.5.6.7/32 nhid 1 $ ip route show default via 10.0.2.2 dev eth0 4.5.6.7 nhid 1 via 254.128.0.0 dev eth0 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 $ Fixed behavior: $ ip nexthop add via fe80::9 dev eth0 $ ip nexthop show id 1 via fe80::9 dev eth0 scope link $ ip route add 4.5.6.7/32 nhid 1 $ ip route show default via 10.0.2.2 dev eth0 4.5.6.7 nhid 1 via inet6 fe80::9 dev eth0 10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 $ v2, v3: Addresses code review comments from David Ahern Fixes: dcb1ecb50edf (“ipv4: Prepare for fib6_nh from a nexthop object”) Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05ipv6: Fix RTA_MULTIPATH with nexthop objectsDavid Ahern1-1/+1
A change to the core nla helpers was missed during the push of the nexthop changes. rt6_fill_node_nexthop should be calling nla_nest_start_noflag not nla_nest_start. Currently, iproute2 does not print multipath data because of parsing issues with the attribute. Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-05net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)Maciej Żenczykowski1-2/+6
There is a subtle change in behaviour introduced by: commit c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43 'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create' Before that patch /proc/net/ipv6_route includes: 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo Afterwards /proc/net/ipv6_route includes: 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo ie. the above commit causes the ::1/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000). AFAICT, this is incorrect since these routes are *not* coming from RA's. As such, this patch restores the old behaviour. Fixes: c7a1ce397ada ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-1/+1
Just minor overlapping changes in the conflicts here. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05ipv6: have a single rcu unlock point in __ip6_rt_update_pmtuDavid Ahern1-8/+6
Simplify the unlock path in __ip6_rt_update_pmtu by using a single point where rcu_read_unlock is called. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-05ipv6: Fix unbalanced rcu locking in rt6_update_exception_stamp_rtDavid Ahern1-1/+1
The nexthop path in rt6_update_exception_stamp_rt needs to call rcu_read_unlock if it fails to find a fib6_nh match rather than just returning. Fixes: e659ba31d806 ("ipv6: Handle all fib6_nh in a nexthop in exception handling") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-1/+1
Pull networking fixes from David Miller: 1) Fix AF_XDP cq entry leak, from Ilya Maximets. 2) Fix handling of PHY power-down on RTL8411B, from Heiner Kallweit. 3) Add some new PCI IDs to iwlwifi, from Ihab Zhaika. 4) Fix handling of neigh timers wrt. entries added by userspace, from Lorenzo Bianconi. 5) Various cases of missing of_node_put(), from Nishka Dasgupta. 6) The new NET_ACT_CT needs to depend upon NF_NAT, from Yue Haibing. 7) Various RDS layer fixes, from Gerd Rausch. 8) Fix some more fallout from TCQ_F_CAN_BYPASS generalization, from Cong Wang. 9) Fix FIB source validation checks over loopback, also from Cong Wang. 10) Use promisc for unsupported number of filters, from Justin Chen. 11) Missing sibling route unlink on failure in ipv6, from Ido Schimmel. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits) tcp: fix tcp_set_congestion_control() use from bpf hook ag71xx: fix return value check in ag71xx_probe() ag71xx: fix error return code in ag71xx_probe() usb: qmi_wwan: add D-Link DWM-222 A2 device ID bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips. net: dsa: sja1105: Fix missing unlock on error in sk_buff() gve: replace kfree with kvfree selftests/bpf: fix test_xdp_noinline on s390 selftests/bpf: fix "valid read map access into a read-only array 1" on s390 net/mlx5: Replace kfree with kvfree MAINTAINERS: update netsec driver ipv6: Unlink sibling route in case of failure liquidio: Replace vmalloc + memset with vzalloc udp: Fix typo in net/ipv4/udp.c net: bcmgenet: use promisc for unsupported filters ipv6: rt6_check should return NULL if 'from' is NULL tipc: initialize 'validated' field of received packets selftests: add a test case for rp_filter fib: relax source validation check for loopback packets mlxsw: spectrum: Do not process learned records with a dummy FID ...
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce1-5/+2
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-17ipv6: rt6_check should return NULL if 'from' is NULLDavid Ahern1-1/+1
Paul reported that l2tp sessions were broken after the commit referenced in the Fixes tag. Prior to this commit rt6_check returned NULL if the rt6_info 'from' was NULL - ie., the dst_entry was disconnected from a FIB entry. Restore that behavior. Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes") Reported-by: Paul Donohue <linux-kernel@PaulSD.com> Tested-by: Paul Donohue <linux-kernel@PaulSD.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08ipv6: Support multipath hashing on inner IP pktsStephen Suryaputra1-0/+36
Make the same support as commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-01blackhole_netdev: use blackhole_netdev to invalidate dst entriesMahesh Bandewar1-1/+1
Use blackhole_netdev instead of 'lo' device with lower MTU when marking dst "dead". Signed-off-by: Mahesh Bandewar <maheshb@google.com> Tested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-2/+3
The new route handling in ip_mc_finish_output() from 'net' overlapped with the new support for returning congestion notifications from BPF programs. In order to handle this I had to take the dev_loopback_xmit() calls out of the switch statement. The aquantia driver conflicts were simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27ipv6: Convert gateway validation to use fib6_infoDavid Ahern1-62/+56
Gateway validation does not need a dst_entry, it only needs the fib entry to validate the gateway resolution and egress device. So, convert ip6_nh_lookup_table from ip6_pol_route to fib6_table_lookup and ip6_route_check_nh to use fib6_lookup over rt6_lookup. ip6_pol_route is a call to fib6_table_lookup and if successful a call to fib6_select_path. From there the exception cache is searched for an entry or a dst_entry is created to return to the caller. The exception entry is not relevant for gateway validation, so what matters are the calls to fib6_table_lookup and then fib6_select_path. Similarly, rt6_lookup can be replaced with a call to fib6_lookup with RT6_LOOKUP_F_IFACE set in flags. Again, the exception cache search is not relevant, only the lookup with path selection. The primary difference in the lookup paths is the use of rt6_select with fib6_lookup versus rt6_device_match with rt6_lookup. When you remove complexities in the rt6_select path, e.g., 1. saddr is not set for gateway validation, so RT6_LOOKUP_F_HAS_SADDR is not relevant 2. rt6_check_neigh is not called so that removes the RT6_NUD_FAIL_DO_RR return and round-robin logic. the code paths are believed to be equivalent for the given use case - validate the gateway and optionally given the device. Furthermore, it aligns the validation with onlink code path and the lookup path actually used for rx and tx. Adjust the users, ip6_route_check_nh_onlink and ip6_route_check_nh to handle a fib6_info vs a rt6_info when performing validation checks. Existing selftests fib-onlink-tests.sh and fib_tests.sh are used to verify the changes. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26ipv6: fix neighbour resolution with raw socketNicolas Dichtel1-1/+2
The scenario is the following: the user uses a raw socket to send an ipv6 packet, destinated to a not-connected network, and specify a connected nh. Here is the corresponding python script to reproduce this scenario: import socket IPPROTO_RAW = 255 send_s = socket.socket(socket.AF_INET6, socket.SOCK_RAW, IPPROTO_RAW) # scapy # p = IPv6(src='fd00:100::1', dst='fd00:200::fa')/ICMPv6EchoRequest() # str(p) req = b'`\x00\x00\x00\x00\x08:@\xfd\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\xfd\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xfa\x80\x00\x81\xc0\x00\x00\x00\x00' send_s.sendto(req, ('fd00:175::2', 0, 0, 0)) fd00:175::/64 is a connected route and fd00:200::fa is not a connected host. With this scenario, the kernel starts by sending a NS to resolve fd00:175::2. When it receives the NA, it flushes its queue and try to send the initial packet. But instead of sending it, it sends another NS to resolve fd00:200::fa, which obvioulsy fails, thus the packet is dropped. If the user sends again the packet, it now uses the right nh (fd00:175::2). The problem is that ip6_dst_lookup_neigh() uses the rt6i_gateway, which is :: because the associated route is a connected route, thus it uses the dst addr of the packet. Let's use rt6_nexthop() to choose the right nh. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-26ipv6: fix suspicious RCU usage in rt6_dump_route()Eric Dumazet1-0/+2
syzbot reminded us that rt6_nh_dump_exceptions() needs to be called with rcu_read_lock() net/ipv6/route.c:1593 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by syz-executor609/8966: #0: 00000000b7dbe288 (rtnl_mutex){+.+.}, at: netlink_dump+0xe7/0xfb0 net/netlink/af_netlink.c:2199 #1: 00000000f2d87c21 (&(&tb->tb6_lock)->rlock){+...}, at: spin_lock_bh include/linux/spinlock.h:343 [inline] #1: 00000000f2d87c21 (&(&tb->tb6_lock)->rlock){+...}, at: fib6_dump_table.isra.0+0x37e/0x570 net/ipv6/ip6_fib.c:533 stack backtrace: CPU: 0 PID: 8966 Comm: syz-executor609 Not tainted 5.2.0-rc5+ #43 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 lockdep_rcu_suspicious+0x153/0x15d kernel/locking/lockdep.c:5250 fib6_nh_get_excptn_bucket+0x18e/0x1b0 net/ipv6/route.c:1593 rt6_nh_dump_exceptions+0x45/0x4d0 net/ipv6/route.c:5541 rt6_dump_route+0x904/0xc50 net/ipv6/route.c:5640 fib6_dump_node+0x168/0x280 net/ipv6/ip6_fib.c:467 fib6_walk_continue+0x4a9/0x8e0 net/ipv6/ip6_fib.c:1986 fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:2034 fib6_dump_table.isra.0+0x38a/0x570 net/ipv6/ip6_fib.c:534 inet6_dump_fib+0x93c/0xb00 net/ipv6/ip6_fib.c:624 rtnl_dump_all+0x295/0x490 net/core/rtnetlink.c:3445 netlink_dump+0x558/0xfb0 net/netlink/af_netlink.c:2244 __netlink_dump_start+0x5b1/0x7d0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:226 [inline] rtnetlink_rcv_msg+0x73d/0xb00 net/core/rtnetlink.c:5182 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5237 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:646 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:665 sock_write_iter+0x27c/0x3e0 net/socket.c:994 call_write_iter include/linux/fs.h:1872 [inline] new_sync_write+0x4d3/0x770 fs/read_write.c:483 __vfs_write+0xe1/0x110 fs/read_write.c:496 vfs_write+0x20c/0x580 fs/read_write.c:558 ksys_write+0x14f/0x290 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:620 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4401b9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc8e134978 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401b9 RDX: 000000000000001c RSI: 0000000020000000 RDI: 00 Fixes: 1e47b4837f3b ("ipv6: Dump route exceptions if requested") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stefano Brivio <sbrivio@redhat.com> Cc: David Ahern <dsahern@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-25net/ipv6: Fix misuse of proc_dointvec "skip_notify_on_dev_down"Eiichi Tsukata1-1/+1
/proc/sys/net/ipv6/route/skip_notify_on_dev_down assumes given value to be 0 or 1. Use proc_dointvec_minmax instead of proc_dointvec. Fixes: 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message ondevice down") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv6: Dump route exceptions if requestedStefano Brivio1-11/+103
Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache"), route exceptions reside in a separate hash table, and won't be found by walking the FIB, so they won't be dumped to userspace on a RTM_GETROUTE message. This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to have no function anymore: # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium # ip -6 route list cache # ip -6 route flush cache # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium because iproute2 lists cached routes using RTM_GETROUTE, and flushes them by listing all the routes, and deleting them with RTM_DELROUTE one by one. If cached routes are requested using the RTM_F_CLONED flag together with strict checking, or if no strict checking is requested (and hence we can't consistently apply filters), look up exceptions in the hash table associated with the current fib6_info in rt6_dump_route(), and, if present and not expired, add them to the dump. We might be unable to dump all the entries for a given node in a single message, so keep track of how many entries were handled for the current node in fib6_walker, and skip that amount in case we start from the same partially dumped node. When a partial dump restarts, as the starting node might change when 'sernum' changes, we have no guarantee that we need to skip the same amount of in-node entries. Therefore, we need two counters, and we need to zero the in-node counter if the node from which the dump is resumed differs. Note that, with the current version of iproute2, this only fixes the 'ip -6 route list cache': on a flush command, iproute2 doesn't pass RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is still unable to fetch the routes to be flushed. This will be addressed in a patch for iproute2. To flush cached routes, a procfs entry could be introduced instead: that's how it works for IPv4. We already have a rt6_flush_exception() function ready to be wired to it. However, this would not solve the issue for listing. Versions of iproute2 and kernel tested: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 5.1.0, patched 3.18 list + + + + + + flush + + + + + + 4.4 list + + + + + + flush + + + + + + 4.9 list + + + + + + flush + + + + + + 4.14 list + + + + + + flush + + + + + + 4.15 list flush 4.19 list flush 5.0 list flush 5.1 list flush with list + + + + + + fix flush + + + + v7: - Explain usage of "skip" counters in commit message (suggested by David Ahern) v6: - Rebase onto net-next, use recently introduced nexthop walker - Make rt6_nh_dump_exceptions() a separate function (suggested by David Ahern) v5: - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH, update test results (flushing works with iproute2 < 5.0.0 now) v4: - Split NLM_F_MATCH and strict check handling in separate patches - Filter routes using RTM_F_CLONED: if it's not set, only return non-cached routes, and if it's set, only return cached routes: change requested by David Ahern and Martin Lau. This implies that iproute2 needs a separate patch to be able to flush IPv6 cached routes. This is not ideal because we can't fix the breakage caused by 2b760fcf5cfb entirely in kernel. However, two years have passed since then, and this makes it more tolerable v3: - More descriptive comment about expired exceptions in rt6_dump_route() - Swap return values of rt6_dump_route() (suggested by Martin Lau) - Don't zero skip_in_node in case we don't dump anything in a given pass (also suggested by Martin Lau) - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic, it's just a flag to indicate the route was cloned, not to filter on routes v2: Add tracking of number of entries to be skipped in current node after a partial dump. As we restart from the same node, if not all the exceptions for a given node fit in a single message, the dump will not terminate, as suggested by Martin Lau. This is a concrete possibility, setting up a big number of exceptions for the same route actually causes the issue, suggested by David Ahern. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv6/route: Change return code of rt6_dump_route() for partial node dumpsStefano Brivio1-6/+10
In the next patch, we are going to add optional dump of exceptions to rt6_dump_route(). Change the return code of rt6_dump_route() to accomodate partial node dumps: we might dump multiple routes per node, and might be able to dump only a given number of them, so fib6_dump_node() will need to know how many routes have been dumped on partial dump, to restart the dump from the point where it was interrupted. Note that fib6_dump_node() is the only caller and already handles all non-negative return codes as success: those become -1 to signal that we're done with the node. If we fail, return 0, as we were unable to dump the single route in the node, but we're not done with it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del()Stefano Brivio1-1/+2
If fc_nh_id isn't set, we shouldn't try to match against it. This actually matters just for the RTF_CACHE below (where this case is already handled): if iproute2 gets a route exception and tries to delete it, it won't reference it by fc_nh_id, even if a nexthop object might be associated to the originating route. Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-23ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREFWei Wang1-2/+27
For tx path, in most cases, we still have to take refcnt on the dst cause the caller is caching the dst somewhere. But it still is beneficial to make use of RT6_LOOKUP_F_DST_NOREF flag while doing the route lookup. It is cause this flag prevents manipulating refcnt on net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each routing table. The null_entry is a shared object and constant updates on it cause false sharing. We converted the current major lookup function ip6_route_output_flags() to make use of RT6_LOOKUP_F_DST_NOREF. Together with the change in the rx path, we see noticable performance boost: I ran synflood tests between 2 hosts under the same switch. Both hosts have 20G mlx NIC, and 8 tx/rx queues. Sender sends pure SYN flood with random src IPs and ports using trafgen. Receiver has a simple TCP listener on the target port. Both hosts have multiple custom rules: - For incoming packets, only local table is traversed. - For outgoing packets, 3 tables are traversed to find the route. The packet processing rate on the receiver is as follows: - Before the fix: 3.78Mpps - After the fix: 5.50Mpps Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-23ipv6: convert rx data path to not take refcnt on dstWei Wang1-3/+4
ip6_route_input() is the key function to do the route lookup in the rx data path. All the callers to this function are already holding rcu lock. So it is fairly easy to convert it to not take refcnt on the dst: We pass in flag RT6_LOOKUP_F_DST_NOREF and do skb_dst_set_noref(). This saves a few atomic inc or dec operations and should boost performance overall. This also makes the logic more aligned with v4. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-23ipv6: initialize rt6->rt6i_uncached in all pre-allocated dst entriesWei Wang1-0/+3
Initialize rt6->rt6i_uncached on the following pre-allocated dsts: net->ipv6.ip6_null_entry net->ipv6.ip6_prohibit_entry net->ipv6.ip6_blk_hole_entry This is a preparation patch for later commits to be able to distinguish dst entries in uncached list by doing: !list_empty(rt6->rt6i_uncached) Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-23ipv6: introduce RT6_LOOKUP_F_DST_NOREF flag in ip6_pol_route()Wei Wang1-43/+30
This new flag is to instruct the route lookup function to not take refcnt on the dst entry. The user which does route lookup with this flag must properly use rcu protection. ip6_pol_route() is the major route lookup function for both tx and rx path. In this function: Do not take refcnt on dst if RT6_LOOKUP_F_DST_NOREF flag is set, and directly return the route entry. The caller should be holding rcu lock when using this flag, and decide whether to take refcnt or not. One note on the dst cache in the uncached_list: As uncached_list does not consume refcnt, one refcnt is always returned back to the caller even if RT6_LOOKUP_F_DST_NOREF flag is set. Uncached dst is only possible in the output path. So in such call path, caller MUST check if the dst is in the uncached_list before assuming that there is no refcnt taken on the returned dst. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-22ipv6: Error when route does not have any valid nexthopsIdo Schimmel1-0/+6
When user space sends invalid information in RTA_MULTIPATH, the nexthop list in ip6_route_multipath_add() is empty and 'rt_notif' is set to NULL. The code that emits the in-kernel notifications does not check for this condition, which results in a NULL pointer dereference [1]. Fix this by bailing earlier in the function if the parsed nexthop list is empty. This is consistent with the corresponding IPv4 code. v2: * Check if parsed nexthop list is empty and bail with extack set [1] kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 9190 Comm: syz-executor149 Not tainted 5.2.0-rc5+ #38 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:call_fib6_multipath_entry_notifiers+0xd1/0x1a0 net/ipv6/ip6_fib.c:396 Code: 8b b5 30 ff ff ff 48 c7 85 68 ff ff ff 00 00 00 00 48 c7 85 70 ff ff ff 00 00 00 00 89 45 88 4c 89 e0 48 c1 e8 03 4c 89 65 80 <42> 80 3c 28 00 0f 85 9a 00 00 00 48 b8 00 00 00 00 00 fc ff df 4d RSP: 0018:ffff88809788f2c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 1ffff11012f11e59 RCX: 00000000ffffffff RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88809788f390 R08: ffff88809788f8c0 R09: 000000000000000c R10: ffff88809788f5d8 R11: ffff88809788f527 R12: 0000000000000000 R13: dffffc0000000000 R14: ffff88809788f8c0 R15: ffffffff89541d80 FS: 000055555632c880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000080 CR3: 000000009ba7c000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_route_multipath_add+0xc55/0x1490 net/ipv6/route.c:5094 inet6_rtm_newroute+0xed/0x180 net/ipv6/route.c:5208 rtnetlink_rcv_msg+0x463/0xb00 net/core/rtnetlink.c:5219 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5237 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x531/0x710 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x8ae/0xd70 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:646 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:665 ___sys_sendmsg+0x803/0x920 net/socket.c:2286 __sys_sendmsg+0x105/0x1d0 net/socket.c:2324 __do_sys_sendmsg net/socket.c:2333 [inline] __se_sys_sendmsg net/socket.c:2331 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2331 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4401f9 Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffc09fd0028 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401f9 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 00000000004002c8 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401a80 R13: 0000000000401b10 R14: 0000000000000000 R15: 0000000000000000 Reported-by: syzbot+382566d339d52cd1a204@syzkaller.appspotmail.com Fixes: ebee3cad835f ("ipv6: Add IPv6 multipath notifications for add / replace") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Minor SPDX change conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-19ipv6: Default fib6_type to RTN_UNICAST when not setDavid Ahern1-1/+1
A user reported that routes are getting installed with type 0 (RTN_UNSPEC) where before the routes were RTN_UNICAST. One example is from accel-ppp which apparently still uses the ioctl interface and does not set rtmsg_type. Another is the netlink interface where ipv6 does not require rtm_type to be set (v4 does). Prior to the commit in the Fixes tag the ipv6 stack converted type 0 to RTN_UNICAST, so restore that behavior. Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18ipv6: Add IPv6 multipath notification for route deleteIdo Schimmel1-0/+6
If all the nexthops of a multipath route are being deleted, send one notification for the entire route, instead of one per-nexthop. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-18ipv6: Add IPv6 multipath notifications for add / replaceIdo Schimmel1-0/+15
Emit a notification when a multipath routes is added or replace. Note that unlike the replace notifications sent from fib6_add_rt2node(), it is possible we are sending a 'FIB_EVENT_ENTRY_REPLACE' when a route was merely added and not replaced. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Allow routes to use nexthop objectsDavid Ahern1-8/+81
Add support for RTA_NH_ID attribute to allow a user to specify a nexthop id to use with a route. fc_nh_id is added to fib6_config to hold the value passed in the RTA_NH_ID attribute. If a nexthop id is given, the gateway, device, encap and multipath attributes can not be set. Update ip6_route_del to check metric and protocol before nexthop specs. If fc_nh_id is set, then it must match the id in the route entry. Since IPv6 allows delete of a cached entry (an exception), add ip6_del_cached_rt_nh to cycle through all of the fib6_nh in a fib entry if it is using a nexthop. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in mtu updatesDavid Ahern1-1/+28
Use nexthop_for_each_fib6_nh to call fib6_nh_mtu_change for each fib6_nh in a nexthop for rt6_mtu_change_route. For __ip6_rt_update_pmtu, we need to find the nexthop that correlates to the device and gateway in the rt6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in rt6_do_redirectDavid Ahern1-1/+19
Use nexthop_for_each_fib6_nh and fib6_nh_find_match to find the fib6_nh in a nexthop that correlates to the device and gateway in the rt6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in __ip6_route_redirectDavid Ahern1-4/+35
Add a hook in __ip6_route_redirect to handle a nexthop struct in a fib6_info. Use nexthop_for_each_fib6_nh and fib6_nh_redirect_match to call ip6_redirect_nh_match for each fib6_nh looking for a match. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-10ipv6: Handle all fib6_nh in a nexthop in exception handlingDavid Ahern1-3/+108
Add a hook in rt6_flush_exceptions, rt6_remove_exception_rt, rt6_update_exception_stamp_rt, and rt6_age_exceptions to handle nexthop struct in a fib6_info. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>