aboutsummaryrefslogtreecommitdiffstats
path: root/include (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-06-24ipv6: Dump route exceptions if requestedStefano Brivio2-1/+2
Since commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache"), route exceptions reside in a separate hash table, and won't be found by walking the FIB, so they won't be dumped to userspace on a RTM_GETROUTE message. This causes 'ip -6 route list cache' and 'ip -6 route flush cache' to have no function anymore: # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 539sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 536sec mtu 1500 pref medium # ip -6 route list cache # ip -6 route flush cache # ip -6 route get fc00:3::1 fc00:3::1 via fc00:1::2 dev veth_A-R1 src fc00:1::1 metric 1024 expires 520sec mtu 1400 pref medium # ip -6 route get fc00:4::1 fc00:4::1 via fc00:2::2 dev veth_A-R2 src fc00:2::1 metric 1024 expires 519sec mtu 1500 pref medium because iproute2 lists cached routes using RTM_GETROUTE, and flushes them by listing all the routes, and deleting them with RTM_DELROUTE one by one. If cached routes are requested using the RTM_F_CLONED flag together with strict checking, or if no strict checking is requested (and hence we can't consistently apply filters), look up exceptions in the hash table associated with the current fib6_info in rt6_dump_route(), and, if present and not expired, add them to the dump. We might be unable to dump all the entries for a given node in a single message, so keep track of how many entries were handled for the current node in fib6_walker, and skip that amount in case we start from the same partially dumped node. When a partial dump restarts, as the starting node might change when 'sernum' changes, we have no guarantee that we need to skip the same amount of in-node entries. Therefore, we need two counters, and we need to zero the in-node counter if the node from which the dump is resumed differs. Note that, with the current version of iproute2, this only fixes the 'ip -6 route list cache': on a flush command, iproute2 doesn't pass RTM_F_CLONED and, due to this inconsistency, 'ip -6 route flush cache' is still unable to fetch the routes to be flushed. This will be addressed in a patch for iproute2. To flush cached routes, a procfs entry could be introduced instead: that's how it works for IPv4. We already have a rt6_flush_exception() function ready to be wired to it. However, this would not solve the issue for listing. Versions of iproute2 and kernel tested: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 5.1.0, patched 3.18 list + + + + + + flush + + + + + + 4.4 list + + + + + + flush + + + + + + 4.9 list + + + + + + flush + + + + + + 4.14 list + + + + + + flush + + + + + + 4.15 list flush 4.19 list flush 5.0 list flush 5.1 list flush with list + + + + + + fix flush + + + + v7: - Explain usage of "skip" counters in commit message (suggested by David Ahern) v6: - Rebase onto net-next, use recently introduced nexthop walker - Make rt6_nh_dump_exceptions() a separate function (suggested by David Ahern) v5: - Use dump_routes and dump_exceptions from filter, ignore NLM_F_MATCH, update test results (flushing works with iproute2 < 5.0.0 now) v4: - Split NLM_F_MATCH and strict check handling in separate patches - Filter routes using RTM_F_CLONED: if it's not set, only return non-cached routes, and if it's set, only return cached routes: change requested by David Ahern and Martin Lau. This implies that iproute2 needs a separate patch to be able to flush IPv6 cached routes. This is not ideal because we can't fix the breakage caused by 2b760fcf5cfb entirely in kernel. However, two years have passed since then, and this makes it more tolerable v3: - More descriptive comment about expired exceptions in rt6_dump_route() - Swap return values of rt6_dump_route() (suggested by Martin Lau) - Don't zero skip_in_node in case we don't dump anything in a given pass (also suggested by Martin Lau) - Remove check on RTM_F_CLONED altogether: in the current UAPI semantic, it's just a flag to indicate the route was cloned, not to filter on routes v2: Add tracking of number of entries to be skipped in current node after a partial dump. As we restart from the same node, if not all the exceptions for a given node fit in a single message, the dump will not terminate, as suggested by Martin Lau. This is a concrete possibility, setting up a big number of exceptions for the same route actually causes the issue, suggested by David Ahern. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv6/route: Change return code of rt6_dump_route() for partial node dumpsStefano Brivio2-7/+11
In the next patch, we are going to add optional dump of exceptions to rt6_dump_route(). Change the return code of rt6_dump_route() to accomodate partial node dumps: we might dump multiple routes per node, and might be able to dump only a given number of them, so fib6_dump_node() will need to know how many routes have been dumped on partial dump, to restart the dump from the point where it was interrupted. Note that fib6_dump_node() is the only caller and already handles all non-negative return codes as success: those become -1 to signal that we're done with the node. If we fail, return 0, as we were unable to dump the single route in the node, but we're not done with it. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv6/route: Don't match on fc_nh_id if not set in ip6_route_del()Stefano Brivio1-1/+2
If fc_nh_id isn't set, we shouldn't try to match against it. This actually matters just for the RTF_CACHE below (where this case is already handled): if iproute2 gets a route exception and tries to delete it, it won't reference it by fc_nh_id, even if a nexthop object might be associated to the originating route. Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24Revert "net/ipv6: Bail early if user only wants cloned entries"Stefano Brivio1-5/+2
This reverts commit 08e814c9e8eb5a982cbd1e8f6bd255d97c51026f: as we are preparing to fix listing and dumping of IPv6 cached routes, we need to allow RTM_F_CLONED as a flag to match routes against while dumping them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv4: Dump route exceptions if requestedStefano Brivio3-13/+108
Since commit 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions."), cached exception routes are stored as a separate entity, so they are not dumped on a FIB dump, even if the RTM_F_CLONED flag is passed. This implies that the command 'ip route list cache' doesn't return any result anymore. If the RTM_F_CLONED is passed, and strict checking requested, retrieve nexthop exception routes and dump them. If no strict checking is requested, filtering can't be performed consistently: dump everything in that case. With this, we need to add an argument to the netlink callback in order to track how many entries were already dumped for the last leaf included in a partial netlink dump. A single additional argument is sufficient, even if we traverse logically nested structures (nexthop objects, hash table buckets, bucket chains): it doesn't matter if we stop in the middle of any of those, because they are always traversed the same way. As an example, s_i values in [], s_fa values in (): node (fa) #1 [1] nexthop #1 bucket #1 -> #0 in chain (1) bucket #2 -> #0 in chain (2) -> #1 in chain (3) -> #2 in chain (4) bucket #3 -> #0 in chain (5) -> #1 in chain (6) nexthop #2 bucket #1 -> #0 in chain (7) -> #1 in chain (8) bucket #2 -> #0 in chain (9) -- node (fa) #2 [2] nexthop #1 bucket #1 -> #0 in chain (1) -> #1 in chain (2) bucket #2 -> #0 in chain (3) it doesn't matter if we stop at (3), (4), (7) for "node #1", or at (2) for "node #2": walking flattens all that. It would even be possible to drop the distinction between the in-tree (s_i) and in-node (s_fa) counter, but a further improvement might advise against this. This is only as accurate as the existing tracking mechanism for leaves: if a partial dump is restarted after exceptions are removed or expired, we might skip some non-dumped entries. To improve this, we could attach a 'sernum' attribute (similar to the one used for IPv6) to nexthop entities, and bump this counter whenever exceptions change: having a distinction between the two counters would make this more convenient. Listing of exception routes (modified routes pre-3.5) was tested against these versions of kernel and iproute2: iproute2 kernel 4.14.0 4.15.0 4.19.0 5.0.0 5.1.0 3.5-rc4 + + + + + 4.4 4.9 4.14 4.15 4.19 5.0 5.1 fixed + + + + + v7: - Move loop over nexthop objects to route.c, and pass struct fib_info and table ID to it, not a struct fib_alias (suggested by David Ahern) - While at it, note that the NULL check on fa->fa_info is redundant, and the check on RTNH_F_DEAD is also not consistent with what's done with regular route listing: just keep it for nhc_flags - Rename entry point function for dumping exceptions to fib_dump_info_fnhe(), and rearrange arguments for consistency with fib_dump_info() - Rename fnhe_dump_buckets() to fnhe_dump_bucket() and make it handle one bucket at a time - Expand commit message to describe why we can have a single "skip" counter for all exceptions stored in bucket chains in nexthop objects (suggested by David Ahern) v6: - Rebased onto net-next - Loop over nexthop paths too. Move loop over fnhe buckets to route.c, avoids need to export rt_fill_info() and to touch exceptions from fib_trie.c. Pass NULL as flow to rt_fill_info(), it now allows that (suggested by David Ahern) Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions.") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv4/route: Allow NULL flowinfo in rt_fill_info()Stefano Brivio1-26/+30
In the next patch, we're going to use rt_fill_info() to dump exception routes upon RTM_GETROUTE with NLM_F_ROOT, meaning userspace is requesting a dump and not a specific route selection, which in turn implies the input interface is not relevant. Update rt_fill_info() to handle a NULL flowinfo. v7: If fl4 is NULL, explicitly set r->rtm_tos to 0: it's not initialised otherwise (spotted by David Ahern) v6: New patch Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24ipv4/fib_frontend: Allow RTM_F_CLONED flag to be used for filteringStefano Brivio1-2/+2
This functionally reverts the check introduced by commit e8ba330ac0c5 ("rtnetlink: Update fib dumps for strict data checking") as modified by commit e4e92fb160d7 ("net/ipv4: Bail early if user only wants prefix entries"). As we are preparing to fix listing of IPv4 cached routes, we need to give userspace a way to request them. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24fib_frontend, ip6_fib: Select routes or exceptions dump from RTM_F_CLONEDStefano Brivio3-2/+11
The following patches add back the ability to dump IPv4 and IPv6 exception routes, and we need to allow selection of regular routes or exceptions. Use RTM_F_CLONED as filter to decide whether to dump routes or exceptions: iproute2 passes it in dump requests (except for IPv6 cache flush requests, this will be fixed in iproute2) and this used to work as long as exceptions were stored directly in the FIB, for both IPv4 and IPv6. Caveat: if strict checking is not requested (that is, if the dump request doesn't go through ip_valid_fib_dump_req()), we can't filter on protocol, tables or route types. In this case, filtering on RTM_F_CLONED would be inconsistent: we would fix 'ip route list cache' by returning exception routes and at the same time introduce another bug in case another selector is present, e.g. on 'ip route list cache table main' we would return all exception routes, without filtering on tables. Keep this consistent by applying no filters at all, and dumping both routes and exceptions, if strict checking is not requested. iproute2 currently filters results anyway, and no unwanted results will be presented to the user. The kernel will just dump more data than needed. v7: No changes v6: Rebase onto net-next, no changes v5: New patch: add dump_routes and dump_exceptions flags in filter and simply clear the unwanted one if strict checking is enabled, don't ignore NLM_F_MATCH and don't set filter_set if NLM_F_MATCH is set. Skip filtering altogether if no strict checking is requested: selecting routes or exceptions only would be inconsistent with the fact we can't filter on tables. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24net: macb: use GROAntoine Tenart2-9/+12
This patch updates the macb driver to use NAPI GRO helpers when receiving SKBs. This improves performances. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-24net: macb: use NAPI_POLL_WEIGHTAntoine Tenart1-1/+1
Use NAPI_POLL_WEIGHT, the default NAPI poll() weight instead of redefining our own value (which turns out to be 64 as well). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>