wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2021-02-15	netfilter: nftables: add helper function to release one table	Pablo Neira Ayuso	1	-35/+40
	Add a function to release one table. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-02-13	skbuff: queue NAPI_MERGED_FREE skbs into NAPI cache instead of freeing	Alexander Lobakin	3	-11/+11
	napi_frags_finish() and napi_skb_finish() can only be called inside NAPI Rx context, so we can feed NAPI cache with skbuff_heads that got NAPI_MERGED_FREE verdict instead of immediate freeing. Replace __kfree_skb() with __kfree_skb_defer() in napi_skb_finish() and move napi_skb_free_stolen_head() to skbuff.c, so it can drop skbs to NAPI cache. As many drivers call napi_alloc_skb()/napi_get_frags() on their receive path, this becomes especially useful. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: allow to use NAPI cache from __napi_alloc_skb()	Alexander Lobakin	1	-2/+3
	{,__}napi_alloc_skb() is mostly used either for optional non-linear receive methods (usually controlled via Ethtool private flags and off by default) and/or for Rx copybreaks. Use __napi_build_skb() here for obtaining skbuff_heads from NAPI cache instead of inplace allocations. This includes both kmalloc and page frag paths. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: allow to optionally use NAPI cache from __alloc_skb()	Alexander Lobakin	1	-1/+5
	Reuse the old and forgotten SKB_ALLOC_NAPI to add an option to get an skbuff_head from the NAPI cache instead of inplace allocation inside __alloc_skb(). This implies that the function is called from softirq or BH-off context, not for allocating a clone or from a distant node. Cc: Alexander Duyck <alexander.duyck@gmail.com> # Simplified flags check Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads	Alexander Lobakin	2	-13/+83
	Instead of just bulk-flushing skbuff_heads queued up through napi_consume_skb() or __kfree_skb_defer(), try to reuse them on allocation path. If the cache is empty on allocation, bulk-allocate the first 16 elements, which is more efficient than per-skb allocation. If the cache is full on freeing, bulk-wipe the second half of the cache (32 elements). This also includes custom KASAN poisoning/unpoisoning to be double sure there are no use-after-free cases. To not change current behaviour, introduce a new function, napi_build_skb(), to optionally use a new approach later in drivers. Note on selected bulk size, 16: - this equals to XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE and especially VETH_XDP_BATCH, which is also used to bulk-allocate skbuff_heads and was tested on powerful setups; - this also showed the best performance in the actual test series (from the array of {8, 16, 32}). Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves Suggested-by: Eric Dumazet <edumazet@google.com> # KASAN poisoning Cc: Dmitry Vyukov <dvyukov@google.com> # Help with KASAN Cc: Paolo Abeni <pabeni@redhat.com> # Reduced batch size Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: move NAPI cache declarations upper in the file	Alexander Lobakin	1	-45/+45
	NAPI cache structures will be used for allocating skbuff_heads, so move their declarations a bit upper. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: remove __kfree_skb_flush()	Alexander Lobakin	3	-19/+1
	This function isn't much needed as NAPI skb queue gets bulk-freed anyway when there's no more room, and even may reduce the efficiency of bulk operations. It will be even less needed after reusing skb cache on allocation path, so remove it and this way lighten network softirqs a bit. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: use __build_skb_around() in __alloc_skb()	Alexander Lobakin	1	-17/+1
	Just call __build_skb_around() instead of open-coding it. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: simplify __alloc_skb() a bit	Alexander Lobakin	1	-6/+5
	Use unlikely() annotations for skbuff_head and data similarly to the two other allocation functions and remove totally redundant goto. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: make __build_skb_around() return void	Alexander Lobakin	1	-7/+6
	__build_skb_around() can never fail and always returns passed skb. Make it return void to simplify and optimize the code. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: simplify kmalloc_reserve()	Alexander Lobakin	1	-5/+2
	Eversince the introduction of __kmalloc_reserve(), "ip" argument hasn't been used. _RET_IP_ is embedded inside kmalloc_node_track_caller(). Remove the redundant macro and rename the function after it. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-13	skbuff: move __alloc_skb() next to the other skb allocation functions	Alexander Lobakin	1	-142/+142
	In preparation before reusing several functions in all three skb allocation variants, move __alloc_skb() next to the __netdev_alloc_skb() and __napi_alloc_skb(). No functional changes. Signed-off-by: Alexander Lobakin <alobakin@pm.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: axienet: Support dynamic switching between 1000BaseX and SGMII	Robert Hancock	2	-18/+71
	Newer versions of the Xilinx AXI Ethernet core (specifically version 7.2 or later) allow the core to be configured with a PHY interface mode of "Both", allowing either 1000BaseX or SGMII modes to be selected at runtime. Add support for this in the driver to allow better support for applications which can use both fiber and copper SFP modules. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	dt-bindings: net: xilinx_axienet: add xlnx,switch-x-sgmii attribute	Robert Hancock	1	-0/+4
	Document the new xlnx,switch-x-sgmii attribute which is used to indicate that the Ethernet core supports dynamic switching between 1000BaseX and SGMII. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: axienet: hook up nway_reset ethtool operation	Robert Hancock	1	-0/+8
	Hook up the nway_reset ethtool operation to the corresponding phylink function so that "ethtool -r" can be supported. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	tcp: factorize logic into tcp_epollin_ready()	Eric Dumazet	3	-22/+19
	Both tcp_data_ready() and tcp_stream_is_readable() share the same logic. Add tcp_epollin_ready() helper to avoid duplication. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Arjun Roy <arjunroy@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	tcp: fix SO_RCVLOWAT related hangs under mem pressure	Eric Dumazet	1	-2/+7
	While commit 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs") fixed an issue vs too small sk_rcvbuf for given sk_rcvlowat constraint, it missed to address issue caused by memory pressure. 1) If we are under memory pressure and socket receive queue is empty. First incoming packet is allowed to be queued, after commit 76dfa6082032 ("tcp: allow one skb to be received per socket under memory pressure") But we do not send EPOLLIN yet, in case tcp_data_ready() sees sk_rcvlowat is bigger than skb length. 2) Then, when next packet comes, it is dropped, and we directly call sk->sk_data_ready(). 3) If application is using poll(), tcp_poll() will then use tcp_stream_is_readable() and decide the socket receive queue is not yet filled, so nothing will happen. Even when sender retransmits packets, phases 2) & 3) repeat and flow is effectively frozen, until memory pressure is off. Fix is to consider tcp_under_memory_pressure() to take care of global memory pressure or memcg pressure. Fixes: 24adbc1676af ("tcp: fix SO_RCVLOWAT hangs with fat skbs") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Arjun Roy <arjunroy@google.com> Suggested-by: Wei Wang <weiwan@google.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	selftests: tc: Add generic mpls matching support for tc-flower	Guillaume Nault	2	-1/+162
	Add tests in tc_flower.sh for generic matching on MPLS Label Stack Entries. The label, tc, bos and ttl fields are tested for the first and second labels. For each field, the minimal and maximal values are tested (the former at depth 1 and the later at depth 2). There are also tests for matching the presence of a label stack entry at a given depth. In order to reduce the amount of code, all "lse" subcommands are tested in match_mpls_lse_test(). Action "continue" is used, so that test packets are evaluated by all filters. Then, we can verify if each filter matched the expected number of packets. Some versions of tc-flower produced invalid json output when dumping MPLS filters with depth > 1. Skip the test if tc isn't recent enough. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	selftests: tc: Add basic mpls_* matching support for tc-flower	Guillaume Nault	3	-1/+187
	Add tests in tc_flower.sh for mpls_label, mpls_tc, mpls_bos and mpls_ttl. For each keyword, test the minimal and maximal values. Selectively skip these new mpls tests for tc versions that don't support them. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: dsa: sja1105: offload bridge port flags to device	Vladimir Oltean	3	-11/+219
	The chip can configure unicast flooding, broadcast flooding and learning. Learning is per port, while flooding is per {ingress, egress} port pair and we need to configure the same value for all possible ingress ports towards the requested one. While multicast flooding is not officially supported, we can hack it by using a feature of the second generation (P/Q/R/S) devices, which is that FDB entries are maskable, and multicast addresses always have an odd first octet. So by putting a match-all for 00:01:00:00:00:00 addr and 00:01:00:00:00:00 mask at the end of the FDB, we make sure that it is always checked last, and does not take precedence in front of any other MDB. So it behaves effectively as an unknown multicast entry. For the first generation switches, this feature is not available, so unknown multicast will always be treated the same as unknown unicast. So the only thing we can do is request the user to offload the settings for these 2 flags in tandem, i.e. ip link set swp2 type bridge_slave flood off Error: sja1105: This chip cannot configure multicast flooding independently of unicast. ip link set swp2 type bridge_slave flood off mcast_flood off ip link set swp2 type bridge_slave mcast_flood on Error: sja1105: This chip cannot configure multicast flooding independently of unicast. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: mscc: ocelot: offload bridge port flags to device	Vladimir Oltean	4	-5/+158
	We should not be unconditionally enabling address learning, since doing that is actively detrimential when a port is standalone and not offloading a bridge. Namely, if a port in the switch is standalone and others are offloading the bridge, then we could enter a situation where we learn an address towards the standalone port, but the bridged ports could not forward the packet there, because the CPU is the only path between the standalone and the bridged ports. The solution of course is to not enable address learning unless the bridge asks for it. We need to set up the initial port flags for no learning and flooding everything, and also when the port joins and leaves the bridge. The flood configuration was already configured ok for standalone mode in ocelot_init, we just need to disable learning in ocelot_init_port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: mscc: ocelot: use separate flooding PGID for broadcast	Vladimir Oltean	3	-12/+18
	In preparation of offloading the bridge port flags which have independent settings for unknown multicast and for broadcast, we should also start reserving one destination Port Group ID for the flooding of broadcast packets, to allow configuring it individually. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: dsa: felix: restore multicast flood to CPU when NPI tagger reinitializes	Vladimir Oltean	1	-0/+1
	ocelot_init sets up PGID_MC to include the CPU port module, and that is fine, but the ocelot-8021q tagger removes the CPU port module from the unknown multicast replicator. So after a transition from the default ocelot tagger towards ocelot-8021q and then again towards ocelot, multicast flooding towards the CPU port module will be disabled. Fixes: e21268efbe26 ("net: dsa: felix: perform switch setup for tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: dsa: act as passthrough for bridge port flags	Vladimir Oltean	10	-127/+263
	There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be expressed by the bridge through switchdev, and not all of them can be emulated by DSA mid-layer API at the same time. One possible configuration is when the bridge offloads the port flags using a mask that has a single bit set - therefore only one feature should change. However, DSA currently groups together unicast and multicast flooding in the .port_egress_floods method, which limits our options when we try to add support for turning off broadcast flooding: do we extend .port_egress_floods with a third parameter which b53 and mv88e6xxx will ignore? But that means that the DSA layer, which currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will see that .port_egress_floods is implemented, and will report that all 3 types of flooding are supported - not necessarily true. Another configuration is when the user specifies more than one flag at the same time, in the same netlink message. If we were to create one individual function per offloadable bridge port flag, we would limit the expressiveness of the switch driver of refusing certain combinations of flag values. For example, a switch may not have an explicit knob for flooding of unknown multicast, just for flooding in general. In that case, the only correct thing to do is to allow changes to BR_FLOOD and BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having a separate .port_set_unicast_flood and .port_set_multicast_flood would not allow the driver to possibly reject that. Also, DSA doesn't consider it necessary to inform the driver that a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it just calls .port_egress_floods for the CPU port. When we'll add support for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real problem because the flood settings will need to be held statefully in the DSA middle layer, otherwise changing the mrouter port attribute will impact the flooding attribute. And that's _assuming_ that the underlying hardware doesn't have anything else to do when a multicast router attaches to a port than flood unknown traffic to it. If it does, there will need to be a dedicated .port_set_mrouter anyway. So we need to let the DSA drivers see the exact form that the bridge passes this switchdev attribute in, otherwise we are standing in the way. Therefore we also need to use this form of language when communicating to the driver that it needs to configure its initial (before bridge join) and final (after bridge leave) port flags. The b53 and mv88e6xxx drivers are converted to the passthrough API and their implementation of .port_egress_floods is split into two: a function that configures unicast flooding and another for multicast. The mv88e6xxx implementation is quite hairy, and it turns out that the implementations of unknown unicast flooding are actually the same for 6185 and for 6352: behind the confusing names actually lie two individual bits: NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2) NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3) so there was no reason to entangle them in the first place. Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of PORT_CTL0, which has the exact same bit index. I have left the implementations separate though, for the only reason that the names are different enough to confuse me, since I am not able to double-check with a user manual. The multicast flooding setting for 6185 is in a different register than for 6352 though. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes	Vladimir Oltean	10	-89/+129
	This switchdev attribute offers a counterproductive API for a driver writer, because although br_switchdev_set_port_flag gets passed a "flags" and a "mask", those are passed piecemeal to the driver, so while the PRE_BRIDGE_FLAGS listener knows what changed because it has the "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final value. But certain drivers can offload only certain combinations of settings, like for example they cannot change unicast flooding independently of multicast flooding - they must be both on or both off. The way the information is passed to switchdev makes drivers not expressive enough, and unable to reject this request ahead of time, in the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during the deferred BRIDGE_FLAGS attribute, where the rejection is currently ignored. This patch also changes drivers to make use of the "mask" field for edge detection when possible. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: dsa: configure better brport flags when ports leave the bridge	Vladimir Oltean	1	-7/+38
	For a DSA switch port operating in standalone mode, address learning doesn't make much sense since that is a bridge function. In fact, address learning even breaks setups such as this one: +---------------------------------------------+ \| \| \| +-------------------+ \| \| \| br0 \| send receive \| \| +--------+-+--------+ +--------+ +--------+ \| \| \| \| \| \| \| \| \| \| \| \| \| swp0 \| \| swp1 \| \| swp2 \| \| swp3 \| \| \| \| \| \| \| \| \| \| \| \| +-+--------+-+--------+-+--------+-+--------+-+ \| ^ \| ^ \| \| \| \| \| +-----------+ \| \| \| +--------------------------------+ because if the switch has a single FDB (can offload a single bridge) then source address learning on swp3 can "steal" the source MAC address of swp2 from br0's FDB, because learning frames coming from swp2 will be done twice: first on the swp1 ingress port, second on the swp3 ingress port. So the hardware FDB will become out of sync with the software bridge, and when swp2 tries to send one more packet towards swp1, the ASIC will attempt to short-circuit the forwarding path and send it directly to swp3 (since that's the last port it learned that address on), which it obviously can't, because swp3 operates in standalone mode. So DSA drivers operating in standalone mode should still configure a list of bridge port flags even when they are standalone. Currently DSA attempts to call dsa_port_bridge_flags with 0, which disables egress flooding of unknown unicast and multicast, something which doesn't make much sense. For the switches that implement .port_egress_floods - b53 and mv88e6xxx, it probably doesn't matter too much either, since they can possibly inject traffic from the CPU into a standalone port, regardless of MAC DA, even if egress flooding is turned off for that port, but certainly not all DSA switches can do that - sja1105, for example, can't. So it makes sense to use a better common default there, such as "flood everything". It should also be noted that what DSA calls "dsa_port_bridge_flags()" is a degenerate name for just calling .port_egress_floods(), since nothing else is implemented - not learning, in particular. But disabling address learning, something that this driver is also coding up for, will be supported by individual drivers once .port_egress_floods is replaced with a more generic .port_bridge_flags. Previous attempts to code up this logic have been in the common bridge layer, but as pointed out by Ido Schimmel, there are corner cases that are missed when doing that: https://patchwork.kernel.org/project/netdevbpf/patch/20210209151936.97382-5-olteanv@gmail.com/ So, at least for now, let's leave DSA in charge of setting port flags before and after the bridge join and leave. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: bridge: don't print in br_switchdev_set_port_flag	Vladimir Oltean	4	-14/+21
	For the netlink interface, propagate errors through extack rather than simply printing them to the console. For the sysfs interface, we still print to the console, but at least that's one layer higher than in switchdev, which also allows us to silently ignore the offloading of flags if that is ever needed in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: bridge: offload all port flags at once in br_setport	Vladimir Oltean	2	-76/+39
	If for example this command: ip link set swp0 type bridge_slave flood off mcast_flood off learning off succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at BR_LEARNING, there would be no attempt to revert the partial state in any way. Arguably, if the user changes more than one flag through the same netlink command, this one _should_ be all or nothing, which means it should be passed through switchdev as all or nothing. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: switchdev: propagate extack to port attributes	Vladimir Oltean	8	-11/+24
	When a struct switchdev_attr is notified through switchdev, there is no way to report informational messages, unlike for struct switchdev_obj. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	octeontx2: Fix condition.	David S. Miller	1	-1/+1
	Fixes: 93efb0c656837 ("octeontx2-pf: Fix out-of-bounds read in otx2_get_fecparam()") Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: ipa: introduce gsi_channel_initialized()	Alex Elder	1	-8/+14
	Create a simple helper function that indicates whether a channel has been initialized. This abstacts/hides the details of how this is determined. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: ipa: introduce ipa_table_hash_support()	Alex Elder	3	-9/+17
	Introduce a new function to abstract the knowledge of whether hashed routing and filter tables are supported for a given IPA instance. IPA v4.2 is the only one that doesn't support hashed tables (now and for the foreseeable future), but the name of the helper function is better for explaining what's going on. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: ipa: fix register write command validation	Alex Elder	1	-8/+24
	In ipa_cmd_register_write_valid() we verify that values we will supply to a REGISTER_WRITE IPA immediate command will fit in the fields that need to hold them. This patch fixes some issues in that function and ipa_cmd_register_write_offset_valid(). The dev_err() call in ipa_cmd_register_write_offset_valid() has some printf format errors: - The name of the register (corresponding to the string format specifier) was not supplied. - The IPA base offset and offset need to be supplied separately to match the other format specifiers. Also make the ~0 constant used there to compute the maximum supported offset value explicitly unsigned. There are two other issues in ipa_cmd_register_write_valid(): - There's no need to check the hash flush register for platforms (like IPA v4.2) that do not support hashed tables - The highest possible endpoint number, whose status register offset is computed, is COUNT - 1, not COUNT. Fix these problems, and add some additional commentary. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: ipa: use dev_err_probe() in ipa_clock.c	Alex Elder	1	-4/+5
	When initializing the IPA core clock and interconnects, it's possible we'll get an EPROBE_DEFER error. This isn't really an error, it's just means we need to be re-probed later. Use dev_err_probe() to report the error rather than dev_err(). This avoids polluting the log with these "error" messages. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: ipa: use a separate pointer for adjusted GSI memory	Alex Elder	3	-26/+28
	This patch actually fixes a bug, though it doesn't affect the two platforms supported currently. The fix implements GSI memory pointers a bit differently. For IPA version 4.5 and above, the address space for almost all GSI registers is adjusted downward by a fixed amount. This is currently handled by adjusting the I/O virtual address pointer after it has been mapped. The bug is that the pointer is not "de-adjusted" as it should be when it's unmapped. This patch fixes that error, but it does so by maintaining one "raw" pointer for the mapped memory range. This is assigned when the memory is mapped and used to unmap the memory. This pointer is also used to access the two registers that do not sit in the "adjusted" memory space. Rather than adjusting that pointer, we maintain a separate pointer that's an adjusted copy of the "raw" pointer, and that is used for most GSI register accesses. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	octeontx2-pf: Fix out-of-bounds read in otx2_get_fecparam()	Gustavo A. R. Silva	1	-1/+1
	Code at line 967 implies that rsp->fwdata.supported_fec may be up to 4: 967: if (rsp->fwdata.supported_fec <= FEC_MAX_INDEX) If rsp->fwdata.supported_fec evaluates to 4, then there is an out-of-bounds read at line 971 because fec is an array with a maximum of 4 elements: 954 const int fec[] = { 955 ETHTOOL_FEC_OFF, 956 ETHTOOL_FEC_BASER, 957 ETHTOOL_FEC_RS, 958 ETHTOOL_FEC_BASER \| ETHTOOL_FEC_RS}; 959 #define FEC_MAX_INDEX 4 971: fecparam->fec = fec[rsp->fwdata.supported_fec]; Fix this by properly indexing fec[] with rsp->fwdata.supported_fec - 1. In this case the proper indexes 0 to 3 are used when rsp->fwdata.supported_fec evaluates to a range of 1 to 4, correspondingly. Fixes: d0cf9503e908 ("octeontx2-pf: ethtool fec mode support") Addresses-Coverity-ID: 1501722 ("Out-of-bounds read") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	octeontx2-af: Fix spelling mistake "recievd" -> "received"	Colin Ian King	1	-1/+1
	There is a spelling mistake in the text in array rpm_rx_stats_fields, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	rxrpc: Fix dependency on IPv6 in udp tunnel config	Vadim Fedorenko	1	-0/+2
	As udp_port_cfg struct changes its members with dependency on IPv6 configuration, the code in rxrpc should also check for IPv6. Fixes: 1a9b86c9fd95 ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	mptcp: add netlink event support	Florian Westphal	5	-7/+364
	Allow userspace (mptcpd) to subscribe to mptcp genl multicast events. This implementation reuses the same event API as the mptcp kernel fork to ease integration of existing tools, e.g. mptcpd. Supported events include: 1. start and close of an mptcp connection 2. start and close of subflows (joins) 3. announce and withdrawals of addresses 4. subflow priority (backup/non-backup) change. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	mptcp: avoid lock_fast usage in accept path	Florian Westphal	3	-3/+35
	Once event support is added this may need to allocate memory while msk lock is held with softirqs disabled. Not using lock_fast also allows to do the allocation with GFP_KERNEL. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	mptcp: pass subflow socket to a few helpers	Florian Westphal	5	-8/+8
	Pass the first/initial subflow to the existing functions so they can pass this on to the notification handler that is added later in the series. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	mptcp: move subflow close loop after sk close check	Florian Westphal	1	-3/+3
	In case mptcp socket is already dead the entire mptcp socket will be freed. We can avoid the close check in this case. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	mptcp: schedule worker when subflow is closed	Florian Westphal	2	-2/+27
	When remote side closes a subflow we should schedule the worker to dispose of the subflow in a timely manner. Otherwise, SF_CLOSED event won't be generated until the mptcp socket itself is closing or local side is closing another subflow. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	mptcp: split __mptcp_close_ssk helper	Florian Westphal	3	-7/+13
	Prepare for subflow close events: When mptcp connection is torn down its enough to send the mptcp socket close notification rather than a subflow close event for all of the subflows followed by the mptcp close event. This splits the helper: mptcp_close_ssk() will emit the close notification, __mptcp_close_ssk will not. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	mptcp: move pm netlink work into pm_netlink	Florian Westphal	3	-42/+42
	Allows to make some functions static and avoids acquire of the pm spinlock in protocol.c. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	selftests: mptcp: fail if not enough SYN/3rd ACK	Matthieu Baerts	1	-8/+20
	If we receive less MPCapable SYN or 3rd ACK than expected, we now mark the test as failed. On the other hand, if we receive more, we keep the warning but we add a hint that it is probably due to retransmissions and that's why we don't mark the test as failed. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/148 Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	selftests: mptcp: display warnings on one line	Matthieu Baerts	1	-23/+40
	Before we had this in case of SYN retransmissions: (...) # ns4 MPTCP -> ns2 (10.0.1.2:10034 ) MPTCP (duration 1201ms) [ OK ] # ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP (duration 1242ms) [ OK ] # ns4 MPTCP -> ns2 (10.0.2.1:10036 ) MPTCP ns2-60143c00-cDZWo4 SYNRX: MPTCP -> MPTCP: expect 11, got # 13 # (duration 6221ms) [ OK ] # ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP (duration 1427ms) [ OK ] # ns4 MPTCP -> ns3 (10.0.2.2:10038 ) MPTCP (duration 881ms) [ OK ] (...) Now we have: (...) # ns4 MPTCP -> ns2 (10.0.1.2:10034 ) MPTCP (duration 1201ms) [ OK ] # ns4 MPTCP -> ns2 (dead:beef:1::2:10035) MPTCP (duration 1242ms) [ OK ] # ns4 MPTCP -> ns2 (10.0.2.1:10036 ) MPTCP (duration 6221ms) [ OK ] WARN: SYNRX: expect 11, got 13 # ns4 MPTCP -> ns2 (dead:beef:2::1:10037) MPTCP (duration 1427ms) [ OK ] # ns4 MPTCP -> ns3 (10.0.2.2:10038 ) MPTCP (duration 881ms) [ OK ] (...) So we put everything on one line, keep the durations and "OK" aligned and removed duplicated info to short the warning. Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	selftests: mptcp: fix ACKRX debug message	Matthieu Baerts	1	-1/+1
	Info from received MPCapable SYN were printed instead of the ones from received MPCapable 3rd ACK. Fixes: fed61c4b584c ("selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	selftests: mptcp: dump more info on errors	Paolo Abeni	1	-3/+12
	Even if that may sound completely unlikely, the mptcp implementation is not perfect, yet. When the self-tests report an error we usually need more information of what the scripts currently report. iproute allow provides some additional goodies since a few releases, let's dump them. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12	net: hns3: refactor out hclge_rm_vport_all_mac_table()	Hao Chen	1	-24/+43
	hclge_rm_vport_all_mac_table() is bloated, so split it into separate functions for readability and maintainability. Signed-off-by: Hao Chen <chenhao288@hisilicon.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>