wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2020-05-26	mlxsw: switchx2: Move SwitchX-2 trap groups out of main enum	Ido Schimmel	2	-2/+5
	The number of Spectrum trap groups is not infinite, but two identifiers are occupied by SwitchX-2 specific trap groups. Free these identifiers by moving them out of the main enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Reduce priority of locally delivered packets	Ido Schimmel	2	-2/+2
	To align with recent recommended values. Will be configurable by future patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use same trap group for local routes and link-local destination	Ido Schimmel	1	-1/+1
	Packets with an IPv6 link-local destination (i.e., fe80::/10) should not be forwarded and are therefore trapped to the CPU for local delivery. Since these packets are trapped for the same logical reason as packets hitting local routes, associate both traps with the same group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use separate trap group for FID miss	Ido Schimmel	2	-1/+4
	When a packet enters the device it is classified to a filtering identifier (FID) based on the ingress port and VLAN. The FID miss trap is used to trap packets for which a FID could not be found. In mlxsw this trap should only be triggered when a port is enslaved to an OVS bridge and a matching ACL rule could not be found, so as to trigger learning. These packets are therefore completely unrelated to packets hitting local routes and should be in a different group. Move them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use same trap group for various IPv6 packets	Ido Schimmel	1	-3/+3
	Group these various IPv6 packets (e.g., router solicitations, router advertisement) together and subject them to the same policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Rename IPv6 ND trap group	Ido Schimmel	2	-6/+6
	The IPv6 Neighbour Discovery (ND) group will be used for various IPv6 packets, not all of which fall under the definition of ND, so rename it to "IPV6" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use same switch case for identical groups	Ido Schimmel	1	-3/+0
	Trap groups that use the same policer settings can share the same switch case. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mlxsw: spectrum: Use dedicated trap group for ACL trap	Ido Schimmel	2	-1/+4
	Packets that are trapped via tc's trap action are currently subject to the same policer as packets hitting local routes. The latter are critical to the correct functioning of the control plane, while the former are mainly used for traffic inspection. Split the ACL trap to a separate group with its own policer. Use a higher priority for these traps than for traps using mirror action (e.g., ARP, IGMP). Otherwise, packets matching both traps will not be forwarded in hardware (because of trap action) and also not forwarded in software because they will be marked with 'offload_fwd_mark'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	mptcp: attempt coalescing when moving skbs to mptcp rx queue	Florian Westphal	1	-2/+19
	We can try to coalesce skbs we take from the subflows rx queue with the tail of the mptcp rx queue. If successful, the skb head can be discarded early. We can also free the skb extensions, we do not access them after this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	r8169: improve rtl_remove_one	Heiner Kallweit	1	-7/+5
	Don't call netif_napi_del() manually, free_netdev() does this for us. In addition reorder calls to match reverse order of calls in probe(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	ARM: dts: imx6qdl-sabresd: enable fec wake-on-lan	Fugang Duan	1	-0/+1
	Enable ethernet wake-on-lan feature for imx6q/dl/qp sabresd boards since the PHY clock is supplied by external osc. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	ARM: dts: imx: add ethernet stop mode property	Fugang Duan	5	-1/+7
	- Update the imx6qdl gpr property to define gpr register offset and bit in DT. - Add imx6sx/imx6ul/imx7d ethernet stop mode property. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	dt-bindings: fec: update the gpr property	Fugang Duan	1	-2/+5
	- rename the 'gpr' property string to 'fsl,stop-mode'. - Update the property to define gpr register offset and bit in DT, since different instance have different gpr bit. v2: * rename 'gpr' property string to 'fsl,stop-mode'. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: ethernet: fec: move GPR register offset and bit into DT	Fugang Duan	1	-12/+12
	The commit da722186f654 (net: fec: set GPR bit on suspend by DT configuration) set the GPR reigster offset and bit in driver for wake on lan feature. But it introduces two issues here: - one SOC has two instances, they have different bit - different SOCs may have different offset and bit So to support wake-on-lan feature on other i.MX platforms, it should configure the GPR reigster offset and bit from DT. So the patch is to improve the commit da722186f654 (net: fec: set GPR bit on suspend by DT configuration) to support multiple ethernet instances on i.MX series. v2: * switch back to store the quirks bitmask in driver_data v3: * suggested by Sascha Hauer, use a struct fec_devinfo for abstracting differences between different hardware variants, it can give more freedom to describe the differences. Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net/smc: mark smc_pnet_policy as const	Dmitry Vyukov	1	-1/+1
	Netlink policies are generally declared as const. This is safer and prevents potential bugs. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: phy: mscc-miim: read poll when high resolution timers are disabled	Antoine Tenart	2	-6/+19
	The driver uses a read polling mechanism to check the status of the MDIO bus, to know if it is ready to accept next commands. This polling mechanism uses usleep_delay() under the hood between reads which is fine as long as high resolution timers are enabled. Otherwise the delays will end up to be much longer than expected. This patch fixes this by using udelay() under the hood when CONFIG_HIGH_RES_TIMERS isn't enabled. This increases CPU usage. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: phy: mscc-miim: improve waiting logic	Antoine Tenart	1	-2/+13
	The MSCC MIIM MDIO driver uses a waiting logic to wait for the MDIO bus to be ready to accept next commands. It does so by polling the BUSY status bit which indicates the MDIO bus has completed all pending operations. This can take time, and the controller supports writing the next command as soon as there are no pending commands (which happens while the MDIO bus is busy completing its current command). This patch implements this improved logic by adding an helper to poll the PENDING status bit, and by adjusting where we should wait for the bus to not be busy or to not be pending. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: phy: mscc-miim: remove redundant timeout check	Antoine Tenart	1	-6/+2
	readl_poll_timeout already returns -ETIMEDOUT if the condition isn't satisfied, there's no need to check again the condition after calling it. Remove the redundant timeout check. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: phy: mscc-miim: use more reasonable delays	Antoine Tenart	1	-1/+1
	The MSCC MIIM MDIO driver uses delays to read poll a status register. I made multiple tests on a Ocelot PCS120 platform which led me to reduce those delays. The delay in between which the polling function is allowed to sleep is reduced from 100us to 50us which in almost all cases is a good value to succeed at the first retry. The overall delay is also lowered as the prior value was really way to high, 10000us is large enough. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	net: mdiobus: add clause 45 mdiobus accessors	Russell King	8	-36/+52
	There is a recurring pattern throughout some of the PHY code converting a devad and regnum to our packed clause 45 representation. Rather than having this scattered around the code, let's put a common translation function in mdio.h, and provide some register accessors. Convert the phylib core, phylink, bcm87xx and cortina to use these. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	cls_flower: Support filtering on multiple MPLS Label Stack Entries	Guillaume Nault	2	-1/+265
	With struct flow_dissector_key_mpls now recording the first FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of these LSEs independently. In order to avoid creating new netlink attributes for every possible depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute that contains the list of LSEs to match. Each LSE is represented by another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains the attributes representing the depth and the MPLS fields to match at this depth (label, TTL, etc.). For each MPLS field, the mask is always set to all-ones, as this is what the original API did. We could allow user configurable masks in the future if there is demand for more flexibility. The new API also allows to only specify an LSE depth. In that case, Flower only verifies that the MPLS label stack depth is greater or equal to the provided depth (that is, an LSE exists at this depth). Filters that only match on one (or more) fields of the first LSE are dumped using the old netlink attributes, to avoid confusing user space programs that don't understand the new API. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	flow_dissector: Parse multiple MPLS Label Stack Entries	Guillaume Nault	5	-52/+132
	The current MPLS dissector only parses the first MPLS Label Stack Entry (second LSE can be parsed too, but only to set a key_id). This patch adds the possibility to parse several LSEs by making __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long as the Bottom Of Stack bit hasn't been seen, up to a maximum of FLOW_DIS_MPLS_MAX entries. FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for many practical purposes, without wasting too much space. To record the parsed values, flow_dissector_key_mpls is modified to store an array of stack entries, instead of just the values of the first one. A bit field, "used_lses", is also added to keep track of the LSEs that have been set. The objective is to avoid defining a new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack. TC flower is adapted for the new struct flow_dissector_key_mpls layout. Matching on several MPLS Label Stack Entries will be added in the next patch. The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and mlx5's parse_tunnel() now verify that the rule only uses the first LSE and fail if it doesn't. Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is slightly modified. Instead of recording the first Entropy Label, it now records the last one. This shouldn't have any consequences since there doesn't seem to have any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY in the tree. We'd probably better do a hash of all parsed MPLS labels instead (excluding reserved labels) anyway. That'd give better entropy and would probably also simplify the code. But that's not the purpose of this patch, so I'm keeping that as a future possible improvement. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	tipc: add test for Nagle algorithm effectiveness	Tuong Lien	3	-17/+64
	When streaming in Nagle mode, we try to bundle small messages from user as many as possible if there is one outstanding buffer, i.e. not ACK-ed by the receiving side, which helps boost up the overall throughput. So, the algorithm's effectiveness really depends on when Nagle ACK comes or what the specific network latency (RTT) is, compared to the user's message sending rate. In a bad case, the user's sending rate is low or the network latency is small, there will not be many bundles, so making a Nagle ACK or waiting for it is not meaningful. For example: a user sends its messages every 100ms and the RTT is 50ms, then for each messages, we require one Nagle ACK but then there is only one user message sent without any bundles. In a better case, even if we have a few bundles (e.g. the RTT = 300ms), but now the user sends messages in medium size, then there will not be any difference at all, that says 3 x 1000-byte data messages if bundled will still result in 3 bundles with MTU = 1500. When Nagle is ineffective, the delay in user message sending is clearly wasted instead of sending directly. Besides, adding Nagle ACKs will consume some processor load on both the sending and receiving sides. This commit adds a test on the effectiveness of the Nagle algorithm for an individual connection in the network on which it actually runs. Particularly, upon receipt of a Nagle ACK we will compare the number of bundles in the backlog queue to the number of user messages which would be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle mode will be kept for further message sending. Otherwise, we will leave Nagle and put a 'penalty' on the connection, so it will have to spend more 'one-way' messages before being able to re-enter Nagle. In addition, the 'ack-required' bit is only set when really needed that the number of Nagle ACKs will be reduced during Nagle mode. Testing with benchmark showed that with the patch, there was not much difference in throughput for small messages since the tool continuously sends messages without a break, so Nagle would still take in effect. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	tipc: add support for broadcast rcv stats dumping	Tuong Lien	9	-56/+101
	This commit enables dumping the statistics of a broadcast-receiver link like the traditional 'broadcast-link' one (which is for broadcast- sender). The link dumping can be triggered via netlink (e.g. the iproute2/tipc tool) by the link flag - 'TIPC_NLA_LINK_BROADCAST' as the indicator. The name of a broadcast-receiver link of a specific peer will be in the format: 'broadcast-link:<peer-id>'. For example: Link <broadcast-link:1001002> Window:50 packets RX packets:7841 fragments:2408/440 bundles:0/0 TX packets:0 fragments:0/0 bundles:0/0 RX naks:0 defs:124 dups:0 TX naks:21 acks:0 retrans:0 Congestion link:0 Send queue max:0 avg:0 In addition, the broadcast-receiver link statistics can be reset in the usual way via netlink by specifying that link name in command. Note: the 'tipc_link_name_ext()' is removed because the link name can now be retrieved simply via the 'l->name'. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	tipc: enable broadcast retrans via unicast	Tuong Lien	6	-11/+28
	In some environment, broadcast traffic is suppressed at high rate (i.e. a kind of bandwidth limit setting). When it is applied, TIPC broadcast can still run successfully. However, when it comes to a high load, some packets will be dropped first and TIPC tries to retransmit them but the packet retransmission is intentionally broadcast too, so making things worse and not helpful at all. This commit enables the broadcast retransmission via unicast which only retransmits packets to the specific peer that has really reported a gap i.e. not broadcasting to all nodes in the cluster, so will prevent from being suppressed, and also reduce some overheads on the other peers due to duplicates, finally improve the overall TIPC broadcast performance. Note: the functionality can be turned on/off via the sysctl file: echo 1 > /proc/sys/net/tipc/bc_retruni echo 0 > /proc/sys/net/tipc/bc_retruni Default is '0', i.e. the broadcast retransmission still works as usual. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	tipc: add back link trace events	Tuong Lien	2	-5/+11
	In the previous commit ("tipc: add Gap ACK blocks support for broadcast link"), we have removed the following link trace events due to the code changes: - tipc_link_bc_ack - tipc_link_retrans This commit adds them back along with some minor changes to adapt to the new code. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	tipc: introduce Gap ACK blocks for broadcast link	Tuong Lien	5	-185/+293
	As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput by Gap ACK blocks"), we apply the same mechanism for the broadcast link as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will consist of two parts built for both the broadcast and unicast types: 31 16 15 0 +-------------+-------------+-------------+-------------+ \| bgack_cnt \| ugack_cnt \| len \| +-------------+-------------+-------------+-------------+ - \| gap \| ack \| \| +-------------+-------------+-------------+-------------+ > bc gacks : : : \| +-------------+-------------+-------------+-------------+ - \| gap \| ack \| \| +-------------+-------------+-------------+-------------+ > uc gacks : : : \| +-------------+-------------+-------------+-------------+ - which is "automatically" backward-compatible. We also increase the max number of Gap ACK blocks to 128, allowing upto 64 blocks per type (total buffer size = 516 bytes). Besides, the 'tipc_link_advance_transmq()' function is refactored which is applicable for both the unicast and broadcast cases now, so some old functions can be removed and the code is optimized. With the patch, TIPC broadcast is more robust regardless of packet loss or disorder, latency, ... in the underlying network. Its performance is boost up significantly. For example, experiment with a 5% packet loss rate results: $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 0m 42.46s user 0m 1.16s sys 0m 17.67s Without the patch: $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 8m 27.94s user 0m 0.55s sys 0m 2.38s Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	qed: Add EDPM mode type for user-fw compatibility	Yuval Basson	4	-0/+8
	In older FW versions the completion flag was treated as the ack flag in edpm messages. Expose the FW option of setting which mode the QP is in by adding a flag to the qedr <-> qed API. Flag is added for backward compatibility with libqedr. This flag will be set by qedr after determining whether the libqedr is using the updated version. Fixes: f10939403352 ("qed: Add support for QP verbs") Signed-off-by: Yuval Basson <yuval.bason@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	tcp: tcp_v4_err() icmp skb is named icmp_skb	Eric Dumazet	1	-1/+1
	I missed the fact that tcp_v4_err() differs from tcp_v6_err(). After commit 4d1a2d9ec1c1 ("Rename skb to icmp_skb in tcp_v4_err()") the skb argument has been renamed to icmp_skb only in one function. I will in a future patch reconciliate these functions to avoid this kind of confusion. Fixes: 45af29ca761c ("tcp: allow traceroute -Mtcp for unpriv users") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26	batman-adv: Revert "disable ethtool link speed detection when auto negotiation off"	Sven Eckelmann	1	-14/+1
	The commit 8c46fcd78308 ("batman-adv: disable ethtool link speed detection when auto negotiation off") disabled the usage of ethtool's link_ksetting when auto negotation was enabled due to invalid values when used with tun/tap virtual net_devices. According to the patch, automatic measurements should be used for these kind of interfaces. But there are major flaws with this argumentation: * automatic measurements are not implemented * auto negotiation has nothing to do with the validity of the retrieved values The first point has to be fixed by a longer patch series. The "validity" part of the second point must be addressed in the same patch series by dropping the usage of ethtool's link_ksetting (thus always doing automatic measurements over ethernet). Drop the patch again to have more default values for various net_device types/configurations. The user can still overwrite them using the batadv_hardif's BATADV_ATTR_THROUGHPUT_OVERRIDE. Reported-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-05-25	r8169: sync RTL8168f/RTL8411 hw config with vendor driver	Heiner Kallweit	1	-3/+3
	Sync hw config for RTL8168f/RTL8411 with r8168 vendor driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	r8169: sync RTL8168evl hw config with vendor driver	Heiner Kallweit	1	-3/+5
	Sync hw config for RTL8168evl with r8168 vendor driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	r8169: sync RTL8168h hw config with vendor driver	Heiner Kallweit	1	-2/+1
	Sync hw config for RTL8168h with r8168 vendor driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	r8169: sync RTL8168g hw config with vendor driver	Heiner Kallweit	1	-0/+1
	Sync hw config for RTL8168g with r8168 vendor driver. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	bridge: mrp: Fix out-of-bounds read in br_mrp_parse	Horatiu Vultur	1	-0/+6
	The issue was reported by syzbot. When the function br_mrp_parse was called with a valid net_bridge_port, the net_bridge was an invalid pointer. Therefore the check br->stp_enabled could pass/fail depending where it was pointing in memory. The fix consists of setting the net_bridge pointer if the port is a valid pointer. Reported-by: syzbot+9c6f0f1f8e32223df9a4@syzkaller.appspotmail.com Fixes: 6536993371fa ("bridge: mrp: Integrate MRP into the bridge") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	drivers: ipa: print dev_err info accurately	Wang Wenhu	1	-2/+2
	Print certain name string instead of hard-coded "memory" for dev_err output, which would be more accurate and helpful for debugging. Signed-off-by: Wang Wenhu <wenhu.wang@vivo.com> Cc: Alex Elder <elder@kernel.org> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	ptp_clock: Let the ADJ_OFFSET interface respect the ADJ_NANO flag for PHC devices.	Richard Cochran	1	-2/+8
	In commit 184ecc9eb260d5a3bcdddc5bebd18f285ac004e9 ("ptp: Add adjphase function to support phase offset control.") the PTP Hardware Clock interface expanded to support the ADJ_OFFSET offset mode. However, the implementation did not respect the traditional yet pedantic distinction between units of microseconds and nanoseconds signaled by the ADJ_NANO flag. This patch fixes the issue by adding logic to handle that flag. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	tcp: allow traceroute -Mtcp for unpriv users	Eric Dumazet	2	-0/+4
	Unpriv users can use traceroute over plain UDP sockets, but not TCP ones. $ traceroute -Mtcp 8.8.8.8 You do not have enough privileges to use this traceroute method. $ traceroute -n -Mudp 8.8.8.8 traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets 1 192.168.86.1 3.631 ms 3.512 ms 3.405 ms 2 10.1.10.1 4.183 ms 4.125 ms 4.072 ms 3 96.120.88.125 20.621 ms 19.462 ms 20.553 ms 4 96.110.177.65 24.271 ms 25.351 ms 25.250 ms 5 69.139.199.197 44.492 ms 43.075 ms 44.346 ms 6 68.86.143.93 27.969 ms 25.184 ms 25.092 ms 7 96.112.146.18 25.323 ms 96.112.146.22 25.583 ms 96.112.146.26 24.502 ms 8 72.14.239.204 24.405 ms 74.125.37.224 16.326 ms 17.194 ms 9 209.85.251.9 18.154 ms 209.85.247.55 14.449 ms 209.85.251.9 26.296 ms^C We can easily support traceroute over TCP, by queueing an error message into socket error queue. Note that applications need to set IP_RECVERR/IPV6_RECVERR option to enable this feature, and that the error message is only queued while in SYN_SNT state. socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3 setsockopt(3, SOL_IPV6, IPV6_RECVERR, [1], 4) = 0 setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0 setsockopt(3, SOL_IPV6, IPV6_UNICAST_HOPS, [5], 4) = 0 connect(3, {sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, 28) = -1 EHOSTUNREACH (No route to host) recvmsg(3, {msg_name={sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, msg_namelen=1024->28, msg_iov=[{iov_base="`\r\337\320\0004\6\1&\7\370\260\200\231\16\27\0\0\0\0\0\0\0\0 \2\n\5f\10\2\227"..., iov_len=1024}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1590340680, tv_usec=272424}}, {cmsg_len=60, cmsg_level=SOL_IPV6, cmsg_type=IPV6_RECVERR}], msg_controllen=96, msg_flags=MSG_ERRQUEUE}, MSG_ERRQUEUE) = 144 Suggested-by: Maciej Żenczykowski <maze@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	bnx2x: allow bnx2x_bsc_read() to schedule	Eric Dumazet	1	-11/+15
	bnx2x_warpcore_read_sfp_module_eeprom() can call bnx2x_bsc_read() three times before giving up. This causes latency blips of at least 31 ms (58 ms being reported by our teams) Convert the long lasting loops of udelay() to usleep_range() ones, and breaks the loops on precise time tracking. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ariel Elior <aelior@marvell.com> Cc: Sudarsana Kalluru <skalluru@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	ipv4: potential underflow in compat_ip_setsockopt()	Dan Carpenter	1	-1/+1
	The value of "n" is capped at 0x1ffffff but it checked for negative values. I don't think this causes a problem but I'm not certain and it's harmless to prevent it. Fixes: 2e04172875c9 ("ipv4: do compat setsockopt for MCAST_MSFILTER directly") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-25	mvneta: MVNETA_SKB_HEADROOM set last 3 bits to zero	Sven Auhagen	1	-1/+2
	For XDP the MVNETA_SKB_HEADROOM is used as an offset for the received data. The MVNETA manual states that the last 3 bits assumed to be 0. This is currently the case but lets make it explicit in the definition to prevent future problems. Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	vxlan: Do not assume RTNL is held in vxlan_fdb_info()	Ido Schimmel	1	-3/+12
	vxlan_fdb_info() is not always called with RTNL held or from an RCU read-side critical section. For example, in the following call path: vxlan_cleanup() vxlan_fdb_destroy() vxlan_fdb_notify() __vxlan_fdb_notify() vxlan_fdb_info() The use of rtnl_dereference() can therefore result in the following splat [1]. Fix this by dereferencing the nexthop under RCU read-side critical section. [1] [May24 22:56] ============================= [ +0.004676] WARNING: suspicious RCU usage [ +0.004614] 5.7.0-rc5-custom-16219-g201392003491 #2772 Not tainted [ +0.007116] ----------------------------- [ +0.004657] drivers/net/vxlan.c:276 suspicious rcu_dereference_check() usage! [ +0.008164] other info that might help us debug this: [ +0.009126] rcu_scheduler_active = 2, debug_locks = 1 [ +0.007504] 5 locks held by bash/6892: [ +0.004392] #0: ffff8881d47e3410 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: __do_execve_file.isra.27+0x392/0x23c0 [ +0.011795] #1: ffff8881d47e34b0 (&sig->exec_update_mutex){+.+.}-{3:3}, at: flush_old_exec+0x510/0x2030 [ +0.010947] #2: ffff8881a141b0b0 (ptlock_ptr(page)#2){+.+.}-{2:2}, at: unmap_page_range+0x9c0/0x2590 [ +0.010585] #3: ffff888230009d50 ((&vxlan->age_timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x800 [ +0.010192] #4: ffff888183729bc8 (&vxlan->hash_lock[h]){+.-.}-{2:2}, at: vxlan_cleanup+0x133/0x4a0 [ +0.010382] stack backtrace: [ +0.005103] CPU: 1 PID: 6892 Comm: bash Not tainted 5.7.0-rc5-custom-16219-g201392003491 #2772 [ +0.009675] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ +0.010155] Call Trace: [ +0.002775] <IRQ> [ +0.002313] dump_stack+0xfd/0x178 [ +0.003895] lockdep_rcu_suspicious+0x14a/0x153 [ +0.005157] vxlan_fdb_info+0xe39/0x12a0 [ +0.004775] __vxlan_fdb_notify+0xb8/0x160 [ +0.004672] vxlan_fdb_notify+0x8e/0xe0 [ +0.004370] vxlan_fdb_destroy+0x117/0x330 [ +0.004662] vxlan_cleanup+0x1aa/0x4a0 [ +0.004329] call_timer_fn+0x1c4/0x800 [ +0.004357] run_timer_softirq+0x129d/0x17e0 [ +0.004762] __do_softirq+0x24c/0xaef [ +0.004232] irq_exit+0x167/0x190 [ +0.003767] smp_apic_timer_interrupt+0x1dd/0x6a0 [ +0.005340] apic_timer_interrupt+0xf/0x20 [ +0.004620] </IRQ> Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Amit Cohen <amitc@mellanox.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	mlxsw: spectrum: Fix spelling mistake in trap's name	Ido Schimmel	2	-4/+4
	Fix incorrect spelling of "advertisement". Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	mlxsw: spectrum: Use dedicated trap group for sampled packets	Ido Schimmel	2	-1/+3
	The rate with which packets are sampled is determined by user space, so there is no need to associate such packets with a policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	mlxsw: spectrum: Use same trap group for IPv6 ND and ARP packets	Ido Schimmel	1	-4/+4
	Both packet types are needed for the same reason (neighbour discovery), so associate them with the same trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	mlxsw: spectrum: Rename ARP trap group	Ido Schimmel	2	-7/+8
	The ARP trap group will be used for IPv6 ND traps in the next patch, so rename it to "NEIGH_DISCOVERY" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	mlxsw: spectrum_trap: Remove unnecessary field	Ido Schimmel	1	-6/+1
	Now that traffic class (TC) and priority are set to the same value, there is no need to store both. Remove the first. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	mlxsw: spectrum: Align TC and trap priority	Ido Schimmel	2	-5/+5
	The traffic class (TC) attribute of packet traps determines through which TC a packet trap will be scheduled through the CPU port. The priority attribute determines which trap will be triggered in case several packet traps match a packet. We try to configure these attributes to the same value for all packet traps as there is little reason not to. Some packet traps did not use the same value, so rectify that now. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	mlxsw: spectrum_buffers: Assign non-zero quotas to TC 0 of the CPU port	Ido Schimmel	1	-1/+1
	As explained in commit 9ffcc3725f09 ("mlxsw: spectrum: Allow packets to be trapped from any PG"), incoming packets can be admitted to the shared buffer and forwarded / trapped, if: (Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres && Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres) \|\| (Ingress{Port}.Usage < Min \|\| Ingress{Port,PG} < Min \|\| Egress{Port}.Usage < Min \|\| Egress{Port,TC}.Usage < Min) Trapped packets are scheduled to transmission through the CPU port. Currently, the minimum and maximum quotas of traffic class (TC) 0 of the CPU port are 0, which means it is not usable. Assign non-zero quotas to TC 0 of the CPU port, so that it could be utilized by subsequent patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-24	mlxsw: spectrum: Change default rate and priority of DHCP packets	Ido Schimmel	1	-2/+2
	Reduce the default acceptable rate of DHCP packets to 128 packets per second and reduce their priority. This is reasonable given the Spectrum ASICs are limited to 128 ports at the moment. These are only the default values. Users will be able to modify them via devlink-trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>