wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2022-03-18	net: lan743x: Add support to display Tx Queue statistics	Raju Lakkaraju	3	-0/+33
	Tx 4 queue statistics display through ethtool application Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-17	net: dsa: felix: add port mirroring support	Vladimir Oltean	1	-0/+20
	Gain support for port mirroring using tc-matchall by forwarding the calls to the ocelot switch library. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: dsa: pass extack to dsa_switch_ops :: port_mirror_add()	Vladimir Oltean	10	-10/+14
	Drivers might have error messages to propagate to user space, most common being that they support a single mirror port. Propagate the netlink extack so that they can inform user space in a verbal way of their limitations. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: mscc: ocelot: offload per-flow mirroring using tc-mirred and VCAP IS2	Vladimir Oltean	5	-3/+42
	Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an offload for tc-flower) is done by setting the MIRROR_ENA bit from the action vector of the filter. The packet is mirrored to the port mask configured in the ANA:ANA:MIRRORPORTS register (the same port mask as the destinations for port-based mirroring). Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress protocol ip \ flower skip_sw ip_proto icmp \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: mscc: ocelot: establish functions for handling VCAP aux resources	Vladimir Oltean	1	-11/+30
	Some VCAP filters utilize resources which are global to the switch, like for example VCAP IS2 policers take an index into a global policer pool. In commit c9a7fe1238e5 ("net: mscc: ocelot: add action of police on vcap_is2"), Xiaoliang expressed this by hooking into the low-level ocelot_vcap_filter_add_to_block() and ocelot_vcap_block_remove_filter() functions, and allocating/freeing the policers from there. Evaluating the code, there probably isn't a better place, but we'll need to do something similar for the mirror ports, and the code will start to look even more hacked up than it is right now. Create two ocelot_vcap_filter_{add,del}_aux_resources() functions to contain the madness, and pollute less the body of other functions such as ocelot_vcap_filter_add_to_block(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: mscc: ocelot: add port mirroring support using tc-matchall	Vladimir Oltean	4	-2/+159
	Ocelot switches perform port-based ingress mirroring if ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if the port is in ANA:ANA:EMIRRORPORTS. Both ingress-mirrored and egress-mirrored frames are copied to the port mask from ANA:ANA:MIRRORPORTS. So the choice of limiting to a single mirror port via ocelot_mirror_get() and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't map very well to the user space model. If the user wants to mirror the ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd have to program ANA:ANA:MIRRORPORTS with BIT(2) \| BIT(4), and that would make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there are no tc-matchall rules to describe those actions. Now, we could offload a matchall rule with multiple mirred actions, one per desired mirror port, and force the user to stick to the multi-action rule format for subsequent matchall filters. But both DSA and ocelot have the flow_offload_has_one_action() check for the matchall offload, plus the fact that it will get cumbersome to cross-check matchall mirrors with flower mirrors (which will be added in the next patch). As a result, we limit the configuration to a single mirror port, with the possibility of lifting the restriction in the future. Frames injected from the CPU don't get egress-mirrored, since they are sent with the BYPASS bit in the injection frame header, and this bypasses the analyzer module (effectively also the mirroring logic). I don't know what to do/say about this. Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress \ matchall skip_sw \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: mscc: ocelot: refactor policer work out of ocelot_setup_tc_cls_matchall	Vladimir Oltean	1	-39/+71
	In preparation for adding port mirroring support to the ocelot driver, the dispatching function ocelot_setup_tc_cls_matchall() must be free of action-specific code. Move port policer creation and deletion to separate functions. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	ptp: ocp: Make debugfs variables the correct bitwidth	Jonathan Lemon	1	-1/+2
	An earlier patch mistakenly changed these variables from u32 to u16, leading to unintended truncation. Restore the original logic. Fixes: a509a7c61e3b ("ptp: ocp: Add support for selectable SMA directions.") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Link: https://lore.kernel.org/r/20220316165347.599154-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: dsa: microchip: ksz8795: handle eee specif erratum	Oleksij Rempel	4	-1/+50
	According to erratum described in DS80000687C[1]: "Module 2: Link drops with some EEE link partners.", we need to "Disable the EEE next page exchange in EEE Global Register 2" 1 - https://ww1.microchip.com/downloads/en/DeviceDoc/KSZ87xx-Errata-DS80000687C.pdf Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220316125529.1489045-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: dsa: mv88e6xxx: MST Offloading	Tobias Waldekranz	2	-8/+255
	Allocate a SID in the STU for each MSTID in use by a bridge and handle the mapping of MSTIDs to VLANs using the SID field of each VTU entry. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: dsa: mv88e6xxx: Export STU as devlink region	Tobias Waldekranz	2	-0/+95
	Export the raw STU data in a devlink region so that it can be inspected from userspace and compared to the current bridge configuration. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: dsa: mv88e6xxx: Disentangle STU from VTU	Tobias Waldekranz	4	-135/+264
	In early LinkStreet silicon (e.g. 6095/6185), the per-VLAN STP states were kept in the VTU - there was no concept of a SID. Later, the information was split into two tables, where the VTU only tracked memberships and deferred the STP state tracking to the STU via a pointer (SID). This meant that a group of VLANs could share the same STU entry. Most likely, this was done to align with MSTP (802.1Q-2018, Clause 13), which is built on this principle. While the VTU is still 4k lines on most devices, the STU is capped at 64 entries. This means that the current stategy, updating STU info whenever a VTU entry is updated, can not easily support MSTP because: - The maximum number of VIDs would also be capped at 64, as we would have to allocate one SID for every VTU entry - even if many VLANs would effectively share the same MST. - MSTP updates would be unnecessarily slow as you would have to iterate over all VLANs that share the same MST. In order to support MSTP offloading in the future, manage the STU as a separate entity from the VTU. Only add support for newer hardware with separate VTU and STU. VTU-only devices can also be supported, but essentially this requires a software implementation of an STU (fanning out state changed to all VLANs tied to the same MST). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: dsa: Handle MST state changes	Tobias Waldekranz	4	-8/+89
	Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes. When a state changes to disabled/blocking/listening, make sure to fast age any dynamic entries in the affected VLANs (those controlled by the MSTI in question). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: dsa: Pass VLAN MSTI migration notifications to driver	Tobias Waldekranz	4	-1/+26
	Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: dsa: Validate hardware support for MST	Tobias Waldekranz	3	-0/+30
	When joining a bridge where MST is enabled, we validate that the proper offloading support is in place, otherwise we fallback to software bridging. When then mode is changed on a bridge in which we are members, we refuse the change if offloading is not supported. At the moment we only check for configurable learning, but this will be further restricted as we support more MST related switchdev events. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Add helper to query a port's MST state	Tobias Waldekranz	2	-0/+31
	This is useful for switchdev drivers who are offloading MST states into hardware. As an example, a driver may wish to flush the FDB for a port when it transitions from forwarding to blocking - which means that the previous state must be discoverable. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Add helper to check if MST is enabled	Tobias Waldekranz	2	-0/+15
	This is useful for switchdev drivers that might want to refuse to join a bridge where MST is enabled, if the hardware can't support it. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Add helper to map an MSTI to a VID set	Tobias Waldekranz	2	-0/+33
	br_mst_get_info answers the question: "On this bridge, which VIDs are mapped to the given MSTI?" This is useful in switchdev drivers, which might have to fan-out operations, relating to an MSTI, per VLAN. An example: When a port's MST state changes from forwarding to blocking, a driver may choose to flush the dynamic FDB entries on that port to get faster reconvergence of the network, but this should only be done in the VLANs that are managed by the MSTI in question. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Notify switchdev drivers of MST state changes	Tobias Waldekranz	2	-0/+25
	Generate a switchdev notification whenever an MST state changes. This notification is keyed by the VLANs MSTI rather than the VID, since multiple VLANs may share the same MST instance. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations	Tobias Waldekranz	3	-0/+66
	Whenever a VLAN moves to a new MSTI, send a switchdev notification so that switchdevs can track a bridge's VID to MSTI mappings. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Notify switchdev drivers of MST mode changes	Tobias Waldekranz	2	-0/+13
	Trigger a switchdev event whenever the bridge's MST mode is enabled/disabled. This allows constituent ports to either perform any required hardware config, or refuse the change if it not supported. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Support setting and reporting MST port states	Tobias Waldekranz	5	-1/+209
	Make it possible to change the port state in a given MSTI by extending the bridge port netlink interface (RTM_SETLINK on PF_BRIDGE).The proposed iproute2 interface would be: bridge mst set dev <PORT> msti <MSTI> state <STATE> Current states in all applicable MSTIs can also be dumped via a corresponding RTM_GETLINK. The proposed iproute interface looks like this: $ bridge mst port msti vb1 0 state forwarding 100 state disabled vb2 0 state forwarding 100 state forwarding The preexisting per-VLAN states are still valid in the MST mode (although they are read-only), and can be queried as usual if one is interested in knowing a particular VLAN's state without having to care about the VID to MSTI mapping (in this example VLAN 20 and 30 are bound to MSTI 100): $ bridge -d vlan port vlan-id vb1 10 state forwarding mcast_router 1 20 state disabled mcast_router 1 30 state disabled mcast_router 1 40 state forwarding mcast_router 1 vb2 10 state forwarding mcast_router 1 20 state forwarding mcast_router 1 30 state forwarding mcast_router 1 40 state forwarding mcast_router 1 Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Allow changing a VLAN's MSTI	Tobias Waldekranz	4	-0/+59
	Allow a VLAN to move out of the CST (MSTI 0), to an independent tree. The user manages the VID to MSTI mappings via a global VLAN setting. The proposed iproute2 interface would be: bridge vlan global set dev br0 vid <VID> msti <MSTI> Changing the state in non-zero MSTIs is still not supported, but will be addressed in upcoming changes. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: bridge: mst: Multiple Spanning Tree (MST) mode	Tobias Waldekranz	9	-4/+176
	Allow the user to switch from the current per-VLAN STP mode to an MST mode. Up to this point, per-VLAN STP states where always isolated from each other. This is in contrast to the MSTP standard (802.1Q-2018, Clause 13.5), where VLANs are grouped into MST instances (MSTIs), and the state is managed on a per-MSTI level, rather that at the per-VLAN level. Perhaps due to the prevalence of the standard, many switching ASICs are built after the same model. Therefore, add a corresponding MST mode to the bridge, which we can later add offloading support for in a straight-forward way. For now, all VLANs are fixed to MSTI 0, also called the Common Spanning Tree (CST). That is, all VLANs will follow the port-global state. Upcoming changes will make this actually useful by allowing VLANs to be mapped to arbitrary MSTIs and allow individual MSTI states to be changed. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	r8169: improve driver unload and system shutdown behavior on DASH-enabled systems	Heiner Kallweit	1	-4/+16
	There's a number of systems supporting DASH remote management. Driver unload and system shutdown can result in the PHY suspending, thus making DASH unusable. Improve this by handling DASH being enabled very similar to WoL being enabled. Tested-by: Yanko Kaneti <yaneti@declera.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/1de3b176-c09c-1654-6f00-9785f7a4f954@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	ethernet: sun: Fix spelling mistake "mis-matched" -> "mismatched"	Colin Ian King	1	-1/+1
	There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220316234620.55885-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net: ethernet: ti: Fix spelling mistake and clean up message	Colin Ian King	1	-1/+1
	There is a spelling mistake in a dev_err message and the MAX_SKB_FRAGS value does not need to be printed between parentheses. Fix this. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220316233455.54541-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	vlan: use correct format characters	Bill Wendling	1	-1/+1
	When compiling with -Wformat, clang emits the following warning: net/8021q/vlanproc.c:284:22: warning: format specifies type 'unsigned short' but the argument has type 'int' [-Wformat] mp->priority, ((mp->vlan_qos >> 13) & 0x7)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213125.2353370-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net/fsl: xgmac_mdio: use correct format characters	Bill Wendling	1	-1/+1
	When compiling with -Wformat, clang emits the following warning: drivers/net/ethernet/freescale/xgmac_mdio.c:243:22: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] phy_id, dev_addr, regnum); ^~~~~~ ./include/linux/dev_printk.h:163:47: note: expanded from macro 'dev_dbg' dev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ ./include/linux/dev_printk.h:129:34: note: expanded from macro 'dev_printk' _dev_printk(level, dev, fmt, ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213114.2352352-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	bnx2x: use correct format characters	Bill Wendling	1	-2/+2
	When compiling with -Wformat, clang emits the following warnings: drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6181:40: warning: format specifies type 'unsigned short' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, len, "%hx.%hx", num >> 16, num); ~~~ ^~~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6181:51: warning: format specifies type 'unsigned short' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, len, "%hx.%hx", num >> 16, num); ~~~ ^~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:47: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:58: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~~~~~~ %x drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c:6196:68: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat] ret = scnprintf(str, *len, "%hhx.%hhx.%hhx", num >> 16, num >> 8, num); ~~~~ ^~~ %x The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Link: https://lore.kernel.org/r/20220316213104.2351651-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	enetc: use correct format characters	Bill Wendling	1	-1/+1
	When compiling with -Wformat, clang emits the following warning: drivers/net/ethernet/freescale/enetc/enetc_mdio.c:151:22: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] phy_id, dev_addr, regnum); ^~~~~~ ./include/linux/dev_printk.h:163:47: note: expanded from macro 'dev_dbg' dev_printk(KERN_DEBUG, dev, dev_fmt(fmt), ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ ./include/linux/dev_printk.h:129:34: note: expanded from macro 'dev_printk' _dev_printk(level, dev, fmt, ##__VA_ARGS__); \ ~~~ ^~~~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for ints and unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Bill Wendling <morbo@google.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220316213109.2352015-1-morbo@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17	net/mlx5: Remove unused fill page array API function	Tariq Toukan	2	-14/+0
	mlx5_fill_page_array API function is not used. Remove it, reduce the number of exported functions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5: Remove unused exported contiguous coherent buffer allocation API	Tariq Toukan	2	-50/+0
	All WQ types moved to using the fragmented allocation API for coherent memory. Contiguous API is not used anymore. Remove it, reduce the number of exported functions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5: CT: Remove extra rhashtable remove on tuple entries	Paul Blakey	1	-1/+0
	On tuple offload del command, tuples are tried to be removed twice from the hashtable, once directly via mlx5_tc_ct_entry_remove_from_tuples() and a second time in the following mlx5_tc_ct_entry_put()-> mlx5_tc_ct_entry_del()->mlx5_tc_ct_entry_remove_from_tuples() call. This doesn't cause any issue since rhashtable first checks if the removed object exists in the hashtable. Remove the extra mlx5_tc_ct_entry_remove_from_tuples(). Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5: DR, Remove hw_ste from mlx5dr_ste to reduce memory	Rongwei Liu	5	-40/+55
	It can be calculated via function mlx5dr_ste_get_hw_ste(). Very simple and lightweight, no need to use a dedicated member. Reduce 8 bytes from struct mlx5dr_ste and its size is 48 bytes now. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5: DR, Remove 4 members from mlx5dr_ste_htbl to reduce memory	Rongwei Liu	5	-46/+37
	Remove chunk_size in struct mlx5dr_icm_chunk and use chunk->size instead. Remove ste_arr/hw_ste_arr/miss_list since they can be accessed from htbl->chunk pointer, no need to keep a copy. This commit reduces 28 bytes from struct mlx5dr_ste_htbl and its size is 32 bytes now. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5: DR, Remove num_of_entries byte_size from struct mlx5_dr_icm_chunk	Rongwei Liu	5	-27/+42
	Target to reduce the memory consumption in large scale of flow rules. They can be calculated quickly from buddy memory pool. 1. num_of_entries calls dr_icm_pool_get_chunk_num_of_entries(). 2. byte_size calls dr_icm_pool_get_chunk_byte_size(). Use chunk size in dr_icm_chunk to speed up and the one in dr_ste_htbl will be removed in the upcoming commit. This commit reduce 8 bytes from struct mlx5_dr_icm_chunk and its current size is 56 bytes. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5: DR, Remove icm_addr from mlx5dr_icm_chunk to reduce memory	Rongwei Liu	8	-33/+54
	It can be calculated quickly from buddy memory pool by function mlx5dr_icm_pool_get_chunk_icm_addr(). This function is very lightweight and straightforward. Reduce 8 bytes and current size of struct mlx5_dr_icm_chunk is 64 bytes. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5: DR, Remove mr_addr rkey from struct mlx5dr_icm_chunk	Rongwei Liu	4	-10/+22
	Reduce memory footprint by removing mr_addr and rkey from mlx5_dr_icm_chunk. 1. mr_addr is calculated by mlx5dr_icm_pool_get_chunk_mr_addr() 2. rkey is calculated by mlx5dr_icm_pool_get_chunk_rkey() The two new functions are very lightweight and straightforward. Reduce 8 bytes from struct mlx5_dr_icm_chunk, its current size is 72 bytes. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5: DR, Adjust structure member to reduce memory hole	Rongwei Liu	1	-3/+3
	Accord to profiling, mlx5dr_ste/mlx5dr_icm_chunk are the two hot structures. Their memory layout can be optimized by adjusting member sequences. Struct mlx5dr_ste size changes from 64 bytes to 56 bytes. In the upcoming commits, struct mlx5dr_icm_chunk memory layout will change automatically after removing some members. Keep it untouched here. Signed-off-by: Rongwei Liu <rongweil@nvidia.com> Reviewed-by: Shun Hao <shunh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5e: Drop cqe_bcnt32 from mlx5e_skb_from_cqe_mpwrq_linear	Maxim Mikityanskiy	1	-6/+5
	The packet size in mlx5e_skb_from_cqe_mpwrq_linear can't overflow u16, since the maximum packet size in linear striding RQ is 2^13 bytes. Drop the unneeded u32 variable. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5e: Drop the len output parameter from mlx5e_xdp_handle	Maxim Mikityanskiy	4	-13/+11
	The len parameter of mlx5e_xdp_handle is used to output the new packet length after XDP has processed the packet and returned XDP_PASS. However, this value can be calculated on the caller site, as the caller knows if it was an XDP_PASS. This commit drops the len parameter and moves the calculation to the caller, reducing the number of parameters passed to the function and preparing for XDP support in non-linear legacy RQ. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5e: RX, Test the XDP program existence out of the handler	Tariq Toukan	4	-25/+39
	Instead of early return inside mlx5e_xdp_handle(), let the caller check if an XDP program is loaded. This allows saving a few unnecessary function calls and calculations in case !prog. Performance test: single core, drop packets in iptables Before: 3,872,504 pps After: 3,975,628 pps (+2.66%) Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5e: Build SKB in place over the first fragment in non-linear legacy RQ	Maxim Mikityanskiy	2	-34/+57
	As a performance optimization and preparation to enabling XDP multi buffer on non-linear legacy RQ, build the linear part of the SKB over the first fragment, instead of allocating a new buffer and copying the first 256 bytes there. To achieve this, add headroom and tailroom to the first fragment. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5e: Add headroom only to the first fragment in legacy RQ	Maxim Mikityanskiy	1	-1/+4
	Currently, rq->buff.headroom is applied to all fragments in legacy RQ. In the linear mode, there is a non-zero headroom, but there is only one fragment per packet. In the non-linear mode, the headroom is zero. This commit changes the logic to apply the headroom only to the first fragment. The current behavior remains the same for both linear and non-linear modes. However, it allows the next commit to enable headroom for the non-linear mode, which will be applied only to the first fragment. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	net/mlx5e: Validate MTU when building non-linear legacy RQ fragments info	Maxim Mikityanskiy	1	-7/+27
	mlx5e_build_rq_frags_info() assumes that MTU is not bigger than PAGE_SIZE * MLX5E_MAX_RX_FRAGS, which is 16K for 4K pages. Currently, the firmware limits MTU to 10K, so the assumption doesn't lead to a bug. This commits adds an additional driver check for reliability, since the firmware boundary might be changed. The calculation is taken to a separate function with a comment explaining it. It's a preparation for the following patches that introcuce XDP multi buffer support. Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-03-17	selftests: vm: fix clang build error multiple output files	Yosry Ahmed	1	-4/+2
	When building the vm selftests using clang, some errors are seen due to having headers in the compilation command: clang -Wall -I ../../../../usr/include -no-pie gup_test.c ../../../../mm/gup_test.h -lrt -lpthread -o .../tools/testing/selftests/vm/gup_test clang: error: cannot specify -o when generating multiple output files make[1]: *** [../lib.mk:146: .../tools/testing/selftests/vm/gup_test] Error 1 Rework to add the header files to LOCAL_HDRS before including ../lib.mk, since the dependency is evaluated in '$(OUTPUT)/%:%.c $(LOCAL_HDRS)' in file lib.mk. Link: https://lkml.kernel.org/r/20220304000645.1888133-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-17	ocfs2: fix crash when initialize filecheck kobj fails	Joseph Qi	1	-11/+11
	Once s_root is set, genric_shutdown_super() will be called if fill_super() fails. That means, we will call ocfs2_dismount_volume() twice in such case, which can lead to kernel crash. Fix this issue by initializing filecheck kobj before setting s_root. Link: https://lkml.kernel.org/r/20220310081930.86305-1-joseph.qi@linux.alibaba.com Fixes: 5f483c4abb50 ("ocfs2: add kobject for online file check") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-17	configs/debug: restore DEBUG_INFO=y for overriding	Qian Cai	1	-0/+1
	Previously, I failed to realize that Kees' patch [1] has not been merged into the mainline yet, and dropped DEBUG_INFO=y too eagerly from the mainline. As the results, "make debug.config" won't be able to flip DEBUG_INFO=n from the existing .config. This should close the gaps of a few weeks before Kees' patch is there, and work regardless of their merging status anyway. Link: https://lore.kernel.org/all/20220125075126.891825-1-keescook@chromium.org/ [1] Link: https://lkml.kernel.org/r/20220308153524.8618-1-quic_qiancai@quicinc.com Signed-off-by: Qian Cai <quic_qiancai@quicinc.com> Reported-by: Daniel Thompson <daniel.thompson@linaro.org> Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-17	mm: swap: get rid of livelock in swapin readahead	Guo Ziliang	1	-1/+1
	In our testing, a livelock task was found. Through sysrq printing, same stack was found every time, as follows: __swap_duplicate+0x58/0x1a0 swapcache_prepare+0x24/0x30 __read_swap_cache_async+0xac/0x220 read_swap_cache_async+0x58/0xa0 swapin_readahead+0x24c/0x628 do_swap_page+0x374/0x8a0 __handle_mm_fault+0x598/0xd60 handle_mm_fault+0x114/0x200 do_page_fault+0x148/0x4d0 do_translation_fault+0xb0/0xd4 do_mem_abort+0x50/0xb0 The reason for the livelock is that swapcache_prepare() always returns EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it cannot jump out of the loop. We suspect that the task that clears the SWAP_HAS_CACHE flag never gets a chance to run. We try to lower the priority of the task stuck in a livelock so that the task that clears the SWAP_HAS_CACHE flag will run. The results show that the system returns to normal after the priority is lowered. In our testing, multiple real-time tasks are bound to the same core, and the task in the livelock is the highest priority task of the core, so the livelocked task cannot be preempted. Although cond_resched() is used by __read_swap_cache_async, it is an empty function in the preemptive system and cannot achieve the purpose of releasing the CPU. A high-priority task cannot release the CPU unless preempted by a higher-priority task. But when this task is already the highest priority task on this core, other tasks will not be able to be scheduled. So we think we should replace cond_resched() with schedule_timeout_uninterruptible(1), schedule_timeout_interruptible will call set_current_state first to set the task state, so the task will be removed from the running queue, so as to achieve the purpose of giving up the CPU and prevent it from running in kernel mode for too long. (akpm: ugly hack becomes uglier. But it fixes the issue in a backportable-to-stable fashion while we hopefully work on something better) Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com Signed-off-by: Guo Ziliang <guo.ziliang@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Acked-by: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roger Quadros <rogerq@kernel.org> Cc: Ziliang Guo <guo.ziliang@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>