linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-05-13	net: phy: smsc: Cope with hot-removal in interrupt handler	Lukas Wunner	1	-1/+3
	If reading the Interrupt Source Flag register fails with -ENODEV, then the PHY has been hot-removed and the correct response is to bail out instead of throwing a WARN splat and attempting to suspend the PHY. The PHY should be stopped in due course anyway as the kernel asynchronously tears down the device. Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	net: phy: smsc: Cache interrupt mask	Lukas Wunner	1	-13/+11
	Cache the interrupt mask to avoid re-reading it from the PHY upon every interrupt. This will simplify a subsequent commit which detects hot-removal in the interrupt handler and bails out. Analyzing and debugging PHY transactions also becomes simpler if such redundant reads are avoided. Last not least, interrupt overhead and latency is slightly improved. Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling	Lukas Wunner	1	-52/+61
	Link status of SMSC LAN95xx chips is polled once per second, even though they're capable of signaling PHY interrupts through the MAC layer. Forward those interrupts to the PHY driver to avoid polling. Benefits are reduced bus traffic, reduced CPU overhead and quicker interface bringup. Polling was introduced in 2016 by commit d69d16949346 ("usbnet: smsc95xx: fix link detection for disabled autonegotiation"). Back then, the LAN95xx driver neglected to enable the ENERGYON interrupt, hence couldn't detect link-up events when auto-negotiation was disabled. The proper solution would have been to enable the ENERGYON interrupt instead of polling. Since then, PHY handling was moved from the LAN95xx driver to the SMSC PHY driver with commit 05b35e7eb9a1 ("smsc95xx: add phylib support"). That PHY driver is capable of link detection with auto-negotiation disabled because it enables the ENERGYON interrupt. Note that signaling interrupts through the MAC layer not only works with the integrated PHY, but also with an external PHY, provided its interrupt pin is attached to LAN95xx's nPHY_INT pin. In the unlikely event that the interrupt pin of an external PHY is attached to a GPIO of the SoC (or not connected at all), the driver can be amended to retrieve the irq from the PHY's of_node. To forward PHY interrupts to phylib, it is not sufficient to call phy_mac_interrupt(). Instead, the PHY's interrupt handler needs to run so that PHY interrupts are cleared. That's because according to page 119 of the LAN950x datasheet, "The source of this interrupt is a level. The interrupt persists until it is cleared in the PHY." https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf Therefore, create an IRQ domain with a single IRQ for the PHY. In the future, the IRQ domain may be extended to support the 11 GPIOs on the LAN95xx. Normally the PHY interrupt should be masked until the PHY driver has cleared it. However masking requires a (sleeping) USB transaction and interrupts are received in (non-sleepable) softirq context. I decided not to mask the interrupt at all (by using the dummy_irq_chip's noop ->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec intervals and normally that's sufficient to wake the PHY driver's IRQ thread and have it clear the interrupt. If it does take longer, worst thing that can happen is the IRQ thread is woken again. No big deal. Because PHY interrupts are now perpetually enabled, there's no need to selectively enable them on suspend. So remove all invocations of smsc95xx_enable_phy_wakeup_interrupts(). In smsc95xx_resume(), move the call of phy_init_hw() before usbnet_resume() (which restarts the status URB) to ensure that the PHY is fully initialized when an interrupt is handled. Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> # from a PHY perspective Cc: Andre Edich <andre.edich@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	usbnet: smsc95xx: Avoid link settings race on interrupt reception	Lukas Wunner	1	-7/+9
	When a PHY interrupt is signaled, the SMSC LAN95xx driver updates the MAC full duplex mode and PHY flow control registers based on cached data in struct phy_device: smsc95xx_status() # raises EVENT_LINK_RESET usbnet_deferred_kevent() smsc95xx_link_reset() # uses cached data in phydev Simultaneously, phylib polls link status once per second and updates that cached data: phy_state_machine() phy_check_link_status() phy_read_status() lan87xx_read_status() genphy_read_status() # updates cached data in phydev If smsc95xx_link_reset() wins the race against genphy_read_status(), the registers may be updated based on stale data. E.g. if the link was previously down, phydev->duplex is set to DUPLEX_UNKNOWN and that's what smsc95xx_link_reset() will use, even though genphy_read_status() may update it to DUPLEX_FULL afterwards. PHY interrupts are currently only enabled on suspend to trigger wakeup, so the impact of the race is limited, but we're about to enable them perpetually. Avoid the race by delaying execution of smsc95xx_link_reset() until phy_state_machine() has done its job and calls back via smsc95xx_handle_link_change(). Signaling EVENT_LINK_RESET on wakeup is not necessary because phylib picks up link status changes through polling. So drop the declaration of a ->link_reset() callback. Note that the semicolon on a line by itself added in smsc95xx_status() is a placeholder for a function call which will be added in a subsequent commit. That function call will actually handle the INT_ENP_PHY_INT_ interrupt. Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	usbnet: smsc95xx: Don't reset PHY behind PHY driver's back	Lukas Wunner	1	-18/+0
	smsc95xx_reset() resets the PHY behind the PHY driver's back, which seems like a bad idea generally. Remove that portion of the function. We're about to use PHY interrupts instead of polling to detect link changes on SMSC LAN95xx chips. Because smsc95xx_reset() is called from usbnet_open(), PHY interrupt settings are lost whenever the net_device is brought up. There are two other callers of smsc95xx_reset(), namely smsc95xx_bind() and smsc95xx_reset_resume(), and both may indeed benefit from a PHY reset. However they already perform one through their calls to phy_connect_direct() and phy_init_hw(). Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: Martyn Welch <martyn.welch@collabora.com> Cc: Gabriel Hojda <ghojda@yo2urs.ro> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	usbnet: smsc95xx: Don't clear read-only PHY interrupt	Lukas Wunner	1	-4/+0
	Upon receiving data from the Interrupt Endpoint, the SMSC LAN95xx driver attempts to clear the signaled interrupts by writing "all ones" to the Interrupt Status Register. However the driver only ever enables a single type of interrupt, namely the PHY Interrupt. And according to page 119 of the LAN950x datasheet, its bit in the Interrupt Status Register is read-only. There's no other way to clear it than in a separate PHY register: https://www.microchip.com/content/dam/mchp/documents/UNG/ProductDocuments/DataSheets/LAN950x-Data-Sheet-DS00001875D.pdf Consequently, writing "all ones" to the Interrupt Status Register is pointless and can be dropped. Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	usbnet: Run unregister_netdev() before unbind() again	Lukas Wunner	3	-10/+5
	Commit 2c9d6c2b871d ("usbnet: run unbind() before unregister_netdev()") sought to fix a use-after-free on disconnect of USB Ethernet adapters. It turns out that a different fix is necessary to address the issue: https://lore.kernel.org/netdev/18b3541e5372bc9b9fc733d422f4e698c089077c.1650177997.git.lukas@wunner.de/ So the commit was not necessary. The commit made binding and unbinding of USB Ethernet asymmetrical: Before, usbnet_probe() first invoked the ->bind() callback and then register_netdev(). usbnet_disconnect() mirrored that by first invoking unregister_netdev() and then ->unbind(). Since the commit, the order in usbnet_disconnect() is reversed and no longer mirrors usbnet_probe(). One consequence is that a PHY disconnected (and stopped) in ->unbind() is afterwards stopped once more by unregister_netdev() as it closes the netdev before unregistering. That necessitates a contortion in ->stop() because the PHY may only be stopped if it hasn't already been disconnected. Reverting the commit allows making the call to phy_stop() unconditional in ->stop(). Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> # LAN9514/9512/9500 Tested-by: Ferry Toth <fntoth@gmail.com> # LAN9514 Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Oliver Neukum <oneukum@suse.com> Cc: Martyn Welch <martyn.welch@collabora.com> Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	net: ethernet: fix platform_no_drv_owner.cocci warning	Yang Li	1	-1/+0
	Remove .owner field if calls are used which set it automatically. ./drivers/net/ethernet/sunplus/spl2sw_driver.c:569:3-8: No need to set .owner here. The core will do it. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	net: page_pool: add page allocation stats for two fast page allocate path	Jie Wang	1	-1/+4
	Currently If use page pool allocation stats to analysis a RX performance degradation problem. These stats only count for pages allocate from page_pool_alloc_pages. But nic drivers such as hns3 use page_pool_dev_alloc_frag to allocate pages, so page stats in this API should also be counted. Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-13	net: ethernet: Use swap() instead of open coding it	Jiapeng Chong	1	-16/+4
	Clean the following coccicheck warning: ./drivers/net/ethernet/sunplus/spl2sw_driver.c:217:27-28: WARNING opportunity for swap(). ./drivers/net/ethernet/sunplus/spl2sw_driver.c:222:27-28: WARNING opportunity for swap(). Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-12	net: selftests: Stress reuseport listen	Martin KaFai Lau	3	-0/+132
	This patch adds a test that has 300 VIPs listening on port 443. Each VIP:443 will have 80 listening socks by using SO_REUSEPORT. Thus, it will have 24000 listening socks. Before removing the port only listening_hash, all socks will be in the same port 443 bucket and inet_reuseport_add_sock() spends much time to walk through the bucket. After removing the port only listening_hash and move all usage to the port+addr lhash2, each bucket in the ideal case has 80 sk which is much smaller than before. Here is the test result from a qemu: Before: listen 24000 socks took 210.210485362 (~210s) After: listen 24000 socks took 0.207173 (~210ms) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: inet: Retire port only listening_hash	Martin KaFai Lau	9	-101/+26
	The listen sk is currently stored in two hash tables, listening_hash (hashed by port) and lhash2 (hashed by port and address). After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address") and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"), the TCP-SYN lookup fast path does not use listening_hash. The commit 05c0b35709c5 ("tcp: seq_file: Replace listening_hash with lhash2") also moved the seq_file (/proc/net/tcp) iteration usage from listening_hash to lhash2. There are still a few listening_hash usages left. One of them is inet_reuseport_add_sock() which uses the listening_hash to search a listen sk during the listen() system call. This turns out to be very slow on use cases that listen on many different VIPs at a popular port (e.g. 443). [ On top of the slowness in adding to the tail in the IPv6 case ]. The latter patch has a selftest to demonstrate this case. This patch takes this chance to move all remaining listening_hash usages to lhash2 and then retire listening_hash. Since most changes need to be done together, it is hard to cut the listening_hash to lhash2 switch into small patches. The changes in this patch is highlighted here for the review purpose. 1. Because of the listening_hash removal, lhash2 can use the sk->sk_nulls_node instead of the icsk->icsk_listen_portaddr_node. This will also keep the sk_unhashed() check to work as is after stop adding sk to listening_hash. The union is removed from inet_listen_hashbucket because only nulls_head is needed. 2. icsk->icsk_listen_portaddr_node and its helpers are removed. 3. The current lhash2 users needs to iterate with sk_nulls_node instead of icsk_listen_portaddr_node. One case is in the inet[6]_lhash2_lookup(). Another case is the seq_file iterator in tcp_ipv4.c. One thing to note is sk_nulls_next() is needed because the old inet_lhash2_for_each_icsk_continue() does a "next" first before iterating. 4. Move the remaining listening_hash usage to lhash2 inet_reuseport_add_sock() which this series is trying to improve. inet_diag.c and mptcp_diag.c are the final two remaining use cases and is moved to lhash2 now also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: inet: Open code inet_hash2 and inet_unhash2	Martin KaFai Lau	1	-55/+33
	This patch folds lhash2 related functions into __inet_hash and inet_unhash. This will make the removal of the listening_hash in a latter patch easier to review. First, this patch folds inet_hash2 into __inet_hash. For unhash, the current call sequence is like inet_unhash() => __inet_unhash() => inet_unhash2(). The specific testing cases in __inet_unhash() are mostly related to TCP_LISTEN sk and its caller inet_unhash() already has the TCP_LISTEN test, so this patch folds both __inet_unhash() and inet_unhash2() into inet_unhash(). Note that all listening_hash users also have lhash2 initialized, so the !h->lhash2 check is no longer needed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: inet: Remove count from inet_listen_hashbucket	Martin KaFai Lau	2	-7/+0
	After commit 0ee58dad5b06 ("net: tcp6: prefer listeners bound to an address") and commit d9fbc7f6431f ("net: tcp: prefer listeners bound to an address"), the count is no longer used. This patch removes it. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	sfc/siena: Reinstate SRIOV init/fini function calls	Martin Habets	2	-0/+19
	They were removed in the first series since they were not used for EF10. Put that code back for Siena, with the prototypes in siena_sriov.h since that file is a more applicable place for it. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	sfc/siena: Make PTP and reset support specific for Siena	Martin Habets	2	-4/+5
	Change the clock name and work queue names to differentiate them from the names used in sfc.ko. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	sfc/siena: Make MCDI logging support specific for Siena	Martin Habets	6	-15/+26
	Add a Siena Kconfig option and use it in stead of the sfc one. Rename the internal variable for the 'mcdi_logging_default' module parameter to avoid a naming conflict with the one in sfc.ko. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	siena: Make HWMON support specific for Siena	Martin Habets	4	-6/+13
	Add a Siena Kconfig option and use it in stead of the sfc one. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	siena: Make SRIOV support specific for Siena	Martin Habets	13	-30/+38
	Add a Siena Kconfig option and use it in stead of the sfc one. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	siena: Make MTD support specific for Siena	Martin Habets	9	-13/+21
	Add a Siena Kconfig option and use it in stead of the sfc one. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: mscc: ocelot: move ocelot_port_private :: chip_port to ocelot_port :: index	Vladimir Oltean	4	-39/+41
	Currently the ocelot switch lib is unaware of the index of a struct ocelot_port, since that is kept in the encapsulating structures of outer drivers (struct dsa_port :: index, struct ocelot_port_private :: chip_port). With the upcoming increase in complexity associated with assigning DSA tag_8021q CPU ports to certain user ports, it becomes necessary for the switch lib to be able to retrieve the index of a certain ocelot_port. Therefore, introduce a new u8 to ocelot_port (same size as the chip_port used by the ocelot switchdev driver) and rework the existing code to populate and use it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: mscc: ocelot: minimize holes in struct ocelot_port	Vladimir Oltean	1	-9/+11
	Reorder members of struct ocelot_port to eliminate holes and reduce structure size. Pahole says: Before: struct ocelot_port { struct ocelot * ocelot; /* 0 8 / struct regmap target; /* 8 8 / bool vlan_aware; / 16 1 / / XXX 7 bytes hole, try to pack / const struct ocelot_bridge_vlan pvid_vlan; /* 24 8 / unsigned int ptp_skbs_in_flight; / 32 4 / u8 ptp_cmd; / 36 1 / / XXX 3 bytes hole, try to pack / struct sk_buff_head tx_skbs; / 40 96 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / u8 ts_id; / 136 1 / / XXX 3 bytes hole, try to pack / phy_interface_t phy_mode; / 140 4 / bool is_dsa_8021q_cpu; / 144 1 / bool learn_ena; / 145 1 / / XXX 6 bytes hole, try to pack / struct net_device bond; /* 152 8 / bool lag_tx_active; / 160 1 / / XXX 1 byte hole, try to pack / u16 mrp_ring_id; / 162 2 / / XXX 4 bytes hole, try to pack / struct net_device bridge; /* 168 8 / int bridge_num; / 176 4 / u8 stp_state; / 180 1 / / XXX 3 bytes hole, try to pack / int speed; / 184 4 / / size: 192, cachelines: 3, members: 18 / / sum members: 161, holes: 7, sum holes: 27 / / padding: 4 / }; After: struct ocelot_port { struct ocelot ocelot; /* 0 8 / struct regmap target; /* 8 8 / struct net_device bond; /* 16 8 / struct net_device bridge; /* 24 8 / const struct ocelot_bridge_vlan pvid_vlan; /* 32 8 / phy_interface_t phy_mode; / 40 4 / unsigned int ptp_skbs_in_flight; / 44 4 / struct sk_buff_head tx_skbs; / 48 96 / / --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- / u16 mrp_ring_id; / 144 2 / u8 ptp_cmd; / 146 1 / u8 ts_id; / 147 1 / u8 stp_state; / 148 1 / bool vlan_aware; / 149 1 / bool is_dsa_8021q_cpu; / 150 1 / bool learn_ena; / 151 1 / bool lag_tx_active; / 152 1 / / XXX 3 bytes hole, try to pack / int bridge_num; / 156 4 / int speed; / 160 4 / / size: 168, cachelines: 3, members: 18 / / sum members: 161, holes: 1, sum holes: 3 / / padding: 4 / / last cacheline: 40 bytes */ }; Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: mscc: ocelot: delete ocelot_port :: xmit_template	Vladimir Oltean	1	-1/+0
	This is no longer used since commit 7c4bb540e917 ("net: dsa: tag_ocelot: create separate tagger for Seville"). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: felix: reimplement tagging protocol change with function pointers	Vladimir Oltean	2	-197/+216
	The error handling for the current tagging protocol change procedure is a bit brittle (we dismantle the previous tagging protocol entirely before setting up the new one). By identifying which parts of a tagging protocol are unique to itself and which parts are shared with the other, we can implement a protocol change procedure where error handling is a bit more robust, because we start setting up the new protocol first, and tear down the old one only after the setup of the specific and shared parts succeeded. The protocol change is a bit too open-coded too, in the area of migrating host flood settings and MDBs. By identifying what differs between tagging protocols (the forwarding masks for host flooding) we can implement a more straightforward migration procedure which is handled in the shared portion of the protocol change, rather than individually by each protocol. Therefore, a more structured approach calls for the introduction of a structure of function pointers per tagging protocol. This covers setup, teardown and the host forwarding mask. In the future it will also cover how to prepare for a new DSA master. The initial tagging protocol setup (at driver probe time) and the final teardown (at driver removal time) are also adapted to call into the structured methods of the specific protocol in current use. This is especially relevant for teardown, where we previously called felix_del_tag_protocol() only for the first CPU port. But by not specifying which CPU port this is for, we gain more flexibility to support multiple CPU ports in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: felix: dynamically determine tag_8021q CPU port for traps	Vladimir Oltean	1	-9/+17
	Ocelot switches support a single active CPU port at a time (at least as a trapping destination, i.e. for control traffic). This is true regardless of whether we are using the native copy-to-CPU-port-module functionality, or a redirect action towards the software-defined tag_8021q CPU port. Currently we assume that the trapping destination in tag_8021q mode is the first CPU port, yet in the future we may want to migrate the user ports to the second CPU port. For that to work, we need to make sure that the tag_8021q trapping destination is a CPU port that is active, i.e. is used by at least some user port on which the trap was added. Otherwise, we may end up redirecting the traffic to a CPU port which isn't even up. Note that due to the current design where we simply choose the CPU port of the first port from the trap's ingress port mask, it may be that a CPU port absorbes control traffic from user ports which aren't affine to it as per user space's request. This isn't ideal, but is the lesser of two evils. Following the user-configured affinity for traps would mean that we can no longer reuse a single TCAM entry for multiple traps, which is what we actually do for e.g. PTP. Either we duplicate and deduplicate TCAM entries on the fly when user-to-CPU-port mappings change (which is unnecessarily complicated), or we redirect trapped traffic to all tag_8021q CPU ports if multiple such ports are in use. The latter would have actually been nice, if it actually worked, but it doesn't, since a OCELOT_MASK_MODE_REDIRECT action towards multiple ports would not take PGID_SRC into consideration, and it would just duplicate the packet towards each (CPU) port, leading to duplicates in software. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: remove port argument from ->change_tag_protocol()	Vladimir Oltean	6	-51/+46
	DSA has not supported (and probably will not support in the future either) independent tagging protocols per CPU port. Different switch drivers have different requirements, some may need to replicate some settings for each CPU port, some may need to apply some settings on a single CPU port, while some may have to configure some global settings and then some per-CPU-port settings. In any case, the current model where DSA calls ->change_tag_protocol for each CPU port turns out to be impractical for drivers where there are global things to be done. For example, felix calls dsa_tag_8021q_register(), which makes no sense per CPU port, so it suppresses the second call. Let drivers deal with replication towards all CPU ports, and remove the CPU port argument from the function prototype. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: felix: manage host flooding using a specific driver callback	Vladimir Oltean	6	-30/+51
	At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") - not introducing a dedicated switch callback for host flooding made sense, because for the only user, the felix driver, there was nothing different to do for the CPU port than set the flood flags on the CPU port just like on any other bridge port. There are 2 reasons why this approach is not good enough, however. (1) Other drivers, like sja1105, support configuring flooding as a function of {ingress port, egress port}, whereas the DSA ->port_bridge_flags() function only operates on an egress port. So with that driver we'd have useless host flooding from user ports which don't need it. (2) Even with the felix driver, support for multiple CPU ports makes it difficult to piggyback on ->port_bridge_flags(). The way in which the felix driver is going to support host-filtered addresses with multiple CPU ports is that it will direct these addresses towards both CPU ports (in a sort of multicast fashion), then restrict the forwarding to only one of the two using the forwarding masks. Consequently, flooding will also be enabled towards both CPU ports. However, ->port_bridge_flags() gets passed the index of a single CPU port, and that leaves the flood settings out of sync between the 2 CPU ports. This is to say, it's better to have a specific driver method for host flooding, which takes the user port as argument. This solves problem (1) by allowing the driver to do different things for different user ports, and problem (2) by abstracting the operation and letting the driver do whatever, rather than explicitly making the DSA core point to the CPU port it thinks needs to be touched. This new method also creates a problem, which is that cross-chip setups are not handled. However I don't have hardware right now where I can test what is the proper thing to do, and there isn't hardware compatible with multi-switch trees that supports host flooding. So it remains a problem to be tackled in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: introduce the dsa_cpu_ports() helper	Vladimir Oltean	1	-0/+11
	Similar to dsa_user_ports() which retrieves a port mask of all user ports, introduce dsa_cpu_ports() which retrieves the mask of all CPU ports of a switch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: felix: bring the NPI port indirection for host flooding to surface	Vladimir Oltean	2	-3/+3
	For symmetry with host FDBs and MDBs where the indirection is now handled outside the ocelot switch lib, do the same for bridge port flags (unicast/multicast/broadcast flooding). The only caller of the ocelot switch lib which uses the NPI port is the Felix DSA driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: felix: bring the NPI port indirection for host MDBs to surface	Vladimir Oltean	2	-6/+6
	For symmetry with host FDBs where the indirection is now handled outside the ocelot switch lib, do the same for host MDB entries. The only caller of the ocelot switch lib which uses the NPI port is the Felix DSA driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: felix: program host FDB entries towards PGID_CPU for tag_8021q too	Vladimir Oltean	2	-8/+11
	I remembered why we had the host FDB migration procedure in place. It is true that host FDB entry migration can be done by changing the value of PGID_CPU, but the problem is that only host FDB entries learned while operating in NPI mode go to PGID_CPU. When the CPU port operates in tag_8021q mode, the FDB entries are learned towards the unicast PGID equal to the physical port number of this CPU port, bypassing the PGID_CPU indirection. So host FDB entries learned in tag_8021q mode are not migrated any longer towards the NPI port. Fix this by extracting the NPI port -> PGID_CPU redirection from the ocelot switch lib, moving it to the Felix DSA driver, and applying it for any CPU port regardless of its kind (NPI or tag_8021q). Fixes: a51c1c3f3218 ("net: dsa: felix: stop migrating FDBs back and forth on tag proto change") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: lan966x: Fix use of pointer after being freed	Horatiu Vultur	1	-2/+2
	The smatch found the following warning: drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:736 lan966x_fdma_reload() warn: 'rx_dcbs' was already freed. This issue can happen when changing the MTU on one of the ports and once the RX buffers are allocated and then the TX buffer allocation fails. In that case the RX buffers should not be restore. This fix this issue such that the RX buffers will not be restored if the TX buffers failed to be allocated. Fixes: 2ea1cbac267e2a ("net: lan966x: Update FDMA to change MTU.") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20220511204059.2689199-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: update the register_netdevice() kdoc	Jakub Kicinski	1	-14/+6
	The BUGS section looks quite dated, the registration is under rtnl lock. Remove some obvious information while at it. Link: https://lore.kernel.org/r/20220511190720.1401356-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	skbuff: replace a BUG_ON() with the new DEBUG_NET_WARN_ON_ONCE()	Jakub Kicinski	1	-3/+1
	Very few drivers actually have Kconfig knobs for adding -DDEBUG. 8 according to a quick grep, while there are 93 users of skb_checksum_none_assert(). Switch to the new DEBUG_NET_WARN_ON_ONCE() to catch bad skbs. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220511172305.1382810-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	mlxbf_gige: remove driver-managed interrupt counts	David Thompson	3	-17/+3
	The driver currently has three interrupt counters, which are incremented every time each interrupt handler executes. These driver-managed counters are not necessary as the kernel already has logic that manages interrupt counts and exposes them via /proc/interrupts. This patch removes the driver-managed counters. Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Link: https://lore.kernel.org/r/20220511135251.2989-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	tls: Fix context leak on tls_device_down	Maxim Mikityanskiy	1	-0/+3
	The commit cited below claims to fix a use-after-free condition after tls_device_down. Apparently, the description wasn't fully accurate. The context stayed alive, but ctx->netdev became NULL, and the offload was torn down without a proper fallback, so a bug was present, but a different kind of bug. Due to misunderstanding of the issue, the original patch dropped the refcount_dec_and_test line for the context to avoid the alleged premature deallocation. That line has to be restored, because it matches the refcount_inc_not_zero from the same function, otherwise the contexts that survived tls_device_down are leaked. This patch fixes the described issue by restoring refcount_dec_and_test. After this change, there is no leak anymore, and the fallback to software kTLS still works. Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20220512091830.678684-1-maximmi@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()	Taehee Yoo	1	-0/+5
	In the NIC ->probe() callback, ->mtd_probe() callback is called. If NIC has 2 ports, ->probe() is called twice and ->mtd_probe() too. In the ->mtd_probe(), which is efx_ef10_mtd_probe() it allocates and initializes mtd partiion. But mtd partition for sfc is shared data. So that allocated mtd partition data from last called efx_ef10_mtd_probe() will not be used. Therefore it must be freed. But it doesn't free a not used mtd partition data in efx_ef10_mtd_probe(). kmemleak reports: unreferenced object 0xffff88811ddb0000 (size 63168): comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120 [<ffffffffa3873f0e>] __kmalloc+0x20e/0x250 [<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc] [<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc] [<ffffffffa414192c>] local_pci_probe+0xdc/0x170 [<ffffffffa4145df5>] pci_device_probe+0x235/0x680 [<ffffffffa443dd52>] really_probe+0x1c2/0x8f0 [<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460 [<ffffffffa443e92a>] driver_probe_device+0x4a/0x120 [<ffffffffa443f2ae>] __driver_attach+0x16e/0x320 [<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190 [<ffffffffa443b75e>] bus_add_driver+0x39e/0x560 [<ffffffffa4440b1e>] driver_register+0x18e/0x310 [<ffffffffc02e2055>] 0xffffffffc02e2055 [<ffffffffa3001af3>] do_one_initcall+0xc3/0x450 [<ffffffffa33ca574>] do_init_module+0x1b4/0x700 Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Fixes: 8127d661e77f ("sfc: Add support for Solarflare SFC9100 family") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net/smc: non blocking recvmsg() return -EAGAIN when no data and signal_pending	Guangguan Wang	1	-2/+2
	Non blocking sendmsg will return -EAGAIN when any signal pending and no send space left, while non blocking recvmsg return -EINTR when signal pending and no data received. This may makes confused. As TCP returns -EAGAIN in the conditions described above. Align the behavior of smc with TCP. Fixes: 846e344eb722 ("net/smc: add receive timeout check") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20220512030820.73848-1-guangguan.wang@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: dsa: bcm_sf2: Fix Wake-on-LAN with mac_link_down()	Florian Fainelli	1	-0/+3
	After commit 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of state->link") the interface suspend path would call our mac_link_down() call back which would forcibly set the link down, thus preventing Wake-on-LAN packets from reaching our management port. Fix this by looking at whether the port is enabled for Wake-on-LAN and not clearing the link status in that case to let packets go through. Fixes: 2d1f90f9ba83 ("net: dsa/bcm_sf2: fix incorrect usage of state->link") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220512021731.2494261-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	mlxsw: Avoid warning during ip6gre device removal	Amit Cohen	1	-8/+3
	IPv6 addresses which are used for tunnels are stored in a hash table with reference counting. When a new GRE tunnel is configured, the driver is notified and configures it in hardware. Currently, any change in the tunnel is not applied in the driver. It means that if the remote address is changed, the driver is not aware of this change and the first address will be used. This behavior results in a warning [1] in scenarios such as the following: # ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit # ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit # ip link delete gre1 The change of the address is not applied in the driver. Currently, the driver uses the remote address which is stored in the 'parms' of the overlay device. When the tunnel is removed, the new IPv6 address is used, the driver tries to release it, but as it is not aware of the change, this address is not configured and it warns about releasing non existing IPv6 address. Fix it by using the IPv6 address which is cached in the IPIP entry, this address is the last one that the driver used, so even in cases such the above, the first address will be released, without any warning. [1]: WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84 Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021 RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... Call Trace: <TASK> mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum] mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum] mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum] raw_notifier_call_chain+0x3c/0x50 call_netdevice_notifiers_info+0x2f/0x80 unregister_netdevice_many+0x311/0x6d0 rtnl_dellink+0x136/0x360 rtnetlink_rcv_msg+0x12f/0x380 netlink_rcv_skb+0x49/0xf0 netlink_unicast+0x233/0x340 netlink_sendmsg+0x202/0x440 ____sys_sendmsg+0x1f3/0x220 ___sys_sendmsg+0x70/0xb0 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e846efe2737b ("mlxsw: spectrum: Add hash table for IPv6 address mapping") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	nfp: VF rate limit support	Bin Chen	3	-2/+56
	Add VF rate limit feature This patch enhances the NFP driver to supports assignment of both max_tx_rate and min_tx_rate to VFs The template of configurations below is all supported. e.g. # ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE \ min_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE \ max_tx_rate $RATE_VALUE The max RATE_VALUE is limited to 0xFFFF which is about 63Gbps (using 1024 for 1G) Signed-off-by: Bin Chen <bin.chen@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	rtnetlink: verify rate parameters for calls to ndo_set_vf_rate	Bin Chen	1	-10/+18
	When calling ndo_set_vf_rate() the max_tx_rate parameter may be zero, in which case the setting is cleared, or it must be greater or equal to min_tx_rate. Enforce this requirement on all calls to ndo_set_vf_rate via a wrapper which also only calls ndo_set_vf_rate() if defined by the driver. Based on work by Jakub Kicinski <kuba@kernel.org> Signed-off-by: Bin Chen <bin.chen@corigine.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	net: ethernet: SP7021: Fix spelling mistake "Interrput" -> "Interrupt"	Colin Ian King	1	-1/+1
	There is a spelling mistake in a dev_dbg message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220511104448.150800-1-colin.i.king@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	net: enetc: kill PHY-less mode for PFs	Vladimir Oltean	1	-10/+14
	Right now, a PHY-less port (no phy-mode, no fixed-link, no phy-handle) doesn't register with phylink, but calls netif_carrier_on() from enetc_start(). This makes sense for a VF, but for a PF, this is braindead, because we never call enetc_mac_enable() so the MAC is left inoperational. Furthermore, commit 71b77a7a27a3 ("enetc: Migrate to PHYLINK and PCS_LYNX") put the nail in the coffin because it removed the initial netif_carrier_off() call done right after register_netdev(). Without that call, netif_carrier_on() does not call linkwatch_fire_event(), so the operstate remains IF_OPER_UNKNOWN. Just deny the broken configuration by requiring that a phy-mode is present, and always register a PF with phylink. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220511094200.558502-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	fortify: Provide a memcpy trap door for sharp corners	Kees Cook	3	-1/+27
	As we continue to narrow the scope of what the FORTIFY memcpy() will accept and build alternative APIs that give the compiler appropriate visibility into more complex memcpy scenarios, there is a need for "unfortified" memcpy use in rare cases where combinations of compiler behaviors, source code layout, etc, result in cases where the stricter memcpy checks need to be bypassed until appropriate solutions can be developed (i.e. fix compiler bugs, code refactoring, new API, etc). The intention is for this to be used only if there's no other reasonable solution, for its use to include a justification that can be used to assess future solutions, and for it to be temporary. Example usage included, based on analysis and discussion from: https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral	Florian Fainelli	1	-0/+4
	The interrupt controller supplying the Wake-on-LAN interrupt line maybe modular on some platforms (irq-bcm7038-l1.c) and might be probed at a later time than the GENET driver. We need to specifically check for -EPROBE_DEFER and propagate that error to ensure that we eventually fetch the interrupt descriptor. Fixes: 9deb48b53e7f ("bcmgenet: add WOL IRQ check") Fixes: 5b1f0e62941b ("net: bcmgenet: Avoid touching non-existent interrupt") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Stefan Wahren <stefan.wahren@i2se.com> Link: https://lore.kernel.org/r/20220511031752.2245566-1-f.fainelli@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	net: ethernet: mediatek: ppe: fix wrong size passed to memset()	Yang Yingliang	1	-1/+1
	'foe_table' is a pointer, the real size of struct mtk_foe_entry should be pass to memset(). Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220511030829.3308094-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-11	Bluetooth: Fix the creation of hdev->name	Itay Iellin	2	-3/+6
	Set a size limit of 8 bytes of the written buffer to "hdev->name" including the terminating null byte, as the size of "hdev->name" is 8 bytes. If an id value which is greater than 9999 is allocated, then the "snprintf(hdev->name, sizeof(hdev->name), "hci%d", id)" function call would lead to a truncation of the id value in decimal notation. Set an explicit maximum id parameter in the id allocation function call. The id allocation function defines the maximum allocated id value as the maximum id parameter value minus one. Therefore, HCI_MAX_ID is defined as 10000. Signed-off-by: Itay Iellin <ieitayie@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2022-05-11	net: enetc: count the tc-taprio window drops	Po Liu	4	-2/+16
	The enetc scheduler for IEEE 802.1Qbv has 2 options (depending on PTGCR[TG_DROP_DISABLE]) when we attempt to send an oversized packet which will never fit in its allotted time slot for its traffic class: either block the entire port due to head-of-line blocking, or drop the packet and set a bit in the writeback format of the transmit buffer descriptor, allowing other packets to be sent. We obviously choose the second option in the driver, but we do not detect the drop condition, so from the perspective of the network stack, the packet is sent and no error counter is incremented. This change checks the writeback of the TX BD when tc-taprio is enabled, and increments a specific ethtool statistics counter and a generic "tx_dropped" counter in ndo_get_stats64. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled	Vladimir Oltean	2	-4/+8
	Future work in this driver would like to look at priv->active_offloads & ENETC_F_QBV to determine whether a tc-taprio qdisc offload was installed, but this does not produce the intended effect. All the other flags in priv->active_offloads are managed dynamically, except ENETC_F_QBV which is set statically based on the probed SI capability. This change makes priv->active_offloads & ENETC_F_QBV really track the presence of a tc-taprio schedule on the port. Some existing users, like the enetc_sched_speed_set() call from phylink_mac_link_up(), are best kept using the old logic: the tc-taprio offload does not re-trigger another link mode resolve, so the scheduler needs to be functional from the get go, as long as Qbv is supported at all on the port. So to preserve functionality there, look at the static station interface capability from pf->si->hw_features instead. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>