linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2022-05-12	net: lan966x: Fix use of pointer after being freed	Horatiu Vultur	1	-2/+2
	Smatch found the following warning: drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c:736 lan966x_fdma_reload() warn: 'rx_dcbs' was already freed. This issue can happen when the MTU is changed on one of the ports, the RX buffers are allocated successfully, and then the TX buffer allocation fails. In that case the RX buffers should not be restored. Fix this issue so that the RX buffers are not restored if the TX buffers fail to be allocated. Fixes: 2ea1cbac267e2a ("net: lan966x: Update FDMA to change MTU.") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20220511204059.2689199-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
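A minimal user-space sketch of the error-path ordering described above, assuming one RX and one TX buffer per device. `struct dev`, `reload()` and the `fail_tx_alloc` test hook are hypothetical and are not the lan966x FDMA code. What it illustrates: once the new RX buffer is live and the old one has been freed, a TX allocation failure must skip the RX restore step.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified device state. */
struct dev {
	void *rx_buf;
	void *tx_buf;
};

/* Test hook: when set, the TX allocation is forced to fail. */
static bool fail_tx_alloc;

static void *alloc_tx(size_t n)
{
	return fail_tx_alloc ? NULL : malloc(n);
}

/* Reallocate both buffers (e.g. for a new MTU). Returns 0 or -1. */
static int reload(struct dev *d, size_t rx_size, size_t tx_size)
{
	void *old_rx = d->rx_buf;
	void *old_tx = d->tx_buf;

	d->rx_buf = malloc(rx_size);
	if (!d->rx_buf)
		goto restore_rx;

	/* The new RX buffer is in use; the old one is released here. */
	free(old_rx);

	d->tx_buf = alloc_tx(tx_size);
	if (!d->tx_buf)
		goto restore_tx; /* NOT restore_rx: old_rx is already freed */

	free(old_tx);
	return 0;

restore_rx:
	d->rx_buf = old_rx;
restore_tx:
	d->tx_buf = old_tx;
	return -1;
}
```

Jumping to `restore_rx` on the TX failure path would reinstate `old_rx` after it had been passed to `free()`, which is the use-after-free class the warning describes.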
2022-05-12	mlxbf_gige: remove driver-managed interrupt counts	David Thompson	3	-17/+3
	The driver currently has three interrupt counters, which are incremented every time each interrupt handler executes. These driver-managed counters are not necessary as the kernel already has logic that manages interrupt counts and exposes them via /proc/interrupts. This patch removes the driver-managed counters. Signed-off-by: David Thompson <davthompson@nvidia.com> Signed-off-by: Asmaa Mnebhi <asmaa@nvidia.com> Link: https://lore.kernel.org/r/20220511135251.2989-1-davthompson@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	24	-88/+173
	No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	net: sfc: ef10: fix memory leak in efx_ef10_mtd_probe()	Taehee Yoo	1	-0/+5
	In the NIC ->probe() callback, the ->mtd_probe() callback is called. If the NIC has 2 ports, ->probe() is called twice, and so is ->mtd_probe(). ->mtd_probe(), i.e. efx_ef10_mtd_probe(), allocates and initializes the mtd partitions. However, the mtd partitions for sfc are shared data, so the partition data allocated by the last call to efx_ef10_mtd_probe() is never used and must be freed. Currently, efx_ef10_mtd_probe() does not free this unused partition data. kmemleak reports: unreferenced object 0xffff88811ddb0000 (size 63168): comm "systemd-udevd", pid 265, jiffies 4294681048 (age 348.586s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffffa3767749>] kmalloc_order_trace+0x19/0x120 [<ffffffffa3873f0e>] __kmalloc+0x20e/0x250 [<ffffffffc041389f>] efx_ef10_mtd_probe+0x11f/0x270 [sfc] [<ffffffffc0484c8a>] efx_pci_probe.cold.17+0x3df/0x53d [sfc] [<ffffffffa414192c>] local_pci_probe+0xdc/0x170 [<ffffffffa4145df5>] pci_device_probe+0x235/0x680 [<ffffffffa443dd52>] really_probe+0x1c2/0x8f0 [<ffffffffa443e72b>] __driver_probe_device+0x2ab/0x460 [<ffffffffa443e92a>] driver_probe_device+0x4a/0x120 [<ffffffffa443f2ae>] __driver_attach+0x16e/0x320 [<ffffffffa4437a90>] bus_for_each_dev+0x110/0x190 [<ffffffffa443b75e>] bus_add_driver+0x39e/0x560 [<ffffffffa4440b1e>] driver_register+0x18e/0x310 [<ffffffffc02e2055>] 0xffffffffc02e2055 [<ffffffffa3001af3>] do_one_initcall+0xc3/0x450 [<ffffffffa33ca574>] do_init_module+0x1b4/0x700 Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Fixes: 8127d661e77f ("sfc: Add support for Solarflare SFC9100 family") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20220512054709.12513-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12	mlxsw: Avoid warning during ip6gre device removal	Amit Cohen	1	-8/+3
	IPv6 addresses which are used for tunnels are stored in a hash table with reference counting. When a new GRE tunnel is configured, the driver is notified and configures it in hardware. Currently, changes to the tunnel are not applied in the driver. This means that if the remote address is changed, the driver is not aware of the change and keeps using the first address. This behavior results in a warning [1] in scenarios such as the following: # ip link add name gre1 type ip6gre local 2000::3 remote 2000::fffe tos inherit ttl inherit # ip link set name gre1 type ip6gre local 2000::3 remote 2000::ffff ttl inherit # ip link delete gre1 The change of the address is not applied in the driver. Currently, the driver uses the remote address which is stored in the 'parms' of the overlay device. When the tunnel is removed, the driver tries to release the new IPv6 address, but since it is not aware of the change, this address was never configured, and it warns about releasing a non-existent IPv6 address. Fix it by using the IPv6 address cached in the IPIP entry. This is the last address that the driver used, so even in cases such as the above, the first address will be released without any warning. [1]: WARNING: CPU: 1 PID: 2197 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2920 mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... CPU: 1 PID: 2197 Comm: ip Not tainted 5.17.0-rc8-custom-95062-gc1e5ded51a9a #84 Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021 RIP: 0010:mlxsw_sp_ipv6_addr_put+0x146/0x220 [mlxsw_spectrum] ... Call Trace: <TASK> mlxsw_sp2_ipip_rem_addr_unset_gre6+0xf1/0x120 [mlxsw_spectrum] mlxsw_sp_netdevice_ipip_ol_event+0xdb/0x640 [mlxsw_spectrum] mlxsw_sp_netdevice_event+0xc4/0x850 [mlxsw_spectrum] raw_notifier_call_chain+0x3c/0x50 call_netdevice_notifiers_info+0x2f/0x80 unregister_netdevice_many+0x311/0x6d0 rtnl_dellink+0x136/0x360 rtnetlink_rcv_msg+0x12f/0x380 netlink_rcv_skb+0x49/0xf0 netlink_unicast+0x233/0x340 netlink_sendmsg+0x202/0x440 ____sys_sendmsg+0x1f3/0x220 ___sys_sendmsg+0x70/0xb0 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: e846efe2737b ("mlxsw: spectrum: Add hash table for IPv6 address mapping") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220511115747.238602-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
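A simplified sketch of why releasing the cached address avoids the warning, assuming a single-slot refcounted address table. `addr_get()`, `addr_put()`, `struct tunnel_parms` and `struct ipip_entry` are hypothetical stand-ins for the mlxsw hash table, the overlay device's 'parms' and the IPIP entry, not the driver's actual types.

```c
#include <stdio.h>
#include <string.h>

#define ADDR_LEN 40

/* Hypothetical single-slot, refcounted table of programmed addresses. */
static struct {
	char addr[ADDR_LEN];
	int refcnt;
} hw_addr;

static void addr_get(const char *addr)
{
	snprintf(hw_addr.addr, ADDR_LEN, "%s", addr);
	hw_addr.refcnt++;
}

/* Release a reference; warn and fail if the address was never programmed. */
static int addr_put(const char *addr)
{
	if (hw_addr.refcnt == 0 || strcmp(hw_addr.addr, addr) != 0) {
		fprintf(stderr, "WARN: releasing non-existent address %s\n", addr);
		return -1;
	}
	hw_addr.refcnt--;
	return 0;
}

/* User-visible tunnel config: may change without the driver noticing. */
struct tunnel_parms { char remote[ADDR_LEN]; };

/* Driver-side entry: caches the address that was actually programmed. */
struct ipip_entry { char cached_remote[ADDR_LEN]; };

static void tunnel_create(struct ipip_entry *e, const struct tunnel_parms *p)
{
	snprintf(e->cached_remote, ADDR_LEN, "%s", p->remote);
	addr_get(e->cached_remote);
}

/* Fixed teardown: release the cached address, not the current parms. */
static int tunnel_destroy(const struct ipip_entry *e)
{
	return addr_put(e->cached_remote);
}
```

Releasing `p->remote` after the user changed it would hit the warning path; releasing `e->cached_remote` always matches what was programmed.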
2022-05-12	nfp: VF rate limit support	Bin Chen	3	-2/+56
	Add VF rate limit feature. This patch enhances the NFP driver to support assigning both max_tx_rate and min_tx_rate to VFs. All of the configuration templates below are supported, e.g.: # ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM max_tx_rate $RATE_VALUE min_tx_rate $RATE_VALUE # ip link set $DEV vf $VF_NUM min_tx_rate $RATE_VALUE max_tx_rate $RATE_VALUE The maximum RATE_VALUE is 0xFFFF (65535), which is just under 64Gbps when 1G is taken as 1024. Signed-off-by: Bin Chen <bin.chen@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
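A small sketch of the rate-limit arithmetic, assuming RATE_VALUE is in Mbps (the unit `ip link ... vf ... max_tx_rate` uses) and the 1024-per-G convention from the message. `vf_rate_in_range()` and `vf_rate_gbps()` are hypothetical helpers, not NFP driver functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound stated in the commit message (RATE_VALUE, in Mbps). */
#define VF_RATE_MAX_MBPS 0xFFFFu

/* Hypothetical range check applied to min_tx_rate and max_tx_rate. */
static bool vf_rate_in_range(uint32_t rate_mbps)
{
	return rate_mbps <= VF_RATE_MAX_MBPS;
}

/* Convert Mbps to Gbps using the message's 1024-per-G convention. */
static double vf_rate_gbps(uint32_t rate_mbps)
{
	return rate_mbps / 1024.0;
}
```

With these definitions, `vf_rate_gbps(0xFFFF)` is 65535 / 1024 = 63.999..., i.e. just under 64 Gbps.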
2022-05-12	net: ethernet: SP7021: Fix spelling mistake "Interrput" -> "Interrupt"	Colin Ian King	1	-1/+1
	There is a spelling mistake in a dev_dbg message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20220511104448.150800-1-colin.i.king@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	net: enetc: kill PHY-less mode for PFs	Vladimir Oltean	1	-10/+14
	Right now, a PHY-less port (no phy-mode, no fixed-link, no phy-handle) doesn't register with phylink, but calls netif_carrier_on() from enetc_start(). This makes sense for a VF, but for a PF, this is braindead, because we never call enetc_mac_enable(), so the MAC is left non-operational. Furthermore, commit 71b77a7a27a3 ("enetc: Migrate to PHYLINK and PCS_LYNX") put the nail in the coffin because it removed the initial netif_carrier_off() call done right after register_netdev(). Without that call, netif_carrier_on() does not call linkwatch_fire_event(), so the operstate remains IF_OPER_UNKNOWN. Just deny the broken configuration by requiring that a phy-mode is present, and always register a PF with phylink. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20220511094200.558502-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	fortify: Provide a memcpy trap door for sharp corners	Kees Cook	1	-1/+7
	As we continue to narrow the scope of what the FORTIFY memcpy() will accept and build alternative APIs that give the compiler appropriate visibility into more complex memcpy scenarios, there is a need for "unfortified" memcpy use in rare cases where combinations of compiler behaviors, source code layout, etc., result in situations where the stricter memcpy checks need to be bypassed until appropriate solutions can be developed (e.g. compiler bug fixes, code refactoring, new APIs, etc.). The intention is for this to be used only if there's no other reasonable solution, for its use to include a justification that can be used to assess future solutions, and for it to be temporary. Example usage included, based on analysis and discussion from: https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com Cc: Jakub Kicinski <kuba@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Coco Li <lixiaoyan@google.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-12	net: bcmgenet: Check for Wake-on-LAN interrupt probe deferral	Florian Fainelli	1	-0/+4
	The interrupt controller supplying the Wake-on-LAN interrupt line may be modular on some platforms (irq-bcm7038-l1.c) and might be probed later than the GENET driver. We need to specifically check for -EPROBE_DEFER and propagate that error to ensure that we eventually fetch the interrupt descriptor. Fixes: 9deb48b53e7f ("bcmgenet: add WOL IRQ check") Fixes: 5b1f0e62941b ("net: bcmgenet: Avoid touching non-existent interrupt") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Stefan Wahren <stefan.wahren@i2se.com> Link: https://lore.kernel.org/r/20220511031752.2245566-1-f.fainelli@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
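A user-space sketch of the probe-deferral check, assuming an optional IRQ lookup that returns either a positive IRQ number or a negative errno. `setup_wol_irq()` is hypothetical; `EPROBE_DEFER` (517) is a kernel-internal errno that user-space headers do not define, so the sketch defines it.

```c
#include <errno.h>

/* Kernel-internal errno value; not exported to user space. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/*
 * Handle the result of an optional Wake-on-LAN IRQ lookup.
 * irq: a positive IRQ number or a negative errno.
 * Returns a negative errno to abort probing, or 0 to continue.
 */
static int setup_wol_irq(int irq, int *wol_irq)
{
	if (irq == -EPROBE_DEFER)
		return irq; /* IRQ provider not probed yet: retry probe later */

	/* Any other error: the line is optional, so run without WoL. */
	*wol_irq = irq > 0 ? irq : 0;
	return 0;
}
```

Propagating only `-EPROBE_DEFER` keeps the WoL interrupt optional while still letting the driver core re-probe once the interrupt controller module has loaded.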
2022-05-12	net: ethernet: mediatek: ppe: fix wrong size passed to memset()	Yang Yingliang	1	-1/+1
	'foe_table' is a pointer, so the real size of struct mtk_foe_entry should be passed to memset(). Fixes: ba37b7caf1ed ("net: ethernet: mtk_eth_soc: add support for initializing the PPE") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20220511030829.3308094-1-yangyingliang@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
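A minimal demonstration of the `sizeof` pitfall, using a hypothetical `struct foe_entry` and table size rather than the real PPE definitions. Taking `sizeof` of the pointer clears only a few bytes per "entry"; `sizeof(*table)` yields the size of one entry.

```c
#include <string.h>

/* Hypothetical stand-in; the real struct mtk_foe_entry differs. */
struct foe_entry {
	unsigned int ib1;
	unsigned int data[19];
};

#define N_ENTRIES 4 /* hypothetical table size */

/* Buggy: sizeof(table) is the size of a pointer (e.g. 8 bytes). */
static void clear_buggy(struct foe_entry *table)
{
	memset(table, 0, N_ENTRIES * sizeof(table));
}

/* Fixed: sizeof(*table) is the size of one entry. */
static void clear_fixed(struct foe_entry *table)
{
	memset(table, 0, N_ENTRIES * sizeof(*table));
}
```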
2022-05-11	net: enetc: count the tc-taprio window drops	Po Liu	4	-2/+16
	The enetc scheduler for IEEE 802.1Qbv has 2 options (depending on PTGCR[TG_DROP_DISABLE]) when we attempt to send an oversized packet which will never fit in its allotted time slot for its traffic class: either block the entire port due to head-of-line blocking, or drop the packet and set a bit in the writeback format of the transmit buffer descriptor, allowing other packets to be sent. We obviously choose the second option in the driver, but we do not detect the drop condition, so from the perspective of the network stack, the packet is sent and no error counter is incremented. This change checks the writeback of the TX BD when tc-taprio is enabled, and increments a specific ethtool statistics counter and a generic "tx_dropped" counter in ndo_get_stats64. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
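A sketch of the completion-time check, assuming a hypothetical writeback status bit `TXBD_WB_WIN_DROP`; the real enetc bit name and descriptor layout are not given in the message. Each BD that the Qbv gate dropped bumps both the driver-specific counter and the generic `tx_dropped` count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical writeback status bit: "dropped by the Qbv time gate". */
#define TXBD_WB_WIN_DROP (1u << 0)

struct tx_ring_stats {
	uint64_t win_drop;   /* driver-specific ethtool counter */
	uint64_t tx_dropped; /* reported through ndo_get_stats64 */
};

/* Inspect one completed TX BD's writeback status. */
static void tx_bd_complete(uint32_t wb_status, bool taprio_enabled,
			   struct tx_ring_stats *st)
{
	if (taprio_enabled && (wb_status & TXBD_WB_WIN_DROP)) {
		st->win_drop++;
		st->tx_dropped++;
	}
}
```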
2022-05-11	net: enetc: manage ENETC_F_QBV in priv->active_offloads only when enabled	Vladimir Oltean	2	-4/+8
	Future work in this driver would like to look at priv->active_offloads & ENETC_F_QBV to determine whether a tc-taprio qdisc offload was installed, but this does not produce the intended effect. All the other flags in priv->active_offloads are managed dynamically, except ENETC_F_QBV which is set statically based on the probed SI capability. This change makes priv->active_offloads & ENETC_F_QBV really track the presence of a tc-taprio schedule on the port. Some existing users, like the enetc_sched_speed_set() call from phylink_mac_link_up(), are best kept using the old logic: the tc-taprio offload does not re-trigger another link mode resolve, so the scheduler needs to be functional from the get-go, as long as Qbv is supported at all on the port. So to preserve functionality there, look at the static station interface capability from pf->si->hw_features instead. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
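A sketch of the split between static capability and dynamic state, using hypothetical flag names (`SI_F_QBV`, `F_QBV`) and a simplified `struct port`. The taprio install/remove paths toggle the dynamic bit, while the link-up path consults the static capability.

```c
#include <stdbool.h>

/* Hypothetical flags: one for the static SI capability, one dynamic. */
#define SI_F_QBV (1u << 0) /* in hw_features: Qbv supported by the SI */
#define F_QBV    (1u << 0) /* in active_offloads: taprio installed */

struct port {
	unsigned int hw_features;     /* static, set at probe */
	unsigned int active_offloads; /* dynamic, follows the qdisc */
};

static void taprio_install(struct port *p) { p->active_offloads |= F_QBV; }
static void taprio_remove(struct port *p) { p->active_offloads &= ~F_QBV; }

/* Does a tc-taprio schedule currently exist on the port? */
static bool taprio_active(const struct port *p)
{
	return p->active_offloads & F_QBV;
}

/* Link-up path: must work whenever Qbv is supported at all. */
static bool qbv_supported(const struct port *p)
{
	return p->hw_features & SI_F_QBV;
}
```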
2022-05-11	net: macb: use NAPI for TX completion path	Robert Hancock	2	-73/+161
	This driver was using the TX IRQ handler to perform all TX completion tasks. Under heavy TX network load, this can cause significant irqs-off latencies (found to be in the hundreds of microseconds using ftrace). This can cause other issues, such as overrunning serial UART FIFOs when using high baud rates with limited UART FIFO sizes. Switch to using a NAPI poll handler to perform the TX completion work to get this out of hard IRQ context and avoid the IRQ latency impact. A separate NAPI instance is used for TX and RX to avoid checking the other ring's state unnecessarily when doing the poll, and so that the NAPI budget handling can work for both TX and RX packets. A new per-queue tx_ptr_lock spinlock has been added to avoid using the main device lock (with IRQs needing to be disabled) across the entire TX mapping operation, and also to protect the TX queue pointers from concurrent access between the TX start and TX poll operations. The TX Used Bit Read interrupt (TXUBR) handling also needs to be moved into the TX NAPI poll handler to maintain the proper order of operations. A flag is used to notify the poll handler that a UBR condition needs to be handled. The macb_tx_restart handler has had some locking added for global register access, since this could now potentially happen concurrently on different queues. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	net: macb: simplify/cleanup NAPI reschedule checking	Robert Hancock	1	-34/+31
	Previously the macb_poll method was checking the RSR register after completing its RX receive work to see if additional packets had been received since IRQs were disabled, since this controller does not maintain the pending IRQ status across IRQ disable. It also had to double-check the register after re-enabling IRQs to detect if packets were received after the first check but before IRQs were enabled. Using the RSR register for this purpose is problematic since it reflects the global device state rather than the per-queue state, so if packets are being received on multiple queues it may end up retriggering receive on a queue where the packets did not actually arrive and not on the one where they did arrive. This will also cause problems with an upcoming change to use NAPI for the TX path where use of multiple queues is more likely. Add a macb_rx_pending function to check the RX ring to see if more packets have arrived in the queue, and use that to check if NAPI should be rescheduled rather than the RSR register. By doing this, we can just ignore the global RSR register entirely, and thus save some extra device register accesses at the same time. This also makes the previous first check for pending packets rather redundant, since it would be checking the RX ring state which was just checked in the receive work function. Therefore we can get rid of it and just check after enabling interrupts whether packets are already pending. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
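A sketch of a per-queue pending check in the spirit of `macb_rx_pending`, assuming a hypothetical descriptor whose `addr` word carries an `RX_USED` bit that the hardware sets when it fills the slot. A real driver also needs a read barrier before inspecting the descriptor, which this user-space sketch omits. The poll routine would call it after re-enabling the queue's RX interrupt and reschedule NAPI if it returns true.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_RING_SIZE 8         /* hypothetical; must be a power of two */
#define RX_USED      (1u << 0) /* hypothetical "HW filled this slot" bit */

struct rx_desc { uint32_t addr; };

struct rx_queue {
	struct rx_desc ring[RX_RING_SIZE];
	unsigned int rx_tail; /* next slot software will process */
};

/* Per-queue check: has the NIC filled this queue's next descriptor? */
static bool rx_pending(const struct rx_queue *q)
{
	const struct rx_desc *d = &q->ring[q->rx_tail & (RX_RING_SIZE - 1)];

	return (d->addr & RX_USED) != 0;
}
```

Because it reads only the queue's own ring, the check cannot be fooled by packets arriving on a different queue, unlike a global status register.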
2022-05-11	i40e: i40e_main: fix a missing check on list iterator	Xiaomeng Tong	1	-13/+14
	The bug is here: ret = i40e_add_macvlan_filter(hw, ch->seid, vdev->dev_addr, &aq_err); The list iterator 'ch' will point to a bogus position containing HEAD if the list is empty or no element is found. This case must be checked before any use of the iterator, otherwise it will lead to an invalid memory access. To fix this bug, use a new variable 'iter' as the list iterator, and use the original variable 'ch' as a dedicated pointer to the found element. Cc: stable@vger.kernel.org Fixes: 1d8d80b4e4ff6 ("i40e: Add macvlan support on i40e") Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220510204846.2166999-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
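A user-space sketch of the fix pattern, with a minimal re-implementation of a circular `list_head` list. `struct channel`, its `in_use` field and `find_unused_channel()` are hypothetical. The iterator stays local to the loop; the result pointer starts as NULL and is assigned only when a real element matches, so an empty or exhausted list yields NULL instead of a pointer derived from the list head.

```c
#include <stddef.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_head_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_append(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical channel; `in_use` stands in for the driver's real test. */
struct channel {
	int seid;
	int in_use;
	struct list_head list;
};

/* Return the first unused channel, or NULL if there is none. */
static struct channel *find_unused_channel(struct list_head *head)
{
	struct channel *ch = NULL; /* only ever points at a real element */

	for (struct list_head *pos = head->next; pos != head; pos = pos->next) {
		struct channel *iter = container_of(pos, struct channel, list);

		if (!iter->in_use) {
			ch = iter;
			break;
		}
	}
	return ch;
}
```

The caller then checks `if (!ch)` before touching `ch->seid`.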
2022-05-11	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue	Jakub Kicinski	6	-74/+4
	Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2022-05-10 This series contains updates to igc driver only. Sasha cleans up the code by removing an unused function and removing an enum for PHY type as there is only one PHY. The return type for igc_check_downshift() is changed to void as it always returns success. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: igc: Change type of the 'igc_check_downshift' method igc: Remove unused phy_type enum igc: Remove igc_set_spd_dplx method ==================== Link: https://lore.kernel.org/r/20220510210656.2168393-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-11	Merge tag 'mlx5-lm-parallel' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next	Alex Williamson	46	-4477/+408
	Improve mlx5 live migration driver From Yishai: This series improves the mlx5 live migration driver in a few aspects, as described below. Refactor to enable running migration commands in parallel over the PF command interface. To achieve that, we exposed an API from mlx5_core that lets the VF be notified before the PF command interface goes down/up (e.g. PF reload upon health recovery). With this functionality in place, mlx5 vfio no longer needs to obtain the global PF lock when using the command interface, and can instead rely on the above mechanism to stay in sync with the PF. This enables parallel migration of VFs over the PF command interface from the kernel driver's point of view. In addition, move to the PF async command mode for the SAVE state command. This allows returning to user space earlier once the command has been issued successfully, and improves latency by letting things run in parallel. Alex, as this series touches mlx5_core, we may need to send this in a pull request format to VFIO to avoid conflicts before acceptance. Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-11	eth: amd: remove NI6510 support (ni65)	Jakub Kicinski	4	-1383/+0
	It looks like all the changes to this driver since the git era began have been tree-wide refactoring. The driver uses virt_to_bus(); we should convert it to the more modern DMA APIs, but since it's unlikely to be getting any use these days, delete it instead. We can always revert to bring it back. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11	net: atlantic: verify hw_head_ lies within TX buffer ring	Grant Grundler	1	-0/+7
	Bounds-check the hw_head index provided by the NIC to verify that it lies within the TX buffer ring. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
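A sketch of the bounds check, assuming a ring of `size` descriptors and a software head `sw_head` that is already in range; `tx_completed()` is hypothetical. An out-of-range `hw_head` from the device is rejected instead of being used to walk the ring.

```c
/*
 * Count descriptors completed between sw_head and the NIC-reported
 * hw_head on a ring of `size` entries (sw_head < size is assumed).
 * Returns -1 if hw_head is out of range instead of trusting it.
 */
static int tx_completed(unsigned int sw_head, unsigned int hw_head,
			unsigned int size)
{
	if (hw_head >= size)
		return -1; /* bogus index from the device: refuse to walk */

	return (int)((hw_head + size - sw_head) % size);
}
```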
2022-05-11	net: atlantic: add check for MAX_SKB_FRAGS	Grant Grundler	1	-1/+5
	Ensure that the CPU cannot get stuck in an infinite loop. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
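A sketch of bounding a descriptor-chain walk, using a hypothetical `struct buff` with a `next` index and an end-of-packet flag. Without the limit, a corrupted ring whose `next` links form a cycle would spin forever. `MAX_FRAGS` is a stand-in value; the kernel's `MAX_SKB_FRAGS` depends on the configuration.

```c
#define MAX_FRAGS 17 /* stand-in for MAX_SKB_FRAGS */

struct buff {
	unsigned int next; /* ring index of the next buffer of this packet */
	int is_eop;        /* nonzero on the packet's last buffer */
};

/*
 * Walk a packet's buffer chain starting at `first`. Returns the number
 * of buffers, or -1 if the chain exceeds the fragment limit (e.g. the
 * `next` links of a corrupted ring form a cycle).
 */
static int chain_len(const struct buff *ring, unsigned int first)
{
	int n = 1;
	const struct buff *b = &ring[first];

	while (!b->is_eop) {
		if (n > MAX_FRAGS)
			return -1;
		b = &ring[b->next];
		n++;
	}
	return n;
}
```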
2022-05-11	net: atlantic: reduce scope of is_rsc_complete	Grant Grundler	1	-7/+6
	Don't defer handling the err case outside the loop. That's pointless. And since is_rsc_complete is only used inside this loop, declare it inside the loop to reduce its scope. Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-11	net: atlantic: fix "frag[0] not initialized"	Grant Grundler	1	-2/+1
	In aq_ring_rx_clean(), if buff->is_eop is not set AND buff->len < AQ_CFG_RX_HDR_SIZE, then hdr_len remains equal to buff->len and skb_add_rx_frag(xxx, 0, ...) is not called. The loop following this code starts calling skb_add_rx_frag() starting with i=1 and thus frag[0] is never initialized. Since i is initialized to zero at the top of the primary loop, we can just reference and post-increment i instead of hardcoding the 0 when calling skb_add_rx_frag() the first time. Reported-by: Aashay Shringarpure <aashay@google.com> Reported-by: Yi Chou <yich@google.com> Reported-by: Shervin Oloumi <enlightened@google.com> Signed-off-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
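A sketch of the single-index pattern, using a hypothetical `struct pkt` and `add_frag()` in place of the skb and `skb_add_rx_frag()`. In the buggy shape, the first call used a hardcoded index 0 and the loop started at 1, so when the first buffer had nothing beyond the copied header, slot 0 was skipped. Post-incrementing one index at every call keeps the fragments contiguous from slot 0.

```c
#define MAX_FRAGS 8 /* hypothetical capacity for the sketch */

struct frag { int len; };

/* Hypothetical stand-in for an skb's fragment array. */
struct pkt {
	struct frag frags[MAX_FRAGS];
	int nr_frags;
};

/* Plays the role of skb_add_rx_frag(): store fragment i. */
static void add_frag(struct pkt *p, int i, int len)
{
	p->frags[i].len = len;
	if (i + 1 > p->nr_frags)
		p->nr_frags = i + 1;
}

/*
 * first_len: length of the first buffer; hdr_len: bytes already copied
 * into the linear header; more[]: lengths of the following buffers.
 */
static void build_frags(struct pkt *p, int first_len, int hdr_len,
			const int *more, int n_more)
{
	int i = 0;

	if (first_len > hdr_len) /* data beyond the copied header */
		add_frag(p, i++, first_len - hdr_len);

	for (int k = 0; k < n_more && i < MAX_FRAGS; k++)
		add_frag(p, i++, more[k]);
}
```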
2022-05-11	Merge tag 'mlx5-updates-2022-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux	David S. Miller	17	-298/+711
	Saeed Mahameed says: ==================== mlx5-updates-2022-05-09 1) Gavin Li adds an exit route from waiting for FW init on device boot and increases the FW init timeout in the health recovery flow 2) Support LAG mode on 4-port HCAs Mark Bloch says: ================ This series adds support for 4-port HCAs to the mlx5 driver. Starting with ConnectX-7, HCAs with 4 ports are possible. As most driver parts aren't affected by such a configuration, most driver code is unchanged. Specifically, the only affected areas are: - Lag - Devcom - Merged E-Switch - Single FDB E-Switch Lag was chosen to be converted first: hardware LAG is created when all 4 ports are added to the same bond device. Devcom, merged E-Switch and single FDB E-Switch are marked as supporting only 2-port HCAs, and future patches will add support for 4-port HCAs. To activate hardware LAG, a user can execute: ip link add bond0 type bond; ip link set bond0 type bond miimon 100 mode 2; ip link set eth2 master bond0; ip link set eth3 master bond0; ip link set eth4 master bond0; ip link set eth5 master bond0 where eth2, eth3, eth4 and eth5 are the PFs of the same HCA. ================ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-10	net: stmmac: fix missing pci_disable_device() on error in stmmac_pci_probe()	Yang Yingliang	1	-3/+1
	Switch to using pcim_enable_device() to avoid missing pci_disable_device(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220510031316.1780409-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc: Add a basic Siena module	Martin Habets	5	-3/+28
	Make the (un)load message more specific to differentiate it from the sfc.ko messages. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc/siena: Inline functions in sriov.h to avoid conflicts with sfc	Martin Habets	2	-77/+63
	The implementation of each is quite short. This means sriov.c is not needed any more. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc/siena: Rename functions in nic_common.h to avoid conflicts with sfc	Martin Habets	14	-138/+95
	For siena use efx_siena_ as the function prefix. efx_nic_update_stats_atomic is only used in efx_common.c, so move it there. efx_nic_copy_stats is not used in Siena, so it is removed. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc/siena: Rename functions in mcdi headers to avoid conflicts with sfc	Martin Habets	17	-609/+459
	For siena use efx_siena_ as the function prefix. Several functions are not used in Siena, so they are removed. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc/siena: Rename peripheral functions to avoid conflicts with sfc	Martin Habets	15	-270/+270
	For siena use efx_siena_ as the function prefix. This patch covers selftest.h, ptp.h, net_driver.h and ethtool_common.h. efx_ethtool_fill_self_tests() can become static. Some functions in ptp.c can also become static. Rename loopback_mode in net_driver.h. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc/siena: Rename RX/TX functions to avoid conflicts with sfc	Martin Habets	13	-230/+213
	For siena use efx_siena_ as the function prefix. Several functions are not used in Siena, so they are removed. Use a Siena specific variable name for module parameter efx_separate_tx_channels. Move efx_fini_tx_queue() to avoid a forward declaration of efx_dequeue_buffer(). Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc/siena: Rename functions in efx headers to avoid conflicts with sfc	Martin Habets	23	-466/+421
	When building with allyesconfig there are many identical symbol names. For siena use efx_siena_ as the function and variable prefix to avoid build errors. efx_mtd_remove_partition can become static as it is no longer called from other files. efx_ticks_to_usecs and efx_xmit_done_single are not used in Siena, so they are removed. Several functions are only used inside efx_channels.c for Siena so they can become static. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc/siena: Remove build references to missing functionality	Martin Habets	10	-458/+17
	Functionality not supported or needed on Siena includes: - Anything for EF100 - EF10 specifics such as register access, PIO and TSO offload. Also only bind to Siena NICs. Remove EF10 specifics from nic.h. The functions that start with efx_farch_ will be removed from sfc.ko with a subsequent patch. Add the efx_ prefix to siena_prepare_flush() to make it consistent with the other APIs. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc: Copy shared files needed for Siena (part 2)	Martin Habets	27	-0/+14153
	These are the files starting with m through w. No changes are made; those will be done in subsequent commits. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc: Copy shared files needed for Siena (part 1)	Martin Habets	14	-0/+10524
	These are the files starting with b through i. No changes are made; those will be done in subsequent commits. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	sfc: Move Siena specific files	Martin Habets	4	-0/+0
	Files are only moved, no changes are made. Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	nfp: flower: fix 'variable 'flow6' set but not used'	Louis Peens	1	-12/+7
	Kernel test robot reported an issue after a recent patch about an unused variable when CONFIG_IPV6 is disabled. Move the variable declaration to be inside the #ifdef, and do a bit more cleanup. There is no need for a temporary ipv6 bool value since it is checked only once; remove the extra variable and just do the check directly. Fixes: 9d5447ed44b5 ("nfp: flower: fixup ipv6/ipv4 route lookup for neigh events") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220510074845.41457-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-10	igc: Change type of the 'igc_check_downshift' method	Sasha Neftin	2	-6/+2
	The 'igc_check_downshift' method always returns 0; there is no need for a return value so change the type of this method to void. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-10	igc: Remove unused phy_type enum	Sasha Neftin	3	-18/+3
	This complements commit 8e153faf5827 ("igc: Remove unused phy type"). i225 parts have only one PHY, so there is no point in using the phy_type enum. Clean up the code accordingly, and get rid of the unused enum lines. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-10	igc: Remove igc_set_spd_dplx method	Sasha Neftin	2	-51/+0
	The igc_set_spd_dplx method is not used. This patch tidies up the driver code by removing it. Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-05-10	net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs	Yishai Hadas	1	-1/+64
	Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a VF register to be notified of its enablement / disablement by the PF. Upon VF probe, the VF calls mlx5_sriov_blocking_notifier_register() with its notifier block, and upon VF remove it calls mlx5_sriov_blocking_notifier_unregister() to drop its registration. This gives a VF the ability to clean up some resources when it is disabled, before the command interface goes down, and to set things up before it is enabled. This may be used by a migration-capable VF in a few cases (e.g. PF load/unload upon a health recovery). Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
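A user-space sketch of the notifier-block pattern these APIs follow: a VF registers a callback, the PF side walks the chain on enable/disable events, and the VF unregisters on remove. The names (`struct sketch_nb`, `nb_register()`, the `EV_VF_*` events) are hypothetical, not the mlx5 or kernel notifier API, and the kernel's blocking chains add locking that this sketch omits.

```c
#include <stddef.h>

struct sketch_nb {
	int (*call)(struct sketch_nb *nb, unsigned long event);
	struct sketch_nb *next;
};

struct sketch_head {
	struct sketch_nb *first;
};

static void nb_register(struct sketch_head *h, struct sketch_nb *nb)
{
	nb->next = h->first;
	h->first = nb;
}

static void nb_unregister(struct sketch_head *h, struct sketch_nb *nb)
{
	struct sketch_nb **pp = &h->first;

	while (*pp && *pp != nb)
		pp = &(*pp)->next;
	if (*pp)
		*pp = nb->next;
}

/* PF side: deliver an event to every registered VF callback. */
static void nb_call_chain(struct sketch_head *h, unsigned long event)
{
	for (struct sketch_nb *nb = h->first; nb; nb = nb->next)
		nb->call(nb, event);
}

enum { EV_VF_ENABLE = 1, EV_VF_DISABLE = 2 }; /* hypothetical events */

static unsigned long last_event; /* recorded by the example callback */

/* Example VF callback: e.g. quiesce on DISABLE, set up on ENABLE. */
static int vf_event(struct sketch_nb *nb, unsigned long event)
{
	(void)nb;
	last_event = event;
	return 0;
}
```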
2022-05-10	net: ethernet: Add driver for Sunplus SP7021	Wells Lu	17	-0/+2042
	Add driver for Sunplus SP7021 SoC. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Wells Lu <wellslutw@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	net: atlantic: always deep reset on pm op, fixing up my null deref regression	Manuel Ullmann	1	-2/+2
	The impact of this regression is the same for resume as the one I saw on thaw: the kernel hangs and nothing except SysRq rebooting can be done. Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs"), where I disabled deep pm resets in suspend and resume, trying to make sense of the atl_resume_common() deep parameter in the first place. It turns out that atlantic always has to deep reset on pm operations. Even though I expected that and tested resume, I screwed up by kexec-rebooting into an unpatched kernel, thus missing the breakage. This fixup obsoletes the deep parameter of atl_resume_common, but I leave the cleanup for the maintainers to post to mainline. Suspend and hibernation were successfully tested by the reporters. Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs") Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/ Reported-by: Jordan Leppert <jordanleppert@protonmail.com> Reported-by: Holger Hoffstaette <holger@applied-asynchrony.com> Tested-by: Jordan Leppert <jordanleppert@protonmail.com> Tested-by: Holger Hoffstaette <holger@applied-asynchrony.com> CC: <stable@vger.kernel.org> # 5.10+ Signed-off-by: Manuel Ullmann <labre@posteo.de> Link: https://lore.kernel.org/r/87bkw8dfmp.fsf@posteo.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-10	tsnep: Add free running cycle counter support	Gerhard Engleder	3	-7/+63
	The TSN endpoint Ethernet MAC supports a free-running counter in addition to its clock. This free-running counter can be read, and hardware timestamps are supported. As the name implies, this counter cannot be set and its frequency cannot be adjusted. Add free-running cycle counter support to the physical clock, based on this free-running counter. This also requires hardware timestamps based on that free-running counter. Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
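A counter that can be read but not set or frequency-adjusted is usually turned into a usable clock in software. One common approach works in the spirit of the kernel's cyclecounter/timecounter helpers: track the last counter value seen and accumulate converted nanoseconds. The sketch below assumes a counter of known width (mask) and a fixed cycles-to-ns conversion (ns = (cycles * mult) >> shift). struct soft_tc and its functions are hypothetical and do not reproduce the tsnep driver.

```c
#include <stdint.h>

/* Hypothetical software clock on top of a free-running hardware counter. */
struct soft_tc {
    uint64_t mask;        /* counter width, e.g. 0xffffffff for 32 bits */
    uint32_t mult;        /* ns = (cycles * mult) >> shift */
    uint32_t shift;
    uint64_t cycle_last;  /* counter value at the last update */
    uint64_t nsec;        /* software time at cycle_last */
};

static uint64_t cyc2ns(const struct soft_tc *tc, uint64_t cycles)
{
    /* Assumes updates are frequent enough that cycles * mult fits in 64 bits. */
    return (cycles * tc->mult) >> tc->shift;
}

void soft_tc_init(struct soft_tc *tc, uint64_t now_cycles, uint64_t start_ns)
{
    tc->cycle_last = now_cycles;
    tc->nsec = start_ns;
}

/* Advance software time to now_cycles; the mask handles counter wraparound. */
uint64_t soft_tc_read(struct soft_tc *tc, uint64_t now_cycles)
{
    uint64_t delta = (now_cycles - tc->cycle_last) & tc->mask;

    tc->nsec += cyc2ns(tc, delta);
    tc->cycle_last = now_cycles;
    return tc->nsec;
}

/* Adjusting time only moves the software value; the hardware counter,
 * which cannot be set, is never written. */
void soft_tc_adjtime(struct soft_tc *tc, int64_t delta_ns)
{
    tc->nsec += (uint64_t)delta_ns;
}
```

For example, a 125 MHz counter has an 8 ns period, so mult = 8 and shift = 0. Hardware timestamps taken from the same counter can be converted with the same mult/shift. This is why the commit also needs timestamps based on the free-running counter.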
2022-05-10	eth: dpaa2-mac: remove a dead-code NULL check on fwnode parent	Jakub Kicinski	1	-3/+0
	Since commit 4e30e98c4b4c ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set") @parent can't be NULL after the if. It's either the address of the ->fwnode of @dpmacs or @fwnode in case of ACPI. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20220506200029.852310-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-05-09	net/mlx5: Lag, add debugfs to query hardware lag state	Mark Bloch	4	-4/+191
	Lag state has become very complicated, with many modes, flags, types and port selection methods, and future work will add more features. Add a debugfs interface to query the current lag state.
	A new directory named "lag" will be created under the mlx5 debugfs directory. As the driver has a debugfs directory per PCI function, the location will be: <debugfs>/mlx5/<BDF>/lag
	For example: /sys/kernel/debug/mlx5/0000:08:00.0/lag
	The following files are exposed:
	- state: Returns "active" or "disabled". "active" means hardware lag is active.
	- members: Returns the BDFs of all the members of the lag object.
	- type: Returns the type of the lag currently configured. Valid only if hardware lag is active.
	  * "roce" - Members are bare-metal PFs.
	  * "switchdev" - Members are in switchdev mode.
	  * "multipath" - ECMP offloads.
	- port_sel_mode: Returns the egress port selection method. Valid only if hardware lag is active.
	  * "queue_affinity" - Egress port is selected by the QP/SQ affinity.
	  * "hash" - Egress port is selected by a hash computed on each packet, controlled by the xmit_hash_policy of the bond device.
	- flags: Returns flags that are specific to each lag @type. Valid only if hardware lag is active.
	  * "shared_fdb" - "on" or "off"; if "on", a single FDB is used.
	- mapping: Returns the mapping used to select the egress port. Valid only if hardware lag is active. If @port_sel_mode is "hash", returns the active egress ports; the hash result will select only active ports. If @port_sel_mode is "queue_affinity", returns the mapping between the configured port affinity of the QP/SQ and the actual egress port. For example:
	  * 1:1 - If the configured affinity is port 1, traffic will egress via port 1.
	  * 1:2 - If the configured affinity is port 1, traffic will egress via port 2. This can happen if port 1 is down, or in active/backup mode when port 1 is the backup.
	Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
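A user-space tool could locate and read these files roughly as follows. Only the <debugfs>/mlx5/<BDF>/lag/<file> layout and the file names come from the commit message. lag_debugfs_path() and lag_read_first_line() are hypothetical helpers. Reading debugfs normally requires root privileges and a mounted debugfs.

```c
#include <stdio.h>
#include <string.h>

/* Build <debugfs>/mlx5/<BDF>/lag/<file>, e.g. ".../lag/state".
 * Returns 0 on success, -1 on error or if the buffer is too small. */
int lag_debugfs_path(char *buf, size_t len, const char *debugfs,
                     const char *bdf, const char *file)
{
    int n = snprintf(buf, len, "%s/mlx5/%s/lag/%s", debugfs, bdf, file);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the first line of a file into out, stripping any trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
int lag_read_first_line(const char *path, char *out, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(out, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';
    return 0;
}
```

Usage, for the example device in the commit message: build the path with lag_debugfs_path(p, sizeof(p), "/sys/kernel/debug", "0000:08:00.0", "state"), then call lag_read_first_line(p, v, sizeof(v)) and compare v against "active" or "disabled".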
2022-05-09	net/mlx5: Lag, use buckets in hash mode	Mark Bloch	4	-75/+181
	When in hardware lag and the NIC has more than 2 ports, if one port goes down, the traffic needs to be distributed between the remaining active ports. For a better spread in such cases, instead of using a 1-to-1 mapping and only 4 slots in the hash, use many. Each port will have many slots that point to it. When a port goes down, go over all the slots that pointed to that port and spread them between the remaining active ports. Once the port comes back, restore the default mapping. We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots. Each group of MLX5_LAG_MAX_HASH_BUCKETS slots belongs to a different port. The native mapping is: port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are [1, 1, .., 1], which means that if a packet is hashed into one of these slots, it will hit the wire via port 1. port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are [2, 2, .., 2], which means that if a packet is hashed into one of these slots, it will hit the wire via port 2. The mapping is the same for the rest of the ports. On a failover, let's say port 2 goes down (ports 1, 3 and 4 are still up). The new mapping for port 2 will be: port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are [1, 3, 1, 4, .., 4], which means the mapping was changed from the native mapping to a mapping that consists only of the active ports. With this, if a port goes down, the traffic will be split randomly between the active ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
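The bucket scheme in this commit can be sketched in C. The sketch makes three assumptions. It uses 4 ports, as in the failover example. BUCKETS = 4 is a hypothetical stand-in for MLX5_LAG_MAX_HASH_BUCKETS, whose real value is not given in the commit. rand() stands in for whatever randomness the driver actually uses. The sketch also rebuilds the whole table on every port event, which is a simplification and not necessarily how mlx5 updates it.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NUM_PORTS 4                   /* as in the failover example */
#define BUCKETS   4                   /* hypothetical MLX5_LAG_MAX_HASH_BUCKETS */
#define NUM_SLOTS (NUM_PORTS * BUCKETS)

static int lag_map[NUM_SLOTS];        /* slot -> egress port (1-based) */
static bool port_up[NUM_PORTS + 1];   /* index 0 unused */

/* Native mapping: slots [0, BUCKETS) -> port 1, [BUCKETS, 2*BUCKETS) -> port 2, ... */
static int native_port(int slot)
{
    return slot / BUCKETS + 1;
}

/* Rebuild the table: a slot keeps its native port while that port is up;
 * otherwise it points to a randomly chosen active port. If no port is
 * active, slots are left unchanged. */
static void lag_update(void)
{
    int active[NUM_PORTS];
    int n = 0;

    for (int p = 1; p <= NUM_PORTS; p++)
        if (port_up[p])
            active[n++] = p;

    for (int s = 0; s < NUM_SLOTS; s++) {
        int p = native_port(s);

        if (port_up[p])
            lag_map[s] = p;
        else if (n > 0)
            lag_map[s] = active[rand() % n];
    }
}

void lag_init(void)
{
    for (int p = 1; p <= NUM_PORTS; p++)
        port_up[p] = true;
    lag_update();
}

/* Port state change: when a port goes down, its slots are spread over the
 * active ports; when it comes back, the native mapping is restored. */
void lag_set_port(int port, bool up)
{
    port_up[port] = up;
    lag_update();
}
```

After lag_set_port(2, false), port 2's slots hold a random mix of 1, 3 and 4, as in the [1, 3, 1, 4, .., 4] example. All other slots keep their native port. Calling lag_set_port(2, true) restores the native mapping. Using many slots per port, instead of one, is what lets a failed port's traffic be spread across all survivors rather than moved onto a single port.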
2022-05-09	net/mlx5: Lag, refactor dmesg print	Mark Bloch	1	-10/+12
	Combine dmesg lag prints into a single function. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09	net/mlx5: Support devices with more than 2 ports	Mark Bloch	2	-2/+4
	Increase the define MLX5_MAX_PORTS to 4 as the driver is ready to support NICs with 4 ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09	net/mlx5: Lag, use actual number of lag ports	Mark Bloch	3	-149/+216
	Refactor the entire lag code to use ldev->ports instead of hard-coded defines (like MLX5_MAX_PORTS) for its operations. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>