aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-10-23tg3: prevent use of uninitialized remote_adv and local_adv variablesAlexey Simakov1-4/+1
[ Upstream commit 0c3f2e62815a43628e748b1e4ad97a1c46cce703 ] Some execution paths that jump to the fiber_setup_done label could leave the remote_adv and local_adv variables uninitialized and then use it. Initialize this variables at the point of definition to avoid this. Fixes: 85730a631f0c ("tg3: Add SGMII phy support for 5719/5718 serdes") Co-developed-by: Alexandr Sapozhnikov <alsp705@gmail.com> Signed-off-by: Alexandr Sapozhnikov <alsp705@gmail.com> Signed-off-by: Alexey Simakov <bigalex934@gmail.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20251014164736.5890-1-bigalex934@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()Lorenzo Bianconi1-1/+15
[ Upstream commit bd5afca115f181c85f992d42a57cd497bc823ccb ] Completion napi can free out-of-order tx descriptors if hw QoS is enabled and packets with different priority are queued to same DMA ring. Take into account possible out-of-order reports checking if the tx queue is full using circular buffer head/tail pointer instead of the number of queued packets. Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC") Suggested-by: Simon Horman <horms@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251012-airoha-tx-busy-queue-v2-1-a600b08bab2d@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23Octeontx2-af: Fix missing error code in cgx_probe()Harshit Mogalapalli1-0/+1
[ Upstream commit c5705a2a4aa35350e504b72a94b5c71c3754833c ] When CGX fails mapping to NIX, set the error code to -ENODEV, currently err is zero and that is treated as success path. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/aLAdlCg2_Yv7Y-3h@stanley.mountain/ Fixes: d280233fc866 ("Octeontx2-af: Fix NIX X2P calibration failures") Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251010204239.94237-1-harshit.m.mogalapalli@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23amd-xgbe: Avoid spurious link down messages during interface toggleRaju Rangoju2-1/+1
[ Upstream commit 2616222e423398bb374ffcb5d23dea4ba2c3e524 ] During interface toggle operations (ifdown/ifup), the driver currently resets the local helper variable 'phy_link' to -1. This causes the link state machine to incorrectly interpret the state as a link change event, resulting in spurious "Link is down" messages being logged when the interface is brought back up. Preserve the phy_link state across interface toggles to avoid treating the -1 sentinel value as a legitimate link state transition. Fixes: 88131a812b16 ("amd-xgbe: Perform phy connect/disconnect at dev open/stop") Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Link: https://patch.msgid.link/20251010065142.1189310-1-Raju.Rangoju@amd.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23ixgbe: fix too early devlink_free() in ixgbe_remove()Koichiro Den1-1/+2
[ Upstream commit 5feef67b646d8f5064bac288e22204ffba2b9a4a ] Since ixgbe_adapter is embedded in devlink, calling devlink_free() prematurely in the ixgbe_remove() path can lead to UAF. Move devlink_free() to the end. KASAN report: BUG: KASAN: use-after-free in ixgbe_reset_interrupt_capability+0x140/0x180 [ixgbe] Read of size 8 at addr ffff0000adf813e0 by task bash/2095 CPU: 1 UID: 0 PID: 2095 Comm: bash Tainted: G S 6.17.0-rc2-tnguy.net-queue+ #1 PREEMPT(full) [...] Call trace: show_stack+0x30/0x90 (C) dump_stack_lvl+0x9c/0xd0 print_address_description.constprop.0+0x90/0x310 print_report+0x104/0x1f0 kasan_report+0x88/0x180 __asan_report_load8_noabort+0x20/0x30 ixgbe_reset_interrupt_capability+0x140/0x180 [ixgbe] ixgbe_clear_interrupt_scheme+0xf8/0x130 [ixgbe] ixgbe_remove+0x2d0/0x8c0 [ixgbe] pci_device_remove+0xa0/0x220 device_remove+0xb8/0x170 device_release_driver_internal+0x318/0x490 device_driver_detach+0x40/0x68 unbind_store+0xec/0x118 drv_attr_store+0x64/0xb8 sysfs_kf_write+0xcc/0x138 kernfs_fop_write_iter+0x294/0x440 new_sync_write+0x1fc/0x588 vfs_write+0x480/0x6a0 ksys_write+0xf0/0x1e0 __arm64_sys_write+0x70/0xc0 invoke_syscall.constprop.0+0xcc/0x280 el0_svc_common.constprop.0+0xa8/0x248 do_el0_svc+0x44/0x68 el0_svc+0x54/0x160 el0t_64_sync_handler+0xa0/0xe8 el0t_64_sync+0x1b0/0x1b8 Fixes: a0285236ab93 ("ixgbe: add initial devlink support") Signed-off-by: Koichiro Den <den@valinux.co.jp> Tested-by: Rinitha S <sx.rinitha@intel.com> Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-6-ef32a425b92a@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23idpf: cleanup remaining SKBs in PTP flowsMilena Olech2-0/+4
[ Upstream commit a3f8c0a273120fd2638f03403e786c3de2382e72 ] When the driver requests Tx timestamp value, one of the first steps is to clone SKB using skb_get. It increases the reference counter for that SKB to prevent unexpected freeing by another component. However, there may be a case where the index is requested, SKB is assigned and never consumed by PTP flows - for example due to reset during running PTP apps. Add a check in release timestamping function to verify if the SKB assigned to Tx timestamp latch was freed, and release remaining SKBs. Fixes: 4901e83a94ef ("idpf: add Tx timestamp capabilities negotiation") Signed-off-by: Milena Olech <milena.olech@intel.com> Signed-off-by: Anton Nadezhdin <anton.nadezhdin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-1-ef32a425b92a@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111HLinmao Li1-2/+3
[ Upstream commit 70f92ab97042f243e1c8da1c457ff56b9b3e49f1 ] After resume from S4 (hibernate), RTL8168H/RTL8111H truncates incoming packets. Packet captures show messages like "IP truncated-ip - 146 bytes missing!". The issue is caused by RxConfig not being properly re-initialized after resume. Re-initializing the RxConfig register before the chip re-initialization sequence avoids the truncation and restores correct packet reception. This follows the same pattern as commit ef9da46ddef0 ("r8169: fix data corruption issue on RTL8402"). Fixes: 6e1d0b898818 ("r8169:add support for RTL8168H and RTL8107E") Signed-off-by: Linmao Li <lilinmao@kylinos.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251009122549.3955845-1-lilinmao@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23net: dlink: handle dma_map_single() failure properlyYeounsu Moon1-7/+16
[ Upstream commit 65946eac6d888d50ae527c4e5c237dbe5cc3a2f2 ] There is no error handling for `dma_map_single()` failures. Add error handling by checking `dma_mapping_error()` and freeing the `skb` using `dev_kfree_skb()` (process context) when it fails. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com> Tested-on: D-Link DGE-550T Rev-A3 Suggested-by: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23net: mtk: wed: add dma mask limitation and GFP_DMA32 for device with more than 4GB DRAMRex Lu1-2/+6
[ Upstream commit 3abc0e55ea1fa2250e52bc860e8f24b2b9a2093a ] Limit tx/rx buffer address to 32-bit address space for board with more than 4GB DRAM. Fixes: 804775dfc2885 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Fixes: 6757d345dd7db ("net: ethernet: mtk_wed: introduce hw_rro support for MT7988") Tested-by: Daniel Pawlik <pawlik.dan@gmail.com> Tested-by: Matteo Croce <teknoraver@meta.com> Signed-off-by: Rex Lu <rex.lu@mediatek.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-23gve: Check valid ts bit on RX descriptor before hw timestampingTim Hostetler3-7/+16
commit bfdd74166a639930baaba27a8d729edaacd46907 upstream. The device returns a valid bit in the LSB of the low timestamp byte in the completion descriptor that the driver should check before setting the SKB's hardware timestamp. If the timestamp is not valid, do not hardware timestamp the SKB. Cc: stable@vger.kernel.org Fixes: b2c7aeb49056 ("gve: Implement ndo_hwtstamp_get/set for RX timestamping") Reviewed-by: Joshua Washington <joshwash@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20251014004740.2775957-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23ixgbevf: fix mailbox API compatibility by negotiating supported featuresJedrzej Jagielski6-3/+96
commit a7075f501bd33c93570af759b6f4302ef0175168 upstream. There was backward compatibility in the terms of mailbox API. Various drivers from various OSes supporting 10G adapters from Intel portfolio could easily negotiate mailbox API. This convention has been broken since introducing API 1.4. Commit 0062e7cc955e ("ixgbevf: add VF IPsec offload code") added support for IPSec which is specific only for the kernel ixgbe driver. None of the rest of the Intel 10G PF/VF drivers supports it. And actually lack of support was not included in the IPSec implementation - there were no such code paths. No possibility to negotiate support for the feature was introduced along with introduction of the feature itself. Commit 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF") increasing API version to 1.5 did the same - it introduced code supported specifically by the PF ESX driver. It altered API version for the VF driver in the same time not touching the version defined for the PF ixgbe driver. It led to additional discrepancies, as the code provided within API 1.6 cannot be supported for Linux ixgbe driver as it causes crashes. The issue was noticed some time ago and mitigated by Jake within the commit d0725312adf5 ("ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5"). As a result we have regression for IPsec support and after increasing API to version 1.6 ixgbevf driver stopped to support ESX MBX. To fix this mess add new mailbox op asking PF driver about supported features. Basing on a response determine whether to set support for IPSec and ESX-specific enhanced mailbox. New mailbox op, for compatibility purposes, must be added within new API revision, as API version of OOT PF & VF drivers is already increased to 1.6 and doesn't incorporate features negotiate op. Features negotiation mechanism gives possibility to be extended with new features when needed in the future. Reported-by: Jacob Keller <jacob.e.keller@intel.com> Closes: https://lore.kernel.org/intel-wired-lan/20241101-jk-ixgbevf-mailbox-v1-5-fixes-v1-0-f556dc9a66ed@intel.com/ Fixes: 0062e7cc955e ("ixgbevf: add VF IPsec offload code") Fixes: 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF") Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-4-ef32a425b92a@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-23ixgbevf: fix getting link speed data for E610 devicesJedrzej Jagielski4-32/+116
commit 53f0eb62b4d23d40686f2dd51776b8220f2887bb upstream. E610 adapters no longer use the VFLINKS register to read PF's link speed and linkup state. As a result VF driver cannot get actual link state and it incorrectly reports 10G which is the default option. It leads to a situation where even 1G adapters print 10G as actual link speed. The same happens when PF driver set speed different than 10G. Add new mailbox operation to let the VF driver request a PF driver to provide actual link data. Update the mailbox api to v1.6. Incorporate both ways of getting link status within the legacy ixgbe_check_mac_link_vf() function. Fixes: 4c44b450c69b ("ixgbevf: Add support for Intel(R) E610 device") Co-developed-by: Andrzej Wilczynski <andrzejx.wilczynski@intel.com> Signed-off-by: Andrzej Wilczynski <andrzejx.wilczynski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-2-ef32a425b92a@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-19net: airoha: Fix loopback mode configuration for GDM2 portLorenzo Bianconi2-1/+6
[ Upstream commit fea8cdf6738a8b25fccbb7b109b440795a0892cb ] Add missing configuration for loopback mode in airhoha_set_gdm2_loopback routine. Fixes: 9cd451d414f6e ("net: airoha: Add loopback support for GDM2") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251008-airoha-loopback-mode-fix-v2-1-045694fe7f60@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19net/mlx5e: Prevent tunnel reformat when tunnel mode not allowedCarolina Jubran3-21/+43
[ Upstream commit 22239eb258bc1e6ccdb2d3502fce1cc2b2a88386 ] When configuring IPsec packet offload in tunnel mode, the driver tries to create tunnel reformat objects unconditionally. This is incorrect, because tunnel mode is only permitted under specific encapsulation settings, and that decision is already made when the flow table is created. The offending commit attempted to block this case in the state add path, but the check there happens too late and does not prevent the reformat from being configured. Fix by taking short reservations for both the eswitch mode and the encap at the start of state setup. This preserves the block ordering (mode --> encap) used later: the mode is blocked during RX/TX get, and the encap is blocked during flow-table creation. This lets us fail early if either reservation cannot be obtained, it means a mode transition is underway or a conflicting configuration already owns encap. If both succeed, the flow-table path later takes the ownership and the reservations are released on exit. Fixes: 146c196b60e4 ("net/mlx5e: Create IPsec table with tunnel support only when encap is disabled") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1759652999-858513-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19net/mlx5: Prevent tunnel mode conflicts between FDB and NIC IPsec tablesCarolina Jubran3-12/+19
[ Upstream commit 7593439c13933164f701eed9c83d89358f203469 ] When creating IPsec flow tables with tunnel mode enabled, the driver uses mlx5_eswitch_block_encap() to prevent tunnel encapsulation conflicts across different domains (NIC_RX/NIC_TX and FDB), since the firmware doesn’t allow both at the same time. Currently, the driver attempts to reserve tunnel mode unconditionally for both NIC and FDB IPsec tables. This can lead to conflicting tunnel mode setups, for example, if a flow table was created in the FDB domain with tunnel offload enabled, and we later try to create another one in the NIC, or vice versa. To resolve this, adjust the blocking logic so that tunnel mode is only reserved by NIC flows. This ensures that tunnel offload is exclusively used in either the NIC or the FDB, and avoids unintended offload conflicts. Fixes: 1762f132d542 ("net/mlx5e: Support IPsec packet offload for RX in switchdev mode") Fixes: c6c2bf5db4ea ("net/mlx5e: Support IPsec packet offload for TX in switchdev mode") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1759652999-858513-2-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19net: sparx5/lan969x: fix flooding configuration on bridge join/leaveDaniel Machon3-10/+17
[ Upstream commit c9d1b0b54258ba13b567dd116ead3c7c30cba7d8 ] The sparx5 driver programs UC/MC/BC flooding in sparx5_update_fwd() by unconditionally applying bridge_fwd_mask to all flood PGIDs. Any bridge topology change that triggers sparx5_update_fwd() (for example enslaving another port) therefore reinstalls flooding in hardware for already bridged ports, regardless of their per-port flood flags. This results in clobbering of the flood masks, and desynchronization between software and hardware: the bridge still reports “flood off” for the port, but hardware has flooding enabled due to unconditional PGID reprogramming. Steps to reproduce: $ ip link add br0 type bridge $ ip link set br0 up $ ip link set eth0 master br0 $ ip link set eth0 up $ bridge link set dev eth0 flood off $ ip link set eth1 master br0 $ ip link set eth1 up At this point, flooding is silently re-enabled for eth0. Software still shows “flood off” for eth0, but hardware has flooding enabled. To fix this, flooding is now set explicitly during bridge join/leave, through sparx5_port_attr_bridge_flags(): On bridge join, UC/MC/BC flooding is enabled by default. On bridge leave, UC/MC/BC flooding is disabled. sparx5_update_fwd() no longer touches the flood PGIDs, clobbering the flood masks, and desynchronizing software and hardware. Initialization of the flooding PGIDs have been moved to sparx5_start(). This is required as flooding PGIDs defaults to 0x3fffffff in hardware and the initialization was previously handled in sparx5_update_fwd(), which was removed. With this change, user-configured flooding flags persist across bridge updates and are no longer overridden by sparx5_update_fwd(). Fixes: d6fce5141929 ("net: sparx5: add switching support") Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251003-fix-flood-fwd-v1-1-48eb478b2904@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19net: fsl_pq_mdio: Fix device node reference leak in fsl_pq_mdio_probeErick Karanja1-0/+2
[ Upstream commit 521405cb54cd2812bbb6dedd5afc14bca1e7e98a ] Add missing of_node_put call to release device node tbi obtained via for_each_child_of_node. Fixes: afae5ad78b342 ("net/fsl_pq_mdio: streamline probing of MDIO nodes") Signed-off-by: Erick Karanja <karanja99erick@gmail.com> Link: https://patch.msgid.link/20251002174617.960521-1-karanja99erick@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19ice: ice_adapter: release xa entry on adapter allocation failureHaotian Zhang1-4/+6
[ Upstream commit 2db687f3469dbc5c59bc53d55acafd75d530b497 ] When ice_adapter_new() fails, the reserved XArray entry created by xa_insert() is not released. This causes subsequent insertions at the same index to return -EBUSY, potentially leading to NULL pointer dereferences. Reorder the operations as suggested by Przemek Kitszel: 1. Check if adapter already exists (xa_load) 2. Reserve the XArray slot (xa_reserve) 3. Allocate the adapter (ice_adapter_new) 4. Store the adapter (xa_store) Fixes: 0f0023c649c7 ("ice: do not init struct ice_adapter more times than needed") Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Suggested-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20251001115336.1707-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19net: mscc: ocelot: Fix use-after-free caused by cyclic delayed workDuoming Zhou1-1/+1
[ Upstream commit bc9ea787079671cb19a8b25ff9f02be5ef6bfcf5 ] The origin code calls cancel_delayed_work() in ocelot_stats_deinit() to cancel the cyclic delayed work item ocelot->stats_work. However, cancel_delayed_work() may fail to cancel the work item if it is already executing. While destroy_workqueue() does wait for all pending work items in the work queue to complete before destroying the work queue, it cannot prevent the delayed work item from being rescheduled within the ocelot_check_stats_work() function. This limitation exists because the delayed work item is only enqueued into the work queue after its timer expires. Before the timer expiration, destroy_workqueue() has no visibility of this pending work item. Once the work queue appears empty, destroy_workqueue() proceeds with destruction. When the timer eventually expires, the delayed work item gets queued again, leading to the following warning: workqueue: cannot queue ocelot_check_stats_work on wq ocelot-switch-stats WARNING: CPU: 2 PID: 0 at kernel/workqueue.c:2255 __queue_work+0x875/0xaf0 ... RIP: 0010:__queue_work+0x875/0xaf0 ... RSP: 0018:ffff88806d108b10 EFLAGS: 00010086 RAX: 0000000000000000 RBX: 0000000000000101 RCX: 0000000000000027 RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffff88806d123e88 RBP: ffffffff813c3170 R08: 0000000000000000 R09: ffffed100da247d2 R10: ffffed100da247d1 R11: ffff88806d123e8b R12: ffff88800c00f000 R13: ffff88800d7285c0 R14: ffff88806d0a5580 R15: ffff88800d7285a0 FS: 0000000000000000(0000) GS:ffff8880e5725000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe18e45ea10 CR3: 0000000005e6c000 CR4: 00000000000006f0 Call Trace: <IRQ> ? kasan_report+0xc6/0xf0 ? __pfx_delayed_work_timer_fn+0x10/0x10 ? __pfx_delayed_work_timer_fn+0x10/0x10 call_timer_fn+0x25/0x1c0 __run_timer_base.part.0+0x3be/0x8c0 ? __pfx_delayed_work_timer_fn+0x10/0x10 ? rcu_sched_clock_irq+0xb06/0x27d0 ? __pfx___run_timer_base.part.0+0x10/0x10 ? try_to_wake_up+0xb15/0x1960 ? _raw_spin_lock_irq+0x80/0xe0 ? __pfx__raw_spin_lock_irq+0x10/0x10 tmigr_handle_remote_up+0x603/0x7e0 ? __pfx_tmigr_handle_remote_up+0x10/0x10 ? sched_balance_trigger+0x1c0/0x9f0 ? sched_tick+0x221/0x5a0 ? _raw_spin_lock_irq+0x80/0xe0 ? __pfx__raw_spin_lock_irq+0x10/0x10 ? tick_nohz_handler+0x339/0x440 ? __pfx_tmigr_handle_remote_up+0x10/0x10 __walk_groups.isra.0+0x42/0x150 tmigr_handle_remote+0x1f4/0x2e0 ? __pfx_tmigr_handle_remote+0x10/0x10 ? ktime_get+0x60/0x140 ? lapic_next_event+0x11/0x20 ? clockevents_program_event+0x1d4/0x2a0 ? hrtimer_interrupt+0x322/0x780 handle_softirqs+0x16a/0x550 irq_exit_rcu+0xaf/0xe0 sysvec_apic_timer_interrupt+0x70/0x80 </IRQ> ... The following diagram reveals the cause of the above warning: CPU 0 (remove) | CPU 1 (delayed work callback) mscc_ocelot_remove() | ocelot_deinit() | ocelot_check_stats_work() ocelot_stats_deinit() | cancel_delayed_work()| ... | queue_delayed_work() destroy_workqueue() | (wait a time) | __queue_work() //UAF The above scenario actually constitutes a UAF vulnerability. The ocelot_stats_deinit() is only invoked when initialization failure or resource destruction, so we must ensure that any delayed work items cannot be rescheduled. Replace cancel_delayed_work() with disable_delayed_work_sync() to guarantee proper cancellation of the delayed work item and ensure completion of any currently executing work before the workqueue is deallocated. A deadlock concern was considered: ocelot_stats_deinit() is called in a process context and is not holding any locks that the delayed work item might also need. Therefore, the use of the _sync() variant is safe here. This bug was identified through static analysis. To reproduce the issue and validate the fix, I simulated ocelot-switch device by writing a kernel module and prepared the necessary resources for the virtual ocelot-switch device's probe process. Then, removing the virtual device will trigger the mscc_ocelot_remove() function, which in turn destroys the workqueue. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Link: https://patch.msgid.link/20251001011149.55073-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-19net/mlx4: prevent potential use after free in mlx4_en_do_uc_filter()Dan Carpenter1-1/+1
[ Upstream commit 4f0d91ba72811fd5dd577bcdccd7fed649aae62c ] Print "entry->mac" before freeing "entry". The "entry" pointer is freed with kfree_rcu() so it's unlikely that we would trigger this in real life, but it's safer to re-order it. Fixes: cc5387f7346a ("net/mlx4_en: Add unicast MAC filtering") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/aNvMHX4g8RksFFvV@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15octeontx2-pf: fix bitmap leakBo Sun1-0/+1
[ Upstream commit 92e9f4faffca70c82126e59552f6e8ff8f95cc65 ] The bitmap allocated with bitmap_zalloc() in otx2_probe() was not released in otx2_remove(). Unbinding and rebinding the driver therefore triggers a kmemleak warning: unreferenced object (size 8): backtrace: bitmap_zalloc otx2_probe Call bitmap_free() in the remove path to fix the leak. Fixes: efabce290151 ("octeontx2-pf: AF_XDP zero copy receive support") Signed-off-by: Bo Sun <bo@mboxify.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15octeontx2-vf: fix bitmap leakBo Sun1-0/+1
[ Upstream commit cd9ea7da41a449ff1950230a35990155457b9879 ] The bitmap allocated with bitmap_zalloc() in otx2vf_probe() was not released in otx2vf_remove(). Unbinding and rebinding the driver therefore triggers a kmemleak warning: unreferenced object (size 8): backtrace: bitmap_zalloc otx2vf_probe Call bitmap_free() in the remove path to fix the leak. Fixes: efabce290151 ("octeontx2-pf: AF_XDP zero copy receive support") Signed-off-by: Bo Sun <bo@mboxify.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"Jakub Kicinski2-28/+1
[ Upstream commit 6f5dacf88a32b3fd8b52c8ea781bf188c42aaa95 ] This reverts commit ceddedc969f0532b7c62ca971ee50d519d2bc0cb. Commit in question breaks the mapping of PGs to pools for some SKUs. Specifically multi-host NICs seem to be shipped with a custom buffer configuration which maps the lossy PG to pool 4. But the bad commit overrides this with pool 0 which does not have sufficient buffer space reserved. Resulting in ~40% packet loss. The commit also breaks BMC / OOB connection completely (100% packet loss). Revert, similarly to commit 3fbfe251cc9f ("Revert "net/mlx5e: Update and set Xon/Xoff upon port speed set""). The breakage is exactly the same, the only difference is that quoted commit would break the NIC immediately on boot, and the currently reverted commit only when MTU is changed. Note: "good" kernels do not restore the configuration, so downgrade isn't enough to recover machines. A NIC power cycle seems to be necessary to return to a healthy state (or overriding the relevant registers using a custom patch). Fixes: ceddedc969f0 ("net/mlx5e: Update and set Xon/Xoff upon MTU set") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20250929181529.1848157-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net: enetc: initialize SW PIR and CIR based HW PIR and CIR valuesWei Fang1-10/+5
[ Upstream commit 2aff4420efc2910e905ee5b000e04e87422aebc4 ] Software can only initialize the PIR and CIR of the command BD ring after a FLR, and these two registers can only be set to 0. But the reset values of these two registers are 0, so software does not need to update them. If there is no a FLR and PIR and CIR are not 0, resetting them to 0 or other values by software will cause the command BD ring to work abnormally. This is because of an internal context in the ring prefetch logic that will retain the state from the first incarnation of the ring and continue prefetching from the stale location when the ring is reinitialized. The internal context can only be reset by the FLR. In addition, there is a logic error in the implementation, next_to_clean indicates the software CIR and next_to_use indicates the software PIR. But the current driver uses next_to_clean to set PIR and use next_to_use to set CIR. This does not cause a problem in actual use, because the current command BD ring is only initialized after FLR, and the initial values of next_to_use and next_to_clean are both 0. Therefore, this patch removes the initialization of PIR and CIR. Instead, next_to_use and next_to_clean are initialized by reading the values of PIR and CIR. Fixes: 4701073c3deb ("net: enetc: add initial netc-lib driver to support NTMP") Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20250926013954.2003456-1-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net/mlx5: fw reset, add reset timeout workMoshe Shemesh1-0/+24
[ Upstream commit 5cfbe7ebfa42fd3c517a701dab5bd73524da9088 ] Add sync reset timeout to stop poll_sync_reset in case there was no reset done or abort event within timeout. Otherwise poll sync reset will just continue and in case of fw fatal error no health reporting will be done. Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net/mlx5: pagealloc: Fix reclaim race during command interface teardownShay Drory1-2/+5
[ Upstream commit 79a0e32b32ac4e4f9e4bb22be97f371c8c116c88 ] The reclaim_pages_cmd() function sends a command to the firmware to reclaim pages if the command interface is active. A race condition can occur if the command interface goes down (e.g., due to a PCI error) while the mlx5_cmd_do() call is in flight. In this case, mlx5_cmd_do() will return an error. The original code would propagate this error immediately, bypassing the software-based page reclamation logic that is supposed to run when the command interface is down. Fix this by checking whether mlx5_cmd_do() returns -ENXIO, which mark that command interface is down. If this is the case, fall through to the software reclamation path. If the command failed for any another reason, or finished successfully, return as before. Fixes: b898ce7bccf1 ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net/mlx5: Stop polling for command response if interface goes downMoshe Shemesh1-1/+5
[ Upstream commit b1f0349bd6d320c382df2e7f6fc2ac95c85f2b18 ] Stop polling on firmware response to command in polling mode if the command interface got down. This situation can occur, for example, if a firmware fatal error is detected during polling. This change halts the polling process when the command interface goes down, preventing unnecessary waits. Fixes: b898ce7bccf1 ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net: dlink: handle copy_thresh allocation failureYeounsu Moon1-2/+5
[ Upstream commit 8169a6011c5fecc6cb1c3654c541c567d3318de8 ] The driver did not handle failure of `netdev_alloc_skb_ip_align()`. If the allocation failed, dereferencing `skb->protocol` could lead to a NULL pointer dereference. This patch tries to allocate `skb`. If the allocation fails, it falls back to the normal path. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Jakub Kicinski <kuba@kernel.org> Tested-on: D-Link DGE-550T Rev-A3 Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250928190124.1156-1-yyyynoom@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net: ena: return 0 in ena_get_rxfh_key_size() when RSS hash key is not configurableKohei Enju1-1/+4
[ Upstream commit f017156aea60db8720e47591ed1e041993381ad2 ] In EC2 instances where the RSS hash key is not configurable, ethtool shows bogus RSS hash key since ena_get_rxfh_key_size() unconditionally returns ENA_HASH_KEY_SIZE. Commit 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported") added proper handling for devices that don't support RSS hash key configuration, but ena_get_rxfh_key_size() has been unchanged. When the RSS hash key is not configurable, return 0 instead of ENA_HASH_KEY_SIZE to clarify getting the value is not supported. Tested on m5 instance families. Without patch: # ethtool -x ens5 | grep -A 1 "RSS hash key" RSS hash key: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 With patch: # ethtool -x ens5 | grep -A 1 "RSS hash key" RSS hash key: Operation not supported Fixes: 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported") Signed-off-by: Kohei Enju <enjuk@amazon.com> Link: https://patch.msgid.link/20250929050247.51680-1-enjuk@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15nfp: fix RSS hash key size when RSS is not supportedKohei Enju1-1/+1
[ Upstream commit 8425161ac1204d2185e0a10f5ae652bae75d2451 ] The nfp_net_get_rxfh_key_size() function returns -EOPNOTSUPP when devices don't support RSS, and callers treat the negative value as a large positive value since the return type is u32. Return 0 when devices don't support RSS, aligning with the ethtool interface .get_rxfh_key_size() that requires returning 0 in such cases. Fixes: 9ff304bfaf58 ("nfp: add support for reporting CRC32 hash function") Signed-off-by: Kohei Enju <enjuk@amazon.com> Link: https://patch.msgid.link/20250929054230.68120-1-enjuk@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15idpf: fix mismatched free function for dma_alloc_coherentAlok Tiwari1-3/+3
[ Upstream commit b9bd25f47eb79c9eb275e3d9ac3983dc88577dd4 ] The mailbox receive path allocates coherent DMA memory with dma_alloc_coherent(), but frees it with dmam_free_coherent(). This is incorrect since dmam_free_coherent() is only valid for buffers allocated with dmam_alloc_coherent(). Fix the mismatch by using dma_free_coherent() instead of dmam_free_coherent Fixes: e54232da1238 ("idpf: refactor idpf_recv_mb_msg") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com> Link: https://patch.msgid.link/20250925180212.415093-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net: macb: single dma_alloc_coherent() for DMA descriptorsThéo Lebrun1-39/+41
[ Upstream commit 78d901897b3cae06b38f54e48a2378cf9da21175 ] Move from 2*NUM_QUEUES dma_alloc_coherent() for DMA descriptor rings to 2 calls overall. Issue is with how all queues share the same register for configuring the upper 32-bits of Tx/Rx descriptor rings. Taking Tx, notice how TBQPH does *not* depend on the queue index: #define GEM_TBQP(hw_q) (0x0440 + ((hw_q) << 2)) #define GEM_TBQPH(hw_q) (0x04C8) queue_writel(queue, TBQP, lower_32_bits(queue->tx_ring_dma)); #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT if (bp->hw_dma_cap & HW_DMA_CAP_64B) queue_writel(queue, TBQPH, upper_32_bits(queue->tx_ring_dma)); #endif To maximise our chances of getting valid DMA addresses, we do a single dma_alloc_coherent() across queues. This improves the odds because alloc_pages() guarantees natural alignment. Other codepaths (IOMMU or dev/arch dma_map_ops) don't give high enough guarantees (even page-aligned isn't enough). Two consideration: - dma_alloc_coherent() gives us page alignment. Here we remove this constraint meaning each queue's ring won't be page-aligned anymore. - This can save some tiny amounts of memory. Fewer allocations means (1) less overhead (constant cost per alloc) and (2) less wasted bytes due to alignment constraints. Example for (2): 4 queues, default ring size (512), 64-bit DMA descriptors, 16K pages: - Before: 8 allocs of 8K, each rounded to 16K => 64K wasted. - After: 2 allocs of 32K => 0K wasted. Fixes: 02c958dd3446 ("net/macb: add TX multiqueue support for gem") Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Tested-by: Nicolas Ferre <nicolas.ferre@microchip.com> # on sam9x75 Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250923-macb-fixes-v6-4-772d655cdeb6@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net: macb: move ring size computation to functionsThéo Lebrun1-11/+16
[ Upstream commit 92d4256fafd8d0a14d3aaa10452ac771bf9b597c ] The tx/rx ring size calculation is somewhat complex and partially hidden behind a macro. Move that out of the {RX,TX}_RING_BYTES() macros and macb_{alloc,free}_consistent() functions into neat separate functions. In macb_free_consistent(), we drop the size variable and directly call the size helpers in the arguments list. In macb_alloc_consistent(), we keep the size variable that is used by netdev_dbg() calls. Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250923-macb-fixes-v6-3-772d655cdeb6@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 78d901897b3c ("net: macb: single dma_alloc_coherent() for DMA descriptors") Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net: macb: remove illusion about TBQPH/RBQPH being per-queueThéo Lebrun2-37/+24
[ Upstream commit fca3dc859b200ca4dcdd2124beaf3bb2ab80b0f7 ] The MACB driver acts as if TBQPH/RBQPH are configurable on a per queue basis; this is a lie. A single register configures the upper 32 bits of each DMA descriptor buffers for all queues. Concrete actions: - Drop GEM_TBQPH/GEM_RBQPH macros which have a queue index argument. Only use MACB_TBQPH/MACB_RBQPH constants. - Drop struct macb_queue->TBQPH/RBQPH fields. - In macb_init_buffers(): do a single write to TBQPH and RBQPH for all queues instead of a write per queue. - In macb_tx_error_task(): drop the write to TBQPH. - In macb_alloc_consistent(): if allocations give different upper 32-bits, fail. Previously, it would have lead to silent memory corruption as queues would have used the upper 32 bits of the alloc from queue 0 and their own low 32 bits. - In macb_suspend(): if we use the tie off descriptor for suspend, do the write once for all queues instead of once per queue. Fixes: fff8019a08b6 ("net: macb: Add 64 bit addressing support for GEM") Fixes: ae1f2a56d273 ("net: macb: Added support for many RX queues") Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250923-macb-fixes-v6-2-772d655cdeb6@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15net: enetc: Fix probing error message typo for the ENETCv4 PF driverClaudiu Manoil1-1/+1
[ Upstream commit c35cf24a69b00b7f54f2f19838f2b82d54480b0f ] Blamed commit wrongly indicates VF error in case of PF probing error. Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250924082755.1984798-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-15idpf: fix Rx descriptor ready check barrier in splitqAlexander Lobakin1-6/+2
[ Upstream commit c20edbacc0295fd36f5f634b3421647ce3e08fd7 ] No idea what the current barrier position was meant for. At that point, nothing is read from the descriptor, only the pointer to the actual one is fetched. The correct barrier usage here is after the generation check, so that only the first qword is read if the descriptor is not yet ready and we need to stop polling. Debatable on coherent DMA as the Rx descriptor size is <= cacheline size, but anyway, the current barrier position only makes the codegen worse. Fixes: 3a8845af66ed ("idpf: add RX splitq napi poll support") Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Ramu R <ramu.r@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-09-25octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()Dan Carpenter1-1/+1
This code calls kfree_rcu(new_node, rcu) and then dereferences "new_node" and then dereferences it on the next line. Two lines later, we take a mutex so I don't think this is an RCU safe region. Re-order it to do the dereferences before queuing up the free. Fixes: 68fbff68dbea ("octeontx2-pf: Add police action for TC flower") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/aNKCL1jKwK8GRJHh@stanley.mountain Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-24libie: fix string names for AQ error codesJacob Keller1-1/+1
The LIBIE_AQ_STR macro() introduced by commit 5feaa7a07b85 ("libie: add adminq helper for converting err to str") is used in order to generate strings for printing human readable error codes. Its definition is missing the separating underscore ('_') character which makes the resulting strings difficult to read. Additionally, the string won't match the source code, preventing search tools from working properly. Add the missing underscore character, fixing the error string names. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Fixes: 5feaa7a07b85 ("libie: add adminq helper for converting err to str") Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250923205657.846759-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUADCarolina Jubran1-0/+1
Include MLX5E_FEC_RS_544_514_INTERLEAVED_QUAD in the FEC RS stats handling. This addresses a gap introduced when adding support for 200G/lane link modes. Fixes: 4e343c11efbb ("net/mlx5e: Support FEC settings for 200G per lane link modes") Signed-off-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Yael Chemla <ychemla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1758525094-816583-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23net/mlx5: HWS, ignore flow level for multi-dest tableYevgeny Kliteynik3-12/+6
When HWS creates multi-dest FW table and adds rules to forward to other tables, ignore the flow level enforcement in FW, because HWS is responsible for table levels. This fixes the following error: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:818:(pid 192306): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x6ae84c), err(-22) Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1758525094-816583-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-23net/mlx5: fs, fix UAF in flow counter releaseMoshe Shemesh4-5/+31
Fix a kernel trace [1] caused by releasing an HWS action of a local flow counter in mlx5_cmd_hws_delete_fte(), where the HWS action refcount and mutex were not initialized and the counter struct could already be freed when deleting the rule. Fix it by adding the missing initializations and adding refcount for the local flow counter struct. [1] Kernel log: Call Trace: <TASK> dump_stack_lvl+0x34/0x48 mlx5_fs_put_hws_action.part.0.cold+0x21/0x94 [mlx5_core] mlx5_fc_put_hws_action+0x96/0xad [mlx5_core] mlx5_fs_destroy_fs_actions+0x8b/0x152 [mlx5_core] mlx5_cmd_hws_delete_fte+0x5a/0xa0 [mlx5_core] del_hw_fte+0x1ce/0x260 [mlx5_core] mlx5_del_flow_rules+0x12d/0x240 [mlx5_core] ? ttwu_queue_wakelist+0xf4/0x110 mlx5_ib_destroy_flow+0x103/0x1b0 [mlx5_ib] uverbs_free_flow+0x20/0x50 [ib_uverbs] destroy_hw_idr_uobject+0x1b/0x50 [ib_uverbs] uverbs_destroy_uobject+0x34/0x1a0 [ib_uverbs] uobj_destroy+0x3c/0x80 [ib_uverbs] ib_uverbs_run_method+0x23e/0x360 [ib_uverbs] ? uverbs_finalize_object+0x60/0x60 [ib_uverbs] ib_uverbs_cmd_verbs+0x14f/0x2c0 [ib_uverbs] ? do_tty_write+0x1a9/0x270 ? file_tty_write.constprop.0+0x98/0xc0 ? new_sync_write+0xfc/0x190 ib_uverbs_ioctl+0xd7/0x160 [ib_uverbs] __x64_sys_ioctl+0x87/0xc0 do_syscall_64+0x59/0x90 Fixes: b581f4266928 ("net/mlx5: fs, manage flow counters HWS action sharing by refcount") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1758525094-816583-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22bnxt_en: correct offset handling for IPv6 destination addressAlok Tiwari1-1/+1
In bnxt_tc_parse_pedit(), the code incorrectly writes IPv6 destination values to the source address field (saddr) when processing pedit offsets within the destination address range. This patch corrects the assignment to use daddr instead of saddr, ensuring that pedit operations on IPv6 destination addresses are applied correctly. Fixes: 9b9eb518e338 ("bnxt_en: Add support for NAT(L3/L4 rewrite)") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Link: https://patch.msgid.link/20250920121157.351921-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queueJakub Kicinski4-52/+90
Tony Nguyen says: ==================== i40e: virtchnl improvements Przemek Kitszel says: Improvements hardening PF-VF communication for i40e driver. This patchset targets several issues that can cause undefined behavior or be exploited in some other way. * '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: improve VF MAC filters accounting i40e: add mask to apply valid bits for itr_idx i40e: add max boundary check for VF filters i40e: fix validation of VF state in get resources i40e: fix input validation logic for action_meta i40e: fix idx validation in config queues msg i40e: fix idx validation in i40e_validate_queue_map i40e: add validation for ring_len param ==================== Link: https://patch.msgid.link/20250919184959.656681-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-19ethernet: rvu-af: Remove slash from the driver namePetr Malat1-2/+1
Having a slash in the driver name leads to EIO being returned while reading /sys/module/rvu_af/drivers content. Remove DRV_STRING as it's not used anywhere. Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Petr Malat <oss@malat.biz> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250918152106.1798299-1-oss@malat.biz Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-18i40e: improve VF MAC filters accountingLukasz Czapnik3-44/+50
When adding new VM MAC, driver checks only *active* filters in vsi->mac_filter_hash. Each MAC, even in non-active state is using resources. To determine number of MACs VM uses, count VSI filters in *any* state. Add i40e_count_all_filters() to simply count all filters, and rename i40e_count_filters() to i40e_count_active_filters() to avoid ambiguity. Fixes: cfb1d572c986 ("i40e: Add ensurance of MacVlan resources for every trusted VF") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: add mask to apply valid bits for itr_idxLukasz Czapnik1-1/+1
The ITR index (itr_idx) is only 2 bits wide. When constructing the register value for QINT_RQCTL, all fields are ORed together. Without masking, higher bits from itr_idx may overwrite adjacent fields in the register. Apply I40E_QINT_RQCTL_ITR_INDX_MASK to ensure only the intended bits are set. Fixes: 5c3c48ac6bf5 ("i40e: implement virtual device interface") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: add max boundary check for VF filtersLukasz Czapnik1-0/+10
There is no check for max filters that VF can request. Add it. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: fix validation of VF state in get resourcesLukasz Czapnik2-2/+8
VF state I40E_VF_STATE_ACTIVE is not the only state in which VF is actually active so it should not be used to determine if a VF is allowed to obtain resources. Use I40E_VF_STATE_RESOURCES_LOADED that is set only in i40e_vc_get_vf_resources_msg() and cleared during reset. Fixes: 61125b8be85d ("i40e: Fix failed opcode appearing if handling messages from VF") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: fix input validation logic for action_metaLukasz Czapnik1-1/+1
Fix condition to check 'greater or equal' to prevent OOB dereference. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-09-18i40e: fix idx validation in config queues msgLukasz Czapnik1-2/+2
Ensure idx is within range of active/initialized TCs when iterating over vf->ch[idx] in i40e_vc_config_queues_msg(). Fixes: c27eac48160d ("i40e: Enable ADq and create queue channel/s on VF") Cc: stable@vger.kernel.org Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Kamakshi Nellore <nellorex.kamakshi@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>