linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2022-07-18	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue	Jakub Kicinski	4	-11/+15
	Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-07-15 This series contains updates to ice driver only. Ani updates feature restriction for devices that don't support external time stamping. Zhuo Chen removes unnecessary call to pci_aer_clear_nonfatal_status(). * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Remove pci_aer_clear_nonfatal_status() call ice: Add EXTTS feature to the feature bitmap ==================== Link: https://lore.kernel.org/r/20220715214642.2968799-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: macb: fixup sparse warnings on __be16 ports	Ben Dooks	1	-3/+4
	The port fields in the ethool flow structures are defined to be __be16 types, so sparse is showing issues where these are being passed to htons(). Fix these warnings by passing them to be16_to_cpu() instead. These are being used in netdev_dbg() so should only effect anyone doing debug. Fixes the following sparse warnings: drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3366:9: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16 drivers/net/ethernet/cadence/macb_main.c:3419:25: warning: cast from restricted __be16 Signed-off-by: Ben Dooks <ben.dooks@sifive.com> Link: https://lore.kernel.org/r/20220715173009.526126-1-ben.dooks@sifive.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: prestera: acl: fix code formatting	Maksym Glubokiy	1	-4/+4
	Make the code look better. Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Link: https://lore.kernel.org/r/20220715103806.7108-1-maksym.glubokiy@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: stmmac: remove redunctant disable xPCS EEE call	Wong Vee Khee	1	-8/+0
	Disable is done in stmmac_init_eee() on the event of MAC link down. Since setting enable/disable EEE via ethtool will eventually trigger a MAC down, removing this redunctant call in stmmac_ethtool.c to avoid calling xpcs_config_eee() twice. Fixes: d4aeaed80b0e ("net: stmmac: trigger PCS EEE to turn off on link down") Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Link: https://lore.kernel.org/r/20220715122402.1017470-1-vee.khee.wong@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero	Piotr Skajewski	3	-0/+10
	It is possible to disable VFs while the PF driver is processing requests from the VF driver. This can result in a panic. BUG: unable to handle kernel paging request at 000000000000106c PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 8 PID: 0 Comm: swapper/8 Kdump: loaded Tainted: G I --------- - Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020 RIP: 0010:ixgbe_msg_task+0x4c8/0x1690 [ixgbe] Code: 00 00 48 8d 04 40 48 c1 e0 05 89 7c 24 24 89 fd 48 89 44 24 10 83 ff 01 0f 84 b8 04 00 00 4c 8b 64 24 10 4d 03 a5 48 22 00 00 <41> 80 7c 24 4c 00 0f 84 8a 03 00 00 0f b7 c7 83 f8 08 0f 84 8f 0a RSP: 0018:ffffb337869f8df8 EFLAGS: 00010002 RAX: 0000000000001020 RBX: 0000000000000000 RCX: 000000000000002b RDX: 0000000000000002 RSI: 0000000000000008 RDI: 0000000000000006 RBP: 0000000000000006 R08: 0000000000000002 R09: 0000000000029780 R10: 00006957d8f42832 R11: 0000000000000000 R12: 0000000000001020 R13: ffff8a00e8978ac0 R14: 000000000000002b R15: ffff8a00e8979c80 FS: 0000000000000000(0000) GS:ffff8a07dfd00000(0000) knlGS:00000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000106c CR3: 0000000063e10004 CR4: 00000000007726e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? ttwu_do_wakeup+0x19/0x140 ? try_to_wake_up+0x1cd/0x550 ? ixgbevf_update_xcast_mode+0x71/0xc0 [ixgbevf] ixgbe_msix_other+0x17e/0x310 [ixgbe] __handle_irq_event_percpu+0x40/0x180 handle_irq_event_percpu+0x30/0x80 handle_irq_event+0x36/0x53 handle_edge_irq+0x82/0x190 handle_irq+0x1c/0x30 do_IRQ+0x49/0xd0 common_interrupt+0xf/0xf This can be eventually be reproduced with the following script: while : do echo 63 > /sys/class/net/<devname>/device/sriov_numvfs sleep 1 echo 0 > /sys/class/net/<devname>/device/sriov_numvfs sleep 1 done Add lock when disabling SR-IOV to prevent process VF mailbox communication. Fixes: d773d1310625 ("ixgbe: Fix memory leak when SR-IOV VFs are direct assigned") Signed-off-by: Piotr Skajewski <piotrx.skajewski@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220715214456.2968711-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	i40e: Fix erroneous adapter reinitialization during recovery process	Dawid Lukwinski	1	-8/+5
	Fix an issue when driver incorrectly detects state of recovery process and erroneously reinitializes interrupts, which results in a kernel error and call trace message. The issue was caused by a combination of two factors: 1. Assuming the EMP reset issued after completing firmware recovery means the whole recovery process is complete. 2. Erroneous reinitialization of interrupt vector after detecting the above mentioned EMP reset. Fixes (1) by changing how recovery state change is detected and (2) by adjusting the conditional expression to ensure using proper interrupt reinitialization method, depending on the situation. Fixes: 4ff0ee1af016 ("i40e: Introduce recovery mode support") Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220715214542.2968762-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: ethernet: mtk_eth_soc: fix off by one check of ARRAY_SIZE	Tom Rix	1	-1/+1
	In mtk_wed_tx_ring_setup(.., int idx, ..), idx is used as an index here struct mtk_wed_ring *ring = &dev->tx_ring[idx]; The bounds of idx are checked here BUG_ON(idx > ARRAY_SIZE(dev->tx_ring)); If idx is the size of the array, it will pass this check and overflow. So change the check to >= . Fixes: 804775dfc288 ("net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)") Signed-off-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20220716214654.1540240-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	mlxsw: convert driver to use unlocked devlink API during init/fini	Jiri Pirko	10	-241/+248
	Prepare for devlink reload being called with devlink->lock held and convert the mlxsw driver to use unlocked devlink API during init and fini flows. Take devl_lock() in reload_down() and reload_up() ops in the meantime before reload cmd is converted to take the lock itself. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: lan966x: Fix usage of lan966x->mac_lock when used by FDB	Horatiu Vultur	1	-11/+23
	When the SW bridge was trying to add/remove entries to/from HW, the access to HW was not protected by any lock. In this way, it was possible to have race conditions. Fix this by using the lan966x->mac_lock to protect parallel access to HW for this cases. Fixes: 25ee9561ec622 ("net: lan966x: More MAC table functionality") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: lan966x: Fix usage of lan966x->mac_lock inside lan966x_mac_irq_handler	Horatiu Vultur	1	-7/+12
	The problem with this spin lock is that it was just protecting the list of the MAC entries in SW and not also the access to the MAC entries in HW. Because the access to HW is indirect, then it could happen to have race conditions. For example when SW introduced an entry in MAC table and the irq mac is trying to read something from the MAC. Update such that also the access to MAC entries in HW is protected by this lock. Fixes: 5ccd66e01cbef ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: lan966x: Fix usage of lan966x->mac_lock when entry is removed	Horatiu Vultur	1	-4/+20
	To remove an entry to the MAC table, it is required first to setup the entry and then issue a command for the MAC to forget the entry. So if it happens for two threads to remove simultaneously an entry in MAC table then it would be a race condition. Fix this by using lan966x->mac_lock to protect the HW access. Fixes: e18aba8941b40 ("net: lan966x: add mactable support") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: lan966x: Fix usage of lan966x->mac_lock when entry is added	Horatiu Vultur	1	-1/+7
	To add an entry to the MAC table, it is required first to setup the entry and then issue a command for the MAC to learn the entry. So if it happens for two threads to add simultaneously an entry in MAC table then it would be a race condition. Fix this by using lan966x->mac_lock to protect the HW access. Fixes: fc0c3fe7486f2 ("net: lan966x: Add function lan966x_mac_ip_learn()") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	net: lan966x: Fix taking rtnl_lock while holding spin_lock	Horatiu Vultur	1	-9/+18
	When the HW deletes an entry in MAC table then it generates an interrupt. The SW will go through it's own list of MAC entries and if it is not found then it would notify the listeners about this. The problem is that when the SW will go through it's own list it would take a spin lock(lan966x->mac_lock) and when it notifies that the entry is deleted. But to notify the listeners it taking the rtnl_lock which is illegal. This is fixed by instead of notifying right away that the entry is deleted, move the entry on a temp list and once, it checks all the entries then just notify that the entries from temp list are deleted. Fixes: 5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-18	iavf: Fix missing state logs	Przemyslaw Patynowski	1	-0/+4
	Fix debug prints, by adding missing state prints. Extend iavf_state_str by strings for __IAVF_INIT_EXTENDED_CAPS and __IAVF_INIT_CONFIG_ADAPTER. Without this patch, when enabling debug prints for iavf.h, user will see: iavf 0000:06:0e.0: state transition from:__IAVF_INIT_GET_RESOURCES to:__IAVF_UNKNOWN_STATE iavf 0000:06:0e.0: state transition from:__IAVF_UNKNOWN_STATE to:__IAVF_UNKNOWN_STATE Fixes: 605ca7c5c670 ("iavf: Fix kernel BUG in free_msi_irqs") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-18	iavf: Fix handling of dummy receive descriptors	Przemyslaw Patynowski	1	-3/+2
	Fix memory leak caused by not handling dummy receive descriptor properly. iavf_get_rx_buffer now sets the rx_buffer return value for dummy receive descriptors. Without this patch, when the hardware writes a dummy descriptor, iavf would not free the page allocated for the previous receive buffer. This is an unlikely event but can still happen. [Jesse: massaged commit message] Fixes: efa14c398582 ("iavf: allow null RX descriptors") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-18	iavf: Disallow changing rx/tx-frames and rx/tx-frames-irq	Przemyslaw Patynowski	4	-13/+1
	Remove from supported_coalesce_params ETHTOOL_COALESCE_MAX_FRAMES and ETHTOOL_COALESCE_MAX_FRAMES_IRQ. As tx-frames-irq allowed user to change budget for iavf_clean_tx_irq, remove work_limit and use define for budget. Without this patch there would be possibility to change rx/tx-frames and rx/tx-frames-irq, which for rx/tx-frames did nothing, while for rx/tx-frames-irq it changed rx/tx-frames and only changed budget for cleaning NAPI poll. Fixes: fbb7ddfef253 ("i40evf: core ethtool functionality") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com> Tested-by: Marek Szlosek <marek.szlosek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-18	iavf: Fix VLAN_V2 addition/rejection	Przemyslaw Patynowski	3	-10/+74
	Fix VLAN addition, so that PF driver does not reject whole VLAN batch. Add VLAN reject handling, so rejected VLANs, won't litter VLAN filter list. Fix handling of active_(c/s)vlans, so it will be possible to re-add VLAN filters for user. Without this patch, after changing trust to off, with VLAN filters saturated, no VLAN is added, due to PF rejecting addition. Fixes: 92fc50859872 ("iavf: Restrict maximum VLAN filters for VIRTCHNL_VF_OFFLOAD_VLAN_V2") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-18	igc: Remove forced_speed_duplex value	Sasha Neftin	1	-2/+0
	u8 forced_speed_duplex from value from igc_mac_info struct is not used. This patch comes to tidy up the driver code. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-18	igc: Remove MSI-X PBA Clear register	Sasha Neftin	1	-3/+0
	MSI-X PBA Clear register is not used. This patch comes to tidy up the driver code. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-18	igc: Lift TAPRIO schedule restriction	Kurt Kanzenbach	1	-6/+17
	Add support for Qbv schedules where one queue stays open in consecutive entries. Currently that's not supported. Example schedule: \|tc qdisc replace dev ${INTERFACE} handle 100 parent root taprio num_tc 3 \ \| map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ \| queues 1@0 1@1 2@2 \ \| base-time ${BASETIME} \ \| sched-entry S 0x01 300000 \ # Stream High/Low \| sched-entry S 0x06 500000 \ # Management and Best Effort \| sched-entry S 0x04 200000 \ # Best Effort \| flags 0x02 Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-18	net: prestera: acl: use proper mask for port selector	Maksym Glubokiy	1	-3/+3
	Adjusted as per packet processor documentation. This allows to properly match 'indev' for clsact rules. Fixes: 47327e198d42 ("net: prestera: acl: migrate to new vTCAM api") Signed-off-by: Maksym Glubokiy <maksym.glubokiy@plvision.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18	atl1c: use netif_napi_add_tx() for Tx NAPI	Sieng-Piaw Liew	1	-2/+2
	Use netif_napi_add_tx() for NAPI in Tx direction instead of the regular netif_napi_add() function. Signed-off-by: Sieng-Piaw Liew <liew.s.piaw@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18	net: stmmac: fix dma queue left shift overflow issue	Junxiao Chang	1	-0/+3
	When queue number is > 4, left shift overflows due to 32 bits integer variable. Mask calculation is wrong for MTL_RXQ_DMA_MAP1. If CONFIG_UBSAN is enabled, kernel dumps below warning: [ 10.363842] ================================================================== [ 10.363882] UBSAN: shift-out-of-bounds in /build/linux-intel-iotg-5.15-8e6Tf4/ linux-intel-iotg-5.15-5.15.0/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:224:12 [ 10.363929] shift exponent 40 is too large for 32-bit type 'unsigned int' [ 10.363953] CPU: 1 PID: 599 Comm: NetworkManager Not tainted 5.15.0-1003-intel-iotg [ 10.363956] Hardware name: ADLINK Technology Inc. LEC-EL/LEC-EL, BIOS 0.15.11 12/22/2021 [ 10.363958] Call Trace: [ 10.363960] <TASK> [ 10.363963] dump_stack_lvl+0x4a/0x5f [ 10.363971] dump_stack+0x10/0x12 [ 10.363974] ubsan_epilogue+0x9/0x45 [ 10.363976] __ubsan_handle_shift_out_of_bounds.cold+0x61/0x10e [ 10.363979] ? wake_up_klogd+0x4a/0x50 [ 10.363983] ? vprintk_emit+0x8f/0x240 [ 10.363986] dwmac4_map_mtl_dma.cold+0x42/0x91 [stmmac] [ 10.364001] stmmac_mtl_configuration+0x1ce/0x7a0 [stmmac] [ 10.364009] ? dwmac410_dma_init_channel+0x70/0x70 [stmmac] [ 10.364020] stmmac_hw_setup.cold+0xf/0xb14 [stmmac] [ 10.364030] ? page_pool_alloc_pages+0x4d/0x70 [ 10.364034] ? stmmac_clear_tx_descriptors+0x6e/0xe0 [stmmac] [ 10.364042] stmmac_open+0x39e/0x920 [stmmac] [ 10.364050] __dev_open+0xf0/0x1a0 [ 10.364054] __dev_change_flags+0x188/0x1f0 [ 10.364057] dev_change_flags+0x26/0x60 [ 10.364059] do_setlink+0x908/0xc40 [ 10.364062] ? do_setlink+0xb10/0xc40 [ 10.364064] ? __nla_validate_parse+0x4c/0x1a0 [ 10.364068] __rtnl_newlink+0x597/0xa10 [ 10.364072] ? __nla_reserve+0x41/0x50 [ 10.364074] ? __kmalloc_node_track_caller+0x1d0/0x4d0 [ 10.364079] ? pskb_expand_head+0x75/0x310 [ 10.364082] ? nla_reserve_64bit+0x21/0x40 [ 10.364086] ? skb_free_head+0x65/0x80 [ 10.364089] ? security_sock_rcv_skb+0x2c/0x50 [ 10.364094] ? __cond_resched+0x19/0x30 [ 10.364097] ? kmem_cache_alloc_trace+0x15a/0x420 [ 10.364100] rtnl_newlink+0x49/0x70 This change fixes MTL_RXQ_DMA_MAP1 mask issue and channel/queue mapping warning. Fixes: d43042f4da3e ("net: stmmac: mapping mtl rx to dma channel") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216195 Reported-by: Cedric Wassenaar <cedric@bytespeed.nl> Signed-off-by: Junxiao Chang <junxiao.chang@intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-18	net: stmmac: switch to use interrupt for hw crosstimestamping	Wong Vee Khee	6	-21/+29
	Using current implementation of polling mode, there is high chances we will hit into timeout error when running phc2sys. Hence, update the implementation of hardware crosstimestamping to use the MAC interrupt service routine instead of polling for TSIS bit in the MAC Timestamp Interrupt Status register to be set. Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-17	net/mlx5: fs, allow flow table creation with a UID	Mark Bloch	8	-15/+25
	Add UID field to flow table attributes to allow creating flow tables with a non default (zero) uid. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-17	net/mlx5: fs, expose flow table ID to users	Mark Bloch	1	-0/+6
	Expose the flow table ID to users. This will be used by downstream patches to allow creating steering rules that point to a flow table ID. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-15	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue	Jakub Kicinski	6	-33/+11
	Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-07-14 This series contains updates to e1000e and igc drivers. Sasha re-enables GPT clock when exiting s0ix to prevent hardware unit hang and reverts a workaround for this issue on e1000e. Lennert Buytenhek restores checks for removed device while accessing registers to prevent NULL pointer dereferences for igc. * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Reinstate IGC_REMOVED logic and implement it properly Revert "e1000e: Fix possible HW unit hang after an s0ix exit" e1000e: Enable GPT clock before sending message to CSME ==================== Link: https://lore.kernel.org/r/20220714175857.933537-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-15	ice: Remove pci_aer_clear_nonfatal_status() call	Zhuo Chen	1	-6/+0
	After commit 62b36c3ea664 ("PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() calls"), calls to pci_cleanup_aer_uncorrect_error_status() have already been removed. But in commit 5995b6d0c6fc ("ice: Implement pci_error_handler ops") pci_cleanup_aer_uncorrect_error_status was used again, so remove it in this patch. Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Sen Wang <wangsen.harry@bytedance.com> Cc: Wenliang Wang <wangwenliang.1995@bytedance.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-15	ice: Add EXTTS feature to the feature bitmap	Anirudh Venkataramanan	3	-5/+15
	External time stamp sources are supported only on certain devices. Enforce the right support matrix by adding the ICE_F_PTP_EXTTS bit to the feature bitmap set. Co-developed-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-15	lib/bitmap: change type of bitmap_weight to unsigned long	Yury Norov	1	-1/+1
	bitmap_weight() doesn't return negative values, so change it's type to unsigned long. It may help compiler to generate better code and catch bugs. Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-07-15	net: stmmac: fix unbalanced ptp clock issue in suspend/resume flow	Biao Huang	2	-10/+15
	Current stmmac driver will prepare/enable ptp_ref clock in stmmac_init_tstamp_counter(). The stmmac_pltfr_noirq_suspend will disable it once in suspend flow. But in resume flow, stmmac_pltfr_noirq_resume --> stmmac_init_tstamp_counter stmmac_resume --> stmmac_hw_setup --> stmmac_init_ptp --> stmmac_init_tstamp_counter ptp_ref clock reference counter increases twice, which leads to unbalance ptp clock when resume back. Move ptp_ref clock prepare/enable out of stmmac_init_tstamp_counter to fix it. Fixes: 0735e639f129d ("net: stmmac: skip only stmmac_ptp_register when resume from suspend") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15	net: stmmac: fix pm runtime issue in stmmac_dvr_remove()	Biao Huang	1	-2/+3
	If netif is running when stmmac_dvr_remove is invoked, the unregister_netdev will call ndo_stop(stmmac_release) and vlan_kill_rx_filter(stmmac_vlan_rx_kill_vid). Currently, stmmac_dvr_remove() will disable pm runtime before unregister_netdev. When stmmac_vlan_rx_kill_vid is invoked, pm_runtime_resume_and_get in it returns EACCESS error number, and reports: dwmac-mediatek 11021000.ethernet eth0: stmmac_dvr_remove: removing driver dwmac-mediatek 11021000.ethernet eth0: FPE workqueue stop dwmac-mediatek 11021000.ethernet eth0: failed to kill vid 0081/0 Move the pm_runtime_disable to the end of stmmac_dvr_remove to fix this issue. Fixes: 6449520391dfc ("net: stmmac: properly handle with runtime pm in stmmac_dvr_remove()") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15	stmmac: dwmac-mediatek: fix clock issue	Biao Huang	1	-28/+21
	The pm_runtime takes care of the clock handling in current stmmac drivers, and dwmac-mediatek implement the mediatek_dwmac_clks_config() as the callback for pm_runtime. Then, stripping duplicated clocks handling in old init()/exit() to fix clock issue in suspend/resume test. As to clocks in probe/remove, vendor need symmetric handling to ensure clocks balance. Test pass, including suspend/resume and ko insertion/remove. Fixes: 3186bdad97d5 ("stmmac: dwmac-mediatek: add platform level clocks management") Signed-off-by: Biao Huang <biao.huang@mediatek.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15	octeontx2-af: Set NIX link credits based on max LMAC	Sunil Goutham	7	-9/+102
	When number of LMACs active on a CGX/RPM are 3, then current NIX link credit config based on per lmac fifo length which inturn is calculated as 'lmac_fifo_len = total_fifo_len / 3', is incorrect. In HW one of the LMAC gets half of the FIFO and rest gets 1/4th. Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Geetha Sowjanya <gakula@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15	octeontx2-af: Fixes static warnings	Ratheesh Kannoth	2	-15/+55
	Fixes smatch static tool warning reported by smatch tool. rvu_npc_hash.c:1232 rvu_npc_exact_del_table_entry_by_id() error: uninitialized symbol 'drop_mcam_idx'. rvu_npc_hash.c:1312 rvu_npc_exact_add_table_entry() error: uninitialized symbol 'drop_mcam_idx'. rvu_npc_hash.c:1391 rvu_npc_exact_update_table_entry() error: uninitialized symbol 'hash_index'. rvu_npc_hash.c:1428 rvu_npc_exact_promisc_disable() error: uninitialized symbol 'drop_mcam_idx'. rvu_npc_hash.c:1473 rvu_npc_exact_promisc_enable() error: uninitialized symbol 'drop_mcam_idx'. otx2_dmac_flt.c:191 otx2_dmacflt_update() error: 'rsp' dereferencing possible ERR_PTR() otx2_dmac_flt.c:60 otx2_dmacflt_add_pfmac() error: 'rsp' dereferencing possible ERR_PTR() Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15	ip: Fix data-races around sysctl_ip_fwd_update_priority.	Kuniyuki Iwashima	1	-1/+2
	While reading sysctl_ip_fwd_update_priority, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 432e05d32892 ("net: ipv4: Control SKB reprioritization after forwarding") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-15	ip: Fix data-races around sysctl_ip_default_ttl.	Kuniyuki Iwashima	1	-1/+1
	While reading sysctl_ip_default_ttl, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-14	Merge tag 'mlx5-updates-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux	Jakub Kicinski	16	-112/+661
	Saeed Mahameed says: ==================== mlx5-updates-2022-07-13 1) Support 802.1ad for bridge offloads Vlad Buslov Says: ================= Current mlx5 bridge VLAN offload implementation only supports 802.1Q VLAN Ethernet protocol. That protocol type is assumed by default and SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL notification is ignored. In order to support dynamically setting VLAN protocol handle SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL notification by flushing FDB and re-creating VLAN modify header actions with a new protocol. Implement support for 802.1ad protocol by saving the current VLAN protocol to per-bridge variable and re-create the necessary flow groups according to its current value (either use cvlan or svlan flow fields). ================== 2) debugfs to count ongoing FW commands 3) debugfs to query eswitch vport firmware diagnostic counters 4) Add missing meter configuration in flow action 5) Some misc cleanup * tag 'mlx5-updates-2022-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Remove the duplicating check for striding RQ when enabling LRO net/mlx5e: Move the LRO-XSK check to mlx5e_fix_features net/mlx5e: Extend flower police validation net/mlx5e: configure meter in flow action net/mlx5e: Removed useless code in function net/mlx5: Bridge, implement QinQ support net/mlx5: Bridge, implement infrastructure for VLAN protocol change net/mlx5: Bridge, extract VLAN push/pop actions creation net/mlx5: Bridge, rename filter fg to vlan_filter net/mlx5: Bridge, refactor groups sizes and indices net/mlx5: debugfs, Add num of in-use FW command interface slots net/mlx5: Expose vnic diagnostic counters for eswitch managed vports net/mlx5: Use software VHCA id when it's supported net/mlx5: Introduce ifc bits for using software vhca id net/mlx5: Use the bitmap API to allocate bitmaps ==================== Link: https://lore.kernel.org/r/20220713225859.401241-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14	net: devlink: make devlink_dpipe_headers_register() return void	Jiri Pirko	1	-4/+2
	The return value is not used, so change the return value type to void. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	34	-154/+350
	include/net/sock.h 310731e2f161 ("net: Fix data-races around sysctl_mem.") e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/ net/ipv4/fib_semantics.c 747c14307214 ("ip: fix dflt addr selection for connected nexthop") d62607c3fe45 ("net: rename reference+tracking helpers") net/tls/tls.h include/net/tls.h 3d8c51b25a23 ("net/tls: Check for errors in tls_device_init") 587903142308 ("tls: create an internal header") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14	nfp: flower: configure tunnel neighbour on cmsg rx	Tianyu Yuan	1	-5/+13
	nfp_tun_write_neigh() function will configure a tunnel neighbour when calling nfp_tun_neigh_event_handler() or nfp_flower_cmsg_process_one_rx() (with no tunnel neighbour type) from firmware. When configuring IP on physical port as a tunnel endpoint, no operation will be performed after receiving the cmsg mentioned above. Therefore, add a progress to configure tunnel neighbour in this case. v2: Correct format of fixes tag. Fixes: f1df7956c11f ("nfp: flower: rework tunnel neighbour configuration") Signed-off-by: Tianyu Yuan <tianyu.yuan@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Reviewed-by: Baowen Zheng <baowen.zheng@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220714081915.148378-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14	igc: Reinstate IGC_REMOVED logic and implement it properly	Lennert Buytenhek	2	-1/+7
	The initially merged version of the igc driver code (via commit 146740f9abc4, "igc: Add support for PF") contained the following IGC_REMOVED checks in the igc_rd32/wr32() MMIO accessors: u32 igc_rd32(struct igc_hw hw, u32 reg) { u8 __iomem hw_addr = READ_ONCE(hw->hw_addr); u32 value = 0; if (IGC_REMOVED(hw_addr)) return ~value; value = readl(&hw_addr[reg]); /* reads should not return all F's / if (!(~value) && (!reg \|\| !(~readl(hw_addr)))) hw->hw_addr = NULL; return value; } And: #define wr32(reg, val) \ do { \ u8 __iomem hw_addr = READ_ONCE((hw)->hw_addr); \ if (!IGC_REMOVED(hw_addr)) \ writel((val), &hw_addr[(reg)]); \ } while (0) E.g. igb has similar checks in its MMIO accessors, and has a similar macro E1000_REMOVED, which is implemented as follows: #define E1000_REMOVED(h) unlikely(!(h)) These checks serve to detect and take note of an 0xffffffff MMIO read return from the device, which can be caused by a PCIe link flap or some other kind of PCI bus error, and to avoid performing MMIO reads and writes from that point onwards. However, the IGC_REMOVED macro was not originally implemented: #ifndef IGC_REMOVED #define IGC_REMOVED(a) (0) #endif /* IGC_REMOVED */ This led to the IGC_REMOVED logic to be removed entirely in a subsequent commit (commit 3c215fb18e70, "igc: remove IGC_REMOVED function"), with the rationale that such checks matter only for virtualization and that igc does not support virtualization -- but a PCIe device can become detached even without virtualization being in use, and without proper checks, a PCIe bus error affecting an igc adapter will lead to various NULL pointer dereferences, as the first access after the error will set hw->hw_addr to NULL, and subsequent accesses will blindly dereference this now-NULL pointer. This patch reinstates the IGC_REMOVED checks in igc_rd32/wr32(), and implements IGC_REMOVED the way it is done for igb, by checking for the unlikely() case of hw_addr being NULL. This change prevents the oopses seen when a PCIe link flap occurs on an igc adapter. Fixes: 146740f9abc4 ("igc: Add support for PF") Signed-off-by: Lennert Buytenhek <buytenh@arista.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-14	Revert "e1000e: Fix possible HW unit hang after an s0ix exit"	Sasha Neftin	4	-32/+0
	This reverts commit 1866aa0d0d6492bc2f8d22d0df49abaccf50cddd. Commit 1866aa0d0d64 ("e1000e: Fix possible HW unit hang after an s0ix exit") was a workaround for CSME problem to handle messages comes via H2ME mailbox. This problem has been fixed by patch "e1000e: Enable the GPT clock before sending message to the CSME". Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821 Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-14	e1000e: Enable GPT clock before sending message to CSME	Sasha Neftin	1	-0/+4
	On corporate (CSME) ADL systems, the Ethernet Controller may stop working ("HW unit hang") after exiting from the s0ix state. The reason is that CSME misses the message sent by the host. Enabling the dynamic GPT clock solves this problem. This clock is cleared upon HW initialization. Fixes: 3e55d231716e ("e1000e: Add handshake with the CSME to support S0ix") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214821 Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-14	net: atlantic: remove aq_nic_deinit() when resume	Chia-Lin Kao (AceLan)	1	-3/+0
	aq_nic_deinit() has been called while suspending, so we don't have to call it again on resume. Actually, call it again leads to another hang issue when resuming from S3. Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992345] Call Trace: Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992346] <TASK> Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992348] aq_nic_deinit+0xb4/0xd0 [atlantic] Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992356] aq_pm_thaw+0x7f/0x100 [atlantic] Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992362] pci_pm_resume+0x5c/0x90 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992366] ? pci_pm_thaw+0x80/0x80 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992368] dpm_run_callback+0x4e/0x120 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992371] device_resume+0xad/0x200 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992373] async_resume+0x1e/0x40 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992374] async_run_entry_fn+0x33/0x120 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992377] process_one_work+0x220/0x3c0 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992380] worker_thread+0x4d/0x3f0 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992382] ? process_one_work+0x3c0/0x3c0 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992384] kthread+0x12a/0x150 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992386] ? set_kthread_struct+0x40/0x40 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992387] ret_from_fork+0x22/0x30 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992391] </TASK> Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992392] ---[ end trace 1ec8c79604ed5e0d ]--- Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992394] PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 Jul 8 03:09:44 u-Precision-7865-Tower kernel: [ 5910.992397] atlantic 0000:02:00.0: PM: failed to resume async: error -110 Fixes: 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression") Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Link: https://lore.kernel.org/r/20220713111224.1535938-2-acelan.kao@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-14	net: atlantic: remove deep parameter on suspend/resume functions	Chia-Lin Kao (AceLan)	1	-14/+10
	Below commit claims that atlantic NIC requires to reset the device on pm op, and had set the deep to true for all suspend/resume functions. commit 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression") So, we could remove deep parameter on suspend/resume functions without any functional change. Fixes: 1809c30b6e5a ("net: atlantic: always deep reset on pm op, fixing up my null deref regression") Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Link: https://lore.kernel.org/r/20220713111224.1535938-1-acelan.kao@canonical.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-14	sfc: fix kernel panic when creating VF	Íñigo Huguet	1	-0/+3
	When creating VFs a kernel panic can happen when calling to efx_ef10_try_update_nic_stats_vf. When releasing a DMA coherent buffer, sometimes, I don't know in what specific circumstances, it has to unmap memory with vunmap. It is disallowed to do that in IRQ context or with BH disabled. Otherwise, we hit this line in vunmap, causing the crash: BUG_ON(in_interrupt()); This patch reenables BH to release the buffer. Log messages when the bug is hit: kernel BUG at mm/vmalloc.c:2727! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 6 PID: 1462 Comm: NetworkManager Kdump: loaded Tainted: G I --------- --- 5.14.0-119.el9.x86_64 #1 Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.8.2 08/27/2020 RIP: 0010:vunmap+0x2e/0x30 ...skip... Call Trace: __iommu_dma_free+0x96/0x100 efx_nic_free_buffer+0x2b/0x40 [sfc] efx_ef10_try_update_nic_stats_vf+0x14a/0x1c0 [sfc] efx_ef10_update_stats_vf+0x18/0x40 [sfc] efx_start_all+0x15e/0x1d0 [sfc] efx_net_open+0x5a/0xe0 [sfc] __dev_open+0xe7/0x1a0 __dev_change_flags+0x1d7/0x240 dev_change_flags+0x21/0x60 ...skip... Fixes: d778819609a2 ("sfc: DMA the VF stats only when requested") Reported-by: Ma Yuying <yuma@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220713092116.21238-1-ihuguet@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-13	octeontx2-af: Limit link bringup time at firmware	Hariprasad Kelam	4	-5/+15
	Set the maximum time firmware should poll for a link. If not set firmware could block CPU for a long time resulting in mailbox failures. If link doesn't come up within 1second, firmware will anyway notify the status as and when LINK comes up Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Geetha Sowjanya <gakula@marvell.com> Link: https://lore.kernel.org/r/20220712161815.12621-1-gakula@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-13	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue	Jakub Kicinski	4	-21/+136
	Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-07-12 This series contains updates to ice driver only. Paul fixes detection of E822 devices for firmware update and changes NVM read for snapshot creation to be done in chunks as some systems cannot read the entire NVM in the allotted time. ==================== Link: https://lore.kernel.org/r/20220712164829.7275-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-13	sfc: fix use after free when disabling sriov	Íñigo Huguet	1	-3/+7
	Use after free is detected by kfence when disabling sriov. What was read after being freed was vf->pci_dev: it was freed from pci_disable_sriov and later read in efx_ef10_sriov_free_vf_vports, called from efx_ef10_sriov_free_vf_vswitching. Set the pointer to NULL at release time to not trying to read it later. Reproducer and dmesg log (note that kfence doesn't detect it every time): $ echo 1 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs $ echo 0 > /sys/class/net/enp65s0f0np0/device/sriov_numvfs BUG: KFENCE: use-after-free read in efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc] Use-after-free read at 0x00000000ff3c1ba5 (in kfence-#224): efx_ef10_sriov_free_vf_vswitching+0x82/0x170 [sfc] efx_ef10_pci_sriov_disable+0x38/0x70 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xfe/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae kfence-#224: 0x00000000edb8ef95-0x00000000671f5ce1, size=2792, cache=kmalloc-4k allocated by task 6771 on cpu 10 at 3137.860196s: pci_alloc_dev+0x21/0x60 pci_iov_add_virtfn+0x2a2/0x320 sriov_enable+0x212/0x3e0 efx_ef10_sriov_configure+0x67/0x80 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xba/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae freed by task 6771 on cpu 12 at 3170.991309s: device_release+0x34/0x90 kobject_cleanup+0x3a/0x130 pci_iov_remove_virtfn+0xd9/0x120 sriov_disable+0x30/0xe0 efx_ef10_pci_sriov_disable+0x57/0x70 [sfc] efx_pci_sriov_configure+0x24/0x40 [sfc] sriov_numvfs_store+0xfe/0x140 kernfs_fop_write_iter+0x11c/0x1b0 new_sync_write+0x11f/0x1b0 vfs_write+0x1eb/0x280 ksys_write+0x5f/0xe0 do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 3c5eb87605e85 ("sfc: create vports for VFs and assign random MAC addresses") Reported-by: Yanghang Liu <yanghliu@redhat.com> Signed-off-by: Íñigo Huguet <ihuguet@redhat.com> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20220712062642.6915-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>