aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/intel/ice/ice_lib.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2025-04-11ice: receive LLDP on trusted VFsMateusz Pacuszka1-6/+42
When a trusted VF tries to configure an LLDP multicast address, configure a rule that would mirror the traffic to this VF, untrusted VFs are not allowed to receive LLDP at all, so the request to add LLDP MAC address will always fail for them. Add a forwarding LLDP filter to a trusted VF when it tries to add an LLDP multicast MAC address. The MAC address has to be added after enabling trust (through restarting the LLDP service). Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com> Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-04-11ice: do not add LLDP-specific filter if not necessaryLarysa Zaremba1-7/+16
Commit 34295a3696fb ("ice: implement new LLDP filter command") introduced the ability to use LLDP-specific filter that directs all LLDP traffic to a single VSI. However, current goal is for all trusted VFs to be able to see LLDP neighbors, which is impossible to do with the special filter. Make using the generic filter the default choice and fall back to special one only if a generic filter cannot be added. That way setups with "NVMs where an already existent LLDP filter is blocking the creation of a filter to allow LLDP packets" will still be able to configure software Rx LLDP on PF only, while all other setups would be able to forward them to VFs too. Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-03-18ice: Add E830 checksum offload supportPaul Greenwalt1-1/+7
E830 supports raw receive and generic transmit checksum offloads. Raw receive checksum support is provided by hardware calculating the checksum over the whole packet, regardless of type. The calculated checksum is provided to driver in the Rx flex descriptor. Then the driver assigns the checksum to skb->csum and sets skb->ip_summed to CHECKSUM_COMPLETE. Generic transmit checksum support is provided by hardware calculating the checksum given two offsets: the start offset to begin checksum calculation, and the offset to insert the calculated checksum in the packet. Support is advertised to the stack using NETIF_F_HW_CSUM feature. E830 has the following limitations when both generic transmit checksum offload and TCP Segmentation Offload (TSO) are enabled: 1. Inner packet header modification is not supported. This restriction includes the inability to alter TCP flags, such as the push flag. As a result, this limitation can impact the receiver's ability to coalesce packets, potentially degrading network throughput. 2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents support of Maximum Transmission Unit (MTU) greater than 1063 bytes. Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features(). When NETIF_F_HW_CSUM is requested the provided skb->csum_start and skb->csum_offset are passed to hardware in the Tx context descriptor generic checksum (GCS) parameters. Hardware calculates the 1's complement from skb->csum_start to the end of the packet, and inserts the result in the packet at skb->csum_offset. Co-developed-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Co-developed-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Eric Joyner <eric.joyner@intel.com> Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20250310174502.3708121-2-anthony.l.nguyen@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni1-18/+0
Cross-merge networking fixes after downstream PR (net-6.14-rc6). Conflicts: tools/testing/selftests/drivers/net/ping.py 75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py") de94e8697405 ("selftests: drv-net: store addresses in dict indexed by ipver") https://lore.kernel.org/netdev/20250311115758.17a1d414@canb.auug.org.au/ net/core/devmem.c a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()") 1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations") https://lore.kernel.org/netdev/20250313114929.43744df1@canb.auug.org.au/ Adjacent changes: tools/testing/selftests/net/Makefile 6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices.") 2e5584e0f913 ("selftests/net: expand cmsg_ipv6.sh with ipv4") drivers/net/ethernet/broadcom/bnxt/bnxt.c 661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic") fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-05ice: do not configure destination override for switchdevLarysa Zaremba1-18/+0
After switchdev is enabled and disabled later, LLDP packets sending stops, despite working perfectly fine before and during switchdev state. To reproduce (creating/destroying VF is what triggers the reconfiguration): devlink dev eswitch set pci/<address> mode switchdev echo '2' > /sys/class/net/<ifname>/device/sriov_numvfs echo '0' > /sys/class/net/<ifname>/device/sriov_numvfs This happens because LLDP relies on the destination override functionality. It needs to 1) set a flag in the descriptor, 2) set the VSI permission to make it valid. The permissions are set when the PF VSI is first configured, but switchdev then enables it for the uplink VSI (which is always the PF) once more when configured and disables when deconfigured, which leads to software-generated LLDP packets being blocked. Do not modify the destination override permissions when configuring switchdev, as the enabled state is the default configuration that is never modified. Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment") Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-26ice: use napi's irq affinity and rmap IRQ notifiersAhmed Zaki1-7/+0
Delete the driver CPU affinity and aRFS rmap info, use the core's API instead. Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20250224232228.990783-5-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-26ice: clear NAPI's IRQ numbers in ice_vsi_clear_napi_queues()Ahmed Zaki1-1/+8
We set the NAPI's IRQ number in ice_vsi_set_napi_queues(). Clear the NAPI's IRQ in ice_vsi_clear_napi_queues(). Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Link: https://patch.msgid.link/20250224232228.990783-4-ahmed.zaki@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-14ice: support Rx timestamp on flex descriptorSimei Su1-3/+2
To support Rx timestamp offload, VIRTCHNL_OP_1588_PTP_CAPS is sent by the VF to request PTP capability and responded by the PF what capability is enabled for that VF. Hardware captures timestamps which contain only 32 bits of nominal nanoseconds, as opposed to the 64bit timestamps that the stack expects. To convert 32b to 64b, we need a current PHC time. VIRTCHNL_OP_1588_PTP_GET_TIME is sent by the VF and responded by the PF with the current PHC time. Signed-off-by: Simei Su <simei.su@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-10ice: Don't check device type when checking GNSS presenceKarol Kolacinski1-1/+1
Don't check if the device type is E810T as non-E810T devices can support GNSS too and PCA9575 check is enough to determine if GNSS is present or not. Rename ice_gnss_is_gps_present() to ice_gnss_is_module_present() because GNSS module supports multiple GNSS providers, not only GPS. Move functions related to PCA9575 from ice_ptp_hw.c to ice_common.c to be able to access them when PTP is disabled in the kernel, but GNSS is enabled. Remove logical AND with ICE_AQC_LINK_TOPO_NODE_TYPE_M in ice_get_pca9575_handle(), which has no effect, and reorder device type checks to check the device_id first, then set other variables. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: enable_rdma devlink paramMichal Swiatkowski1-1/+7
Implement enable_rdma devlink parameter to allow user to turn RDMA feature on and off. It is useful when there is no enough interrupts and user doesn't need RDMA feature. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: treat dyn_allowed only as suggestionMichal Swiatkowski1-0/+2
It can be needed to have some MSI-X allocated as static and rest as dynamic. For example on PF VSI. We want to always have minimum one MSI-X on it, because of that it is allocated as a static one, rest can be dynamic if it is supported. Change the ice_get_irq_res() to allow using static entries if they are free even if caller wants dynamic one. Adjust limit values to the new approach. Min and max in limit means the values that are valid, so decrease max and num_static by one. Set vsi::irq_dyn_alloc if dynamic allocation is supported. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-02-05ice: get rid of num_lan_msix fieldMichal Swiatkowski1-11/+14
Remove the field to allow having more queues than MSI-X on VSI. As default the number will be the same, but if there won't be more MSI-X available VSI can run with at least one MSI-X. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-16ice: support FW Recovery ModeKonrad Knitter1-0/+6
Recovery Mode is intended to recover from a fatal failure scenario in which the device is not accessible to the host, meaning the firmware is non-responsive. The purpose of the Firmware Recovery Mode is to enable software tools to update firmware and/or device configuration so the fatal error can be resolved. Recovery Mode Firmware supports a limited set of admin commands required for NVM update. Recovery Firmware does not support hardware interrupts so a polling mode is used. The driver will expose only the minimum set of devlink commands required for the recovery of the adapter. Using an appropriate NVM image, the user can recover the adapter using the devlink flash API. Prior to 4.20 E810 Adapter Recovery Firmware supports only the update and erase of the "fw.mgmt" component. E810 Adapter Recovery Firmware doesn't support selected preservation of cards settings or identifiers. The following command can be used to recover the adapter: $ devlink dev flash <pci-address> <update-image.bin> component fw.mgmt overwrite settings overwrite identifier Newer FW versions (4.20 or newer) supports update of "fw.undi" and "fw.netlist" components. $ devlink dev flash <pci-address> <update-image.bin> Tested on Intel Corporation Ethernet Controller E810-C for SFP FW revision 3.20 and 4.30. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Konrad Knitter <konrad.knitter@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-11-13ice: Add support for persistent NAPI configJoe Damato1-2/+4
Use netif_napi_add_config to assign persistent per-NAPI config when initializing NAPIs. This preserves NAPI config settings when queue counts are adjusted. Tested with an E810-2CQDA2 NIC. Begin by setting the queue count to 4: $ sudo ethtool -L eth4 combined 4 Check the queue settings: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now, set the queue with NAPI ID 8451 to have a gro-flush-timeout of 1111: $ sudo ./tools/net/ynl/cli.py \ --spec Documentation/netlink/specs/netdev.yaml \ --do napi-set --json='{"id": 8451, "gro-flush-timeout": 1111}' None Check that worked: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now reduce the queue count to 2, which would destroy the queue with NAPI ID 8451: $ sudo ethtool -L eth4 combined 2 Check the queue settings, noting that NAPI ID 8451 is gone: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Now, increase the number of queues back to 4: $ sudo ethtool -L eth4 combined 4 Dump the settings, expecting to see the same NAPI IDs as above and for NAPI ID 8451 to have its gro-flush-timeout set to 1111: $ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \ --dump napi-get --json='{"ifindex": 4}' [{'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8452, 'ifindex': 4, 'irq': 2782}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 1111, 'id': 8451, 'ifindex': 4, 'irq': 2781}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8450, 'ifindex': 4, 'irq': 2780}, {'defer-hard-irqs': 0, 'gro-flush-timeout': 0, 'id': 8449, 'ifindex': 4, 'irq': 2779}] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-10-08ice: add E830 HW VF mailbox message limit supportPaul Greenwalt1-0/+3
E830 adds hardware support to prevent the VF from overflowing the PF mailbox with VIRTCHNL messages. E830 will use the hardware feature (ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf(). To prevent a VF from overflowing the PF, the PF sets the number of messages per VF that can be in the PF's mailbox queue (ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF, the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG register. Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-7/+8
Cross-merge networking fixes after downstream PR. No conflicts (sort of) and no adjacent changes. This merge reverts commit b3c9e65eb227 ("net: hsr: remove seqnr_lock") from net, as it was superseded by commit 430d67bdcb04 ("net: hsr: Use the seqnr lock for frames received via interlink port.") in net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09ice: Fix lldp packets dropping after changing the number of channelsMartyna Szapar-Mudlaw1-7/+8
After vsi setup refactor commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") ice_cfg_sw_lldp function which removes rx rule directing LLDP packets to vsi is moved from ice_vsi_release to ice_vsi_decfg function. ice_vsi_decfg is used in more cases than just in vsi_release resulting in unnecessary removal of rx lldp packets handling switch rule. This leads to lldp packets being dropped after a change number of channels via ethtool. This patch moves ice_cfg_sw_lldp function that removes rx lldp sw rule back to ice_vsi_release function. Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reported-by: Matěj Grégr <mgregr@netx.as> Closes: https://lore.kernel.org/intel-wired-lan/1be45a76-90af-4813-824f-8398b69745a9@netx.as/T/#u Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: add basic devlink subfunctions supportPiotr Raczynski1-2/+3
Implement devlink port handlers responsible for ethernet type devlink subfunctions. Create subfunction devlink port and setup all resources needed for a subfunction netdev to operate. Configure new VSI for each new subfunction, initialize and configure interrupts and Tx/Rx resources. Set correct MAC filters and create new netdev. For now, subfunction is limited to only one Tx/Rx queue pair. Only allocate new subfunction VSI with devlink port new command. Allocate and free subfunction MSIX interrupt vectors using new API calls with pci_msix_alloc_irq_at and pci_msix_free_irq. Support both automatic and manual subfunction numbers. If no subfunction number is provided, use xa_alloc to pick a number automatically. This will find the first free index and use that as the number. This reduces burden on users in the simple case where a specific number is not required. It may also be slightly faster to check that a number exists since xarray lookup should be faster than a linear scan of the dyn_ports xarray. Co-developed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: export ice ndo_ops functionsPiotr Raczynski1-0/+20
Make some of the netdevice_ops functions visible from outside for another VSI type created netdev. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-06ice: add new VSI type for subfunctionsPiotr Raczynski1-2/+23
Add required plumbing for new VSI type dedicated to devlink subfunctions. Make sure that the vsi is properly configured and destroyed. Also allow loading XDP and AF_XDP sockets. The first implementation of devlink subfunctions supports only one Tx/Rx queue pair per given subfunction. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: check ICE_VSI_DOWN under rtnl_lock when preparing for resetLarysa Zaremba1-6/+6
Consider the following scenario: .ndo_bpf() | ice_prepare_for_reset() | ________________________|_______________________________________| rtnl_lock() | | ice_down() | | | test_bit(ICE_VSI_DOWN) - true | | ice_dis_vsi() returns | ice_up() | | | proceeds to rebuild a running VSI | .ndo_bpf() is not the only rtnl-locked callback that toggles the interface to apply new configuration. Another example is .set_channels(). To avoid the race condition above, act only after reading ICE_VSI_DOWN under rtnl_lock. Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: check for XDP rings instead of bpf program when unconfiguringLarysa Zaremba1-2/+2
If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on VSI without applying new ring configuration. When unconfiguring the VSI, we can encounter the state in which there is an XDP program but no XDP rings to destroy or there will be XDP rings that need to be destroyed, but no XDP program to indicate their presence. When unconfiguring, rely on the presence of XDP rings rather then XDP program, as they better represent the current state that has to be destroyed. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: protect XDP configuration with a mutexLarysa Zaremba1-13/+21
The main threat to data consistency in ice_xdp() is a possible asynchronous PF reset. It can be triggered by a user or by TX timeout handler. XDP setup and PF reset code access the same resources in the following sections: * ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked * ice_vsi_rebuild() for the PF VSI - not protected * ice_vsi_open() - already rtnl-locked With an unfortunate timing, such accesses can result in a crash such as the one below: [ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14 [ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18 [Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms [ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001 [ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14 [ +0.394718] ice 0000:b1:00.0: PTP reset successful [ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098 [ +0.000045] #PF: supervisor read access in kernel mode [ +0.000023] #PF: error_code(0x0000) - not-present page [ +0.000023] PGD 0 P4D 0 [ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1 [ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000036] Workqueue: ice ice_service_task [ice] [ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice] [...] [ +0.000013] Call Trace: [ +0.000016] <TASK> [ +0.000014] ? __die+0x1f/0x70 [ +0.000029] ? page_fault_oops+0x171/0x4f0 [ +0.000029] ? schedule+0x3b/0xd0 [ +0.000027] ? exc_page_fault+0x7b/0x180 [ +0.000022] ? asm_exc_page_fault+0x22/0x30 [ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice] [ +0.000194] ice_free_tx_ring+0xe/0x60 [ice] [ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice] [ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice] [ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice] [ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [ +0.000145] ice_rebuild+0x18c/0x840 [ice] [ +0.000145] ? delay_tsc+0x4a/0xc0 [ +0.000022] ? delay_tsc+0x92/0xc0 [ +0.000020] ice_do_reset+0x140/0x180 [ice] [ +0.000886] ice_service_task+0x404/0x1030 [ice] [ +0.000824] process_one_work+0x171/0x340 [ +0.000685] worker_thread+0x277/0x3a0 [ +0.000675] ? preempt_count_add+0x6a/0xa0 [ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50 [ +0.000679] ? __pfx_worker_thread+0x10/0x10 [ +0.000653] kthread+0xf0/0x120 [ +0.000635] ? __pfx_kthread+0x10/0x10 [ +0.000616] ret_from_fork+0x2d/0x50 [ +0.000612] ? __pfx_kthread+0x10/0x10 [ +0.000604] ret_from_fork_asm+0x1b/0x30 [ +0.000604] </TASK> The previous way of handling this through returning -EBUSY is not viable, particularly when destroying AF_XDP socket, because the kernel proceeds with removal anyway. There is plenty of code between those calls and there is no need to create a large critical section that covers all of them, same as there is no need to protect ice_vsi_rebuild() with rtnl_lock(). Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp(). Leaving unprotected sections in between would result in two states that have to be considered: 1. when the VSI is closed, but not yet rebuild 2. when VSI is already rebuild, but not yet open The latter case is actually already handled through !netif_running() case, we just need to adjust flag checking a little. The former one is not as trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of hardware interaction happens, this can make adding/deleting rings exit with an error. Luckily, VSI rebuild is pending and can apply new configuration for us in a managed fashion. Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to indicate that ice_xdp() can just hot-swap the program. Also, as ice_vsi_rebuild() flow is touched in this patch, make it more consistent by deconfiguring VSI when coalesce allocation fails. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-09-03ice: move netif_queue_set_napi to rtnl-protected sectionsLarysa Zaremba1-98/+31
Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is not rtnl-locked when called from the reset. This creates the need to take the rtnl_lock just for a single function and complicates the synchronization with .ndo_bpf. At the same time, there no actual need to fill napi-to-queue information at this exact point. Fill napi-to-queue information when opening the VSI and clear it when the VSI is being closed. Those routines are already rtnl-locked. Also, rewrite napi-to-queue assignment in a way that prevents inclusion of XDP queues, as this leads to out-of-bounds writes, such as one below. [ +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0 [ +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047 [ +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2 [ +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021 [ +0.000003] Call Trace: [ +0.000003] <TASK> [ +0.000002] dump_stack_lvl+0x60/0x80 [ +0.000007] print_report+0xce/0x630 [ +0.000007] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.000007] ? __virt_addr_valid+0x1c9/0x2c0 [ +0.000005] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000003] kasan_report+0xe9/0x120 [ +0.000004] ? netif_queue_set_napi+0x1c2/0x1e0 [ +0.000004] netif_queue_set_napi+0x1c2/0x1e0 [ +0.000005] ice_vsi_close+0x161/0x670 [ice] [ +0.000114] ice_dis_vsi+0x22f/0x270 [ice] [ +0.000095] ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice] [ +0.000086] ice_prepare_for_reset+0x299/0x750 [ice] [ +0.000087] pci_dev_save_and_disable+0x82/0xd0 [ +0.000006] pci_reset_function+0x12d/0x230 [ +0.000004] reset_store+0xa0/0x100 [ +0.000006] ? __pfx_reset_store+0x10/0x10 [ +0.000002] ? __pfx_mutex_lock+0x10/0x10 [ +0.000004] ? __check_object_size+0x4c1/0x640 [ +0.000007] kernfs_fop_write_iter+0x30b/0x4a0 [ +0.000006] vfs_write+0x5d6/0xdf0 [ +0.000005] ? fd_install+0x180/0x350 [ +0.000005] ? __pfx_vfs_write+0x10/0xA10 [ +0.000004] ? do_fcntl+0x52c/0xcd0 [ +0.000004] ? kasan_save_track+0x13/0x60 [ +0.000003] ? kasan_save_free_info+0x37/0x60 [ +0.000006] ksys_write+0xfa/0x1d0 [ +0.000003] ? __pfx_ksys_write+0x10/0x10 [ +0.000002] ? __x64_sys_fcntl+0x121/0x180 [ +0.000004] ? _raw_spin_lock+0x87/0xe0 [ +0.000005] do_syscall_64+0x80/0x170 [ +0.000007] ? _raw_spin_lock+0x87/0xe0 [ +0.000004] ? __pfx__raw_spin_lock+0x10/0x10 [ +0.000003] ? file_close_fd_locked+0x167/0x230 [ +0.000005] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000005] ? do_syscall_64+0x8c/0x170 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? fput+0x1a/0x2c0 [ +0.000004] ? filp_close+0x19/0x30 [ +0.000004] ? do_dup2+0x25a/0x4c0 [ +0.000004] ? __x64_sys_dup2+0x6e/0x2e0 [ +0.000002] ? syscall_exit_to_user_mode+0x7d/0x220 [ +0.000004] ? do_syscall_64+0x8c/0x170 [ +0.000003] ? __count_memcg_events+0x113/0x380 [ +0.000005] ? handle_mm_fault+0x136/0x820 [ +0.000005] ? do_user_addr_fault+0x444/0xa80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000004] ? clear_bhb_loop+0x25/0x80 [ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7f2033593154 Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios") Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-10ice: use irq_update_affinity_hint()Michal Schmidt1-2/+2
irq_set_affinity_hint() is deprecated. Use irq_update_affinity_hint() instead. This removes the side-effect of actually applying the affinity. The driver does not really need to worry about spreading its IRQs across CPUs. The core code already takes care of that. On the contrary, when the driver applies affinities by itself, it breaks the users' expectations: 1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in order to prevent IRQs from being moved to certain CPUs that run a real-time workload. 2. ice reconfigures VSIs at runtime due to a MIB change (ice_dcb_process_lldp_set_mib_change). Reopening a VSI resets the affinity in ice_vsi_req_irq_msix(). 3. ice has no idea about irqbalance's config, so it may move an IRQ to a banned CPU. The real-time workload suffers unacceptable latency. I am not sure if updating the affinity hints is at all useful, because irqbalance ignores them since 2016 ([1]), but at least it's harmless. This ice change is similar to i40e commit d34c54d1739c ("i40e: Use irq_update_affinity_hint()"). [1] https://github.com/Irqbalance/irqbalance/commit/dcc411e7bfdd Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Sunil Goutham <sgoutham@marvell.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-3-d1470cee3347@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05ice: map XDP queues to vectors in ice_vsi_map_rings_to_vectors()Larysa Zaremba1-7/+7
ice_pf_dcb_recfg() re-maps queues to vectors with ice_vsi_map_rings_to_vectors(), which does not restore the previous state for XDP queues. This leads to no AF_XDP traffic after rebuild. Map XDP queues to vectors in ice_vsi_map_rings_to_vectors(). Also, move the code around, so XDP queues are mapped independently only through .ndo_bpf(). Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-5-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05ice: add flag to distinguish reset from .ndo_bpf in XDP rings configLarysa Zaremba1-2/+3
Commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in the rebuild process. The behaviour of the XDP rings config functions is context-dependent, so the change of order has led to ice_destroy_xdp_rings() doing additional work and removing XDP prog, when it was supposed to be preserved. Also, dependency on the PF state reset flags creates an additional, fortunately less common problem: * PFR is requested e.g. by tx_timeout handler * .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(), but reset flag is set, so rings are destroyed without deleting the program * ice_vsi_rebuild tries to delete non-existent XDP rings, because the program is still on the VSI * system crashes With a similar race, when requested to attach a program, ice_prepare_xdp_rings() can actually skip setting the program in the VSI and nevertheless report success. Instead of reverting to the old order of function calls, add an enum argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in order to distinguish between calls from rebuild and .ndo_bpf(). Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05ice: remove af_xdp_zc_qps bitmapLarysa Zaremba1-8/+0
Referenced commit has introduced a bitmap to distinguish between ZC and copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this for us. The bitmap would be especially useful when restoring previous state after rebuild, if only it was not reallocated in the process. This leads to e.g. xdpsock dying after changing number of queues. Instead of preserving the bitmap during the rebuild, remove it completely and distinguish between ZC and copy-mode queues based on the presence of a device associated with the pool. Fixes: e102db780e1c ("ice: track AF_XDP ZC enabled queues in bitmap") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563aa89b0c@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-06ice: refactor struct ice_vsi_cfg_params to be inside of struct ice_vsiMateusz Polchlopek1-22/+11
Refactor struct ice_vsi_cfg_params to be embedded into struct ice_vsi. Prior to that the members of the struct were scattered around ice_vsi, and were copy-pasted for purposes of reinit. Now we have struct handy, and it is easier to have something sticky in the flags field. Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Vaishnavi Tipireddy <vaishnavi.tipireddy@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-01ice: move devlink port code to a separate filePiotr Raczynski1-1/+0
Keep devlink related code in a separate file. More devlink port code and handlers will be added here for other port operations. Remove no longer needed include of our devlink.h in ice_lib.c. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-04-01ice: move ice_devlink.[ch] to devlink folderMichal Swiatkowski1-1/+1
Only moving whole files, fixing Makefile and bunch of includes. Some changes to ice_devlink file was done even in representor part (Tx topology), so keep it as final patch to not mess up with rebasing. After moving to devlink folder there is no need to have such long name for these files. Rename them to simple devlink. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-28Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueJakub Kicinski1-48/+1
Tony Nguyen says: ==================== ice: use less resources in switchdev Michal Swiatkowski says: Switchdev is using one queue per created port representor. This can quickly lead to Rx queue shortage, as with subfunction support user can create high number of PRs. Save one MSI-X and 'number of PRs' * 1 queues. Refactor switchdev slow-path to use less resources (even no additional resources). Do this by removing control plane VSI and move its functionality to PF VSI. Even with current solution PF is acting like uplink and can't be used simultaneously for other use cases (adding filters can break slow-path). In short, do Tx via PF VSI and Rx packets using PF resources. Rx needs additional code in interrupt handler to choose correct PR netdev. Previous solution had to queue filters, it was way more elegant but needed one queue per PRs. Beside that this refactor mostly simplifies switchdev configuration. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: count representor stats ice: do switchdev slow-path Rx using PF VSI ice: change repr::id values ice: remove switchdev control plane VSI ice: control default Tx rule in lag ice: default Tx rule instead of to queue ice: do Tx through PF netdev in slow-path ice: remove eswitch changing queues algorithm ==================== Link: https://lore.kernel.org/r/20240325202623.1012287-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-25ice: remove switchdev control plane VSIMichal Swiatkowski1-48/+1
For slow-path Rx and Tx PF VSI is used. There is no need to have control plane VSI. Remove all code related to it. Eswitch rebuild can't fail without rebuilding control plane VSI. Return void from ice_eswitch_rebuild(). Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-25ice: fix memory corruption bug with suspend and rebuildJesse Brandeburg1-9/+9
The ice driver would previously panic after suspend. This is caused from the driver *only* calling the ice_vsi_free_q_vectors() function by itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") the driver has zeroed out num_q_vectors, and only restored it in ice_vsi_cfg_def(). This further causes the ice_rebuild() function to allocate a zero length buffer, after which num_q_vectors is updated, and then the new value of num_q_vectors is used to index into the zero length buffer, which corrupts memory. The fix entails making sure all the code referencing num_q_vectors only does so after it has been reset via ice_vsi_cfg_def(). I didn't perform a full bisect, but I was able to test against 6.1.77 kernel and that ice driver works fine for suspend/resume with no panic, so sometime since then, this problem was introduced. Also clean up an un-needed init of a local variable in the function being modified. PANIC from 6.8.0-rc1: [1026674.915596] PM: suspend exit [1026675.664697] ice 0000:17:00.1: PTP reset successful [1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time [1026675.667660] ice 0000:b1:00.0: PTP reset successful [1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time [1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None [1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010 [1026677.192753] ice 0000:17:00.0: PTP reset successful [1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time [1026677.197928] #PF: supervisor read access in kernel mode [1026677.197933] #PF: error_code(0x0000) - not-present page [1026677.197937] PGD 1557a7067 P4D 0 [1026677.212133] ice 0000:b1:00.1: PTP reset successful [1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time [1026677.212575] [1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI [1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1 [1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [1026677.269367] Workqueue: ice ice_service_task [ice] [1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6 [1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202 [1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000 [1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828 [1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010 [1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0 [1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000 [1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0 [1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1026677.381952] PKRU: 55555554 [1026677.385116] Call Trace: [1026677.388023] <TASK> [1026677.390589] ? __die+0x20/0x70 [1026677.394105] ? page_fault_oops+0x82/0x160 [1026677.398576] ? do_user_addr_fault+0x65/0x6a0 [1026677.403307] ? exc_page_fault+0x6a/0x150 [1026677.407694] ? asm_exc_page_fault+0x22/0x30 [1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice] [1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice] [1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice] [1026677.429147] ice_rebuild+0x18b/0x520 [ice] [1026677.433746] ? delay_tsc+0x8f/0xc0 [1026677.437630] ice_do_reset+0xa3/0x190 [ice] [1026677.442231] ice_service_task+0x26/0x440 [ice] [1026677.447180] process_one_work+0x174/0x340 [1026677.451669] worker_thread+0x27e/0x390 [1026677.455890] ? __pfx_worker_thread+0x10/0x10 [1026677.460627] kthread+0xee/0x120 [1026677.464235] ? __pfx_kthread+0x10/0x10 [1026677.468445] ret_from_fork+0x2d/0x50 [1026677.472476] ? __pfx_kthread+0x10/0x10 [1026677.476671] ret_from_fork_asm+0x1b/0x30 [1026677.481050] </TASK> Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload") Reported-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-11Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueDavid S. Miller1-0/+37
Tony Nguyen says: ==================== ethtool: ice: Support for RSS settings to GTP Takeru Hayasaka enables RSS functionality for GTP packets on ice driver with ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06ice: Implement RSS settings for GTP using ethtoolTakeru Hayasaka1-0/+37
Following the addition of new GTP RSS hash options to ethtool.h, this patch implements the corresponding RSS settings for GTP packets in the Intel ice driver. It enables users to configure RSS for GTP-U and GTP-C traffic over IPv4 and IPv6, utilizing the newly defined hash options. The implementation covers the handling of gtpu(4|6), gtpc(4|6), gtpc(4|6)t, gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d traffic, providing enhanced load distribution for GTP traffic across multiple processing units. Signed-off-by: Takeru Hayasaka <hayatake396@gmail.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-05ice: fix typo in assignmentJesse Brandeburg1-1/+1
Fix an obviously incorrect assignment, created with a typo or cut-n-paste error. Fixes: 5995ef88e3a8 ("ice: realloc VSI stats arrays") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-04ice: do not disable Tx queues twice in ice_down()Maciej Fijalkowski1-55/+0
ice_down() clears QINT_TQCTL_CAUSE_ENA_M bit twice, which is not necessary. First clearing happens in ice_vsi_dis_irq() and second in ice_vsi_stop_tx_ring() - remove the first one. While at it, make ice_vsi_dis_irq() static as ice_down() is the only current caller of it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-17/+69
Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-20ice: Fix ASSERT_RTNL() warning during certain scenariosAmritha Nambiar1-17/+69
Commit 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") invoked the netif_queue_set_napi() call. This kernel function requires to be called with rtnl_lock taken, otherwise ASSERT_RTNL() warning will be triggered. ice_vsi_rebuild() initiating this call is under rtnl_lock when the rebuild is in response to configuration changes from external interfaces (such as tc, ethtool etc. which holds the lock). But, the VSI rebuild generated from service tasks and resets (PFR/CORER/GLOBR) is not under rtnl lock protection. Handle these cases as well to hold lock before the kernel call (by setting the 'locked' boolean to false). netif_queue_set_napi() is also used to clear previously set napi in the q_vector unroll flow. Handle this for locked/lockless execution paths. Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi") Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-02ice: make ice_vsi_cfg_txq() staticMaciej Fijalkowski1-73/+0
Currently, XSK control path in ice driver calls directly ice_vsi_cfg_txq() whereas we have ice_vsi_cfg_single_txq() for that purpose. Use the latter from XSK side and make ice_vsi_cfg_txq() static. ice_vsi_cfg_txq() resides in ice_base.c and is rather big, so to reduce the code churn let us move the callers of it from ice_lib.c to ice_base.c. This change puts ice_qp_ena() on nice diet due to the checks and operations that ice_vsi_cfg_single_{r,t}xq() do internally. add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-182 (-182) Function old new delta ice_xsk_pool_setup 2165 1983 -182 Total: Before=472597, After=472415, chg -0.04% Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-02-02ice: make ice_vsi_cfg_rxq() staticMaciej Fijalkowski1-56/+0
Currently, XSK control path in ice driver calls directly ice_vsi_cfg_rxq() whereas we have ice_vsi_cfg_single_rxq() for that purpose. Use the latter from XSK side and make ice_vsi_cfg_rxq() static. ice_vsi_cfg_rxq() resides in ice_base.c and is rather big, so to reduce the code churn let us move two callers of it from ice_lib.c to ice_base.c. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-22Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queueDavid S. Miller1-34/+20
Tony Nguyen says: ==================== intel: use bitfield operations Jesse Brandeburg says: After repeatedly getting review comments on new patches, and sporadic patches to fix parts of our drivers, we should just convert the Intel code to use FIELD_PREP() and FIELD_GET(). It's then "common" in the code and hopefully future change-sets will see the context and do-the-right-thing. This conversion was done with a coccinelle script which is mentioned in the commit messages. Generally there were only a couple conversions that were "undone" after the automatic changes because they tried to convert a non-contiguous mask. Patch 1 is required at the beginning of this series to fix a "forever" issue in the e1000e driver that fails the compilation test after conversion because the shift / mask was out of range. The second patch just adds all the new #includes in one go. The patch titled: "ice: fix pre-shifted bit usage" is needed to allow the use of the FIELD_* macros and fix up the unexpected "shifts included" defines found while creating this series. The rest are the conversion to use FIELD_PREP()/FIELD_GET(), and the occasional leXX_{get,set,encode}_bits() call, as suggested by Alex. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni1-4/+3
Cross-merge networking fixes after downstream PR. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c 23c93c3b6275 ("bnxt_en: do not map packet buffers twice") 6d1add95536b ("bnxt_en: Modify TX ring indexing logic.") tools/testing/selftests/net/Makefile 2258b666482d ("selftests: add vlan hw filter tests") a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-18ice: cleanup inconsistent codeJesse Brandeburg1-2/+2
It was found while doing further testing of the previous commit fbf32a9bab91 ("ice: field get conversion") that one of the FIELD_GET conversions should really be a FIELD_PREP. The previous code was styled as a match to the FIELD_GET conversion, which always worked because the shift value was 0. The code makes way more sense as a FIELD_PREP and was in fact the only FIELD_GET with two constant arguments in this series. Didn't squash this patch to make it easier to call out the (non-impactful) bug. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18ice: field get conversionJesse Brandeburg1-3/+2
Refactor the ice driver to use FIELD_GET() for mask and shift reads, which reduces lines of code and adds clarity of intent. This code was generated by the following coccinelle/spatch script and then manually repaired. @get@ constant shift,mask; type T; expression a; @@ -(((T)(a) & mask) >> shift) +FIELD_GET(mask, a) and applied via: spatch --sp-file field_prep.cocci --in-place --dir \ drivers/net/ethernet/intel/ CC: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18ice: fix pre-shifted bit usageJesse Brandeburg1-1/+2
While converting to FIELD_PREP() and FIELD_GET(), it was noticed that some of the RSS defines had *included* the shift in their definitions. This is completely outside of normal, such that a developer could easily make a mistake and shift at the usage site (like when using FIELD_PREP()). Rename the defines and set them to the "pre-shifted values" so they match the template the driver normally uses for masks and the member bits of the mask, which also allows the driver to use FIELD_PREP correctly with these values. Use GENMASK() for this changed MASK value. Do the same for the VLAN EMODE defines as well. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18ice: field prep conversionJesse Brandeburg1-30/+16
Refactor ice driver to use FIELD_PREP(), which reduces lines of code and adds clarity of intent. This code was generated by the following coccinelle/spatch script and then manually repaired. Several places I changed to OR into a single variable with |= instead of using a multi-line statement with trailing OR operators, as it (subjectively) makes the code clearer. A local variable vmvf_and_timeout was created and used to avoid multiple logical ORs being __le16 converted, which shortened some lines and makes the code cleaner. Also clean up a couple of places where conversions were made to have the code read more clearly/consistently. @prep2@ constant shift,mask; type T; expression a; @@ -(((T)(a) << shift) & mask) +FIELD_PREP(mask, a) @prep@ constant shift,mask; type T; expression a; @@ -((T)((a) << shift) & mask) +FIELD_PREP(mask, a) Cc: Julia Lawall <Julia.Lawall@inria.fr> CC: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18ice: Fix PF with enabled XDP going no-carrier after resetLarysa Zaremba1-0/+3
Commit 6624e780a577fc596788 ("ice: split ice_vsi_setup into smaller functions") has refactored a bunch of code involved in PFR. In this process, TC queue number adjustment for XDP was lost. Bring it back. Lack of such adjustment causes interface to go into no-carrier after a reset, if XDP program is attached, with the following message: ice 0000:b1:00.0: Failed to set LAN Tx queue context, error: -22 ice 0000:b1:00.0 ens801f0np0: Failed to open VSI 0x0006 on switch 0x0001 ice 0000:b1:00.0: enable VSI failed, err -22, VSI index 0, type ICE_VSI_PF ice 0000:b1:00.0: PF VSI rebuild failed: -22 ice 0000:b1:00.0: Rebuild failed, unload and reload driver Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>