Age | Commit message (Collapse) | Author | Files | Lines |
|
When a trusted VF tries to configure an LLDP multicast address, configure a
rule that would mirror the traffic to this VF, untrusted VFs are not
allowed to receive LLDP at all, so the request to add LLDP MAC address will
always fail for them.
Add a forwarding LLDP filter to a trusted VF when it tries to add an LLDP
multicast MAC address. The MAC address has to be added after enabling
trust (through restarting the LLDP service).
Signed-off-by: Mateusz Pacuszka <mateuszx.pacuszka@intel.com>
Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 34295a3696fb ("ice: implement new LLDP filter command")
introduced the ability to use LLDP-specific filter that directs all
LLDP traffic to a single VSI. However, current goal is for all trusted VFs
to be able to see LLDP neighbors, which is impossible to do with the
special filter.
Make using the generic filter the default choice and fall back to special
one only if a generic filter cannot be added. That way setups with "NVMs
where an already existent LLDP filter is blocking the creation of a filter
to allow LLDP packets" will still be able to configure software Rx LLDP on
PF only, while all other setups would be able to forward them to VFs too.
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
E830 supports raw receive and generic transmit checksum offloads.
Raw receive checksum support is provided by hardware calculating the
checksum over the whole packet, regardless of type. The calculated
checksum is provided to driver in the Rx flex descriptor. Then the driver
assigns the checksum to skb->csum and sets skb->ip_summed to
CHECKSUM_COMPLETE.
Generic transmit checksum support is provided by hardware calculating the
checksum given two offsets: the start offset to begin checksum calculation,
and the offset to insert the calculated checksum in the packet. Support is
advertised to the stack using NETIF_F_HW_CSUM feature.
E830 has the following limitations when both generic transmit checksum
offload and TCP Segmentation Offload (TSO) are enabled:
1. Inner packet header modification is not supported. This restriction
includes the inability to alter TCP flags, such as the push flag. As a
result, this limitation can impact the receiver's ability to coalesce
packets, potentially degrading network throughput.
2. The Maximum Segment Size (MSS) is limited to 1023 bytes, which prevents
support of Maximum Transmission Unit (MTU) greater than 1063 bytes.
Therefore NETIF_F_HW_CSUM and NETIF_F_ALL_TSO features are mutually
exclusive. NETIF_F_HW_CSUM hardware feature support is indicated but is not
enabled by default. Instead, IP checksums and NETIF_F_ALL_TSO are the
defaults. Enforcement of mutual exclusivity of NETIF_F_HW_CSUM and
NETIF_F_ALL_TSO is done in ice_set_features(). Mutual exclusivity
of IP checksums and NETIF_F_HW_CSUM is handled by netdev_fix_features().
When NETIF_F_HW_CSUM is requested the provided skb->csum_start and
skb->csum_offset are passed to hardware in the Tx context descriptor
generic checksum (GCS) parameters. Hardware calculates the 1's complement
from skb->csum_start to the end of the packet, and inserts the result in
the packet at skb->csum_offset.
Co-developed-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Co-developed-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Eric Joyner <eric.joyner@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20250310174502.3708121-2-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc6).
Conflicts:
tools/testing/selftests/drivers/net/ping.py
75cc19c8ff89 ("selftests: drv-net: add xdp cases for ping.py")
de94e8697405 ("selftests: drv-net: store addresses in dict indexed by ipver")
https://lore.kernel.org/netdev/20250311115758.17a1d414@canb.auug.org.au/
net/core/devmem.c
a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()")
1d22d3060b9b ("net: drop rtnl_lock for queue_mgmt operations")
https://lore.kernel.org/netdev/20250313114929.43744df1@canb.auug.org.au/
Adjacent changes:
tools/testing/selftests/net/Makefile
6f50175ccad4 ("selftests: Add IPv6 link-local address generation tests for GRE devices.")
2e5584e0f913 ("selftests/net: expand cmsg_ipv6.sh with ipv4")
drivers/net/ethernet/broadcom/bnxt/bnxt.c
661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic")
fe96d717d38e ("bnxt_en: Extend queue stop/start for TX rings")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
After switchdev is enabled and disabled later, LLDP packets sending stops,
despite working perfectly fine before and during switchdev state.
To reproduce (creating/destroying VF is what triggers the reconfiguration):
devlink dev eswitch set pci/<address> mode switchdev
echo '2' > /sys/class/net/<ifname>/device/sriov_numvfs
echo '0' > /sys/class/net/<ifname>/device/sriov_numvfs
This happens because LLDP relies on the destination override functionality.
It needs to 1) set a flag in the descriptor, 2) set the VSI permission to
make it valid. The permissions are set when the PF VSI is first configured,
but switchdev then enables it for the uplink VSI (which is always the PF)
once more when configured and disables when deconfigured, which leads to
software-generated LLDP packets being blocked.
Do not modify the destination override permissions when configuring
switchdev, as the enabled state is the default configuration that is never
modified.
Fixes: 1a1c40df2e80 ("ice: set and release switchdev environment")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Delete the driver CPU affinity and aRFS rmap info, use the core's
API instead.
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://patch.msgid.link/20250224232228.990783-5-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We set the NAPI's IRQ number in ice_vsi_set_napi_queues(). Clear the
NAPI's IRQ in ice_vsi_clear_napi_queues().
Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
Link: https://patch.msgid.link/20250224232228.990783-4-ahmed.zaki@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To support Rx timestamp offload, VIRTCHNL_OP_1588_PTP_CAPS is sent by
the VF to request PTP capability and responded by the PF what capability
is enabled for that VF.
Hardware captures timestamps which contain only 32 bits of nominal
nanoseconds, as opposed to the 64bit timestamps that the stack expects.
To convert 32b to 64b, we need a current PHC time.
VIRTCHNL_OP_1588_PTP_GET_TIME is sent by the VF and responded by the
PF with the current PHC time.
Signed-off-by: Simei Su <simei.su@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Co-developed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Don't check if the device type is E810T as non-E810T devices can support
GNSS too and PCA9575 check is enough to determine if GNSS is present or
not.
Rename ice_gnss_is_gps_present() to ice_gnss_is_module_present()
because GNSS module supports multiple GNSS providers, not only GPS.
Move functions related to PCA9575 from ice_ptp_hw.c to ice_common.c
to be able to access them when PTP is disabled in the kernel, but GNSS
is enabled.
Remove logical AND with ICE_AQC_LINK_TOPO_NODE_TYPE_M in
ice_get_pca9575_handle(), which has no effect, and reorder device type
checks to check the device_id first, then set other variables.
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Implement enable_rdma devlink parameter to allow user to turn RDMA
feature on and off.
It is useful when there is no enough interrupts and user doesn't need
RDMA feature.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
It can be needed to have some MSI-X allocated as static and rest as
dynamic. For example on PF VSI. We want to always have minimum one MSI-X
on it, because of that it is allocated as a static one, rest can be
dynamic if it is supported.
Change the ice_get_irq_res() to allow using static entries if they are
free even if caller wants dynamic one.
Adjust limit values to the new approach. Min and max in limit means the
values that are valid, so decrease max and num_static by one.
Set vsi::irq_dyn_alloc if dynamic allocation is supported.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Remove the field to allow having more queues than MSI-X on VSI. As
default the number will be the same, but if there won't be more MSI-X
available VSI can run with at least one MSI-X.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Recovery Mode is intended to recover from a fatal failure scenario in
which the device is not accessible to the host, meaning the firmware is
non-responsive.
The purpose of the Firmware Recovery Mode is to enable software tools to
update firmware and/or device configuration so the fatal error can be
resolved.
Recovery Mode Firmware supports a limited set of admin commands required
for NVM update.
Recovery Firmware does not support hardware interrupts so a polling mode
is used.
The driver will expose only the minimum set of devlink commands required
for the recovery of the adapter.
Using an appropriate NVM image, the user can recover the adapter using
the devlink flash API.
Prior to 4.20 E810 Adapter Recovery Firmware supports only the update
and erase of the "fw.mgmt" component.
E810 Adapter Recovery Firmware doesn't support selected preservation of
cards settings or identifiers.
The following command can be used to recover the adapter:
$ devlink dev flash <pci-address> <update-image.bin> component fw.mgmt
overwrite settings overwrite identifier
Newer FW versions (4.20 or newer) supports update of "fw.undi" and
"fw.netlist" components.
$ devlink dev flash <pci-address> <update-image.bin>
Tested on Intel Corporation Ethernet Controller E810-C for SFP
FW revision 3.20 and 4.30.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Konrad Knitter <konrad.knitter@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Use netif_napi_add_config to assign persistent per-NAPI config when
initializing NAPIs. This preserves NAPI config settings when queue
counts are adjusted.
Tested with an E810-2CQDA2 NIC.
Begin by setting the queue count to 4:
$ sudo ethtool -L eth4 combined 4
Check the queue settings:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8452,
'ifindex': 4,
'irq': 2782},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8451,
'ifindex': 4,
'irq': 2781},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8450,
'ifindex': 4,
'irq': 2780},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8449,
'ifindex': 4,
'irq': 2779}]
Now, set the queue with NAPI ID 8451 to have a gro-flush-timeout of
1111:
$ sudo ./tools/net/ynl/cli.py \
--spec Documentation/netlink/specs/netdev.yaml \
--do napi-set --json='{"id": 8451, "gro-flush-timeout": 1111}'
None
Check that worked:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8452,
'ifindex': 4,
'irq': 2782},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 1111,
'id': 8451,
'ifindex': 4,
'irq': 2781},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8450,
'ifindex': 4,
'irq': 2780},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8449,
'ifindex': 4,
'irq': 2779}]
Now reduce the queue count to 2, which would destroy the queue with NAPI
ID 8451:
$ sudo ethtool -L eth4 combined 2
Check the queue settings, noting that NAPI ID 8451 is gone:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8450,
'ifindex': 4,
'irq': 2780},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8449,
'ifindex': 4,
'irq': 2779}]
Now, increase the number of queues back to 4:
$ sudo ethtool -L eth4 combined 4
Dump the settings, expecting to see the same NAPI IDs as above and for
NAPI ID 8451 to have its gro-flush-timeout set to 1111:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
--dump napi-get --json='{"ifindex": 4}'
[{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8452,
'ifindex': 4,
'irq': 2782},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 1111,
'id': 8451,
'ifindex': 4,
'irq': 2781},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8450,
'ifindex': 4,
'irq': 2780},
{'defer-hard-irqs': 0,
'gro-flush-timeout': 0,
'id': 8449,
'ifindex': 4,
'irq': 2779}]
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
E830 adds hardware support to prevent the VF from overflowing the PF
mailbox with VIRTCHNL messages. E830 will use the hardware feature
(ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf().
To prevent a VF from overflowing the PF, the PF sets the number of
messages per VF that can be in the PF's mailbox queue
(ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF,
the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG
register.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts (sort of) and no adjacent changes.
This merge reverts commit b3c9e65eb227 ("net: hsr: remove seqnr_lock")
from net, as it was superseded by
commit 430d67bdcb04 ("net: hsr: Use the seqnr lock for frames received via interlink port.")
in net-next.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After vsi setup refactor commit 6624e780a577 ("ice: split ice_vsi_setup
into smaller functions") ice_cfg_sw_lldp function which removes rx rule
directing LLDP packets to vsi is moved from ice_vsi_release to
ice_vsi_decfg function. ice_vsi_decfg is used in more cases than just in
vsi_release resulting in unnecessary removal of rx lldp packets handling
switch rule. This leads to lldp packets being dropped after a change number
of channels via ethtool.
This patch moves ice_cfg_sw_lldp function that removes rx lldp sw rule back
to ice_vsi_release function.
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reported-by: Matěj Grégr <mgregr@netx.as>
Closes: https://lore.kernel.org/intel-wired-lan/1be45a76-90af-4813-824f-8398b69745a9@netx.as/T/#u
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Implement devlink port handlers responsible for ethernet type devlink
subfunctions. Create subfunction devlink port and setup all resources
needed for a subfunction netdev to operate. Configure new VSI for each
new subfunction, initialize and configure interrupts and Tx/Rx resources.
Set correct MAC filters and create new netdev.
For now, subfunction is limited to only one Tx/Rx queue pair.
Only allocate new subfunction VSI with devlink port new command.
Allocate and free subfunction MSIX interrupt vectors using new API
calls with pci_msix_alloc_irq_at and pci_msix_free_irq.
Support both automatic and manual subfunction numbers. If no subfunction
number is provided, use xa_alloc to pick a number automatically. This
will find the first free index and use that as the number. This reduces
burden on users in the simple case where a specific number is not
required. It may also be slightly faster to check that a number exists
since xarray lookup should be faster than a linear scan of the dyn_ports
xarray.
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Make some of the netdevice_ops functions visible from outside for
another VSI type created netdev.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add required plumbing for new VSI type dedicated to devlink subfunctions.
Make sure that the vsi is properly configured and destroyed. Also allow
loading XDP and AF_XDP sockets.
The first implementation of devlink subfunctions supports only one Tx/Rx
queue pair per given subfunction.
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Consider the following scenario:
.ndo_bpf() | ice_prepare_for_reset() |
________________________|_______________________________________|
rtnl_lock() | |
ice_down() | |
| test_bit(ICE_VSI_DOWN) - true |
| ice_dis_vsi() returns |
ice_up() | |
| proceeds to rebuild a running VSI |
.ndo_bpf() is not the only rtnl-locked callback that toggles the interface
to apply new configuration. Another example is .set_channels().
To avoid the race condition above, act only after reading ICE_VSI_DOWN
under rtnl_lock.
Fixes: 0f9d5027a749 ("ice: Refactor VSI allocation, deletion and rebuild flow")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If VSI rebuild is pending, .ndo_bpf() can attach/detach the XDP program on
VSI without applying new ring configuration. When unconfiguring the VSI, we
can encounter the state in which there is an XDP program but no XDP rings
to destroy or there will be XDP rings that need to be destroyed, but no XDP
program to indicate their presence.
When unconfiguring, rely on the presence of XDP rings rather then XDP
program, as they better represent the current state that has to be
destroyed.
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The main threat to data consistency in ice_xdp() is a possible asynchronous
PF reset. It can be triggered by a user or by TX timeout handler.
XDP setup and PF reset code access the same resources in the following
sections:
* ice_vsi_close() in ice_prepare_for_reset() - already rtnl-locked
* ice_vsi_rebuild() for the PF VSI - not protected
* ice_vsi_open() - already rtnl-locked
With an unfortunate timing, such accesses can result in a crash such as the
one below:
[ +1.999878] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 14
[ +2.002992] ice 0000:b1:00.0: Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring 18
[Mar15 18:17] ice 0000:b1:00.0 ens801f0np0: NETDEV WATCHDOG: CPU: 38: transmit queue 14 timed out 80692736 ms
[ +0.000093] ice 0000:b1:00.0 ens801f0np0: tx_timeout: VSI_num: 6, Q 14, NTC: 0x0, HW_HEAD: 0x0, NTU: 0x0, INT: 0x4000001
[ +0.000012] ice 0000:b1:00.0 ens801f0np0: tx_timeout recovery level 1, txqueue 14
[ +0.394718] ice 0000:b1:00.0: PTP reset successful
[ +0.006184] BUG: kernel NULL pointer dereference, address: 0000000000000098
[ +0.000045] #PF: supervisor read access in kernel mode
[ +0.000023] #PF: error_code(0x0000) - not-present page
[ +0.000023] PGD 0 P4D 0
[ +0.000018] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000023] CPU: 38 PID: 7540 Comm: kworker/38:1 Not tainted 6.8.0-rc7 #1
[ +0.000031] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ +0.000036] Workqueue: ice ice_service_task [ice]
[ +0.000183] RIP: 0010:ice_clean_tx_ring+0xa/0xd0 [ice]
[...]
[ +0.000013] Call Trace:
[ +0.000016] <TASK>
[ +0.000014] ? __die+0x1f/0x70
[ +0.000029] ? page_fault_oops+0x171/0x4f0
[ +0.000029] ? schedule+0x3b/0xd0
[ +0.000027] ? exc_page_fault+0x7b/0x180
[ +0.000022] ? asm_exc_page_fault+0x22/0x30
[ +0.000031] ? ice_clean_tx_ring+0xa/0xd0 [ice]
[ +0.000194] ice_free_tx_ring+0xe/0x60 [ice]
[ +0.000186] ice_destroy_xdp_rings+0x157/0x310 [ice]
[ +0.000151] ice_vsi_decfg+0x53/0xe0 [ice]
[ +0.000180] ice_vsi_rebuild+0x239/0x540 [ice]
[ +0.000186] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[ +0.000145] ice_rebuild+0x18c/0x840 [ice]
[ +0.000145] ? delay_tsc+0x4a/0xc0
[ +0.000022] ? delay_tsc+0x92/0xc0
[ +0.000020] ice_do_reset+0x140/0x180 [ice]
[ +0.000886] ice_service_task+0x404/0x1030 [ice]
[ +0.000824] process_one_work+0x171/0x340
[ +0.000685] worker_thread+0x277/0x3a0
[ +0.000675] ? preempt_count_add+0x6a/0xa0
[ +0.000677] ? _raw_spin_lock_irqsave+0x23/0x50
[ +0.000679] ? __pfx_worker_thread+0x10/0x10
[ +0.000653] kthread+0xf0/0x120
[ +0.000635] ? __pfx_kthread+0x10/0x10
[ +0.000616] ret_from_fork+0x2d/0x50
[ +0.000612] ? __pfx_kthread+0x10/0x10
[ +0.000604] ret_from_fork_asm+0x1b/0x30
[ +0.000604] </TASK>
The previous way of handling this through returning -EBUSY is not viable,
particularly when destroying AF_XDP socket, because the kernel proceeds
with removal anyway.
There is plenty of code between those calls and there is no need to create
a large critical section that covers all of them, same as there is no need
to protect ice_vsi_rebuild() with rtnl_lock().
Add xdp_state_lock mutex to protect ice_vsi_rebuild() and ice_xdp().
Leaving unprotected sections in between would result in two states that
have to be considered:
1. when the VSI is closed, but not yet rebuild
2. when VSI is already rebuild, but not yet open
The latter case is actually already handled through !netif_running() case,
we just need to adjust flag checking a little. The former one is not as
trivial, because between ice_vsi_close() and ice_vsi_rebuild(), a lot of
hardware interaction happens, this can make adding/deleting rings exit
with an error. Luckily, VSI rebuild is pending and can apply new
configuration for us in a managed fashion.
Therefore, add an additional VSI state flag ICE_VSI_REBUILD_PENDING to
indicate that ice_xdp() can just hot-swap the program.
Also, as ice_vsi_rebuild() flow is touched in this patch, make it more
consistent by deconfiguring VSI when coalesce allocation fails.
Fixes: 2d4238f55697 ("ice: Add support for AF_XDP")
Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently, netif_queue_set_napi() is called from ice_vsi_rebuild() that is
not rtnl-locked when called from the reset. This creates the need to take
the rtnl_lock just for a single function and complicates the
synchronization with .ndo_bpf. At the same time, there no actual need to
fill napi-to-queue information at this exact point.
Fill napi-to-queue information when opening the VSI and clear it when the
VSI is being closed. Those routines are already rtnl-locked.
Also, rewrite napi-to-queue assignment in a way that prevents inclusion of
XDP queues, as this leads to out-of-bounds writes, such as one below.
[ +0.000004] BUG: KASAN: slab-out-of-bounds in netif_queue_set_napi+0x1c2/0x1e0
[ +0.000012] Write of size 8 at addr ffff889881727c80 by task bash/7047
[ +0.000006] CPU: 24 PID: 7047 Comm: bash Not tainted 6.10.0-rc2+ #2
[ +0.000004] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0014.082620210524 08/26/2021
[ +0.000003] Call Trace:
[ +0.000003] <TASK>
[ +0.000002] dump_stack_lvl+0x60/0x80
[ +0.000007] print_report+0xce/0x630
[ +0.000007] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ +0.000007] ? __virt_addr_valid+0x1c9/0x2c0
[ +0.000005] ? netif_queue_set_napi+0x1c2/0x1e0
[ +0.000003] kasan_report+0xe9/0x120
[ +0.000004] ? netif_queue_set_napi+0x1c2/0x1e0
[ +0.000004] netif_queue_set_napi+0x1c2/0x1e0
[ +0.000005] ice_vsi_close+0x161/0x670 [ice]
[ +0.000114] ice_dis_vsi+0x22f/0x270 [ice]
[ +0.000095] ice_pf_dis_all_vsi.constprop.0+0xae/0x1c0 [ice]
[ +0.000086] ice_prepare_for_reset+0x299/0x750 [ice]
[ +0.000087] pci_dev_save_and_disable+0x82/0xd0
[ +0.000006] pci_reset_function+0x12d/0x230
[ +0.000004] reset_store+0xa0/0x100
[ +0.000006] ? __pfx_reset_store+0x10/0x10
[ +0.000002] ? __pfx_mutex_lock+0x10/0x10
[ +0.000004] ? __check_object_size+0x4c1/0x640
[ +0.000007] kernfs_fop_write_iter+0x30b/0x4a0
[ +0.000006] vfs_write+0x5d6/0xdf0
[ +0.000005] ? fd_install+0x180/0x350
[ +0.000005] ? __pfx_vfs_write+0x10/0xA10
[ +0.000004] ? do_fcntl+0x52c/0xcd0
[ +0.000004] ? kasan_save_track+0x13/0x60
[ +0.000003] ? kasan_save_free_info+0x37/0x60
[ +0.000006] ksys_write+0xfa/0x1d0
[ +0.000003] ? __pfx_ksys_write+0x10/0x10
[ +0.000002] ? __x64_sys_fcntl+0x121/0x180
[ +0.000004] ? _raw_spin_lock+0x87/0xe0
[ +0.000005] do_syscall_64+0x80/0x170
[ +0.000007] ? _raw_spin_lock+0x87/0xe0
[ +0.000004] ? __pfx__raw_spin_lock+0x10/0x10
[ +0.000003] ? file_close_fd_locked+0x167/0x230
[ +0.000005] ? syscall_exit_to_user_mode+0x7d/0x220
[ +0.000005] ? do_syscall_64+0x8c/0x170
[ +0.000004] ? do_syscall_64+0x8c/0x170
[ +0.000003] ? do_syscall_64+0x8c/0x170
[ +0.000003] ? fput+0x1a/0x2c0
[ +0.000004] ? filp_close+0x19/0x30
[ +0.000004] ? do_dup2+0x25a/0x4c0
[ +0.000004] ? __x64_sys_dup2+0x6e/0x2e0
[ +0.000002] ? syscall_exit_to_user_mode+0x7d/0x220
[ +0.000004] ? do_syscall_64+0x8c/0x170
[ +0.000003] ? __count_memcg_events+0x113/0x380
[ +0.000005] ? handle_mm_fault+0x136/0x820
[ +0.000005] ? do_user_addr_fault+0x444/0xa80
[ +0.000004] ? clear_bhb_loop+0x25/0x80
[ +0.000004] ? clear_bhb_loop+0x25/0x80
[ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000005] RIP: 0033:0x7f2033593154
Fixes: 080b0c8d6d26 ("ice: Fix ASSERT_RTNL() warning during certain scenarios")
Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
irq_set_affinity_hint() is deprecated. Use irq_update_affinity_hint()
instead. This removes the side-effect of actually applying the affinity.
The driver does not really need to worry about spreading its IRQs across
CPUs. The core code already takes care of that.
On the contrary, when the driver applies affinities by itself, it breaks
the users' expectations:
1. The user configures irqbalance with IRQBALANCE_BANNED_CPULIST in
order to prevent IRQs from being moved to certain CPUs that run a
real-time workload.
2. ice reconfigures VSIs at runtime due to a MIB change
(ice_dcb_process_lldp_set_mib_change). Reopening a VSI resets the
affinity in ice_vsi_req_irq_msix().
3. ice has no idea about irqbalance's config, so it may move an IRQ to
a banned CPU. The real-time workload suffers unacceptable latency.
I am not sure if updating the affinity hints is at all useful, because
irqbalance ignores them since 2016 ([1]), but at least it's harmless.
This ice change is similar to i40e commit d34c54d1739c ("i40e: Use
irq_update_affinity_hint()").
[1] https://github.com/Irqbalance/irqbalance/commit/dcc411e7bfdd
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240607-next-2024-06-03-intel-next-batch-v3-3-d1470cee3347@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ice_pf_dcb_recfg() re-maps queues to vectors with
ice_vsi_map_rings_to_vectors(), which does not restore the previous
state for XDP queues. This leads to no AF_XDP traffic after rebuild.
Map XDP queues to vectors in ice_vsi_map_rings_to_vectors().
Also, move the code around, so XDP queues are mapped independently only
through .ndo_bpf().
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-5-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in
the rebuild process. The behaviour of the XDP rings config functions is
context-dependent, so the change of order has led to
ice_destroy_xdp_rings() doing additional work and removing XDP prog, when
it was supposed to be preserved.
Also, dependency on the PF state reset flags creates an additional,
fortunately less common problem:
* PFR is requested e.g. by tx_timeout handler
* .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(),
but reset flag is set, so rings are destroyed without deleting the
program
* ice_vsi_rebuild tries to delete non-existent XDP rings, because the
program is still on the VSI
* system crashes
With a similar race, when requested to attach a program,
ice_prepare_xdp_rings() can actually skip setting the program in the VSI
and nevertheless report success.
Instead of reverting to the old order of function calls, add an enum
argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in
order to distinguish between calls from rebuild and .ndo_bpf().
Fixes: efc2214b6047 ("ice: Add support for XDP")
Reviewed-by: Igor Bagnucki <igor.bagnucki@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Referenced commit has introduced a bitmap to distinguish between ZC and
copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this
for us.
The bitmap would be especially useful when restoring previous state after
rebuild, if only it was not reallocated in the process. This leads to e.g.
xdpsock dying after changing number of queues.
Instead of preserving the bitmap during the rebuild, remove it completely
and distinguish between ZC and copy-mode queues based on the presence of
a device associated with the pool.
Fixes: e102db780e1c ("ice: track AF_XDP ZC enabled queues in bitmap")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563aa89b0c@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor struct ice_vsi_cfg_params to be embedded into struct ice_vsi.
Prior to that the members of the struct were scattered around ice_vsi,
and were copy-pasted for purposes of reinit.
Now we have struct handy, and it is easier to have something sticky
in the flags field.
Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Vaishnavi Tipireddy <vaishnavi.tipireddy@intel.com>
Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Keep devlink related code in a separate file. More devlink port code and
handlers will be added here for other port operations.
Remove no longer needed include of our devlink.h in ice_lib.c.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Only moving whole files, fixing Makefile and bunch of includes.
Some changes to ice_devlink file was done even in representor part (Tx
topology), so keep it as final patch to not mess up with rebasing.
After moving to devlink folder there is no need to have such long name
for these files. Rename them to simple devlink.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Tony Nguyen says:
====================
ice: use less resources in switchdev
Michal Swiatkowski says:
Switchdev is using one queue per created port representor. This can
quickly lead to Rx queue shortage, as with subfunction support user
can create high number of PRs.
Save one MSI-X and 'number of PRs' * 1 queues.
Refactor switchdev slow-path to use less resources (even no additional
resources). Do this by removing control plane VSI and move its
functionality to PF VSI. Even with current solution PF is acting like
uplink and can't be used simultaneously for other use cases (adding
filters can break slow-path).
In short, do Tx via PF VSI and Rx packets using PF resources. Rx needs
additional code in interrupt handler to choose correct PR netdev.
Previous solution had to queue filters, it was way more elegant but
needed one queue per PRs. Beside that this refactor mostly simplifies
switchdev configuration.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: count representor stats
ice: do switchdev slow-path Rx using PF VSI
ice: change repr::id values
ice: remove switchdev control plane VSI
ice: control default Tx rule in lag
ice: default Tx rule instead of to queue
ice: do Tx through PF netdev in slow-path
ice: remove eswitch changing queues algorithm
====================
Link: https://lore.kernel.org/r/20240325202623.1012287-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For slow-path Rx and Tx PF VSI is used. There is no need to have control
plane VSI. Remove all code related to it.
Eswitch rebuild can't fail without rebuilding control plane VSI. Return
void from ice_eswitch_rebuild().
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice driver would previously panic after suspend. This is caused
from the driver *only* calling the ice_vsi_free_q_vectors() function by
itself, when it is suspending. Since commit b3e7b3a6ee92 ("ice: prevent
NULL pointer deref during reload") the driver has zeroed out
num_q_vectors, and only restored it in ice_vsi_cfg_def().
This further causes the ice_rebuild() function to allocate a zero length
buffer, after which num_q_vectors is updated, and then the new value of
num_q_vectors is used to index into the zero length buffer, which
corrupts memory.
The fix entails making sure all the code referencing num_q_vectors only
does so after it has been reset via ice_vsi_cfg_def().
I didn't perform a full bisect, but I was able to test against 6.1.77
kernel and that ice driver works fine for suspend/resume with no panic,
so sometime since then, this problem was introduced.
Also clean up an un-needed init of a local variable in the function
being modified.
PANIC from 6.8.0-rc1:
[1026674.915596] PM: suspend exit
[1026675.664697] ice 0000:17:00.1: PTP reset successful
[1026675.664707] ice 0000:17:00.1: 2755 msecs passed between update to cached PHC time
[1026675.667660] ice 0000:b1:00.0: PTP reset successful
[1026675.675944] ice 0000:b1:00.0: 2832 msecs passed between update to cached PHC time
[1026677.137733] ixgbe 0000:31:00.0 ens787: NIC Link is Up 1 Gbps, Flow Control: None
[1026677.190201] BUG: kernel NULL pointer dereference, address: 0000000000000010
[1026677.192753] ice 0000:17:00.0: PTP reset successful
[1026677.192764] ice 0000:17:00.0: 4548 msecs passed between update to cached PHC time
[1026677.197928] #PF: supervisor read access in kernel mode
[1026677.197933] #PF: error_code(0x0000) - not-present page
[1026677.197937] PGD 1557a7067 P4D 0
[1026677.212133] ice 0000:b1:00.1: PTP reset successful
[1026677.212143] ice 0000:b1:00.1: 4344 msecs passed between update to cached PHC time
[1026677.212575]
[1026677.243142] Oops: 0000 [#1] PREEMPT SMP NOPTI
[1026677.247918] CPU: 23 PID: 42790 Comm: kworker/23:0 Kdump: loaded Tainted: G W 6.8.0-rc1+ #1
[1026677.257989] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022
[1026677.269367] Workqueue: ice ice_service_task [ice]
[1026677.274592] RIP: 0010:ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.281421] Code: 0f 84 3a ff ff ff 41 0f b7 74 ec 02 66 89 b0 22 02 00 00 81 e6 ff 1f 00 00 e8 ec fd ff ff e9 35 ff ff ff 48 8b 43 30 49 63 ed <41> 0f b7 34 24 41 83 c5 01 48 8b 3c e8 66 89 b7 aa 02 00 00 81 e6
[1026677.300877] RSP: 0018:ff3be62a6399bcc0 EFLAGS: 00010202
[1026677.306556] RAX: ff28691e28980828 RBX: ff28691e41099828 RCX: 0000000000188000
[1026677.314148] RDX: 0000000000000000 RSI: 0000000000000010 RDI: ff28691e41099828
[1026677.321730] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[1026677.329311] R10: 0000000000000007 R11: ffffffffffffffc0 R12: 0000000000000010
[1026677.336896] R13: 0000000000000000 R14: 0000000000000000 R15: ff28691e0eaa81a0
[1026677.344472] FS: 0000000000000000(0000) GS:ff28693cbffc0000(0000) knlGS:0000000000000000
[1026677.353000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1026677.359195] CR2: 0000000000000010 CR3: 0000000128df4001 CR4: 0000000000771ef0
[1026677.366779] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1026677.374369] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1026677.381952] PKRU: 55555554
[1026677.385116] Call Trace:
[1026677.388023] <TASK>
[1026677.390589] ? __die+0x20/0x70
[1026677.394105] ? page_fault_oops+0x82/0x160
[1026677.398576] ? do_user_addr_fault+0x65/0x6a0
[1026677.403307] ? exc_page_fault+0x6a/0x150
[1026677.407694] ? asm_exc_page_fault+0x22/0x30
[1026677.412349] ? ice_vsi_rebuild_set_coalesce+0x130/0x1e0 [ice]
[1026677.418614] ice_vsi_rebuild+0x34b/0x3c0 [ice]
[1026677.423583] ice_vsi_rebuild_by_type+0x76/0x180 [ice]
[1026677.429147] ice_rebuild+0x18b/0x520 [ice]
[1026677.433746] ? delay_tsc+0x8f/0xc0
[1026677.437630] ice_do_reset+0xa3/0x190 [ice]
[1026677.442231] ice_service_task+0x26/0x440 [ice]
[1026677.447180] process_one_work+0x174/0x340
[1026677.451669] worker_thread+0x27e/0x390
[1026677.455890] ? __pfx_worker_thread+0x10/0x10
[1026677.460627] kthread+0xee/0x120
[1026677.464235] ? __pfx_kthread+0x10/0x10
[1026677.468445] ret_from_fork+0x2d/0x50
[1026677.472476] ? __pfx_kthread+0x10/0x10
[1026677.476671] ret_from_fork_asm+0x1b/0x30
[1026677.481050] </TASK>
Fixes: b3e7b3a6ee92 ("ice: prevent NULL pointer deref during reload")
Reported-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Tony Nguyen says:
====================
ethtool: ice: Support for RSS settings to GTP
Takeru Hayasaka enables RSS functionality for GTP packets on ice driver
with ethtool.
A user can include TEID and make RSS work for GTP-U over IPv4 by doing the
following:`ethtool -N ens3 rx-flow-hash gtpu4 sde`
In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e,
gtpu(4|6)u, and gtpu(4|6)d.
gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does
not include a TEID.
gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that
includes a TEID.
gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios.
gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6.
gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended
header includes Uplink, applicable to both IPv4 and IPv6.
gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink,
for both IPv4 and IPv6.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
net/core/page_pool_user.c
0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors")
429679dcf7d9 ("page_pool: fix netlink dump stop/resume")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Following the addition of new GTP RSS hash options to ethtool.h, this patch
implements the corresponding RSS settings for GTP packets in the Intel ice
driver. It enables users to configure RSS for GTP-U and GTP-C traffic over IPv4
and IPv6, utilizing the newly defined hash options.
The implementation covers the handling of gtpu(4|6), gtpc(4|6), gtpc(4|6)t,
gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d traffic, providing enhanced load
distribution for GTP traffic across multiple processing units.
Signed-off-by: Takeru Hayasaka <hayatake396@gmail.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix an obviously incorrect assignment, created with a typo or cut-n-paste
error.
Fixes: 5995ef88e3a8 ("ice: realloc VSI stats arrays")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ice_down() clears QINT_TQCTL_CAUSE_ENA_M bit twice, which is not
necessary. First clearing happens in ice_vsi_dis_irq() and second in
ice_vsi_stop_tx_ring() - remove the first one.
While at it, make ice_vsi_dis_irq() static as ice_down() is the only
current caller of it.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/mptcp/protocol.c
adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket")
9426ce476a70 ("mptcp: annotate lockless access for RX path fields")
https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/
Adjacent changes:
drivers/dpll/dpll_core.c
0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()")
e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order")
drivers/net/veth.c
1ce7d306ea63 ("veth: try harder when allocating queue memory")
0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers")
drivers/net/wireless/intel/iwlwifi/mvm/d3.c
8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO")
78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists")
net/wireless/nl80211.c
f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change")
414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 91fdbce7e8d6 ("ice: Add support in the driver for associating
queue with napi") invoked the netif_queue_set_napi() call. This
kernel function requires to be called with rtnl_lock taken,
otherwise ASSERT_RTNL() warning will be triggered. ice_vsi_rebuild()
initiating this call is under rtnl_lock when the rebuild is in
response to configuration changes from external interfaces (such as
tc, ethtool etc. which holds the lock). But, the VSI rebuild
generated from service tasks and resets (PFR/CORER/GLOBR) is not
under rtnl lock protection. Handle these cases as well to hold lock
before the kernel call (by setting the 'locked' boolean to false).
netif_queue_set_napi() is also used to clear previously set napi
in the q_vector unroll flow. Handle this for locked/lockless execution
paths.
Fixes: 91fdbce7e8d6 ("ice: Add support in the driver for associating queue with napi")
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently, XSK control path in ice driver calls directly
ice_vsi_cfg_txq() whereas we have ice_vsi_cfg_single_txq() for that
purpose. Use the latter from XSK side and make ice_vsi_cfg_txq() static.
ice_vsi_cfg_txq() resides in ice_base.c and is rather big, so to reduce
the code churn let us move the callers of it from ice_lib.c to
ice_base.c.
This change puts ice_qp_ena() on nice diet due to the checks and
operations that ice_vsi_cfg_single_{r,t}xq() do internally.
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-182 (-182)
Function old new delta
ice_xsk_pool_setup 2165 1983 -182
Total: Before=472597, After=472415, chg -0.04%
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently, XSK control path in ice driver calls directly
ice_vsi_cfg_rxq() whereas we have ice_vsi_cfg_single_rxq() for that
purpose. Use the latter from XSK side and make ice_vsi_cfg_rxq() static.
ice_vsi_cfg_rxq() resides in ice_base.c and is rather big, so to reduce
the code churn let us move two callers of it from ice_lib.c to
ice_base.c.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Tony Nguyen says:
====================
intel: use bitfield operations
Jesse Brandeburg says:
After repeatedly getting review comments on new patches, and sporadic
patches to fix parts of our drivers, we should just convert the Intel code
to use FIELD_PREP() and FIELD_GET(). It's then "common" in the code and
hopefully future change-sets will see the context and do-the-right-thing.
This conversion was done with a coccinelle script which is mentioned in the
commit messages. Generally there were only a couple conversions that were
"undone" after the automatic changes because they tried to convert a
non-contiguous mask.
Patch 1 is required at the beginning of this series to fix a "forever"
issue in the e1000e driver that fails the compilation test after conversion
because the shift / mask was out of range.
The second patch just adds all the new #includes in one go.
The patch titled: "ice: fix pre-shifted bit usage" is needed to allow the
use of the FIELD_* macros and fix up the unexpected "shifts included"
defines found while creating this series.
The rest are the conversion to use FIELD_PREP()/FIELD_GET(), and the
occasional leXX_{get,set,encode}_bits() call, as suggested by Alex.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cross-merge networking fixes after downstream PR.
Adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
23c93c3b6275 ("bnxt_en: do not map packet buffers twice")
6d1add95536b ("bnxt_en: Modify TX ring indexing logic.")
tools/testing/selftests/net/Makefile
2258b666482d ("selftests: add vlan hw filter tests")
a0bc96c0cd6e ("selftests: net: verify fq per-band packet limit")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
It was found while doing further testing of the previous commit
fbf32a9bab91 ("ice: field get conversion") that one of the FIELD_GET
conversions should really be a FIELD_PREP. The previous code was styled
as a match to the FIELD_GET conversion, which always worked because the
shift value was 0. The code makes way more sense as a FIELD_PREP and
was in fact the only FIELD_GET with two constant arguments in this
series.
Didn't squash this patch to make it easier to call out the
(non-impactful) bug.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Refactor the ice driver to use FIELD_GET() for mask and shift reads,
which reduces lines of code and adds clarity of intent.
This code was generated by the following coccinelle/spatch script and
then manually repaired.
@get@
constant shift,mask;
type T;
expression a;
@@
-(((T)(a) & mask) >> shift)
+FIELD_GET(mask, a)
and applied via:
spatch --sp-file field_prep.cocci --in-place --dir \
drivers/net/ethernet/intel/
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
While converting to FIELD_PREP() and FIELD_GET(), it was noticed that
some of the RSS defines had *included* the shift in their definitions.
This is completely outside of normal, such that a developer could easily
make a mistake and shift at the usage site (like when using
FIELD_PREP()).
Rename the defines and set them to the "pre-shifted values" so they
match the template the driver normally uses for masks and the member
bits of the mask, which also allows the driver to use FIELD_PREP
correctly with these values. Use GENMASK() for this changed MASK value.
Do the same for the VLAN EMODE defines as well.
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Refactor ice driver to use FIELD_PREP(), which reduces lines of code
and adds clarity of intent.
This code was generated by the following coccinelle/spatch script and
then manually repaired.
Several places I changed to OR into a single variable with |= instead of
using a multi-line statement with trailing OR operators, as it
(subjectively) makes the code clearer.
A local variable vmvf_and_timeout was created and used to avoid multiple
logical ORs being __le16 converted, which shortened some lines and makes
the code cleaner.
Also clean up a couple of places where conversions were made to have the
code read more clearly/consistently.
@prep2@
constant shift,mask;
type T;
expression a;
@@
-(((T)(a) << shift) & mask)
+FIELD_PREP(mask, a)
@prep@
constant shift,mask;
type T;
expression a;
@@
-((T)((a) << shift) & mask)
+FIELD_PREP(mask, a)
Cc: Julia Lawall <Julia.Lawall@inria.fr>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 6624e780a577fc596788 ("ice: split ice_vsi_setup into smaller
functions") has refactored a bunch of code involved in PFR. In this
process, TC queue number adjustment for XDP was lost. Bring it back.
Lack of such adjustment causes interface to go into no-carrier after a
reset, if XDP program is attached, with the following message:
ice 0000:b1:00.0: Failed to set LAN Tx queue context, error: -22
ice 0000:b1:00.0 ens801f0np0: Failed to open VSI 0x0006 on switch 0x0001
ice 0000:b1:00.0: enable VSI failed, err -22, VSI index 0, type ICE_VSI_PF
ice 0000:b1:00.0: PF VSI rebuild failed: -22
ice 0000:b1:00.0: Rebuild failed, unload and reload driver
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|