aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/sfc
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2022-03-24 13:13:26 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2022-03-24 13:13:26 -0700
commit169e77764adc041b1dacba84ea90516a895d43b2 (patch)
treeaf7124681fa65d40fccee902af5194ab9f9c95f4 /drivers/net/ethernet/sfc
parentMerge tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio (diff)
parentMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (diff)
downloadlinux-dev-169e77764adc041b1dacba84ea90516a895d43b2.tar.xz
linux-dev-169e77764adc041b1dacba84ea90516a895d43b2.zip
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
Diffstat (limited to 'drivers/net/ethernet/sfc')
-rw-r--r--drivers/net/ethernet/sfc/ef10.c26
-rw-r--r--drivers/net/ethernet/sfc/ef100_nic.c9
-rw-r--r--drivers/net/ethernet/sfc/efx_channels.c63
-rw-r--r--drivers/net/ethernet/sfc/net_driver.h2
-rw-r--r--drivers/net/ethernet/sfc/nic_common.h5
-rw-r--r--drivers/net/ethernet/sfc/rx_common.c18
-rw-r--r--drivers/net/ethernet/sfc/rx_common.h6
-rw-r--r--drivers/net/ethernet/sfc/siena.c8
8 files changed, 101 insertions, 36 deletions
diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index cf366ed2557c..50d535981a35 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -3990,6 +3990,30 @@ static unsigned int ef10_check_caps(const struct efx_nic *efx,
}
}
+static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
+{
+ unsigned int ret = EFX_RECYCLE_RING_SIZE_10G;
+
+ /* There is no difference between PFs and VFs. The side is based on
+ * the maximum link speed of a given NIC.
+ */
+ switch (efx->pci_dev->device & 0xfff) {
+ case 0x0903: /* Farmingdale can do up to 10G */
+ break;
+ case 0x0923: /* Greenport can do up to 40G */
+ case 0x0a03: /* Medford can do up to 40G */
+ ret *= 4;
+ break;
+ default: /* Medford2 can do up to 100G */
+ ret *= 10;
+ }
+
+ if (IS_ENABLED(CONFIG_PPC64))
+ ret *= 4;
+
+ return ret;
+}
+
#define EF10_OFFLOAD_FEATURES \
(NETIF_F_IP_CSUM | \
NETIF_F_HW_VLAN_CTAG_FILTER | \
@@ -4106,6 +4130,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
.check_caps = ef10_check_caps,
.print_additional_fwver = efx_ef10_print_additional_fwver,
.sensor_event = efx_mcdi_sensor_event,
+ .rx_recycle_ring_size = efx_ef10_recycle_ring_size,
};
const struct efx_nic_type efx_hunt_a0_nic_type = {
@@ -4243,4 +4268,5 @@ const struct efx_nic_type efx_hunt_a0_nic_type = {
.check_caps = ef10_check_caps,
.print_additional_fwver = efx_ef10_print_additional_fwver,
.sensor_event = efx_mcdi_sensor_event,
+ .rx_recycle_ring_size = efx_ef10_recycle_ring_size,
};
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index f79b14a119ae..a07cbf45a326 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -23,6 +23,7 @@
#include "ef100_rx.h"
#include "ef100_tx.h"
#include "ef100_netdev.h"
+#include "rx_common.h"
#define EF100_MAX_VIS 4096
#define EF100_NUM_MCDI_BUFFERS 1
@@ -696,6 +697,12 @@ static unsigned int ef100_check_caps(const struct efx_nic *efx,
}
}
+static unsigned int efx_ef100_recycle_ring_size(const struct efx_nic *efx)
+{
+ /* Maximum link speed for Riverhead is 100G */
+ return 10 * EFX_RECYCLE_RING_SIZE_10G;
+}
+
/* NIC level access functions
*/
#define EF100_OFFLOAD_FEATURES (NETIF_F_HW_CSUM | NETIF_F_RXCSUM | \
@@ -770,6 +777,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
.rx_push_rss_context_config = efx_mcdi_rx_push_rss_context_config,
.rx_pull_rss_context_config = efx_mcdi_rx_pull_rss_context_config,
.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
+ .rx_recycle_ring_size = efx_ef100_recycle_ring_size,
.reconfigure_mac = ef100_reconfigure_mac,
.reconfigure_port = efx_mcdi_port_reconfigure,
@@ -849,6 +857,7 @@ const struct efx_nic_type ef100_vf_nic_type = {
.rx_pull_rss_config = efx_mcdi_rx_pull_rss_config,
.rx_push_rss_config = efx_mcdi_pf_rx_push_rss_config,
.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
+ .rx_recycle_ring_size = efx_ef100_recycle_ring_size,
.reconfigure_mac = ef100_reconfigure_mac,
.test_nvram = efx_new_mcdi_nvram_test_all,
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c
index ead550ae2709..d6fdcdc530ca 100644
--- a/drivers/net/ethernet/sfc/efx_channels.c
+++ b/drivers/net/ethernet/sfc/efx_channels.c
@@ -78,31 +78,48 @@ static const struct efx_channel_type efx_default_channel_type = {
* INTERRUPTS
*************/
-static unsigned int efx_wanted_parallelism(struct efx_nic *efx)
+static unsigned int count_online_cores(struct efx_nic *efx, bool local_node)
{
- cpumask_var_t thread_mask;
+ cpumask_var_t filter_mask;
unsigned int count;
int cpu;
+ if (unlikely(!zalloc_cpumask_var(&filter_mask, GFP_KERNEL))) {
+ netif_warn(efx, probe, efx->net_dev,
+ "RSS disabled due to allocation failure\n");
+ return 1;
+ }
+
+ cpumask_copy(filter_mask, cpu_online_mask);
+ if (local_node) {
+ int numa_node = pcibus_to_node(efx->pci_dev->bus);
+
+ cpumask_and(filter_mask, filter_mask, cpumask_of_node(numa_node));
+ }
+
+ count = 0;
+ for_each_cpu(cpu, filter_mask) {
+ ++count;
+ cpumask_andnot(filter_mask, filter_mask, topology_sibling_cpumask(cpu));
+ }
+
+ free_cpumask_var(filter_mask);
+
+ return count;
+}
+
+static unsigned int efx_wanted_parallelism(struct efx_nic *efx)
+{
+ unsigned int count;
+
if (rss_cpus) {
count = rss_cpus;
} else {
- if (unlikely(!zalloc_cpumask_var(&thread_mask, GFP_KERNEL))) {
- netif_warn(efx, probe, efx->net_dev,
- "RSS disabled due to allocation failure\n");
- return 1;
- }
+ count = count_online_cores(efx, true);
- count = 0;
- for_each_online_cpu(cpu) {
- if (!cpumask_test_cpu(cpu, thread_mask)) {
- ++count;
- cpumask_or(thread_mask, thread_mask,
- topology_sibling_cpumask(cpu));
- }
- }
-
- free_cpumask_var(thread_mask);
+ /* If no online CPUs in local node, fallback to any online CPUs */
+ if (count == 0)
+ count = count_online_cores(efx, false);
}
if (count > EFX_MAX_RX_QUEUES) {
@@ -369,12 +386,20 @@ int efx_probe_interrupts(struct efx_nic *efx)
#if defined(CONFIG_SMP)
void efx_set_interrupt_affinity(struct efx_nic *efx)
{
+ int numa_node = pcibus_to_node(efx->pci_dev->bus);
+ const struct cpumask *numa_mask = cpumask_of_node(numa_node);
struct efx_channel *channel;
unsigned int cpu;
+ /* If no online CPUs in local node, fallback to any online CPU */
+ if (cpumask_first_and(cpu_online_mask, numa_mask) >= nr_cpu_ids)
+ numa_mask = cpu_online_mask;
+
+ cpu = -1;
efx_for_each_channel(channel, efx) {
- cpu = cpumask_local_spread(channel->channel,
- pcibus_to_node(efx->pci_dev->bus));
+ cpu = cpumask_next_and(cpu, cpu_online_mask, numa_mask);
+ if (cpu >= nr_cpu_ids)
+ cpu = cpumask_first_and(cpu_online_mask, numa_mask);
irq_set_affinity_hint(channel->irq, cpumask_of(cpu));
}
}
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index cc15ee8812d9..c75dc75e2857 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1282,6 +1282,7 @@ struct efx_udp_tunnel {
* @udp_tnl_has_port: Check if a port has been added as UDP tunnel
* @print_additional_fwver: Dump NIC-specific additional FW version info
* @sensor_event: Handle a sensor event from MCDI
+ * @rx_recycle_ring_size: Size of the RX recycle ring
* @revision: Hardware architecture revision
* @txd_ptr_tbl_base: TX descriptor ring base address
* @rxd_ptr_tbl_base: RX descriptor ring base address
@@ -1460,6 +1461,7 @@ struct efx_nic_type {
size_t (*print_additional_fwver)(struct efx_nic *efx, char *buf,
size_t len);
void (*sensor_event)(struct efx_nic *efx, efx_qword_t *ev);
+ unsigned int (*rx_recycle_ring_size)(const struct efx_nic *efx);
int revision;
unsigned int txd_ptr_tbl_base;
diff --git a/drivers/net/ethernet/sfc/nic_common.h b/drivers/net/ethernet/sfc/nic_common.h
index b9cafe9cd568..0cef35c0c559 100644
--- a/drivers/net/ethernet/sfc/nic_common.h
+++ b/drivers/net/ethernet/sfc/nic_common.h
@@ -195,6 +195,11 @@ static inline void efx_sensor_event(struct efx_nic *efx, efx_qword_t *ev)
efx->type->sensor_event(efx, ev);
}
+static inline unsigned int efx_rx_recycle_ring_size(const struct efx_nic *efx)
+{
+ return efx->type->rx_recycle_ring_size(efx);
+}
+
/* Some statistics are computed as A - B where A and B each increase
* linearly with some hardware counter(s) and the counters are read
* asynchronously. If the counters contributing to B are always read
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index 633ca77a26fd..1b22c7be0088 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -23,13 +23,6 @@ module_param(rx_refill_threshold, uint, 0444);
MODULE_PARM_DESC(rx_refill_threshold,
"RX descriptor ring refill threshold (%)");
-/* Number of RX buffers to recycle pages for. When creating the RX page recycle
- * ring, this number is divided by the number of buffers per page to calculate
- * the number of pages to store in the RX page recycle ring.
- */
-#define EFX_RECYCLE_RING_SIZE_IOMMU 4096
-#define EFX_RECYCLE_RING_SIZE_NOIOMMU (2 * EFX_RX_PREFERRED_BATCH)
-
/* RX maximum head room required.
*
* This must be at least 1 to prevent overflow, plus one packet-worth
@@ -141,16 +134,7 @@ static void efx_init_rx_recycle_ring(struct efx_rx_queue *rx_queue)
unsigned int bufs_in_recycle_ring, page_ring_size;
struct efx_nic *efx = rx_queue->efx;
- /* Set the RX recycle ring size */
-#ifdef CONFIG_PPC64
- bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
-#else
- if (iommu_present(&pci_bus_type))
- bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
- else
- bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_NOIOMMU;
-#endif /* CONFIG_PPC64 */
-
+ bufs_in_recycle_ring = efx_rx_recycle_ring_size(efx);
page_ring_size = roundup_pow_of_two(bufs_in_recycle_ring /
efx->rx_bufs_per_page);
rx_queue->page_ring = kcalloc(page_ring_size,
diff --git a/drivers/net/ethernet/sfc/rx_common.h b/drivers/net/ethernet/sfc/rx_common.h
index 207ccd8ba062..fbd2769307f9 100644
--- a/drivers/net/ethernet/sfc/rx_common.h
+++ b/drivers/net/ethernet/sfc/rx_common.h
@@ -18,6 +18,12 @@
#define EFX_RX_MAX_FRAGS DIV_ROUND_UP(EFX_MAX_FRAME_LEN(EFX_MAX_MTU), \
EFX_RX_USR_BUF_SIZE)
+/* Number of RX buffers to recycle pages for. When creating the RX page recycle
+ * ring, this number is divided by the number of buffers per page to calculate
+ * the number of pages to store in the RX page recycle ring.
+ */
+#define EFX_RECYCLE_RING_SIZE_10G 256
+
static inline u8 *efx_rx_buf_va(struct efx_rx_buffer *buf)
{
return page_address(buf->page) + buf->page_offset;
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index 16347a6d0c47..ce3060e15b54 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -25,6 +25,7 @@
#include "mcdi_port_common.h"
#include "selftest.h"
#include "siena_sriov.h"
+#include "rx_common.h"
/* Hardware control for SFC9000 family including SFL9021 (aka Siena). */
@@ -958,6 +959,12 @@ static unsigned int siena_check_caps(const struct efx_nic *efx,
return 0;
}
+static unsigned int efx_siena_recycle_ring_size(const struct efx_nic *efx)
+{
+ /* Maximum link speed is 10G */
+ return EFX_RECYCLE_RING_SIZE_10G;
+}
+
/**************************************************************************
*
* Revision-dependent attributes used by efx.c and nic.c
@@ -1098,4 +1105,5 @@ const struct efx_nic_type siena_a0_nic_type = {
.rx_hash_key_size = 16,
.check_caps = siena_check_caps,
.sensor_event = efx_mcdi_sensor_event,
+ .rx_recycle_ring_size = efx_siena_recycle_ring_size,
};