author		Eric Dumazet <edumazet@google.com>	2020-04-22 09:13:27 -0700
committer	David S. Miller <davem@davemloft.net>	2020-04-23 12:43:20 -0700
commit		6f8b12d661d09b488b9ac879b8eafbd2cc4a1450 (patch)
tree		b5ef6f687bc70ecdffd8c9ce39b3daf7f11214ba /include/linux/netdevice.h
parent		Merge branch 'qed-aer' (diff)
net: napi: add hard irqs deferral feature
Back in commit 3b47d30396ba ("net: gro: add a per device gro flush timer") we added the ability to arm one high-resolution timer, used to keep incomplete packets in the GRO engine a bit longer, hoping that further frames might be added to them.

Since then, we added the napi_complete_done() interface, and commit 364b6055738b ("net: busy-poll: return busypolling status to drivers") allowed drivers to avoid re-arming NIC interrupts if we made a promise that their NAPI poll() handler would be called in the near future.

This infrastructure can be leveraged, thanks to a new device parameter, which allows arming the napi hrtimer instead of re-arming the device hard IRQ.

We have noticed that on some servers with 32 RX queues or more, the chit-chat between the NIC and the host caused by IRQ delivery and re-arming could hurt throughput by ~20% on a 100Gbit NIC. In contrast, hrtimers use local (per-cpu) resources and might have lower cost.

The new tunable, named napi_defer_hard_irqs, is placed in the same sysfs directory as gro_flush_timeout (/sys/class/net/ethX/).

By default, both gro_flush_timeout and napi_defer_hard_irqs are zero. This patch does not change the prior behavior of gro_flush_timeout if used alone: NIC hard IRQs should be re-armed as before.

One concrete usage can be:

echo 20000 >/sys/class/net/eth1/gro_flush_timeout
echo 10 >/sys/class/net/eth1/napi_defer_hard_irqs

If at least one packet is retired, we reset the napi counter to 10 (napi_defer_hard_irqs), ensuring at least 10 periodic scans of the queue. On busy queues, this should avoid NIC hard IRQs, while before this patch IRQ avoidance was only possible if napi->poll() exhausted its budget and did not call napi_complete_done().

This feature also can be used to work around some non-optimal NIC IRQ coalescing strategies. Having the ability to insert XX usec delays between each napi->poll() can increase cache efficiency, since we increase batch sizes. It also keeps serving CPUs from sitting idle too long, reducing tail latencies.

Co-developed-by: Luigi Rizzo <lrizzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
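The deferral logic itself lives in napi_complete_done() in net/core/dev.c, which this view does not show (the diffstat is limited to include/linux/netdevice.h). A simplified sketch of that logic, with GRO flushing, busy polling and the atomic NAPI state transitions omitted:

/* Simplified sketch of the napi_complete_done() change; the real
 * function also flushes GRO and updates the NAPI state bits.
 */
bool napi_complete_done(struct napi_struct *n, int work_done)
{
	unsigned long timeout = 0;
	bool ret = true;

	if (work_done)
		/* At least one packet was retired: reload the per-napi
		 * counter from the device-wide tunable.
		 */
		n->defer_hard_irqs_count =
			READ_ONCE(n->dev->napi_defer_hard_irqs);

	if (n->defer_hard_irqs_count > 0) {
		n->defer_hard_irqs_count--;
		timeout = READ_ONCE(n->dev->gro_flush_timeout);
		if (timeout)
			/* Tell the driver not to re-arm its hard IRQ;
			 * the hrtimer below reschedules the poll instead.
			 */
			ret = false;
	}
	if (timeout)
		hrtimer_start(&n->timer, ns_to_ktime(timeout),
			      HRTIMER_MODE_REL_PINNED);
	return ret;
}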
Diffstat (limited to 'include/linux/netdevice.h')
-rw-r--r--	include/linux/netdevice.h	2
1 file changed, 2 insertions(+), 0 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0750b54b3765..5a8d40f1ffe2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -329,6 +329,7 @@ struct napi_struct {
 
 	unsigned long		state;
 	int			weight;
+	int			defer_hard_irqs_count;
 	unsigned long		gro_bitmask;
 	int			(*poll)(struct napi_struct *, int);
 #ifdef CONFIG_NETPOLL
@@ -1995,6 +1996,7 @@ struct net_device {
 
 	struct bpf_prog __rcu	*xdp_prog;
 	unsigned long		gro_flush_timeout;
+	int			napi_defer_hard_irqs;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
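For context on how the return value is consumed, a driver's NAPI poll() handler re-arms its hard IRQ only when napi_complete_done() returns true. A hypothetical driver sketch (the mydrv_* names are illustrative placeholders, not part of this patch):

/* Hypothetical poll() handler; mydrv_* helpers are placeholders. */
static int mydrv_napi_poll(struct napi_struct *napi, int budget)
{
	struct mydrv_queue *q = container_of(napi, struct mydrv_queue, napi);
	int work_done = mydrv_clean_rx(q, budget);

	if (work_done < budget && napi_complete_done(napi, work_done))
		/* Re-arm the hard IRQ only when the core did not defer:
		 * while defer_hard_irqs_count drains, napi_complete_done()
		 * returns false and the hrtimer drives the next poll.
		 */
		mydrv_enable_irq(q);

	return work_done;
}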