aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4/tcp_output.c
diff options
context:
space:
mode:
authorEric Dumazet <edumazet@google.com>2019-09-10 14:49:28 -0700
committerDavid S. Miller <davem@davemloft.net>2019-09-11 23:59:01 +0100
commit051ba67447de1294aaacc59752a37f72091d15ed (patch)
treea364966011b3e45a19ddbbe85db010a90f7fba76 /net/ipv4/tcp_output.c
parentipv6: Don't use dst gateway directly in ip6_confirm_neigh() (diff)
downloadlinux-dev-051ba67447de1294aaacc59752a37f72091d15ed.tar.xz
linux-dev-051ba67447de1294aaacc59752a37f72091d15ed.zip
tcp: force a PSH flag on TSO packets
When tcp sends a TSO packet, adding a PSH flag on it reduces the sojourn time of GRO packet in GRO receivers. This is particularly the case under pressure, since RX queues receive packets for many concurrent flows. A sender can give a hint to GRO engines when it is appropriate to flush a super-packet, especially when pacing is in the picture, since next packet is probably delayed by one ms. Having less packets in GRO engine reduces chance of LRU eviction or inflated RTT, and reduces GRO cost. We found recently that we must not set the PSH flag on individual full-size MSS segments [1] : Under pressure (CWR state), we better let the packet sit for a small delay (depending on NAPI logic) so that the ACK packet is delayed, and thus next packet we send is also delayed a bit. Eventually the bottleneck queue can be drained. DCTCP flows with CWND=1 have demonstrated the issue. This patch allows to slowdown the aggregate traffic without involving high resolution timers on senders and/or receivers. It has been used at Google for about four years, and has been discussed at various networking conferences. [1] segments smaller than MSS already have PSH flag set by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE has been requested by the user. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tariq Toukan <tariqt@mellanox.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to '')
-rw-r--r--net/ipv4/tcp_output.c15
1 files changed, 13 insertions, 2 deletions
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 42abc9bd687a..fec6d67bfd14 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1050,11 +1050,22 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
tcb = TCP_SKB_CB(skb);
memset(&opts, 0, sizeof(opts));
- if (unlikely(tcb->tcp_flags & TCPHDR_SYN))
+ if (unlikely(tcb->tcp_flags & TCPHDR_SYN)) {
tcp_options_size = tcp_syn_options(sk, skb, &opts, &md5);
- else
+ } else {
tcp_options_size = tcp_established_options(sk, skb, &opts,
&md5);
+ /* Force a PSH flag on all (GSO) packets to expedite GRO flush
+ * at receiver : This slightly improve GRO performance.
+ * Note that we do not force the PSH flag for non GSO packets,
+ * because they might be sent under high congestion events,
+ * and in this case it is better to delay the delivery of 1-MSS
+ * packets and thus the corresponding ACK packet that would
+ * release the following packet.
+ */
+ if (tcp_skb_pcount(skb) > 1)
+ tcb->tcp_flags |= TCPHDR_PSH;
+ }
tcp_header_size = tcp_options_size + sizeof(struct tcphdr);
/* if no packet is in qdisc/device queue, then allow XPS to select