aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/tcp.h
diff options
context:
space:
mode:
authorSoheil Hassas Yeganeh <soheil@google.com>2018-05-01 15:39:15 -0400
committerDavid S. Miller <davem@davemloft.net>2018-05-01 18:56:29 -0400
commitb75eba76d3d72e2374fac999926dafef2997edd2 (patch)
tree43e3ae67eeeae91a7d17c68a40140840494fcd70 /include/linux/tcp.h
parentMerge branch 'hns3-fixes' (diff)
downloadlinux-dev-b75eba76d3d72e2374fac999926dafef2997edd2.tar.xz
linux-dev-b75eba76d3d72e2374fac999926dafef2997edd2.zip
tcp: send in-queue bytes in cmsg upon read
Applications with many concurrent connections, high variance in receive queue length and tight memory bounds cannot allocate worst-case buffer size to drain sockets. Knowing the size of receive queue length, applications can optimize how they allocate buffers to read from the socket. The number of bytes pending on the socket is directly available through ioctl(FIONREAD/SIOCINQ) and can be approximated using getsockopt(MEMINFO) (rmem_alloc includes skb overheads in addition to application data). But, both of these options add an extra syscall per recvmsg. Moreover, ioctl(FIONREAD/SIOCINQ) takes the socket lock. Add the TCP_INQ socket option to TCP. When this socket option is set, recvmsg() relays the number of bytes available on the socket for reading to the application via the TCP_CM_INQ control message. Calculate the number of bytes after releasing the socket lock to include the processed backlog, if any. To avoid an extra branch in the hot path of recvmsg() for this new control message, move all cmsg processing inside an existing branch for processing receive timestamps. Since the socket lock is not held when calculating the size of receive queue, TCP_INQ is a hint. For example, it can overestimate the queue size by one byte, if FIN is received. With this method, applications can start reading from the socket using a small buffer, and then use larger buffers based on the remaining data when needed. V3 change-log: As suggested by David Miller, added loads with barrier to check whether we have multiple threads calling recvmsg in parallel. When that happens we lock the socket to calculate inq. V4 change-log: Removed inline from a static function. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'include/linux/tcp.h')
-rw-r--r--include/linux/tcp.h2
1 files changed, 1 insertions, 1 deletions
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 20585d5c4e1c..807776928cb8 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -228,7 +228,7 @@ struct tcp_sock {
unused:2;
u8 nonagle : 4,/* Disable Nagle algorithm? */
thin_lto : 1,/* Use linear timeouts for thin streams */
- unused1 : 1,
+ recvmsg_inq : 1,/* Indicate # of bytes in queue upon recvmsg */
repair : 1,
frto : 1;/* F-RTO (RFC5682) activated in CA_Loss */
u8 repair_queue;