tcp: add tcp_tx_skb_cache sysctl

Feng Tang reported a performance regression after introduction of per TCP socket tx/rx caches, for TCP over loopback (netperf) There is high chance the regression is caused by a change on how well the 32 KB per-thread page (current->task_frag) can be recycled, and lack of pcp caches for order-3 pages. I could not reproduce the regression myself, cpus all being spinning on the mm spinlocks for page allocs/freeing, regardless of enabling or disabling the per tcp socket caches. It seems best to disable the feature by default, and let admins enabling it. MM layer either needs to provide scalable order-3 pages allocations, or could attempt a trylock on zone->lock if the caller only attempts to get a high-order page and is able to fallback to order-0 ones in case of pressure. Tests run on a 56 cores host (112 hyper threads) - 35.49% netperf [kernel.vmlinux] [k] queued_spin_lock_slowpath - 35.49% queued_spin_lock_slowpath - 18.18% get_page_from_freelist - __alloc_pages_nodemask - 18.18% alloc_pages_current skb_page_frag_refill sk_page_frag_refill tcp_sendmsg_locked tcp_sendmsg inet_sendmsg sock_sendmsg __sys_sendto __x64_sys_sendto do_syscall_64 entry_SYSCALL_64_after_hwframe __libc_send + 17.31% __free_pages_ok + 31.43% swapper [kernel.vmlinux] [k] intel_idle + 9.12% netperf [kernel.vmlinux] [k] copy_user_enhanced_fast_string + 6.53% netserver [kernel.vmlinux] [k] copy_user_enhanced_fast_string + 0.69% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath + 0.68% netperf [kernel.vmlinux] [k] skb_release_data + 0.52% netperf [kernel.vmlinux] [k] tcp_sendmsg_locked 0.46% netperf [kernel.vmlinux] [k] _raw_spin_lock_irqsave Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Feng Tang <feng.tang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
author: Eric Dumazet <edumazet@google.com> 2019-06-14 16:22:20 -0700
committer: David S. Miller <davem@davemloft.net> 2019-06-14 20:18:28 -0700
commit: 0b7d7f6b22084a3156f267c85303908a8f4c9a08 (patch)
tree: 63b86d07294975f0cddb579b0be1ec4ad223f309 /include/net/sock.h
parent: tcp: add tcp_rx_skb_cache sysctl (diff)
download: linux-dev-0b7d7f6b22084a3156f267c85303908a8f4c9a08.tar.xz
linux-dev-0b7d7f6b22084a3156f267c85303908a8f4c9a08.zip
1 files changed, 3 insertions, 1 deletions
diff --git a/include/net/sock.h b/include/net/sock.h
index b02645e2dfad..7d7f4ce63bb2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1463,12 +1463,14 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
 		__sk_mem_reclaim(sk, 1 << 20);
 }
 
+DECLARE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key);
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 {
 	sock_set_flag(sk, SOCK_QUEUE_SHRUNK);
 	sk->sk_wmem_queued -= skb->truesize;
 	sk_mem_uncharge(sk, skb->truesize);
-	if (!sk->sk_tx_skb_cache && !skb_cloned(skb)) {
+	if (static_branch_unlikely(&tcp_tx_skb_cache_key) &&
+	    !sk->sk_tx_skb_cache && !skb_cloned(skb)) {
 		skb_zcopy_clear(skb, true);
 		sk->sk_tx_skb_cache = skb;
 		return;
author	Eric Dumazet <edumazet@google.com>	2019-06-14 16:22:20 -0700
committer	David S. Miller <davem@davemloft.net>	2019-06-14 20:18:28 -0700
commit	0b7d7f6b22084a3156f267c85303908a8f4c9a08 (patch)
tree	63b86d07294975f0cddb579b0be1ec4ad223f309 /include/net/sock.h
parent	tcp: add tcp_rx_skb_cache sysctl (diff)
download	linux-dev-0b7d7f6b22084a3156f267c85303908a8f4c9a08.tar.xz linux-dev-0b7d7f6b22084a3156f267c85303908a8f4c9a08.zip