<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/net/ipv4, branch master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/net/ipv4?h=master</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/net/ipv4?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-11-07T11:29:38Z</updated>
<entry>
<title>tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent</title>
<updated>2022-11-07T11:29:38Z</updated>
<author>
<name>Lu Wei</name>
<email>luwei32@huawei.com</email>
</author>
<published>2022-11-04T02:27:23Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=0c175da7b0378445f5ef53904247cfbfb87e0b78'/>
<id>urn:sha1:0c175da7b0378445f5ef53904247cfbfb87e0b78</id>
<content type='text'>
If setsockopt with option name of TCP_REPAIR_OPTIONS and opt_code
of TCPOPT_SACK_PERM is called to enable sack after data is sent
and dupacks are received , it will trigger a warning in function
tcp_verify_left_out() as follows:

============================================
WARNING: CPU: 8 PID: 0 at net/ipv4/tcp_input.c:2132
tcp_timeout_mark_lost+0x154/0x160
tcp_enter_loss+0x2b/0x290
tcp_retransmit_timer+0x50b/0x640
tcp_write_timer_handler+0x1c8/0x340
tcp_write_timer+0xe5/0x140
call_timer_fn+0x3a/0x1b0
__run_timers.part.0+0x1bf/0x2d0
run_timer_softirq+0x43/0xb0
__do_softirq+0xfd/0x373
__irq_exit_rcu+0xf6/0x140

The warning is caused in the following steps:
1. a socket named socketA is created
2. socketA enters repair mode without build a connection
3. socketA calls connect() and its state is changed to TCP_ESTABLISHED
   directly
4. socketA leaves repair mode
5. socketA calls sendmsg() to send data, packets_out and sack_outs(dup
   ack receives) increase
6. socketA enters repair mode again
7. socketA calls setsockopt with TCPOPT_SACK_PERM to enable sack
8. retransmit timer expires, it calls tcp_timeout_mark_lost(), lost_out
   increases
9. sack_outs + lost_out &gt; packets_out triggers since lost_out and
   sack_outs increase repeatly

In function tcp_timeout_mark_lost(), tp-&gt;sacked_out will be cleared if
Step7 not happen and the warning will not be triggered. As suggested by
Denis and Eric, TCP_REPAIR_OPTIONS should be prohibited if data was
already sent.

socket-tcp tests in CRIU has been tested as follows:
$ sudo ./test/zdtm.py run -t zdtm/static/socket-tcp*  --keep-going \
       --ignore-taint

socket-tcp* represent all socket-tcp tests in test/zdtm/static/.

Fixes: b139ba4e90dc ("tcp: Repair connection-time negotiated parameters")
Signed-off-by: Lu Wei &lt;luwei32@huawei.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf, sockmap: Fix the sk-&gt;sk_forward_alloc warning of sk_stream_kill_queues</title>
<updated>2022-11-01T20:59:52Z</updated>
<author>
<name>Wang Yufen</name>
<email>wangyufen@huawei.com</email>
</author>
<published>2022-11-01T01:31:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=8ec95b94716a1e4d126edc3fb2bc426a717e2dba'/>
<id>urn:sha1:8ec95b94716a1e4d126edc3fb2bc426a717e2dba</id>
<content type='text'>
When running `test_sockmap` selftests, the following warning appears:

  WARNING: CPU: 2 PID: 197 at net/core/stream.c:205 sk_stream_kill_queues+0xd3/0xf0
  Call Trace:
  &lt;TASK&gt;
  inet_csk_destroy_sock+0x55/0x110
  tcp_rcv_state_process+0xd28/0x1380
  ? tcp_v4_do_rcv+0x77/0x2c0
  tcp_v4_do_rcv+0x77/0x2c0
  __release_sock+0x106/0x130
  __tcp_close+0x1a7/0x4e0
  tcp_close+0x20/0x70
  inet_release+0x3c/0x80
  __sock_release+0x3a/0xb0
  sock_close+0x14/0x20
  __fput+0xa3/0x260
  task_work_run+0x59/0xb0
  exit_to_user_mode_prepare+0x1b3/0x1c0
  syscall_exit_to_user_mode+0x19/0x50
  do_syscall_64+0x48/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae

The root case is in commit 84472b436e76 ("bpf, sockmap: Fix more uncharged
while msg has more_data"), where I used msg-&gt;sg.size to replace the tosend,
causing breakage:

  if (msg-&gt;apply_bytes &amp;&amp; msg-&gt;apply_bytes &lt; tosend)
    tosend = psock-&gt;apply_bytes;

Fixes: 84472b436e76 ("bpf, sockmap: Fix more uncharged while msg has more_data")
Reported-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Signed-off-by: Wang Yufen &lt;wangyufen@huawei.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Acked-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/1667266296-8794-1-git-send-email-wangyufen@huawei.com
</content>
</entry>
<entry>
<title>net: also flag accepted sockets supporting msghdr originated zerocopy</title>
<updated>2022-10-29T03:21:25Z</updated>
<author>
<name>Stefan Metzmacher</name>
<email>metze@samba.org</email>
</author>
<published>2022-10-26T23:25:59Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=71b7786ea478f3c4611deff4d2b9676b0c17c56b'/>
<id>urn:sha1:71b7786ea478f3c4611deff4d2b9676b0c17c56b</id>
<content type='text'>
Without this only the client initiated tcp sockets have SOCK_SUPPORT_ZC.
The listening socket on the server also has it, but the accepted
connections didn't, which meant IORING_OP_SEND[MSG]_ZC will always
fails with -EOPNOTSUPP.

Fixes: e993ffe3da4b ("net: flag sockets supporting msghdr originated zerocopy")
Cc: &lt;stable@vger.kernel.org&gt; # 6.0
CC: Jens Axboe &lt;axboe@kernel.dk&gt;
Link: https://lore.kernel.org/io-uring/20221024141503.22b4e251@kernel.org/T/#m38aa19b0b825758fb97860a38ad13122051f9dda
Signed-off-by: Stefan Metzmacher &lt;metze@samba.org&gt;
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net/ulp: remove SOCK_SUPPORT_ZC from tls sockets</title>
<updated>2022-10-29T03:21:25Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2022-10-26T23:25:58Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=e276d62dcfdee6582486e8b8344dd869518e14be'/>
<id>urn:sha1:e276d62dcfdee6582486e8b8344dd869518e14be</id>
<content type='text'>
Remove SOCK_SUPPORT_ZC when we're setting ulp as it might not support
msghdr::ubuf_info, e.g. like TLS replacing -&gt;sk_prot with a new set of
handlers.

Cc: &lt;stable@vger.kernel.org&gt; # 6.0
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Fixes: e993ffe3da4bc ("net: flag sockets supporting msghdr originated zerocopy")
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>net: remove SOCK_SUPPORT_ZC from sockmap</title>
<updated>2022-10-29T03:21:25Z</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2022-10-26T23:25:57Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=fee9ac06647e59a69fb7aec58f25267c134264b4'/>
<id>urn:sha1:fee9ac06647e59a69fb7aec58f25267c134264b4</id>
<content type='text'>
sockmap replaces -&gt;sk_prot with its own callbacks, we should remove
SOCK_SUPPORT_ZC as the new proto doesn't support msghdr::ubuf_info.

Cc: &lt;stable@vger.kernel.org&gt; # 6.0
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Fixes: e993ffe3da4bc ("net: flag sockets supporting msghdr originated zerocopy")
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>nh: fix scope used to find saddr when adding non gw nh</title>
<updated>2022-10-27T17:17:40Z</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2022-10-20T10:09:52Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=bac0f937c343d651874f83b265ca8f5070ed4f06'/>
<id>urn:sha1:bac0f937c343d651874f83b265ca8f5070ed4f06</id>
<content type='text'>
As explained by Julian, fib_nh_scope is related to fib_nh_gw4, but
fib_info_update_nhc_saddr() needs the scope of the route, which is
the scope "before" fib_nh_scope, ie fib_nh_scope - 1.

This patch fixes the problem described in commit 747c14307214 ("ip: fix
dflt addr selection for connected nexthop").

Fixes: 597cfe4fc339 ("nexthop: Add support for IPv4 nexthops")
Link: https://lore.kernel.org/netdev/6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg/
Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Reviewed-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "ip: fix dflt addr selection for connected nexthop"</title>
<updated>2022-10-27T17:17:36Z</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2022-10-20T10:09:51Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=e021c329ee198f1cd0cb3855f221137dda49256a'/>
<id>urn:sha1:e021c329ee198f1cd0cb3855f221137dda49256a</id>
<content type='text'>
This reverts commit 747c14307214b55dbd8250e1ab44cad8305756f1.

As explained by Julian, nhc_scope is related to nhc_gw, not to the route.
Revert the original patch. The initial problem is fixed differently in the
next commit.

Link: https://lore.kernel.org/netdev/6c8a44ba-c2d5-cdf-c5c7-5baf97cba38@ssi.bg/
Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Reviewed-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Revert "ip: fix triggering of 'icmp redirect'"</title>
<updated>2022-10-27T17:17:32Z</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2022-10-20T10:09:50Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=745b913a59947919454658dd44cddf5ee8d8f899'/>
<id>urn:sha1:745b913a59947919454658dd44cddf5ee8d8f899</id>
<content type='text'>
This reverts commit eb55dc09b5dd040232d5de32812cc83001a23da6.

The patch that introduces this bug is reverted right after this one.

Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Reviewed-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'net-6.1-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2022-10-24T19:43:51Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-24T19:43:51Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=337a0a0b63f1c30195733eaacf39e4310a592a68'/>
<id>urn:sha1:337a0a0b63f1c30195733eaacf39e4310a592a68</id>
<content type='text'>
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf.

  The net-memcg fix stands out, the rest is very run-off-the-mill. Maybe
  I'm biased.

  Current release - regressions:

   - eth: fman: re-expose location of the MAC address to userspace,
     apparently some udev scripts depended on the exact value

  Current release - new code bugs:

   - bpf:
       - wait for busy refill_work when destroying bpf memory allocator
       - allow bpf_user_ringbuf_drain() callbacks to return 1
       - fix dispatcher patchable function entry to 5 bytes nop

  Previous releases - regressions:

   - net-memcg: avoid stalls when under memory pressure

   - tcp: fix indefinite deferral of RTO with SACK reneging

   - tipc: fix a null-ptr-deref in tipc_topsrv_accept

   - eth: macb: specify PHY PM management done by MAC

   - tcp: fix a signed-integer-overflow bug in tcp_add_backlog()

  Previous releases - always broken:

   - eth: amd-xgbe: SFP fixes and compatibility improvements

  Misc:

   - docs: netdev: offer performance feedback to contributors"

* tag 'net-6.1-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
  net-memcg: avoid stalls when under memory pressure
  tcp: fix indefinite deferral of RTO with SACK reneging
  tcp: fix a signed-integer-overflow bug in tcp_add_backlog()
  net: lantiq_etop: don't free skb when returning NETDEV_TX_BUSY
  net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed
  docs: netdev: offer performance feedback to contributors
  kcm: annotate data-races around kcm-&gt;rx_wait
  kcm: annotate data-races around kcm-&gt;rx_psock
  net: fman: Use physical address for userspace interfaces
  net/mlx5e: Cleanup MACsec uninitialization routine
  atlantic: fix deadlock at aq_nic_stop
  nfp: only clean `sp_indiff` when application firmware is unloaded
  amd-xgbe: add the bit rate quirk for Molex cables
  amd-xgbe: fix the SFP compliance codes check for DAC cables
  amd-xgbe: enable PLL_CTL for fixed PHY modes only
  amd-xgbe: use enums for mailbox cmd and sub_cmds
  amd-xgbe: Yellow carp devices do not need rrc
  bpf: Use __llist_del_all() whenever possbile during memory draining
  bpf: Wait for busy refill_work when destroying bpf memory allocator
  MAINTAINERS: add keyword match on PTP
  ...
</content>
</entry>
<entry>
<title>tcp: fix indefinite deferral of RTO with SACK reneging</title>
<updated>2022-10-24T17:34:48Z</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2022-10-21T17:08:21Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=3d2af9cce3133b3bc596a9d065c6f9d93419ccfb'/>
<id>urn:sha1:3d2af9cce3133b3bc596a9d065c6f9d93419ccfb</id>
<content type='text'>
This commit fixes a bug that can cause a TCP data sender to repeatedly
defer RTOs when encountering SACK reneging.

The bug is that when we're in fast recovery in a scenario with SACK
reneging, every time we get an ACK we call tcp_check_sack_reneging()
and it can note the apparent SACK reneging and rearm the RTO timer for
srtt/2 into the future. In some SACK reneging scenarios that can
happen repeatedly until the receive window fills up, at which point
the sender can't send any more, the ACKs stop arriving, and the RTO
fires at srtt/2 after the last ACK. But that can take far too long
(O(10 secs)), since the connection is stuck in fast recovery with a
low cwnd that cannot grow beyond ssthresh, even if more bandwidth is
available.

This fix changes the logic in tcp_check_sack_reneging() to only rearm
the RTO timer if data is cumulatively ACKed, indicating forward
progress. This avoids this kind of nearly infinite loop of RTO timer
re-arming. In addition, this meets the goals of
tcp_check_sack_reneging() in handling Windows TCP behavior that looks
temporarily like SACK reneging but is not really.

Many thanks to Jakub Kicinski and Neil Spring, who reported this issue
and provided critical packet traces that enabled root-causing this
issue. Also, many thanks to Jakub Kicinski for testing this fix.

Fixes: 5ae344c949e7 ("tcp: reduce spurious retransmits due to transient SACK reneging")
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reported-by: Neil Spring &lt;ntspring@fb.com&gt;
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Tested-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20221021170821.1093930-1-ncardwell.kernel@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
