<feed xmlns='http://www.w3.org/2005/Atom'>
<title>wireguard-linux/net/xfrm, branch stable</title>
<subtitle>WireGuard for the Linux kernel</subtitle>
<id>https://git.zx2c4.com/wireguard-linux/atom/net/xfrm?h=stable</id>
<link rel='self' href='https://git.zx2c4.com/wireguard-linux/atom/net/xfrm?h=stable'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/'/>
<updated>2026-05-26T08:35:30Z</updated>
<entry>
<title>xfrm: input: hold netns during deferred transport reinjection</title>
<updated>2026-05-26T08:35:30Z</updated>
<author>
<name>Zhengchuan Liang</name>
<email>zcliangcn@gmail.com</email>
</author>
<published>2026-05-22T09:31:55Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=c16f74dc1d75d0e2e7670076d5375deda110ebeb'/>
<id>urn:sha1:c16f74dc1d75d0e2e7670076d5375deda110ebeb</id>
<content type='text'>
Transport-mode reinjection stores a struct net pointer in skb-&gt;cb and
uses it later from xfrm_trans_reinject(). That pointer must stay valid
until the deferred callback runs.

Take a netns reference when queueing deferred reinjection work and drop
it after the callback completes. Use maybe_get_net() so the queueing
path does not revive a namespace that is already being torn down.

This keeps the existing workqueue design and fixes the netns lifetime
handling in one place for all users of xfrm_trans_queue_net().

Fixes: 7b3801927e52 ("xfrm: introduce xfrm_trans_queue_net")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Co-developed-by: Luxing Yin &lt;tr0jan@lzu.edu.cn&gt;
Signed-off-by: Luxing Yin &lt;tr0jan@lzu.edu.cn&gt;
Signed-off-by: Zhengchuan Liang &lt;zcliangcn@gmail.com&gt;
Signed-off-by: Ren Wei &lt;n05ec@lzu.edu.cn&gt;
Assisted-by: Codex:gpt-5.4
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit</title>
<updated>2026-05-26T08:35:29Z</updated>
<author>
<name>Usama Arif</name>
<email>usama.arif@linux.dev</email>
</author>
<published>2026-05-21T10:29:26Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=3e52417318473782012b236d0325bf7d2266a597'/>
<id>urn:sha1:3e52417318473782012b236d0325bf7d2266a597</id>
<content type='text'>
The struct pernet_operations docstring in include/net/net_namespace.h
explicitly warns against blocking RCU primitives in .exit handlers:

    Exit methods using blocking RCU primitives, such as
    synchronize_rcu(), should be implemented via exit_batch.
    [...]
    Please, avoid synchronize_rcu() at all, where it's possible.

    Note that a combination of pre_exit() and exit() can
    be used, since a synchronize_rcu() is guaranteed between
    the calls.

xfrm_policy_fini() violates this: it calls synchronize_rcu() before
freeing the policy_bydst hash tables (so no RCU reader is mid-
traversal at free time), but runs from xfrm_net_ops.exit -- once per
namespace -- so a cleanup_net() of N namespaces pays N full RCU
grace periods serially.

Use the documented pre_exit/exit split. Move the policy flush (and
the workqueue drains it depends on) into a new .pre_exit handler;
xfrm_policy_fini() then runs in .exit and frees the hash tables
after the synchronize_rcu_expedited() that cleanup_net() guarantees
between the two phases. Providing O(1) RCU grace periods per batch
instead of O(N).

Observed on Linux 6.18 with a workload doing unshare(CLONE_NEWNET)
at ~13/sec sustained: cleanup_net() and the netns_wq rescuer kthread
both stuck in xfrm_policy_fini()'s synchronize_rcu(), &gt;300k struct
net accumulated in the cleanup queue, Percpu in /proc/meminfo climbed
to 130+ GB on 256-CPU hosts, and memcg OOMs followed. setup_net and
__put_net counts were balanced, ruling out a refcount leak.

Fixes: 069daad4f2ae ("xfrm: Wait for RCU readers during policy netns exit")
Signed-off-by: Usama Arif &lt;usama.arif@linux.dev&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>xfrm: iptfs: reset runtime state when cloning SAs</title>
<updated>2026-05-26T08:35:28Z</updated>
<author>
<name>Shaomin Chen</name>
<email>eeesssooo020@gmail.com</email>
</author>
<published>2026-05-20T18:07:23Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=7f83d174073234839aea176f265e517e0d50a1d2'/>
<id>urn:sha1:7f83d174073234839aea176f265e517e0d50a1d2</id>
<content type='text'>
iptfs_clone_state() clones the IPTFS mode data with kmemdup(). This
copies runtime objects which must not be shared with the original SA,
including the embedded sk_buff_head, hrtimers, spinlock, and in-flight
reassembly/reorder state.

If xfrm_state_migrate() fails after clone_state() but before the later
init_state() call has reinitialized those fields, the cloned state can be
destroyed by xfrm_state_gc_task() with list and timer state copied from the
original SA. With queued packets this lets the clone splice and free skbs
owned by the original IPTFS queue, leading to use-after-free and
double-free reports in iptfs_destroy_state() and skb release paths.

Reinitialize the clone's runtime state before publishing it through
x-&gt;mode_data. Because clone_state() now publishes a destroyable mode_data
object before init_state(), take the mode callback module reference there.
Avoid taking it again from __iptfs_init_state() for the same object.

Fixes: 0e4fbf013fa5 ("xfrm: iptfs: add user packet (tunnel ingress) handling")
Cc: stable@vger.kernel.org
Signed-off-by: Shaomin Chen &lt;eeesssooo020@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>xfrm: Check for underflow in xfrm_state_mtu</title>
<updated>2026-05-14T08:17:43Z</updated>
<author>
<name>David Ahern</name>
<email>dahern@nvidia.com</email>
</author>
<published>2026-05-13T16:49:14Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=742b04d0550b0ec89dcbc99537ec88653bd1ad90'/>
<id>urn:sha1:742b04d0550b0ec89dcbc99537ec88653bd1ad90</id>
<content type='text'>
Leo Lin reported OOB write issue in esp component:

  xfrm_state_mtu() returns u32 but performs its arithmetic in unsigned
  modulo-2^32 space using an attacker-influenced "header_len + authsize +
  net_adj" subtracted from a small "mtu" argument. A nobody user can
  install an IPv4 ESP tunnel SA with a large authentication key
  (XFRMA_ALG_AUTH_TRUNC, e.g. hmac(sha512), 64-byte key, 64-byte trunc),
  configure a small interface MTU (68 bytes), and set XFRMA_TFCPAD to a
  large value. When a single UDP datagram is then sent through the
  tunnel, xfrm_state_mtu() underflows to a near-2^32 value, and
  esp_output() consumes it as a signed int via:

        padto      = min(x-&gt;tfcpad, xfrm_state_mtu(x, mtu_cached))
        esp.tfclen = padto - skb-&gt;len   (assigned to int)

  esp.tfclen ends up negative (e.g. -207). It is sign-extended to size_t
  when passed to memset() inside esp_output_fill_trailer(), producing a
  ~16 EB write of zeroes at skb_tail_pointer(skb). KASAN logs it as
  "Write of size 18446744073709551537 at addr ffff888...".

Check for underflow and return 1. This causes the sendmsg attempt to
fail with ENETUNREACH.

Fixes: c5c252389374 ("[XFRM]: Optimize MTU calculation")
Reported-by: Leo Lin &lt;leo@depthfirst.com&gt;
Assisted-by: Codex:26.506.31004
Signed-off-by: David Ahern &lt;dahern@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>xfrm: ipcomp: Free destination pages on acomp errors</title>
<updated>2026-05-11T08:34:35Z</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2026-05-06T13:23:28Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=7dbac7680eb629b3b4dc7e98c34f943b8814c0c8'/>
<id>urn:sha1:7dbac7680eb629b3b4dc7e98c34f943b8814c0c8</id>
<content type='text'>
Move the out_free_req label up by a couple of lines so that the
allocated dst SG list gets freed on error as well as success.

Fixes: eb2953d26971 ("xfrm: ipcomp: Use crypto_acomp interface")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Reported-by: Yilin Zhu &lt;zylzyl2333@gmail.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>xfrm: route MIGRATE notifications to caller's netns</title>
<updated>2026-05-07T07:31:10Z</updated>
<author>
<name>Maoyi Xie</name>
<email>maoyixie.tju@gmail.com</email>
</author>
<published>2026-05-04T14:27:36Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=7e2a4f7ca0952820731ef7bdadfc9a9e9d3571b4'/>
<id>urn:sha1:7e2a4f7ca0952820731ef7bdadfc9a9e9d3571b4</id>
<content type='text'>
xfrm_send_migrate() in net/xfrm/xfrm_user.c and pfkey_send_migrate()
in net/key/af_key.c both hardcode &amp;init_net for the multicast that
announces a successful XFRM_MSG_MIGRATE / SADB_X_MIGRATE.

XFRM_MSG_MIGRATE arrives on a per-netns NETLINK_XFRM socket, and the
rest of the xfrm/af_key netlink path was made netns-aware in 2008.
The other 14 multicast paths in xfrm_user.c route their event using
xs_net(x), xp_net(xp) or sock_net(skb-&gt;sk); only the migrate path
was missed.

Two consequences of the init_net hardcoding:

  1. The notification (selector, old/new endpoint addresses, and the
     km_address) is delivered to listeners on init_net's
     XFRMNLGRP_MIGRATE / pfkey BROADCAST_ALL groups rather than on
     the issuing netns. An IKE daemon running in init_net therefore
     receives migration notifications originating from any other
     netns on the host.

  2. An IKE daemon running inside a non-init netns and subscribed
     to its own XFRMNLGRP_MIGRATE / pfkey groups never receives the
     notification of its own migration. IKEv2 MOBIKE / address-update
     handling inside a netns is silently broken.

Thread struct net through km_migrate() and the xfrm_mgr.migrate
function pointer, drop the &amp;init_net override in xfrm_send_migrate()
and pfkey_send_migrate(), and pass the caller's net (already in
scope in xfrm_migrate() via sock_net(skb-&gt;sk)) all the way down.
struct xfrm_mgr is in-tree only and not exported as a stable API,
so the function-pointer signature change is internal.

pfkey_broadcast() is already netns-aware via net_generic(net,
pfkey_net_id) since the pernet conversion. The five other
pfkey_broadcast() callers in af_key.c already pass xs_net(x),
sock_net(sk) or a per-netns net, so this only removes the
&amp;init_net outlier.

Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie &lt;maoyi.xie@ntu.edu.sg&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>xfrm: defensively unhash xfrm_state lists in __xfrm_state_delete</title>
<updated>2026-04-29T09:27:34Z</updated>
<author>
<name>Michal Kosiorek</name>
<email>mkosiorek121@gmail.com</email>
</author>
<published>2026-04-29T08:54:51Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=14acf9652e5690de3c7486c6db5fb8dafd0a32a3'/>
<id>urn:sha1:14acf9652e5690de3c7486c6db5fb8dafd0a32a3</id>
<content type='text'>
KASAN reproduces a slab-use-after-free in __xfrm_state_delete()'s
hlist_del_rcu calls under syzkaller load on linux-6.12.y stable
(reproduced on 6.12.47, also reachable via the same code path on
torvalds/master and on the ipsec tree). Nine unique signatures cluster
in the xfrm_state lifecycle, the load-bearing one being:

  BUG: KASAN: slab-use-after-free in __hlist_del include/linux/list.h:990 [inline]
  BUG: KASAN: slab-use-after-free in hlist_del_rcu include/linux/rculist.h:516 [inline]
  BUG: KASAN: slab-use-after-free in __xfrm_state_delete net/xfrm/xfrm_state.c
  Write of size 8 at addr ffff8881198bcb70 by task kworker/u8:9/435

  Workqueue: netns cleanup_net
  Call Trace:
   __hlist_del / hlist_del_rcu
   __xfrm_state_delete
   xfrm_state_delete
   xfrm_state_flush
   xfrm_state_fini
   ops_exit_list
   cleanup_net

The other observed signatures hit the same slab object from
__xfrm_state_lookup, xfrm_alloc_spi, __xfrm_state_insert and an OOB
write variant of __xfrm_state_delete, all on the byseq/byspi
hash chains.

__xfrm_state_delete() guards its byseq and byspi unhashes with
value-based predicates:

	if (x-&gt;km.seq)
		hlist_del_rcu(&amp;x-&gt;byseq);
	if (x-&gt;id.spi)
		hlist_del_rcu(&amp;x-&gt;byspi);

while everywhere else in the file (e.g. state_cache, state_cache_input)
the safer hlist_unhashed() check is used. xfrm_alloc_spi() sets
x-&gt;id.spi = newspi inside xfrm_state_lock and then immediately inserts
into byspi, but a path that observes x-&gt;id.spi != 0 outside of
xfrm_state_lock can still skip-or-hit the byspi unhash inconsistently
with whether x is actually on the list. The same holds for x-&gt;km.seq
versus byseq, and the bydst/bysrc unhashes have no predicate at all,
so a second __xfrm_state_delete() on the same object writes through
LIST_POISON pprev.

The defensive change here:

  - Use hlist_del_init_rcu() instead of hlist_del_rcu() on bydst,
    bysrc, byseq and byspi so a second deletion is a no-op rather
    than a write through LIST_POISON pprev. The byseq/byspi nodes
    are already initialised in xfrm_state_alloc().
  - Test hlist_unhashed() rather than the value predicate for
    byseq/byspi, so the unhash decision tracks list state rather than
    mutable scalar fields.

Empirical verification: applied this patch on top of v6.12.47, rebuilt,
and re-ran the same syzkaller harness for 1h16m on a previously-crashy
configuration that produced ~100 hits each of slab-use-after-free
Read in xfrm_alloc_spi / Read in __xfrm_state_lookup / Write in
__xfrm_state_delete. After the patch, 7.1M execs across 32 VMs at
~1550 exec/sec produced zero xfrm_state UAF/OOB hits. /proc/slabinfo
confirms the xfrm_state slab is actively allocated and freed during
the run (~143 KiB resident), so the fuzzer is still exercising those
code paths -- they just no longer crash.

Reproduction:

  - Linux 6.12.47 x86_64 + KASAN_GENERIC + KASAN_INLINE + KCOV
  - syzkaller @ 746545b8b1e4c3a128db8652b340d3df90ce61db
  - 32 QEMU/KVM VMs x 2 vCPU on AWS c5.metal bare metal
  - 9 unique signatures collected in ~9h, all within xfrm_state
    lifecycle

Fixes: fe9f1d8779cb ("xfrm: add state hashtable keyed by seq")
Fixes: 7b4dc3600e48 ("[XFRM]: Do not add a state whose SPI is zero to the SPI hash.")
Reported-by: Michal Kosiorek &lt;mkosiorek121@gmail.com&gt;
Tested-by: Michal Kosiorek &lt;mkosiorek121@gmail.com&gt;
Cc: stable@vger.kernel.org
Signed-off-by: Michal Kosiorek &lt;mkosiorek121@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>xfrm: provide message size for XFRM_MSG_MAPPING</title>
<updated>2026-04-29T07:10:25Z</updated>
<author>
<name>Ruijie Li</name>
<email>ruijieli51@gmail.com</email>
</author>
<published>2026-04-28T16:41:43Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=28465227c80fe417b4013c432be1f3737cb9f9a3'/>
<id>urn:sha1:28465227c80fe417b4013c432be1f3737cb9f9a3</id>
<content type='text'>
The compat 64=&gt;32 translation path handles XFRM_MSG_MAPPING, but
xfrm_msg_min[] does not provide the native payload size for this
message type.

Add the missing XFRM_MSG_MAPPING entry so compat translation can size
and translate mapping notifications correctly.

Fixes: 5461fc0c8d9f ("xfrm/compat: Add 64=&gt;32-bit messages translator")
Cc: stable@kernel.org
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Signed-off-by: Ruijie Li &lt;ruijieli51@gmail.com&gt;
Signed-off-by: Ren Wei &lt;n05ec@lzu.edu.cn&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>xfrm: Don't clobber inner headers when already set</title>
<updated>2026-04-28T04:47:20Z</updated>
<author>
<name>Cosmin Ratiu</name>
<email>cratiu@nvidia.com</email>
</author>
<published>2026-04-22T14:06:48Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=fa90a3145c0340c3f624206a81637c542254ea1d'/>
<id>urn:sha1:fa90a3145c0340c3f624206a81637c542254ea1d</id>
<content type='text'>
On VXLAN over IPsec egress, xfrm{4,6}_transport_output() blindly
overwrite inner_transport_header (== the inner TCP header saved in VXLAN
iptunnel_handle_offloads() -&gt; skb_reset_inner_headers()) with the
current transport_header (== the VXLAN outer UDP header set by
udp_tunnel_xmit_skb()).

This was a latent bug, harmless until commit [1] added a doff validation
check in qdisc_pkt_len_segs_init() for encapsulated GSO packets. With
the wrong inner_transport_header set by xfrm, qdisc_pkt_len_segs_init()
interprets inner_transport_header as a TCP header, reads doff=0 from the
upper byte of the VNI and drops the packet with DROP_REASON_SKB_BAD_GSO.

Besides the use in GSO to determine the header size of segmented
packets, inner_transport_header might be used by drivers to set up
inner checksum offloading by pointing the HW to the inner transport
header. A quick browse through available drivers shows that mlx5 uses
skb-&gt;csum_start specifically for this scenario, while others either
don't support VXLAN over IPsec crypto offload (ixgbe) or the HW is
capable of parsing the packets itself (nfp, Chelsio).

But in all cases, it is more correct to let the inner_transport_header
point to the innermost header instead of overwriting it in xfrm.

So fix this by guarding all four inner header save sites in
xfrm_output.c (xfrm{4,6}_transport_output, xfrm{4,6}_tunnel_encap_add)
with a check for skb-&gt;inner_protocol. When inner_protocol is set, a
tunnel layer (VXLAN, Geneve, GRE, etc.) has already saved the correct
inner header offsets and they must not be overwritten. When
inner_protocol is zero, no prior tunnel encapsulation exists and xfrm
must save the inner headers itself. The tunnel mode checks are only
added for completion, since they aren't strictly required, as
xfrm_output() forces software GSO in tunnel mode before encap.

This makes the previously added test pass:
 # ./tools/testing/selftests/drivers/net/hw/ipsec_vxlan.py
 TAP version 13
 1..4
 ok 1 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v4_inner_v4
 ok 2 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v4_inner_v6
 ok 3 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v6_inner_v4
 ok 4 ipsec_vxlan.test_vxlan_ipsec_crypto_offload.outer_v6_inner_v6
 # Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0

[1] commit 7fb4c1967011 ("net: pull headers in qdisc_pkt_len_segs_init()")
Fixes: f1bd7d659ef0 ("xfrm: Add encapsulation header offsets while SKB is not encrypted")
Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2026-04-09T20:20:59Z</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2026-04-02T17:57:09Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/wireguard-linux/commit/?id=b6e39e48469e37057fce27a1b87cf6d3e456aa42'/>
<id>urn:sha1:b6e39e48469e37057fce27a1b87cf6d3e456aa42</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-7.0-rc8).

Conflicts:

net/ipv6/seg6_iptunnel.c
  c3812651b522f ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel")
  78723a62b969a ("seg6: add per-route tunnel source address")
https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk

net/ipv4/icmp.c
  fde29fd934932 ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()")
  d98adfbdd5c01 ("ipv4: drop ipv6_stub usage and use direct function calls")
https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk

Adjacent changes:

drivers/net/ethernet/stmicro/stmmac/chain_mode.c
  51f4e090b9f8 ("net: stmmac: fix integer underflow in chain mode")
  6b4286e05508 ("net: stmmac: rename STMMAC_GET_ENTRY() -&gt; STMMAC_NEXT_ENTRY()")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
</entry>
</feed>
