aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/net/ipv4 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2024-03-21Merge tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds7-33/+28
Pull networking fixes from Jakub Kicinski: "Including fixes from CAN, netfilter, wireguard and IPsec. I'd like to highlight [ lowlight? - Linus ] Florian W stepping down as a netfilter maintainer due to constant stream of bug reports. Not sure what we can do but IIUC this is not the first such case. Current release - regressions: - rxrpc: fix use of page_frag_alloc_align(), it changed semantics and we added a new caller in a different subtree - xfrm: allow UDP encapsulation only in offload modes Current release - new code bugs: - tcp: fix refcnt handling in __inet_hash_connect() - Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets", conflicted with some expectations in BPF uAPI Previous releases - regressions: - ipv4: raw: fix sending packets from raw sockets via IPsec tunnels - devlink: fix devlink's parallel command processing - veth: do not manipulate GRO when using XDP - esp: fix bad handling of pages from page_pool Previous releases - always broken: - report RCU QS for busy network kthreads (with Paul McK's blessing) - tcp/rds: fix use-after-free on netns with kernel TCP reqsk - virt: vmxnet3: fix missing reserved tailroom with XDP Misc: - couple of build fixes for Documentation" * tag 'net-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits) selftests: forwarding: Fix ping failure due to short timeout MAINTAINERS: step down as netfilter maintainer netfilter: nf_tables: Fix a memory leak in nf_tables_updchain net: dsa: mt7530: fix handling of all link-local frames net: dsa: mt7530: fix link-local frames that ingress vlan filtering ports bpf: report RCU QS in cpumap kthread net: report RCU QS on threaded NAPI repolling rcu: add a helper to report consolidated flavor QS ionic: update documentation for XDP support lib/bitmap: Fix bitmap_scatter() and bitmap_gather() kernel doc netfilter: nf_tables: do not compare internal table flags on updates netfilter: nft_set_pipapo: release elements in clone only from destroy path octeontx2-af: Use separate handlers for interrupts octeontx2-pf: Send UP messages to VF only when VF is up. octeontx2-pf: Use default max_active works instead of one octeontx2-pf: Wait till detach_resources msg is complete octeontx2: Detect the mbox up or down message via register devlink: fix port new reply cmd type tcp: Clear req->syncookie in reqsk_alloc(). net/bnx2x: Prevent access to a freed page in page_pool ...
2024-03-19Merge tag 'ipsec-2024-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecJakub Kicinski1-4/+4
Steffen Klassert says: ==================== pull request (net): ipsec 2024-03-19 1) Fix possible page_pool leak triggered by esp_output. From Dragos Tatulea. 2) Fix UDP encapsulation in software GSO path. From Leon Romanovsky. * tag 'ipsec-2024-03-19' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: Allow UDP encapsulation only in offload modes net: esp: fix bad handling of pages from page_pool ==================== Link: https://lore.kernel.org/r/20240319110151.409825-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-19tcp: Clear req->syncookie in reqsk_alloc().Kuniyuki Iwashima1-0/+3
syzkaller reported a read of uninit req->syncookie. [0] Originally, req->syncookie was used only in tcp_conn_request() to indicate if we need to encode SYN cookie in SYN+ACK, so the field remains uninitialised in other places. The commit 695751e31a63 ("bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().") added another meaning in ACK path; req->syncookie is set true if SYN cookie is validated by BPF kfunc. After the change, cookie_v[46]_check() always read req->syncookie, but it is not initialised in the normal SYN cookie case as reported by KMSAN. Let's make sure we always initialise req->syncookie in reqsk_alloc(). [0]: BUG: KMSAN: uninit-value in cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477 cookie_v4_check+0x22b7/0x29e0 net/ipv4/syncookies.c:477 tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline] tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914 tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322 ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449 NF_HOOK include/linux/netfilter.h:314 [inline] ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5538 [inline] __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652 process_backlog+0x480/0x8b0 net/core/dev.c:5981 __napi_poll+0xe7/0x980 net/core/dev.c:6632 napi_poll net/core/dev.c:6701 [inline] net_rx_action+0x89d/0x1820 net/core/dev.c:6813 __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554 do_softirq+0x9a/0x100 kernel/softirq.c:455 __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:820 [inline] __dev_queue_xmit+0x2776/0x52c0 net/core/dev.c:4362 dev_queue_xmit include/linux/netdevice.h:3091 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0x187a/0x1b70 net/ipv4/ip_output.c:235 __ip_finish_output+0x287/0x810 ip_finish_output+0x4b/0x550 net/ipv4/ip_output.c:323 NF_HOOK_COND include/linux/netfilter.h:303 [inline] ip_output+0x15f/0x3f0 net/ipv4/ip_output.c:433 dst_output include/net/dst.h:450 [inline] ip_local_out net/ipv4/ip_output.c:129 [inline] __ip_queue_xmit+0x1e93/0x2030 net/ipv4/ip_output.c:535 ip_queue_xmit+0x60/0x80 net/ipv4/ip_output.c:549 __tcp_transmit_skb+0x3c70/0x4890 net/ipv4/tcp_output.c:1462 tcp_transmit_skb net/ipv4/tcp_output.c:1480 [inline] tcp_write_xmit+0x3ee1/0x8900 net/ipv4/tcp_output.c:2792 __tcp_push_pending_frames net/ipv4/tcp_output.c:2977 [inline] tcp_send_fin+0xa90/0x12e0 net/ipv4/tcp_output.c:3578 tcp_shutdown+0x198/0x1f0 net/ipv4/tcp.c:2716 inet_shutdown+0x33f/0x5b0 net/ipv4/af_inet.c:923 __sys_shutdown_sock net/socket.c:2425 [inline] __sys_shutdown net/socket.c:2437 [inline] __do_sys_shutdown net/socket.c:2445 [inline] __se_sys_shutdown+0x2a4/0x440 net/socket.c:2443 __x64_sys_shutdown+0x6c/0xa0 net/socket.c:2443 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was stored to memory at: reqsk_alloc include/net/request_sock.h:148 [inline] inet_reqsk_alloc+0x651/0x7a0 net/ipv4/tcp_input.c:6978 cookie_tcp_reqsk_alloc+0xd4/0x900 net/ipv4/syncookies.c:328 cookie_tcp_check net/ipv4/syncookies.c:388 [inline] cookie_v4_check+0x289f/0x29e0 net/ipv4/syncookies.c:420 tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1855 [inline] tcp_v4_do_rcv+0xb17/0x10b0 net/ipv4/tcp_ipv4.c:1914 tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322 ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_rcv_finish+0x4a2/0x520 net/ipv4/ip_input.c:449 NF_HOOK include/linux/netfilter.h:314 [inline] ip_rcv+0xcd/0x380 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5538 [inline] __netif_receive_skb+0x319/0x9e0 net/core/dev.c:5652 process_backlog+0x480/0x8b0 net/core/dev.c:5981 __napi_poll+0xe7/0x980 net/core/dev.c:6632 napi_poll net/core/dev.c:6701 [inline] net_rx_action+0x89d/0x1820 net/core/dev.c:6813 __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554 Uninit was created at: __alloc_pages+0x9a7/0xe00 mm/page_alloc.c:4592 __alloc_pages_node include/linux/gfp.h:238 [inline] alloc_pages_node include/linux/gfp.h:261 [inline] alloc_slab_page mm/slub.c:2175 [inline] allocate_slab mm/slub.c:2338 [inline] new_slab+0x2de/0x1400 mm/slub.c:2391 ___slab_alloc+0x1184/0x33d0 mm/slub.c:3525 __slab_alloc mm/slub.c:3610 [inline] __slab_alloc_node mm/slub.c:3663 [inline] slab_alloc_node mm/slub.c:3835 [inline] kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852 reqsk_alloc include/net/request_sock.h:131 [inline] inet_reqsk_alloc+0x66/0x7a0 net/ipv4/tcp_input.c:6978 tcp_conn_request+0x484/0x44e0 net/ipv4/tcp_input.c:7135 tcp_v4_conn_request+0x16f/0x1d0 net/ipv4/tcp_ipv4.c:1716 tcp_rcv_state_process+0x2e5/0x4bb0 net/ipv4/tcp_input.c:6655 tcp_v4_do_rcv+0xbfd/0x10b0 net/ipv4/tcp_ipv4.c:1929 tcp_v4_rcv+0x4ce4/0x5420 net/ipv4/tcp_ipv4.c:2322 ip_protocol_deliver_rcu+0x2a3/0x13d0 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x332/0x500 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:460 [inline] ip_sublist_rcv_finish net/ipv4/ip_input.c:580 [inline] ip_list_rcv_finish net/ipv4/ip_input.c:631 [inline] ip_sublist_rcv+0x15f3/0x17f0 net/ipv4/ip_input.c:639 ip_list_rcv+0x9ef/0xa40 net/ipv4/ip_input.c:674 __netif_receive_skb_list_ptype net/core/dev.c:5581 [inline] __netif_receive_skb_list_core+0x15c5/0x1670 net/core/dev.c:5629 __netif_receive_skb_list net/core/dev.c:5681 [inline] netif_receive_skb_list_internal+0x106c/0x16f0 net/core/dev.c:5773 gro_normal_list include/net/gro.h:438 [inline] napi_complete_done+0x425/0x880 net/core/dev.c:6113 virtqueue_napi_complete drivers/net/virtio_net.c:465 [inline] virtnet_poll+0x149d/0x2240 drivers/net/virtio_net.c:2211 __napi_poll+0xe7/0x980 net/core/dev.c:6632 napi_poll net/core/dev.c:6701 [inline] net_rx_action+0x89d/0x1820 net/core/dev.c:6813 __do_softirq+0x1c0/0x7d7 kernel/softirq.c:554 CPU: 0 PID: 16792 Comm: syz-executor.2 Not tainted 6.8.0-syzkaller-05562-g61387b8dcf1d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Fixes: 695751e31a63 ("bpf: tcp: Handle BPF SYN Cookie in cookie_v[46]_check().") Reported-by: syzkaller <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Closes: https://lore.kernel.org/bpf/CANn89iKdN9c+C_2JAUbc+VY3DDQjAQukMtiBbormAmAk9CdvQA@mail.gmail.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240315224710.55209-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-19ipv4: raw: Fix sending packets from raw sockets via IPsec tunnelsTobias Brunner1-0/+1
Since the referenced commit, the xfrm_inner_extract_output() function uses the protocol field to determine the address family. So not setting it for IPv4 raw sockets meant that such packets couldn't be tunneled via IPsec anymore. IPv6 raw sockets are not affected as they already set the protocol since 9c9c9ad5fae7 ("ipv6: set skb->protocol on tcp, raw and ip6_append_data genereated skbs"). Fixes: f4796398f21b ("xfrm: Remove inner/outer modes from output path") Signed-off-by: Tobias Brunner <tobias@strongswan.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/c5d9a947-eb19-4164-ac99-468ea814ce20@strongswan.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-18Revert "net: Re-use and set mono_delivery_time bit for userspace tstamp packets"Abhishek Chauhan2-2/+0
This reverts commit 885c36e59f46375c138de18ff1692f18eff67b7f. The patch currently broke the bpf selftest test_tc_dtime because uapi field __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono with the original fix as the bit was re-used for userspace timestamp as well to avoid tstamp reset in the forwarding path. To solve this we need to keep mono_delivery_time as is and introduce another bit called user_delivery_time and fall back to the initial proposal of setting the user_delivery_time bit based on sk_clockid set from userspace. Fixes: 885c36e59f46 ("net: Re-use and set mono_delivery_time bit for userspace tstamp packets") Link: https://lore.kernel.org/netdev/bc037db4-58bb-4861-ac31-a361a93841d3@linux.dev/ Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-18net: esp: fix bad handling of pages from page_poolDragos Tatulea1-4/+4
When the skb is reorganized during esp_output (!esp->inline), the pages coming from the original skb fragments are supposed to be released back to the system through put_page. But if the skb fragment pages are originating from a page_pool, calling put_page on them will trigger a page_pool leak which will eventually result in a crash. This leak can be easily observed when using CONFIG_DEBUG_VM and doing ipsec + gre (non offloaded) forwarding: BUG: Bad page state in process ksoftirqd/16 pfn:1451b6 page:00000000de2b8d32 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1451b6000 pfn:0x1451b6 flags: 0x200000000000000(node=0|zone=2) page_type: 0xffffffff() raw: 0200000000000000 dead000000000040 ffff88810d23c000 0000000000000000 raw: 00000001451b6000 0000000000000001 00000000ffffffff 0000000000000000 page dumped because: page_pool leak Modules linked in: ip_gre gre mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay zram zsmalloc fuse [last unloaded: mlx5_core] CPU: 16 PID: 96 Comm: ksoftirqd/16 Not tainted 6.8.0-rc4+ #22 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x36/0x50 bad_page+0x70/0xf0 free_unref_page_prepare+0x27a/0x460 free_unref_page+0x38/0x120 esp_ssg_unref.isra.0+0x15f/0x200 esp_output_tail+0x66d/0x780 esp_xmit+0x2c5/0x360 validate_xmit_xfrm+0x313/0x370 ? validate_xmit_skb+0x1d/0x330 validate_xmit_skb_list+0x4c/0x70 sch_direct_xmit+0x23e/0x350 __dev_queue_xmit+0x337/0xba0 ? nf_hook_slow+0x3f/0xd0 ip_finish_output2+0x25e/0x580 iptunnel_xmit+0x19b/0x240 ip_tunnel_xmit+0x5fb/0xb60 ipgre_xmit+0x14d/0x280 [ip_gre] dev_hard_start_xmit+0xc3/0x1c0 __dev_queue_xmit+0x208/0xba0 ? nf_hook_slow+0x3f/0xd0 ip_finish_output2+0x1ca/0x580 ip_sublist_rcv_finish+0x32/0x40 ip_sublist_rcv+0x1b2/0x1f0 ? ip_rcv_finish_core.constprop.0+0x460/0x460 ip_list_rcv+0x103/0x130 __netif_receive_skb_list_core+0x181/0x1e0 netif_receive_skb_list_internal+0x1b3/0x2c0 napi_gro_receive+0xc8/0x200 gro_cell_poll+0x52/0x90 __napi_poll+0x25/0x1a0 net_rx_action+0x28e/0x300 __do_softirq+0xc3/0x276 ? sort_range+0x20/0x20 run_ksoftirqd+0x1e/0x30 smpboot_thread_fn+0xa6/0x130 kthread+0xcd/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x31/0x50 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> The suggested fix is to introduce a new wrapper (skb_page_unref) that covers page refcounting for page_pool pages as well. Cc: stable@vger.kernel.org Fixes: 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling") Reported-and-tested-by: Anatoli N.Chechelnickiy <Anatoli.Chechelnickiy@m.interpipe.biz> Reported-by: Ian Kumlien <ian.kumlien@gmail.com> Link: https://lore.kernel.org/netdev/CAA85sZvvHtrpTQRqdaOx6gd55zPAVsqMYk_Lwh4Md5knTq7AyA@mail.gmail.com Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-03-14Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mmLinus Torvalds1-1/+1
Pull non-MM updates from Andrew Morton: - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min heap optimizations". - Kuan-Wei Chiu has also sped up the library sorting code in the series "lib/sort: Optimize the number of swaps and comparisons". - Alexey Gladkov has added the ability for code running within an IPC namespace to alter its IPC and MQ limits. The series is "Allow to change ipc/mq sysctls inside ipc namespace". - Geert Uytterhoeven has contributed some dhrystone maintenance work in the series "lib: dhry: miscellaneous cleanups". - Ryusuke Konishi continues nilfs2 maintenance work in the series "nilfs2: eliminate kmap and kmap_atomic calls" "nilfs2: fix kernel bug at submit_bh_wbc()" - Nathan Chancellor has updated our build tools requirements in the series "Bump the minimum supported version of LLVM to 13.0.1". - Muhammad Usama Anjum continues with the selftests maintenance work in the series "selftests/mm: Improve run_vmtests.sh". - Oleg Nesterov has done some maintenance work against the signal code in the series "get_signal: minor cleanups and fix". Plus the usual shower of singleton patches in various parts of the tree. Please see the individual changelogs for details. * tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits) nilfs2: prevent kernel bug at submit_bh_wbc() nilfs2: fix failure to detect DAT corruption in btree and direct mappings ocfs2: enable ocfs2_listxattr for special files ocfs2: remove SLAB_MEM_SPREAD flag usage assoc_array: fix the return value in assoc_array_insert_mid_shortcut() buildid: use kmap_local_page() watchdog/core: remove sysctl handlers from public header nilfs2: use div64_ul() instead of do_div() mul_u64_u64_div_u64: increase precision by conditionally swapping a and b kexec: copy only happens before uchunk goes to zero get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig get_signal: don't abuse ksig->info.si_signo and ksig->sig const_structs.checkpatch: add device_type Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>" dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() list: leverage list_is_head() for list_entry_is_head() nilfs2: MAINTAINERS: drop unreachable project mirror site smp: make __smp_processor_id() 0-argument macro fat: fix uninitialized field in nostale filehandles ...
2024-03-14tcp: Fix refcnt handling in __inet_hash_connect().Kuniyuki Iwashima1-1/+1
syzbot reported a warning in sk_nulls_del_node_init_rcu(). The commit 66b60b0c8c4a ("dccp/tcp: Unhash sk from ehash for tb2 alloc failure after check_estalblished().") tried to fix an issue that an unconnected socket occupies an ehash entry when bhash2 allocation fails. In such a case, we need to revert changes done by check_established(), which does not hold refcnt when inserting socket into ehash. So, to revert the change, we need to __sk_nulls_add_node_rcu() instead of sk_nulls_add_node_rcu(). Otherwise, sock_put() will cause refcnt underflow and leak the socket. [0]: WARNING: CPU: 0 PID: 23948 at include/net/sock.h:799 sk_nulls_del_node_init_rcu+0x166/0x1a0 include/net/sock.h:799 Modules linked in: CPU: 0 PID: 23948 Comm: syz-executor.2 Not tainted 6.8.0-rc6-syzkaller-00159-gc055fc00c07b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 RIP: 0010:sk_nulls_del_node_init_rcu+0x166/0x1a0 include/net/sock.h:799 Code: e8 7f 71 c6 f7 83 fb 02 7c 25 e8 35 6d c6 f7 4d 85 f6 0f 95 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 1b 6d c6 f7 90 <0f> 0b 90 eb b2 e8 10 6d c6 f7 4c 89 e7 be 04 00 00 00 e8 63 e7 d2 RSP: 0018:ffffc900032d7848 EFLAGS: 00010246 RAX: ffffffff89cd0035 RBX: 0000000000000001 RCX: 0000000000040000 RDX: ffffc90004de1000 RSI: 000000000003ffff RDI: 0000000000040000 RBP: 1ffff1100439ac26 R08: ffffffff89ccffe3 R09: 1ffff1100439ac28 R10: dffffc0000000000 R11: ffffed100439ac29 R12: ffff888021cd6140 R13: dffffc0000000000 R14: ffff88802a9bf5c0 R15: ffff888021cd6130 FS: 00007f3b823f16c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3b823f0ff8 CR3: 000000004674a000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __inet_hash_connect+0x140f/0x20b0 net/ipv4/inet_hashtables.c:1139 dccp_v6_connect+0xcb9/0x1480 net/dccp/ipv6.c:956 __inet_stream_connect+0x262/0xf30 net/ipv4/af_inet.c:678 inet_stream_connect+0x65/0xa0 net/ipv4/af_inet.c:749 __sys_connect_file net/socket.c:2048 [inline] __sys_connect+0x2df/0x310 net/socket.c:2065 __do_sys_connect net/socket.c:2075 [inline] __se_sys_connect net/socket.c:2072 [inline] __x64_sys_connect+0x7a/0x90 net/socket.c:2072 do_syscall_64+0xf9/0x240 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f3b8167dda9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3b823f10c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 00007f3b817abf80 RCX: 00007f3b8167dda9 RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003 RBP: 00007f3b823f1120 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 R13: 000000000000000b R14: 00007f3b817abf80 R15: 00007ffd3beb57b8 </TASK> Reported-by: syzbot+12c506c1aae251e70449@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=12c506c1aae251e70449 Fixes: 66b60b0c8c4a ("dccp/tcp: Unhash sk from ehash for tb2 alloc failure after check_estalblished().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240308201623.65448-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-12rds: tcp: Fix use-after-free of net in reqsk_timer_handler().Kuniyuki Iwashima1-4/+0
syzkaller reported a warning of netns tracker [0] followed by KASAN splat [1] and another ref tracker warning [1]. syzkaller could not find a repro, but in the log, the only suspicious sequence was as follows: 18:26:22 executing program 1: r0 = socket$inet6_mptcp(0xa, 0x1, 0x106) ... connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async) The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT. So, the scenario would be: 1. unshare(CLONE_NEWNET) creates a per netns tcp listener in rds_tcp_listen_init(). 2. syz-executor connect()s to it and creates a reqsk. 3. syz-executor exit()s immediately. 4. netns is dismantled. [0] 5. reqsk timer is fired, and UAF happens while freeing reqsk. [1] 6. listener is freed after RCU grace period. [2] Basically, reqsk assumes that the listener guarantees netns safety until all reqsk timers are expired by holding the listener's refcount. However, this was not the case for kernel sockets. Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()") fixed this issue only for per-netns ehash. Let's apply the same fix for the global ehash. [0]: ref_tracker: net notrefcnt@0000000065449cc3 has 1/1 users at sk_alloc (./include/net/net_namespace.h:337 net/core/sock.c:2146) inet6_create (net/ipv6/af_inet6.c:192 net/ipv6/af_inet6.c:119) __sock_create (net/socket.c:1572) rds_tcp_listen_init (net/rds/tcp_listen.c:279) rds_tcp_init_net (net/rds/tcp.c:577) ops_init (net/core/net_namespace.c:137) setup_net (net/core/net_namespace.c:340) copy_net_ns (net/core/net_namespace.c:497) create_new_namespaces (kernel/nsproxy.c:110) unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4)) ksys_unshare (kernel/fork.c:3429) __x64_sys_unshare (kernel/fork.c:3496) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) ... WARNING: CPU: 0 PID: 27 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179) [1]: BUG: KASAN: slab-use-after-free in inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966) Read of size 8 at addr ffff88801b370400 by task swapper/0/0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) print_report (mm/kasan/report.c:378 mm/kasan/report.c:488) kasan_report (mm/kasan/report.c:603) inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966) reqsk_timer_handler (net/ipv4/inet_connection_sock.c:979 net/ipv4/inet_connection_sock.c:1092) call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701) __run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2038) run_timer_softirq (kernel/time/timer.c:2053) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14)) </IRQ> Allocated by task 258 on cpu 0 at 83.612050s: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:68) __kasan_slab_alloc (mm/kasan/common.c:343) kmem_cache_alloc (mm/slub.c:3813 mm/slub.c:3860 mm/slub.c:3867) copy_net_ns (./include/linux/slab.h:701 net/core/net_namespace.c:421 net/core/net_namespace.c:480) create_new_namespaces (kernel/nsproxy.c:110) unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4)) ksys_unshare (kernel/fork.c:3429) __x64_sys_unshare (kernel/fork.c:3496) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) Freed by task 27 on cpu 0 at 329.158864s: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:68) kasan_save_free_info (mm/kasan/generic.c:643) __kasan_slab_free (mm/kasan/common.c:265) kmem_cache_free (mm/slub.c:4299 mm/slub.c:4363) cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:639) process_one_work (kernel/workqueue.c:2638) worker_thread (kernel/workqueue.c:2700 kernel/workqueue.c:2787) kthread (kernel/kthread.c:388) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_64.S:250) The buggy address belongs to the object at ffff88801b370000 which belongs to the cache net_namespace of size 4352 The buggy address is located 1024 bytes inside of freed 4352-byte region [ffff88801b370000, ffff88801b371100) [2]: WARNING: CPU: 0 PID: 95 at lib/ref_tracker.c:228 ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1)) Modules linked in: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1)) ... Call Trace: <IRQ> __sk_destruct (./include/net/net_namespace.h:353 net/core/sock.c:2204) rcu_core (./arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2165 kernel/rcu/tree.c:2433) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14)) </IRQ> Reported-by: syzkaller <syzkaller@googlegroups.com> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240308200122.64357-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-12tcp: Fix NEW_SYN_RECV handling in inet_twsk_purge()Eric Dumazet1-22/+19
inet_twsk_purge() uses rcu to find TIME_WAIT and NEW_SYN_RECV objects to purge. These objects use SLAB_TYPESAFE_BY_RCU semantic and need special care. We need to use refcount_inc_not_zero(&sk->sk_refcnt). Reuse the existing correct logic I wrote for TIME_WAIT, because both structures have common locations for sk_state, sk_family, and netns pointer. If after the refcount_inc_not_zero() the object fields longer match the keys, use sock_gen_put(sk) to release the refcount. Then we can call inet_twsk_deschedule_put() for TIME_WAIT, inet_csk_reqsk_queue_drop_and_put() for NEW_SYN_RECV sockets, with BH disabled. Then we need to restart the loop because we had drop rcu_read_lock(). Fixes: 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()") Link: https://lore.kernel.org/netdev/CANn89iLvFuuihCtt9PME2uS1WJATnf5fKjDToa1WzVnRzHnPfg@mail.gmail.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240308200122.64357-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+14
Merge in late fixes to prepare for the 6.9 net-next PR. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=yIdo Schimmel1-1/+2
Locally generated packets can increment the new nexthop statistics from process context, resulting in the following splat [1] due to preemption being enabled. Fix by using get_cpu_ptr() / put_cpu_ptr() which will which take care of disabling / enabling preemption. BUG: using smp_processor_id() in preemptible [00000000] code: ping/949 caller is nexthop_select_path+0xcf8/0x1e30 CPU: 12 PID: 949 Comm: ping Not tainted 6.8.0-rc7-custom-gcb450f605fae #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xbd/0xe0 check_preemption_disabled+0xce/0xe0 nexthop_select_path+0xcf8/0x1e30 fib_select_multipath+0x865/0x18b0 fib_select_path+0x311/0x1160 ip_route_output_key_hash_rcu+0xe54/0x2720 ip_route_output_key_hash+0x193/0x380 ip_route_output_flow+0x25/0x130 raw_sendmsg+0xbab/0x34a0 inet_sendmsg+0xa2/0xe0 __sys_sendto+0x2ad/0x430 __x64_sys_sendto+0xe5/0x1c0 do_syscall_64+0xc5/0x1d0 entry_SYSCALL_64_after_hwframe+0x63/0x6b [...] Fixes: f4676ea74b85 ("net: nexthop: Add nexthop group entry stats") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240311162307.545385-5-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11nexthop: Fix out-of-bounds access during attribute validationIdo Schimmel1-12/+17
Passing a maximum attribute type to nlmsg_parse() that is larger than the size of the passed policy will result in an out-of-bounds access [1] when the attribute type is used as an index into the policy array. Fix by setting the maximum attribute type according to the policy size, as is already done for RTM_NEWNEXTHOP messages. Add a test case that triggers the bug. No regressions in fib nexthops tests: # ./fib_nexthops.sh [...] Tests passed: 236 Tests failed: 0 [1] BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x1e53/0x2940 Read of size 1 at addr ffffffff99ab4d20 by task ip/610 CPU: 3 PID: 610 Comm: ip Not tainted 6.8.0-rc7-custom-gd435d6e3e161 #9 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x8f/0xe0 print_report+0xcf/0x670 kasan_report+0xd8/0x110 __nla_validate_parse+0x1e53/0x2940 __nla_parse+0x40/0x50 rtm_del_nexthop+0x1bd/0x400 rtnetlink_rcv_msg+0x3cc/0xf20 netlink_rcv_skb+0x170/0x440 netlink_unicast+0x540/0x820 netlink_sendmsg+0x8d3/0xdb0 ____sys_sendmsg+0x31f/0xa60 ___sys_sendmsg+0x13a/0x1e0 __sys_sendmsg+0x11c/0x1f0 do_syscall_64+0xc5/0x1d0 entry_SYSCALL_64_after_hwframe+0x63/0x6b [...] The buggy address belongs to the variable: rtm_nh_policy_del+0x20/0x40 Fixes: 2118f9390d83 ("net: nexthop: Adjust netlink policy parsing for a new attribute") Reported-by: Eric Dumazet <edumazet@google.com> Closes: https://lore.kernel.org/netdev/CANn89i+UNcG0PJMW5X7gOMunF38ryMh=L1aeZUKH3kL4UdUqag@mail.gmail.com/ Reported-by: syzbot+65bb09a7208ce3d4a633@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/00000000000088981b06133bc07b@google.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240311162307.545385-4-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11nexthop: Only parse NHA_OP_FLAGS for dump messages that require itIdo Schimmel1-5/+5
The attribute is parsed in __nh_valid_dump_req() which is called by the dump handlers of RTM_GETNEXTHOP and RTM_GETNEXTHOPBUCKET although it is only used by the former and rejected by the policy of the latter. Move the parsing to nh_valid_dump_req() which is only called by the dump handler of RTM_GETNEXTHOP. This is a preparation for a subsequent patch. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240311162307.545385-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11nexthop: Only parse NHA_OP_FLAGS for get messages that require itIdo Schimmel1-8/+8
The attribute is parsed into 'op_flags' in nh_valid_get_del_req() which is called from the handlers of three message types: RTM_DELNEXTHOP, RTM_GETNEXTHOPBUCKET and RTM_GETNEXTHOP. The attribute is only used by the latter and rejected by the policies of the other two. Pass 'op_flags' as NULL from the handlers of the other two and only parse the attribute when the argument is not NULL. This is a preparation for a subsequent patch. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240311162307.545385-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski1-5/+1
Alexei Starovoitov says: ==================== pull-request: bpf-next 2024-03-11 We've added 59 non-merge commits during the last 9 day(s) which contain a total of 88 files changed, 4181 insertions(+), 590 deletions(-). The main changes are: 1) Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages to be used in bpf_arena, from Alexei. 2) Introduce bpf_arena which is sparse shared memory region between bpf program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and bpf programs, from Alexei and Andrii. 3) Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it, from Alexei. 4) Use IETF format for field definitions in the BPF standard document, from Dave. 5) Extend struct_ops libbpf APIs to allow specify version suffixes for stuct_ops map types, share the same BPF program between several map definitions, and other improvements, from Eduard. 6) Enable struct_ops support for more than one page in trampolines, from Kui-Feng. 7) Support kCFI + BPF on riscv64, from Puranjay. 8) Use bpf_prog_pack for arm64 bpf trampoline, from Puranjay. 9) Fix roundup_pow_of_two undefined behavior on 32-bit archs, from Toke. ==================== Link: https://lore.kernel.org/r/20240312003646.8692-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11net: nexthop: Have all NH notifiers carry NH IDPetr Machata1-1/+1
When sending the notifications to collect NH statistics for resilient groups, the driver will need to know the nexthop IDs in individual buckets to look up the right counter. To that end, move the nexthop ID from struct nh_notifier_grp_entry_info to nh_notifier_single_info. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/8f964cd50b1a56d3606ce7ab4c50354ae019c43b.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11net: nexthop: Initialize NH group ID in resilient NH group notifiersPetr Machata1-0/+1
The NEXTHOP_EVENT_RES_TABLE_PRE_REPLACE notifier currently keeps the group ID unset. That makes it impossible to look up the group for which the notifier is intended. This is not an issue at the moment, because the only client is netdevsim, and that just so that it veto replacements, which is a static property not tied to a particular group. But for any practical use, the ID is necessary. Set it. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/025fef095dcfb408042568bb5439da014d47239e.1709901020.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-11tcp: annotate a data-race around sysctl_tcp_wmem[0]Jason Xing1-1/+1
When reading wmem[0], it could be changed concurrently without READ_ONCE() protection. So add one annotation here. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-11udp: no longer touch sk->sk_refcnt in early demuxEric Dumazet1-2/+3
After commits ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU") and 7ae215d23c12 ("bpf: Don't refcount LISTEN sockets in sk_assign()") UDP early demux no longer need to grab a refcount on the UDP socket. This save two atomic operations per incoming packet for connected sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Joe Stringer <joe@wand.net.nz> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-11udp: fix incorrect parameter validation in the udp_lib_getsockopt() functionGavrilov Ilia1-2/+2
The 'len' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'len' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-11ipmr: fix incorrect parameter validation in the ip_mroute_getsockopt() functionGavrilov Ilia1-1/+3
The 'olr' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'olr' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-11tcp: fix incorrect parameter validation in the do_tcp_getsockopt() functionGavrilov Ilia1-2/+2
The 'len' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'len' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08ipv4: raw: check sk->sk_rcvbuf earlierEric Dumazet1-0/+7
There is no point cloning an skb and having to free the clone if the receive queue of the raw socket is full. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240307163020.2524409-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08nexthop: Simplify dump error handlingIdo Schimmel1-9/+0
The only error that can happen during a nexthop dump is insufficient space in the skb caring the netlink messages (EMSGSIZE). If this happens and some messages were already filled in, the nexthop code returns the skb length to signal the netlink core that more objects need to be dumped. After commit b5a899154aa9 ("netlink: handle EMSGSIZE errors in the core") there is no need to handle this error in the nexthop code as it is now handled in the core. Simplify the code and simply return the error to the core. No regressions in nexthop tests: # ./fib_nexthops.sh Tests passed: 234 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240307154727.3555462-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08net: nexthop: Expose nexthop group HW stats to user spaceIdo Schimmel1-8/+122
Add netlink support for reading NH group hardware stats. Stats collection is done through a new notifier, NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for a given NH group are thereby asked to collect the stats and report back to core by calling nh_grp_hw_stats_report_delta(). This is similar to what netdevice L3 stats do. Besides exposing number of packets that passed in the HW datapath, also include information on whether any driver actually realizes the counters. The core can tell based on whether it got any _report_delta() reports from the drivers. This allows enabling the statistics at the group at any time, with drivers opting into supporting them. This is also in line with what netdevice L3 stats are doing. So as not to waste time and space, tie the collection and reporting of HW stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add ability to enable / disable hardware statisticsIdo Schimmel1-1/+14
Add netlink support for enabling collection of HW statistics on nexthop groups. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add hardware statistics notificationsIdo Schimmel1-0/+2
Add hw_stats field to several notifier structures to communicate to the drivers that HW statistics should be configured for nexthops within a given group. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Expose nexthop group stats to user spaceIdo Schimmel1-8/+87
Add netlink support for reading NH group stats. This data is only for statistics of the traffic in the SW datapath. HW nexthop group statistics will be added in the following patches. Emission of the stats is keyed to a new op_stats flag to avoid cluttering the netlink message with stats if the user doesn't need them: NHA_OP_FLAG_DUMP_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add nexthop group entry statsIdo Schimmel1-4/+31
Add nexthop group entry stats to count the number of packets forwarded via each nexthop in the group. The stats will be exposed to user space for better data path observability in the next patch. The per-CPU stats pointer is placed at the beginning of 'struct nh_grp_entry', so that all the fields accessed for the data path reside on the same cache line: struct nh_grp_entry { struct nexthop * nh; /* 0 8 */ struct nh_grp_entry_stats * stats; /* 8 8 */ u8 weight; /* 16 1 */ /* XXX 7 bytes hole, try to pack */ union { struct { atomic_t upper_bound; /* 24 4 */ } hthr; /* 24 4 */ struct { struct list_head uw_nh_entry; /* 24 16 */ u16 count_buckets; /* 40 2 */ u16 wants_buckets; /* 42 2 */ } res; /* 24 24 */ }; /* 24 24 */ struct list_head nh_list; /* 48 16 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct nexthop * nh_parent; /* 64 8 */ /* size: 72, cachelines: 2, members: 6 */ /* sum members: 65, holes: 1, sum holes: 7 */ /* last cacheline: 8 bytes */ }; Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add NHA_OP_FLAGSPetr Machata1-4/+20
In order to add per-nexthop statistics, but still not increase netlink message size for consumers that do not care about them, there needs to be a toggle through which the user indicates their desire to get the statistics. To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to use the attribute for carrying of arbitrary operation-specific flags, i.e. not make it specific for get / dump. Add the new attribute to get and dump policies, but do not actually allow any flags yet -- those will come later as the flags themselves are defined. Add the necessary parsing code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Adjust netlink policy parsing for a new attributePetr Machata1-30/+28
A following patch will introduce a new attribute, op-specific flags to adjust the behavior of an operation. Different operations will recognize different flags. - To make the differentiation possible, stop sharing the policies for get and del operations. - To allow querying for presence of the attribute, have all the attribute arrays sized to NHA_MAX, regardless of what is permitted by policy, and pass the corresponding value to nlmsg_parse() as well. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: ip_tunnel: make sure to pull inner header in ip_tunnel_rcv()Eric Dumazet1-1/+14
Apply the same fix than ones found in : 8d975c15c0cd ("ip6_tunnel: make sure to pull inner header in __ip6_tnl_rcv()") 1ca1ba465e55 ("geneve: make sure to pull inner header in geneve_rx()") We have to save skb->network_header in a temporary variable in order to be able to recompute the network_header pointer after a pskb_inet_may_pull() call. pskb_inet_may_pull() makes sure the needed headers are in skb->head. syzbot reported: BUG: KMSAN: uninit-value in __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline] BUG: KMSAN: uninit-value in INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline] BUG: KMSAN: uninit-value in IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline] BUG: KMSAN: uninit-value in ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409 __INET_ECN_decapsulate include/net/inet_ecn.h:253 [inline] INET_ECN_decapsulate include/net/inet_ecn.h:275 [inline] IP_ECN_decapsulate include/net/inet_ecn.h:302 [inline] ip_tunnel_rcv+0xed9/0x2ed0 net/ipv4/ip_tunnel.c:409 __ipgre_rcv+0x9bc/0xbc0 net/ipv4/ip_gre.c:389 ipgre_rcv net/ipv4/ip_gre.c:411 [inline] gre_rcv+0x423/0x19f0 net/ipv4/ip_gre.c:447 gre_rcv+0x2a4/0x390 net/ipv4/gre_demux.c:163 ip_protocol_deliver_rcu+0x264/0x1300 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x2b8/0x440 net/ipv4/ip_input.c:233 NF_HOOK include/linux/netfilter.h:314 [inline] ip_local_deliver+0x21f/0x490 net/ipv4/ip_input.c:254 dst_input include/net/dst.h:461 [inline] ip_rcv_finish net/ipv4/ip_input.c:449 [inline] NF_HOOK include/linux/netfilter.h:314 [inline] ip_rcv+0x46f/0x760 net/ipv4/ip_input.c:569 __netif_receive_skb_one_core net/core/dev.c:5534 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5648 netif_receive_skb_internal net/core/dev.c:5734 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5793 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1556 tun_get_user+0x53b9/0x66e0 drivers/net/tun.c:2009 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055 call_write_iter include/linux/fs.h:2087 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xb6b/0x1520 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Uninit was created at: __alloc_pages+0x9a6/0xe00 mm/page_alloc.c:4590 alloc_pages_mpol+0x62b/0x9d0 mm/mempolicy.c:2133 alloc_pages+0x1be/0x1e0 mm/mempolicy.c:2204 skb_page_frag_refill+0x2bf/0x7c0 net/core/sock.c:2909 tun_build_skb drivers/net/tun.c:1686 [inline] tun_get_user+0xe0a/0x66e0 drivers/net/tun.c:1826 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2055 call_write_iter include/linux/fs.h:2087 [inline] new_sync_write fs/read_write.c:497 [inline] vfs_write+0xb6b/0x1520 fs/read_write.c:590 ksys_write+0x20f/0x4c0 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:652 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x63/0x6b Fixes: c54419321455 ("GRE: Refactor GRE tunneling code.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-07net: introduce include/net/rps.hEric Dumazet2-0/+2
Move RPS related structures and helpers from include/linux/netdevice.h and include/net/sock.h to a new include file. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-18-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07inet: move inet_ehash_secret and udp_ehash_secret into net_hotdataEric Dumazet2-4/+1
"struct net_protocol" has a 32bit hole in 32bit arches. Use it to store the 32bit secret used by UDP and TCP, to increase cache locality in rx path. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-15-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07inet: move tcp_protocol and udp_protocol to net_hotdataEric Dumazet1-15/+15
These structures are read in rx path, move them to net_hotdata for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-14-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07udp: move udpv4_offload and udpv6_offload to net_hotdataEric Dumazet1-9/+8
These structures are used in GRO and GSO paths. Move them to net_hodata for better cache locality. v2: udpv6_offload definition depends on CONFIG_INET=y Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-12-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07net: move tcpv4_offload and tcpv6_offload to net_hotdataEric Dumazet1-9/+8
These are used in TCP fast paths. Move them into net_hotdata for better cache locality. v2: tcpv6_offload definition depends on CONFIG_INET Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07net: move ip_packet_offload and ipv6_packet_offload to net_hotdataEric Dumazet1-9/+9
These structures are used in GRO and GSO paths. v2: ipv6_packet_offload definition depends on CONFIG_INET Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306160031.874438-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07netlink: let core handle error cases in dump operationsEric Dumazet2-10/+1
After commit b5a899154aa9 ("netlink: handle EMSGSIZE errors in the core"), we can remove some code that was not 100 % correct anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240306102426.245689-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-06Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"Ahelenia Ziemiańska1-1/+1
Found with git grep 'MODULE_AUTHOR(".*([^)]*@' Fixed with sed -i '/MODULE_AUTHOR(".*([^)]*@/{s/ (/ </g;s/)"/>"/;s/)and/> and/}' \ $(git grep -l 'MODULE_AUTHOR(".*([^)]*@') Also: in drivers/media/usb/siano/smsusb.c normalise ", INC" to ", Inc"; this is what every other MODULE_AUTHOR for this company says, and it's what the header says in drivers/sbus/char/openprom.c normalise a double-spaced separator; this is clearly copied from the copyright header, where the names are aligned on consecutive lines thusly: * Linux/SPARC PROM Configuration Driver * Copyright (C) 1996 Thomas K. Dyas (tdyas@noc.rutgers.edu) * Copyright (C) 1996 Eddie C. Dost (ecd@skynet.be) but the authorship branding is single-line Link: https://lkml.kernel.org/r/mk3geln4azm5binjjlfsgjepow4o73domjv6ajybws3tz22vb3@tarta.nabijaczleweli.xyz Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-06inet: Add getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERTJuntong Deng1-3/+10
Currently getsockopt does not support IP_ROUTER_ALERT and IPV6_ROUTER_ALERT, and we are unable to get the values of these two socket options through getsockopt. This patch adds getsockopt support for IP_ROUTER_ALERT and IPV6_ROUTER_ALERT. Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-05sock: Use unsafe_memcpy() for sock_copy()Kees Cook1-2/+3
While testing for places where zero-sized destinations were still showing up in the kernel, sock_copy() and inet_reqsk_clone() were found, which are using very specific memcpy() offsets for both avoiding a portion of struct sock, and copying beyond the end of it (since struct sock is really just a common header before the protocol-specific allocation). Instead of trying to unravel this historical lack of container_of(), just switch to unsafe_memcpy(), since that's effectively what was happening already (memcpy() wasn't checking 0-sized destinations while the code base was being converted away from fake flexible arrays). Avoid the following false positive warning with future changes to CONFIG_FORTIFY_SOURCE: memcpy: detected field-spanning write (size 3068) of destination "&nsk->__sk_common.skc_dontcopy_end" at net/core/sock.c:2057 (size 0) Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240304212928.make.772-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-05net: Re-use and set mono_delivery_time bit for userspace tstamp packetsAbhishek Chauhan2-0/+2
Bridge driver today has no support to forward the userspace timestamp packets and ends up resetting the timestamp. ETF qdisc checks the packet coming from userspace and encounters to be 0 thereby dropping time sensitive packets. These changes will allow userspace timestamps packets to be forwarded from the bridge to NIC drivers. Setting the same bit (mono_delivery_time) to avoid dropping of userspace tstamp packets in the forwarding path. Existing functionality of mono_delivery_time remains unaltered here, instead just extended with userspace tstamp support for bridge forwarding path. Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240301201348.2815102-1-quic_abchauha@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05tcp: gro: micro optimizations in tcp[4]_gro_complete()Eric Dumazet1-8/+9
In tcp_gro_complete() : Moving the skb->inner_transport_header setting allows the compiler to reuse the previously loaded value of skb->transport_header. Caching skb_shinfo() avoids duplications as well. In tcp4_gro_complete(), doing a single change on skb_shinfo(skb)->gso_type also generates better code. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05net: gro: rename skb_gro_header_hard()Eric Dumazet3-3/+3
skb_gro_header_hard() is renamed to skb_gro_may_pull() to match the convention used by common helpers like pskb_may_pull(). This means the condition is inverted: if (skb_gro_header_hard(skb, hlen)) slow_path(); becomes: if (!skb_gro_may_pull(skb, hlen)) slow_path(); Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-04tcp: align tcp_sock_write_rx groupEric Dumazet1-1/+1
Stephen Rothwell and kernel test robot reported that some arches (parisc, hexagon) and/or compilers would not like blamed commit. Lets make sure tcp_sock_write_rx group does not start with a hole. While we are at it, correct tcp_sock_write_tx CACHELINE_ASSERT_GROUP_SIZE() since after the blamed commit, we went to 105 bytes. Fixes: 99123622050f ("tcp: remove some holes in struct tcp_sock") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/netdev/20240301121108.5d39e4f9@canb.auug.org.au/ Closes: https://lore.kernel.org/oe-kbuild-all/202403011451.csPYOS3C-lkp@intel.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Link: https://lore.kernel.org/r/20240301171945.2958176-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-04bpf, net: validate struct_ops when updating value.Kui-Feng Lee1-5/+1
Perform all validations when updating values of struct_ops maps. Doing validation in st_ops->reg() and st_ops->update() is not necessary anymore. However, tcp_register_congestion_control() has been called in various places. It still needs to do validations. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240224223418.526631-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-03-04net: adopt skb_network_offset() and similar helpersEric Dumazet2-2/+2
This is a cleanup patch, making code a bit more concise. 1) Use skb_network_offset(skb) in place of (skb_network_header(skb) - skb->data) 2) Use -skb_network_offset(skb) in place of (skb->data - skb_network_header(skb)) 3) Use skb_transport_offset(skb) in place of (skb_transport_header(skb) - skb->data) 4) Use skb_inner_transport_offset(skb) in place of (skb_inner_transport_header(skb) - skb->data) Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-02Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski5-10/+10
Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commits during the last 32 day(s) which contain a total of 150 files changed, 3589 insertions(+), 995 deletions(-). The main changes are: 1) Extend the BPF verifier to enable static subprog calls in spin lock critical sections, from Kumar Kartikeya Dwivedi. 2) Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs, from Andrii Nakryiko. 3) Larger batch of riscv BPF JIT improvements and enabling inlining of the bpf_kptr_xchg() for RV64, from Pu Lehui. 4) Allow skeleton users to change the values of the fields in struct_ops maps at runtime, from Kui-Feng Lee. 5) Extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing, from Maxim Mikityanskiy & Eduard Zingerman. 6) Various BPF selftest improvements to fix errors under gcc BPF backend, from Jose E. Marchesi. 7) Avoid module loading failure when the module trying to register a struct_ops has its BTF section stripped, from Geliang Tang. 8) Annotate all kfuncs in .BTF_ids section which eventually allows for automatic kfunc prototype generation from bpftool, from Daniel Xu. 9) Several updates to the instruction-set.rst IETF standardization document, from Dave Thaler. 10) Shrink the size of struct bpf_map resp. bpf_array, from Alexei Starovoitov. 11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer, from Benjamin Tissoires. 12) Fix bpftool to be more portable to musl libc by using POSIX's basename(), from Arnaldo Carvalho de Melo. 13) Add libbpf support to gcc in CORE macro definitions, from Cupertino Miranda. 14) Remove a duplicate type check in perf_event_bpf_event, from Florian Lehner. 15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them with notrace correctly, from Yonghong Song. 16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible array to fix build warnings, from Kees Cook. 17) Fix resolve_btfids cross-compilation to non host-native endianness, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) selftests/bpf: Test if shadow types work correctly. bpftool: Add an example for struct_ops map and shadow type. bpftool: Generated shadow variables for struct_ops maps. libbpf: Convert st_ops->data to shadow type. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. bpf: Replace bpf_lpm_trie_key 0-length array with flexible array bpf, arm64: use bpf_prog_pack for memory management arm64: patching: implement text_poke API bpf, arm64: support exceptions arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT bpf: add is_async_callback_calling_insn() helper bpf: introduce in_sleepable() helper bpf: allow more maps in sleepable bpf programs selftests/bpf: Test case for lacking CFI stub functions. bpf: Check cfi_stubs before registering a struct_ops type. bpf: Clarify batch lookup/lookup_and_delete semantics bpf, docs: specify which BPF_ABS and BPF_IND fields were zero bpf, docs: Fix typos in instruction-set.rst selftests/bpf: update tcp_custom_syncookie to use scalar packet offset bpf: Shrink size of struct bpf_map/bpf_array. ... ==================== Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>