wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2025-08-01	selftests/bpf: Test for unaligned flow_dissector ctx access	Paul Chaignon	1	-1/+22
	This patch adds tests for two context fields where unaligned accesses were not properly rejected. Note the new macro is similar to the existing narrow_load macro, but we need a different description and access offset. Combining the two macros into one is probably doable but I don't think it would help readability. vmlinux.h is included in place of bpf.h so we have the definition of struct bpf_nf_ctx. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Tested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/bf014046ddcf41677fb8b98d150c14027e9fddba.1754039605.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01	bpf: Improve ctx access verifier error message	Paul Chaignon	1	-1/+1
	We've already had two "error during ctx access conversion" warnings triggered by syzkaller. Let's improve the error message by dumping the cnt variable so that we can more easily differentiate between the different error cases. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/cc94316c30dd76fae4a75a664b61a2dbfe68e205.1754039605.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01	bpf: Check netfilter ctx accesses are aligned	Paul Chaignon	1	-0/+3
	Similarly to the previous patch fixing the flow_dissector ctx accesses, nf_is_valid_access also doesn't check that ctx accesses are aligned. Contrary to flow_dissector programs, netfilter programs don't have context conversion. The unaligned ctx accesses are therefore allowed by the verifier. Fixes: fd9c663b9ad6 ("bpf: minimal support for programs hooked into netfilter framework") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/853ae9ed5edaa5196e8472ff0f1bb1cc24059214.1754039605.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01	bpf: Check flow_dissector ctx accesses are aligned	Paul Chaignon	1	-0/+3
	flow_dissector_is_valid_access doesn't check that the context access is aligned. As a consequence, an unaligned access within one of the exposed field is considered valid and later rejected by flow_dissector_convert_ctx_access when we try to convert it. The later rejection is problematic because it's reported as a verifier bug with a kernel warning and doesn't point to the right instruction in verifier logs. Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") Reported-by: syzbot+ccac90e482b2a81d74aa@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=ccac90e482b2a81d74aa Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/cc1b036be484c99be45eddf48bd78cc6f72839b1.1754039605.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-08-01	vsock/virtio: Allocate nonlinear SKBs for handling large transmit buffers	Will Deacon	1	-2/+3
	When transmitting a vsock packet, virtio_transport_send_pkt_info() calls virtio_transport_alloc_linear_skb() to allocate and fill SKBs with the transmit data. Unfortunately, these are always linear allocations and can therefore result in significant pressure on kmalloc() considering that the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB allocation for each packet. Rework the vsock SKB allocation so that, for sizes with page order greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated instead with the packet header in the SKB and the transmit data in the fragments. Note that this affects both the vhost and virtio transports. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-10-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vsock/virtio: Rename virtio_vsock_skb_rx_put()	Will Deacon	3	-3/+3
	In preparation for using virtio_vsock_skb_rx_put() when populating SKBs on the vsock TX path, rename virtio_vsock_skb_rx_put() to virtio_vsock_skb_put(). No functional change. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-9-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vhost/vsock: Allocate nonlinear SKBs for handling large receive buffers	Will Deacon	2	-8/+32
	When receiving a packet from a guest, vhost_vsock_handle_tx_kick() calls vhost_vsock_alloc_linear_skb() to allocate and fill an SKB with the receive data. Unfortunately, these are always linear allocations and can therefore result in significant pressure on kmalloc() considering that the maximum packet size (VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + VIRTIO_VSOCK_SKB_HEADROOM) is a little over 64KiB, resulting in a 128KiB allocation for each packet. Rework the vsock SKB allocation so that, for sizes with page order greater than PAGE_ALLOC_COSTLY_ORDER, a nonlinear SKB is allocated instead with the packet header in the SKB and the receive data in the fragments. Finally, add a debug warning if virtio_vsock_skb_rx_put() is ever called on an SKB with a non-zero length, as this would be destructive for the nonlinear case. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-8-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vsock/virtio: Move SKB allocation lower-bound check to callers	Will Deacon	2	-4/+2
	virtio_vsock_alloc_linear_skb() checks that the requested size is at least big enough for the packet header (VIRTIO_VSOCK_SKB_HEADROOM). Of the three callers of virtio_vsock_alloc_linear_skb(), only vhost_vsock_alloc_skb() can potentially pass a packet smaller than the header size and, as it already has a check against the maximum packet size, extend its bounds checking to consider the minimum packet size and remove the check from virtio_vsock_alloc_linear_skb(). Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-7-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vsock/virtio: Rename virtio_vsock_alloc_skb()	Will Deacon	4	-4/+5
	In preparation for nonlinear allocations for large SKBs, rename virtio_vsock_alloc_skb() to virtio_vsock_alloc_linear_skb() to indicate that it returns linear SKBs unconditionally and switch all callers over to this new interface for now. No functional change. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-6-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vsock/virtio: Resize receive buffers so that each SKB fits in a 4K page	Will Deacon	2	-2/+7
	When allocating receive buffers for the vsock virtio RX virtqueue, an SKB is allocated with a 4140 data payload (the 44-byte packet header + VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE). Even when factoring in the SKB overhead, the resulting 8KiB allocation thanks to the rounding in kmalloc_reserve() is wasteful (~3700 unusable bytes) and results in a higher-order page allocation on systems with 4KiB pages just for the sake of a few hundred bytes of packet data. Limit the vsock virtio RX buffers to 4KiB per SKB, resulting in much better memory utilisation and removing the need to allocate higher-order pages entirely. Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-5-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vsock/virtio: Move length check to callers of virtio_vsock_skb_rx_put()	Will Deacon	3	-9/+6
	virtio_vsock_skb_rx_put() only calls skb_put() if the length in the packet header is not zero even though skb_put() handles this case gracefully. Remove the functionally redundant check from virtio_vsock_skb_rx_put() and, on the assumption that this is a worthwhile optimisation for handling credit messages, augment the existing length checks in virtio_transport_rx_work() to elide the call for zero-length payloads. Since the callers all have the length, extend virtio_vsock_skb_rx_put() to take it as an additional parameter rather than fish it back out of the packet header. Note that the vhost code already has similar logic in vhost_vsock_alloc_skb(). Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-4-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vsock/virtio: Validate length in packet header before skb_put()	Will Deacon	1	-2/+10
	When receiving a vsock packet in the guest, only the virtqueue buffer size is validated prior to virtio_vsock_skb_rx_put(). Unfortunately, virtio_vsock_skb_rx_put() uses the length from the packet header as the length argument to skb_put(), potentially resulting in SKB overflow if the host has gone wonky. Validate the length as advertised by the packet header before calling virtio_vsock_skb_rx_put(). Cc: <stable@vger.kernel.org> Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-3-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2025-08-01	vhost/vsock: Avoid allocating arbitrarily-sized SKBs	Will Deacon	1	-2/+4
	vhost_vsock_alloc_skb() returns NULL for packets advertising a length larger than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE in the packet header. However, this is only checked once the SKB has been allocated and, if the length in the packet header is zero, the SKB may not be freed immediately. Hoist the size check before the SKB allocation so that an iovec larger than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE + the header size is rejected outright. The subsequent check on the length field in the header can then simply check that the allocated SKB is indeed large enough to hold the packet. Cc: <stable@vger.kernel.org> Fixes: 71dc9ec9ac7d ("virtio/vsock: replace virtio_vsock_pkt with sk_buff") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Message-Id: <20250717090116.11987-2-will@kernel.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vhost_net: basic in_order support	Jason Wang	1	-25/+61
	This patch introduces basic in-order support for vhost-net. By recording the number of batched buffers in an array when calling `vhost_add_used_and_signal_n()`, we can reduce the number of userspace accesses. Note that the vhost-net batching logic is kept as we still count the number of buffers there. Testing Results: With testpmd: - TX: txonly mode + vhost_net with XDP_DROP on TAP shows a 17.5% improvement, from 4.75 Mpps to 5.35 Mpps. - RX: No obvious improvements were observed. With virtio-ring in-order experimental code in the guest: - TX: pktgen in the guest + XDP_DROP on TAP shows a 19% improvement, from 5.2 Mpps to 6.2 Mpps. - RX: pktgen on TAP with vhost_net + XDP_DROP in the guest achieves a 6.1% improvement, from 3.47 Mpps to 3.61 Mpps. Acked-by: Jonah Palmer <jonah.palmer@oracle.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250714084755.11921-4-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2025-08-01	vhost: basic in order support	Jason Wang	3	-25/+109
	This patch adds basic in order support for vhost. Two optimizations are implemented in this patch: 1) Since driver uses descriptor in order, vhost can deduce the next avail ring head by counting the number of descriptors that has been used in next_avail_head. This eliminate the need to access the available ring in vhost. 2) vhost_add_used_and_singal_n() is extended to accept the number of batched buffers per used elem. While this increases the times of userspace memory access but it helps to reduce the chance of used ring access of both the driver and vhost. Vhost-net will be the first user for this. Acked-by: Jonah Palmer <jonah.palmer@oracle.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250714084755.11921-3-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2025-08-01	vhost: fail early when __vhost_add_used() fails	Jason Wang	1	-0/+3
	This patch fails vhost_add_used_n() early when __vhost_add_used() fails to make sure used idx is not updated with stale used ring information. Reported-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250714084755.11921-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2025-08-01	vhost: Reintroduce kthread API and add mode selection	Cindy Lu	4	-18/+295
	Since commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"), the vhost uses vhost_task and operates as a child of the owner thread. This is required for correct CPU usage accounting, especially when using containers. However, this change has caused confusion for some legacy userspace applications, and we didn't notice until it's too late. Unfortunately, it's too late to revert - we now have userspace depending both on old and new behaviour :( To address the issue, reintroduce kthread mode for vhost workers and provide a configuration to select between kthread and task worker. - Add 'fork_owner' parameter to vhost_dev to let users select kthread or task mode. Default mode is task mode(VHOST_FORK_OWNER_TASK). - Reintroduce kthread mode support: * Bring back the original vhost_worker() implementation, and renamed to vhost_run_work_kthread_list(). * Add cgroup support for the kthread * Introduce struct vhost_worker_ops: - Encapsulates create / stop / wake‑up callbacks. - vhost_worker_create() selects the proper ops according to inherit_owner. - Userspace configuration interface: * New IOCTLs: - VHOST_SET_FORK_FROM_OWNER lets userspace select task mode (VHOST_FORK_OWNER_TASK) or kthread mode (VHOST_FORK_OWNER_KTHREAD) - VHOST_GET_FORK_FROM_OWNER reads the current worker mode * Expose module parameter 'fork_from_owner_default' to allow system administrators to configure the default mode for vhost workers * Kconfig option CONFIG_VHOST_ENABLE_FORK_OWNER_CONTROL controls whether these IOCTLs and the parameter are available - The VHOST_NEW_WORKER functionality requires fork_owner to be set to true, with validation added to ensure proper configuration This partially reverts or improves upon: commit 6e890c5d5021 ("vhost: use vhost_tasks for worker threads") commit 1cdaafa1b8b4 ("vhost: replace single worker pointer with xarray") Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads"), Signed-off-by: Cindy Lu <lulu@redhat.com> Message-Id: <20250714071333.59794-2-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2025-08-01	vdpa: Fix IDR memory leak in VDUSE module exit	Anders Roxell	1	-0/+1
	Add missing idr_destroy() call in vduse_exit() to properly free the vduse_idr radix tree nodes. Without this, module load/unload cycles leak 576-byte radix tree node allocations, detectable by kmemleak as: unreferenced object (size 576): backtrace: [<ffffffff81234567>] radix_tree_node_alloc+0xa0/0xf0 [<ffffffff81234568>] idr_get_free+0x128/0x280 The vduse_idr is initialized via DEFINE_IDR() at line 136 and used throughout the VDUSE (vDPA Device in Userspace) driver for device ID management. The fix follows the documented pattern in lib/idr.c and matches the cleanup approach used by other drivers. This leak was discovered through comprehensive module testing with cumulative kmemleak detection across 10 load/unload iterations per module. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Message-Id: <20250704125335.1084649-1-anders.roxell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vdpa/mlx5: Fix release of uninitialized resources on error path	Dragos Tatulea	2	-4/+9
	The commit in the fixes tag made sure that mlx5_vdpa_free() is the single entrypoint for removing the vdpa device resources added in mlx5_vdpa_dev_add(), even in the cleanup path of mlx5_vdpa_dev_add(). This means that all functions from mlx5_vdpa_free() should be able to handle uninitialized resources. This was not the case though: mlx5_vdpa_destroy_mr_resources() and mlx5_cmd_cleanup_async_ctx() were not able to do so. This caused the splat below when adding a vdpa device without a MAC address. This patch fixes these remaining issues: - Makes mlx5_vdpa_destroy_mr_resources() return early if called on uninitialized resources. - Moves mlx5_cmd_init_async_ctx() early on during device addition because it can't fail. This means that mlx5_cmd_cleanup_async_ctx() also can't fail. To mirror this, move the call site of mlx5_cmd_cleanup_async_ctx() in mlx5_vdpa_free(). An additional comment was added in mlx5_vdpa_free() to document the expectations of functions called from this context. Splat: mlx5_core 0000:b5:03.2: mlx5_vdpa_dev_add:3950:(pid 2306) warning: No mac address provisioned? ------------[ cut here ]------------ WARNING: CPU: 13 PID: 2306 at kernel/workqueue.c:4207 __flush_work+0x9a/0xb0 [...] Call Trace: <TASK> ? __try_to_del_timer_sync+0x61/0x90 ? __timer_delete_sync+0x2b/0x40 mlx5_vdpa_destroy_mr_resources+0x1c/0x40 [mlx5_vdpa] mlx5_vdpa_free+0x45/0x160 [mlx5_vdpa] vdpa_release_dev+0x1e/0x50 [vdpa] device_release+0x31/0x90 kobject_cleanup+0x37/0x130 mlx5_vdpa_dev_add+0x327/0x890 [mlx5_vdpa] vdpa_nl_cmd_dev_add_set_doit+0x2c1/0x4d0 [vdpa] genl_family_rcv_msg_doit+0xd8/0x130 genl_family_rcv_msg+0x14b/0x220 ? __pfx_vdpa_nl_cmd_dev_add_set_doit+0x10/0x10 [vdpa] genl_rcv_msg+0x47/0xa0 ? __pfx_genl_rcv_msg+0x10/0x10 netlink_rcv_skb+0x53/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x27b/0x3b0 netlink_sendmsg+0x1f7/0x430 __sys_sendto+0x1fa/0x210 ? ___pte_offset_map+0x17/0x160 ? next_uptodate_folio+0x85/0x2b0 ? percpu_counter_add_batch+0x51/0x90 ? filemap_map_pages+0x515/0x660 __x64_sys_sendto+0x20/0x30 do_syscall_64+0x7b/0x2c0 ? do_read_fault+0x108/0x220 ? do_pte_missing+0x14a/0x3e0 ? __handle_mm_fault+0x321/0x730 ? count_memcg_events+0x13f/0x180 ? handle_mm_fault+0x1fb/0x2d0 ? do_user_addr_fault+0x20c/0x700 ? syscall_exit_work+0x104/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f0c25b0feca [...] ---[ end trace 0000000000000000 ]--- Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Fixes: 83e445e64f48 ("vdpa/mlx5: Fix error path during device add") Reported-by: Wenli Quan <wquan@redhat.com> Closes: https://lore.kernel.org/virtualization/CADZSLS0r78HhZAStBaN1evCSoPqRJU95Lt8AqZNJ6+wwYQ6vPQ@mail.gmail.com/ Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Message-Id: <20250708120424.2363354-2-dtatulea@nvidia.com> Tested-by: Wenli Quan <wquan@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vhost-scsi: Fix check for inline_sg_cnt exceeding preallocated limit	Alok Tiwari	1	-1/+1
	The condition comparing ret to VHOST_SCSI_PREALLOC_SGLS was incorrect, as ret holds the result of kstrtouint() (typically 0 on success), not the parsed value. Update the check to use cnt, which contains the actual user-provided value. prevents silently accepting values exceeding the maximum inline_sg_cnt. Fixes: bca939d5bcd0 ("vhost-scsi: Dynamically allocate scatterlists") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20250628183405.3979538-1-alok.a.tiwari@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2025-08-01	virtio: virtio_dma_buf: fix missing parameter documentation	WangYuli	1	-0/+2
	Add missing parameter documentation for virtio_dma_buf_attach() function to fix kernel-doc warnings: Warning: drivers/virtio/virtio_dma_buf.c:41 function parameter 'dma_buf' not described in 'virtio_dma_buf_attach' Warning: drivers/virtio/virtio_dma_buf.c:41 function parameter 'attach' not described in 'virtio_dma_buf_attach' The function documentation was missing descriptions for both the 'dma_buf' and 'attach' parameters. Add proper parameter documentation following kernel-doc format. Signed-off-by: WangYuli <wangyuli@uniontech.com> Message-Id: <241C7118259DA110+20250623065210.270237-1-wangyuli@uniontech.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
2025-08-01	vhost: Fix typos	Alok Tiwari	1	-5/+5
	Fix multiple typos and improve comment clarity across vhost.c. Spelling errors: "thead" -> "thread", "RUNNUNG" -> "RUNNING" and "available". Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Message-Id: <20250615173933.1610324-1-alok.a.tiwari@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org>
2025-08-01	vhost: vringh: Remove unused functions	Dr. David Alan Gilbert	2	-82/+0
	The functions: vringh_abandon_kern() vringh_abandon_user() vringh_iov_pull_kern() and vringh_iov_push_kern() were all added in 2013 by commit f87d0fbb5798 ("vringh: host-side implementation of virtio rings.") but have remained unused. Remove them and the two helper functions they used. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Message-Id: <20250617001838.114457-3-linux@treblig.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Simon Horman <horms@kernel.org>
2025-08-01	vhost: vringh: Remove unused iotlb functions	Dr. David Alan Gilbert	2	-48/+0
	The functions: vringh_abandon_iotlb() vringh_notify_disable_iotlb() and vringh_notify_enable_iotlb() were added in 2020 by commit 9ad9c49cfe97 ("vringh: IOTLB support") but have remained unused. Remove them. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Simon Horman <horms@kernel.org> Message-Id: <20250617001838.114457-2-linux@treblig.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2025-08-01	vhost-scsi: Fix log flooding with target does not exist errors	Mike Christie	1	-3/+1
	As part of the normal initiator side scanning the guest's scsi layer will loop over all possible targets and send an inquiry. Since the max number of targets for virtio-scsi is 256, this can result in 255 error messages about targets not existing if you only have a single target. When there's more than 1 vhost-scsi device each with a single target, then you get N * 255 log messages. It looks like the log message was added by accident in: commit 3f8ca2e115e5 ("vhost/scsi: Extract common handling code from control queue handler") when we added common helpers. Then in: commit 09d7583294aa ("vhost/scsi: Use common handling code in request queue handler") we converted the scsi command processing path to use the new helpers so we started to see the extra log messages during scanning. The patches were just making some code common but added the vq_err call and I'm guessing the patch author forgot to enable the vq_err call (vq_err is implemented by pr_debug which defaults to off). So this patch removes the call since it's expected to hit this path during device discovery. Fixes: 09d7583294aa ("vhost/scsi: Use common handling code in request queue handler") Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20250611210113.10912-1-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	vhost-scsi: Fix typos and formatting in comments and logs	Alok Tiwari	1	-9/+9
	This patch corrects several minor typos and formatting issues. Changes include: Fixing misspellings like in comments - "explict" -> "explicit" - "infight" -> "inflight", - "with generate" -> "will generate" formatting in logs - Correcting log formatting specifier from "%dd" to "%d" - Adding a missing space in the sysfs emit string to prevent misinterpreted output like "X86_64on ". changing to "X86_64 on " - Cleaning up stray semicolons in struct definition endings These changes improve code readability and consistency. no functionality changes. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Message-Id: <20250611143932.2443796-1-alok.a.tiwari@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Mike Christie <michael.christie@oracle.com>
2025-08-01	vdpa/mlx5: Fix needs_teardown flag calculation	Dragos Tatulea	1	-1/+1
	needs_teardown is a device flag that indicates when virtual queues need to be recreated. This happens for certain configuration changes: queue size and some specific features. Currently, the needs_teardown state can be incorrectly reset by subsequent .set_vq_num() calls. For example, for 1 rx VQ with size 512 and 1 tx VQ with size 256: .set_vq_num(0, 512) -> sets needs_teardown to true (rx queue has a non-default size) .set_vq_num(1, 256) -> sets needs_teardown to false (tx queue has a default size) This change takes into account the previous value of the needs_teardown flag when re-calculating it during VQ size configuration. Fixes: 0fe963d6fc16 ("vdpa/mlx5: Re-create HW VQs under certain conditions") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Tested-by: Si-Wei Liu<si-wei.liu@oracle.com> Message-Id: <20250604184802.2625300-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2025-08-01	vhost: Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...))	Pei Xiao	1	-1/+1
	cocci warning: ./kernel/vhost_task.c:148:9-16: WARNING: ERR_CAST can be used with tsk Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)). Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn> Message-Id: <1a8499a5da53e4f72cf21aca044ae4b26db8b2ad.1749020055.git.xiaopei01@kylinos.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	virtio: Fix typo in register_virtio_device() doc comment	Alok Tiwari	1	-1/+1
	Corrected "suceess" to "success" in the function documentation for clarity. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250529084350.3145699-1-alok.a.tiwari@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
2025-08-01	virtio-vdpa: Remove virtqueue list	Viresh Kumar	1	-41/+3
	The virtio vdpa implementation creates a list of virtqueues, while the same is already available in the struct virtio_device. This list is never traversed though, and only the pointer to the struct virtio_vdpa_vq_info is used in the callback, where the virtqueue pointer could be directly used. Remove the unwanted code to simplify the driver. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Message-Id: <7808f2f7e484987b95f172fffb6c71a5da20ed1e.1748503784.git.viresh.kumar@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2025-08-01	virtio-mmio: Remove virtqueue list from mmio device	Viresh Kumar	2	-50/+4
	The MMIO transport implementation creates a list of virtqueues for a virtio device, while the same is already available in the struct virtio_device. Don't create a duplicate list, and use the other one instead. While at it, fix the virtio_device_for_each_vq() macro to accept an argument like "&vm_dev->vdev" (which currently fails to build). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Message-Id: <3e56c6f74002987e22f364d883cbad177cd9ad9c.1747827066.git.viresh.kumar@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2025-08-01	virtio: document ENOSPC	Michael S. Tsirkin	1	-0/+4
	drivers handle ENOSPC specially since it's an error one can get from a working VQ. Document the semantics. Message-Id: <2e6ec46b8d5e6755be291cec8e2ec57ef286e97b.1748356035.git.mst@redhat.com> Reported-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Parav Pandit <parav@nvidia.com>
2025-08-01	drm/virtio: implement virtio_gpu_shutdown	Gerd Hoffmann	1	-4/+4
	Calling drm_dev_unplug() is the drm way to say the device is gone and can not be accessed any more. Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20250507082821.2710706-1-kraxel@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01	virtio: fix comments, readability	Michael S. Tsirkin	1	-2/+3
	Fix a couple of comments to match reality. Initialize config_driver_disabled to be consistent with other fields (note: the structure is already zero initialized, so this is not a bugfix as such). Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <7b74a55a5f3dc066d954472f5b68c29022f11b43.1752094439.git.mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2025-07-31	arm64/cfi,bpf: Support kCFI + BPF on arm64	Puranjay Mohan	2	-3/+34
	Currently, bpf_dispatcher_*_func() is marked with `__nocfi` therefore calling BPF programs from this interface doesn't cause CFI warnings. When BPF programs are called directly from C: from BPF helpers or struct_ops, CFI warnings are generated. Implement proper CFI prologues for the BPF programs and callbacks and drop __nocfi for arm64. Fix the trampoline generation code to emit kCFI prologue when a struct_ops trampoline is being prepared. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Co-developed-by: Maxwell Bland <mbland@motorola.com> Signed-off-by: Maxwell Bland <mbland@motorola.com> Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Dao Huang <huangdao1@oppo.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250801001004.1859976-8-samitolvanen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31	cfi: Move BPF CFI types and helpers to generic code	Sami Tolvanen	6	-68/+56
	Instead of duplicating the same code for each architecture, move the CFI type hash variables for BPF function types and related helper functions to generic CFI code, and allow architectures to override the function definitions if needed. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20250801001004.1859976-7-samitolvanen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31	cfi: add C CFI type macro	Mark Rutland	3	-60/+29
	Currently x86 and riscv open-code 4 instances of the same logic to define a u32 variable with the KCFI typeid of a given function. Replace the duplicate logic with a common macro. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Co-developed-by: Maxwell Bland <mbland@motorola.com> Signed-off-by: Maxwell Bland <mbland@motorola.com> Co-developed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Dao Huang <huangdao1@oppo.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20250801001004.1859976-6-samitolvanen@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31	dt-bindings: PCI: qcom,pcie-sa8775p: Document 'link_down' reset	Ziyue Zhang	1	-3/+8
	Each PCIe controller on SA8775P includes a 'link_down' reset line in hardware. This patch documents the reset in the device tree binding. The 'link_down' reset is used to forcefully bring down the PCIe link layer, which is useful in scenarios such as link recovery after errors, power management transitions, and hotplug events. Including this reset line improves robustness and provides finer control over PCIe controller behavior. As the 'link_down' reset was omitted in the initial submission, it is now being documented. While this reset is not required for most of the block's basic functionality, and device trees lacking it will continue to function correctly in most cases, it is necessary to ensure maximum robustness when shutting down or recovering the PCIe core. Therefore, its inclusion is justified despite the minor ABI change. Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Link: https://patch.msgid.link/20250718081718.390790-3-ziyue.zhang@oss.qualcomm.com
2025-07-31	dt-bindings: PCI: Remove 83xx-512x-pci.txt	Rob Herring (Arm)	1	-39/+0
	This binding is already covered by fsl,mpc8xxx-pci.yaml schema. While the MPC512x is mentioned here, its compatible strings aren't actually documented and remain that way. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250710180843.2971667-1-robh@kernel.org
2025-07-31	dt-bindings: PCI: Convert amazon,al-alpine-v[23]-pcie to DT schema	Rob Herring (Arm)	4	-48/+73
	Convert the Amazon Alpine PCIe binding to DT schema format. It's a straight forward conversion. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250710180825.2971248-1-robh@kernel.org
2025-07-31	dt-bindings: PCI: Convert marvell,armada-3700-pcie to DT schema	Rob Herring (Arm)	3	-60/+100
	Convert the Marvell Armada 3700 PCIe binding to DT schema format. The 'clocks' property was missing and has been added. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250710180811.2970846-1-robh@kernel.org
2025-07-31	dt-bindings: PCI: Convert apm,xgene-pcie to DT schema	Rob Herring (Arm)	3	-51/+85
	Convert the Applied Micro X-Gene PCIe binding to DT schema format. It's a straight forward conversion. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250710180749.2970379-1-robh@kernel.org
2025-07-31	dt-bindings: PCI: Convert axis,artpec6-pcie to DT schema	Rob Herring (Arm)	2	-50/+118
	Convert the Axis ARTPEC-6/7 PCIe binding to DT schema format. It's a straight forward conversion. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250710180741.2970148-1-robh@kernel.org
2025-07-31	dt-bindings: PCI: Convert st,spear1340-pcie to DT schema	Rob Herring (Arm)	2	-14/+45
	Convert the ST SPEAr1340 PCIe binding to DT schema format. It's a straight forward conversion. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> [mani: added the license] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20250710180731.2969879-1-robh@kernel.org
2025-07-31	gpu: nova-core: fix up formatting after merge	Miguel Ojeda	1	-1/+1
	In the merge 260f6f4fda93 ("Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel"), the formatting in the conflict resolution doesn't match what `make rustfmt` wants to make it. Fix it up appropriately. Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-07-31	libbpf: Avoid possible use of uninitialized mod_len	Achill Gilgenast	1	-1/+1
	Though mod_len is only read when mod_name != NULL and both are initialized together, gcc15 produces a warning with -Werror=maybe-uninitialized: libbpf.c: In function 'find_kernel_btf_id.constprop': libbpf.c:10100:33: error: 'mod_len' may be used uninitialized [-Werror=maybe-uninitialized] 10100 \| if (mod_name && strncmp(mod->name, mod_name, mod_len) != 0) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ libbpf.c:10070:21: note: 'mod_len' was declared here 10070 \| int ret, i, mod_len; \| ^~~~~~~ Silence the false positive. Signed-off-by: Achill Gilgenast <fossdd@pwned.life> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250729094611.2065713-1-fossdd@pwned.life Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31	bpf: Fix oob access in cgroup local storage	Daniel Borkmann	2	-0/+16
	Lonial reported that an out-of-bounds access in cgroup local storage can be crafted via tail calls. Given two programs each utilizing a cgroup local storage with a different value size, and one program doing a tail call into the other. The verifier will validate each of the indivial programs just fine. However, in the runtime context the bpf_cg_run_ctx holds an bpf_prog_array_item which contains the BPF program as well as any cgroup local storage flavor the program uses. Helpers such as bpf_get_local_storage() pick this up from the runtime context: ctx = container_of(current->bpf_ctx, struct bpf_cg_run_ctx, run_ctx); storage = ctx->prog_item->cgroup_storage[stype]; if (stype == BPF_CGROUP_STORAGE_SHARED) ptr = &READ_ONCE(storage->buf)->data[0]; else ptr = this_cpu_ptr(storage->percpu_buf); For the second program which was called from the originally attached one, this means bpf_get_local_storage() will pick up the former program's map, not its own. With mismatching sizes, this can result in an unintended out-of-bounds access. To fix this issue, we need to extend bpf_map_owner with an array of storage_cookie[] to match on i) the exact maps from the original program if the second program was using bpf_get_local_storage(), or ii) allow the tail call combination if the second program was not using any of the cgroup local storage maps. Fixes: 7d9c3427894f ("bpf: Make cgroup storages shared between programs on the same cgroup") Reported-by: Lonial Con <kongln9170@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31	bpf: Move cgroup iterator helpers to bpf.h	Daniel Borkmann	2	-13/+14
	Move them into bpf.h given we also need them in core code. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31	bpf: Move bpf map owner out of common struct	Daniel Borkmann	3	-35/+49
	Given this is only relevant for BPF tail call maps, it is adding up space and penalizing other map types. We also need to extend this with further objects to track / compare to. Therefore, lets move this out into a separate structure and dynamically allocate it only for BPF tail call maps. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-31	bpf: Add cookie object to bpf maps	Daniel Borkmann	2	-0/+7
	Add a cookie to BPF maps to uniquely identify BPF maps for the timespan when the node is up. This is different to comparing a pointer or BPF map id which could get rolled over and reused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20250730234733.530041-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>