aboutsummaryrefslogtreecommitdiffstatshomepage
AgeCommit message (Collapse)AuthorFilesLines
6 daysMerge branch 'wireguard-fixes-for-7-1-rc6'HEADstabledavem/netJakub Kicinski1-10/+10
Jason A. Donenfeld says: ==================== WireGuard fixes for 7.1-rc6 Please find one small patch, fixing the order of adding padding onto a packet, to ensure padding bytes get zeroed properly. ==================== Link: https://patch.msgid.link/20260529173134.3080773-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 dayswireguard: send: append trailer after expanding headJason A. Donenfeld1-10/+10
With how this is currently written, we add the trailer, zero it out, and then add the header space on. If that header space requires a reallocation + copy, the zeros in the trailer aren't copied, because the skb len hasn't actually been yet expanded to cover that. Instead add the padding at the end of the process rather than at the beginning. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20260529173134.3080773-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysRevert "ipv6: preserve insertion order for same-scope addresses"Fernando Fernandez Mancera2-2/+2
Chris Adams reported that preserving insertion order for same-scope addresses is causing SSH connections to be dropped after stopping a VM while running NetworkManager. NetworkManager caches the IPv6 address configuration, when a RA arrives, it determines the list of addresses to configure and checks if the addresses are already in the right order in the kernel. If they aren't, NetworkManager removes and re-adds them to achieve the desired order. As the order changes, NetworkManager is confused and reconfigures the addresses on every update. In addition, this would also affect to cloud tooling that relies on IPv6 addresses order to identify primary and secondaries addresses. This reverts commit cb3de96eea66f5e4a580086c6a1be46e765f97f4. Fixes: cb3de96eea66 ("ipv6: preserve insertion order for same-scope addresses") Reported-by: Chris Adams <linux@cmadams.net> Closes: https://lore.kernel.org/netdev/20260521135310.GC977@cmadams.net/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260529112357.5079-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysMerge tag 'ipsec-2026-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsecJakub Kicinski12-39/+83
Steffen Klassert says: ==================== pull request (net): ipsec 2026-05-29 1) xfrm: route MIGRATE notifications to caller's netns Thread the caller's netns through km_migrate() so that MIGRATE notifications go to the issuing netns, fixing both the init_net listener leak and MOBIKE notifications inside non-init netns. From Maoyi Xie. 2) xfrm: ipcomp: Free destination pages on acomp errors Move the out_free_req label up so that allocated destination pages are released on decompression errors, not only on success. From Herbert Xu. 3) xfrm: Check for underflow in xfrm_state_mtu Reject configurations that cause xfrm_state_mtu() to underflow, preventing a negative TFCPAD value from becoming a memset size that triggers an out-of-bounds write of several terabytes. From David Ahern. 4) xfrm: ah: use skb_to_full_sk in async output callbacks Convert the possibly-incomplete skb->sk to a full socket pointer in async AH callbacks so that a request_sock or timewait_sock never reaches xfrm_output_resume() downstream consumers. From Michael Bommarito. 5) Add and revert: esp: fix page frag reference leak on skb_to_sgvec failure The patch does not fix te issue completely. 6) xfrm: esp: restore combined single-frag length gate Check the aligned post-trailer combined length against a page limit in the fast path, preventing skb_page_frag_refill() from falling back to a page too small for the destination scatterlist. From Jingguo Tan. 7) xfrm: iptfs: reset runtime state when cloning SAs Reinitialise the clone's mode_data runtime objects before publishing it, preventing queued skbs from being freed with list state copied from the original SA when migration fails. From Shaomin Chen. 8) xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit Flush policy tables and drain the workqueue in a .pre_exit handler so that cleanup_net() pays one RCU grace period per batch instead of one per namespace, fixing stalls at high CLONE_NEWNET rates. From Usama Arif. 9) xfrm: input: hold netns during deferred transport reinjection Take a netns reference when queueing deferred transport reinjection work and drop it after the callback completes, keeping the skb->cb net pointer valid until the deferred work runs. From Zhengchuan Liang. * tag 'ipsec-2026-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: Revert "esp: fix page frag reference leak on skb_to_sgvec failure" xfrm: input: hold netns during deferred transport reinjection xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit xfrm: iptfs: reset runtime state when cloning SAs xfrm: esp: restore combined single-frag length gate esp: fix page frag reference leak on skb_to_sgvec failure xfrm: ah: use skb_to_full_sk in async output callbacks xfrm: Check for underflow in xfrm_state_mtu xfrm: ipcomp: Free destination pages on acomp errors xfrm: route MIGRATE notifications to caller's netns ==================== Link: https://patch.msgid.link/20260529092648.3878973-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet: skbuff: fix pskb_carve leaking zcopy pagesPavel Begunkov1-0/+10
When SKBFL_MANAGED_FRAG_REFS is set, frag pages are not refcounted but their lifetime is controlled by the attached ubuf_info. To make a copy of the skb_shared_info, we either should clear the flag and reference the frags, or keep the flag and have frags unreferenced. pskb_carve_inside_header() and pskb_carve_inside_nonlinear() don't follow the rule and thus can leak page references. Let's clear SKBFL_MANAGED_FRAG_REFS from the original skb to fix it. It's the simplest way to address it, but there are more performant ways to do that if it ever becomes a problem. Link: https://lore.kernel.org/all/20260523085809.26331-1-nvminh232@clc.fitus.edu.vn/ Fixes: 753f1ca4e1e50 ("net: introduce managed frags infrastructure") Reported-by: Minh Nguyen <minhnguyen.080505@gmail.com> Reported-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/1e2086aa69217d7f9c8da3d38f5be7160f1b4cd1.1779993185.git.asml.silence@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysipv6: fix possible infinite loop in fib6_select_path()Jiayuan Chen1-0/+3
Found while auditing the same pattern Sashiko reported in rt6_fill_node() [1]. Apply the same fix as commit f8d8ce1b515a ("ipv6: fix possible infinite loop in fib6_info_uses_dev()"). Writers holding tb6_lock can list_del_rcu(&first->fib6_siblings) without waiting for RCU readers; first->fib6_siblings.next then still points into the old ring and this softirq-side walker never reaches &first->fib6_siblings as its terminator. fib6_purge_rt() always WRITE_ONCE()s first->fib6_nsiblings to 0 before list_del_rcu(), so an inside-loop check is a reliable detach signal. [1] https://sashiko.dev/#/patchset/20260526020227.4857-1-jiayuan.chen%40linux.dev Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260527053133.180695-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysipv6: fix possible infinite loop in rt6_fill_node()Jiayuan Chen1-0/+2
Sashiko reported this issue [1]. Apply the same fix as commit f8d8ce1b515a ("ipv6: fix possible infinite loop in fib6_info_uses_dev()"). Writers holding tb6_lock can list_del_rcu(&rt->fib6_siblings) without waiting for RCU readers; rt->fib6_siblings.next then still points into the old ring and this softirq-side walker never reaches &rt->fib6_siblings, causing a CPU stall. fib6_del_route() always WRITE_ONCE()s rt->fib6_nsiblings to 0 before list_del_rcu(), so an inside-loop check is a reliable detach signal. [1] https://sashiko.dev/#/patchset/20260526020227.4857-1-jiayuan.chen%40linux.dev Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260527053133.180695-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysbpf: sockmap: fix tail fragment offset in bpf_msg_push_dataYuqi Xu1-1/+1
When bpf_msg_push_data() inserts data in the middle of a scatterlist entry, it splits the original entry into a left fragment and a right fragment. The right fragment offset is page-local, but the code advances it with `start`, which is the message-global insertion point. For inserts into a non-first SG entry, this over-advances the offset and leaves the split layout inconsistent. Advance the right fragment offset by the fragment-local delta, `start - offset`, which matches the length removed from the front of the original entry. Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yuqi Xu <xuyq21@lenovo.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Link: https://patch.msgid.link/8b129d10566aa3eb43f61a8f9757bcf51707d324.1779636774.git.xuyq21@lenovo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysvsock/virtio: bind uarg before filling zerocopy skbJingguo Tan1-3/+9
virtio_transport_send_pkt_info() allocates or reuses the zerocopy uarg before entering the send loop, but virtio_transport_alloc_skb() still fills the skb before it inherits that uarg. When fixed-buffer vectored zerocopy hits MAX_SKB_FRAGS, io_sg_from_iter() may partially attach managed frags and return -EMSGSIZE. The rollback path call kfree_skb() to free an skb that carries SKBFL_MANAGED_FRAG_REFS but no uarg, so skb_release_data() falls through to ordinary frag unref. Pass the uarg into virtio_transport_alloc_skb() and bind it immediately before virtio_transport_fill_skb(). This keeps control or no-payload skbs untouched while ensuring success and rollback share one lifetime rule. Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Signed-off-by: Lin Ma <malin89@huawei.com> Signed-off-by: Rongzhen Cui <cuirongzhen@huawei.com> Signed-off-by: Jingguo Tan <tanjingguo@huawei.com> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260527023301.1075581-1-malin89@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysRevert "esp: fix page frag reference leak on skb_to_sgvec failure"Steffen Klassert2-14/+10
This reverts commit 2982e599fff6faa21c8df147d96fc7af6c1a2f24. The patch does not fully fix the issue and the Author does not match the 'Signed-off-by:' tag, so revert it for now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
6 daysnet: pcs: pcs-mtk-lynxi: fix bpi-r3 serdes configurationFrank Wunderlich1-0/+3
Commit 8871389da151 introduces common pcs dts properties which writes rx=normal,tx=normal polarity to register SGMSYS_QPHY_WRAP_CTRL of switch. This is initialized with tx-bit set and so change inverts polarity compared to before. It looks like mt7531 has tx polarity inverted in hardware and set tx-bit by default to restore the normal polarity. The MT7531 datasheet quite clearly states: Register 000050EC QPHY_WRAP_CTRL -- QPHY wrapper control Reset value: 0x00000501 BIT 1 RX_BIT_POLARITY -- RX bit polarity control 1'b0: normal 1'b1: inverted BIT 0 TX_BIT_POLARITY -- TX bit polarity control (TX default inversed in MT7531) 1'b0: normal 1'b1: inverted Till this patch the register write was only called when mediatek,pnswap property was set which cannot be done for switch because the fw-node param was always NULL from switch driver in the mtk_pcs_lynxi_create call. Do not configure switch side like it's done before. Fixes: 8871389da151 ("net: pcs: pcs-mtk-lynxi: deprecate "mediatek,pnswap"") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260526153239.30194-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysMerge tag 'for-net-2026-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetoothJakub Kicinski10-69/+100
Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_core: Rework hci_dev_do_reset() to use hci_sync functions - hci_conn: Fix memory leak in hci_le_big_terminate() - hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close - hci_sync: Reset device counters in hci_dev_close_sync() - hci_sync: fix UAF in hci_le_create_cis_sync - L2CAP: Fix possible crash on l2cap_ecred_conn_rsp - L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn - L2CAP: use chan timer to close channels in cleanup_listen() - L2CAP: clear chan->ident on ECRED reconfiguration success - ISO: fix UAF in iso_recv_frame - ISO: serialize iso_sock_clear_timer with socket lock - HIDP: fix missing length checks in hidp_input_report() - 6lowpan: check skb_clone() return value in send_mcast_pkt() - btusb: Allow firmware re-download when version matches - hci_qca: Use 100 ms SSR delay for rampatch and NVM loading * tag 'for-net-2026-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_sync: Reset device counters in hci_dev_close_sync() Bluetooth: hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close Bluetooth: hci_core: Rework hci_dev_do_reset() to use hci_sync functions Bluetooth: ISO: serialize iso_sock_clear_timer with socket lock Bluetooth: ISO: fix UAF in iso_recv_frame Bluetooth: L2CAP: Fix possible crash on l2cap_ecred_conn_rsp Bluetooth: l2cap: clear chan->ident on ECRED reconfiguration success Bluetooth: hci_qca: Use 100 ms SSR delay for rampatch and NVM loading Bluetooth: hci_sync: fix UAF in hci_le_create_cis_sync Bluetooth: 6lowpan: check skb_clone() return value in send_mcast_pkt() Bluetooth: btusb: Allow firmware re-download when version matches Bluetooth: HIDP: fix missing length checks in hidp_input_report() Bluetooth: L2CAP: use chan timer to close channels in cleanup_listen() Bluetooth: L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn Bluetooth: hci_conn: Fix memory leak in hci_le_big_terminate() ==================== Link: https://patch.msgid.link/20260528131839.462344-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 dayssctp: fix race between sctp_wait_for_connect and peeloffZhenghang Xiao1-0/+2
sctp_wait_for_connect() drops and re-acquires the socket lock while waiting for the association to reach ESTABLISHED state. During this window, another thread can peeloff the association to a new socket via getsockopt(SCTP_SOCKOPT_PEELOFF), changing asoc->base.sk. After re-acquiring the old socket lock, sctp_wait_for_connect() returns success without noticing the migration — the caller then accesses the association under the wrong lock in sctp_datamsg_from_user(). Add the same sk != asoc->base.sk check that sctp_wait_for_sndbuf() already has, returning an error if the association was migrated while we slept. Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave") Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260527032411.60959-1-kipreyyy@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysMerge branch 'net-mana-fix-null-dereferences-during-teardown-after-attach-failure'Jakub Kicinski1-29/+47
Dipayaan Roy says: ==================== net: mana: Fix NULL dereferences during teardown after attach failure When mana_attach() fails (e.g. during queue allocation), the error cleanup frees apc->tx_qp and apc->rxqs and sets them to NULL. Multiple subsequent teardown paths can then dereference these NULL pointers, causing kernel panics. Patch 1 adds NULL guards in the low-level teardown functions (mana_fence_rqs, mana_destroy_vport, mana_dealloc_queues) so they are safe to call regardless of queue initialization state. This covers all callers: mana_remove(), mana_change_mtu() recovery, and internal error paths in mana_alloc_queues(). Patch 2 adds an early exit in mana_detach() for already-detached ports, making it safe for non-close callers. This allows the queue reset handler to safely retry mana_attach() without redundant teardown. ==================== Link: https://patch.msgid.link/20260525081129.1230035-1-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet: mana: Skip redundant detach on already-detached portDipayaan Roy1-0/+6
When mana_per_port_queue_reset_work_handler() runs after a previous detach succeeded but attach failed, the port is left in a detached state with apc->tx_qp and apc->rxqs already freed. Calling mana_detach() again unconditionally leads to NULL pointer dereferences during queue teardown. Add an early exit in mana_detach() when the port is already in detached state (!netif_device_present) for non-close callers, making it safe to call idempotently. This allows the queue reset handler and other recovery paths to simply retry mana_attach() without redundant teardown. Fixes: 3b194343c250 ("net: mana: Implement ndo_tx_timeout and serialize queue resets per port.") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-3-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 daysnet: mana: Add NULL guards in teardown path to prevent panic on attach failureDipayaan Roy1-29/+41
When queue allocation fails partway through, the error cleanup frees and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as mana_remove(), mana_change_mtu() recovery, and internal error handling in mana_alloc_queues() can subsequently call into functions that dereference these pointers without NULL checks: - mana_chn_setxdp() dereferences apc->rxqs[0], causing a NULL pointer dereference panic (CR2: 0000000000000000 at mana_chn_setxdp+0x26). - mana_destroy_vport() iterates apc->rxqs without a NULL check. - mana_fence_rqs() iterates apc->rxqs without a NULL check. - mana_dealloc_queues() iterates apc->tx_qp without a NULL check. Add NULL guards for apc->rxqs in mana_fence_rqs(), mana_destroy_vport(), and before the mana_chn_setxdp() call. Add a NULL guard for apc->tx_qp in mana_dealloc_queues() to skip TX queue draining when TX queues were never allocated or already freed. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/20260525081129.1230035-2-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds89-482/+1962
Pull networking fixes from Paolo Abeni: "This is again significantly bigger than the same point into the previous cycle, but at least smaller than last week. I'm not aware of any pending regression for the current cycle. Including fixes from netfilter. Current release - regressions: - netfilter: walk fib6_siblings under RCU Previous releases - regressions: - netlink: fix sending unassigned nsid after assigned one - bridge: fix sleep in atomic context in netlink path - sched: fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop - ipv4: fix net->ipv4.sysctl_local_reserved_ports UaF - eth: tun: free page on short-frame rejection in tun_xdp_one() Previous releases - always broken: - skbuff: fix missing zerocopy reference in pskb_carve helpers - handshake: drain pending requests at net namespace exit - ethtool: - rss: avoid modifying the RSS context response - module: avoid leaking a netdev ref on module flash errors - coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES - netfilter: fix dst corruption in same register operation - nfc: hci: fix out-of-bounds read in HCP header parsing - ipv6: exthdrs: refresh nh pointer after ipv6_hop_jumbo() - eth: - vti: use ip6_tnl.net in vti6_changelink(). - vxlan: do not reuse cached ip_hdr() value after skb_tunnel_check_pmtu()" * tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) dpll: zl3073x: make frequency monitor a per-device attribute dpll: zl3073x: use __dpll_device_change_ntf() and remove change_work dpll: export __dpll_device_change_ntf() for use under dpll_lock net/handshake: Drain pending requests at net namespace exit net/handshake: Verify file-reference balance in submit paths net/handshake: Close the submit-side sock_hold race net/handshake: hand off the pinned file reference to accept_doit net/handshake: Take a long-lived file reference at submit net/handshake: Pass negative errno through handshake_complete() nvme-tcp: store negative errno in queue->tls_err net/handshake: Use spin_lock_bh for hn_lock net: skbuff: fix missing zerocopy reference in pskb_carve helpers net: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path net: hibmcge: disable Relaxed Ordering to fix RX packet corruption selftests/tc-testing: Add netem test case exercising loops selftests/tc-testing: Add mirred test cases exercising loops net/sched: act_mirred: Fix return code in early mirred redirect error paths net/sched: act_mirred: Fix blockcast recursion bypass leading to stack overflow net/sched: Fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop net/sched: fix packet loop on netem when duplicate is on ...
7 daysMerge tag 'gpio-fixes-for-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linuxLinus Torvalds6-16/+35
Pull gpio fixes from Bartosz Golaszewski: - fix interrupt handling in gpio-mxc - fix scoped_guard() usage in gpio-adnp - don't accept partial writes in gpio-virtuser debugfs interface as they can't really work correctly - fix resource leaks in gpio-rockchip - fix locking issues in remove path in shared GPIO management - undo the vote of a GPIO shared proxy virtual device on GPIO release * tag 'gpio-fixes-for-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: rockchip: teardown bugs and resource leaks gpio: rockchip: convert bank->clk to devm_clk_get_enabled() gpio: virtuser: Fix uninitialized data bug in gpio_virtuser_direction_do_write() gpio: shared: fix lockdep false positive by removing unneeded lock gpio: shared: fix deadlock on shared proxy's parent removal gpio: adnp: fix flow control regression caused by scoped_guard() gpio: shared: undo the vote of the proxy on GPIO free gpio: mxc: fix irq_high handling
7 dayssecurity/keys: fix missed RCU read section on lookupLinus Torvalds1-0/+1
Nicholas Carlini reports that the keyring code calls assoc_array_find() in find_key_to_update() without holding the RCU read lock, while the assoc_array_gc() code really is designed around removing the node from the tree and then freeing it after an RCU grace-period. The regular key handling doesn't see this because holding the keyring semaphore hides any lifetime issues, but the persistent key handling uses a different model. Instead of extending the keyring locking, just do the simple RCU locking that the assoc_array was designed for. Reported-by: Nicholas Carlini <npc@anthropic.com> Cc: David Howells <dhowells@redhat.com> Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris James Morris <jmorris@namei.org> Cc: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 daysgpio: rockchip: teardown bugs and resource leaksMarco Scardovi1-1/+16
Address several teardown issues and resource leaks in the driver's remove path and error handling: 1. Debounce clock reference leak: The debounce clock (bank->db_clk) is obtained using of_clk_get() which increments the clock's reference count, but clk_put() is never called. Register a devm action to cleanly release it on unbind. Note that of_clk_get(..., 1) remains necessary over devm_clk_get() because the DT binding does not define clock-names, precluding name-based lookup. 2. Unregistered chained IRQ handler: The chained IRQ handler is not disconnected in remove(). If a stray interrupt fires after the driver is removed, the kernel attempts to execute a stale handler, leading to a panic. Fix this by clearing the handler in remove(). 3. IRQ domain leak: The linear IRQ domain and its generic chips are allocated manually during probe but never removed. Remove the IRQ domain during driver teardown to free the associated generic chips and mappings. Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio") Assisted-by: Antigravity:gemini-3.5-flash Signed-off-by: Marco Scardovi <scardracs@disroot.org> Link: https://patch.msgid.link/20260526171050.12785-3-scardracs@disroot.org [Bartosz: don't emit an error message on devres allocation failure] Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
7 daysgpio: rockchip: convert bank->clk to devm_clk_get_enabled()Marco Scardovi1-5/+1
The bank->clk was previously obtained via of_clk_get() and manually prepared/enabled. However, it was missing a corresponding clk_put() in both the error paths and the remove function, leading to a reference leak. Convert the allocation to devm_clk_get_enabled(), which also properly propagates failures from clk_prepare_enable() that were previously ignored. The GPIO bank device uses the same OF node as the previous of_clk_get() call, so devm_clk_get_enabled(dev, NULL) correctly resolves the same clock provider entry. Fix the reference leak and simplify the code by removing the manual clk_disable_unprepare() calls in the probe error paths and in the remove function. Fixes: 936ee2675eee ("gpio/rockchip: add driver for rockchip gpio") Assisted-by: Antigravity:gemini-3.5-flash Signed-off-by: Marco Scardovi <scardracs@disroot.org> Link: https://patch.msgid.link/20260526171050.12785-2-scardracs@disroot.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
7 daysgpio: virtuser: Fix uninitialized data bug in gpio_virtuser_direction_do_write()Dan Carpenter1-2/+2
If *ppos is non-zero (user-space write split over multiple calls to write()) then simple_write_to_buffer() won't initialize the start of the buffer. Really, non-zero values for *ppos aren't going to work at all. Check for that and return -EINVAL at the start of the function. Fixes: 91581c4b3f29 ("gpio: virtuser: new virtual testing driver for the GPIO API") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://patch.msgid.link/ahP3BJWWy-m_qI0X@stanley.mountain Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
7 daysgpio: shared: fix lockdep false positive by removing unneeded lockBartosz Golaszewski1-2/+0
By the time gpio_device_teardown_shared() is called, the parent device is gone from the global list of GPIO devices and all outstanding SRCU read-side critical sections have completed. That means that no concurrent gpio_find_and_request() can call gpio_shared_add_proxy_lookup() for this device at this time. There's also no risk of the parent device being re-bound to the driver before the unbinding completes (including the child devices). Lockdep produces a false-positive report about a possible circular dependency as it doesn't know the ordering guarantee. Not taking the ref->lock in gpio_device_teardown_shared() silences it and is safe to do. Cc: stable@vger.kernel.org Fixes: ea513dd3c066 ("gpio: shared: make locking more fine-grained") Reviewed-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260522-gpio-shared-deadlock-v1-2-76bca088f8c0@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
7 daysgpio: shared: fix deadlock on shared proxy's parent removalBartosz Golaszewski1-4/+3
Commit 710abda58055 ("gpio: shared: call gpio_chip::of_xlate() if set") used the mutex embedded in struct gpio_shared_entry to protect the offset field which now can be modified after assignment. The critical section however is too wide and introduced a potential deadlock on the removal of the shared GPIO proxy's parent. Make the critical section shorter - only protect the offset when it's being read. While at it: mention the fact that the entry lock is now also used to protect against concurrent access to the offset field in the structure's documentation. Cc: stable@vger.kernel.org Fixes: 710abda58055 ("gpio: shared: call gpio_chip::of_xlate() if set") Reviewed-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260522-gpio-shared-deadlock-v1-1-76bca088f8c0@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
7 daysgpio: adnp: fix flow control regression caused by scoped_guard()Bartosz Golaszewski1-1/+3
scoped_guard() is implemented as a for loop. Using it to protect code using the continue statement changes the flow as we now only break out of the hidden loop inside scoped_guard(), not the original for loop. Use a regular code block instead. Fixes: c7fe19ed3973 ("gpio: adnp: use lock guards for the I2C lock") Reported-by: David Lechner <dlechner@baylibre.com> Closes: https://lore.kernel.org/all/cde2abb2-4cc8-4fc9-b34a-0c5d2b95779f@baylibre.com/ Reviewed-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260522073527.9812-1-bartosz.golaszewski@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
7 daysgpio: shared: undo the vote of the proxy on GPIO freeBartosz Golaszewski1-0/+9
When the user of a shared GPIO managed by gpio-shared-proxy calls gpiod_put() to release it, we never undo the potential "vote" for driving the shared line "high". In the free() callback, check if this proxy voted for "high" and - if so - decrease the number of votes and potentially revert the value to low if this is the last user. Cc: stable@vger.kernel.org Fixes: e992d54c6f97 ("gpio: shared-proxy: implement the shared GPIO proxy driver") Closes: https://sashiko.dev/#/patchset/20260513-gpio-shared-dynamic-voting-v1-1-8e1c49961b7d%40oss.qualcomm.com Reviewed-by: Linus Walleij <linusw@kernel.org> Link: https://patch.msgid.link/20260522-gpio-shared-free-vote-v3-1-8a4fddc6bedb@oss.qualcomm.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
7 daysBluetooth: hci_sync: Reset device counters in hci_dev_close_sync()Heitor Alves de Siqueira1-0/+4
Before resetting or closing the device, protocol counters should also be zeroed. Fixes: d0b137062b2d ("Bluetooth: hci_sync: Rework init stages") Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
7 daysBluetooth: hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device closeHeitor Alves de Siqueira1-0/+8
Since hci_dev_close_sync() can now be called during the reset path, we should also set HCI_CMD_DRAIN_WORKQUEUE. This avoids queuing timeouts while the hdev workqueue is being drained. Fixes: 877afadad2dc ("Bluetooth: When HCI work queue is drained, only queue chained work") Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
7 daysBluetooth: hci_core: Rework hci_dev_do_reset() to use hci_sync functionsHeitor Alves de Siqueira1-40/+3
The current HCI reset function in hci_core.c duplicates most of the work done by hci_dev_close_sync(), and doesn't handle LE, advertising or discovery. Instead of porting these to hci_dev_do_reset(), directly call the close/open functions from hci_sync to reset the hdev. MGMT now notifies when a user performs a reset. Suggested-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Heitor Alves de Siqueira <halves@igalia.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
7 daysBluetooth: ISO: serialize iso_sock_clear_timer with socket lockMuhammad Bilal1-1/+1
iso_sock_close() calls iso_sock_clear_timer() before acquiring lock_sock(sk). iso_sock_clear_timer() reads iso_pi(sk)->conn twice without the socket lock held: if (!iso_pi(sk)->conn) return; cancel_delayed_work(&iso_pi(sk)->conn->timeout_work); Concurrently, iso_conn_del() executes under lock_sock(sk) and calls iso_chan_del(), which sets iso_pi(sk)->conn to NULL and may result in the final reference to the connection being dropped: CPU0 CPU1 ---- ---- iso_sock_clear_timer() if (conn != NULL) ... lock_sock(sk) iso_chan_del() iso_pi(sk)->conn = NULL cancel_delayed_work(conn) /* NULL deref or UAF */ iso_pi(sk)->conn is not stable across the unlock window, causing a NULL pointer dereference or use-after-free. Serialize iso_sock_clear_timer() with the socket lock by moving it inside lock_sock()/release_sock(), matching the pattern used in iso_conn_del() and all other call sites. Fixes: ccf74f2390d60a2f9a75ef496d2564abb478f46a ("Bluetooth: Add BTPROTO_ISO socket type") Cc: stable@vger.kernel.org Signed-off-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
7 daysBluetooth: ISO: fix UAF in iso_recv_frameMuhammad Bilal1-3/+7
iso_recv_frame reads conn->sk under iso_conn_lock but releases the lock before using sk, with no reference held. A concurrent iso_sock_kill() can free sk in that window, causing use-after-free on sk->sk_state and sock_queue_rcv_skb(). Fix by replacing the bare pointer read with iso_sock_hold(conn), which calls sock_hold() while the spinlock is held, atomically elevating the refcount before the lock drops. Add a drop_put label so sock_put() is called on all exit paths where the hold succeeded. Fixes: ccf74f2390d60a2f9a75ef496d2564abb478f46a ("Bluetooth: Add BTPROTO_ISO socket type") Cc: stable@vger.kernel.org Signed-off-by: Muhammad Bilal <meatuni001@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
7 daysBluetooth: L2CAP: Fix possible crash on l2cap_ecred_conn_rspLuiz Augusto von Dentz1-5/+22
If dcid is received for an already-assigned destination CID the spec requires that both channels to be discarded, but calling l2cap_chan_del may invalidate the tmp cursor created by list_for_each_entry_safe and in fact it is the wrong procedure as the chan->dcid may be assigned previously it really needs to be disconnected. Calling l2cap_chan_clone directly may still lead to l2cap_chan_del so instead schedule l2cap_chan_timeout with delay 0 to close the channel asynchronously. Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
7 daysBluetooth: l2cap: clear chan->ident on ECRED reconfiguration successZhenghang Xiao1-2/+8
l2cap_ecred_reconf_rsp() returns early on success without clearing chan->ident. Every other L2CAP response handler (l2cap_ecred_conn_rsp, l2cap_le_connect_rsp, l2cap_config_rsp) clears chan->ident after a successful transaction to prevent the channel from matching subsequent responses with the recycled ident value. A remote attacker that completed a reconfiguration as the peer can replay a failure response with the stale ident, causing the kernel to match and destroy the already-established channel via l2cap_chan_del(chan, ECONNRESET). Clear chan->ident for all matching channels on success, and harden the failure path by using l2cap_chan_hold_unless_zero() consistent with other L2CAP handlers (l2cap_le_command_rej, __l2cap_get_chan_by_ident). Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
7 daysMerge branch 'dpll-zl3073x-various-fixes'Paolo Abeni6-50/+46
Ivan Vecera says: ==================== dpll: zl3073x: various fixes Three fixes for the zl3073x DPLL driver. Patch 1 exports __dpll_device_change_ntf() for use by drivers that need to send device change notifications from within callbacks already running under dpll_lock. Patch 2 replaces the change_work workqueue mechanism with direct calls to __dpll_device_change_ntf(), eliminating a race condition where the work handler could dereference a freed dpll_dev pointer during device teardown. Patch 3 moves the freq_monitor flag from per-DPLL to per-device scope to match the hardware behavior where frequency measurement registers are shared across all DPLL channels. ==================== Link: https://patch.msgid.link/20260526074525.1451008-1-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysdpll: zl3073x: make frequency monitor a per-device attributeIvan Vecera4-29/+25
The frequency monitoring feature uses shared hardware registers that measure input reference frequencies independently of individual DPLL channels. However, the freq_monitor flag was incorrectly placed in the per-DPLL structure, causing each channel to track its own enable/disable state independently. Since the DPLL core calls measured_freq_get() only for the first pin registration, the measured_freq_check() in the periodic worker was gated by the per-DPLL freq_monitor flag of whichever channel happens to be checked. If the first DPLL channel had frequency monitoring disabled while another had it enabled, measurements were never reported. Move freq_monitor from struct zl3073x_dpll to struct zl3073x_dev so all DPLL channels share a single flag, matching the hardware behavior. Update freq_monitor_set() to notify other DPLL devices about the change (like phase_offset_avg_factor_set() already does) and remove the mode-dependent guard in zl3073x_dpll_changes_check() since all input pin monitoring (pin state, phase offset, FFO, and measured frequency) works correctly in all DPLL modes. Fixes: bfc923b642874 ("dpll: zl3073x: implement frequency monitoring") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260526074525.1451008-4-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysdpll: zl3073x: use __dpll_device_change_ntf() and remove change_workIvan Vecera2-19/+9
The change_work was introduced to send device change notifications from DPLL device callbacks without deadlocking on dpll_lock, since the callbacks are already invoked under that lock. Now that __dpll_device_change_ntf() is exported for callers that already hold dpll_lock, use it directly and remove the change_work infrastructure entirely. This eliminates a race condition where change_work could be re-scheduled after cancel_work_sync() during device teardown, potentially causing the handler to dereference a freed or NULL dpll_dev pointer. Fixes: 9363b4837659 ("dpll: zl3073x: Allow to configure phase offset averaging factor") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260526074525.1451008-3-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysdpll: export __dpll_device_change_ntf() for use under dpll_lockIvan Vecera2-2/+12
Export __dpll_device_change_ntf() so that drivers can send device change notifications from within device callbacks, which are already called under dpll_lock. Using dpll_device_change_ntf() in that context would deadlock. Add lockdep_assert_held() to catch misuse without the lock held. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260526074525.1451008-2-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysMerge branch 'net-handshake-anchor-request-lifetime-to-a-pinned-file-reference'Paolo Abeni9-45/+129
Chuck Lever says: ==================== net/handshake: anchor request lifetime to a pinned file reference handshake_nl_accept_doit() has accumulated four follow-on fixes since 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests"): 7ea9c1ec66bc, 7798b59409c3, fe67b063f687, and dabac51b8102. Each was a local refcount or NULL-check correction; none moved where the file reference is owned, and the same code keeps producing the same class of bug. Reworking the ownership is what breaks the pattern. For the duration of a request, sock->file has no single owner. Submit publishes the request without taking a file reference; accept_doit acquires one inside the handler, after the request has already left the pending list. The consumer can drop its own reference at any time, including the moment between handshake_req_next() popping the request and accept_doit reaching get_file(). The submit-side sock_hold() pins only struct sock; struct socket and sock->file remain under the consumer's control via the file descriptor. This series places the file reference under unambiguous ownership. handshake_req_submit() pins it on the request and completion or cancel drops it (patches 4-5); the submit-side sock_hold() then becomes redundant, and dropping it also closes a publish-before-pin race the late sock_hold itself opened (patch 6). The handshake_complete() API and its consumers move to a uniform negative-errno sign convention (patch 3), with the matching sign correction in nvme-tcp (patch 2). Patch 1 hardens hn_lock for BH context, the netns-exit drain fix builds on the new file-pin infrastructure (patch 8), and new KUnit file-count assertions verify the refcount contract (patch 7). Three things in this restructuring want a careful look. In handshake_complete(), the fput() of the request's file reference has to come after hp_done() -- fput() can transitively run handshake_sk_destruct() and free the request, so the patch stashes hr_file in a local first. handshake_sk_destruct() itself is kept on purpose: it owns rhashtable removal and kfree, and remains the backstop if a consumer path bypasses handshake_complete() entirely. Third, handshake_req_next() now returns its request with an extra get_file() held under hn_lock; accept_doit must consume that reference (FD_PREPARE on success, explicit fput on the fdf.err path), and any future caller has to honor the same contract. v2: https://patch.msgid.link/20260521-handshake-file-pin-v2-0-b9dadc472840@oracle.com v1: https://patch.msgid.link/20260518-handshake-file-pin-v1-0-4bbcb7e62fda@oracle.com ==================== Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-0-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet/handshake: Drain pending requests at net namespace exitChuck Lever2-3/+12
The arguments to list_splice_init() in handshake_net_exit() are reversed. The call moves the local empty "requests" list onto hn->hn_requests, leaving the local list empty, so the subsequent drain loop runs zero iterations. Pending handshake requests that had not yet been accepted are not torn down when the net namespace is destroyed; each one keeps a reference on a socket file and on the handshake_req allocation. Pass the source and destination in the documented order (list_splice_init(list, head) moves list onto head) so the pending list is transferred to the local scratch list and drained through handshake_complete(). Fixing the splice direction exposes a list-corruption race. After the splice each req->hr_list still has non-empty link pointers, threading the stack-local scratch list rather than hn_requests. A concurrent handshake_req_cancel() -- for example, from sunrpc's TLS timeout on a kernel socket whose netns reference was not taken -- finds the request through the rhashtable, calls remove_pending(), and sees !list_empty(&req->hr_list). __remove_pending_locked() then list_del_init()s an entry off the scratch list while the drain iterates, corrupting it. The same call arriving after the drain loop has run list_del() on an entry hits LIST_POISON instead. Have remove_pending() check HANDSHAKE_F_NET_DRAINING under hn_lock and report not-found when drain is in progress. The drain has already taken ownership; handshake_complete()'s existing test_and_set on HANDSHAKE_F_REQ_COMPLETED still arbitrates between drain and cancel for who calls the consumer's hp_done. Use list_del_init() rather than list_del() in the drain so req->hr_list does not carry LIST_POISON after drain releases the entry. The DRAINING guard in remove_pending() makes cancel return false, but cancel still falls through to test_and_set_bit on HANDSHAKE_F_REQ_COMPLETED and drops the request's hr_file reference. Without another pin, if that is the last reference, sk_destruct frees the request while it is still linked on the drain loop's local list. Pin each request's hr_file under hn_lock before releasing the list, and drop that drain pin after the loop finishes with the request. Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-8-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet/handshake: Verify file-reference balance in submit pathsChuck Lever1-0/+28
The new file-reference contract on struct handshake_req is silently breakable: a missing get_file() at submit or a missing fput() on an error path leaves the file leaked but does not crash the test, so the existing absence-of-crash checks pass either way. Snapshot file_count(filp) before each handshake_req_submit() in the submit-success, EAGAIN, EBUSY, and cancel tests, and assert the expected balance after submit and again after cancel. The already-completed cancel test also asserts the post-complete balance, which pins down that handshake_complete() drops the reference and that the subsequent cancel does not double-fput. The destroy test gets the same treatment before __fput_sync(), which double-checks that cancel's fput() ran and the only remaining reference is the one sock_alloc_file() established. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-7-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet/handshake: Close the submit-side sock_hold raceChuck Lever1-12/+0
handshake_req_submit() publishes the request via handshake_req_hash_add() and __add_pending_locked(), drops hn_lock, and calls handshake_genl_notify() (which can sleep) before taking sock_hold() on req->hr_sk. A fast tlshd ACCEPT followed by DONE can drive handshake_complete()'s sock_put() into the window between the spin_unlock and the late sock_hold(); on a system where the consumer's fd held the only sk reference, the late sock_hold() then operates on an sk whose refcount has reached zero. The preceding two patches install an explicit file reference on struct handshake_req. That file pins sock->file, which pins the embedded struct socket, which defers inet_release()'s sock_put(). As long as hr_file is held, sk cannot reach refcount zero from the consumer side, and the submit-side sock_hold() with its matching sock_put() calls in handshake_complete() and handshake_req_cancel() is now redundant. Drop all three. The file reference already keeps each request's socket alive, and the lifetime story is contained in a single get_file()/fput() pair. Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-6-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet/handshake: hand off the pinned file reference to accept_doitChuck Lever3-5/+28
handshake_req_next() removes the request from the per-net pending list and drops hn_lock before handshake_nl_accept_doit() reads req->hr_sk->sk_socket and dereferences sock->file (once in FD_PREPARE() and again in get_file()). In that window a consumer running tls_handshake_cancel() followed by sockfd_put() (svc_sock_free) or __fput_sync() (xs_reset_transport) releases sock->file. sock_release() then runs sock_orphan(), zeroing sk_socket, and frees the struct socket. The accept-side code either reads NULL through sk_socket or chases freed memory. The submit-side sock_hold() does not prevent this. sk_refcnt protects struct sock, but struct socket and sock->file are independently refcounted via the file descriptor the consumer owns. Pinning sk leaves sock and sock->file unprotected. Retarget the accept-side dereferences at req->hr_file, which was pinned at submit time, instead of req->hr_sk->sk_socket->file. Pinning on its own is not sufficient: a consumer that cancels between handshake_req_next() returning and accept_doit reaching FD_PREPARE() takes the !remove_pending() branch in handshake_req_cancel() and drops hr_file before the accept side takes its own reference. Hand off an additional file reference inside handshake_req_next(), under hn_lock, so the accept side operates on a reference that no concurrent handshake_req_cancel() can revoke. FD_PREPARE() consumes that handed-off reference, either by transferring it to the new fd in fd_publish() or by dropping it in the cleanup destructor on error; the explicit get_file() that previously balanced FD_PREPARE() is therefore redundant and goes away. Update handshake_req_cancel_test2 and _test3 to simulate the FD_PREPARE() consumption with an fput() so the kunit file-count assertions stay balanced. Reported-by: Chris Mason <clm@meta.com> Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-5-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet/handshake: Take a long-lived file reference at submitChuck Lever3-13/+37
handshake_nl_accept_doit() needs the file pointer backing req->hr_sk->sk_socket to survive the window between handshake_req_next() and the subsequent FD_PREPARE() and get_file(). The submit-side sock_hold() does not provide that. sk_refcnt keeps struct sock alive, but struct socket is owned by sock->file: when the consumer fputs the last file reference, sock_release() tears the socket down regardless of any sock_hold. Add an hr_file pointer to struct handshake_req and acquire an explicit reference on sock->file during handshake_req_submit(). handshake_complete() and handshake_req_cancel() release the reference on the completion-bit-winning path. The submit error path must also release the file reference, but after rhashtable insertion a concurrent handshake_req_cancel() can discover the request and race the error path. Gate the error-path cleanup -- sk_destruct restoration, fput, and request destruction -- with test_and_set_bit(HANDSHAKE_F_REQ_COMPLETED), the same serialization handshake_complete() and handshake_req_cancel() already use. When cancel has already claimed ownership, the submit error path returns without touching the request; socket teardown handles final destruction. The accept-side dereferences are not yet retargeted; that change comes in the next patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-4-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet/handshake: Pass negative errno through handshake_complete()Chuck Lever8-8/+18
handshake_complete() declares status as unsigned int and tls_handshake_done() negates that value (-status) before handing it to the TLS consumer. Consumers match on negative errno constants -- xs_tls_handshake_done() has switch (status) { case 0: case -EACCES: case -ETIMEDOUT: lower_transport->xprt_err = status; break; default: lower_transport->xprt_err = -EACCES; } so the API as designed expects callers to pass positive errno values that the tlshd shim then negates. Three internal callers in handshake_nl_accept_doit(), the net-exit drain, and a kunit test follow kernel convention and pass negative errnos -- -EIO, -ETIMEDOUT, -ETIMEDOUT. The implicit conversion to unsigned int turns -ETIMEDOUT into 0xFFFFFF92; the subsequent -status in tls_handshake_done() wraps back to 110, the consumer's switch falls through, and the xprt reports -EACCES on what should be -ETIMEDOUT or -EIO. Fix the API rather than the call sites. The natural kernel convention is negative errno in, negative errno out. Change handshake_complete() and hp_done to take int status, drop the negation in tls_handshake_done(), and negate once in handshake_nl_done_doit() where status arrives from the wire as an unsigned netlink attribute. The three internal callers were already correct under that convention and need no change. At the same wire boundary, declare MAX_ERRNO as the netlink policy upper bound for HANDSHAKE_A_DONE_STATUS. Attribute validation rejects out-of-range values before handshake_nl_done_doit() runs, and negating a bounded u32 there stays within int range -- closing the UBSAN-visible signed- integer overflow that an unconstrained u32 would invoke. Fixes: 3b3009ea8abb ("net/handshake: Create a NETLINK service for handling handshake requests") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-3-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnvme-tcp: store negative errno in queue->tls_errChuck Lever1-1/+1
nvme_tcp_tls_done() assigns queue->tls_err in three branches. The ENOKEY lookup failure and the EOPNOTSUPP initializer both store negative errnos. The third branch, reached when the handshake layer reports a non-zero status, stores -status. The handshake layer delivers status to the consumer callback as a negative errno; the other in-tree consumers -- xs_tls_handshake_done() and the nvmet target callback -- treat their status argument that way. The extra negation in nvme_tcp_tls_done() flips the sign, leaving tls_err as a positive value (for instance, +EIO), which nvme_tcp_start_tls() then returns to its caller. Drop the extra negation so queue->tls_err uniformly carries a negative errno on failure. Fixes: be8e82caa685 ("nvme-tcp: enable TLS handshake upcall") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-2-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet/handshake: Use spin_lock_bh for hn_lockChuck Lever3-9/+11
nvmet_tcp_state_change(), a socket callback that runs in BH context, can reach handshake_req_cancel() via nvmet_tcp_schedule_release_queue() and tls_handshake_cancel(). handshake_req_cancel() acquires hn->hn_lock with plain spin_lock(). If a process-context thread on the same CPU holds hn->hn_lock when a softirq invokes the cancel path, the lock attempt deadlocks. This is the only caller that invokes tls_handshake_cancel() from BH context; every other consumer calls it from process context. Deferring the cancel to process context in the NVMe target is not straightforward: nvmet_tcp_schedule_release_queue() must call tls_handshake_cancel() atomically with its state transition to DISCONNECTING. If the cancel were deferred, the handshake completion callback could fire in the window before the cancel runs, observe the unexpected state, and return without dropping its kref on the queue. Reworking that interlock is considerably more invasive than hardening the handshake lock. Convert all hn->hn_lock acquisitions from spin_lock/spin_unlock to spin_lock_bh/spin_unlock_bh so the lock is never taken with softirqs enabled. Fixes: 675b453e0241 ("nvmet-tcp: enable TLS handshake upcall") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260525-handshake-file-pin-v3-1-66c616906ead@oracle.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet: skbuff: fix missing zerocopy reference in pskb_carve helpersMinh Nguyen1-0/+4
pskb_carve_inside_header() and pskb_carve_inside_nonlinear() both copy the old skb_shared_info header into a new buffer via memcpy(), which includes the destructor_arg pointer (uarg) for MSG_ZEROCOPY skbs. Neither function calls net_zcopy_get() for the new shinfo, creating an unaccounted holder: every skb_shared_info with destructor_arg set will call skb_zcopy_clear() once when freed, but the corresponding net_zcopy_get() was never called for the new copy. Repeated calls drive uarg->refcnt to zero prematurely, freeing ubuf_info_msgzc while TX skbs still hold live destructor_arg pointers. KASAN reports use-after-free on a freed ubuf_info_msgzc: BUG: KASAN: slab-use-after-free in skb_release_data+0x77b/0x810 Read of size 8 at addr ffff88801574d3e8 by task poc/220 Call Trace: skb_release_data+0x77b/0x810 kfree_skb_list_reason+0x13e/0x610 skb_release_data+0x4cd/0x810 sk_skb_reason_drop+0xf3/0x340 skb_queue_purge_reason+0x282/0x440 rds_tcp_inc_free+0x1e/0x30 rds_recvmsg+0x354/0x1780 __sys_recvmsg+0xdf/0x180 Allocated by task 219: msg_zerocopy_realloc+0x157/0x7b0 tcp_sendmsg_locked+0x2892/0x3ba0 Freed by task 219: ip_recv_error+0x74a/0xb10 tcp_recvmsg+0x475/0x530 The skb consuming the late access still referenced the same uarg via shinfo->destructor_arg copied by pskb_carve_inside_nonlinear() without a refcount bump. This has been verified to be reliably exploitable: a working proof-of-concept achieves full root privilege escalation from an unprivileged local user on a default kernel configuration. The fix follows the pattern of pskb_expand_head() which has the same memcpy/cloned structure. For pskb_carve_inside_header(), net_zcopy_get() is placed after skb_orphan_frags() succeeds, so the orphan error path needs no cleanup. For pskb_carve_inside_nonlinear(), net_zcopy_get() is placed after all failure points and just before skb_release_data(), so no error path needs cleanup at all -- matching pskb_expand_head() more closely and avoiding the need for a balancing net_zcopy_put(). Fixes: 6fa01ccd8830 ("skbuff: Add pskb_extract() helper function") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Minh Nguyen <minhnguyen.080505@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260526041240.329462-1-minhnguyen.080505@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysMerge branch 'hibmcge-fix-rx-packet-corruption-issue'Paolo Abeni2-3/+6
Jijie Shao says: ==================== hibmcge: fix RX packet corruption issue This series fixes an RX packet corruption issue observed when SMMU is disabled on the hibmcge driver. The fixes include disabling PCI Relaxed Ordering and correcting the order of DMA barrier operations in the RX data sync path. ==================== Link: https://patch.msgid.link/20260525144525.94884-1-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX pathJijie Shao1-3/+3
The dma_rmb() barrier was placed before dma_sync_single_for_cpu(), which is incorrect. DMA sync must complete first to make the buffer accessible to the CPU, then the rmb barrier ensures subsequent descriptor reads observe the latest data written by the hardware. Reorder the operations so dma_sync_single_for_cpu() is called before dma_rmb() to guarantee the driver reads consistent data from the DMA buffer. Fixes: f72e25594061 ("net: hibmcge: Implement rx_poll function to receive packets") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260525144525.94884-3-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 daysnet: hibmcge: disable Relaxed Ordering to fix RX packet corruptionJijie Shao1-0/+3
When SMMU is disabled, the hibmcge driver may receive corrupted packets. The hardware writes packet data and descriptors to the same page, but with Relaxed Ordering enabled, PCI write transactions may not be strictly ordered. This can cause the driver to observe a valid descriptor before the corresponding packet data is fully written. Fix this by clearing PCI_EXP_DEVCTL_RELAX_EN in the PCI bridge control register to ensure strict write ordering between packet data and descriptors. Fixes: f72e25594061 ("net: hibmcge: Implement rx_poll function to receive packets") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260525144525.94884-2-shaojijie@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>