wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2025-04-04	netlink: specs: rt_route: pull the ifa- prefix out of the names	Jakub Kicinski	1	-89/+91
	YAML specs don't normally include the C prefix name in the name of the YAML attr. Remove the ifa- prefix from all attributes in route-attrs and metrics and specify name-prefix instead. This is a bit risky, hopefully there aren't many users out there. Fixes: 023289b4f582 ("doc/netlink: Add spec for rt route messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	netlink: specs: rt_addr: pull the ifa- prefix out of the names	Jakub Kicinski	2	-20/+21
	YAML specs don't normally include the C prefix name in the name of the YAML attr. Remove the ifa- prefix from all attributes in addr-attrs and specify name-prefix instead. This is a bit risky, hopefully there aren't many users out there. Fixes: dfb0f7d9d979 ("doc/netlink: Add spec for rt addr messages") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	netlink: specs: rt_addr: fix get multi command name	Jakub Kicinski	2	-2/+2
	Command names should match C defines, codegens may depend on it. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support") Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250403013706.2828322-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	netlink: specs: rt_addr: fix the spec format / schema failures	Jakub Kicinski	1	-0/+1
	The spec is mis-formatted, schema validation says: Failed validating 'type' in schema['properties']['operations']['properties']['list']['items']['properties']['dump']['properties']['request']['properties']['value']: {'minimum': 0, 'type': 'integer'} On instance['operations']['list'][3]['dump']['request']['value']: '58 - ifa-family' The ifa-family clearly wants to be part of an attribute list. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Yuyang Huang <yuyanghuang@google.com> Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support") Link: https://patch.msgid.link/20250403013706.2828322-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	net: avoid false positive warnings in __net_mp_close_rxq()	Jakub Kicinski	2	-8/+8
	Commit under Fixes solved the problem of spurious warnings when we uninstall an MP from a device while its down. The __net_mp_close_rxq() which is used by io_uring was not fixed. Move the fix over and reuse __net_mp_close_rxq() in the devmem path. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: a70f891e0fa0 ("net: devmem: do not WARN conditionally after netdev_rx_queue_restart()") Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250403013405.2827250-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	net: move mp dev config validation to __net_mp_open_rxq()	Jakub Kicinski	4	-57/+54
	devmem code performs a number of safety checks to avoid having to reimplement all of them in the drivers. Move those to __net_mp_open_rxq() and reuse that function for binding to make sure that io_uring ZC also benefits from them. While at it rename the queue ID variable to rxq_idx in __net_mp_open_rxq(), we touch most of the relevant lines. The XArray insertion is reordered after the netdev_rx_queue_restart() call, otherwise we'd need to duplicate the queue index check or risk inserting an invalid pointer. The XArray allocation failures should be extremely rare. Reviewed-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Fixes: 6e18ed929d3b ("net: add helpers for setting a memory provider on an rx queue") Link: https://patch.msgid.link/20250403013405.2827250-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	net: ibmveth: make veth_pool_store stop hanging	Dave Marquardt	1	-12/+27
	v2: - Created a single error handling unlock and exit in veth_pool_store - Greatly expanded commit message with previous explanatory-only text Summary: Use rtnl_mutex to synchronize veth_pool_store with itself, ibmveth_close and ibmveth_open, preventing multiple calls in a row to napi_disable. Background: Two (or more) threads could call veth_pool_store through writing to /sys/devices/vio/30000002/pool/. You can do this easily with a little shell script. This causes a hang. I configured LOCKDEP, compiled ibmveth.c with DEBUG, and built a new kernel. I ran this test again and saw: Setting pool0/active to 0 Setting pool1/active to 1 [ 73.911067][ T4365] ibmveth 30000002 eth0: close starting Setting pool1/active to 1 Setting pool1/active to 0 [ 73.911367][ T4366] ibmveth 30000002 eth0: close starting [ 73.916056][ T4365] ibmveth 30000002 eth0: close complete [ 73.916064][ T4365] ibmveth 30000002 eth0: open starting [ 110.808564][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification. [ 230.808495][ T712] systemd-journald[712]: Sent WATCHDOG=1 notification. [ 243.683786][ T123] INFO: task stress.sh:4365 blocked for more than 122 seconds. [ 243.683827][ T123] Not tainted 6.14.0-01103-g2df0c02dab82-dirty #8 [ 243.683833][ T123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 243.683838][ T123] task:stress.sh state:D stack:28096 pid:4365 tgid:4365 ppid:4364 task_flags:0x400040 flags:0x00042000 [ 243.683852][ T123] Call Trace: [ 243.683857][ T123] [c00000000c38f690] [0000000000000001] 0x1 (unreliable) [ 243.683868][ T123] [c00000000c38f840] [c00000000001f908] __switch_to+0x318/0x4e0 [ 243.683878][ T123] [c00000000c38f8a0] [c000000001549a70] __schedule+0x500/0x12a0 [ 243.683888][ T123] [c00000000c38f9a0] [c00000000154a878] schedule+0x68/0x210 [ 243.683896][ T123] [c00000000c38f9d0] [c00000000154ac80] schedule_preempt_disabled+0x30/0x50 [ 243.683904][ T123] [c00000000c38fa00] [c00000000154dbb0] __mutex_lock+0x730/0x10f0 [ 243.683913][ T123] [c00000000c38fb10] [c000000001154d40] napi_enable+0x30/0x60 [ 243.683921][ T123] [c00000000c38fb40] [c000000000f4ae94] ibmveth_open+0x68/0x5dc [ 243.683928][ T123] [c00000000c38fbe0] [c000000000f4aa20] veth_pool_store+0x220/0x270 [ 243.683936][ T123] [c00000000c38fc70] [c000000000826278] sysfs_kf_write+0x68/0xb0 [ 243.683944][ T123] [c00000000c38fcb0] [c0000000008240b8] kernfs_fop_write_iter+0x198/0x2d0 [ 243.683951][ T123] [c00000000c38fd00] [c00000000071b9ac] vfs_write+0x34c/0x650 [ 243.683958][ T123] [c00000000c38fdc0] [c00000000071bea8] ksys_write+0x88/0x150 [ 243.683966][ T123] [c00000000c38fe10] [c0000000000317f4] system_call_exception+0x124/0x340 [ 243.683973][ T123] [c00000000c38fe50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec ... [ 243.684087][ T123] Showing all locks held in the system: [ 243.684095][ T123] 1 lock held by khungtaskd/123: [ 243.684099][ T123] #0: c00000000278e370 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x50/0x248 [ 243.684114][ T123] 4 locks held by stress.sh/4365: [ 243.684119][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150 [ 243.684132][ T123] #1: c000000041aea888 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0 [ 243.684143][ T123] #2: c0000000366fb9a8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0 [ 243.684155][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_enable+0x30/0x60 [ 243.684166][ T123] 5 locks held by stress.sh/4366: [ 243.684170][ T123] #0: c00000003a4cd3f8 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x88/0x150 [ 243.684183][ T123] #1: c00000000aee2288 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x154/0x2d0 [ 243.684194][ T123] #2: c0000000366f4ba8 (kn->active#64){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x160/0x2d0 [ 243.684205][ T123] #3: c000000035ff4cb8 (&dev->lock){+.+.}-{3:3}, at: napi_disable+0x30/0x60 [ 243.684216][ T123] #4: c0000003ff9bbf18 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0x138/0x12a0 From the ibmveth debug, two threads are calling veth_pool_store, which calls ibmveth_close and ibmveth_open. Here's the sequence: T4365 T4366 ----------------- ----------------- --------- veth_pool_store veth_pool_store ibmveth_close ibmveth_close napi_disable napi_disable ibmveth_open napi_enable <- HANG ibmveth_close calls napi_disable at the top and ibmveth_open calls napi_enable at the top. https://docs.kernel.org/networking/napi.html]] says The control APIs are not idempotent. Control API calls are safe against concurrent use of datapath APIs but an incorrect sequence of control API calls may result in crashes, deadlocks, or race conditions. For example, calling napi_disable() multiple times in a row will deadlock. In the normal open and close paths, rtnl_mutex is acquired to prevent other callers. This is missing from veth_pool_store. Use rtnl_mutex in veth_pool_store fixes these hangs. Signed-off-by: Dave Marquardt <davemarq@linux.ibm.com> Fixes: 860f242eb534 ("[PATCH] ibmveth change buffer pools dynamically") Reviewed-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250402154403.386744-1-davemarq@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	arcnet: Add NULL check in com20020pci_probe()	Henry Martin	1	-1/+16
	devm_kasprintf() returns NULL when memory allocation fails. Currently, com20020pci_probe() does not check for this case, which results in a NULL pointer dereference. Add NULL check after devm_kasprintf() to prevent this issue and ensure no resources are left allocated. Fixes: 6b17a597fc2f ("arcnet: restoring support for multiple Sohard Arcnet cards") Signed-off-by: Henry Martin <bsdhenrymartin@gmail.com> Link: https://patch.msgid.link/20250402135036.44697-1-bsdhenrymartin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	ipv6: Do not consider link down nexthops in path selection	Ido Schimmel	1	-2/+4
	Nexthops whose link is down are not supposed to be considered during path selection when the "ignore_routes_with_linkdown" sysctl is set. This is done by assigning them a negative region boundary. However, when comparing the computed hash (unsigned) with the region boundary (signed), the negative region boundary is treated as unsigned, resulting in incorrect nexthop selection. Fix by treating the computed hash as signed. Note that the computed hash is always in range of [0, 2^31 - 1]. Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250402114224.293392-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	ipv6: Start path selection from the first nexthop	Ido Schimmel	1	-3/+35
	Cited commit transitioned IPv6 path selection to use hash-threshold instead of modulo-N. With hash-threshold, each nexthop is assigned a region boundary in the multipath hash function's output space and a nexthop is chosen if the calculated hash is smaller than the nexthop's region boundary. Hash-threshold does not work correctly if path selection does not start with the first nexthop. For example, if fib6_select_path() is always passed the last nexthop in the group, then it will always be chosen because its region boundary covers the entire hash function's output space. Fix this by starting the selection process from the first nexthop and do not consider nexthops for which rt6_score_route() provided a negative score. Fixes: 3d709f69a3e7 ("ipv6: Use hash-threshold instead of modulo-N") Reported-by: Stanislav Fomichev <stfomichev@gmail.com> Closes: https://lore.kernel.org/netdev/Z9RIyKZDNoka53EO@mini-arch/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20250402114224.293392-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	usbnet:fix NPE during rx_complete	Ying Lu	1	-3/+3
	Missing usbnet_going_away Check in Critical Path. The usb_submit_urb function lacks a usbnet_going_away validation, whereas __usbnet_queue_skb includes this check. This inconsistency creates a race condition where: A URB request may succeed, but the corresponding SKB data fails to be queued. Subsequent processes: (e.g., rx_complete → defer_bh → __skb_unlink(skb, list)) attempt to access skb->next, triggering a NULL pointer dereference (Kernel Panic). Fixes: 04e906839a05 ("usbnet: fix cyclical race on disconnect with work queue") Cc: stable@vger.kernel.org Signed-off-by: Ying Lu <luying1@xiaomi.com> Link: https://patch.msgid.link/4c9ef2efaa07eb7f9a5042b74348a67e5a3a7aea.1743584159.git.luying1@xiaomi.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-04	net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP	Lorenzo Bianconi	1	-5/+4
	In the current implementation octeontx2 manages XDP_ABORTED and XDP invalid as XDP_PASS forwarding the skb to the networking stack. Align the behaviour to other XDP drivers handling XDP_ABORTED and XDP invalid as XDP_DROP. Please note this patch has just compile tested. Fixes: 06059a1a9a4a5 ("octeontx2-pf: Add XDP support to netdev PF") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250401-octeontx2-xdp-abort-fix-v1-1-f0587c35a0b9@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net: fix geneve_opt length integer overflow	Lin Ma	4	-4/+4
	struct geneve_opt uses 5 bit length for each single option, which means every vary size option should be smaller than 128 bytes. However, all current related Netlink policies cannot promise this length condition and the attacker can exploit a exact 128-byte size option to fake a zero length option and confuse the parsing logic, further achieve heap out-of-bounds read. One example crash log is like below: [ 3.905425] ================================================================== [ 3.905925] BUG: KASAN: slab-out-of-bounds in nla_put+0xa9/0xe0 [ 3.906255] Read of size 124 at addr ffff888005f291cc by task poc/177 [ 3.906646] [ 3.906775] CPU: 0 PID: 177 Comm: poc-oob-read Not tainted 6.1.132 #1 [ 3.907131] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 3.907784] Call Trace: [ 3.907925] <TASK> [ 3.908048] dump_stack_lvl+0x44/0x5c [ 3.908258] print_report+0x184/0x4be [ 3.909151] kasan_report+0xc5/0x100 [ 3.909539] kasan_check_range+0xf3/0x1a0 [ 3.909794] memcpy+0x1f/0x60 [ 3.909968] nla_put+0xa9/0xe0 [ 3.910147] tunnel_key_dump+0x945/0xba0 [ 3.911536] tcf_action_dump_1+0x1c1/0x340 [ 3.912436] tcf_action_dump+0x101/0x180 [ 3.912689] tcf_exts_dump+0x164/0x1e0 [ 3.912905] fw_dump+0x18b/0x2d0 [ 3.913483] tcf_fill_node+0x2ee/0x460 [ 3.914778] tfilter_notify+0xf4/0x180 [ 3.915208] tc_new_tfilter+0xd51/0x10d0 [ 3.918615] rtnetlink_rcv_msg+0x4a2/0x560 [ 3.919118] netlink_rcv_skb+0xcd/0x200 [ 3.919787] netlink_unicast+0x395/0x530 [ 3.921032] netlink_sendmsg+0x3d0/0x6d0 [ 3.921987] __sock_sendmsg+0x99/0xa0 [ 3.922220] __sys_sendto+0x1b7/0x240 [ 3.922682] __x64_sys_sendto+0x72/0x90 [ 3.922906] do_syscall_64+0x5e/0x90 [ 3.923814] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 3.924122] RIP: 0033:0x7e83eab84407 [ 3.924331] Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 faf [ 3.925330] RSP: 002b:00007ffff505e370 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 3.925752] RAX: ffffffffffffffda RBX: 00007e83eaafa740 RCX: 00007e83eab84407 [ 3.926173] RDX: 00000000000001a8 RSI: 00007ffff505e3c0 RDI: 0000000000000003 [ 3.926587] RBP: 00007ffff505f460 R08: 00007e83eace1000 R09: 000000000000000c [ 3.926977] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffff505f3c0 [ 3.927367] R13: 00007ffff505f5c8 R14: 00007e83ead1b000 R15: 00005d4fbbe6dcb8 Fix these issues by enforing correct length condition in related policies. Fixes: 925d844696d9 ("netfilter: nft_tunnel: add support for geneve opts") Fixes: 4ece47787077 ("lwtunnel: add options setting and dumping for geneve") Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key") Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options") Signed-off-by: Lin Ma <linma@zju.edu.cn> Reviewed-by: Xin Long <lucien.xin@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Link: https://patch.msgid.link/20250402165632.6958-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	fs: actually hold the namespace semaphore	Christian Brauner	1	-1/+2
	Don't use a scoped guard that only protects the next statement. Use a regular guard to make sure that the namespace semaphore is held across the whole function. Signed-off-by: Christian Brauner <brauner@kernel.org> Reported-by: Leon Romanovsky <leon@kernel.org> Link: https://lore.kernel.org/all/20250401170715.GA112019@unreal/ Fixes: db04662e2f4f ("fs: allow detached mounts in clone_private_mount()") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-03	io_uring/zcrx: fix selftests w/ updated netdev Python helpers	David Wei	1	-4/+4
	Fix io_uring zero copy rx selftest with updated netdev Python helpers. Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250402172414.895276-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	selftests: net: use netdevsim in netns test	Stanislav Fomichev	2	-4/+34
	Netdevsim has extra register_netdevice_notifier_dev_net notifiers, use netdevim instead of dummy device to test them out. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-9-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	docs: net: document netdev notifier expectations	Stanislav Fomichev	1	-0/+23
	We don't have a consistent state yet, but document where we think we are and where we wanna be. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-8-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net: dummy: request ops lock	Stanislav Fomichev	1	-0/+1
	Even though dummy device doesn't really need an instance lock, a lot of selftests use dummy so it's useful to have extra expose to the instance lock on NIPA. Request the instance/ops locking. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-7-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	netdevsim: add dummy device notifiers	Stanislav Fomichev	4	-5/+28
	In order to exercise and verify notifiers' locking assumptions, register dummy notifiers (via register_netdevice_notifier_dev_net). Share notifier event handler that enforces the assumptions with lock_debug.c (rename and export rtnl_net_debug_event as netdev_debug_event). Add ops lock asserts to netdev_debug_event. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net: rename rtnl_net_debug to lock_debug	Stanislav Fomichev	2	-1/+1
	And make it selected by CONFIG_DEBUG_NET. Don't rename any of the structs/functions. Next patch will use rtnl_net_debug_event in netdevsim. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-5-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net: use netif_disable_lro in ipv6_add_dev	Stanislav Fomichev	3	-10/+22
	ipv6_add_dev might call dev_disable_lro which unconditionally grabs instance lock, so it will deadlock during NETDEV_REGISTER. Switch to netif_disable_lro. Make sure all callers hold the instance lock as well. Cc: Cosmin Ratiu <cratiu@nvidia.com> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-4-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net: hold instance lock during NETDEV_REGISTER/UP	Stanislav Fomichev	4	-15/+15
	Callers of inetdev_init can come from several places with inconsistent expectation about netdev instance lock. Grab instance lock during REGISTER (plus UP). Also solve the inconsistency with UNREGISTER where it was locked only during move netns path. WARNING: CPU: 10 PID: 1479 at ./include/net/netdev_lock.h:54 __netdev_update_features+0x65f/0xca0 __warn+0x81/0x180 __netdev_update_features+0x65f/0xca0 report_bug+0x156/0x180 handle_bug+0x4f/0x90 exc_invalid_op+0x13/0x60 asm_exc_invalid_op+0x16/0x20 __netdev_update_features+0x65f/0xca0 netif_disable_lro+0x30/0x1d0 inetdev_init+0x12f/0x1f0 inetdev_event+0x48b/0x870 notifier_call_chain+0x38/0xf0 register_netdevice+0x741/0x8b0 register_netdev+0x1f/0x40 mlx5e_probe+0x4e3/0x8e0 [mlx5_core] auxiliary_bus_probe+0x3f/0x90 really_probe+0xc3/0x3a0 __driver_probe_device+0x80/0x150 driver_probe_device+0x1f/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xb4/0x1c0 bus_probe_device+0x91/0xa0 device_add+0x657/0x870 Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Cosmin Ratiu <cratiu@nvidia.com> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-3-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net: switch to netif_disable_lro in inetdev_init	Stanislav Fomichev	1	-1/+1
	Cosmin reports the following deadlock: dump_stack_lvl+0x62/0x90 print_deadlock_bug+0x274/0x3b0 __lock_acquire+0x1229/0x2470 lock_acquire+0xb7/0x2b0 __mutex_lock+0xa6/0xd20 dev_disable_lro+0x20/0x80 inetdev_init+0x12f/0x1f0 inetdev_event+0x48b/0x870 notifier_call_chain+0x38/0xf0 netif_change_net_namespace+0x72e/0x9f0 do_setlink.isra.0+0xd5/0x1220 rtnl_newlink+0x7ea/0xb50 rtnetlink_rcv_msg+0x459/0x5e0 netlink_rcv_skb+0x54/0x100 netlink_unicast+0x193/0x270 netlink_sendmsg+0x204/0x450 Switch to netif_disable_lro which assumes the caller holds the instance lock. inetdev_init is called for blackhole device (which sw device and doesn't grab instance lock) and from REGISTER/UNREGISTER notifiers. We already hold the instance lock for REGISTER notifier during netns change and we'll soon hold the lock during other paths. Reviewed-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Cosmin Ratiu <cratiu@nvidia.com> Fixes: ad7c7b2172c3 ("net: hold netdev instance lock during sysfs operations") Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250401163452.622454-2-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net: airoha: Validate egress gdm port in airoha_ppe_foe_entry_prepare()	Lorenzo Bianconi	3	-2/+22
	Dev pointer in airoha_ppe_foe_entry_prepare routine is not strictly a device allocated by airoha_eth driver since it is an egress device and the flowtable can contain even wlan, pppoe or vlan devices. E.g: flowtable ft { hook ingress priority filter devices = { eth1, lan1, lan2, lan3, lan4, wlan0 } flags offload ^ \| "not allocated by airoha_eth" -- } In this case airoha_get_dsa_port() will just return the original device pointer and we can't assume netdev priv pointer points to an airoha_gdm_port struct. Fix the issue validating egress gdm port in airoha_ppe_foe_entry_prepare routine before accessing net_device priv pointer. Fixes: 00a7678310fe ("net: airoha: Introduce flowtable offload support") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250401-airoha-validate-egress-gdm-port-v4-1-c7315d33ce10@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net: dsa: mv88e6xxx: propperly shutdown PPU re-enable timer on destroy	David Oberhollenzer	2	-4/+10
	The mv88e6xxx has an internal PPU that polls PHY state. If we want to access the internal PHYs, we need to disable the PPU first. Because that is a slow operation, a 10ms timer is used to re-enable it, canceled with every access, so bulk operations effectively only disable it once and re-enable it some 10ms after the last access. If a PHY is accessed and then the mv88e6xxx module is removed before the 10ms are up, the PPU re-enable ends up accessing a dangling pointer. This especially affects probing during bootup. The MDIO bus and PHY registration may succeed, but registration with the DSA framework may fail later on (e.g. because the CPU port depends on another, very slow device that isn't done probing yet, returning -EPROBE_DEFER). In this case, probe() fails, but the MDIO subsystem may already have accessed the MIDO bus or PHYs, arming the timer. This is fixed as follows: - If probe fails after mv88e6xxx_phy_init(), make sure we also call mv88e6xxx_phy_destroy() before returning - In mv88e6xxx_remove(), make sure we do the teardown in the correct order, calling mv88e6xxx_phy_destroy() after unregistering the switch device. - In mv88e6xxx_phy_destroy(), destroy both the timer and the work item that the timer might schedule, synchronously waiting in case one of the callbacks already fired and destroying the timer first, before waiting for the work item. - Access to the PPU is guarded by a mutex, the worker acquires it with a mutex_trylock(), not proceeding with the expensive shutdown if that fails. We grab the mutex in mv88e6xxx_phy_destroy() to make sure the slow PPU shutdown is already done or won't even enter, when we wait for the work item. Fixes: 2e5f032095ff ("dsa: add support for the Marvell 88E6131 switch chip") Signed-off-by: David Oberhollenzer <david.oberhollenzer@sigma-star.at> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20250401135705.92760-1-david.oberhollenzer@sigma-star.at Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	MAINTAINERS: Update Loic Poulain's email address	Loic Poulain	1	-3/+3
	Update Loic Poulain's email address to @oss.qualcomm.com. Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250401145344.10669-1-loic.poulain@oss.qualcomm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	ipv6: fix omitted netlink attributes when using RTEXT_FILTER_SKIP_STATS	Fernando Fernandez Mancera	1	-12/+25
	Using RTEXT_FILTER_SKIP_STATS is incorrectly skipping non-stats IPv6 netlink attributes on link dump. This causes issues on userspace tools, e.g iproute2 is not rendering address generation mode as it should due to missing netlink attribute. Move the filling of IFLA_INET6_STATS and IFLA_INET6_ICMP6STATS to a helper function guarded by a flag check to avoid hitting the same situation in the future. Fixes: d5566fd72ec1 ("rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250402121751.3108-1-ffmancera@riseup.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	eth: bnxt: fix deadlock in the mgmt_ops	Taehee Yoo	1	-3/+3
	When queue is being reset, callbacks of mgmt_ops are called by netdev_nl_bind_rx_doit(). The netdev_nl_bind_rx_doit() first acquires netdev_lock() and then calls callbacks. So, mgmt_ops callbacks should not acquire netdev_lock() internaly. The bnxt_queue_{start \| stop}() calls napi_{enable \| disable}() but they internally acquire netdev_lock(). So, deadlock occurs. To avoid deadlock, napi_{enable \| disable}_locked() should be used instead. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Fixes: cae03e5bdd9e ("net: hold netdev instance lock during queue operations") Link: https://patch.msgid.link/20250402133123.840173-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	net/selftests: Add loopback link local route for self-connect	Dmitry Safonov	1	-0/+3
	self-connect-ipv6 got slightly flaky on netdev: > # timeout set to 120 > # selftests: net/tcp_ao: self-connect_ipv6 > # 1..5 > # # 708[lib/setup.c:250] rand seed 1742872572 > # TAP version 13 > # # 708[lib/proc.c:213] Snmp6 Ip6OutNoRoutes: 0 => 1 > # not ok 1 # error 708[self-connect.c:70] failed to connect() > # ok 2 No unexpected trace events during the test run > # # Planned tests != run tests (5 != 2) > # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:1 > ok 1 selftests: net/tcp_ao: self-connect_ipv6 I can not reproduce it on my machines, but judging by "Ip6OutNoRoutes" there is no route to the local_addr (::1). Looking at the kernel code, I see that kernel does add link-local address automatically in init_loopback(), but that is called from ipv6 notifier block. So, in turn the userspace that brought up the loopback interface may see rtnetlink ACK earlier than addrconf_notify() does it's job (at least, on a slow VM such as netdev). Probably, for ipv4 it's the same, judging by inetdev_event(). The fix is quite simple: set the link-local route straight after bringing the loopback interface. That will make it synchronous. Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20250402-tcp-ao-selfconnect-flake-v1-1-8388d629ef3d@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	sfc: fix NULL dereferences in ef100_process_design_param()	Edward Cree	2	-29/+24
	Since cited commit, ef100_probe_main() and hence also ef100_check_design_params() run before efx->net_dev is created; consequently, we cannot netif_set_tso_max_size() or _segs() at this point. Move those netif calls to ef100_probe_netdev(), and also replace netif_err within the design params code with pci_err. Reported-by: Kyungwook Boo <bookyungwook@gmail.com> Fixes: 98ff4c7c8ac7 ("sfc: Separate netdev probe/remove from PCI probe/remove") Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Link: https://patch.msgid.link/20250401225439.2401047-1-edward.cree@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	gve: handle overflow when reporting TX consumed descriptors	Joshua Washington	1	-1/+3
	When the tx tail is less than the head (in cases of wraparound), the TX consumed descriptor statistic in DQ will be reported as UINT32_MAX - head + tail, which is incorrect. Mask the difference of head and tail according to the ring size when reporting the statistic. Cc: stable@vger.kernel.org Fixes: 2c9198356d56 ("gve: Add consumed counts to ethtool stats") Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250402001037.2717315-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-03	bcachefs: Fix "journal stuck" during recovery	Kent Overstreet	1	-0/+8
	If we crash when the journal pin fifo is completely full - i.e. we're at the maximum number of dirty journal entries - that may put us in a sticky situation in recovery, as journal replay will need to be able to open new journal entries in order to get going. bch2_fs_journal_start() already had provisions for resizing the journal pin fifo if needed, but it needs a fudge factor to ensure there's room for journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03	bcachefs: backpointer_get_key: check for null from peek_slot()	Kent Overstreet	1	-0/+12
	peek_slot() doesn't normally return bkey_s_c_null - except when we ask for a key at a btree level that doesn't exist, which can happen here. We might want to revisit this, but we'll have to look over all the places where we use peek_slot() on interior nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03	bcachefs: Fix null ptr deref in invalidate_one_bucket()	Kent Overstreet	1	-0/+3
	bch2_backpointer_get_key() returns bkey_s_c_null when the target isn't found. backpointer_get_key() flags the error, so there's nothing else to do here - just skip it and move on. Link: https://github.com/koverstreet/bcachefs/issues/847 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03	bcachefs: Fix check_snapshot_exists() restart handling	Kent Overstreet	1	-3/+0
	Codepaths that create entries in the snapshots btree currently call bch2_mark_snapshot(), which updates the in-memory snapshot table, before transaction commit. This is because bch2_mark_snapshot() is an atomic trigger, run with btree write locks held, and isn't allowed to fail - but it might need to reallocate the table, hence we call it early when we're still allowed to fail. This is generally harmless - if we fail, we'll have left an entry in the snapshots table around, but nothing will reference it and it'll get overwritten if reused by another transaction. But check_snapshot_exists(), which reconstructs snapshots when the snapshots btree has been corrupted or lost, was erronously rechecking if the snapshot exists inside the transaction commit loop - so on transaction restart (in this case mem_realloced), the second iteration would return without repairing. This code needs some cleanup: splitting out a "maybe realloc snapshots table" helper would have avoided this, that will be in the next patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03	bcachefs: use nonblocking variant of print_string_as_lines in error path	Bharadwaj Raju	1	-2/+2
	The inconsistency error path calls print_string_as_lines, which calls console_lock, which is a potentially-sleeping function and so can't be called in an atomic context. Replace calls to it with the nonblocking variant which is safe to call. Signed-off-by: Bharadwaj Raju <bharadwaj.raju777@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03	bcachefs: Fix scheduling while atomic from logging changes	Kent Overstreet	2	-0/+4
	Two fixes from the recent logging changes: bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt context, or with rcu_read_lock() held. The one syzbot found is in bch2_bkey_pick_read_device bch2_dev_rcu bch2_fs_inconsistent We're starting to switch to lift the printbufs up to higher levels so we can emit better log messages and print them all in one go (avoid garbling), so that conversion will help with spotting these in the future; when we declare a printbuf it must be flagged if we're in an atomic context. Secondly, in btree_node_write_endio: 00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321 00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6 00085 preempt_count: 10001, expected: 0 00085 RCU nest depth: 0, expected: 0 00085 4 locks held by bch-reclaim/fa6/618: 00085 #0: ffffff80d7ccad68 (&j->reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198 00085 #1: ffffff80d7c84218 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440 00085 #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440 00085 #3: ffffff80c3823c20 (&vblk->vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130 00085 irq event stamp: 328 00085 hardirqs last enabled at (327): [<ffffffc080073a14>] finish_task_switch.isra.0+0xbc/0x2a0 00085 hardirqs last disabled at (328): [<ffffffc080971a10>] el1_interrupt+0x20/0x60 00085 softirqs last enabled at (0): [<ffffffc08002f920>] copy_process+0x7c8/0x2118 00085 softirqs last disabled at (0): [<0000000000000000>] 0x0 00085 Preemption disabled at: 00085 [<ffffffc08003ada0>] irq_enter_rcu+0x18/0x90 00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798 00085 Hardware name: linux,dummy-virt (DT) 00085 Call trace: 00085 show_stack+0x1c/0x30 (C) 00085 dump_stack_lvl+0x84/0xc0 00085 dump_stack+0x14/0x20 00085 __might_resched+0x180/0x288 00085 __might_sleep+0x4c/0x88 00085 __kmalloc_node_track_caller_noprof+0x34c/0x3e0 00085 krealloc_noprof+0x1a0/0x2d8 00085 bch2_printbuf_make_room+0x9c/0x120 00085 bch2_prt_printf+0x60/0x1b8 00085 btree_node_write_endio+0x1b0/0x2d8 00085 bio_endio+0x138/0x1f0 00085 btree_node_write_endio+0xe8/0x2d8 00085 bio_endio+0x138/0x1f0 00085 blk_update_request+0x220/0x4c0 00085 blk_mq_end_request+0x28/0x148 00085 virtblk_request_done+0x64/0xe8 00085 blk_mq_complete_request+0x34/0x40 00085 virtblk_done+0x78/0x130 00085 vring_interrupt+0x6c/0xb0 00085 __handle_irq_event_percpu+0x8c/0x2e0 00085 handle_irq_event+0x50/0xb0 00085 handle_fasteoi_irq+0xc4/0x250 00085 handle_irq_desc+0x44/0x60 00085 generic_handle_domain_irq+0x20/0x30 00085 gic_handle_irq+0x54/0xc8 00085 call_on_irq_stack+0x24/0x40 Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03	bcachefs: Add error handling for zlib_deflateInit2()	Wentao Liang	1	-2/+3
	In attempt_compress(), the return value of zlib_deflateInit2() needs to be checked. A proper implementation can be found in pstore_compress(). Add an error check and return 0 immediately if the initialzation fails. Fixes: 986e9842fb68 ("bcachefs: Compression levels") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-04-03	block: don't grab elevator lock during queue initialization	Ming Lei	1	-7/+17
	->elevator_lock depends on queue freeze lock, see block/blk-sysfs.c. queue freeze lock depends on fs_reclaim. So don't grab elevator lock during queue initialization which needs to call kmalloc(GFP_KERNEL), and we can cut the dependency between ->elevator_lock and fs_reclaim, then the lockdep warning can be killed. This way is safe because elevator setting isn't ready to run during queue initialization. There isn't such issue in __blk_mq_update_nr_hw_queues() because memalloc_noio_save() is called before acquiring elevator lock. Fixes the following lockdep warning: https://lore.kernel.org/linux-block/67e6b425.050a0220.2f068f.007b.GAE@google.com/ Reported-by: syzbot+4c7e0f9b94ad65811efb@syzkaller.appspotmail.com Cc: Nilay Shroff <nilay@linux.ibm.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250403105402.1334206-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-03	io_uring: always do atomic put from iowq	Pavel Begunkov	2	-1/+8
	io_uring always switches requests to atomic refcounting for iowq execution before there is any parallilism by setting REQ_F_REFCOUNT, and the flag is not cleared until the request completes. That should be fine as long as the compiler doesn't make up a non existing value for the flags, however KCSAN still complains when the request owner changes oter flag bits: BUG: KCSAN: data-race in io_req_task_cancel / io_wq_free_work ... read to 0xffff888117207448 of 8 bytes by task 3871 on cpu 0: req_ref_put_and_test io_uring/refs.h:22 [inline] Skip REQ_F_REFCOUNT checks for iowq, we know it's set. Reported-by: syzbot+903a2ad71fb3f1e47cf5@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d880bc27fb8c3209b54641be4ff6ac02b0e5789a.1743679736.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-03	netfilter: nft_tunnel: fix geneve_opt type confusion addition	Lin Ma	1	-2/+2
	When handling multiple NFTA_TUNNEL_KEY_OPTS_GENEVE attributes, the parsing logic should place every geneve_opt structure one by one compactly. Hence, when deciding the next geneve_opt position, the pointer addition should be in units of char *. However, the current implementation erroneously does type conversion before the addition, which will lead to heap out-of-bounds write. [ 6.989857] ================================================================== [ 6.990293] BUG: KASAN: slab-out-of-bounds in nft_tunnel_obj_init+0x977/0xa70 [ 6.990725] Write of size 124 at addr ffff888005f18974 by task poc/178 [ 6.991162] [ 6.991259] CPU: 0 PID: 178 Comm: poc-oob-write Not tainted 6.1.132 #1 [ 6.991655] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 6.992281] Call Trace: [ 6.992423] <TASK> [ 6.992586] dump_stack_lvl+0x44/0x5c [ 6.992801] print_report+0x184/0x4be [ 6.993790] kasan_report+0xc5/0x100 [ 6.994252] kasan_check_range+0xf3/0x1a0 [ 6.994486] memcpy+0x38/0x60 [ 6.994692] nft_tunnel_obj_init+0x977/0xa70 [ 6.995677] nft_obj_init+0x10c/0x1b0 [ 6.995891] nf_tables_newobj+0x585/0x950 [ 6.996922] nfnetlink_rcv_batch+0xdf9/0x1020 [ 6.998997] nfnetlink_rcv+0x1df/0x220 [ 6.999537] netlink_unicast+0x395/0x530 [ 7.000771] netlink_sendmsg+0x3d0/0x6d0 [ 7.001462] __sock_sendmsg+0x99/0xa0 [ 7.001707] ____sys_sendmsg+0x409/0x450 [ 7.002391] ___sys_sendmsg+0xfd/0x170 [ 7.003145] __sys_sendmsg+0xea/0x170 [ 7.004359] do_syscall_64+0x5e/0x90 [ 7.005817] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 7.006127] RIP: 0033:0x7ec756d4e407 [ 7.006339] Code: 48 89 fa 4c 89 df e8 38 aa 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 faf [ 7.007364] RSP: 002b:00007ffed5d46760 EFLAGS: 00000202 ORIG_RAX: 000000000000002e [ 7.007827] RAX: ffffffffffffffda RBX: 00007ec756cc4740 RCX: 00007ec756d4e407 [ 7.008223] RDX: 0000000000000000 RSI: 00007ffed5d467f0 RDI: 0000000000000003 [ 7.008620] RBP: 00007ffed5d468a0 R08: 0000000000000000 R09: 0000000000000000 [ 7.009039] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 [ 7.009429] R13: 00007ffed5d478b0 R14: 00007ec756ee5000 R15: 00005cbd4e655cb8 Fix this bug with correct pointer addition and conversion in parse and dump code. Fixes: 925d844696d9 ("netfilter: nft_tunnel: add support for geneve opts") Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-03	net: decrease cached dst counters in dst_release	Antoine Tenart	1	-0/+8
	Upstream fix ac888d58869b ("net: do not delay dst_entries_add() in dst_release()") moved decrementing the dst count from dst_destroy to dst_release to avoid accessing already freed data in case of netns dismantle. However in case CONFIG_DST_CACHE is enabled and OvS+tunnels are used, this fix is incomplete as the same issue will be seen for cached dsts: Unable to handle kernel paging request at virtual address ffff5aabf6b5c000 Call trace: percpu_counter_add_batch+0x3c/0x160 (P) dst_release+0xec/0x108 dst_cache_destroy+0x68/0xd8 dst_destroy+0x13c/0x168 dst_destroy_rcu+0x1c/0xb0 rcu_do_batch+0x18c/0x7d0 rcu_core+0x174/0x378 rcu_core_si+0x18/0x30 Fix this by invalidating the cache, and thus decrementing cached dst counters, in dst_release too. Fixes: d71785ffc7e7 ("net: add dst_cache to ovs vxlan lwtunnel") Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://patch.msgid.link/20250326173634.31096-1-atenart@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-04-02	selftests/bpf: Fix verifier_private_stack test failure	Yonghong Song	1	-3/+3
	Several verifier_private_stack tests failed with latest bpf-next. For example, for 'Private stack, single prog' subtest, the jitted code: func #0: 0: f3 0f 1e fa endbr64 4: 0f 1f 44 00 00 nopl (%rax,%rax) 9: 0f 1f 00 nopl (%rax) c: 55 pushq %rbp d: 48 89 e5 movq %rsp, %rbp 10: f3 0f 1e fa endbr64 14: 49 b9 58 74 8a 8f 7d 60 00 00 movabsq $0x607d8f8a7458, %r9 1e: 65 4c 03 0c 25 28 c0 48 87 addq %gs:-0x78b73fd8, %r9 27: bf 2a 00 00 00 movl $0x2a, %edi 2c: 49 89 b9 00 ff ff ff movq %rdi, -0x100(%r9) 33: 31 c0 xorl %eax, %eax 35: c9 leave 36: e9 20 5d 0f e1 jmp 0xffffffffe10f5d5b The insn 'addq %gs:-0x78b73fd8, %r9' does not match the expected regex 'addq %gs:0x{{.}}, %r9' and this caused test failure. Fix it by changing '%gs:0x{{.}}' to '%gs:{{.*}}' to accommodate the possible negative offset. A few other subtests are fixed in a similar way. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250331033828.365077-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02	selftests/bpf: Fix verifier_bpf_fastcall test	Song Liu	1	-3/+3
	Commit [1] moves percpu data on x86 from address 0x000... to address 0xfff... Before [1]: 159020: 0000000000030700 0 OBJECT GLOBAL DEFAULT 23 pcpu_hot After [1]: 152602: ffffffff83a3e034 4 OBJECT GLOBAL DEFAULT 35 pcpu_hot As a result, verifier_bpf_fastcall tests should now expect a negative value for pcpu_hot, IOW, the disassemble should show "r=" instead of "w=". Fix this in the test. Note that, a later change created a new variable "cpu_number" for bpf_get_smp_processor_id() [2]. The inlining logic is updated properly as part of this change, so there is no need to fix anything on the kernel side. [1] commit 9d7de2aa8b41 ("x86/percpu/64: Use relative percpu offsets") [2] commit 01c7bc5198e9 ("x86/smp: Move cpu number to percpu hot section") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250328193124.808784-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02	selftests/bpf: Fix tests after fields reorder in struct file	Song Liu	2	-4/+4
	The change in struct file [1] moved f_ref to the 3rd cache line. It made (u64 )file dereference invalid from the verifier point of view, because btf_struct_walk() walks into f_lock field, which is 4-byte long. Fix the selftests to deference the file pointer as a 4-byte access. [1] commit e249056c91a2 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing") Reported-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250327185528.1740787-1-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-02	xsk: Fix __xsk_generic_xmit() error code when cq is full	Wang Liang	1	-1/+4
	When the cq reservation is failed, the error code is not set which is initialized to zero in __xsk_generic_xmit(). That means the packet is not send successfully but sendto() return ok. Considering the impact on uapi, return -EAGAIN is a good idea. The cq is full usually because it is not released in time, try to send msg again is appropriate. The bug was at the very early implementation of xsk, so the Fixes tag targets the commit that introduced the changes in xsk_cq_reserve_addr_locked where this fix depends on. Fixes: e6c4047f5122 ("xsk: Use xsk_buff_pool directly for cq functions") Suggested-by: Magnus Karlsson <magnus.karlsson@gmail.com> Signed-off-by: Wang Liang <wangliang74@huawei.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250227081052.4096337-1-wangliang74@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-03	docs: fs/9p: Add missing "not" in cache documentation	Tingmao Wang	1	-2/+2
	A quick fix for what I assume is a typo. Signed-off-by: Tingmao Wang <m@maowtm.org> Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com> Message-ID: <20250330213443.98434-1-m@maowtm.org> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2025-04-02	cifs: update internal version number	Steve French	1	-2/+2
	To 2.52 Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02	cifs: Implement is_network_name_deleted for SMB1	Pali Rohár	1	-0/+44
	This change allows Linux SMB1 client to autoreconnect the share when it is modified on server by admin operation which removes and re-adds it. Implementation is reused from SMB2+ is_network_name_deleted callback. There are just adjusted checks for error codes and access to struct smb_hdr. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2025-04-02	cifs: Remove cifs_truncate_page() as it should be superfluous	David Howells	3	-22/+0
	The calls to cifs_truncate_page() should be superfluous as the places that call it also call truncate_setsize() or cifs_setsize() and therefore truncate_pagecache() which should also clear the tail part of the folio containing the EOF marker. Further, smb3_simple_falloc() calls both cifs_setsize() and truncate_setsize() in addition to cifs_truncate_page(). Remove the superfluous calls. This gets rid of another function referring to struct page. [Should cifs_setsize() also set inode->i_blocks?] Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <stfrench@microsoft.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>