wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2023-10-13	tcp: allow again tcp_disconnect() when threads are waiting	Paolo Abeni	10	-45/+80
	As reported by Tom, .NET and applications build on top of it rely on connect(AF_UNSPEC) to async cancel pending I/O operations on TCP socket. The blamed commit below caused a regression, as such cancellation can now fail. As suggested by Eric, this change addresses the problem explicitly causing blocking I/O operation to terminate immediately (with an error) when a concurrent disconnect() is executed. Instead of tracking the number of threads blocked on a given socket, track the number of disconnect() issued on such socket. If such counter changes after a blocking operation releasing and re-acquiring the socket lock, error out the current operation. Fixes: 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting") Reported-by: Tom Deseyn <tdeseyn@redhat.com> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886305 Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/f3b95e47e3dbed840960548aebaa8d954372db41.1697008693.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	ice: fix over-shifted variable	Jesse Brandeburg	1	-2/+1
	Since the introduction of the ice driver the code has been double-shifting the RSS enabling field, because the define already has shifts in it and can't have the regular pattern of "a << shiftval & mask" applied. Most places in the code got it right, but one line was still wrong. Fix this one location for easy backports to stable. An in-progress patch fixes the defines to "standard" and will be applied as part of the regular -next process sometime after this one. Fixes: d76a60ba7afb ("ice: Add support for VLANs and offloads") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> CC: stable@vger.kernel.org Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20231010203101.406248-1-jacob.e.keller@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	net: dsa: bcm_sf2: Fix possible memory leak in bcm_sf2_mdio_register()	Jinjie Ruan	1	-9/+15
	In bcm_sf2_mdio_register(), the class_find_device() will call get_device() to increment reference count for priv->master_mii_bus->dev if of_mdio_find_bus() succeeds. If mdiobus_alloc() or mdiobus_register() fails, it will call get_device() twice without decrement reference count for the device. And it is the same if bcm_sf2_mdio_register() succeeds but fails in bcm_sf2_sw_probe(), or if bcm_sf2_sw_probe() succeeds. If the reference count has not decremented to zero, the dev related resource will not be freed. So remove the get_device() in bcm_sf2_mdio_register(), and call put_device() if mdiobus_alloc() or mdiobus_register() fails and in bcm_sf2_mdio_unregister() to solve the issue. And as Simon suggested, unwind from errors for bcm_sf2_mdio_register() and just return 0 if it succeeds to make it cleaner. Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Suggested-by: Simon Horman <horms@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20231011032419.2423290-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	selftests: fib_tests: Count all trace point invocations	Ido Schimmel	1	-2/+2
	The tests rely on the IPv{4,6} FIB trace points being triggered once for each forwarded packet. If receive processing is deferred to the ksoftirqd task these invocations will not be counted and the tests will fail. Fix by specifying the '-a' flag to avoid perf from filtering on the mausezahn task. Before: # ./fib_tests.sh -t ipv4_mpath_list IPv4 multipath list receive tests TEST: Multipath route hit ratio (.68) [FAIL] # ./fib_tests.sh -t ipv6_mpath_list IPv6 multipath list receive tests TEST: Multipath route hit ratio (.27) [FAIL] After: # ./fib_tests.sh -t ipv4_mpath_list IPv4 multipath list receive tests TEST: Multipath route hit ratio (1.00) [ OK ] # ./fib_tests.sh -t ipv6_mpath_list IPv6 multipath list receive tests TEST: Multipath route hit ratio (.99) [ OK ] Fixes: 8ae9efb859c0 ("selftests: fib_tests: Add multipath list receive tests") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/ Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Link: https://lore.kernel.org/r/20231010132113.3014691-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	selftests: fib_tests: Disable RP filter in multipath list receive test	Ido Schimmel	1	-0/+3
	The test relies on the fib:fib_table_lookup trace point being triggered once for each forwarded packet. If RP filter is not disabled, the trace point will be triggered twice for each packet (for source validation and forwarding), potentially masking actual bugs. Fix by explicitly disabling RP filter. Before: # ./fib_tests.sh -t ipv4_mpath_list IPv4 multipath list receive tests TEST: Multipath route hit ratio (1.99) [ OK ] After: # ./fib_tests.sh -t ipv4_mpath_list IPv4 multipath list receive tests TEST: Multipath route hit ratio (.99) [ OK ] Fixes: 8ae9efb859c0 ("selftests: fib_tests: Add multipath list receive tests") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/ Tested-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Link: https://lore.kernel.org/r/20231010132113.3014691-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	tcp: Fix listen() warning with v4-mapped-v6 address.	Kuniyuki Iwashima	1	-15/+9
	syzbot reported a warning [0] introduced by commit c48ef9c4aed3 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address."). After the cited commit, a v4 socket's address matches the corresponding v4-mapped-v6 tb2 in inet_bind2_bucket_match_addr(), not vice versa. During X.X.X.X -> ::ffff:X.X.X.X order bind()s, the second bind() uses bhash and conflicts properly without checking bhash2 so that we need not check if a v4-mapped-v6 sk matches the corresponding v4 address tb2 in inet_bind2_bucket_match_addr(). However, the repro shows that we need to check that in a no-conflict case. The repro bind()s two sockets to the 2-tuples using SO_REUSEPORT and calls listen() for the first socket: from socket import * s1 = socket() s1.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1) s1.bind(('127.0.0.1', 0)) s2 = socket(AF_INET6) s2.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1) s2.bind(('::ffff:127.0.0.1', s1.getsockname()[1])) s1.listen() The second socket should belong to the first socket's tb2, but the second bind() creates another tb2 bucket because inet_bind2_bucket_find() returns NULL in inet_csk_get_port() as the v4-mapped-v6 sk does not match the corresponding v4 address tb2. bhash2[] -> tb2(::ffff:X.X.X.X) -> tb2(X.X.X.X) Then, listen() for the first socket calls inet_csk_get_port(), where the v4 address matches the v4-mapped-v6 tb2 and WARN_ON() is triggered. To avoid that, we need to check if v4-mapped-v6 sk address matches with the corresponding v4 address tb2 in inet_bind2_bucket_match(). The same checks are needed in inet_bind2_bucket_addr_match() too, so we can move all checks there and call it from inet_bind2_bucket_match(). Note that now tb->family is just an address family of tb->(v6_)?rcv_saddr and not of sockets in the bucket. This could be refactored later by defining tb->rcv_saddr as tb->v6_rcv_saddr.s6_addr32[3] and prepending ::ffff: when creating v4 tb2. [0]: WARNING: CPU: 0 PID: 5049 at net/ipv4/inet_connection_sock.c:587 inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587 Modules linked in: CPU: 0 PID: 5049 Comm: syz-executor288 Not tainted 6.6.0-rc2-syzkaller-00018-g2cf0f7156238 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023 RIP: 0010:inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587 Code: 7c 24 08 e8 4c b6 8a 01 31 d2 be 88 01 00 00 48 c7 c7 e0 94 ae 8b e8 59 2e a3 f8 2e 2e 2e 31 c0 e9 04 fe ff ff e8 ca 88 d0 f8 <0f> 0b e9 0f f9 ff ff e8 be 88 d0 f8 49 8d 7e 48 e8 65 ca 5a 00 31 RSP: 0018:ffffc90003abfbf0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888026429100 RCX: 0000000000000000 RDX: ffff88807edcbb80 RSI: ffffffff88b73d66 RDI: ffff888026c49f38 RBP: ffff888026c49f30 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9260f200 R13: ffff888026c49880 R14: 0000000000000000 R15: ffff888026429100 FS: 00005555557d5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000045ad50 CR3: 0000000025754000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> inet_csk_listen_start+0x155/0x360 net/ipv4/inet_connection_sock.c:1256 __inet_listen_sk+0x1b8/0x5c0 net/ipv4/af_inet.c:217 inet_listen+0x93/0xd0 net/ipv4/af_inet.c:239 __sys_listen+0x194/0x270 net/socket.c:1866 __do_sys_listen net/socket.c:1875 [inline] __se_sys_listen net/socket.c:1873 [inline] __x64_sys_listen+0x53/0x80 net/socket.c:1873 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f3a5bce3af9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc1a1c79e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000032 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a5bce3af9 RDX: 00007f3a5bce3af9 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 00007f3a5bd565f0 R08: 0000000000000006 R09: 0000000000000006 R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001 R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001 </TASK> Fixes: c48ef9c4aed3 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.") Reported-by: syzbot+71e724675ba3958edb31@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=71e724675ba3958edb31 Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231010013814.70571-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	bonding: Return pointer to data after pull on skb	Jiri Wiesner	1	-1/+1
	Since 429e3d123d9a ("bonding: Fix extraction of ports from the packet headers"), header offsets used to compute a hash in bond_xmit_hash() are relative to skb->data and not skb->head. If the tail of the header buffer of an skb really needs to be advanced and the operation is successful, the pointer to the data must be returned (and not a pointer to the head of the buffer). Fixes: 429e3d123d9a ("bonding: Fix extraction of ports from the packet headers") Signed-off-by: Jiri Wiesner <jwiesner@suse.de> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-12	IXP4xx MAINTAINERS entries	Krzysztof Hałasa	1	-17/+8
	Update MAINTAINERS entries for Intel IXP4xx SoCs. Linus has been handling all IXP4xx stuff since 2019 or so. Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Deepak Saxena <dsaxena@plexity.net> Link: https://lore.kernel.org/r/m3ttqxu4ru.fsf@t19.piap.pl Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-10-12	rswitch: Fix imbalance phy_power_off() calling	Yoshihiro Shimoda	1	-1/+1
	The phy_power_off() should not be called if phy_power_on() failed. So, add a condition .power_count before calls phy_power_off(). Fixes: 5cb630925b49 ("net: renesas: rswitch: Add phy_power_{on,off}() calling") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12	rswitch: Fix renesas_eth_sw_remove() implementation	Yoshihiro Shimoda	1	-4/+6
	Fix functions calling order and a condition in renesas_eth_sw_remove(). Otherwise, kernel NULL pointer dereference happens from phy_stop() if a net device opens. Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"") Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12	octeontx2-pf: Fix page pool frag allocation warning	Ratheesh Kannoth	1	-0/+1
	Since page pool param's "order" is set to 0, will result in below warn message if interface is configured with higher rx buffer size. Steps to reproduce the issue. 1. devlink dev param set pci/0002:04:00.0 name receive_buffer_size \ value 8196 cmode runtime 2. ifconfig eth0 up [ 19.901356] ------------[ cut here ]------------ [ 19.901361] WARNING: CPU: 11 PID: 12331 at net/core/page_pool.c:567 page_pool_alloc_frag+0x3c/0x230 [ 19.901449] pstate: 82401009 (Nzcv daif +PAN -UAO +TCO -DIT +SSBS BTYPE=--) [ 19.901451] pc : page_pool_alloc_frag+0x3c/0x230 [ 19.901453] lr : __otx2_alloc_rbuf+0x60/0xbc [rvu_nicpf] [ 19.901460] sp : ffff80000f66b970 [ 19.901461] x29: ffff80000f66b970 x28: 0000000000000000 x27: 0000000000000000 [ 19.901464] x26: ffff800000d15b68 x25: ffff000195b5c080 x24: ffff0002a5a32dc0 [ 19.901467] x23: ffff0001063c0878 x22: 0000000000000100 x21: 0000000000000000 [ 19.901469] x20: 0000000000000000 x19: ffff00016f781000 x18: 0000000000000000 [ 19.901472] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 19.901474] x14: 0000000000000000 x13: ffff0005ffdc9c80 x12: 0000000000000000 [ 19.901477] x11: ffff800009119a38 x10: 4c6ef2e3ba300519 x9 : ffff800000d13844 [ 19.901479] x8 : ffff0002a5a33cc8 x7 : 0000000000000030 x6 : 0000000000000030 [ 19.901482] x5 : 0000000000000005 x4 : 0000000000000000 x3 : 0000000000000a20 [ 19.901484] x2 : 0000000000001080 x1 : ffff80000f66b9d4 x0 : 0000000000001000 [ 19.901487] Call trace: [ 19.901488] page_pool_alloc_frag+0x3c/0x230 [ 19.901490] __otx2_alloc_rbuf+0x60/0xbc [rvu_nicpf] [ 19.901494] otx2_rq_aura_pool_init+0x1c4/0x240 [rvu_nicpf] [ 19.901498] otx2_open+0x228/0xa70 [rvu_nicpf] [ 19.901501] otx2vf_open+0x20/0xd0 [rvu_nicvf] [ 19.901504] __dev_open+0x114/0x1d0 [ 19.901507] __dev_change_flags+0x194/0x210 [ 19.901510] dev_change_flags+0x2c/0x70 [ 19.901512] devinet_ioctl+0x3a4/0x6c4 [ 19.901515] inet_ioctl+0x228/0x240 [ 19.901518] sock_ioctl+0x2ac/0x480 [ 19.901522] __arm64_sys_ioctl+0x564/0xe50 [ 19.901525] invoke_syscall.constprop.0+0x58/0xf0 [ 19.901529] do_el0_svc+0x58/0x150 [ 19.901531] el0_svc+0x30/0x140 [ 19.901533] el0t_64_sync_handler+0xe8/0x114 [ 19.901535] el0t_64_sync+0x1a0/0x1a4 [ 19.901537] ---[ end trace 678c0bf660ad8116 ]--- Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool") Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Link: https://lore.kernel.org/r/20231010034842.3807816-1-rkannoth@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12	nfc: nci: assert requested protocol is valid	Jeremy Cline	1	-0/+5
	The protocol is used in a bit mask to determine if the protocol is supported. Assert the provided protocol is less than the maximum defined so it doesn't potentially perform a shift-out-of-bounds and provide a clearer error for undefined protocols vs unsupported ones. Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation") Reported-and-tested-by: syzbot+0839b78e119aae1fec78@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0839b78e119aae1fec78 Signed-off-by: Jeremy Cline <jeremy@jcline.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231009200054.82557-1-jeremy@jcline.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12	af_packet: Fix fortified memcpy() without flex array.	Kuniyuki Iwashima	2	-6/+7
	Sergei Trofimovich reported a regression [0] caused by commit a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname()."). It introduced a flex array sll_addr_flex in struct sockaddr_ll as a union-ed member with sll_addr to work around the fortified memcpy() check. However, a userspace program uses a struct that has struct sockaddr_ll in the middle, where a flex array is illegal to exist. include/linux/if_packet.h:24:17: error: flexible array member 'sockaddr_ll::<unnamed union>::<unnamed struct>::sll_addr_flex' not at end of 'struct packet_info_t' 24 \| __DECLARE_FLEX_ARRAY(unsigned char, sll_addr_flex); \| ^~~~~~~~~~~~~~~~~~~~ To fix the regression, let's go back to the first attempt [1] telling memcpy() the actual size of the array. Reported-by: Sergei Trofimovich <slyich@gmail.com> Closes: https://github.com/NixOS/nixpkgs/pull/252587#issuecomment-1741733002 [0] Link: https://lore.kernel.org/netdev/20230720004410.87588-3-kuniyu@amazon.com/ [1] Fixes: a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname().") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20231009153151.75688-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-12	pinctrl: renesas: rzn1: Enable missing PINMUX	Ralph Siemsen	1	-0/+1
	Enable pin muxing (eg. programmable function), so that the RZ/N1 GPIO pins will be configured as specified by the pinmux in the DTS. This used to be enabled implicitly via CONFIG_GENERIC_PINMUX_FUNCTIONS, however that was removed, since the RZ/N1 driver does not call any of the generic pinmux functions. Fixes: 1308fb4e4eae14e6 ("pinctrl: rzn1: Do not select GENERIC_PIN{CTRL_GROUPS,MUX_FUNCTIONS}") Signed-off-by: Ralph Siemsen <ralph.siemsen@linaro.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20231004200008.1306798-1-ralph.siemsen@linaro.org Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-10-11	net: tcp: fix crashes trying to free half-baked MTU probes	Jakub Kicinski	1	-0/+1
	tcp_stream_alloc_skb() initializes the skb to use tcp_tsorted_anchor which is a union with the destructor. We need to clean that TCP-iness up before freeing. Fixes: 736013292e3c ("tcp: let tcp_mtu_probe() build headless packets") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20231010173651.3990234-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11	btrfs: add __counted_by for struct btrfs_delayed_item and use struct_size()	Gustavo A. R. Silva	2	-2/+2
	Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). While there, use struct_size() helper, instead of the open-coded version, to calculate the size for the allocation of the whole flexible structure, including of course, the flexible-array member. This code was found with the help of Coccinelle, and audited and fixed manually. Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-11	net/smc: Fix pos miscalculation in statistics	Nils Hoppmann	1	-5/+9
	SMC_STAT_PAYLOAD_SUB(_smc_stats, _tech, key, _len, _rc) will calculate wrong bucket positions for payloads of exactly 4096 bytes and (1 << (m + 12)) bytes, with m == SMC_BUF_MAX - 1. Intended bucket distribution: Assume l == size of payload, m == SMC_BUF_MAX - 1. Bucket 0 : 0 < l <= 2^13 Bucket n, 1 <= n <= m-1 : 2^(n+12) < l <= 2^(n+13) Bucket m : l > 2^(m+12) Current solution: _pos = fls64((l) >> 13) [...] _pos = (_pos < m) ? ((l == 1 << (_pos + 12)) ? _pos - 1 : _pos) : m For l == 4096, _pos == -1, but should be _pos == 0. For l == (1 << (m + 12)), _pos == m, but should be _pos == m - 1. In order to avoid special treatment of these corner cases, the calculation is adjusted. The new solution first subtracts the length by one, and then calculates the correct bucket by shifting accordingly, i.e. _pos = fls64((l - 1) >> 13), l > 0. This not only fixes the issues named above, but also makes the whole bucket assignment easier to follow. Same is done for SMC_STAT_RMB_SIZE_SUB(_smc_stats, _tech, k, _len), where the calculation of the bucket position is similar to the one named above. Fixes: e0e4b8fa5338 ("net/smc: Add SMC statistics support") Suggested-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Nils Hoppmann <niho@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-11	nfp: flower: avoid rmmod nfp crash issues	Yanguo Li	6	-23/+54
	When there are CT table entries, and you rmmod nfp, the following events can happen: task1： nfp_net_pci_remove ↓ nfp_flower_stop->(asynchronous)tcf_ct_flow_table_cleanup_work(3) ↓ nfp_zone_table_entry_destroy(1) task2： nfp_fl_ct_handle_nft_flow(2) When the execution order is (1)->(2)->(3), it will crash. Therefore, in the function nfp_fl_ct_del_flow, nf_flow_table_offload_del_cb needs to be executed synchronously. At the same time, in order to solve the deadlock problem and the problem of rtnl_lock sometimes failing, replace rtnl_lock with the private nfp_fl_lock. Fixes: 7cc93d888df7 ("nfp: flower-ct: remove callback delete deadlock") Cc: stable@vger.kernel.org Signed-off-by: Yanguo Li <yanguo.li@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-10	net: usb: dm9601: fix uninitialized variable use in dm9601_mdio_read	Javier Carrasco	1	-1/+6
	syzbot has found an uninit-value bug triggered by the dm9601 driver [1]. This error happens because the variable res is not updated if the call to dm_read_shared_word returns an error. In this particular case -EPROTO was returned and res stayed uninitialized. This can be avoided by checking the return value of dm_read_shared_word and propagating the error if the read operation failed. [1] https://syzkaller.appspot.com/bug?extid=1f53a30781af65d2c955 Cc: stable@vger.kernel.org Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com> Reported-and-tested-by: syzbot+1f53a30781af65d2c955@syzkaller.appspotmail.com Acked-by: Peter Korsgaard <peter@korsgaard.com> Fixes: d0374f4f9c35cdfbee0 ("USB: Davicom DM9601 usbnet driver") Link: https://lore.kernel.org/r/20231009-topic-dm9601_uninit_mdio_read-v2-1-f2fe39739b6c@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10	ethtool: Fix mod state of verbose no_mask bitset	Kory Maincent	1	-6/+26
	A bitset without mask in a _SET request means we want exactly the bits in the bitset to be set. This works correctly for compact format but when verbose format is parsed, ethnl_update_bitset32_verbose() only sets the bits present in the request bitset but does not clear the rest. The commit 6699170376ab fixes this issue by clearing the whole target bitmap before we start iterating. The solution proposed brought an issue with the behavior of the mod variable. As the bitset is always cleared the old val will always differ to the new val. Fix it by adding a new temporary variable which save the state of the old bitmap. Fixes: 6699170376ab ("ethtool: fix application of verbose no_mask bitset") Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231009133645.44503-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10	net: nfc: fix races in nfc_llcp_sock_get() and nfc_llcp_sock_get_sn()	Eric Dumazet	1	-18/+12
	Sili Luo reported a race in nfc_llcp_sock_get(), leading to UAF. Getting a reference on the socket found in a lookup while holding a lock should happen before releasing the lock. nfc_llcp_sock_get_sn() has a similar problem. Finally nfc_llcp_recv_snl() needs to make sure the socket found by nfc_llcp_sock_from_sn() does not disappear. Fixes: 8f50020ed9b8 ("NFC: LLCP late binding") Reported-by: Sili Luo <rootlab@huawei.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20231009123110.3735515-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10	mctp: perform route lookups under a RCU read-side lock	Jeremy Kerr	1	-6/+16
	Our current route lookups (mctp_route_lookup and mctp_route_lookup_null) traverse the net's route list without the RCU read lock held. This means the route lookup is subject to preemption, resulting in an potential grace period expiry, and so an eventual kfree() while we still have the route pointer. Add the proper read-side critical section locks around the route lookups, preventing premption and a possible parallel kfree. The remaining net->mctp.routes accesses are already under a rcu_read_lock, or protected by the RTNL for updates. Based on an analysis from Sili Luo <rootlab@huawei.com>, where introducing a delay in the route lookup could cause a UAF on simultaneous sendmsg() and route deletion. Reported-by: Sili Luo <rootlab@huawei.com> Fixes: 889b7da23abf ("mctp: Add initial routing framework") Cc: stable@vger.kernel.org Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/29c4b0e67dc1bf3571df3982de87df90cae9b631.1696837310.git.jk@codeconstruct.com.au Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-10	net: skbuff: fix kernel-doc typos	Randy Dunlap	1	-2/+2
	Correct punctuation and drop an extraneous word. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231008214121.25940-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-11	s390/bpf: Fix unwinding past the trampoline	Ilya Leoshkevich	1	-3/+14
	When functions called by the trampoline panic, the backtrace that is printed stops at the trampoline, because the trampoline does not store its caller's frame address (backchain) on stack; it also stores the return address at a wrong location. Store both the same way as is already done for the regular eBPF programs. Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231010203512.385819-3-iii@linux.ibm.com
2023-10-11	s390/bpf: Fix clobbering the caller's backchain in the trampoline	Ilya Leoshkevich	1	-2/+6
	One of the first things that s390x kernel functions do is storing the the caller's frame address (backchain) on stack. This makes unwinding possible. The backchain is always stored at frame offset 152, which is inside the 160-byte stack area, that the functions allocate for their callees. The callees must preserve the backchain; the remaining 152 bytes they may use as they please. Currently the trampoline uses all 160 bytes, clobbering the backchain. This causes kernel panics when using __builtin_return_address() in functions called by the trampoline. Fix by reducing the usage of the caller-reserved stack area by 8 bytes in the trampoline. Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()") Reported-by: Song Liu <song@kernel.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231010203512.385819-2-iii@linux.ibm.com
2023-10-10	KEYS: trusted: Remove redundant static calls usage	Sumit Garg	1	-8/+5
	Static calls invocations aren't well supported from module __init and __exit functions. Especially the static call from cleanup_trusted() led to a crash on x86 kernel with CONFIG_DEBUG_VIRTUAL=y. However, the usage of static call invocations for trusted_key_init() and trusted_key_exit() don't add any value from either a performance or security perspective. Hence switch to use indirect function calls instead. Note here that although it will fix the current crash report, ultimately the static call infrastructure should be fixed to either support its future usage from module __init and __exit functions or not. Reported-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Link: https://lore.kernel.org/lkml/ZRhKq6e5nF%2F4ZIV1@fedora/#t Fixes: 5d0682be3189 ("KEYS: trusted: Add generic trusted keys framework") Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-10-10	Revert "btrfs: reject unknown mount options early"	David Sterba	1	-4/+0
	This reverts commit 5f521494cc73520ffac18ede0758883b9aedd018. The patch breaks mounts with security mount options like $ mount -o context=system_u:object_r:root_t:s0 /dev/sdX /mn mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdX, missing codepage or helper program, ... We cannot reject all unknown options in btrfs_parse_subvol_options() as intended, the security options can be present at this point and it's not possible to enumerate them in a future proof way. This means unknown mount options are silently accepted like before when the filesystem is mounted with either -o subvol=/path or as followup mounts of the same device. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-10	net/mlx5e: Again mutually exclude RX-FCS and RX-port-timestamp	Will Mortensen	1	-1/+2
	Commit 1e66220948df8 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change") seems to have accidentally inverted the logic added in commit 0bc73ad46a76 ("net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp"). The impact of this is a little unclear since it seems the FCS scattered with RX-FCS is (usually?) correct regardless. Fixes: 1e66220948df8 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change") Tested-by: Charlotte Tan <charlotte@extrahop.com> Reviewed-by: Charlotte Tan <charlotte@extrahop.com> Cc: Adham Faris <afaris@nvidia.com> Cc: Aya Levin <ayal@nvidia.com> Cc: Tariq Toukan <tariqt@nvidia.com> Cc: Moshe Shemesh <moshe@nvidia.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Will Mortensen <will@extrahop.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20231006053706.514618-1-will@extrahop.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10	net/smc: Fix dependency of SMC on ISM	Gerd Bayer	2	-1/+2
	When the SMC protocol is built into the kernel proper while ISM is configured to be built as module, linking the kernel fails due to unresolved dependencies out of net/smc/smc_ism.o to ism_get_smcd_ops, ism_register_client, and ism_unregister_client as reported via the linux-next test automation (see link). This however is a bug introduced a while ago. Correct the dependency list in ISM's and SMC's Kconfig to reflect the dependencies that are actually inverted. With this you cannot build a kernel with CONFIG_SMC=y and CONFIG_ISM=m. Either ISM needs to be 'y', too - or a 'n'. That way, SMC can still be configured on non-s390 architectures that do not have (nor need) an ISM driver. Fixes: 89e7d2ba61b7 ("net/ism: Add new API for client registration") Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/linux-next/d53b5b50-d894-4df8-8969-fd39e63440ae@infradead.org/ Co-developed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Simon Horman <horms@kernel.org> # build-tested Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/20231006125847.1517840-1-gbayer@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10	ixgbe: fix crash with empty VF macvlan list	Dan Carpenter	1	-2/+3
	The adapter->vf_mvs.l list needs to be initialized even if the list is empty. Otherwise it will lead to crashes. Fixes: a1cbb15c1397 ("ixgbe: Add macvlan support for VF") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/ZSADNdIw8zFx1xw2@kadam Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10	net/mlx5e: macsec: use update_pn flag instead of PN comparation	Radu Pirea (NXP OSS)	1	-2/+2
	When updating the SA, use the new update_pn flags instead of comparing the new PN with the initial one. Comparing the initial PN value with the new value will allow the user to update the SA using the initial PN value as a parameter like this: $ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \ ead3664f508eb06c40ac7104cdae4ce5 $ ip macsec set macsec0 tx sa 0 pn 1 off Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support") Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support") Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10	net: phy: mscc: macsec: reject PN update requests	Radu Pirea (NXP OSS)	1	-0/+6
	Updating the PN is not supported. Return -EINVAL if update_pn is true. The following command succeeded, but it should fail because the driver does not update the PN: ip macsec set macsec0 tx sa 0 pn 232 on Fixes: 28c5107aa904 ("net: phy: mscc: macsec support") Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10	octeontx2-pf: mcs: update PN only when update_pn is true	Radu Pirea (NXP OSS)	1	-4/+9
	When updating SA, update the PN only when the update_pn flag is true. Otherwise, the PN will be reset to its previous value using the following command and this should not happen: $ ip macsec set macsec0 tx sa 0 on Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading") Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10	net: macsec: indicate next pn update when offloading	Radu Pirea (NXP OSS)	2	-0/+3
	Indicate next PN update using update_pn flag in macsec_context. Offloaded MACsec implementations does not know whether or not the MACSEC_SA_ATTR_PN attribute was passed for an SA update and assume that next PN should always updated, but this is not always true. The PN can be reset to its initial value using the following command: $ ip macsec set macsec0 tx sa 0 off #octeontx2-pf case Or, the update PN command will succeed even if the driver does not support PN updates. $ ip macsec set macsec0 tx sa 0 pn 1 on #mscc phy driver case Comparing the initial PN with the new PN value is not a solution. When the user updates the PN using its initial value the command will succeed, even if the driver does not support it. Like this: $ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \ ead3664f508eb06c40ac7104cdae4ce5 $ ip macsec set macsec0 tx sa 0 pn 1 on #mlx5 case Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-10	scsi: Do not rescan devices with a suspended queue	Damien Le Moal	1	-5/+6
	Commit ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices") modified scsi_rescan_device() to avoid attempting rescanning a suspended device. However, the modification added a check to verify that a SCSI device is in the running state without checking if the device request queue (in the case of block device) is also running, thus allowing the exectuion of internal requests. Without checking the device request queue, commit ff48b37802e5 fix is incomplete and deadlocks on resume can still happen. Use blk_queue_pm_only() to check if the device request queue allows executing commands in addition to checking the SCSI device state. Reported-by: Petr Tesarik <petr@tesarici.cz> Fixes: ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices") Cc: stable@vger.kernel.org Tested-by: Petr Tesarik <petr@tesarici.cz> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-10-10	ata: pata_parport: fit3: implement IDE command set registers	Ondrej Zary	1	-12/+2
	fit3 protocol driver does not support accessing IDE control registers (device control/altstatus). The DOS driver does not use these registers either (as observed from DOSEMU trace). But the HW seems to be capable of accessing these registers - I simply tried bit 3 and it works! The control register is required to properly reset ATAPI devices or they will be detected only once (after a power cycle). Tested with EXP Computer CD-865 with MC-1285B EPP cable and TransDisk 3000. Signed-off-by: Ondrej Zary <linux@zary.sk> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-10-10	ata: pata_parport: add custom version of wait_after_reset	Ondrej Zary	1	-1/+67
	Some parallel adapters (e.g. EXP Computer MC-1285B EPP Cable) return bogus values when there's no master device present. This can cause reset to fail, preventing the lone slave device (such as EXP Computer CD-865) from working. Add custom version of wait_after_reset that ignores master failure when a slave device is present. The custom version is also needed because the generic ata_sff_wait_after_reset uses direct port I/O for slave device detection. Signed-off-by: Ondrej Zary <linux@zary.sk> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-10-10	ata: pata_parport: implement set_devctl	Ondrej Zary	1	-0/+8
	Add missing ops->sff_set_devctl implementation. Fixes: 246a1c4c6b7f ("ata: pata_parport: add driver (PARIDE replacement)") Cc: stable@vger.kernel.org Signed-off-by: Ondrej Zary <linux@zary.sk> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-10-10	ata: pata_parport: fix pata_parport_devchk	Ondrej Zary	1	-1/+1
	There's a 'x' missing in 0x55 in pata_parport_devchk(), causing the detection to always fail. Fix it. Fixes: 246a1c4c6b7f ("ata: pata_parport: add driver (PARIDE replacement)") Cc: stable@vger.kernel.org Signed-off-by: Ondrej Zary <linux@zary.sk> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
2023-10-10	hv/hv_kvp_daemon:Support for keyfile based connection profile	Shradha Gupta	2	-37/+235
	Ifcfg config file support in NetworkManger is deprecated. This patch provides support for the new keyfile config format for connection profiles in NetworkManager. The patch modifies the hv_kvp_daemon code to generate the new network configuration in keyfile format(.ini-style format) along with a ifcfg format configuration. The ifcfg format configuration is also retained to support easy backward compatibility for distro vendors. These configurations are stored in temp files which are further translated using the hv_set_ifconfig.sh script. This script is implemented by individual distros based on the network management commands supported. For example, RHEL's implementation could be found here: https://gitlab.com/redhat/centos-stream/src/hyperv-daemons/-/blob/c9s/hv_set_ifconfig.sh Debian's implementation could be found here: https://github.com/endlessm/linux/blob/master/debian/cloud-tools/hv_set_ifconfig The next part of this support is to let the Distro vendors consume these modified implementations to the new configuration format. Tested-on: Rhel9(Hyper-V, Azure)(nm and ifcfg files verified) Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Ani Sinha <anisinha@redhat.com> Signed-off-by: Wei Liu <wei.liu@kernel.org> Link: https://lore.kernel.org/r/1696847920-31125-1-git-send-email-shradhagupta@linux.microsoft.com
2023-10-09	net: refine debug info in skb_checksum_help()	Eric Dumazet	1	-2/+6
	syzbot uses panic_on_warn. This means that the skb_dump() I added in the blamed commit are not even called. Rewrite this so that we get the needed skb dump before syzbot crashes. Fixes: eeee4b77dc52 ("net: add more debug info in skb_checksum_help()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20231006173355.2254983-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-09	selftests/bpf: Add testcase for async callback return value failure	David Vernet	2	-2/+51
	A previous commit updated the verifier to print an accurate failure message for when someone specifies a nonzero return value from an async callback. This adds a testcase for validating that the verifier emits the correct message in such a case. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231009161414.235829-2-void@manifault.com
2023-10-09	bpf: Fix verifier log for async callback return values	David Vernet	1	-3/+3
	The verifier, as part of check_return_code(), verifies that async callbacks such as from e.g. timers, will return 0. It does this by correctly checking that R0->var_off is in tnum_const(0), which effectively checks that it's in a range of 0. If this condition fails, however, it prints an error message which says that the value should have been in (0x0; 0x1). This results in possibly confusing output such as the following in which an async callback returns 1: At async callback the register R0 has value (0x1; 0x0) should have been in (0x0; 0x1) The fix is easy -- we should just pass the tnum_const(0) as the correct range to verbose_invalid_scalar(), which will then print the following: At async callback the register R0 has value (0x1; 0x0) should have been in (0x0; 0x0) Fixes: bfc6bb74e4f1 ("bpf: Implement verifier support for validation of async callbacks.") Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231009161414.235829-1-void@manifault.com
2023-10-09	xdp: Fix zero-size allocation warning in xskq_create()	Andrew Kanner	1	-0/+10
	Syzkaller reported the following issue: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361) Modules linked in: CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12 Hardware name: Generic DT based system unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258) show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680) __warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700) warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3)) __vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478) vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40) xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286) xsk_setsockopt from __sys_setsockopt (net/socket.c:2308) __sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68) xskq_get_ring_size() uses struct_size() macro to safely calculate the size of struct xsk_queue and q->nentries of desc members. But the syzkaller repro was able to set q->nentries with the value initially taken from copy_from_sockptr() high enough to return SIZE_MAX by struct_size(). The next PAGE_ALIGN(size) is such case will overflow the size_t value and set it to 0. This will trigger WARN_ON_ONCE in vmalloc_user() -> __vmalloc_node_range(). The issue is reproducible on 32-bit arm kernel. Fixes: 9f78bf330a66 ("xsk: support use vaddr as ring") Reported-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000c84b4705fb31741e@google.com/T/ Reported-by: syzbot+b132693e925cbbd89e26@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000e20df20606ebab4f@google.com/T/ Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://syzkaller.appspot.com/bug?extid=fae676d3cf469331fc89 Link: https://lore.kernel.org/bpf/20231007075148.1759-1-andrew.kanner@gmail.com
2023-10-09	riscv, bpf: Track both a0 (RISC-V ABI) and a5 (BPF) return values	Björn Töpel	1	-4/+9
	The RISC-V BPF uses a5 for BPF return values, which are zero-extended, whereas the RISC-V ABI uses a0 which is sign-extended. In other words, a5 and a0 can differ, and are used in different context. The BPF trampoline are used for both BPF programs, and regular kernel functions. Make sure that the RISC-V BPF trampoline saves, and restores both a0 and a5. Fixes: 49b5e77ae3e2 ("riscv, bpf: Add bpf trampoline support for RV64") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231004120706.52848-3-bjorn@kernel.org
2023-10-09	riscv, bpf: Sign-extend return values	Björn Töpel	1	-2/+3
	The RISC-V architecture does not expose sub-registers, and hold all 32-bit values in a sign-extended format [1] [2]: \| The compiler and calling convention maintain an invariant that all \| 32-bit values are held in a sign-extended format in 64-bit \| registers. Even 32-bit unsigned integers extend bit 31 into bits \| 63 through 32. Consequently, conversion between unsigned and \| signed 32-bit integers is a no-op, as is conversion from a signed \| 32-bit integer to a signed 64-bit integer. While BPF, on the other hand, exposes sub-registers, and use zero-extension (similar to arm64/x86). This has led to some subtle bugs, where a BPF JITted program has not sign-extended the a0 register (return value in RISC-V land), passed the return value up the kernel, e.g.: \| int from_bpf(void); \| \| long foo(void) \| { \| return from_bpf(); \| } Here, a0 would be 0xffff_ffff, instead of the expected 0xffff_ffff_ffff_ffff. Internally, the RISC-V JIT uses a5 as a dedicated register for BPF return values. Keep a5 zero-extended, but explicitly sign-extend a0 (which is used outside BPF land). Now that a0 (RISC-V ABI) and a5 (BPF ABI) differs, a0 is only moved to a5 for non-BPF native calls (BPF_PSEUDO_CALL). Fixes: 2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G") Signed-off-by: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-056b6ff-2023-10-02/unpriv-isa-asciidoc.pdf # [2] Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/draft-20230929-e5c800e661a53efe3c2678d71a306323b60eb13b/riscv-abi.pdf # [2] Link: https://lore.kernel.org/bpf/20231004120706.52848-2-bjorn@kernel.org
2023-10-09	printk: flush consoles before checking progress	John Ogness	1	-1/+7
	Commit 9e70a5e109a4 ("printk: Add per-console suspended state") removed console lock usage during resume and replaced it with the clearly defined console_list_lock and srcu mechanisms. However, the console lock usage had an important side-effect of flushing the consoles. After its removal, consoles were no longer flushed before checking their progress. Add the console_lock/console_unlock dance to the beginning of __pr_flush() to actually flush the consoles before checking their progress. Also add comments to clarify this additional usage of the console lock. Note that console_unlock() does not guarantee flushing all messages since the commit dbdda842fe96f89 ("printk: Add console owner and waiter logic to load balance console writes"). Reported-by: Todd Brandt <todd.e.brandt@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217955 Fixes: 9e70a5e109a4 ("printk: Add per-console suspended state") Co-developed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: John Ogness <john.ogness@linutronix.de> Link: https://lore.kernel.org/r/20231006082151.6969-2-pmladek@suse.com
2023-10-09	xen/events: replace evtchn_rwlock with RCU	Juergen Gross	1	-41/+46
	In unprivileged Xen guests event handling can cause a deadlock with Xen console handling. The evtchn_rwlock and the hvc_lock are taken in opposite sequence in __hvc_poll() and in Xen console IRQ handling. Normally this is no problem, as the evtchn_rwlock is taken as a reader in both paths, but as soon as an event channel is being closed, the lock will be taken as a writer, which will cause read_lock() to block: CPU0 CPU1 CPU2 (IRQ handling) (__hvc_poll()) (closing event channel) read_lock(evtchn_rwlock) spin_lock(hvc_lock) write_lock(evtchn_rwlock) [blocks] spin_lock(hvc_lock) [blocks] read_lock(evtchn_rwlock) [blocks due to writer waiting, and not in_interrupt()] This issue can be avoided by replacing evtchn_rwlock with RCU in xen_free_irq(). Note that RCU is used only to delay freeing of the irq_info memory. There is no RCU based dereferencing or replacement of pointers involved. In order to avoid potential races between removing the irq_info reference and handling of interrupts, set the irq_info pointer to NULL only when freeing its memory. The IRQ itself must be freed at that time, too, as otherwise the same IRQ number could be allocated again before handling of the old instance would have been finished. This is XSA-441 / CVE-2023-34324. Fixes: 54c9de89895e ("xen/events: add a new "late EOI" evtchn framework") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2023-10-09	ALSA: usb-audio: Fix microphone sound on Nexigo webcam.	Christos Skevis	2	-0/+9
	I own an external usb Webcam, model NexiGo N930AF, which had low mic volume and inconsistent sound quality. Video works as expected. (snip) [ +0.047857] usb 5-1: new high-speed USB device number 2 using xhci_hcd [ +0.003406] usb 5-1: New USB device found, idVendor=1bcf, idProduct=2283, bcdDevice=12.17 [ +0.000007] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ +0.000004] usb 5-1: Product: NexiGo N930AF FHD Webcam [ +0.000003] usb 5-1: Manufacturer: SHENZHEN AONI ELECTRONIC CO., LTD [ +0.000004] usb 5-1: SerialNumber: 20201217011 [ +0.003900] usb 5-1: Found UVC 1.00 device NexiGo N930AF FHD Webcam (1bcf:2283) [ +0.025726] usb 5-1: 3:1: cannot get usb sound sample rate freq at ep 0x86 [ +0.071482] usb 5-1: 3:2: cannot get usb sound sample rate freq at ep 0x86 [ +0.004679] usb 5-1: 3:3: cannot get usb sound sample rate freq at ep 0x86 [ +0.051607] usb 5-1: Warning! Unlikely big volume range (=4096), cval->res is probably wrong. [ +0.000005] usb 5-1: [7] FU [Mic Capture Volume] ch = 1, val = 0/4096/1 Set up quirk cval->res to 16 for 256 levels, Set GET_SAMPLE_RATE quirk flag to stop trying to get the sample rate. Confirmed that happened anyway later due to the backoff mechanism, after 3 failures All audio stream on device interfaces share the same values, apart from wMaxPacketSize and tSamFreq : (snip) Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 3 bAlternateSetting 3 bNumEndpoints 1 bInterfaceClass 1 Audio bInterfaceSubClass 2 Streaming bInterfaceProtocol 0 iInterface 0 AudioStreaming Interface Descriptor: bLength 7 bDescriptorType 36 bDescriptorSubtype 1 (AS_GENERAL) bTerminalLink 8 bDelay 1 frames wFormatTag 0x0001 PCM AudioStreaming Interface Descriptor: bLength 11 bDescriptorType 36 bDescriptorSubtype 2 (FORMAT_TYPE) bFormatType 1 (FORMAT_TYPE_I) bNrChannels 1 bSubframeSize 2 bBitResolution 16 bSamFreqType 1 Discrete tSamFreq[ 0] 44100 Endpoint Descriptor: bLength 9 bDescriptorType 5 bEndpointAddress 0x86 EP 6 IN bmAttributes 5 Transfer Type Isochronous Synch Type Asynchronous Usage Type Data wMaxPacketSize 0x005c 1x 92 bytes bInterval 4 bRefresh 0 bSynchAddress 0 AudioStreaming Endpoint Descriptor: bLength 7 bDescriptorType 37 bDescriptorSubtype 1 (EP_GENERAL) bmAttributes 0x01 Sampling Frequency bLockDelayUnits 0 Undefined wLockDelay 0x0000 (snip) Based on the usb data about manufacturer, SPCA2281B3 is the most likely controller IC Manufacturer does not provide link for datasheet nor detailed specs. No way to confirm if the firmware supports any other way of getting the sample rate. Testing patch provides consistent good sound recording quality and volume range. (snip) [ +0.045764] usb 5-1: new high-speed USB device number 2 using xhci_hcd [ +0.106290] usb 5-1: New USB device found, idVendor=1bcf, idProduct=2283, bcdDevice=12.17 [ +0.000006] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ +0.000004] usb 5-1: Product: NexiGo N930AF FHD Webcam [ +0.000003] usb 5-1: Manufacturer: SHENZHEN AONI ELECTRONIC CO., LTD [ +0.000004] usb 5-1: SerialNumber: 20201217011 [ +0.043700] usb 5-1: set resolution quirk: cval->res = 16 [ +0.002585] usb 5-1: Found UVC 1.00 device NexiGo N930AF FHD Webcam (1bcf:2283) Signed-off-by: Christos Skevis <xristos.thes@gmail.com> Link: https://lore.kernel.org/r/20231006155330.399393-1-xristos.thes@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-10-08	Linux 6.6-rc5	Linus Torvalds	1	-1/+1