Age | Commit message (Collapse) | Author | Files | Lines |
|
As reported by Tom, .NET and applications build on top of it rely
on connect(AF_UNSPEC) to async cancel pending I/O operations on TCP
socket.
The blamed commit below caused a regression, as such cancellation
can now fail.
As suggested by Eric, this change addresses the problem explicitly
causing blocking I/O operation to terminate immediately (with an error)
when a concurrent disconnect() is executed.
Instead of tracking the number of threads blocked on a given socket,
track the number of disconnect() issued on such socket. If such counter
changes after a blocking operation releasing and re-acquiring the socket
lock, error out the current operation.
Fixes: 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting")
Reported-by: Tom Deseyn <tdeseyn@redhat.com>
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1886305
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/f3b95e47e3dbed840960548aebaa8d954372db41.1697008693.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since the introduction of the ice driver the code has been
double-shifting the RSS enabling field, because the define already has
shifts in it and can't have the regular pattern of "a << shiftval &
mask" applied.
Most places in the code got it right, but one line was still wrong. Fix
this one location for easy backports to stable. An in-progress patch
fixes the defines to "standard" and will be applied as part of the
regular -next process sometime after this one.
Fixes: d76a60ba7afb ("ice: Add support for VLANs and offloads")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
CC: stable@vger.kernel.org
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231010203101.406248-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In bcm_sf2_mdio_register(), the class_find_device() will call get_device()
to increment reference count for priv->master_mii_bus->dev if
of_mdio_find_bus() succeeds. If mdiobus_alloc() or mdiobus_register()
fails, it will call get_device() twice without decrement reference count
for the device. And it is the same if bcm_sf2_mdio_register() succeeds but
fails in bcm_sf2_sw_probe(), or if bcm_sf2_sw_probe() succeeds. If the
reference count has not decremented to zero, the dev related resource will
not be freed.
So remove the get_device() in bcm_sf2_mdio_register(), and call
put_device() if mdiobus_alloc() or mdiobus_register() fails and in
bcm_sf2_mdio_unregister() to solve the issue.
And as Simon suggested, unwind from errors for bcm_sf2_mdio_register() and
just return 0 if it succeeds to make it cleaner.
Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://lore.kernel.org/r/20231011032419.2423290-1-ruanjinjie@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The tests rely on the IPv{4,6} FIB trace points being triggered once for
each forwarded packet. If receive processing is deferred to the
ksoftirqd task these invocations will not be counted and the tests will
fail. Fix by specifying the '-a' flag to avoid perf from filtering on
the mausezahn task.
Before:
# ./fib_tests.sh -t ipv4_mpath_list
IPv4 multipath list receive tests
TEST: Multipath route hit ratio (.68) [FAIL]
# ./fib_tests.sh -t ipv6_mpath_list
IPv6 multipath list receive tests
TEST: Multipath route hit ratio (.27) [FAIL]
After:
# ./fib_tests.sh -t ipv4_mpath_list
IPv4 multipath list receive tests
TEST: Multipath route hit ratio (1.00) [ OK ]
# ./fib_tests.sh -t ipv6_mpath_list
IPv6 multipath list receive tests
TEST: Multipath route hit ratio (.99) [ OK ]
Fixes: 8ae9efb859c0 ("selftests: fib_tests: Add multipath list receive tests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Link: https://lore.kernel.org/r/20231010132113.3014691-3-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The test relies on the fib:fib_table_lookup trace point being triggered
once for each forwarded packet. If RP filter is not disabled, the trace
point will be triggered twice for each packet (for source validation and
forwarding), potentially masking actual bugs. Fix by explicitly
disabling RP filter.
Before:
# ./fib_tests.sh -t ipv4_mpath_list
IPv4 multipath list receive tests
TEST: Multipath route hit ratio (1.99) [ OK ]
After:
# ./fib_tests.sh -t ipv4_mpath_list
IPv4 multipath list receive tests
TEST: Multipath route hit ratio (.99) [ OK ]
Fixes: 8ae9efb859c0 ("selftests: fib_tests: Add multipath list receive tests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/netdev/202309191658.c00d8b8-oliver.sang@intel.com/
Tested-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Link: https://lore.kernel.org/r/20231010132113.3014691-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot reported a warning [0] introduced by commit c48ef9c4aed3 ("tcp: Fix
bind() regression for v4-mapped-v6 non-wildcard address.").
After the cited commit, a v4 socket's address matches the corresponding
v4-mapped-v6 tb2 in inet_bind2_bucket_match_addr(), not vice versa.
During X.X.X.X -> ::ffff:X.X.X.X order bind()s, the second bind() uses
bhash and conflicts properly without checking bhash2 so that we need not
check if a v4-mapped-v6 sk matches the corresponding v4 address tb2 in
inet_bind2_bucket_match_addr(). However, the repro shows that we need
to check that in a no-conflict case.
The repro bind()s two sockets to the 2-tuples using SO_REUSEPORT and calls
listen() for the first socket:
from socket import *
s1 = socket()
s1.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1)
s1.bind(('127.0.0.1', 0))
s2 = socket(AF_INET6)
s2.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1)
s2.bind(('::ffff:127.0.0.1', s1.getsockname()[1]))
s1.listen()
The second socket should belong to the first socket's tb2, but the second
bind() creates another tb2 bucket because inet_bind2_bucket_find() returns
NULL in inet_csk_get_port() as the v4-mapped-v6 sk does not match the
corresponding v4 address tb2.
bhash2[] -> tb2(::ffff:X.X.X.X) -> tb2(X.X.X.X)
Then, listen() for the first socket calls inet_csk_get_port(), where the
v4 address matches the v4-mapped-v6 tb2 and WARN_ON() is triggered.
To avoid that, we need to check if v4-mapped-v6 sk address matches with
the corresponding v4 address tb2 in inet_bind2_bucket_match().
The same checks are needed in inet_bind2_bucket_addr_match() too, so we
can move all checks there and call it from inet_bind2_bucket_match().
Note that now tb->family is just an address family of tb->(v6_)?rcv_saddr
and not of sockets in the bucket. This could be refactored later by
defining tb->rcv_saddr as tb->v6_rcv_saddr.s6_addr32[3] and prepending
::ffff: when creating v4 tb2.
[0]:
WARNING: CPU: 0 PID: 5049 at net/ipv4/inet_connection_sock.c:587 inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587
Modules linked in:
CPU: 0 PID: 5049 Comm: syz-executor288 Not tainted 6.6.0-rc2-syzkaller-00018-g2cf0f7156238 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/04/2023
RIP: 0010:inet_csk_get_port+0xf96/0x2350 net/ipv4/inet_connection_sock.c:587
Code: 7c 24 08 e8 4c b6 8a 01 31 d2 be 88 01 00 00 48 c7 c7 e0 94 ae 8b e8 59 2e a3 f8 2e 2e 2e 31 c0 e9 04 fe ff ff e8 ca 88 d0 f8 <0f> 0b e9 0f f9 ff ff e8 be 88 d0 f8 49 8d 7e 48 e8 65 ca 5a 00 31
RSP: 0018:ffffc90003abfbf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff888026429100 RCX: 0000000000000000
RDX: ffff88807edcbb80 RSI: ffffffff88b73d66 RDI: ffff888026c49f38
RBP: ffff888026c49f30 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff9260f200
R13: ffff888026c49880 R14: 0000000000000000 R15: ffff888026429100
FS: 00005555557d5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000045ad50 CR3: 0000000025754000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
inet_csk_listen_start+0x155/0x360 net/ipv4/inet_connection_sock.c:1256
__inet_listen_sk+0x1b8/0x5c0 net/ipv4/af_inet.c:217
inet_listen+0x93/0xd0 net/ipv4/af_inet.c:239
__sys_listen+0x194/0x270 net/socket.c:1866
__do_sys_listen net/socket.c:1875 [inline]
__se_sys_listen net/socket.c:1873 [inline]
__x64_sys_listen+0x53/0x80 net/socket.c:1873
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f3a5bce3af9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc1a1c79e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000032
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a5bce3af9
RDX: 00007f3a5bce3af9 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007f3a5bd565f0 R08: 0000000000000006 R09: 0000000000000006
R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000001
R13: 431bde82d7b634db R14: 0000000000000001 R15: 0000000000000001
</TASK>
Fixes: c48ef9c4aed3 ("tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.")
Reported-by: syzbot+71e724675ba3958edb31@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=71e724675ba3958edb31
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231010013814.70571-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since 429e3d123d9a ("bonding: Fix extraction of ports from the packet
headers"), header offsets used to compute a hash in bond_xmit_hash() are
relative to skb->data and not skb->head. If the tail of the header buffer
of an skb really needs to be advanced and the operation is successful, the
pointer to the data must be returned (and not a pointer to the head of the
buffer).
Fixes: 429e3d123d9a ("bonding: Fix extraction of ports from the packet headers")
Signed-off-by: Jiri Wiesner <jwiesner@suse.de>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Update MAINTAINERS entries for Intel IXP4xx SoCs.
Linus has been handling all IXP4xx stuff since 2019 or so.
Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Deepak Saxena <dsaxena@plexity.net>
Link: https://lore.kernel.org/r/m3ttqxu4ru.fsf@t19.piap.pl
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The phy_power_off() should not be called if phy_power_on() failed.
So, add a condition .power_count before calls phy_power_off().
Fixes: 5cb630925b49 ("net: renesas: rswitch: Add phy_power_{on,off}() calling")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix functions calling order and a condition in renesas_eth_sw_remove().
Otherwise, kernel NULL pointer dereference happens from phy_stop() if
a net device opens.
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since page pool param's "order" is set to 0, will result
in below warn message if interface is configured with higher
rx buffer size.
Steps to reproduce the issue.
1. devlink dev param set pci/0002:04:00.0 name receive_buffer_size \
value 8196 cmode runtime
2. ifconfig eth0 up
[ 19.901356] ------------[ cut here ]------------
[ 19.901361] WARNING: CPU: 11 PID: 12331 at net/core/page_pool.c:567 page_pool_alloc_frag+0x3c/0x230
[ 19.901449] pstate: 82401009 (Nzcv daif +PAN -UAO +TCO -DIT +SSBS BTYPE=--)
[ 19.901451] pc : page_pool_alloc_frag+0x3c/0x230
[ 19.901453] lr : __otx2_alloc_rbuf+0x60/0xbc [rvu_nicpf]
[ 19.901460] sp : ffff80000f66b970
[ 19.901461] x29: ffff80000f66b970 x28: 0000000000000000 x27: 0000000000000000
[ 19.901464] x26: ffff800000d15b68 x25: ffff000195b5c080 x24: ffff0002a5a32dc0
[ 19.901467] x23: ffff0001063c0878 x22: 0000000000000100 x21: 0000000000000000
[ 19.901469] x20: 0000000000000000 x19: ffff00016f781000 x18: 0000000000000000
[ 19.901472] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 19.901474] x14: 0000000000000000 x13: ffff0005ffdc9c80 x12: 0000000000000000
[ 19.901477] x11: ffff800009119a38 x10: 4c6ef2e3ba300519 x9 : ffff800000d13844
[ 19.901479] x8 : ffff0002a5a33cc8 x7 : 0000000000000030 x6 : 0000000000000030
[ 19.901482] x5 : 0000000000000005 x4 : 0000000000000000 x3 : 0000000000000a20
[ 19.901484] x2 : 0000000000001080 x1 : ffff80000f66b9d4 x0 : 0000000000001000
[ 19.901487] Call trace:
[ 19.901488] page_pool_alloc_frag+0x3c/0x230
[ 19.901490] __otx2_alloc_rbuf+0x60/0xbc [rvu_nicpf]
[ 19.901494] otx2_rq_aura_pool_init+0x1c4/0x240 [rvu_nicpf]
[ 19.901498] otx2_open+0x228/0xa70 [rvu_nicpf]
[ 19.901501] otx2vf_open+0x20/0xd0 [rvu_nicvf]
[ 19.901504] __dev_open+0x114/0x1d0
[ 19.901507] __dev_change_flags+0x194/0x210
[ 19.901510] dev_change_flags+0x2c/0x70
[ 19.901512] devinet_ioctl+0x3a4/0x6c4
[ 19.901515] inet_ioctl+0x228/0x240
[ 19.901518] sock_ioctl+0x2ac/0x480
[ 19.901522] __arm64_sys_ioctl+0x564/0xe50
[ 19.901525] invoke_syscall.constprop.0+0x58/0xf0
[ 19.901529] do_el0_svc+0x58/0x150
[ 19.901531] el0_svc+0x30/0x140
[ 19.901533] el0t_64_sync_handler+0xe8/0x114
[ 19.901535] el0t_64_sync+0x1a0/0x1a4
[ 19.901537] ---[ end trace 678c0bf660ad8116 ]---
Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Link: https://lore.kernel.org/r/20231010034842.3807816-1-rkannoth@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The protocol is used in a bit mask to determine if the protocol is
supported. Assert the provided protocol is less than the maximum
defined so it doesn't potentially perform a shift-out-of-bounds and
provide a clearer error for undefined protocols vs unsupported ones.
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Reported-and-tested-by: syzbot+0839b78e119aae1fec78@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=0839b78e119aae1fec78
Signed-off-by: Jeremy Cline <jeremy@jcline.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231009200054.82557-1-jeremy@jcline.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Sergei Trofimovich reported a regression [0] caused by commit a0ade8404c3b
("af_packet: Fix warning of fortified memcpy() in packet_getname().").
It introduced a flex array sll_addr_flex in struct sockaddr_ll as a
union-ed member with sll_addr to work around the fortified memcpy() check.
However, a userspace program uses a struct that has struct sockaddr_ll in
the middle, where a flex array is illegal to exist.
include/linux/if_packet.h:24:17: error: flexible array member 'sockaddr_ll::<unnamed union>::<unnamed struct>::sll_addr_flex' not at end of 'struct packet_info_t'
24 | __DECLARE_FLEX_ARRAY(unsigned char, sll_addr_flex);
| ^~~~~~~~~~~~~~~~~~~~
To fix the regression, let's go back to the first attempt [1] telling
memcpy() the actual size of the array.
Reported-by: Sergei Trofimovich <slyich@gmail.com>
Closes: https://github.com/NixOS/nixpkgs/pull/252587#issuecomment-1741733002 [0]
Link: https://lore.kernel.org/netdev/20230720004410.87588-3-kuniyu@amazon.com/ [1]
Fixes: a0ade8404c3b ("af_packet: Fix warning of fortified memcpy() in packet_getname().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20231009153151.75688-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Enable pin muxing (eg. programmable function), so that the RZ/N1 GPIO
pins will be configured as specified by the pinmux in the DTS.
This used to be enabled implicitly via CONFIG_GENERIC_PINMUX_FUNCTIONS,
however that was removed, since the RZ/N1 driver does not call any of
the generic pinmux functions.
Fixes: 1308fb4e4eae14e6 ("pinctrl: rzn1: Do not select GENERIC_PIN{CTRL_GROUPS,MUX_FUNCTIONS}")
Signed-off-by: Ralph Siemsen <ralph.siemsen@linaro.org>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20231004200008.1306798-1-ralph.siemsen@linaro.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
tcp_stream_alloc_skb() initializes the skb to use tcp_tsorted_anchor
which is a union with the destructor. We need to clean that
TCP-iness up before freeing.
Fixes: 736013292e3c ("tcp: let tcp_mtu_probe() build headless packets")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20231010173651.3990234-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for
array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
While there, use struct_size() helper, instead of the open-coded
version, to calculate the size for the allocation of the whole
flexible structure, including of course, the flexible-array member.
This code was found with the help of Coccinelle, and audited and
fixed manually.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
SMC_STAT_PAYLOAD_SUB(_smc_stats, _tech, key, _len, _rc) will calculate
wrong bucket positions for payloads of exactly 4096 bytes and
(1 << (m + 12)) bytes, with m == SMC_BUF_MAX - 1.
Intended bucket distribution:
Assume l == size of payload, m == SMC_BUF_MAX - 1.
Bucket 0 : 0 < l <= 2^13
Bucket n, 1 <= n <= m-1 : 2^(n+12) < l <= 2^(n+13)
Bucket m : l > 2^(m+12)
Current solution:
_pos = fls64((l) >> 13)
[...]
_pos = (_pos < m) ? ((l == 1 << (_pos + 12)) ? _pos - 1 : _pos) : m
For l == 4096, _pos == -1, but should be _pos == 0.
For l == (1 << (m + 12)), _pos == m, but should be _pos == m - 1.
In order to avoid special treatment of these corner cases, the
calculation is adjusted. The new solution first subtracts the length by
one, and then calculates the correct bucket by shifting accordingly,
i.e. _pos = fls64((l - 1) >> 13), l > 0.
This not only fixes the issues named above, but also makes the whole
bucket assignment easier to follow.
Same is done for SMC_STAT_RMB_SIZE_SUB(_smc_stats, _tech, k, _len),
where the calculation of the bucket position is similar to the one
named above.
Fixes: e0e4b8fa5338 ("net/smc: Add SMC statistics support")
Suggested-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Nils Hoppmann <niho@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When there are CT table entries, and you rmmod nfp, the following
events can happen:
task1:
nfp_net_pci_remove
↓
nfp_flower_stop->(asynchronous)tcf_ct_flow_table_cleanup_work(3)
↓
nfp_zone_table_entry_destroy(1)
task2:
nfp_fl_ct_handle_nft_flow(2)
When the execution order is (1)->(2)->(3), it will crash. Therefore, in
the function nfp_fl_ct_del_flow, nf_flow_table_offload_del_cb needs to
be executed synchronously.
At the same time, in order to solve the deadlock problem and the problem
of rtnl_lock sometimes failing, replace rtnl_lock with the private
nfp_fl_lock.
Fixes: 7cc93d888df7 ("nfp: flower-ct: remove callback delete deadlock")
Cc: stable@vger.kernel.org
Signed-off-by: Yanguo Li <yanguo.li@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzbot has found an uninit-value bug triggered by the dm9601 driver [1].
This error happens because the variable res is not updated if the call
to dm_read_shared_word returns an error. In this particular case -EPROTO
was returned and res stayed uninitialized.
This can be avoided by checking the return value of dm_read_shared_word
and propagating the error if the read operation failed.
[1] https://syzkaller.appspot.com/bug?extid=1f53a30781af65d2c955
Cc: stable@vger.kernel.org
Signed-off-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Reported-and-tested-by: syzbot+1f53a30781af65d2c955@syzkaller.appspotmail.com
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Fixes: d0374f4f9c35cdfbee0 ("USB: Davicom DM9601 usbnet driver")
Link: https://lore.kernel.org/r/20231009-topic-dm9601_uninit_mdio_read-v2-1-f2fe39739b6c@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A bitset without mask in a _SET request means we want exactly the bits in
the bitset to be set. This works correctly for compact format but when
verbose format is parsed, ethnl_update_bitset32_verbose() only sets the
bits present in the request bitset but does not clear the rest. The commit
6699170376ab fixes this issue by clearing the whole target bitmap before we
start iterating. The solution proposed brought an issue with the behavior
of the mod variable. As the bitset is always cleared the old val will
always differ to the new val.
Fix it by adding a new temporary variable which save the state of the old
bitmap.
Fixes: 6699170376ab ("ethtool: fix application of verbose no_mask bitset")
Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231009133645.44503-1-kory.maincent@bootlin.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Sili Luo reported a race in nfc_llcp_sock_get(), leading to UAF.
Getting a reference on the socket found in a lookup while
holding a lock should happen before releasing the lock.
nfc_llcp_sock_get_sn() has a similar problem.
Finally nfc_llcp_recv_snl() needs to make sure the socket
found by nfc_llcp_sock_from_sn() does not disappear.
Fixes: 8f50020ed9b8 ("NFC: LLCP late binding")
Reported-by: Sili Luo <rootlab@huawei.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20231009123110.3735515-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Our current route lookups (mctp_route_lookup and mctp_route_lookup_null)
traverse the net's route list without the RCU read lock held. This means
the route lookup is subject to preemption, resulting in an potential
grace period expiry, and so an eventual kfree() while we still have the
route pointer.
Add the proper read-side critical section locks around the route
lookups, preventing premption and a possible parallel kfree.
The remaining net->mctp.routes accesses are already under a
rcu_read_lock, or protected by the RTNL for updates.
Based on an analysis from Sili Luo <rootlab@huawei.com>, where
introducing a delay in the route lookup could cause a UAF on
simultaneous sendmsg() and route deletion.
Reported-by: Sili Luo <rootlab@huawei.com>
Fixes: 889b7da23abf ("mctp: Add initial routing framework")
Cc: stable@vger.kernel.org
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/29c4b0e67dc1bf3571df3982de87df90cae9b631.1696837310.git.jk@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Correct punctuation and drop an extraneous word.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231008214121.25940-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When functions called by the trampoline panic, the backtrace that is
printed stops at the trampoline, because the trampoline does not store
its caller's frame address (backchain) on stack; it also stores the
return address at a wrong location.
Store both the same way as is already done for the regular eBPF programs.
Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231010203512.385819-3-iii@linux.ibm.com
|
|
One of the first things that s390x kernel functions do is storing the
the caller's frame address (backchain) on stack. This makes unwinding
possible. The backchain is always stored at frame offset 152, which is
inside the 160-byte stack area, that the functions allocate for their
callees. The callees must preserve the backchain; the remaining 152
bytes they may use as they please.
Currently the trampoline uses all 160 bytes, clobbering the backchain.
This causes kernel panics when using __builtin_return_address() in
functions called by the trampoline.
Fix by reducing the usage of the caller-reserved stack area by 8 bytes
in the trampoline.
Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()")
Reported-by: Song Liu <song@kernel.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231010203512.385819-2-iii@linux.ibm.com
|
|
Static calls invocations aren't well supported from module __init and
__exit functions. Especially the static call from cleanup_trusted() led
to a crash on x86 kernel with CONFIG_DEBUG_VIRTUAL=y.
However, the usage of static call invocations for trusted_key_init()
and trusted_key_exit() don't add any value from either a performance or
security perspective. Hence switch to use indirect function calls instead.
Note here that although it will fix the current crash report, ultimately
the static call infrastructure should be fixed to either support its
future usage from module __init and __exit functions or not.
Reported-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Link: https://lore.kernel.org/lkml/ZRhKq6e5nF%2F4ZIV1@fedora/#t
Fixes: 5d0682be3189 ("KEYS: trusted: Add generic trusted keys framework")
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit 5f521494cc73520ffac18ede0758883b9aedd018.
The patch breaks mounts with security mount options like
$ mount -o context=system_u:object_r:root_t:s0 /dev/sdX /mn
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/sdX, missing codepage or helper program, ...
We cannot reject all unknown options in btrfs_parse_subvol_options() as
intended, the security options can be present at this point and it's not
possible to enumerate them in a future proof way. This means unknown
mount options are silently accepted like before when the filesystem is
mounted with either -o subvol=/path or as followup mounts of the same
device.
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit 1e66220948df8 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs
flag change") seems to have accidentally inverted the logic added in
commit 0bc73ad46a76 ("net/mlx5e: Mutually exclude RX-FCS and
RX-port-timestamp").
The impact of this is a little unclear since it seems the FCS scattered
with RX-FCS is (usually?) correct regardless.
Fixes: 1e66220948df8 ("net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change")
Tested-by: Charlotte Tan <charlotte@extrahop.com>
Reviewed-by: Charlotte Tan <charlotte@extrahop.com>
Cc: Adham Faris <afaris@nvidia.com>
Cc: Aya Levin <ayal@nvidia.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
Cc: Moshe Shemesh <moshe@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Will Mortensen <will@extrahop.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/20231006053706.514618-1-will@extrahop.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When the SMC protocol is built into the kernel proper while ISM is
configured to be built as module, linking the kernel fails due to
unresolved dependencies out of net/smc/smc_ism.o to
ism_get_smcd_ops, ism_register_client, and ism_unregister_client
as reported via the linux-next test automation (see link).
This however is a bug introduced a while ago.
Correct the dependency list in ISM's and SMC's Kconfig to reflect the
dependencies that are actually inverted. With this you cannot build a
kernel with CONFIG_SMC=y and CONFIG_ISM=m. Either ISM needs to be 'y',
too - or a 'n'. That way, SMC can still be configured on non-s390
architectures that do not have (nor need) an ISM driver.
Fixes: 89e7d2ba61b7 ("net/ism: Add new API for client registration")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Closes: https://lore.kernel.org/linux-next/d53b5b50-d894-4df8-8969-fd39e63440ae@infradead.org/
Co-developed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lore.kernel.org/r/20231006125847.1517840-1-gbayer@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The adapter->vf_mvs.l list needs to be initialized even if the list is
empty. Otherwise it will lead to crashes.
Fixes: a1cbb15c1397 ("ixgbe: Add macvlan support for VF")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Link: https://lore.kernel.org/r/ZSADNdIw8zFx1xw2@kadam
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When updating the SA, use the new update_pn flags instead of comparing the
new PN with the initial one.
Comparing the initial PN value with the new value will allow the user
to update the SA using the initial PN value as a parameter like this:
$ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \
ead3664f508eb06c40ac7104cdae4ce5
$ ip macsec set macsec0 tx sa 0 pn 1 off
Fixes: 8ff0ac5be144 ("net/mlx5: Add MACsec offload Tx command support")
Fixes: aae3454e4d4c ("net/mlx5e: Add MACsec offload Rx command support")
Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Updating the PN is not supported.
Return -EINVAL if update_pn is true.
The following command succeeded, but it should fail because the driver
does not update the PN:
ip macsec set macsec0 tx sa 0 pn 232 on
Fixes: 28c5107aa904 ("net: phy: mscc: macsec support")
Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When updating SA, update the PN only when the update_pn flag is true.
Otherwise, the PN will be reset to its previous value using the
following command and this should not happen:
$ ip macsec set macsec0 tx sa 0 on
Fixes: c54ffc73601c ("octeontx2-pf: mcs: Introduce MACSEC hardware offloading")
Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Indicate next PN update using update_pn flag in macsec_context.
Offloaded MACsec implementations does not know whether or not the
MACSEC_SA_ATTR_PN attribute was passed for an SA update and assume
that next PN should always updated, but this is not always true.
The PN can be reset to its initial value using the following command:
$ ip macsec set macsec0 tx sa 0 off #octeontx2-pf case
Or, the update PN command will succeed even if the driver does not support
PN updates.
$ ip macsec set macsec0 tx sa 0 pn 1 on #mscc phy driver case
Comparing the initial PN with the new PN value is not a solution. When
the user updates the PN using its initial value the command will
succeed, even if the driver does not support it. Like this:
$ ip macsec add macsec0 tx sa 0 pn 1 on key 00 \
ead3664f508eb06c40ac7104cdae4ce5
$ ip macsec set macsec0 tx sa 0 pn 1 on #mlx5 case
Signed-off-by: Radu Pirea (NXP OSS) <radu-nicolae.pirea@oss.nxp.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Commit ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
modified scsi_rescan_device() to avoid attempting rescanning a suspended
device. However, the modification added a check to verify that a SCSI
device is in the running state without checking if the device request
queue (in the case of block device) is also running, thus allowing the
exectuion of internal requests. Without checking the device request
queue, commit ff48b37802e5 fix is incomplete and deadlocks on resume can
still happen. Use blk_queue_pm_only() to check if the device request
queue allows executing commands in addition to checking the SCSI device
state.
Reported-by: Petr Tesarik <petr@tesarici.cz>
Fixes: ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
Cc: stable@vger.kernel.org
Tested-by: Petr Tesarik <petr@tesarici.cz>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
fit3 protocol driver does not support accessing IDE control registers
(device control/altstatus). The DOS driver does not use these registers
either (as observed from DOSEMU trace). But the HW seems to be capable
of accessing these registers - I simply tried bit 3 and it works!
The control register is required to properly reset ATAPI devices or
they will be detected only once (after a power cycle).
Tested with EXP Computer CD-865 with MC-1285B EPP cable and
TransDisk 3000.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Some parallel adapters (e.g. EXP Computer MC-1285B EPP Cable) return
bogus values when there's no master device present. This can cause
reset to fail, preventing the lone slave device (such as EXP Computer
CD-865) from working.
Add custom version of wait_after_reset that ignores master failure when
a slave device is present. The custom version is also needed because
the generic ata_sff_wait_after_reset uses direct port I/O for slave
device detection.
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Add missing ops->sff_set_devctl implementation.
Fixes: 246a1c4c6b7f ("ata: pata_parport: add driver (PARIDE replacement)")
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
There's a 'x' missing in 0x55 in pata_parport_devchk(), causing the
detection to always fail. Fix it.
Fixes: 246a1c4c6b7f ("ata: pata_parport: add driver (PARIDE replacement)")
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Zary <linux@zary.sk>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
|
|
Ifcfg config file support in NetworkManger is deprecated. This patch
provides support for the new keyfile config format for connection
profiles in NetworkManager. The patch modifies the hv_kvp_daemon code
to generate the new network configuration in keyfile
format(.ini-style format) along with a ifcfg format configuration.
The ifcfg format configuration is also retained to support easy
backward compatibility for distro vendors. These configurations are
stored in temp files which are further translated using the
hv_set_ifconfig.sh script. This script is implemented by individual
distros based on the network management commands supported.
For example, RHEL's implementation could be found here:
https://gitlab.com/redhat/centos-stream/src/hyperv-daemons/-/blob/c9s/hv_set_ifconfig.sh
Debian's implementation could be found here:
https://github.com/endlessm/linux/blob/master/debian/cloud-tools/hv_set_ifconfig
The next part of this support is to let the Distro vendors consume
these modified implementations to the new configuration format.
Tested-on: Rhel9(Hyper-V, Azure)(nm and ifcfg files verified)
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1696847920-31125-1-git-send-email-shradhagupta@linux.microsoft.com
|
|
syzbot uses panic_on_warn.
This means that the skb_dump() I added in the blamed commit are
not even called.
Rewrite this so that we get the needed skb dump before syzbot crashes.
Fixes: eeee4b77dc52 ("net: add more debug info in skb_checksum_help()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20231006173355.2254983-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A previous commit updated the verifier to print an accurate failure
message for when someone specifies a nonzero return value from an async
callback. This adds a testcase for validating that the verifier emits
the correct message in such a case.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231009161414.235829-2-void@manifault.com
|
|
The verifier, as part of check_return_code(), verifies that async
callbacks such as from e.g. timers, will return 0. It does this by
correctly checking that R0->var_off is in tnum_const(0), which
effectively checks that it's in a range of 0. If this condition fails,
however, it prints an error message which says that the value should
have been in (0x0; 0x1). This results in possibly confusing output such
as the following in which an async callback returns 1:
At async callback the register R0 has value (0x1; 0x0) should have been in (0x0; 0x1)
The fix is easy -- we should just pass the tnum_const(0) as the correct
range to verbose_invalid_scalar(), which will then print the following:
At async callback the register R0 has value (0x1; 0x0) should have been in (0x0; 0x0)
Fixes: bfc6bb74e4f1 ("bpf: Implement verifier support for validation of async callbacks.")
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231009161414.235829-1-void@manifault.com
|
|
Syzkaller reported the following issue:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2807 at mm/vmalloc.c:3247 __vmalloc_node_range (mm/vmalloc.c:3361)
Modules linked in:
CPU: 0 PID: 2807 Comm: repro Not tainted 6.6.0-rc2+ #12
Hardware name: Generic DT based system
unwind_backtrace from show_stack (arch/arm/kernel/traps.c:258)
show_stack from dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
dump_stack_lvl from __warn (kernel/panic.c:633 kernel/panic.c:680)
__warn from warn_slowpath_fmt (./include/linux/context_tracking.h:153 kernel/panic.c:700)
warn_slowpath_fmt from __vmalloc_node_range (mm/vmalloc.c:3361 (discriminator 3))
__vmalloc_node_range from vmalloc_user (mm/vmalloc.c:3478)
vmalloc_user from xskq_create (net/xdp/xsk_queue.c:40)
xskq_create from xsk_setsockopt (net/xdp/xsk.c:953 net/xdp/xsk.c:1286)
xsk_setsockopt from __sys_setsockopt (net/socket.c:2308)
__sys_setsockopt from ret_fast_syscall (arch/arm/kernel/entry-common.S:68)
xskq_get_ring_size() uses struct_size() macro to safely calculate the
size of struct xsk_queue and q->nentries of desc members. But the
syzkaller repro was able to set q->nentries with the value initially
taken from copy_from_sockptr() high enough to return SIZE_MAX by
struct_size(). The next PAGE_ALIGN(size) is such case will overflow
the size_t value and set it to 0. This will trigger WARN_ON_ONCE in
vmalloc_user() -> __vmalloc_node_range().
The issue is reproducible on 32-bit arm kernel.
Fixes: 9f78bf330a66 ("xsk: support use vaddr as ring")
Reported-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000c84b4705fb31741e@google.com/T/
Reported-by: syzbot+b132693e925cbbd89e26@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e20df20606ebab4f@google.com/T/
Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: syzbot+fae676d3cf469331fc89@syzkaller.appspotmail.com
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://syzkaller.appspot.com/bug?extid=fae676d3cf469331fc89
Link: https://lore.kernel.org/bpf/20231007075148.1759-1-andrew.kanner@gmail.com
|
|
The RISC-V BPF uses a5 for BPF return values, which are zero-extended,
whereas the RISC-V ABI uses a0 which is sign-extended. In other words,
a5 and a0 can differ, and are used in different context.
The BPF trampoline are used for both BPF programs, and regular kernel
functions.
Make sure that the RISC-V BPF trampoline saves, and restores both a0
and a5.
Fixes: 49b5e77ae3e2 ("riscv, bpf: Add bpf trampoline support for RV64")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231004120706.52848-3-bjorn@kernel.org
|
|
The RISC-V architecture does not expose sub-registers, and hold all
32-bit values in a sign-extended format [1] [2]:
| The compiler and calling convention maintain an invariant that all
| 32-bit values are held in a sign-extended format in 64-bit
| registers. Even 32-bit unsigned integers extend bit 31 into bits
| 63 through 32. Consequently, conversion between unsigned and
| signed 32-bit integers is a no-op, as is conversion from a signed
| 32-bit integer to a signed 64-bit integer.
While BPF, on the other hand, exposes sub-registers, and use
zero-extension (similar to arm64/x86).
This has led to some subtle bugs, where a BPF JITted program has not
sign-extended the a0 register (return value in RISC-V land), passed
the return value up the kernel, e.g.:
| int from_bpf(void);
|
| long foo(void)
| {
| return from_bpf();
| }
Here, a0 would be 0xffff_ffff, instead of the expected
0xffff_ffff_ffff_ffff.
Internally, the RISC-V JIT uses a5 as a dedicated register for BPF
return values.
Keep a5 zero-extended, but explicitly sign-extend a0 (which is used
outside BPF land). Now that a0 (RISC-V ABI) and a5 (BPF ABI) differs,
a0 is only moved to a5 for non-BPF native calls (BPF_PSEUDO_CALL).
Fixes: 2353ecc6f91f ("bpf, riscv: add BPF JIT for RV64G")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-056b6ff-2023-10-02/unpriv-isa-asciidoc.pdf # [2]
Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/releases/download/draft-20230929-e5c800e661a53efe3c2678d71a306323b60eb13b/riscv-abi.pdf # [2]
Link: https://lore.kernel.org/bpf/20231004120706.52848-2-bjorn@kernel.org
|
|
Commit 9e70a5e109a4 ("printk: Add per-console suspended state")
removed console lock usage during resume and replaced it with
the clearly defined console_list_lock and srcu mechanisms.
However, the console lock usage had an important side-effect
of flushing the consoles. After its removal, consoles were no
longer flushed before checking their progress.
Add the console_lock/console_unlock dance to the beginning
of __pr_flush() to actually flush the consoles before checking
their progress. Also add comments to clarify this additional
usage of the console lock.
Note that console_unlock() does not guarantee flushing all messages
since the commit dbdda842fe96f89 ("printk: Add console owner and waiter
logic to load balance console writes").
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217955
Fixes: 9e70a5e109a4 ("printk: Add per-console suspended state")
Co-developed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/20231006082151.6969-2-pmladek@suse.com
|
|
In unprivileged Xen guests event handling can cause a deadlock with
Xen console handling. The evtchn_rwlock and the hvc_lock are taken in
opposite sequence in __hvc_poll() and in Xen console IRQ handling.
Normally this is no problem, as the evtchn_rwlock is taken as a reader
in both paths, but as soon as an event channel is being closed, the
lock will be taken as a writer, which will cause read_lock() to block:
CPU0 CPU1 CPU2
(IRQ handling) (__hvc_poll()) (closing event channel)
read_lock(evtchn_rwlock)
spin_lock(hvc_lock)
write_lock(evtchn_rwlock)
[blocks]
spin_lock(hvc_lock)
[blocks]
read_lock(evtchn_rwlock)
[blocks due to writer waiting,
and not in_interrupt()]
This issue can be avoided by replacing evtchn_rwlock with RCU in
xen_free_irq(). Note that RCU is used only to delay freeing of the
irq_info memory. There is no RCU based dereferencing or replacement of
pointers involved.
In order to avoid potential races between removing the irq_info
reference and handling of interrupts, set the irq_info pointer to NULL
only when freeing its memory. The IRQ itself must be freed at that
time, too, as otherwise the same IRQ number could be allocated again
before handling of the old instance would have been finished.
This is XSA-441 / CVE-2023-34324.
Fixes: 54c9de89895e ("xen/events: add a new "late EOI" evtchn framework")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
I own an external usb Webcam, model NexiGo N930AF, which had low mic volume and
inconsistent sound quality. Video works as expected.
(snip)
[ +0.047857] usb 5-1: new high-speed USB device number 2 using xhci_hcd
[ +0.003406] usb 5-1: New USB device found, idVendor=1bcf, idProduct=2283, bcdDevice=12.17
[ +0.000007] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000004] usb 5-1: Product: NexiGo N930AF FHD Webcam
[ +0.000003] usb 5-1: Manufacturer: SHENZHEN AONI ELECTRONIC CO., LTD
[ +0.000004] usb 5-1: SerialNumber: 20201217011
[ +0.003900] usb 5-1: Found UVC 1.00 device NexiGo N930AF FHD Webcam (1bcf:2283)
[ +0.025726] usb 5-1: 3:1: cannot get usb sound sample rate freq at ep 0x86
[ +0.071482] usb 5-1: 3:2: cannot get usb sound sample rate freq at ep 0x86
[ +0.004679] usb 5-1: 3:3: cannot get usb sound sample rate freq at ep 0x86
[ +0.051607] usb 5-1: Warning! Unlikely big volume range (=4096), cval->res is probably wrong.
[ +0.000005] usb 5-1: [7] FU [Mic Capture Volume] ch = 1, val = 0/4096/1
Set up quirk cval->res to 16 for 256 levels,
Set GET_SAMPLE_RATE quirk flag to stop trying to get the sample rate.
Confirmed that happened anyway later due to the backoff mechanism, after 3 failures
All audio stream on device interfaces share the same values,
apart from wMaxPacketSize and tSamFreq :
(snip)
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 3
bAlternateSetting 3
bNumEndpoints 1
bInterfaceClass 1 Audio
bInterfaceSubClass 2 Streaming
bInterfaceProtocol 0
iInterface 0
AudioStreaming Interface Descriptor:
bLength 7
bDescriptorType 36
bDescriptorSubtype 1 (AS_GENERAL)
bTerminalLink 8
bDelay 1 frames
wFormatTag 0x0001 PCM
AudioStreaming Interface Descriptor:
bLength 11
bDescriptorType 36
bDescriptorSubtype 2 (FORMAT_TYPE)
bFormatType 1 (FORMAT_TYPE_I)
bNrChannels 1
bSubframeSize 2
bBitResolution 16
bSamFreqType 1 Discrete
tSamFreq[ 0] 44100
Endpoint Descriptor:
bLength 9
bDescriptorType 5
bEndpointAddress 0x86 EP 6 IN
bmAttributes 5
Transfer Type Isochronous
Synch Type Asynchronous
Usage Type Data
wMaxPacketSize 0x005c 1x 92 bytes
bInterval 4
bRefresh 0
bSynchAddress 0
AudioStreaming Endpoint Descriptor:
bLength 7
bDescriptorType 37
bDescriptorSubtype 1 (EP_GENERAL)
bmAttributes 0x01
Sampling Frequency
bLockDelayUnits 0 Undefined
wLockDelay 0x0000
(snip)
Based on the usb data about manufacturer, SPCA2281B3 is the most likely controller IC
Manufacturer does not provide link for datasheet nor detailed specs.
No way to confirm if the firmware supports any other way of getting the sample rate.
Testing patch provides consistent good sound recording quality and volume range.
(snip)
[ +0.045764] usb 5-1: new high-speed USB device number 2 using xhci_hcd
[ +0.106290] usb 5-1: New USB device found, idVendor=1bcf, idProduct=2283, bcdDevice=12.17
[ +0.000006] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000004] usb 5-1: Product: NexiGo N930AF FHD Webcam
[ +0.000003] usb 5-1: Manufacturer: SHENZHEN AONI ELECTRONIC CO., LTD
[ +0.000004] usb 5-1: SerialNumber: 20201217011
[ +0.043700] usb 5-1: set resolution quirk: cval->res = 16
[ +0.002585] usb 5-1: Found UVC 1.00 device NexiGo N930AF FHD Webcam (1bcf:2283)
Signed-off-by: Christos Skevis <xristos.thes@gmail.com>
Link: https://lore.kernel.org/r/20231006155330.399393-1-xristos.thes@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
|