Age | Commit message (Collapse) | Author | Files | Lines |
|
We should trace the allocated address instead of page struct.
Fixes: 27c874867c4e ("dpaa2-eth: Use a single page per Rx buffer")
Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://lore.kernel.org/r/20220811151651.3327-1-chen45464546@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Multicast group flags were added in commit 4d54cc32112d ("mptcp: avoid
lock_fast usage in accept path"), but it missed adding the kdoc.
Mention which flags go into that field, and do the same for
op structs.
Link: https://lore.kernel.org/r/20220809232012.403730-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
area_cache_get() is used to distribute cache->area and set cache->id,
and if cache->id is not 0 and cache->area->kref refcount is 0, it will
release the cache->area by nfp_cpp_area_release(). area_cache_get()
set cache->id before cpp->op->area_init() and nfp_cpp_area_acquire().
But if area_init() or nfp_cpp_area_acquire() fails, the cache->id is
is already set but the refcount is not increased as expected. At this
time, calling the nfp_cpp_area_release() will cause use-after-free.
To avoid the use-after-free, set cache->id after area_init() and
nfp_cpp_area_acquire() complete successfully.
Note: This vulnerability is triggerable by providing emulated device
equipped with specified configuration.
BUG: KASAN: use-after-free in nfp6000_area_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
Write of size 4 at addr ffff888005b7f4a0 by task swapper/0/1
Call Trace:
<TASK>
nfp6000_area_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
area_cache_get.constprop.8 (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:884)
Allocated by task 1:
nfp_cpp_area_alloc_with_name (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:303)
nfp_cpp_area_cache_add (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:802)
nfp6000_init (drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:1230)
nfp_cpp_from_operations (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:1215)
nfp_pci_probe (drivers/net/ethernet/netronome/nfp/nfp_main.c:744)
Freed by task 1:
kfree (mm/slub.c:4562)
area_cache_get.constprop.8 (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:873)
nfp_cpp_read (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:924 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:973)
nfp_cpp_readl (drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cpplib.c:48)
Signed-off-by: Jialiang Wang <wangjialiang0806@163.com>
Reviewed-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Acked-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20220810073057.4032-1-wangjialiang0806@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change my address for mt7601u to the main one.
Link: https://lore.kernel.org/r/20220809233843.408004-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Drop devl_lock() / devl_unlock() from ports creation and removal flows
since the devlink instance lock is now taken by mlxsw_core.
Fixes: 72a4c8c94efa ("mlxsw: convert driver to use unlocked devlink API during init/fini")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/f4afce5ab0318617f3866b85274be52542d59b32.1660211614.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit d5410ac7b0ba ("net:bonding:support balance-alb interface
with vlan to bridge") introduced a reference count leak by not releasing
the reference acquired by ip_dev_find(). Remedy this by insuring the
reference is released.
Fixes: d5410ac7b0ba ("net:bonding:support balance-alb interface with vlan to bridge")
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://lore.kernel.org/r/26758.1660194413@famine
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 258fafcd0683d9ccfa524129d489948ab3ddc24c.
The clang -Wformat warning is terminally broken, and the clang people
can't seem to get their act together.
This test program causes a warning with clang:
#include <stdio.h>
int main(int argc, char **argv)
{
printf("%hhu\n", 'a');
}
resulting in
t.c:5:19: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat]
printf("%hhu\n", 'a');
~~~~ ^~~
%d
and apparently clang people consider that a feature, because they don't
want to face the reality of how either C character constants, C
arithmetic, and C varargs functions work.
The rest of the world just shakes their head at that kind of
incompetence, and turns off -Wformat for clang again.
And no, the "you should use a pointless cast to shut this up" is not a
valid answer. That warning should not exist in the first place, or at
least be optinal with some "-Wformat-me-harder" kind of option.
[ Admittedly, there's also very little reason to *ever* use '%hh[ud]' in
C, but what little reason there is is entirely about 'I want to see
only the low 8 bits of the argument'. So I would suggest nobody ever
use that format in the first place, but if they do, the clang
behavious is simply always wrong. Because '%hhu' takes an 'int'. It's
that simple. ]
Reported-by: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit b32d45824aa7 ("dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag")
added a "NO_SLEEP" mode, it replaces a mutex with a spinlock, and it
is only usable when the device is in read-only mode (because the write
path may be sleeping while holding the dm_bufio_client lock).
However, there are still two points where the code could sleep even in
read-only mode. One is in __get_unclaimed_buffer -> __make_buffer_clean.
The other is in __try_evict_buffer -> __make_buffer_clean. These functions
will call __make_buffer_clean which sleeps if the buffer is being read.
Fix these cases so that if c->no_sleep is set __make_buffer_clean
will not be called and the buffer will be skipped instead.
Fixes: b32d45824aa7 ("dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
There are 2 models for MV32 serials. MV32-W-A is designed
based on Qualcomm SDX62 chip, and MV32-W-B is designed based
on Qualcomm SDX65 chip. So we use 2 different PID to separate it.
Test evidence as below:
T: Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#= 3 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=1e2d ProdID=00f3 Rev=05.04
S: Manufacturer=Cinterion
S: Product=Cinterion PID 0x00F3 USB Mobile Broadband
S: SerialNumber=d7b4be8d
C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
T: Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#= 10 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1
P: Vendor=1e2d ProdID=00f4 Rev=05.04
S: Manufacturer=Cinterion
S: Product=Cinterion PID 0x00F4 USB Mobile Broadband
S: SerialNumber=d095087d
C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA
I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan
I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option
I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
Signed-off-by: Slark Xiao <slark_xiao@163.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20220810014521.9383-1-slark_xiao@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Shut up this warning:
kernel/bpf/syscall.c:5089:5: warning: no previous prototype for function 'kern_sys_bpf' [-Wmissing-prototypes]
int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
There is unexpected warning on KVM_CAP_VM_DISABLE_NX_HUGE_PAGES capability
table, which cause the table to be rendered as paragraph text instead.
The warning is due to missing colon at capability name and returns keyword,
as well as improper alignment on multi-line returns field.
Fix the warning by adding missing colons and aligning the field.
Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/
Fixes: 084cc29f8bbb03 ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-next@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20220627095151.19339-3-bagasdotme@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Extend heading underline for KVM_CAP_VM_DISABLE_NX_HUGE_PAGE to match
the heading text length.
Link: https://lore.kernel.org/lkml/20220627181937.3be67263@canb.auug.org.au/
Fixes: 084cc29f8bbb03 ("KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Matlack <dmatlack@google.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-next@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Message-Id: <20220627095151.19339-2-bagasdotme@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Currently, tls_device_down synchronizes with tls_device_resync_rx using
RCU, however, the pointer to netdev is stored using WRITE_ONCE and
loaded using READ_ONCE.
Although such approach is technically correct (rcu_dereference is
essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store
NULL), using special RCU helpers for pointers is more valid, as it
includes additional checks and might change the implementation
transparently to the callers.
Mark the netdev pointer as __rcu and use the correct RCU helpers to
access it. For non-concurrent access pass the right conditions that
guarantee safe access (locks taken, refcount value). Also use the
correct helper in mlx5e, where even READ_ONCE was missing.
The transition to RCU exposes existing issues, fixed by this commit:
1. bond_tls_device_xmit could read netdev twice, and it could become
NULL the second time, after the NULL check passed.
2. Drivers shouldn't stop processing the last packet if tls_device_down
just set netdev to NULL, before tls_dev_del was called. This prevents a
possible packet drop when transitioning to the fallback software mode.
Fixes: 89df6a810470 ("net/bonding: Implement TLS TX device offload")
Fixes: c55dcdd435aa ("net/tls: Fix use-after-free after the TLS device goes down and up")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Another device offload bug, we use the length of the output
skb as an indication of how much data to copy. But that skb
is sized to offset + record length, and we start from offset.
So we end up double-counting the offset which leads to
skb_copy_bits() returning -EFAULT.
Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/20220809175544.354343-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We can't do skb_walk_frags() on the input skbs, because
the input skbs is really just a pointer to the tcp read
queue. We need to bound the "is decrypted" check by the
amount of data in the message.
Note that the walk in tls_device_reencrypt() is after a
CoW so the skb there is safe to walk. Actually in the
current implementation it can't have frags at all, but
whatever, maybe one day it will.
Reported-by: Tariq Toukan <tariqt@nvidia.com>
Fixes: 84c61fe1a75b ("tls: rx: do not use the standard strparser")
Tested-by: Ran Rozenstein <ranro@nvidia.com>
Link: https://lore.kernel.org/r/20220809175544.354343-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a route filter is replaced and the old filter has a 0 handle, the old
one won't be removed from the hashtable, while it will still be freed.
The test was there since before commit 1109c00547fc ("net: sched: RCU
cls_route"), when a new filter was not allocated when there was an old one.
The old filter was reused and the reinserting would only be necessary if an
old filter was replaced. That was still wrong for the same case where the
old handle was 0.
Remove the old filter from the list independently from its handle value.
This fixes CVE-2022-2588, also reported as ZDI-CAN-17440.
Reported-by: Zhenpeng Lin <zplin@u.northwestern.edu>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Kamal Mostafa <kamal@canonical.com>
Cc: <stable@vger.kernel.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The custom multipath hash tests use mausezahn in order to test how
changes in various packet fields affect the packet distribution across
the available nexthops.
The tool uses the libnet library for various low-level packet
construction and injection. The library started using the
"SO_BINDTODEVICE" socket option for IPv6 sockets in version 1.1.6 and
for IPv4 sockets in version 1.2.
When the option is not set, packets are not routed according to the
table associated with the VRF master device and tests fail.
Fix this by prefixing the command with "ip vrf exec", which will cause
the route lookup to occur in the VRF routing table. This makes the tests
pass regardless of the libnet library version.
Fixes: 511e8db54036 ("selftests: forwarding: Add test for custom multipath hash")
Fixes: 185b0c190bb6 ("selftests: forwarding: Add test for custom multipath hash with IPv4 GRE")
Fixes: b7715acba4d3 ("selftests: forwarding: Add test for custom multipath hash with IPv6 GRE")
Reported-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Link: https://lore.kernel.org/r/20220809113320.751413-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor sk_user_data dereference using more generic function
__rcu_dereference_sk_user_data_with_flags(), which improve its
maintainability
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Syzkaller reports refcount bug as follows:
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19
Modules linked in:
CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0
<TASK>
__refcount_add_not_zero include/linux/refcount.h:163 [inline]
__refcount_inc_not_zero include/linux/refcount.h:227 [inline]
refcount_inc_not_zero include/linux/refcount.h:245 [inline]
sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439
tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091
tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983
tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057
tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659
tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682
sk_backlog_rcv include/net/sock.h:1061 [inline]
__release_sock+0x134/0x3b0 net/core/sock.c:2849
release_sock+0x54/0x1b0 net/core/sock.c:3404
inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909
__sys_shutdown_sock net/socket.c:2331 [inline]
__sys_shutdown_sock net/socket.c:2325 [inline]
__sys_shutdown+0xf1/0x1b0 net/socket.c:2343
__do_sys_shutdown net/socket.c:2351 [inline]
__se_sys_shutdown net/socket.c:2349 [inline]
__x64_sys_shutdown+0x50/0x70 net/socket.c:2349
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>
During SMC fallback process in connect syscall, kernel will
replaces TCP with SMC. In order to forward wakeup
smc socket waitqueue after fallback, kernel will sets
clcsk->sk_user_data to origin smc socket in
smc_fback_replace_callbacks().
Later, in shutdown syscall, kernel will calls
sk_psock_get(), which treats the clcsk->sk_user_data
as psock type, triggering the refcnt warning.
So, the root cause is that smc and psock, both will use
sk_user_data field. So they will mismatch this field
easily.
This patch solves it by using another bit(defined as
SK_USER_DATA_PSOCK) in PTRMASK, to mark whether
sk_user_data points to a psock object or not.
This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e
("net, sk_msg: Clear sk_user_data pointer on clone if tagged").
For there will possibly be more flags in the sk_user_data field,
this patch also refactor sk_user_data flags code to be more generic
to improve its maintainability.
Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Users of GNU ld (BFD) from binutils 2.39+ will observe multiple
instances of a new warning when linking kernels in the form:
ld: warning: arch/x86/boot/pmjump.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: arch/x86/boot/compressed/vmlinux has a LOAD segment with RWX permissions
Generally, we would like to avoid the stack being executable. Because
there could be a need for the stack to be executable, assembler sources
have to opt-in to this security feature via explicit creation of the
.note.GNU-stack feature (which compilers create by default) or command
line flag --noexecstack. Or we can simply tell the linker the
production of such sections is irrelevant and to link the stack as
--noexecstack.
LLVM's LLD linker defaults to -z noexecstack, so this flag isn't
strictly necessary when linking with LLD, only BFD, but it doesn't hurt
to be explicit here for all linkers IMO. --no-warn-rwx-segments is
currently BFD specific and only available in the current latest release,
so it's wrapped in an ld-option check.
While the kernel makes extensive usage of ELF sections, it doesn't use
permissions from ELF segments.
Link: https://lore.kernel.org/linux-block/3af4127a-f453-4cf7-f133-a181cce06f73@kernel.dk/
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ba951afb99912da01a6e8434126b8fac7aa75107
Link: https://github.com/llvm/llvm-project/issues/57009
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Users of GNU ld (BFD) from binutils 2.39+ will observe multiple
instances of a new warning when linking kernels in the form:
ld: warning: vmlinux: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: vmlinux has a LOAD segment with RWX permissions
Generally, we would like to avoid the stack being executable. Because
there could be a need for the stack to be executable, assembler sources
have to opt-in to this security feature via explicit creation of the
.note.GNU-stack feature (which compilers create by default) or command
line flag --noexecstack. Or we can simply tell the linker the
production of such sections is irrelevant and to link the stack as
--noexecstack.
LLVM's LLD linker defaults to -z noexecstack, so this flag isn't
strictly necessary when linking with LLD, only BFD, but it doesn't hurt
to be explicit here for all linkers IMO. --no-warn-rwx-segments is
currently BFD specific and only available in the current latest release,
so it's wrapped in an ld-option check.
While the kernel makes extensive usage of ELF sections, it doesn't use
permissions from ELF segments.
Link: https://lore.kernel.org/linux-block/3af4127a-f453-4cf7-f133-a181cce06f73@kernel.dk/
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ba951afb99912da01a6e8434126b8fac7aa75107
Link: https://github.com/llvm/llvm-project/issues/57009
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It turns out that gcc-12.1 has some nasty problems with register
allocation on a 32-bit x86 build for the 64-bit values used in the
generic blake2b implementation, where the pattern of 64-bit rotates and
xor operations ends up making gcc generate horrible code.
As a result it ends up with a ridiculously large stack frame for all the
spills it generates, resulting in the following build problem:
crypto/blake2b_generic.c: In function ‘blake2b_compress_one_generic’:
crypto/blake2b_generic.c:109:1: error: the frame size of 2640 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
on the same test-case, clang ends up generating a stack frame that is
just 296 bytes (and older gcc versions generate a slightly bigger one at
428 bytes - still nowhere near that almost 3kB monster stack frame of
gcc-12.1).
The issue is fixed both in mainline and the GCC 12 release branch [1],
but current release compilers end up failing the i386 allmodconfig build
due to this issue.
Disable the warning for now by simply raising the frame size for this
one file, just to keep this issue from having people turn off WERROR.
Link: https://lore.kernel.org/all/CAHk-=wjxqgeG2op+=W9sqgsWqCYnavC+SRfVyopu9-31S6xw+Q@mail.gmail.com/
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105930 [1]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Now that the PMU is refreshed when MSR_IA32_PERF_CAPABILITIES is written
by host userspace, zero out the number of LBR records for a vCPU during
PMU refresh if PMU_CAP_LBR_FMT is not set in PERF_CAPABILITIES instead of
handling the check at run-time.
guest_cpuid_has() is expensive due to the linear search of guest CPUID
entries, intel_pmu_lbr_is_enabled() is checked on every VM-Enter, _and_
simply enumerating the same "Model" as the host causes KVM to set the
number of LBR records to a non-zero value.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220727233424.2968356-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Turn vcpu_to_lbr_desc() and vcpu_to_lbr_records() into functions in order
to provide type safety, to document exactly what they return, and to
allow consuming the helpers in vmx.h. Move the definitions as necessary
(the macros "reference" to_vmx() before its definition).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220727233424.2968356-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Refresh the PMU if userspace modifies MSR_IA32_PERF_CAPABILITIES. KVM
consumes the vCPU's PERF_CAPABILITIES when enumerating PEBS support, but
relies on CPUID updates to refresh the PMU. I.e. KVM will do the wrong
thing if userspace stuffs PERF_CAPABILITIES _after_ setting guest CPUID.
Opportunistically fix a curly-brace indentation.
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220727233424.2968356-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Test all possible input values to verify that KVM rejects all values
except the exact host value. Due to the LBR format affecting the core
functionality of LBRs, KVM can't emulate "other" formats, so even though
there are a variety of legal values, KVM should reject anything but an
exact host match.
Suggested-by: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
sched_getcpu() is glibc dependent and it can simply return the CPU
ID from the registered rseq information, as Florian Weimer pointed.
In this case, it's pointless to compare the return value from
sched_getcpu() and that fetched from the registered rseq information.
Fix the issue by replacing sched_getcpu() with getcpu(), as Florian
suggested. The comments are modified accordingly by replacing
"sched_getcpu()" with "getcpu()".
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20220810104114.6838-3-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The rseq information is registered by TLS, starting from glibc-2.35.
In this case, the test always fails due to syscall(__NR_rseq). For
example, on RHEL9.1 where upstream glibc-2.35 features are enabled
on downstream glibc-2.34, the test fails like below.
# ./rseq_test
==== Test Assertion Failure ====
rseq_test.c:60: !r
pid=112043 tid=112043 errno=22 - Invalid argument
1 0x0000000000401973: main at rseq_test.c:226
2 0x0000ffff84b6c79b: ?? ??:0
3 0x0000ffff84b6c86b: ?? ??:0
4 0x0000000000401b6f: _start at ??:?
rseq failed, errno = 22 (Invalid argument)
# rpm -aq | grep glibc-2
glibc-2.34-39.el9.aarch64
Fix the issue by using "../rseq/rseq.c" to fetch the rseq information,
registred by TLS if it exists. Otherwise, we're going to register our
own rseq information as before.
Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Message-Id: <20220810104114.6838-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Doing debugfs creation after vm creation leaves things in a
quasi-initialized state for a while. This is further complicated by the
fact that we tear down debugfs from kvm_destroy_vm(). Align debugfs and
stats init/destroy with the vm init/destroy pattern to avoid any
headaches.
Note the fix for a benign mistake in error handling for calls to
kvm_arch_create_vm_debugfs() rolled in. Since all implementations of
the function return 0 unconditionally it isn't actually a bug at
the moment.
Lastly, tear down debugfs/stats data in the kvm_create_vm_debugfs()
error path. Previously it was safe to assume that kvm_destroy_vm() would
take out the garbage, that is no longer the case.
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220720092259.3491733-6-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
At the time the VM fd is used in kvm_create_vm_debugfs(), the fd has
been allocated but not yet installed. It is only really useful as an
identifier in strings for the VM (such as debugfs).
Treat it exactly as such by passing the string name of the fd to
kvm_create_vm_debugfs(), futureproofing against possible misuse of the
VM fd.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220720092259.3491733-5-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Allocate a VM's fd at the very beginning of kvm_dev_ioctl_create_vm() so
that KVM can use the fd value to generate strigns, e.g. for debugfs,
when creating and initializing the VM.
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220720092259.3491733-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Initialize stats_id alongside other kvm_vcpu fields to make it more
difficult to unintentionally access stats_id before it's set.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220720092259.3491733-3-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Initialize stats_id alongside other struct kvm fields to make it more
difficult to unintentionally access stats_id before it's set. While at
it, move the format string to the first line of the call and fix the
indentation of the second line.
No functional change intended.
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220720092259.3491733-2-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add compile-time and init-time sanity checks to ensure that the MMIO SPTE
mask doesn't overlap the MMIO SPTE generation or the MMU-present bit.
The generation currently avoids using bit 63, but that's as much
coincidence as it is strictly necessarly. That will change in the future,
as TDX support will require setting bit 63 (SUPPRESS_VE) in the mask.
Explicitly carve out the bits that are allowed in the mask so that any
future shuffling of SPTE bits doesn't silently break MMIO caching (KVM
has broken MMIO caching more than once due to overlapping the generation
with other things).
Suggested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220805194133.86299-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Rename the tracepoint function from trace_kvm_async_pf_doublefault() to
trace_kvm_async_pf_repeated_fault() to make it clear, since double fault
has nothing to do with this trace function.
Asynchronous Page Fault (APF) is an artifact generated by KVM when it
cannot find a physical page to satisfy an EPT violation. KVM uses APF to
tell the guest OS to do something else such as scheduling other guest
processes to make forward progress. However, when another guest process
also touches a previously APFed page, KVM halts the vCPU instead of
generating a repeated APF to avoid wasting cycles.
Double fault (#DF) clearly has a different meaning and a different
consequence when triggered. #DF requires two nested contributory exceptions
instead of two page faults faulting at the same address. A prevous bug on
APF indicates that it may trigger a double fault in the guest [1] and
clearly this trace function has nothing to do with it. So rename this
function should be a valid choice.
No functional change intended.
[1] https://www.spinics.net/lists/kvm/msg214957.html
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220807052141.69186-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Stop Xen timer (if it's running) prior to changing the IRQ vector and
potentially (re)starting the timer. Changing the IRQ vector while the
timer is still running can result in KVM injecting a garbage event, e.g.
vm_xen_inject_timer_irqs() could see a non-zero xen.timer_pending from
a previous timer but inject the new xen.timer_virq.
Fixes: 536395260582 ("KVM: x86/xen: handle PV timers oneshot mode")
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42
Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220808190607.323899-3-dietschc@csp.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a check for existing xen timers before initializing a new one.
Currently kvm_xen_init_timer() is called on every
KVM_XEN_VCPU_ATTR_TYPE_TIMER, which is causing the following ODEBUG
crash when vcpu->arch.xen.timer is already set.
ODEBUG: init active (active state 0)
object type: hrtimer hint: xen_timer_callbac0
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:502
Call Trace:
__debug_object_init
debug_hrtimer_init
debug_init
hrtimer_init
kvm_xen_init_timer
kvm_xen_vcpu_set_attr
kvm_arch_vcpu_ioctl
kvm_vcpu_ioctl
vfs_ioctl
Fixes: 536395260582 ("KVM: x86/xen: handle PV timers oneshot mode")
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42
Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220808190607.323899-2-dietschc@csp.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs
generating #NPF(RSVD), which are reflected by the CPU into the guest as
a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access
to the guest instruction stream or register state and so can't directly
emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO
caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT),
and those flavors of #NPF cause automatic VM-Exits, not #VC.
Adjust KVM's MMIO masks to account for the C-bit location prior to doing
SEV(-ES) setup, and document that dependency between adjusting the MMIO
SPTE mask and SEV(-ES) setup.
Fixes: b09763da4dd8 ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
Reported-by: Michael Roth <michael.roth@amd.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220803224957.1285926-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fully re-evaluate whether or not MMIO caching can be enabled when SPTE
masks change; simply clearing enable_mmio_caching when a configuration
isn't compatible with caching fails to handle the scenario where the
masks are updated, e.g. by VMX for EPT or by SVM to account for the C-bit
location, and toggle compatibility from false=>true.
Snapshot the original module param so that re-evaluating MMIO caching
preserves userspace's desire to allow caching. Use a snapshot approach
so that enable_mmio_caching still reflects KVM's actual behavior.
Fixes: 8b9e74bfbf8c ("KVM: x86/mmu: Use enable_mmio_caching to track if MMIO caching is enabled")
Reported-by: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220803224957.1285926-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Mark kvm_mmu_x86_module_init() with __init, the entire reason it exists
is to initialize variables when kvm.ko is loaded, i.e. it must never be
called after module initialization.
Fixes: 1d0e84806047 ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded")
Cc: stable@vger.kernel.org
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220803224957.1285926-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The emulator mishandles LEA with register source operand. Even though such
LEA is illegal, it can be encoded and fed to CPU. In which case real
hardware throws #UD. The emulator, instead, returns address of
x86_emulate_ctxt._regs. This info leak hurts host's kASLR.
Tell the decoder that illegal LEA is not to be emulated.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <20220729134801.1120-1-mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_fixup_and_inject_pf_error() was introduced to fixup the error code(
e.g., to add RSVD flag) and inject the #PF to the guest, when guest
MAXPHYADDR is smaller than the host one.
When it comes to nested, L0 is expected to intercept and fix up the #PF
and then inject to L2 directly if
- L2.MAXPHYADDR < L0.MAXPHYADDR and
- L1 has no intention to intercept L2's #PF (e.g., L2 and L1 have the
same MAXPHYADDR value && L1 is using EPT for L2),
instead of constructing a #PF VM Exit to L1. Currently, with PFEC_MASK
and PFEC_MATCH both set to 0 in vmcs02, the interception and injection
may happen on all L2 #PFs.
However, failing to initialize 'fault' in kvm_fixup_and_inject_pf_error()
may cause the fault.async_page_fault being NOT zeroed, and later the #PF
being treated as a nested async page fault, and then being injected to L1.
Instead of zeroing 'fault' at the beginning of this function, we mannually
set the value of 'fault.async_page_fault', because false is the value we
really expect.
Fixes: 897861479c064 ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216178
Reported-by: Yang Lixiao <lixiao.yang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220718074756.53788-1-yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Bug the VM if retrieving the x2APIC MSR/register while processing an
accelerated vAPIC trap VM-Exit fails. In theory it's impossible for the
lookup to fail as hardware has already validated the register, but bugs
happen, and not checking the result of kvm_lapic_msr_read() would result
in consuming the uninitialized "val" if a KVM or hardware bug occurs.
Fixes: 1bd9dfec9fd4 ("KVM: x86: Do not block APIC write for non ICR registers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220804235028.1766253-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time
/ preempted status", 2021-11-11) open coded the previous call to
kvm_map_gfn, but in doing so it dropped the comparison between the cached
guest physical address and the one in the MSR. This cause an incorrect
cache hit if the guest modifies the steal time address while the memslots
remain the same. This can happen with kexec, in which case the preempted
bit is written at the address used by the old kernel instead of
the old one.
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time
/ preempted status", 2021-11-11) open coded the previous call to
kvm_map_gfn, but in doing so it dropped the comparison between the cached
guest physical address and the one in the MSR. This cause an incorrect
cache hit if the guest modifies the steal time address while the memslots
remain the same. This can happen with kexec, in which case the steal
time data is written at the address used by the old kernel instead of
the old one.
While at it, rename the variable from gfn to gpa since it is a plain
physical address and not a right-shifted one.
Reported-by: Dave Young <ruyang@redhat.com>
Reported-by: Xiaoying Yan <yiyan@redhat.com>
Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 49de12ba06ef ("selftests: drop KSFT_KHDR_INSTALL make target")
dropped from tools/testing/selftests/lib.mk the code related to KSFT_KHDR_INSTALL,
but in doing so it also dropped the definition of the ARCH variable. The ARCH
variable is used in several subdirectories, but kvm/ is the only one of these
that was using KSFT_KHDR_INSTALL.
As a result, kvm selftests cannot be built anymore:
In file included from include/x86_64/vmx.h:12,
from x86_64/vmx_pmu_caps_test.c:18:
include/x86_64/processor.h:15:10: fatal error: asm/msr-index.h: No such file or directory
15 | #include <asm/msr-index.h>
| ^~~~~~~~~~~~~~~~~
In file included from ../../../../tools/include/asm/atomic.h:6,
from ../../../../tools/include/linux/atomic.h:5,
from rseq_test.c:15:
../../../../tools/include/asm/../../arch/x86/include/asm/atomic.h:11:10: fatal error: asm/cmpxchg.h: No such file or directory
11 | #include <asm/cmpxchg.h>
| ^~~~~~~~~~~~~~~
Fix it by including the definition that was present in lib.mk.
Fixes: 49de12ba06ef ("selftests: drop KSFT_KHDR_INSTALL make target")
Cc: Guillaume Tucker <guillaume.tucker@collabora.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a test to ensure sleepable program is rejected by hash map iterator.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-10-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add test to validate the overwrite of sock local storage map value in
map iterator and another one to ensure out-of-bound value writing is
rejected.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-9-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
After closing both related link fd and map fd, reading the map
iterator fd to ensure it is OK to do so.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When a sleepable program is attached to a hash map iterator, might_fault()
will report "BUG: sleeping function called from invalid context..." if
CONFIG_DEBUG_ATOMIC_SLEEP is enabled. The reason is that rcu_read_lock()
is held in bpf_hash_map_seq_next() and won't be released until all elements
are traversed or bpf_hash_map_seq_stop() is called.
Fixing it by reusing BPF_ITER_RESCHED to indicate that only non-sleepable
program is allowed for iterator without BPF_ITER_RESCHED. We can revise
bpf_iter_link_attach() later if there are other conditions which may
cause rcu_read_lock() or spin_lock() issues.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20220810080538.1845898-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|