Age | Commit message (Collapse) | Author | Files | Lines |
|
The vDSO getcpu() reads CPU ID from the GDT_ENTRY_CPUNODE entry when the RDPID
instruction is not available. And GDT_ENTRY_CPUNODE is defined as 28 on 32-bit
Linux kernel and 15 on 64-bit. But the 32-bit getcpu() on 64-bit Linux kernel
is compiled with 32-bit Linux kernel GDT_ENTRY_CPUNODE, i.e., 28, beyond the
64-bit Linux kernel GDT limit. Thus, it just fails _silently_.
When BUILD_VDSO32_64 is defined, choose the 64-bit Linux kernel GDT definitions
to compile the 32-bit getcpu().
Fixes: 877cff5296faa6e ("x86/vdso: Fake 32bit VDSO build on 64bit compile for vgetcpu")
Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Shan Kang <shan.kang@intel.com>
Signed-off-by: Xin Li <xin3.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230322061758.10639-1-xin3.li@intel.com
Link: https://lore.kernel.org/oe-lkp/202303020903.b01fd1de-yujie.liu@intel.com
|
|
Tao Liu reported a boot hang on an Intel Atom machine due to an unmapped
EFI config table. The reason being that the CC blob which contains the
CPUID page for AMD SNP guests is parsed for before even checking
whether the machine runs on AMD hardware.
Usually that's not a problem on !AMD hw - it simply won't find the CC
blob's GUID and return. However, if any parts of the config table
pointers array is not mapped, the kernel will #PF very early in the
decompressor stage without any opportunity to recover.
Therefore, do a superficial CPUID check before poking for the CC blob.
This will fix the current issue on real hardware. It would also work as
a guest on a non-lying hypervisor.
For the lying hypervisor, the check is done again, *after* parsing the
CC blob as the real CPUID page will be present then.
Clear the #VC handler in case SEV-{ES,SNP} hasn't been detected, as
a precaution.
Fixes: c01fce9cef84 ("x86/compressed: Add SEV-SNP feature detection/setup")
Reported-by: Tao Liu <ltao@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Tao Liu <ltao@redhat.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230601072043.24439-1-ltao@redhat.com
|
|
|
|
Now that we removed ->iterate we don't need to check for either
->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check
for ->iterate_shared instead. This will tell us whether we need to
unconditionally take the lock. Not just does it allow us to avoid
checking f_inode's mode it also actually clearly shows that we're
locking because of readdir.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
All users now just use '->iterate_shared()', which only takes the
directory inode lock for reading.
Filesystems that never got convered to shared mode now instead use a
wrapper that drops the lock, re-takes it in write mode, calls the old
function, and then downgrades the lock back to read mode.
This way the VFS layer and other callers no longer need to care about
filesystems that never got converted to the modern era.
The filesystems that use the new wrapper are ceph, coda, exfat, jfs,
ntfs, ocfs2, overlayfs, and vboxsf.
Honestly, several of them look like they really could just iterate their
directories in shared mode and skip the wrapper entirely, but the point
of this change is to not change semantics or fix filesystems that
haven't been fixed in the last 7+ years, but to finally get rid of the
dual iterators.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
I'm looking at the directory handling due to the discussion about f_pos
locking (see commit 797964253d35: "file: reinstate f_pos locking
optimization for regular files"), and wanting to clean that up.
And one source of ugliness is how we were supposed to move filesystems
over to the '->iterate_shared()' function that only takes the inode lock
for reading many many years ago, but several filesystems still use the
bad old '->iterate()' that takes the inode lock for exclusive access.
See commit 6192269444eb ("introduce a parallel variant of ->iterate()")
that also added some documentation stating
Old method is only used if the new one is absent; eventually it will
be removed. Switch while you still can; the old one won't stay.
and that was back in April 2016. Here we are, many years later, and the
old version is still clearly sadly alive and well.
Now, some of those old style iterators are probably just because the
filesystem may end up having per-inode mutable data that it uses for
iterating a directory, but at least one case is just a mistake.
Al switched over most filesystems to use '->iterate_shared()' back when
it was introduced. In particular, the /proc filesystem was converted as
one of the first ones in commit f50752eaa0b0 ("switch all procfs
directories ->iterate_shared()").
But then later one new user of '->iterate()' was then re-introduced by
commit 6d9c939dbe4d ("procfs: add smack subdir to attrs").
And that's clearly not what we wanted, since that new case just uses the
same 'proc_pident_readdir()' and 'proc_pident_lookup()' helper functions
that other /proc pident directories use, and they are most definitely
safe to use with the inode lock held shared.
So just fix it.
This still leaves a fair number of oddball filesystems using the
old-style directory iterator (ceph, coda, exfat, jfs, ntfs, ocfs2,
overlayfs, and vboxsf), but at least we don't have any remaining in the
core filesystems.
I'm going to add a wrapper function that just drops the read-lock and
takes it as a write lock, so that we can clean up the core vfs layer and
make all the ugly 'this filesystem needs exclusive inode locking' be
just filesystem-internal warts.
I just didn't want to make that conversion when we still had a core user
left.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
fast-path check for RESOLVE_CACHED would reject all users passing
O_DIRECTORY with -EAGAIN, when in fact the intended test was to check
for __O_TMPFILE.
Cc: stable@vger.kernel.org # v5.12+
Fixes: 99668f618062 ("fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Message-Id: <20230806-resolve_cached-o_tmpfile-v1-1-7ba16308465e@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In commit 20ea1e7d13c1 ("file: always lock position for
FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because
pidfd_getfd() could get a reference to the file even when it didn't have
an elevated file count due to threading of other sharing cases.
But Mateusz Guzik reports that the extra locking is actually measurable,
so let's re-introduce the optimization, and only force the locking for
directory traversal.
Directories need the lock for correctness reasons, while regular files
only need it for "POSIX semantics". Since pidfd_getfd() is about
debuggers etc special things that are _way_ outside of POSIX, we can
relax the rules for that case.
Reported-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We have a function sve_sync_from_fpsimd_zeropad() which is used by the
ptrace code to update the SVE state when the user writes to the the
FPSIMD register set. Currently this checks that the task has SVE
enabled but this will miss updates for tasks which have streaming SVE
enabled if SVE has not been enabled for the thread, also do the
conversion if the task has streaming SVE enabled.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-3-49df214bfb3e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently we guard FPSIMD/SVE state conversions with a check for the system
supporting SVE but SME only systems may need to sync streaming mode SVE
state so add a check for SME support too. These functions are only used
by the ptrace code.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-2-49df214bfb3e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Systems which implement SME without also implementing SVE are
architecturally valid but were not initially supported by the kernel,
unfortunately we missed one issue in the ptrace code.
The SVE register setting code is shared between SVE and streaming mode
SVE. When we set full SVE register state we currently enable TIF_SVE
unconditionally, in the case where streaming SVE is being configured on a
system that supports vanilla SVE this is not an issue since we always
initialise enough state for both vector lengths but on a system which only
support SME it will result in us attempting to restore the SVE vector
length after having set streaming SVE registers.
Fix this by making the enabling of SVE conditional on setting SVE vector
state. If we set streaming SVE state and SVE was not already enabled this
will result in a SVE access trap on next use of normal SVE, this will cause
us to flush our register state but this is fine since the only way to
trigger a SVE access trap would be to exit streaming mode which will cause
the in register state to be flushed anyway.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-ssve-no-sve-v1-1-49df214bfb3e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
With commit 2d47c6956ab3 ("ubsan: Tighten UBSAN_BOUNDS on GCC") if
CONFIG_UBSAN is enabled and gcc supports -fsanitize=bounds-strict, we
can trigger the following build error due to bindgen lacking support for
this additional build option:
BINDGEN rust/bindings/bindings_generated.rs
error: unsupported argument 'bounds-strict' to option '-fsanitize='
Fix by adding -fsanitize=bounds-strict to the list of skipped gcc flags
for bindgen.
Fixes: 2d47c6956ab3 ("ubsan: Tighten UBSAN_BOUNDS on GCC")
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Link: https://lore.kernel.org/r/20230711071914.133946-1-andrea.righi@canonical.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
We discovered that the current design of `borrow_mut` is problematic.
This patch removes it until a better solution can be found.
Specifically, the current design gives you access to a `&mut T`, which
lets you change where the `ForeignOwnable` points (e.g., with
`core::mem::swap`). No upcoming user of this API intended to make that
possible, making all of them unsound.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com>
Fixes: 0fc4424d24a2 ("rust: types: introduce `ForeignOwnable`")
Link: https://lore.kernel.org/r/20230706094615.3080784-1-aliceryhl@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Currently the rust allocator simply passes the size of the type Layout
to krealloc(), and in theory the alignment requirement from the type
Layout may be larger than the guarantee provided by SLAB, which means
the allocated object is mis-aligned.
Fix this by adjusting the allocation size to the nearest power of two,
which SLAB always guarantees a size-aligned allocation. And because Rust
guarantees that the original size must be a multiple of alignment and
the alignment must be a power of two, then the alignment requirement is
satisfied.
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Co-developed-by: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
Signed-off-by: "Andreas Hindborg (Samsung)" <nmi@metaspace.dk>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Cc: stable@vger.kernel.org # v6.1+
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 247b365dc8dc ("rust: add `kernel` crate")
Link: https://github.com/Rust-for-Linux/linux/issues/974
Link: https://lore.kernel.org/r/20230730012905.643822-2-boqun.feng@gmail.com
[ Applied rewording of comment as discussed in the mailing list. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
Willem and Jason have agreed to take over the maintainer
duties for TUN/TAP, thank you!
There's an existing entry for TUN/TAP which only covers
the user mode Linux implementation.
Since we haven't heard from Maxim on the list for almost
a decade, extend that entry and take it over, rather than
adding a new one.
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20230802182843.4193099-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We forgot to add vsock_perf to the rm command in the `clean`
target, so now we have a left over after `make clean` in
tools/testing/vsock.
Fixes: 8abbffd27ced ("test/vsock: vsock_perf utility")
Cc: AVKrasnov@sberdevices.ru
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org> # build-tested
Link: https://lore.kernel.org/r/20230803085454.30897-1-sgarzare@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Whenever tcpm_new() reclaims an old entry, tcpm_suck_dst()
would overwrite data that could be read from tcp_fastopen_cache_get()
or tcp_metrics_fill_info().
We need to acquire fastopen_seqlock to maintain consistency.
For newly allocated objects, tcpm_new() can switch to kzalloc()
to avoid an extra fastopen_seqlock acquisition.
Fixes: 1fe4c481ba63 ("net-tcp: Fast Open client - cookie cache")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tm->tcpm_net can be read or written locklessly.
Instead of changing write_pnet() and read_pnet() and potentially
hurt performance, add the needed READ_ONCE()/WRITE_ONCE()
in tm_net() and tcpm_new().
Fixes: 849e8a0ca8d5 ("tcp_metrics: Add a field tcpm_net and verify it matches on lookup")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tm->tcpm_vals[] values can be read or written locklessly.
Add needed READ_ONCE()/WRITE_ONCE() to document this,
and force use of tcp_metric_get() and tcp_metric_set()
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tm->tcpm_lock can be read or written locklessly.
Add needed READ_ONCE()/WRITE_ONCE() to document this.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tm->tcpm_stamp can be read or written locklessly.
Add needed READ_ONCE()/WRITE_ONCE() to document this.
Also constify tcpm_check_stamp() dst argument.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Because v4 and v6 families use separate inetpeer trees (respectively
net->ipv4.peers and net->ipv6.peers), inetpeer_addr_cmp(a, b) assumes
a & b share the same family.
tcp_metrics use a common hash table, where entries can have different
families.
We must therefore make sure to not call inetpeer_addr_cmp()
if the families do not match.
Fixes: d39d14ffa24c ("net: Add helper function to compare inetpeer addresses")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230802131500.1478140-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When both supported and previous version have the same major version,
and the firmwares are missing, the driver ends in a loop requesting the
same (previous) version over and over again:
[ 76.327413] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
[ 76.339802] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.352162] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.364502] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.376848] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.389183] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.401522] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.413860] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
[ 76.426199] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.0.img firmware, fall-back to previous 4.0 version
...
Fix this by inverting the check to that we aren't yet at the previous
version, and also check the minor version.
This also catches the case where both versions are the same, as it was
after commit bb5dbf2cc64d ("net: marvell: prestera: add firmware v4.0
support").
With this fix applied:
[ 88.499622] Prestera DX 0000:01:00.0: missing latest mrvl/prestera/mvsw_prestera_fw-v4.1.img firmware, fall-back to previous 4.0 version
[ 88.511995] Prestera DX 0000:01:00.0: failed to request previous firmware: mrvl/prestera/mvsw_prestera_fw-v4.0.img
[ 88.522403] Prestera DX: probe of 0000:01:00.0 failed with error -2
Fixes: 47f26018a414 ("net: marvell: prestera: try to load previous fw version")
Signed-off-by: Jonas Gorski <jonas.gorski@bisdn.de>
Acked-by: Elad Nachman <enachman@marvell.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Acked-by: Taras Chornyi <taras.chornyi@plvision.eu>
Link: https://lore.kernel.org/r/20230802092357.163944-1-jonas.gorski@bisdn.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When setting ZT0 via ptrace we do not currently force a reload of the
floating point register state from memory, do that to ensure that the newly
set value gets loaded into the registers on next task execution.
The function was templated off the function for FPSIMD which due to our
providing the option of embedding a FPSIMD regset within the SVE regset
does not directly include the flush.
Fixes: f90b529bcbe5 ("arm64/sme: Implement ZT0 ptrace support")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-zt0-flush-v1-1-72e854eaf96e@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When setting SME vector lengths we clear TIF_SME to reenable SME traps,
doing a reallocation of the backing storage on next use. We do this using
clear_thread_flag() which operates on the current thread, meaning that when
setting the vector length via ptrace we may both not force traps for the
target task and force a spurious flush of any SME state that the tracing
task may have.
Clear the flag in the target task.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230803-arm64-fix-ptrace-tif-sme-v1-1-88312fd6fbfd@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Fix checkpatch warnings:
unaligned.c:475: ERROR: space required after that ','
Signed-off-by: Yu Han <hanyu001@208suo.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
This driver does not actually work with DMA mode, but still tries
to call ISA DMA interface functions that are stubbed out on
parisc, resulting in a W=1 build warning:
drivers/parport/parport_gsc.c: In function 'parport_remove_chip':
drivers/parport/parport_gsc.c:389:20: warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
389 | free_dma(p->dma);
Remove the corresponding code as a prerequisite for turning on -Wempty-body
by default in all kernels.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Clearly, this code isn't needed, but it gives a false positive when
grepping the complete source tree for coherent_dma_mask.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Christoph Biedl reported early OOM on recent kernels:
swapper: page allocation failure: order:0, mode:0x100(__GFP_ZERO),
nodemask=(null)
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
Backtrace:
[<10408594>] show_stack+0x48/0x5c
[<10e152d8>] dump_stack_lvl+0x48/0x64
[<10e15318>] dump_stack+0x24/0x34
[<105cf7f8>] warn_alloc+0x10c/0x1c8
[<105d068c>] __alloc_pages+0xbbc/0xcf8
[<105d0e4c>] __get_free_pages+0x28/0x78
[<105ad10c>] __pte_alloc_kernel+0x30/0x98
[<10406934>] set_fixmap+0xec/0xf4
[<10411ad4>] patch_map.constprop.0+0xa8/0xdc
[<10411bb0>] __patch_text_multiple+0xa8/0x208
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4
Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0
slab_reclaimable:0 slab_unreclaimable:0
mapped:0 shmem:0 pagetables:0
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:0 free_pcp:0 free_cma:0
Node 0 active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB
mapped:0kB dirty:0kB writeback:0kB shmem:0kB
+writeback_tmp:0kB kernel_stack:0kB pagetables:0kB sec_pagetables:0kB
all_unreclaimable? no
Normal free:0kB boost:0kB min:0kB low:0kB high:0kB
reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB
+present:1048576kB managed:1039360kB mlocked:0kB bounce:0kB free_pcp:0kB
local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 0
Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 0kB
0 total pagecache pages
0 pages in swap cache
Free swap = 0kB
Total swap = 0kB
262144 pages RAM
0 pages HighMem/MovableOnly
2304 pages reserved
Backtrace:
[<10411d78>] patch_text+0x30/0x48
[<1041246c>] arch_jump_label_transform+0x90/0xcc
[<1056f734>] jump_label_update+0xd4/0x184
[<1056fc9c>] static_key_enable_cpuslocked+0xc0/0x110
[<1056fd08>] static_key_enable+0x1c/0x2c
[<1011362c>] init_mem_debugging_and_hardening+0xdc/0xf8
[<1010141c>] start_kernel+0x5f0/0xa98
[<10105da8>] start_parisc+0xb8/0xe4
Kernel Fault: Code=15 (Data TLB miss fault) at addr 0f7fe3c0
CPU: 0 PID: 0 Comm: swapper Not tainted 6.3.0-rc4+ #16
Hardware name: 9000/785/C3600
This happens because patching static key code temporarily maps it via
fixmap and if it happens before page allocator is initialized set_fixmap()
cannot allocate memory using pte_alloc_kernel().
Make sure that fixmap page tables are preallocated early so that
pte_offset_kernel() in set_fixmap() never resorts to pte allocation.
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Christoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
Tested-by: John David Anglin <dave.anglin@bell.net>
Cc: <stable@vger.kernel.org> # v6.4+
|
|
__ip_append_data() can get into an infinite loop when asked to splice into
a partially-built UDP message that has more than the frag-limit data and up
to the MTU limit. Something like:
pipe(pfd);
sfd = socket(AF_INET, SOCK_DGRAM, 0);
connect(sfd, ...);
send(sfd, buffer, 8161, MSG_CONFIRM|MSG_MORE);
write(pfd[1], buffer, 8);
splice(pfd[0], 0, sfd, 0, 0x4ffe0ul, 0);
where the amount of data given to send() is dependent on the MTU size (in
this instance an interface with an MTU of 8192).
The problem is that the calculation of the amount to copy in
__ip_append_data() goes negative in two places, and, in the second place,
this gets subtracted from the length remaining, thereby increasing it.
This happens when pagedlen > 0 (which happens for MSG_ZEROCOPY and
MSG_SPLICE_PAGES), because the terms in:
copy = datalen - transhdrlen - fraggap - pagedlen;
then mostly cancel when pagedlen is substituted for, leaving just -fraggap.
This causes:
length -= copy + transhdrlen;
to increase the length to more than the amount of data in msg->msg_iter,
which causes skb_splice_from_iter() to be unable to fill the request and it
returns less than 'copied' - which means that length never gets to 0 and we
never exit the loop.
Fix this by:
(1) Insert a note about the dodgy calculation of 'copy'.
(2) If MSG_SPLICE_PAGES, clear copy if it is negative from the above
equation, so that 'offset' isn't regressed and 'length' isn't
increased, which will mean that length and thus copy should match the
amount left in the iterator.
(3) When handling MSG_SPLICE_PAGES, give a warning and return -EIO if
we're asked to splice more than is in the iterator. It might be
better to not give the warning or even just give a 'short' write.
[!] Note that this ought to also affect MSG_ZEROCOPY, but MSG_ZEROCOPY
avoids the problem by simply assuming that everything asked for got copied,
not just the amount that was in the iterator. This is a potential bug for
the future.
Fixes: 7ac7c987850c ("udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES")
Reported-by: syzbot+f527b971b4bdc8e79f9e@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/000000000000881d0606004541d1@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1420063.1690904933@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix typo in setup_fte_upper_proto_match() where destination UDP port
was used instead of source port.
Fixes: a7385187a386 ("net/mlx5e: IPsec, support upper protocol selector field offload")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/ffc024a4d192113103f392b0502688366ca88c1f.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the cited commit, new type of FS_TYPE_PRIO_CHAINS fs_prio was added
to support multiple parallel namespaces for multi-chains. And we skip
all the flow tables under the fs_node of this type unconditionally,
when searching for the next or previous flow table to connect for a
new table.
As this search function is also used for find new root table when the
old one is being deleted, it will skip the entire FS_TYPE_PRIO_CHAINS
fs_node next to the old root. However, new root table should be chosen
from it if there is any table in it. Fix it by skipping only the flow
tables in the same FS_TYPE_PRIO_CHAINS fs_node when finding the
closest FT for a fs_node.
Besides, complete the connecting from FTs of previous priority of prio
because there should be multiple prevs after this fs_prio type is
introduced. And also the next FT should be chosen from the first flow
table next to the prio in the same FS_TYPE_PRIO_CHAINS fs_prio, if
this prio is the first child.
Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/7a95754df479e722038996c97c97b062b372591f.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As find_closest_ft_recursive is called to find the closest FT, the
first parameter of find_closest_ft can be changed from fs_prio to
fs_node. Thus this function is extended to find the closest FT for the
nodes of any type, not only prios, but also the sub namespaces.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/d3962c2b443ec8dde7a740dc742a1f052d5e256c.1690803944.git.leonro@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Since commit 30fbee49b071 ("Staging: hv: vmbus: Get rid of the unused function vmbus_ontimer()")
this is not used anymore, so can remove it.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20230725142108.27280-1-yuehaibing@huawei.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Hyper-V can run VMs at different privilege "levels" known as Virtual
Trust Levels (VTL). Sometimes, it chooses to run two different VMs
at different levels but they share some of their address space. In
such setups VTL2 (higher level VM) has visibility of all of the
VTL0 (level 0) memory space.
When the CONFIG_X86_MPPARSE is enabled for VTL2, the VTL2 kernel
performs a search within the low memory to locate MP tables. However,
in systems where VTL0 manages the low memory and may contain valid
tables, this scanning can result in incorrect MP table information
being provided to the VTL2 kernel, mistakenly considering VTL0's MP
table as its own
Add noop functions to avoid MP parse scan by VTL2.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/1687537688-5397-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
RISC-V Linux exports "va_kernel_pa_offset" in vmcoreinfo to help
Crash-utility translate the kernel virtual address correctly.
Here adds the definition of "va_kernel_pa_offset".
Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
Link: https://lore.kernel.org/linux-riscv/20230724040649.220279-1-suagrfillet@gmail.com/
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230724100917.309061-2-suagrfillet@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Since RISC-V Linux v6.4, the commit 3335068f8721 ("riscv: Use
PUD/P4D/PGD pages for the linear mapping") changes phys_ram_base
from the physical start of the kernel to the actual start of the DRAM.
The Crash-utility's VTOP() still uses phys_ram_base and kernel_map.virt_addr
to translate kernel virtual address, that failed the Crash with Linux v6.4 [1].
Export kernel_map.va_kernel_pa_offset in vmcoreinfo to help Crash translate
the kernel virtual address correctly.
Fixes: 3335068f8721 ("riscv: Use PUD/P4D/PGD pages for the linear mapping")
Link: https://lore.kernel.org/linux-riscv/20230724040649.220279-1-suagrfillet@gmail.com/ [1]
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230724100917.309061-1-suagrfillet@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
acpi_os_ioremap() currently is a wrapper to memremap() on
RISC-V. But the callers of acpi_os_ioremap() expect it to
return __iomem address and hence sparse tool reports a new
warning. Fix this issue by type casting to __iomem type.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307230357.egcTAefj-lkp@intel.com/
Fixes: a91a9ffbd3a5 ("RISC-V: Add support to build the ACPI core")
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230724100346.1302937-1-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The following error happens:
In file included from vstate_exec_nolibc.c:2:
/usr/include/riscv64-linux-gnu/sys/prctl.h:42:12: error: conflicting types for ‘prctl’; h
ave ‘int(int, ...)’
42 | extern int prctl (int __option, ...) __THROW;
| ^~~~~
In file included from ./../../../../include/nolibc/nolibc.h:99,
from <command-line>:
./../../../../include/nolibc/sys.h:892:5: note: previous definition of ‘prctl’ with type
‘int(int, long unsigned int, long unsigned int, long unsigned int, long unsigned int)
’
892 | int prctl(int option, unsigned long arg2, unsigned long arg3,
| ^~~~~
Fix this by not including <sys/prctl.h>, which is not needed here since
prctl syscall is directly called using its number.
Fixes: 7cf6198ce22d ("selftests: Test RISC-V Vector prctl interface")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230713115829.110421-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
The riscv selftests (which were modeled after the arm64 selftests) are
improperly declaring the "emit_tests" target to depend upon the "all"
target. This approach, when combined with commit 9fc96c7c19df
("selftests: error out if kernel header files are not yet built"), has
caused build failures [1] on arm64, and is likely to cause similar
failures for riscv.
To fix this, simply remove the unnecessary "all" dependency from the
emit_tests target. The dependency is still effectively honored, because
again, invocation is via "install", which also depends upon "all".
An alternative approach would be to harden the emit_tests target so that
it can depend upon "all", but that's a lot more complicated and hard to
get right, and doesn't seem worth it, especially given that emit_tests
should probably not be overridden at all.
[1] https://lore.kernel.org/20230710-kselftest-fix-arm64-v1-1-48e872844f25@kernel.org
Fixes: 9fc96c7c19df ("selftests: error out if kernel header files are not yet built")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230712193514.740033-1-jhubbard@nvidia.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Customer reported that they couldn't mount their DFS link that was
seen by the client as a DFS interlink -- special form of DFS link
where its single target may point to a different DFS namespace -- and
it turned out that it was just a regular DFS link where its referral
header flags missed the StorageServers bit thus making the client
think it couldn't tree connect to target directly without requiring
further referrals.
When the DFS link referral header flags misses the StoraServers bit
and its target doesn't respond to any referrals, then tree connect to
it.
Fixes: a1c0d00572fc ("cifs: share dfs connections and supers")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Compiling big-endian targets with Clang produces the diagnostic:
fs/namei.c:2173:13: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
} while (!(has_zero(a, &adata, &constants) | has_zero(b, &bdata, &constants)));
~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
||
fs/namei.c:2173:13: note: cast one or both operands to int to silence this warning
It appears that when has_zero was introduced, two definitions were
produced with different signatures (in particular different return
types).
Looking at the usage in hash_name() in fs/namei.c, I suspect that
has_zero() is meant to be invoked twice per while loop iteration; using
logical-or would not update `bdata` when `a` did not have zeros. So I
think it's preferred to always return an unsigned long rather than a
bool than update the while loop in hash_name() to use a logical-or
rather than bitwise-or.
[ Also changed powerpc version to do the same - Linus ]
Link: https://github.com/ClangBuiltLinux/linux/issues/1832
Link: https://lore.kernel.org/lkml/20230801-bitwise-v1-1-799bec468dc4@google.com/
Fixes: 36126f8f2ed8 ("word-at-a-time: make the interfaces truly generic")
Debugged-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On a powermac platform, via the call path:
start_kernel()
time_init()
ppc_md.calibrate_decr() (pmac_calibrate_decr)
via_calibrate_decr()
ioremap() and iounmap() are called. The unmap can enable interrupts
unexpectedly (cond_resched() in vunmap_pmd_range()), which causes a
warning later in the boot sequence in start_kernel().
Use the early_* variants of these IO functions to prevent this.
The issue is pre-existing, but is surfaced by commit 721255b9826b
("genirq: Use a maple tree for interrupt descriptor management").
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230706010816.72682-1-bgray@linux.ibm.com
|
|
Using brcmfmac with 6.5-rc3 on a brcmfmac43241b4-sdio triggers
a backtrace caused by the following field-spanning warning:
memcpy: detected field-spanning write (size 120) of single field
"¶ms_le->channel_list[0]" at
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:1072 (size 2)
The driver still works after this warning. The warning was introduced by the
new field-spanning write checks which were enabled recently.
Fix this by replacing the channel_list[1] declaration at the end of
the struct with a flexible array declaration.
Most users of struct brcmf_scan_params_le calculate the size to alloc
using the size of the non flex-array part of the struct + needed extra
space, so they do not care about sizeof(struct brcmf_scan_params_le).
brcmf_notify_escan_complete() however uses the struct on the stack,
expecting there to be room for at least 1 entry in the channel-list
to store the special -1 abort channel-id.
To make this work use an anonymous union with a padding member
added + the actual channel_list flexible array.
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20230729140500.27892-1-hdegoede@redhat.com
|
|
The nexthop code expects a 31 bit hash, such as what is returned by
fib_multipath_hash() and rt6_multipath_hash(). Passing the 32 bit hash
returned by skb_get_hash() can lead to problems related to the fact that
'int hash' is a negative number when the MSB is set.
In the case of hash threshold nexthop groups, nexthop_select_path_hthr()
will disproportionately select the first nexthop group entry. In the case
of resilient nexthop groups, nexthop_select_path_res() may do an out of
bounds access in nh_buckets[], for example:
hash = -912054133
num_nh_buckets = 2
bucket_index = 65535
which leads to the following panic:
BUG: unable to handle page fault for address: ffffc900025910c8
PGD 100000067 P4D 100000067 PUD 10026b067 PMD 0
Oops: 0002 [#1] PREEMPT SMP KASAN NOPTI
CPU: 4 PID: 856 Comm: kworker/4:3 Not tainted 6.5.0-rc2+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:nexthop_select_path+0x197/0xbf0
Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffc900025910c8 CR3: 0000000129d00000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
<TASK>
? __die+0x23/0x70
? page_fault_oops+0x1ee/0x5c0
? __pfx_is_prefetch.constprop.0+0x10/0x10
? __pfx_page_fault_oops+0x10/0x10
? search_bpf_extables+0xfe/0x1c0
? fixup_exception+0x3b/0x470
? exc_page_fault+0xf6/0x110
? asm_exc_page_fault+0x26/0x30
? nexthop_select_path+0x197/0xbf0
? nexthop_select_path+0x197/0xbf0
? lock_is_held_type+0xe7/0x140
vxlan_xmit+0x5b2/0x2340
? __lock_acquire+0x92b/0x3370
? __pfx_vxlan_xmit+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
? __pfx_register_lock_class+0x10/0x10
? skb_network_protocol+0xce/0x2d0
? dev_hard_start_xmit+0xca/0x350
? __pfx_vxlan_xmit+0x10/0x10
dev_hard_start_xmit+0xca/0x350
__dev_queue_xmit+0x513/0x1e20
? __pfx___dev_queue_xmit+0x10/0x10
? __pfx_lock_release+0x10/0x10
? mark_held_locks+0x44/0x90
? skb_push+0x4c/0x80
? eth_header+0x81/0xe0
? __pfx_eth_header+0x10/0x10
? neigh_resolve_output+0x215/0x310
? ip6_finish_output2+0x2ba/0xc90
ip6_finish_output2+0x2ba/0xc90
? lock_release+0x236/0x3e0
? ip6_mtu+0xbb/0x240
? __pfx_ip6_finish_output2+0x10/0x10
? find_held_lock+0x83/0xa0
? lock_is_held_type+0xe7/0x140
ip6_finish_output+0x1ee/0x780
ip6_output+0x138/0x460
? __pfx_ip6_output+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
? __pfx_ip6_finish_output+0x10/0x10
NF_HOOK.constprop.0+0xc0/0x420
? __pfx_NF_HOOK.constprop.0+0x10/0x10
? ndisc_send_skb+0x2c0/0x960
? __pfx_lock_release+0x10/0x10
? __local_bh_enable_ip+0x93/0x110
? lock_is_held_type+0xe7/0x140
ndisc_send_skb+0x4be/0x960
? __pfx_ndisc_send_skb+0x10/0x10
? mark_held_locks+0x65/0x90
? find_held_lock+0x83/0xa0
ndisc_send_ns+0xb0/0x110
? __pfx_ndisc_send_ns+0x10/0x10
addrconf_dad_work+0x631/0x8e0
? lock_acquire+0x180/0x3f0
? __pfx_addrconf_dad_work+0x10/0x10
? mark_held_locks+0x24/0x90
process_one_work+0x582/0x9c0
? __pfx_process_one_work+0x10/0x10
? __pfx_do_raw_spin_lock+0x10/0x10
? mark_held_locks+0x24/0x90
worker_thread+0x93/0x630
? __kthread_parkme+0xdc/0x100
? __pfx_worker_thread+0x10/0x10
kthread+0x1a5/0x1e0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Modules linked in:
CR2: ffffc900025910c8
---[ end trace 0000000000000000 ]---
RIP: 0010:nexthop_select_path+0x197/0xbf0
Code: c1 e4 05 be 08 00 00 00 4c 8b 35 a4 14 7e 01 4e 8d 6c 25 00 4a 8d 7c 25 08 48 01 dd e8 c2 25 15 ff 49 8d 7d 08 e8 39 13 15 ff <4d> 89 75 08 48 89 ef e8 7d 12 15 ff 48 8b 5d 00 e8 14 55 2f 00 85
RSP: 0018:ffff88810c36f260 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000000002000c0 RCX: ffffffffaf02dd77
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffffc900025910c8
RBP: ffffc900025910c0 R08: 0000000000000001 R09: fffff520004b2219
R10: ffffc900025910cf R11: 31392d2068736168 R12: 00000000002000c0
R13: ffffc900025910c0 R14: 00000000fffef608 R15: ffff88811840e900
FS: 0000000000000000(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 0000000129d00000 CR4: 0000000000750ee0
PKRU: 55555554
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x2ca00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
Fix this problem by ensuring the MSB of hash is 0 using a right shift - the
same approach used in fib_multipath_hash() and rt6_multipath_hash().
Fixes: 1274e1cc4226 ("vxlan: ecmp support for mac fdb entries")
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skbuff: skb_under_panic: text:ffffffff88771f69 len:56 put:-4
head:ffff88805f86a800 data:ffff887f5f86a850 tail:0x88 end:0x2c0 dev:pim6reg
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:192!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 2 PID: 22968 Comm: kworker/2:11 Not tainted 6.5.0-rc3-00044-g0a8db05b571a #236
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:skb_panic+0x152/0x1d0
Call Trace:
<TASK>
skb_push+0xc4/0xe0
ip6mr_cache_report+0xd69/0x19b0
reg_vif_xmit+0x406/0x690
dev_hard_start_xmit+0x17e/0x6e0
__dev_queue_xmit+0x2d6a/0x3d20
vlan_dev_hard_start_xmit+0x3ab/0x5c0
dev_hard_start_xmit+0x17e/0x6e0
__dev_queue_xmit+0x2d6a/0x3d20
neigh_connected_output+0x3ed/0x570
ip6_finish_output2+0x5b5/0x1950
ip6_finish_output+0x693/0x11c0
ip6_output+0x24b/0x880
NF_HOOK.constprop.0+0xfd/0x530
ndisc_send_skb+0x9db/0x1400
ndisc_send_rs+0x12a/0x6c0
addrconf_dad_completed+0x3c9/0xea0
addrconf_dad_work+0x849/0x1420
process_one_work+0xa22/0x16e0
worker_thread+0x679/0x10c0
ret_from_fork+0x28/0x60
ret_from_fork_asm+0x11/0x20
When setup a vlan device on dev pim6reg, DAD ns packet may sent on reg_vif_xmit().
reg_vif_xmit()
ip6mr_cache_report()
skb_push(skb, -skb_network_offset(pkt));//skb_network_offset(pkt) is 4
And skb_push declared as:
void *skb_push(struct sk_buff *skb, unsigned int len);
skb->data -= len;
//0xffff88805f86a84c - 0xfffffffc = 0xffff887f5f86a850
skb->data is set to 0xffff887f5f86a850, which is invalid mem addr, lead to skb_push() fails.
Fixes: 14fb64e1f449 ("[IPV6] MROUTE: Support PIM-SM (SSM).")
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev_close() and dev_open() are issued to change the interface state to DOWN
or UP (dev->flags IFF_UP). When the netdev is set DOWN it loses e.g its
Ipv6 addresses and routes. We don't want this in cases of device recovery
(triggered by hardware or software) or when the qeth device is set
offline.
Setting a qeth device offline or online and device recovery actions call
netif_device_detach() and/or netif_device_attach(). That will reset or
set the LOWER_UP indication i.e. change the dev->state Bit
__LINK_STATE_PRESENT. That is enough to e.g. cause bond failovers, and
still preserves the interface settings that are handled by the network
stack.
Don't call dev_open() nor dev_close() from the qeth device driver. Let the
network stack handle this.
Fixes: d4560150cb47 ("s390/qeth: call dev_close() during recovery")
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 66b2c338adce initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/tapX" device node's owner UID. Per original
commit 86741ec25462 ("net: core: Add a UID field to struct sock.",
2016-11-04), that's wrong: the idea is to cache the UID of the userspace
process that creates the socket. Commit 86741ec25462 mentions socket() and
accept(); with "tap", the action that creates the socket is
open("/dev/tapX").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/tapX" will be owned by root, so in practice, commit 66b2c338adce has
no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/tapX" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 66b2c338adce ("tap: tap_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit a096ccca6e50 initializes the "sk_uid" field in the protocol socket
(struct sock) from the "/dev/net/tun" device node's owner UID. Per
original commit 86741ec25462 ("net: core: Add a UID field to struct
sock.", 2016-11-04), that's wrong: the idea is to cache the UID of the
userspace process that creates the socket. Commit 86741ec25462 mentions
socket() and accept(); with "tun", the action that creates the socket is
open("/dev/net/tun").
Therefore the device node's owner UID is irrelevant. In most cases,
"/dev/net/tun" will be owned by root, so in practice, commit a096ccca6e50
has no observable effect:
- before, "sk_uid" would be zero, due to undefined behavior
(CVE-2023-1076),
- after, "sk_uid" would be zero, due to "/dev/net/tun" being owned by root.
What matters is the (fs)UID of the process performing the open(), so cache
that in "sk_uid".
Cc: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pietro Borrello <borrello@diag.uniroma1.it>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: a096ccca6e50 ("tun: tun_chr_open(): correctly initialize socket uid")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2173435
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver can be built as a module, however the lack of the
MODULE_DEVICE_TABLE macro prevents it from being automatically probed
from the DT in such case.
Add the missed macro to make sure the module can load automatically.
Fixes: 6810bb390282 ("drm/panel: Add Samsung S6D7AA0 panel controller driver")
Signed-off-by: Nikita Travkin <nikita@trvn.ru>
Acked-by: Artur Weber <aweber.kernel@gmail.com>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230802-gt5-panel-dtable-v1-1-c0a765c175e2@trvn.ru
|