|
Use built-in functions instead of shell commands to avoid forking
processes.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
|
|
extract-cert is only used in certs/Makefile.
Move it there and build extract-cert on demand.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The previous commit fixed up all shell scripts to not include
include/config/auto.conf.
Now that include/config/auto.conf is only included by Makefiles,
we can change it into a more Make-friendly form.
Previously, Kconfig output string values enclosed with double-quotes
(both in the .config and include/config/auto.conf):
CONFIG_X="foo bar"
Unlike shell, Make handles double-quotes (and single-quotes as well)
verbatim. We must rip them off when used.
There are some patterns:
[1] $(patsubst "%",%,$(CONFIG_X))
[2] $(CONFIG_X:"%"=%)
[3] $(subst ",,$(CONFIG_X))
[4] $(shell echo $(CONFIG_X))
These are not only ugly, but also fragile.
[1] and [2] do not work if the value contains spaces, like
CONFIG_X=" foo bar "
[3] does not work correctly if the value contains double-quotes like
CONFIG_X="foo\"bar"
[4] seems to work better, but has a cost of forking a process.
Anyway, quoted strings were always a PITA for our Makefiles.
This commit changes Kconfig to stop quoting in include/config/auto.conf.
These are the string type symbols referenced in Makefiles or scripts:
ACPI_CUSTOM_DSDT_FILE
ARC_BUILTIN_DTB_NAME
ARC_TUNE_MCPU
BUILTIN_DTB_SOURCE
CC_IMPLICIT_FALLTHROUGH
CC_VERSION_TEXT
CFG80211_EXTRA_REGDB_KEYDIR
EXTRA_FIRMWARE
EXTRA_FIRMWARE_DIR
EXTRA_TARGETS
H8300_BUILTIN_DTB
INITRAMFS_SOURCE
LOCALVERSION
MODULE_SIG_HASH
MODULE_SIG_KEY
NDS32_BUILTIN_DTB
NIOS2_DTB_SOURCE
OPENRISC_BUILTIN_DTB
SOC_CANAAN_K210_DTB_SOURCE
SYSTEM_BLACKLIST_HASH_LIST
SYSTEM_REVOCATION_KEYS
SYSTEM_TRUSTED_KEYS
TARGET_CPU
UNUSED_KSYMS_WHITELIST
XILINX_MICROBLAZE0_FAMILY
XILINX_MICROBLAZE0_HW_VER
XTENSA_VARIANT_NAME
I checked them one by one, and fixed up the code where necessary.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Richard Weinberger pointed out the risk of sourcing the kernel config
from shell scripts [1], and proposed some patches [2], [3]. It is a good
point, but it took a long time because I was wondering how to fix this.
This commit goes with a simple grep approach because there are only a few
scripts including the kernel configuration.
scripts/link_vmlinux.sh has references to a bunch of CONFIG options,
all of which are boolean. I added an is_enabled() helper, as
scripts/package/{mkdebian,builddeb} do.
scripts/gen_autoksyms.sh uses 'eval', stating "to expand the whitelist
path". I removed it since it is the issue we are trying to fix.
I was a bit worried about the cost of invoking the grep command over
and over. I extracted the grep parts from it, and measured the cost. It
was approximately 0.03 sec, which I hope is acceptable.
[test code]
$ cat test-grep.sh
#!/bin/sh
is_enabled() {
grep -q "^$1=y" include/config/auto.conf
}
is_enabled CONFIG_LTO_CLANG
is_enabled CONFIG_LTO_CLANG
is_enabled CONFIG_STACK_VALIDATION
is_enabled CONFIG_UNWINDER_ORC
is_enabled CONFIG_FTRACE_MCOUNT_USE_OBJTOOL
is_enabled CONFIG_VMLINUX_VALIDATION
is_enabled CONFIG_FRAME_POINTER
is_enabled CONFIG_GCOV_KERNEL
is_enabled CONFIG_LTO_CLANG
is_enabled CONFIG_RETPOLINE
is_enabled CONFIG_X86_SMAP
is_enabled CONFIG_LTO_CLANG
is_enabled CONFIG_VMLINUX_MAP
is_enabled CONFIG_KALLSYMS_ALL
is_enabled CONFIG_KALLSYMS_ABSOLUTE_PERCPU
is_enabled CONFIG_KALLSYMS_BASE_RELATIVE
is_enabled CONFIG_DEBUG_INFO_BTF
is_enabled CONFIG_KALLSYMS
is_enabled CONFIG_DEBUG_INFO_BTF
is_enabled CONFIG_BPF
is_enabled CONFIG_BUILDTIME_TABLE_SORT
is_enabled CONFIG_KALLSYMS
$ time ./test-grep.sh
real 0m0.036s
user 0m0.027s
sys 0m0.009s
[1]: https://lore.kernel.org/all/1919455.eZKeABUfgV@blindfold/
[2]: https://lore.kernel.org/all/20180219092245.26404-1-richard@nod.at/
[3]: https://lore.kernel.org/all/20210920213957.1064-2-richard@nod.at/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
|
|
The complex macro, config_filename, was introduced to do:
[1] drop double-quotes from the string value
[2] add $(srctree)/ prefix in case the file is not found in $(objtree)
[3] escape spaces and more
[1] will be more generally handled by Kconfig later.
As for [2], Kbuild uses VPATH to search for files in $(objtree),
$(srctree) in this order. GNU Make can natively handle it.
As for [3], converting $(space) to $(space_escape) back and forth looks
questionable to me. It is well-known that GNU Make cannot handle file
paths with spaces in the first place.
Instead of using the complex macro, use $< so it will be expanded to
the file path of the key.
Remove config_filename, finally.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Toward the goal of removing the config_filename macro, drop
the double-quotes and add $(srctree)/ prefix in an ad hoc way.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
|
|
This dependency is necessary irrespective of the mentioned GCC PR
because the embedded certificates are build artifacts and must be
generated by extract_certs before *.S files are compiled.
The comment sounds like we are hoping to remove these dependencies
someday. No, we cannot remove them.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
'make clean' removes files listed in 'targets'. It is redundant to
specify both 'targets' and 'clean-files'.
Move 'targets' assignments out of the ifeq-conditionals so
scripts/Makefile.clean can see them.
One effective change is that certs/signing_key.x509 is now
deleted by 'make clean' instead of 'make mrproper'. This certificate
is embedded in the kernel. It is not used in any way by external
module builds.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
|
|
The .incbin directive in certs/system_certificates.S includes
certs/signing_key.x509 and certs/x509_certificate_list, both of which
are generated by extract_certs, i.e. exist in $(objtree).
This option -I$(srctree) is unneeded.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
cmd_extract_certs is defined twice. Unify them.
The current log shows the input file $(2), which might be empty.
You cannot know what is being created from the log, "EXTRACT_CERTS".
Change the log to show the output file with better alignment.
[Before]
EXTRACT_CERTS certs/signing_key.pem
CC certs/system_keyring.o
EXTRACT_CERTS
AS certs/system_certificates.o
CC certs/common.o
CC certs/blacklist.o
EXTRACT_CERTS
AS certs/revocation_certificates.o
[After]
CERT certs/signing_key.x509
CC certs/system_keyring.o
CERT certs/x509_certificate_list
AS certs/system_certificates.o
CC certs/common.o
CC certs/blacklist.o
CERT certs/x509_revocation_list
AS certs/revocation_certificates.o
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
|
|
Do not repeat $(obj)/x509.genkey or $(obj)/signing_key.pem
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
|
|
Linux 5.15 is out. Remove this stub now.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
|
|
This script is only used by usr/include/Makefile. Make it local to
the directory.
Update the comment in include/uapi/linux/soundcard.h because
'make headers_check' is no longer functional.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
If the key type of the existing signing key does not match
CONFIG_MODULE_SIG_KEY_TYPE_*, the Makefile removes it so that it is
re-generated.
Use if_changed so that the key is re-generated when the key type is
changed (that is, the openssl command line is changed).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Commit 5d06ee20b662 ("modsign: hide openssl output in silent builds")
silenced the key generation log from openssl in silent builds.
Since commit 174a1dcc9642 ("kbuild: sink stdout from cmd for silent
build"), the 'cmd' macro can handle it in a cleaner way.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When you run Kbuild with the parallel option -j, the messages from this
rule and others are interleaved, as follows:
###
CC arch/x86/mm/pat/set_memory.o
### Now generating an X.509 key pair to be used for signing modules.
###
### If this takes a long time, you might wish to run rngd in the
### background to keep the supply of entropy topped up. It
CC arch/x86/events/intel/bts.o
HDRTEST usr/include/linux/qnx4_fs.h
CC arch/x86/events/zhaoxin/core.o
### needs to be run as root, and uses a hardware random
### number generator if one is available.
AR init/built-in.a
###
On modern machines, it does not take a long time to generate the key.
Remove the ugly log messages.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When x509.genkey is created, it prints a log:
Generating X.509 key generation config
..., which is not the ordinary Kbuild log style.
Check in the default config as certs/default_x509.genkey to make it
readable, and copy it to certs/x509.genkey if it is not present.
The log is shown in the Kbuild style.
COPY certs/x509.genkey
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
CONFIG_MODULE_SIG_HASH is defined by init/Kconfig. This $(error ...) is
never reachable. (If it is, you need to fix the bug.)
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This is not used or exported.
BUILTIN_DTB is locally defined and used in arch/nds32/boot/dts/Makefile.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When the condition "MODULE_SIG || (IMA_APPRAISE_MODSIG && MODULES)"
is unmet, you cannot choose anything in the choice, but the choice
menu is still displayed in menuconfig etc.
Move the 'depends on' to the choice to hide the meaningless menu.
Also delete the redundant 'default'. In a choice, the first entry is
the default.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The minimum supported version of LLVM has been raised to 11.0.0, meaning
this check is always true, so it can be dropped.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The minimum supported version of LLVM has been raised to 11.0.0, meaning
this check is always true, so it can be dropped.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This reverts commit 7411cfc3c91a08a884463bbc7623087ecc2efdd8.
The minimum supported version of LLVM has been raised to 11.0.0, meaning
this check is always true, so it can be dropped.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The minimum supported version of LLVM has been raised to 11.0.0, meaning
this check is always true, so it can be dropped.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The minimum supported version of LLVM has been raised to 11.0.0, meaning
this check is always true, so it can be dropped.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
LLVM versions prior to 11.0.0 have a harder time with dead code
elimination, which can cause issues with commonly used expressions such
as BUILD_BUG_ON and the bitmask functions/macros in bitfield.h (see the
first two issue links below).
Whenever there is an issue within LLVM that has been resolved in a later
release, the only course of action is to gate the problematic
configuration or source code on the toolchain version or raise the
minimum supported version of LLVM for building the kernel, as LLVM has a
limited support lifetime compared to GCC. GCC major releases will
typically see a few point releases across a two year period on average
whereas LLVM major releases are only supported until the next major
release and will only see one or two point releases within that
timeframe. For example, GCC 8.1 was released in May 2018 and GCC 8.5 was
released in May 2021, whereas LLVM 12.0.0 was released in April 2021 and
its only point release, 12.0.1, was released in July 2021, giving a
minimal window for fixes to be backported.
To resolve these build errors around improper dead code elimination,
raise the minimum supported version of LLVM for building the kernel to
11.0.0. Doing so is a more proper solution than mucking around with core
kernel macros that have always worked with GCC or disabling drivers for
using these macros in a proper manner. This type of issue may continue
to crop up and require patching, which creates more debt for bumping the
minimum supported version in the future.
This should have a minimal impact on distributions. Using a script to
pull several different Docker images and check the output of
'clang --version':
archlinux:latest: clang version 13.0.0
debian:oldoldstable-slim: clang version 3.8.1-24 (tags/RELEASE_381/final)
debian:oldstable-slim: clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
debian:stable-slim: Debian clang version 11.0.1-2
debian:testing-slim: Debian clang version 11.1.0-4
debian:unstable-slim: Debian clang version 11.1.0-4
fedora:34: clang version 12.0.1 (Fedora 12.0.1-1.fc34)
fedora:latest: clang version 13.0.0 (Fedora 13.0.0-3.fc35)
fedora:rawhide: clang version 13.0.0 (Fedora 13.0.0-5.fc36)
opensuse/leap:15.2: clang version 9.0.1
opensuse/leap:latest: clang version 11.0.1
opensuse/tumbleweed:latest: clang version 13.0.0
ubuntu:bionic: clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
ubuntu:latest: clang version 10.0.0-4ubuntu1
ubuntu:hirsute: Ubuntu clang version 12.0.0-3ubuntu1~21.04.2
ubuntu:rolling: Ubuntu clang version 13.0.0-2
ubuntu:devel: Ubuntu clang version 13.0.0-9
In every case, the distribution's version of clang is either older than
the current minimum supported version of LLVM 10.0.1 or equal to or
greater than the proposed 11.0.0 so nothing should change.
Another benefit of this change is LLVM=1 works better with arm64 and
x86_64 since commit f12b034afeb3 ("scripts/Makefile.clang: default to
LLVM_IAS=1") enabled the integrated assembler by default, which only
works well with clang 11+ (clang-10 required it to be disabled to
successfully build a kernel).
Link: https://github.com/ClangBuiltLinux/linux/issues/1293
Link: https://github.com/ClangBuiltLinux/linux/issues/1506
Link: https://github.com/ClangBuiltLinux/linux/issues/1511
Link: https://github.com/llvm/llvm-project/commit/fa496ce3c6774097080c8a9cb808da56f383b938
Link: https://groups.google.com/g/clang-built-linux/c/mPQb9_ZWW0s/m/W7o6S-QTBAAJ
Link: https://github.com/ClangBuiltLinux/misc-scripts
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Show the very same file name that was passed to open()
in case the operation failed.
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
|
|
When converting a modular kernel to a monolithic kernel, once the kernel
works without loading any modules, this helps to quickly disable all the
modules before turning off module support entirely.
Refactor conf_rewrite_mod_or_yes to a more general
conf_rewrite_tristates that accepts an old and new state.
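
As a rough illustration of the generalisation described above, the
sketch below rewrites every tristate value that matches an old state to
a new state. It is a stand-alone model, not the scripts/kconfig code:
the enum tristate_model and the function rewrite_tristates_model() are
hypothetical names, and a plain array stands in for the symbol table.

```c
#include <stddef.h>

/* Hypothetical stand-in for Kconfig's three-valued symbol state. */
enum tristate_model { TRI_NO, TRI_MOD, TRI_YES };

/*
 * Rewrite every entry equal to old_val to new_val and return how many
 * entries were changed. Entries holding any other value are untouched.
 */
static size_t rewrite_tristates_model(enum tristate_model *vals, size_t count,
                                      enum tristate_model old_val,
                                      enum tristate_model new_val)
{
    size_t changed = 0;

    for (size_t i = 0; i < count; i++) {
        if (vals[i] == old_val) {
            vals[i] = new_val;
            changed++;
        }
    }
    return changed;
}
```

In this model, passing TRI_MOD and TRI_NO corresponds to disabling all
modules, while TRI_MOD and TRI_YES corresponds to the older
mod-to-yes rewrite.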
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Tested-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Use the architecture independent Kconfig option PAGE_SIZE_LESS_THAN_64KB
to indicate that VMXNET3 requires a page size smaller than 64kB.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
NTFS_RW code allocates page size dependent arrays on the stack. This
results in build failures if the page size is 64k or larger.
fs/ntfs/aops.c: In function 'ntfs_write_mst_block':
fs/ntfs/aops.c:1311:1: error:
the frame size of 2240 bytes is larger than 2048 bytes
Since commit f22969a66041 ("powerpc/64s: Default to 64K pages for 64 bit
book3s") this affects ppc:allmodconfig builds, but other architectures
supporting page sizes of 64k or larger are also affected.
Increasing the maximum frame size for affected architectures just to
silence this error does not really help. The frame size would have to
be set to a really large value for 256k pages. Also, a large frame size
could potentially result in stack overruns in this code and elsewhere
and is therefore not desirable. Make NTFS_RW dependent on page sizes
smaller than 64k instead.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
NTFS_RW and VMXNET3 require a page size smaller than 64kB. Add generic
Kconfig option for use outside architecture code to avoid architecture
specific Kconfig options in that code.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When creating a new event (via a module, kprobe, eprobe, etc), the
descriptors that are created must add flags for pid filtering if an
instance has pid filtering enabled, as the flags are used at the time the
event is executed to know if pid filtering should be done or not.
The "Only trace this pid" case was added, but a cut-and-paste error caused
that case to be checked twice, instead of also checking the "Trace all but
this pid" case.
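
The shape of such a cut-and-paste error can be sketched as follows.
This is a simplified model under stated assumptions, not the tracing
code itself: struct pid_list_model, struct instance_model, its two
fields and both functions are hypothetical names invented for the
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the tracing structures. */
struct pid_list_model { int placeholder; };

struct instance_model {
    const struct pid_list_model *only_these_pids;    /* "Only trace this pid" */
    const struct pid_list_model *all_but_these_pids; /* "Trace all but this pid" */
};

/* Buggy shape: the second lookup was pasted and still reads the first list. */
static bool new_event_needs_pid_flag_buggy(const struct instance_model *inst)
{
    const struct pid_list_model *pid_list = inst->only_these_pids;
    const struct pid_list_model *no_pid_list = inst->only_these_pids;

    return pid_list != NULL || no_pid_list != NULL;
}

/* Fixed shape: each case reads its own list. */
static bool new_event_needs_pid_flag_fixed(const struct instance_model *inst)
{
    const struct pid_list_model *pid_list = inst->only_these_pids;
    const struct pid_list_model *no_pid_list = inst->all_but_these_pids;

    return pid_list != NULL || no_pid_list != NULL;
}
```

With only the "all but" list set, the buggy variant reports that no pid
filtering flag is needed, while the fixed variant reports that it is.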
Link: https://lore.kernel.org/all/202111280401.qC0z99JB-lkp@intel.com/
Fixes: 6cb206508b62 ("tracing: Check pid filtering when creating events")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
We got the following issue:
================================================================================
UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
signed integer overflow:
-4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x170/0x1dc lib/dump_stack.c:118
ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
handle_overflow+0x188/0x1dc lib/ubsan.c:192
__ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
ktime_set include/linux/ktime.h:42 [inline]
timespec64_to_ktime include/linux/ktime.h:78 [inline]
io_timeout fs/io_uring.c:5153 [inline]
io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
__io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
io_submit_sqe fs/io_uring.c:6137 [inline]
io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
__do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
__se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
__arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
================================================================================
ktime_set() only checks 'secs' against the upper bound KTIME_SEC_MAX, so
passing a negative value can lead to overflow.
To address this issue, we must check whether 'sec' is negative.
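
A minimal user-space sketch of the idea, assuming a conversion that
clamps only on the upper bound: rejecting negative seconds before the
multiplication keeps the product representable. All names ending in
_model or _MODEL are hypothetical, the -1 return value merely stands in
for whatever error code the kernel uses, and the range check on the
nanosecond field is an extra assumption added so that the model itself
cannot overflow.

```c
#include <stdint.h>

#define NSEC_PER_SEC_MODEL  INT64_C(1000000000)
#define KTIME_MAX_MODEL     INT64_MAX
#define KTIME_SEC_MAX_MODEL (KTIME_MAX_MODEL / NSEC_PER_SEC_MODEL)

/* Model of a conversion that only clamps large positive seconds. */
static int64_t ktime_set_model(int64_t secs, int64_t nsecs)
{
    if (secs >= KTIME_SEC_MAX_MODEL)
        return KTIME_MAX_MODEL;
    /* Safe only when 0 <= secs and 0 <= nsecs < NSEC_PER_SEC_MODEL. */
    return secs * NSEC_PER_SEC_MODEL + nsecs;
}

/*
 * Validate the timespec fields first. Returns 0 and stores the result
 * in *out on success, or -1 without touching *out when a field is
 * negative (or the nanosecond field is out of range).
 */
static int timeout_to_ktime_model(int64_t tv_sec, int64_t tv_nsec, int64_t *out)
{
    if (tv_sec < 0 || tv_nsec < 0 || tv_nsec >= NSEC_PER_SEC_MODEL)
        return -1;
    *out = ktime_set_model(tv_sec, tv_nsec);
    return 0;
}
```

With this check in place, the value from the report above
(-4966321760114568020 seconds) is rejected before any multiplication
takes place.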
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
I got the following issue:
[ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
[ 594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[ 594.364987] Modules linked in:
[ 594.365405] irq event stamp: 604180238
[ 594.365906] hardirqs last enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
[ 594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
[ 594.368420] softirqs last enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
[ 594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
[ 594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G L 5.15.0-next-20211112+ #88
[ 594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[ 594.373604] Workqueue: events_unbound io_ring_exit_work
[ 594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
[ 594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
[ 594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
[ 594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
[ 594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
[ 594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
[ 594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
[ 594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
[ 594.382787] FS: 0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
[ 594.383851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
[ 594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 594.387403] Call Trace:
[ 594.387738] <TASK>
[ 594.388042] find_and_remove_object+0x118/0x160
[ 594.389321] delete_object_full+0xc/0x20
[ 594.389852] kfree+0x193/0x470
[ 594.390275] __io_remove_buffers.part.0+0xed/0x147
[ 594.390931] io_ring_ctx_free+0x342/0x6a2
[ 594.392159] io_ring_exit_work+0x41e/0x486
[ 594.396419] process_one_work+0x906/0x15a0
[ 594.399185] worker_thread+0x8b/0xd80
[ 594.400259] kthread+0x3bf/0x4a0
[ 594.401847] ret_from_fork+0x22/0x30
[ 594.402343] </TASK>
Message from syslogd@localhost at Nov 13 09:09:54 ...
kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[ 596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
We can reproduce this issue with the following syzkaller log:
r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)
The cause of the above issue is that 'buf->list' has 2,100,000 nodes;
walking it occupies the CPU long enough to trigger the soft lockup.
To solve this issue, we need to add a schedule point in the while loop in
'__io_remove_buffers'.
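
The idea of a schedule point inside a long teardown loop can be modelled
in user space as below. This is only a sketch: struct buf_model,
push_buf_model() and remove_bufs_model() are hypothetical names, and
sched_yield() is used as a user-space stand-in for a kernel schedule
point, which it does not replicate exactly.

```c
#include <sched.h>
#include <stdlib.h>

/* Hypothetical singly linked buffer node. */
struct buf_model {
    struct buf_model *next;
};

/* Push one node; returns 0 on success, -1 on allocation failure. */
static int push_buf_model(struct buf_model **head)
{
    struct buf_model *b = malloc(sizeof(*b));

    if (!b)
        return -1;
    b->next = *head;
    *head = b;
    return 0;
}

/* Free the whole list, yielding the CPU after each freed node. */
static unsigned long remove_bufs_model(struct buf_model **head)
{
    unsigned long freed = 0;

    while (*head) {
        struct buf_model *b = *head;

        *head = b->next;
        free(b);
        freed++;
        sched_yield(); /* stand-in for a schedule point in the loop */
    }
    return freed;
}
```

The loop still does the same amount of work; the difference is that
other runnable tasks get a chance to run between iterations.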
After adding the schedule point, a regression run gives the following data:
[ 240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[ 296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
[ 305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[ 325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
[ 333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
...
Fixes: 8bab4c09f24e ("io_uring: allow conditional reschedule for intensive iterators")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If an event is filtered by pid and a trigger that requires processing of
the event to happen is attached to the event, the discard portion does
not take the pid filtering into account, and the event will then be
recorded when it should not have been.
Cc: stable@vger.kernel.org
Fixes: 3fdaf80f4a836 ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap. VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.
However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments. If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.
As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level. We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop and pop back up a level. When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level. In this
case the level size is 256K pfns, which we add to the base pfn and
get a result of 0x7fe00, which is clearly greater than 0x401ff,
so we're done. Meanwhile we never cleared the ptes for the remainder
of the range. When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as reported by the user
report linked below.
The fix for this seems relatively simple: if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.
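
The arithmetic in the example above can be checked with a small
user-space model. It assumes 512 entries per page-table page (a 9-bit
stride per level), which matches the 256K-pfn level size quoted above;
the helper names ending in _model, _old and _new are hypothetical and
only mirror the description, not the driver source.

```c
#include <stdint.h>

#define LEVEL_STRIDE_MODEL 9 /* assumed: 512 entries per page-table page */

/* Number of pfns covered by one entry at the given level (level >= 1). */
static uint64_t level_size_model(int level)
{
    return UINT64_C(1) << ((level - 1) * LEVEL_STRIDE_MODEL);
}

/* Mask that rounds a pfn down to the start of its entry at that level. */
static uint64_t level_mask_model(int level)
{
    return ~(level_size_model(level) - 1);
}

/* Old behaviour: advance from the working pfn. */
static uint64_t next_pfn_old(uint64_t pfn, int level)
{
    return pfn + level_size_model(level);
}

/* Fixed behaviour: advance from the start of the entry containing pfn. */
static uint64_t next_pfn_new(uint64_t pfn, int level)
{
    uint64_t level_pfn = pfn & level_mask_model(level);

    return level_pfn + level_size_model(level);
}
```

For pfn 0x3fe00 at the level whose entries cover 256K pfns (level 3 in
this model), the old calculation yields 0x7fe00, past the end of the
range at 0x401ff, whereas the new one yields 0x40000, so the remaining
pfns [0x40000 - 0x401ff] are still visited.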
Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omen
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
If we return -EOPNOTSUPP, the RCU read lock remains held. This is wrong.
Go through the end of the function instead. This way, the missing
'rcu_read_unlock()' is called.
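
The control-flow change can be sketched with a user-space model in which
a counter stands in for the read-side lock. Everything here is
hypothetical (read_lock_model(), read_unlock_model(), the depth counter
and both check_caps_* functions); it only shows the pattern of recording
the error and falling through to the common unlock.

```c
#include <errno.h>
#include <stdbool.h>

static int read_lock_depth_model; /* > 0 means the model lock is held */

static void read_lock_model(void)   { read_lock_depth_model++; }
static void read_unlock_model(void) { read_lock_depth_model--; }

/* Buggy shape: the early return skips the unlock. */
static int check_caps_buggy(bool supported)
{
    read_lock_model();
    if (!supported)
        return -EOPNOTSUPP;
    read_unlock_model();
    return 0;
}

/* Fixed shape: record the error and fall through to the unlock. */
static int check_caps_fixed(bool supported)
{
    int ret = 0;

    read_lock_model();
    if (!supported)
        ret = -EOPNOTSUPP;
    read_unlock_model();
    return ret;
}
```

Both variants return the same error code; only the fixed one leaves the
lock depth balanced on the error path.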
Fixes: 7afd7f6aa21a ("iommu/vt-d: Check FL and SL capability sanity in scalable mode")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
With the submission of the iommu driver for RK3568 a subtle bug was
introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be
the other way around - that leads to random errors, especially when
addresses beyond 32 bit are used.
Fix it.
Fixes: c55356c534aa ("iommu: rockchip: Add support for iommu v2")
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Dan Johansen <strit@manjaro.org>
Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The messages printed on the initialization of the AMD IOMMUv2 driver
have caused some confusion in the past. Clarify the messages to lower
the confusion in the future.
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org
|
|
The macro is unused after commit 00ecd5401349a so it can be removed.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 00ecd5401349a ("iommu/vt-d: Clean up unused PASID updating functions")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211123105507.7654-2-joro@8bytes.org
|
|
The current driver version is able to handle only one bridge at a time.
Configuring two bridges on two different ports would end up shorting these
bridges in HW. To reproduce it:
ip l a name br0 type bridge
ip l a name br1 type bridge
ip l s dev br0 up
ip l s dev br1 up
ip l s lan1 master br0
ip l s dev lan1 up
ip l s lan2 master br1
ip l s dev lan2 up
Ping on lan1 and get response on lan2, which should not happen.
This happens because the current driver version stores one global "Port VLAN
Membership" and applies it to all ports which are members of any
bridge.
To solve this issue, we need to handle each port separately.
This patch drops the global port member storage and calculates
membership dynamically depending on STP state and bridge participation.
Note: STP support was broken before this patch and should be fixed
separately.
Fixes: c2e866911e25 ("net: dsa: microchip: break KSZ9477 DSA driver into two files")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20211126123926.2981028-1-o.rempel@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver doesn't support RX timestamping for non-PTP packets, but it
declares that it does. Restrict the reported RX filters to PTP v2 over
L2 and over L4.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IEEE 1588 support was declared too soon for the Ocelot switch. Out of
reset, this switch does not apply any special treatment for PTP packets,
i.e. when an event message is received, the natural tendency is to
forward it by MAC DA/VLAN ID. This poses a problem when the ingress port
is under a bridge, since user space application stacks (written
primarily for endpoint ports, not switches) like ptp4l expect that PTP
messages are always received on AF_PACKET / AF_INET sockets (depending
on the PTP transport being used), and are never autonomously
forwarded. Any forwarding, if necessary (for example in Transparent
Clock mode) is handled in software by ptp4l. Having the hardware forward
these packets too will cause duplicates which will confuse endpoints
connected to these switches.
So PTP over L2 barely works, in the sense that PTP packets reach the CPU
port, but they reach it via flooding, and therefore reach lots of other
unwanted destinations too. But PTP over IPv4/IPv6 does not work at all.
This is because the Ocelot switch has a separate destination port mask
for unknown IP multicast (which PTP over IP is) flooding compared to
unknown non-IP multicast (which PTP over L2 is) flooding. Specifically,
the driver allows the CPU port to be in the PGID_MC port group, but not
in PGID_MCIPV4 and PGID_MCIPV6. There are several presentations from
Allan Nielsen which explain that the embedded MIPS CPU on Ocelot
switches is not very powerful at all, so every penny they could save by
not allowing flooding to the CPU port module matters. Unknown IP
multicast did not make it.
The de facto consensus is that when a switch is PTP-aware and an
application stack for PTP is running, switches should have some sort of
trapping mechanism for PTP packets, to extract them from the hardware
data path. This avoids both problems:
(a) PTP packets are no longer flooded to unwanted destinations
(b) PTP over IP packets are no longer denied from reaching the CPU since
they arrive there via a trap and not via flooding
This is not the first time this change has been attempted. Last time, the
feedback from Allan Nielsen and Andrew Lunn was that the traps should
not be installed by default, and that PTP-unaware switching may be
desired for some use cases:
https://patchwork.ozlabs.org/project/netdev/patch/20190813025214.18601-5-yangbo.lu@nxp.com/
To address that feedback, the present patch adds the necessary packet
traps according to the RX filter configuration transmitted by user space
through the SIOCSHWTSTAMP ioctl. Trapping is done via VCAP IS2, where we
keep 5 filters, which are amended each time RX timestamping is enabled
or disabled on a port:
- 1 for PTP over L2
- 2 for PTP over IPv4 (UDP ports 319 and 320)
- 2 for PTP over IPv6 (UDP ports 319 and 320)
The cookie by which these filters (invisible to tc) are identified is
strategically chosen such that it does not collide with the filters used
for the ocelot-8021q tagging protocol by the Felix driver, or with the
MRP traps set up by the Ocelot library.
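
As a rough illustration of the five filters listed above, the following sketch describes them as a table of match keys. It is a standalone model, not the VCAP IS2 API: the struct and function names are hypothetical, and only the UDP ports (319, 320) come from the text above; 0x88F7 is the EtherType assigned to PTP over Ethernet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTP_ETHERTYPE		0x88F7	/* PTP over Ethernet (L2) */
#define PTP_EVENT_PORT		319	/* UDP port for event messages */
#define PTP_GENERAL_PORT	320	/* UDP port for general messages */

/* Hypothetical classification of what a trap matches on. */
enum trap_proto { TRAP_L2, TRAP_IPV4_UDP, TRAP_IPV6_UDP };

struct ptp_trap {
	enum trap_proto proto;
	uint16_t match;	/* EtherType for L2, UDP destination port otherwise */
};

/* 1 filter for L2, 2 for IPv4, 2 for IPv6. */
static const struct ptp_trap ptp_traps[] = {
	{ TRAP_L2,       PTP_ETHERTYPE },
	{ TRAP_IPV4_UDP, PTP_EVENT_PORT },
	{ TRAP_IPV4_UDP, PTP_GENERAL_PORT },
	{ TRAP_IPV6_UDP, PTP_EVENT_PORT },
	{ TRAP_IPV6_UDP, PTP_GENERAL_PORT },
};

#define NUM_PTP_TRAPS (sizeof(ptp_traps) / sizeof(ptp_traps[0]))

/* Return true if a packet with this protocol and key would hit a trap. */
bool ptp_trap_matches(enum trap_proto proto, uint16_t value)
{
	size_t i;

	for (i = 0; i < NUM_PTP_TRAPS; i++) {
		if (ptp_traps[i].proto == proto && ptp_traps[i].match == value)
			return true;
	}

	return false;
}
```

In the real driver each filter also carries an ingress port mask that is edited when RX timestamping is enabled or disabled on a port; that part is left out here.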
Other alternatives were considered, like patching user space to do
something, but there are so many ways in which PTP packets could be made
to reach the CPU, generically speaking, that "do what?" is a very valid
question. The ptp4l program from the linuxptp stack already attempts to
do something: it calls setsockopt(IP_ADD_MEMBERSHIP) (and
PACKET_ADD_MEMBERSHIP, respectively) which translates in both cases into
a dev_mc_add() on the interface, in the kernel:
https://github.com/richardcochran/linuxptp/blob/v3.1.1/udp.c#L73
https://github.com/richardcochran/linuxptp/blob/v3.1.1/raw.c
Reality shows that this is not sufficient in case the interface belongs
to a switchdev driver, as dev_mc_add() does not show the intention to
trap a packet to the CPU, but rather the intention to not drop it (it is
strictly for RX filtering, same as promiscuous does not mean to send all
traffic to the CPU, but to not drop traffic with unknown MAC DA). This
topic is a can of worms in itself, and it would be great if user space
could just stay out of it.
On the other hand, setting up PTP traps privately within the driver is
not new by any stretch of the imagination:
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c#L833
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/net/dsa/hirschmann/hellcreek.c#L1050
https://elixir.bootlin.com/linux/v5.16-rc2/source/include/linux/dsa/sja1105.h#L21
So this is the approach taken here as well. The difference here being
that we prepare and destroy the traps per port, dynamically at runtime,
as opposed to driver init time, because apparently, PTP-unaware
forwarding is a use case.
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Reported-by: Po Liu <po.liu@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As opposed to event messages (Sync, PdelayReq etc) which require
timestamping, general messages (Announce, FollowUp etc) do not.
In PTP they are part of different streams of data.
IEEE 1588-2008 Annex D.2 "UDP port numbers" states that the UDP
destination port assigned by IANA is 319 for event messages, and 320 for
general messages. Yet the kernel seems to be missing the definition for
general messages. This patch adds it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
VCAP (Versatile Content Aware Processor) is the TCAM-based engine behind
tc flower offload on ocelot, among other things. The ingress port mask
on which VCAP rules match is present as a bit field in the actual key of
the rule. This means that it is possible for a rule to be shared among
multiple source ports. When the rule is added one by one on each desired
port, the ingress port mask of the key must be edited and rewritten
to hardware.
But the API in ocelot_vcap.c does not allow for this. For one thing,
ocelot_vcap_filter_add() and ocelot_vcap_filter_del() are not symmetric,
because ocelot_vcap_filter_add() works with a preallocated and
prepopulated filter and programs it to hardware, and
ocelot_vcap_filter_del() does both the job of removing the specified
filter from hardware, as well as kfreeing it. That is to say, the only
option of editing a filter in place, which is to delete it, modify the
structure and add it back, does not work because it results in
use-after-free.
This patch introduces ocelot_vcap_filter_replace, which trivially
reprograms a VCAP entry to hardware, at the exact same index at which it
existed before, without modifying any list or allocating any memory.
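
The idea of reprogramming an entry in place can be modelled with a small sketch. Everything here is hypothetical (vcap_model, model_filter_add, model_filter_replace); it imitates the behavior described above with an array standing in for the hardware entries and is not the code in ocelot_vcap.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_FILTERS 8

/* Hypothetical filter: only the ingress port mask of the key is modelled. */
struct filter {
	uint32_t ingress_port_mask;
};

struct vcap_model {
	struct filter *list[MAX_FILTERS];	/* software list of filters */
	uint32_t hw_mask[MAX_FILTERS];		/* stand-in for hardware entries */
	int count;
};

/* Append a caller-owned filter and program it at the next free index. */
int model_filter_add(struct vcap_model *v, struct filter *f)
{
	if (v->count == MAX_FILTERS)
		return -ENOSPC;

	v->list[v->count] = f;
	v->hw_mask[v->count] = f->ingress_port_mask;
	v->count++;

	return 0;
}

/*
 * Rewrite an existing filter at the index it already occupies. The list
 * is not modified and nothing is allocated or freed.
 */
int model_filter_replace(struct vcap_model *v, struct filter *f)
{
	int i;

	for (i = 0; i < v->count; i++) {
		if (v->list[i] == f) {
			v->hw_mask[i] = f->ingress_port_mask;
			return 0;
		}
	}

	return -ENOENT;
}

/* Share one rule between port 0 and port 2 by editing its key in place. */
uint32_t demo_share_rule(void)
{
	struct vcap_model v = { .count = 0 };
	struct filter f = { .ingress_port_mask = 1u << 0 };
	int err;

	err = model_filter_add(&v, &f);
	assert(err == 0);

	f.ingress_port_mask |= 1u << 2;
	err = model_filter_replace(&v, &f);
	assert(err == 0);

	return v.hw_mask[0];
}
```

Because the filter object stays alive across the edit, the delete/modify/add sequence and its use-after-free are avoided.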
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The ocelot driver, when asked to timestamp all received packets, 1588
v1 or NTP, says "nah, here's 1588 v2 for you".
According to this discussion:
https://patchwork.kernel.org/project/netdevbpf/patch/20211104133204.19757-8-martin.kaistra@linutronix.de/#24577647
drivers that downgrade from a wider request to a narrower response (or
even a response where the intersection with the request is empty) are
buggy, and should return -ERANGE instead. This patch fixes that.
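
A minimal sketch of the resulting policy is shown below. The enum is a hypothetical stand-in for the kernel's hwtstamp RX filter values, and the accepted set is an assumption for illustration; the point is only that a request the hardware cannot honor is rejected with -ERANGE instead of being silently downgraded.

```c
#include <errno.h>

/* Hypothetical subset of RX filter requests coming from user space. */
enum rx_filter {
	RX_FILTER_NONE,
	RX_FILTER_ALL,
	RX_FILTER_PTP_V1_L4_EVENT,
	RX_FILTER_PTP_V2_L2_EVENT,
	RX_FILTER_PTP_V2_L4_EVENT,
	RX_FILTER_PTP_V2_EVENT,
	RX_FILTER_NTP_ALL,
};

/* Accept only what can be honored exactly; never narrow a wider request. */
int validate_rx_filter(enum rx_filter requested)
{
	switch (requested) {
	case RX_FILTER_NONE:
	case RX_FILTER_PTP_V2_L2_EVENT:
	case RX_FILTER_PTP_V2_L4_EVENT:
	case RX_FILTER_PTP_V2_EVENT:
		return 0;
	default:
		return -ERANGE;
	}
}
```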
Fixes: 4e3b0468e6d7 ("net: mscc: PTP Hardware Clock (PHC) support")
Suggested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, the HNS3 driver doesn't clear the reset flags of components
after successfully executing a reset, so the "Components reset" and
"Components not reset" information reported to userspace is incorrect.
Fix this by clearing the corresponding reset flag after the reset process.
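
The flag handling can be illustrated with a one-line helper. The function name and the bit values are hypothetical; the sketch only shows that bits for components that were reset successfully are cleared, so that the bits still set are the components that were not reset.

```c
#include <stdint.h>

/*
 * requested: bitmask of components the user asked to reset.
 * succeeded: bitmask of components that were actually reset.
 * Returns the flags to hand back: only components NOT reset stay set.
 */
uint32_t remaining_reset_flags(uint32_t requested, uint32_t succeeded)
{
	return requested & ~succeeded;
}
```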
Fixes: ddccc5e368a3 ("net: hns3: add support for triggering reset by ethtool")
Signed-off-by: Jie Wang <wangjie125@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, when the user queries page pool info with the debugfs command
"cat page_pool_info", the count of allocated pages for the page pool may
be incorrect because of a memory inconsistency problem caused by compiler
optimization.
So this patch uses READ_ONCE() to read the value of pages_state_hold_cnt
to fix this problem.
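
For readers unfamiliar with READ_ONCE(), the sketch below shows a userspace approximation. READ_ONCE_SKETCH, pool_model and pool_hold_cnt_snapshot are hypothetical names; the macro relies on the GNU __typeof__ extension and a volatile access, which is a simplification of the kernel macro and only meant for scalar fields.

```c
#include <stdint.h>

/*
 * Force exactly one load of x through a volatile-qualified access, so the
 * compiler cannot cache, re-read, or tear the read as an optimization.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the page pool; only the counter is modelled. */
struct pool_model {
	uint32_t pages_state_hold_cnt;
};

static struct pool_model example_pool = { .pages_state_hold_cnt = 42 };

/* Take a single snapshot of the counter for reporting. */
uint32_t pool_hold_cnt_snapshot(struct pool_model *pool)
{
	return READ_ONCE_SKETCH(pool->pages_state_hold_cnt);
}
```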
Fixes: 850bfb912a6d ("net: hns3: debugfs add support dumping page pool info")
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the page pool is not enabled, its pointer is still NULL and the page
pool must not be accessed, so add a check for it.
Fixes: 850bfb912a6d ("net: hns3: debugfs add support dumping page pool info")
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|