Age | Commit message (Collapse) | Author | Files | Lines |
|
The objects placed at the head of vmlinux need special treatments:
- arch/$(SRCARCH)/Makefile adds them to head-y in order to place
them before other archives in the linker command line.
- arch/$(SRCARCH)/kernel/Makefile adds them to extra-y instead of
obj-y to avoid them going into built-in.a.
This commit gets rid of the latter.
Create vmlinux.a to collect all the objects that are unconditionally
linked to vmlinux. The objects listed in head-y are moved to the head
of vmlinux.a by using 'ar m'.
With this, arch/$(SRCARCH)/kernel/Makefile can consistently use obj-y
for builtin objects.
There is no *.o that is directly linked to vmlinux. Drop unneeded code
in scripts/clang-tools/gen_compile_commands.py.
$(AR) mPi needs 'T' to workaround the llvm-ar bug. The fix was suggested
by Nathan Chancellor [1].
[1]: https://lore.kernel.org/llvm/YyjjT5gQ2hGMH0ni@dev-arch.thelio-3990X/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
V=1 (verbose build) shows commands executed by Make, but it may cause
misunderstanding.
For example, the following command shows the outstanding error message.
$ make V=1 INSTALL_PATH=/tmp install
test -e include/generated/autoconf.h -a -e include/config/auto.conf || ( \
echo >&2; \
echo >&2 " ERROR: Kernel configuration is invalid."; \
echo >&2 " include/generated/autoconf.h or include/config/auto.conf are missing.";\
echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it."; \
echo >&2 ; \
/bin/false)
unset sub_make_done; ./scripts/install.sh
It is not an error. Make just showed the recipe lines it has executed,
but people may think that 'make install' has failed.
Likewise, the combination of V=1 and O= shows confusing
"*** The source tree is not clean, please run 'make mrproper'".
Suppress such misleading logs.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
Modpost generates .vmlinux.export.c and *.mod.c, which are prerequisites
of vmlinux and modules, respectively.
The modpost stage should be re-run when the modpost code is updated.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Currently, modpost is executed twice; first for vmlinux, second
for modules.
This commit merges them.
Current build flow
==================
1) build obj-y and obj-m objects
2) link vmlinux.o
3) modpost for vmlinux
4) link vmlinux
5) modpost for modules
6) link modules (*.ko)
The build steps 1) through 6) are serialized, that is, modules are
built after vmlinux. You do not get benefits of parallel builds when
scripts/link-vmlinux.sh is being run.
New build flow
==============
1) build obj-y and obj-m objects
2) link vmlinux.o
3) modpost for vmlinux and modules
4a) link vmlinux
4b) link modules (*.ko)
In the new build flow, modpost is invoked just once.
vmlinux and modules are built in parallel. One exception is
CONFIG_DEBUG_INFO_BTF_MODULES=y, where modules depend on vmlinux.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
Move the build rules of vmlinux.o out of scripts/link-vmlinux.sh to
clearly separate 1) pre-modpost, 2) modpost, 3) post-modpost stages.
This will make further refactoring possible.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
.vmlinux.objs is used by modpost, so scripts/Makefile.modpost is
a better place to generate it.
It is used only when CONFIG_MODVERSIONS=y. It should be guarded
by "ifdef CONFIG_MODVERSIONS".
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
Use the ordinary obj-y syntax to list subdirectories.
Note1:
Previously, the link order of lib-y depended on CONFIG_MODULES; lib-y
was linked before drivers-y when CONFIG_MODULES=y, otherwise after
drivers-y. This was a bug of commit 7273ad2b08f8 ("kbuild: link lib-y
objects to vmlinux forcibly when CONFIG_MODULES=y"), but it was not a
big deal after all. Now, all objects listed in lib-y are linked last,
irrespective of CONFIG_MODULES.
Note2:
Finally, the single target build in arch/*/lib/ works correctly. There was
a bug report about this. [1]
$ make ARCH=arm arch/arm/lib/findbit.o
CALL scripts/checksyscalls.sh
AS arch/arm/lib/findbit.o
[1]: https://lore.kernel.org/linux-kbuild/YvUQOwL6lD4%2F5%2FU6@shell.armlinux.org.uk/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
cc-ifversion is GCC specific. Replace it with compiler specific
variants. Update the users of cc-ifversion to use these new macros.
Link: https://github.com/ClangBuiltLinux/linux/issues/350
Link: https://lore.kernel.org/llvm/CAGG=3QWSAUakO42kubrCap8fp-gm1ERJJAYXTnP1iHk_wrH=BQ@mail.gmail.com/
Suggested-by: Bill Wendling <morbo@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Doing make V=1 binrpm-pkg results in:
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.EgV6qJ
+ umask 022
+ cd .
+ /bin/rm -rf /home/scgl/rpmbuild/BUILDROOT/kernel-6.0.0_rc5+-1.s390x
+ /bin/mkdir -p /home/scgl/rpmbuild/BUILDROOT
+ /bin/mkdir /home/scgl/rpmbuild/BUILDROOT/kernel-6.0.0_rc5+-1.s390x
+ mkdir -p /home/scgl/rpmbuild/BUILDROOT/kernel-6.0.0_rc5+-1.s390x/boot
+ make -f ./Makefile image_name
+ cp test -e include/generated/autoconf.h -a -e include/config/auto.conf || ( \ echo >&2; \ echo >&2 " ERROR: Kernel configuration is invalid."; \ echo >&2 " include/generated/autoconf.h or include/config/auto.conf are missing.";\ echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it."; \ echo >&2 ; \ /bin/false) arch/s390/boot/bzImage /home/scgl/rpmbuild/BUILDROOT/kernel-6.0.0_rc5+-1.s390x/boot/vmlinuz-6.0.0-rc5+
cp: invalid option -- 'e'
Try 'cp --help' for more information.
error: Bad exit status from /var/tmp/rpm-tmp.EgV6qJ (%install)
Because the make call to get the image name is verbose and prints
additional information.
Fixes: 993bdde94547 ("kbuild: add image_name to no-sync-config-targets")
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove unused function argument, and there is
no logic changes.
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
sym_set_choice_value could be removed and directly call
sym_set_tristate_value instead.
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Since commit 7b4537199a4a ("kbuild: link symbol CRCs at final link,
removing CONFIG_MODULE_REL_CRCS"), the module versioning on the
(non-upstreamed-yet) kvx Linux port is broken due to unexpected padding
for __crc_* symbols. The kvx GCC adds padding so u32 gets 8-byte
alignment instead of 4.
I do not know if this happens for upstream architectures in general,
but any compiler has the freedom to insert padding for faster access.
Use the inline assembler to directly specify the wanted data layout.
This is how we previously did before the breakage.
Link: https://lore.kernel.org/lkml/20220817161438.32039-1-ysionneau@kalray.eu/
Link: https://lore.kernel.org/linux-kbuild/31ce5305-a76b-13d7-ea55-afca82c46cf2@kalray.eu/
Fixes: 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS")
Reported-by: Yann Sionneau <ysionneau@kalray.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Yann Sionneau <ysionneau@kalray.eu>
|
|
Based on Linus' patch. Refactor scripts/Makefile.vmlinux_o as well.
Link: https://lore.kernel.org/lkml/CAHk-=wgjTMQgiKzBZTmb=uWGDEQxDdyF1+qxBkODYciuNsmwnw@mail.gmail.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
|
|
The single target build has a subtle bug for the combination for
an individual file and a subdirectory.
[1] 'make kernel/fork.i' builds only kernel/fork.i
$ make kernel/fork.i
CALL scripts/checksyscalls.sh
DESCEND objtool
CPP kernel/fork.i
[2] 'make kernel/' builds only under the kernel/ directory.
$ make kernel/
CALL scripts/checksyscalls.sh
DESCEND objtool
CC kernel/fork.o
CC kernel/exec_domain.o
[snip]
CC kernel/rseq.o
AR kernel/built-in.a
But, if you try to do [1] and [2] in a single command, you will get
only [1] with a weird log:
$ make kernel/fork.i kernel/
CALL scripts/checksyscalls.sh
DESCEND objtool
CPP kernel/fork.i
make[2]: Nothing to be done for 'kernel/'.
With 'make kernel/fork.i kernel/', you should get both [1] and [2].
Rewrite the single target build.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove the bash build dependency for those who otherwise do not
have it installed. This also provides a significant speedup:
$ make defconfig
$ make yes2modconfig
...
$ find . -name "*.o" | grep -v vmlinux | wc
3169 3169 89615
$ export NM=nm
$ time sh -c 'find . -name "*.o" | grep -v vmlinux | xargs -n1
./scripts/check-local-export'
Without patch:
0m15.90s real 0m12.17s user 0m05.28s system
With patch:
dash + nawk
0m02.16s real 0m02.92s user 0m00.34s system
dash + busybox awk
0m02.36s real 0m03.36s user 0m00.34s system
dash + gawk
0m02.07s real 0m03.26s user 0m00.32s system
bash + gawk
0m03.55s real 0m05.00s user 0m00.54s system
Signed-off-by: Owen Rafferty <owen@owenrafferty.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This reverts commit [1] in the pre-git era.
I do not know what problem happened in the script when sh != bash
because there is no commit message.
Now that this script is much simpler than it used to be, let's revert
it, and let' see. (If this turns out to be problematic, fix the code
with proper commit description.)
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=11acbbbb8a50f4de7dbe4bc1b5acc440dfe81810
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Minimize the scope of LC_ALL=C like before commit 87c94bfb8ad3 ("kbuild:
override build timestamp & version").
Give LC_ALL=C to '$LD -v' to get the consistent version output, as commit
bcbcf50f5218 ("kbuild: fix ld-version.sh to not be affected by locale")
mentioned the LD version is affected by locale.
While I was here, I merged two sed invocations.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Now that UTS_VERSION was separated out, this header can be generated
much earlier, and probably the top Makefile is a better place to do it
than init/Makefile.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Kbuild builds init/built-in.a twice; first during the ordinary
directory descending, second from scripts/link-vmlinux.sh.
We do this because UTS_VERSION contains the build version and the
timestamp. We cannot update it during the normal directory traversal
since we do not yet know if we need to update vmlinux. UTS_VERSION is
temporarily calculated, but omitted from the update check. Otherwise,
vmlinux would be rebuilt every time.
When Kbuild results in running link-vmlinux.sh, it increments the
version number in the .version file and takes the timestamp at that
time to really fix UTS_VERSION.
However, updating the same file twice is a footgun. To avoid nasty
timestamp issues, all build artifacts that depend on init/built-in.a
are atomically generated in link-vmlinux.sh, where some of them do not
need rebuilding.
To fix this issue, this commit changes as follows:
[1] Split UTS_VERSION out to include/generated/utsversion.h from
include/generated/compile.h
include/generated/utsversion.h is generated just before the
vmlinux link. It is generated under include/generated/ because
some decompressors (s390, x86) use UTS_VERSION.
[2] Split init_uts_ns and linux_banner out to init/version-timestamp.c
from init/version.c
init_uts_ns and linux_banner contain UTS_VERSION. During the ordinary
directory descending, they are compiled with __weak and used to
determine if vmlinux needs relinking. Just before the vmlinux link,
they are compiled without __weak to embed the real version and
timestamp.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This is unneeded since commit 073a9ecb3a73 ("init/version.c: remove
Version_<LINUX_VERSION_CODE> symbol").
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Unify the code between in-tree builds and external module builds.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove the potentially invalid modules.order instead of using
the temporary file.
Also, KBUILD_MODULES is don't care for single builds. No need to
cancel it.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The dependency, "modules: modules_check" is specified twice.
Commit 1a998be620a1 ("kbuild: check module name conflict for external
modules as well") missed to clean it up.
'PHONY += modules' also appears twice.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Use obj-y to clean up Makefile.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The AWK code was added to deduplicate modules.order in case $(obj-m)
contains the same module multiple times, but it is actually unneeded
since commit b2c885549122 ("kbuild: update modules.order only when
contained modules are updated").
The list is already deduplicated before being processed by AWK because
$^ is the deduplicated list of prerequisites.
(Please note the real-prereqs macro uses $^)
Yet, modules.order will contain duplication if two different Makefiles
build the same module:
foo/Makefile:
obj-m += bar/baz.o
foo/bar/Makefile:
obj-m += baz.o
However, the parallel builds cannot properly handle this case in the
first place. So, it is better to let it fail (as already done by
scripts/modules-check.sh).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
It is unneeded to check the sha1sum every time.
Create the timestamp files to manage it.
Add '.' to clean-dirs because 'make clean' must visit ./Kbuild to
clean up the timestamp files.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
My future plan is to list subdirectories in ./Kbuild. When it occurs,
$(vmlinux-alldirs) will not contain all subdirectories.
Let's hard-code the directory list until I get around to implementing
a more sophisticated way for generating a source tarball.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
missing-syscalls and old-atomics are meant to be phony targets.
Adding them to always-y is odd. (always-y should generate something).
Add a new phony target 'prepare', which depends on all the other.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When receiving some signal, GNU Make automatically deletes the target if
it has already been changed by the interrupted recipe.
If the target is possibly incomplete due to interruption, it must be
deleted so that it will be remade from scratch on the next run of make.
Otherwise, the target would remain corrupted permanently because its
timestamp had already been updated.
Thanks to this behavior of Make, you can stop the build any time by
pressing Ctrl-C, and just run 'make' to resume it.
Kbuild also relies on this feature, but it is equivalently important
for any build systems that make decisions based on timestamps (if you
want to support Ctrl-C reliably).
However, this does not always work as claimed; Make immediately dies
with Ctrl-C if its stderr goes into a pipe.
[Test Makefile]
foo:
echo hello > $@
sleep 3
echo world >> $@
[Test Result]
$ make # hit Ctrl-C
echo hello > foo
sleep 3
^Cmake: *** Deleting file 'foo'
make: *** [Makefile:3: foo] Interrupt
$ make 2>&1 | cat # hit Ctrl-C
echo hello > foo
sleep 3
^C$ # 'foo' is often left-over
The reason is because SIGINT is sent to the entire process group.
In this example, SIGINT kills 'cat', and 'make' writes the message to
the closed pipe, then dies with SIGPIPE before cleaning the target.
A typical bad scenario (as reported by [1], [2]) is to save build log
by using the 'tee' command:
$ make 2>&1 | tee log
This can be problematic for any build systems based on Make, so I hope
it will be fixed in GNU Make. The maintainer of GNU Make stated this is
a long-standing issue and difficult to fix [3]. It has not been fixed
yet as of writing.
So, we cannot rely on Make cleaning the target. We can do it by
ourselves, in signal traps.
As far as I understand, Make takes care of SIGHUP, SIGINT, SIGQUIT, and
SITERM for the target removal. I added the traps for them, and also for
SIGPIPE just in case cmd_* rule prints something to stdout or stderr
(but I did not observe an actual case where SIGPIPE was triggered).
[Note 1]
The trap handler might be worth explaining.
rm -f $@; trap - $(sig); kill -s $(sig) $$
This lets the shell kill itself by the signal it caught, so the parent
process can tell the child has exited on the signal. Generally, this is
a proper manner for handling signals, in case the calling program (like
Bash) may monitor WIFSIGNALED() and WTERMSIG() for WCE although this may
not be a big deal here because GNU Make handles SIGHUP, SIGINT, SIGQUIT
in WUE and SIGTERM in IUE.
IUE - Immediate Unconditional Exit
WUE - Wait and Unconditional Exit
WCE - Wait and Cooperative Exit
For details, see "Proper handling of SIGINT/SIGQUIT" [4].
[Note 2]
Reverting 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd
files") would directly address [1], but it only saves if_changed_dep.
As reported in [2], all commands that use redirection can potentially
leave an empty (i.e. broken) target.
[Note 3]
Another (even safer) approach might be to always write to a temporary
file, and rename it to $@ at the end of the recipe.
<command> > $(tmp-target)
mv $(tmp-target) $@
It would require a lot of Makefile changes, and result in ugly code,
so I did not take it.
[Note 4]
A little more thoughts about a pattern rule with multiple targets (or
a grouped target).
%.x %.y: %.z
<recipe>
When interrupted, GNU Make deletes both %.x and %.y, while this solution
only deletes $@. Probably, this is not a big deal. The next run of make
will execute the rule again to create $@ along with the other files.
[1]: https://lore.kernel.org/all/YLeot94yAaM4xbMY@gmail.com/
[2]: https://lore.kernel.org/all/20220510221333.2770571-1-robh@kernel.org/
[3]: https://lists.gnu.org/archive/html/help-make/2021-06/msg00001.html
[4]: https://www.cons.org/cracauer/sigint.html
Fixes: 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd files")
Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
|
|
The "hmem" platform-devices that are created to represent the
platform-advertised "Soft Reserved" memory ranges end up inserting a
resource that causes the iomem_resource tree to look like this:
340000000-43fffffff : hmem.0
340000000-43fffffff : Soft Reserved
340000000-43fffffff : dax0.0
This is because insert_resource() reparents ranges when they completely
intersect an existing range.
This matters because code that uses region_intersects() to scan for a
given IORES_DESC will only check that top-level 'hmem.0' resource and
not the 'Soft Reserved' descendant.
So, to support EINJ (via einj_error_inject()) to inject errors into
memory hosted by a dax-device, be sure to describe the memory as
IORES_DESC_SOFT_RESERVED. This is a follow-on to:
commit b13a3e5fd40b ("ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP")
...that fixed EINJ support for "Soft Reserved" ranges in the first
instance.
Fixes: 262b45ae3ab4 ("x86/efi: EFI soft reservation to E820 enumeration")
Reported-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
Tested-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Omar Avelar <omar.avelar@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mark Gross <markgross@kernel.org>
Link: https://lore.kernel.org/r/166397075670.389916.7435722208896316387.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Alexey reported that the fraction of unknown filename instances in
kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down
to assembler defined symbols, which regressed as a result of:
commit b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
In that commit, I allude to restoring debug info for assembler defined
symbols in a follow up patch, but it seems I forgot to do so in
commit a66049e2cf0e ("Kbuild: make DWARF version a choice")
Link: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=31bf18645d98b4d3d7357353be840e320649a67d
Fixes: b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
Reported-by: Alexey Alexandrov <aalexand@google.com>
Reported-by: Bill Wendling <morbo@google.com>
Reported-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Dmitrii, Fangrui, and Mashahiro note:
Before GCC 11 and Clang 12 -gsplit-dwarf implicitly uses -g2.
Fix CONFIG_DEBUG_INFO_SPLIT for gcc-11+ & clang-12+ which now need -g
specified in order for -gsplit-dwarf to work at all.
-gsplit-dwarf has been mutually exclusive with -g since support for
CONFIG_DEBUG_INFO_SPLIT was introduced in
commit 866ced950bcd ("kbuild: Support split debug info v4")
I don't think it ever needed to be.
Link: https://lore.kernel.org/lkml/20220815013317.26121-1-dmitrii.bundin.a@gmail.com/
Link: https://lore.kernel.org/lkml/CAK7LNARPAmsJD5XKAw7m_X2g7Fi-CAAsWDQiP7+ANBjkg7R7ng@mail.gmail.com/
Link: https://reviews.llvm.org/D80391
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com>
Reported-by: Fangrui Song <maskray@google.com>
Reported-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
io_uring caches task references to avoid doing atomics for each of them
per request. If a request is put from the same task that allocated it,
then we can maintain a per-ctx cache of them. This obviously relies
on io_uring always pruning caches in a reliable way, and there's
currently a case off io_uring fd release where we can miss that.
One example is a ring setup with IOPOLL, which relies on the task
polling for completions, which will free them. However, if such a task
submits a request and then exits or closes the ring without reaping
the completion, then ring release will reap and put. If release happens
from that very same task, the completed request task refs will get
put back into the cache pool. This is problematic, as we're now beyond
the point of pruning caches.
Manually drop these caches after doing an IOPOLL reap. This releases
references from the current task, which is enough. If another task
happens to be doing the release, then the caching will not be
triggered and there's no issue.
Cc: stable@vger.kernel.org
Fixes: e98e49b2bbf7 ("io_uring: extend task put optimisations")
Reported-by: Homin Rhee <hominlab@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit e90886291c7c ("certs: make system keyring depend on x509 parser")
is not the right fix because x509_load_certificate_list() can be modular.
The combination of CONFIG_SYSTEM_TRUSTED_KEYRING=y and
CONFIG_X509_CERTIFICATE_PARSER=m still results in the following error:
LD .tmp_vmlinux.kallsyms1
ld: certs/system_keyring.o: in function `load_system_certificate_list':
system_keyring.c:(.init.text+0x8c): undefined reference to `x509_load_certificate_list'
make: *** [Makefile:1169: vmlinux] Error 1
Fixes: e90886291c7c ("certs: make system keyring depend on x509 parser")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Adam Borowski <kilobyte@angband.pl>
|
|
There is nowhere calling `menu_get_root_menu` function,
so remove it.
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove unused imported 'os' module.
Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
cgroup has to be one kernfs dir, otherwise kernel panic is caused,
especially cgroup id is provide from userspace.
Reported-by: Marco Patalano <mpatalan@redhat.com>
Fixes: 6b658c4863c1 ("scsi: cgroup: Add cgroup_get_from_id()")
Cc: Muneendra <muneendra.kumar@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Due to undocumented, hysterical raisins on x86, the CFI jump-table
sections in .text are needlessly aligned to PMD_SIZE in the vmlinux
linker script. When compiling a CFI-enabled arm64 kernel with a 64KiB
page-size, a PMD maps 512MiB of virtual memory and so the .text section
increases to a whopping 940MiB and blows the final Image up to 960MiB.
Others report a link failure.
Since the CFI jump-table requires only instruction alignment, reduce the
alignment directives to function alignment for parity with other parts
of the .text section. This reduces the size of the .text section for the
aforementioned 64KiB page size arm64 kernel to 19MiB for a much more
reasonable total Image size of 39MiB.
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Mohan Rao .vanimina" <mailtoc.mohanrao@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/CAL_GTzigiNOMYkOPX1KDnagPhJtFNqSK=1USNbS0wUL4PW6-Uw@mail.gmail.com/
Fixes: cf68fffb66d6 ("add support for Clang CFI")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220922215715.13345-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
My linux.ie address is in a bad place.
also add dri-devel for agpgart.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Inject #UD when emulating XSETBV if CR4.OSXSAVE is not set. This also
covers the "XSAVE not supported" check, as setting CR4.OSXSAVE=1 #GPs if
XSAVE is not supported (and userspace gets to keep the pieces if it
forces incoherent vCPU state).
Add a comment to kvm_emulate_xsetbv() to call out that the CPU checks
CR4.OSXSAVE before checking for intercepts. AMD'S APM implies that #UD
has priority (says that intercepts are checked before #GP exceptions),
while Intel's SDM says nothing about interception priority. However,
testing on hardware shows that both AMD and Intel CPUs prioritize the #UD
over interception.
Fixes: 02d4160fbd76 ("x86: KVM: add xsetbv to the emulator")
Cc: stable@vger.kernel.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220824033057.3576315-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Allow FP and SSE state to be saved and restored via KVM_{G,SET}_XSAVE on
XSAVE-capable hosts even if their bits are not exposed to the guest via
XCR0.
Failing to allow FP+SSE first showed up as a QEMU live migration failure,
where migrating a VM from a pre-XSAVE host, e.g. Nehalem, to an XSAVE
host failed due to KVM rejecting KVM_SET_XSAVE. However, the bug also
causes problems even when migrating between XSAVE-capable hosts as
KVM_GET_SAVE won't set any bits in user_xfeatures if XSAVE isn't exposed
to the guest, i.e. KVM will fail to actually migrate FP+SSE.
Because KVM_{G,S}ET_XSAVE are designed to allowing migrating between
hosts with and without XSAVE, KVM_GET_XSAVE on a non-XSAVE (by way of
fpu_copy_guest_fpstate_to_uabi()) always sets the FP+SSE bits in the
header so that KVM_SET_XSAVE will work even if the new host supports
XSAVE.
Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0")
bz: https://bugzilla.redhat.com/show_bug.cgi?id=2079311
Cc: stable@vger.kernel.org
Cc: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
[sean: add comment, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220824033057.3576315-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reinstate the per-vCPU guest_supported_xcr0 by partially reverting
commit 988896bb6182; the implicit assessment that guest_supported_xcr0 is
always the same as guest_fpu.fpstate->user_xfeatures was incorrect.
kvm_vcpu_after_set_cpuid() isn't the only place that sets user_xfeatures,
as user_xfeatures is set to fpu_user_cfg.default_features when guest_fpu
is allocated via fpu_alloc_guest_fpstate() => __fpstate_reset().
guest_supported_xcr0 on the other hand is zero-allocated. If userspace
never invokes KVM_SET_CPUID2, supported XCR0 will be '0', whereas the
allowed user XFEATURES will be non-zero.
Practically speaking, the edge case likely doesn't matter as no sane
userspace will live migrate a VM without ever doing KVM_SET_CPUID2. The
primary motivation is to prepare for KVM intentionally and explicitly
setting bits in user_xfeatures that are not set in guest_supported_xcr0.
Because KVM_{G,S}ET_XSAVE can be used to svae/restore FP+SSE state even
if the host doesn't support XSAVE, KVM needs to set the FP+SSE bits in
user_xfeatures even if they're not allowed in XCR0, e.g. because XCR0
isn't exposed to the guest. At that point, the simplest fix is to track
the two things separately (allowed save/restore vs. allowed XCR0).
Fixes: 988896bb6182 ("x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0")
Cc: stable@vger.kernel.org
Cc: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220824033057.3576315-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The update to statistic max_mmu_rmap_size is unintentionally removed by
commit 4293ddb788c1 ("KVM: x86/mmu: Remove redundant spte present check
in mmu_set_spte"). Add missing update to it or max_mmu_rmap_size will
always be nonsensical 0.
Fixes: 4293ddb788c1 ("KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Message-Id: <20220907080657.42898-1-linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The following warning appears when executing:
make -C tools/testing/selftests/kvm
rseq_test.c: In function ‘main’:
rseq_test.c:237:33: warning: implicit declaration of function ‘gettid’; did you mean ‘getgid’? [-Wimplicit-function-declaration]
(void *)(unsigned long)gettid());
^~~~~~
getgid
/usr/bin/ld: /tmp/ccr5mMko.o: in function `main':
../kvm/tools/testing/selftests/kvm/rseq_test.c:237: undefined reference to `gettid'
collect2: error: ld returned 1 exit status
make: *** [../lib.mk:173: ../kvm/tools/testing/selftests/kvm/rseq_test] Error 1
Use the more compatible syscall(SYS_gettid) instead of gettid() to fix it.
More subsequent reuse may cause it to be wrapped in a lib file.
Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Message-Id: <20220802071240.84626-1-cloudliang@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations
__free_slab() invocations out of IRQ context") moved all flush_cpu_slab()
invocations to the global workqueue to avoid a problem related
with deactivate_slab()/__free_slab() being called from an IRQ context
on PREEMPT_RT kernels.
When the flush_all_cpu_locked() function is called from a task context
it may happen that a workqueue with WQ_MEM_RECLAIM bit set ends up
flushing the global workqueue, this will cause a dependency issue.
workqueue: WQ_MEM_RECLAIM nvme-delete-wq:nvme_delete_ctrl_work [nvme_core]
is flushing !WQ_MEM_RECLAIM events:flush_cpu_slab
WARNING: CPU: 37 PID: 410 at kernel/workqueue.c:2637
check_flush_dependency+0x10a/0x120
Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core]
RIP: 0010:check_flush_dependency+0x10a/0x120[ 453.262125] Call Trace:
__flush_work.isra.0+0xbf/0x220
? __queue_work+0x1dc/0x420
flush_all_cpus_locked+0xfb/0x120
__kmem_cache_shutdown+0x2b/0x320
kmem_cache_destroy+0x49/0x100
bioset_exit+0x143/0x190
blk_release_queue+0xb9/0x100
kobject_cleanup+0x37/0x130
nvme_fc_ctrl_free+0xc6/0x150 [nvme_fc]
nvme_free_ctrl+0x1ac/0x2b0 [nvme_core]
Fix this bug by creating a workqueue for the flush operation with
the WQ_MEM_RECLAIM bit set.
Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
Cc: <stable@vger.kernel.org>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
This patch avoids threads live-locking for hours when a large number
threads are competing over the last few free extents as they blocks
getting added and removed from preallocation pools. From our bug
reporter:
A reliable way for triggering this has multiple writers
continuously write() to files when the filesystem is full, while
small amounts of space are freed (e.g. by truncating a large file
-1MiB at a time). In the local filesystem, this can be done by
simply not checking the return code of write (0) and/or the error
(ENOSPACE) that is set. Over NFS with an async mount, even clients
with proper error checking will behave this way since the linux NFS
client implementation will not propagate the server errors [the
write syscalls immediately return success] until the file handle is
closed. This leads to a situation where NFS clients send a
continuous stream of WRITE rpcs which result in ERRNOSPACE -- but
since the client isn't seeing this, the stream of writes continues
at maximum network speed.
When some space does appear, multiple writers will all attempt to
claim it for their current write. For NFS, we may see dozens to
hundreds of threads that do this.
The real-world scenario of this is database backup tooling (in
particular, github.com/mdkent/percona-xtrabackup) which may write
large files (>1TiB) to NFS for safe keeping. Some temporary files
are written, rewound, and read back -- all before closing the file
handle (the temp file is actually unlinked, to trigger automatic
deletion on close/crash.) An application like this operating on an
async NFS mount will not see an error code until TiB have been
written/read.
The lockup was observed when running this database backup on large
filesystems (64 TiB in this case) with a high number of block
groups and no free space. Fragmentation is generally not a factor
in this filesystem (~thousands of large files, mostly contiguous
except for the parts written while the filesystem is at capacity.)
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
When walking through an inode extents, the ext4_ext_binsearch_idx() function
assumes that the extent header has been previously validated. However, there
are no checks that verify that the number of entries (eh->eh_entries) is
non-zero when depth is > 0. And this will lead to problems because the
EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:
[ 135.245946] ------------[ cut here ]------------
[ 135.247579] kernel BUG at fs/ext4/extents.c:2258!
[ 135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
[ 135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
[ 135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[ 135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
[ 135.256475] Code:
[ 135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
[ 135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
[ 135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
[ 135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
[ 135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
[ 135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
[ 135.272394] FS: 00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[ 135.274510] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
[ 135.277952] Call Trace:
[ 135.278635] <TASK>
[ 135.279247] ? preempt_count_add+0x6d/0xa0
[ 135.280358] ? percpu_counter_add_batch+0x55/0xb0
[ 135.281612] ? _raw_read_unlock+0x18/0x30
[ 135.282704] ext4_map_blocks+0x294/0x5a0
[ 135.283745] ? xa_load+0x6f/0xa0
[ 135.284562] ext4_mpage_readpages+0x3d6/0x770
[ 135.285646] read_pages+0x67/0x1d0
[ 135.286492] ? folio_add_lru+0x51/0x80
[ 135.287441] page_cache_ra_unbounded+0x124/0x170
[ 135.288510] filemap_get_pages+0x23d/0x5a0
[ 135.289457] ? path_openat+0xa72/0xdd0
[ 135.290332] filemap_read+0xbf/0x300
[ 135.291158] ? _raw_spin_lock_irqsave+0x17/0x40
[ 135.292192] new_sync_read+0x103/0x170
[ 135.293014] vfs_read+0x15d/0x180
[ 135.293745] ksys_read+0xa1/0xe0
[ 135.294461] do_syscall_64+0x3c/0x80
[ 135.295284] entry_SYSCALL_64_after_hwframe+0x46/0xb0
This patch simply adds an extra check in __ext4_ext_check(), verifying that
eh_entries is not 0 when eh_depth is > 0.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
Cc: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When the PWM driver was changed to disable clocks if no PWMs are enabled,
it ended up also disabling the shared parent with the UART, since the
UART doesn't do any clock enablement on its own.
To avoid these surprises, switch to clk_get_enabled().
Fixes: ace41d7564e655 ("pwm: sifive: Ensure the clk is enabled exactly once per running PWM")
Cc: stable <stable@kernel.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Link: https://lore.kernel.org/r/20220920160017.7315-1-olof@lixom.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
8250_omap uses em485, fill in rs485_supported accordingly. This makes
RS485 work with 8250_omap again, which was broken with the introduction
of the RS485 config sanitization.
Fixes: be2e2cb1d2819 ("serial: Sanitize rs485_struct")
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://lore.kernel.org/r/20220916110955.161099-1-matthias.schiffer@ew.tq-group.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|