Age | Commit message (Collapse) | Author | Files | Lines |
|
Every EXPORT_SYMBOL creates __kstrtab_* and __kstrtabns_*, which
consumes 15-20% of the kallsyms entries.
For example, on the system built from the x86_64 defconfig,
$ cat /proc/kallsyms | wc
129527 388581 5685465
$ cat /proc/kallsyms | grep __kstrtab | wc
23489 70467 1187932
We already ignore __crc_* symbols populated by EXPORT_SYMBOL, so it
should be fine to ignore __kstrtab_* and __kstrtabns_* as well.
This makes vmlinux a bit smaller.
$ size vmlinux.before vmlinux.after
text data bss dec hex filename
22785374 8559694 1413328 32758396 1f3da7c vmlinux.before
22785374 8137806 1413328 32336508 1ed6a7c vmlinux.after
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This gets rid of the pipe operator connected with 'cat'.
Also use getopt_long() to parse the command line.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Now that kallsyms.c parses the output from mksysmap, some symbols have
already been dropped.
Move comments to scripts/mksysmap. Also, make the grep command readable.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
scripts/mksysmap internally runs ${NM} (dropping some symbols).
When CONFIG_KALLSYMS=y, mksysmap creates .tmp_System.map, but it is
almost the same as the output from the ${NM} invocation in kallsyms().
It is true scripts/mksysmap drops some symbols, but scripts/kallsyms.c
ignores more anyway.
Keep the mksysmap output as *.syms, and reuse it for kallsyms and
'cmp -s'. It saves one ${NM} invocation.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Since commit 7b4537199a4a ("kbuild: link symbol CRCs at final link,
removing CONFIG_MODULE_REL_CRCS"), __crc_* symbols never become
absolute.
Keep ignoring __crc_*, but update the comment.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Kbuild puts the objects listed in head-y at the head of vmlinux.
Conventionally, we do this for head*.S, which contains the kernel entry
point.
A counter approach is to control the section order by the linker script.
Actually, the code marked as __HEAD goes into the ".head.text" section,
which is placed before the normal ".text" section.
I do not know if both of them are needed. From the build system
perspective, head-y is not mandatory. If you can achieve the proper code
placement by the linker script only, it would be cleaner.
I collected the current head-y objects into head-object-list.txt. It is
a whitelist. My hope is it will be reduced in the long run.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
The objects placed at the head of vmlinux need special treatments:
- arch/$(SRCARCH)/Makefile adds them to head-y in order to place
them before other archives in the linker command line.
- arch/$(SRCARCH)/kernel/Makefile adds them to extra-y instead of
obj-y to avoid them going into built-in.a.
This commit gets rid of the latter.
Create vmlinux.a to collect all the objects that are unconditionally
linked to vmlinux. The objects listed in head-y are moved to the head
of vmlinux.a by using 'ar m'.
With this, arch/$(SRCARCH)/kernel/Makefile can consistently use obj-y
for builtin objects.
There is no *.o that is directly linked to vmlinux. Drop unneeded code
in scripts/clang-tools/gen_compile_commands.py.
$(AR) mPi needs 'T' to workaround the llvm-ar bug. The fix was suggested
by Nathan Chancellor [1].
[1]: https://lore.kernel.org/llvm/YyjjT5gQ2hGMH0ni@dev-arch.thelio-3990X/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
V=1 (verbose build) shows commands executed by Make, but it may cause
misunderstanding.
For example, the following command shows the outstanding error message.
$ make V=1 INSTALL_PATH=/tmp install
test -e include/generated/autoconf.h -a -e include/config/auto.conf || ( \
echo >&2; \
echo >&2 " ERROR: Kernel configuration is invalid."; \
echo >&2 " include/generated/autoconf.h or include/config/auto.conf are missing.";\
echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it."; \
echo >&2 ; \
/bin/false)
unset sub_make_done; ./scripts/install.sh
It is not an error. Make just showed the recipe lines it has executed,
but people may think that 'make install' has failed.
Likewise, the combination of V=1 and O= shows confusing
"*** The source tree is not clean, please run 'make mrproper'".
Suppress such misleading logs.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
Modpost generates .vmlinux.export.c and *.mod.c, which are prerequisites
of vmlinux and modules, respectively.
The modpost stage should be re-run when the modpost code is updated.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Currently, modpost is executed twice; first for vmlinux, second
for modules.
This commit merges them.
Current build flow
==================
1) build obj-y and obj-m objects
2) link vmlinux.o
3) modpost for vmlinux
4) link vmlinux
5) modpost for modules
6) link modules (*.ko)
The build steps 1) through 6) are serialized, that is, modules are
built after vmlinux. You do not get benefits of parallel builds when
scripts/link-vmlinux.sh is being run.
New build flow
==============
1) build obj-y and obj-m objects
2) link vmlinux.o
3) modpost for vmlinux and modules
4a) link vmlinux
4b) link modules (*.ko)
In the new build flow, modpost is invoked just once.
vmlinux and modules are built in parallel. One exception is
CONFIG_DEBUG_INFO_BTF_MODULES=y, where modules depend on vmlinux.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
Move the build rules of vmlinux.o out of scripts/link-vmlinux.sh to
clearly separate 1) pre-modpost, 2) modpost, 3) post-modpost stages.
This will make further refactoring possible.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
.vmlinux.objs is used by modpost, so scripts/Makefile.modpost is
a better place to generate it.
It is used only when CONFIG_MODVERSIONS=y. It should be guarded
by "ifdef CONFIG_MODVERSIONS".
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
Use the ordinary obj-y syntax to list subdirectories.
Note1:
Previously, the link order of lib-y depended on CONFIG_MODULES; lib-y
was linked before drivers-y when CONFIG_MODULES=y, otherwise after
drivers-y. This was a bug of commit 7273ad2b08f8 ("kbuild: link lib-y
objects to vmlinux forcibly when CONFIG_MODULES=y"), but it was not a
big deal after all. Now, all objects listed in lib-y are linked last,
irrespective of CONFIG_MODULES.
Note2:
Finally, the single target build in arch/*/lib/ works correctly. There was
a bug report about this. [1]
$ make ARCH=arm arch/arm/lib/findbit.o
CALL scripts/checksyscalls.sh
AS arch/arm/lib/findbit.o
[1]: https://lore.kernel.org/linux-kbuild/YvUQOwL6lD4%2F5%2FU6@shell.armlinux.org.uk/
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
cc-ifversion is GCC specific. Replace it with compiler specific
variants. Update the users of cc-ifversion to use these new macros.
Link: https://github.com/ClangBuiltLinux/linux/issues/350
Link: https://lore.kernel.org/llvm/CAGG=3QWSAUakO42kubrCap8fp-gm1ERJJAYXTnP1iHk_wrH=BQ@mail.gmail.com/
Suggested-by: Bill Wendling <morbo@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Doing make V=1 binrpm-pkg results in:
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.EgV6qJ
+ umask 022
+ cd .
+ /bin/rm -rf /home/scgl/rpmbuild/BUILDROOT/kernel-6.0.0_rc5+-1.s390x
+ /bin/mkdir -p /home/scgl/rpmbuild/BUILDROOT
+ /bin/mkdir /home/scgl/rpmbuild/BUILDROOT/kernel-6.0.0_rc5+-1.s390x
+ mkdir -p /home/scgl/rpmbuild/BUILDROOT/kernel-6.0.0_rc5+-1.s390x/boot
+ make -f ./Makefile image_name
+ cp test -e include/generated/autoconf.h -a -e include/config/auto.conf || ( \ echo >&2; \ echo >&2 " ERROR: Kernel configuration is invalid."; \ echo >&2 " include/generated/autoconf.h or include/config/auto.conf are missing.";\ echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it."; \ echo >&2 ; \ /bin/false) arch/s390/boot/bzImage /home/scgl/rpmbuild/BUILDROOT/kernel-6.0.0_rc5+-1.s390x/boot/vmlinuz-6.0.0-rc5+
cp: invalid option -- 'e'
Try 'cp --help' for more information.
error: Bad exit status from /var/tmp/rpm-tmp.EgV6qJ (%install)
Because the make call to get the image name is verbose and prints
additional information.
Fixes: 993bdde94547 ("kbuild: add image_name to no-sync-config-targets")
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove unused function argument, and there is
no logic changes.
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
sym_set_choice_value could be removed and directly call
sym_set_tristate_value instead.
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Since commit 7b4537199a4a ("kbuild: link symbol CRCs at final link,
removing CONFIG_MODULE_REL_CRCS"), the module versioning on the
(non-upstreamed-yet) kvx Linux port is broken due to unexpected padding
for __crc_* symbols. The kvx GCC adds padding so u32 gets 8-byte
alignment instead of 4.
I do not know if this happens for upstream architectures in general,
but any compiler has the freedom to insert padding for faster access.
Use the inline assembler to directly specify the wanted data layout.
This is how we previously did before the breakage.
Link: https://lore.kernel.org/lkml/20220817161438.32039-1-ysionneau@kalray.eu/
Link: https://lore.kernel.org/linux-kbuild/31ce5305-a76b-13d7-ea55-afca82c46cf2@kalray.eu/
Fixes: 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS")
Reported-by: Yann Sionneau <ysionneau@kalray.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Yann Sionneau <ysionneau@kalray.eu>
|
|
Based on Linus' patch. Refactor scripts/Makefile.vmlinux_o as well.
Link: https://lore.kernel.org/lkml/CAHk-=wgjTMQgiKzBZTmb=uWGDEQxDdyF1+qxBkODYciuNsmwnw@mail.gmail.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
|
|
The single target build has a subtle bug for the combination for
an individual file and a subdirectory.
[1] 'make kernel/fork.i' builds only kernel/fork.i
$ make kernel/fork.i
CALL scripts/checksyscalls.sh
DESCEND objtool
CPP kernel/fork.i
[2] 'make kernel/' builds only under the kernel/ directory.
$ make kernel/
CALL scripts/checksyscalls.sh
DESCEND objtool
CC kernel/fork.o
CC kernel/exec_domain.o
[snip]
CC kernel/rseq.o
AR kernel/built-in.a
But, if you try to do [1] and [2] in a single command, you will get
only [1] with a weird log:
$ make kernel/fork.i kernel/
CALL scripts/checksyscalls.sh
DESCEND objtool
CPP kernel/fork.i
make[2]: Nothing to be done for 'kernel/'.
With 'make kernel/fork.i kernel/', you should get both [1] and [2].
Rewrite the single target build.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove the bash build dependency for those who otherwise do not
have it installed. This also provides a significant speedup:
$ make defconfig
$ make yes2modconfig
...
$ find . -name "*.o" | grep -v vmlinux | wc
3169 3169 89615
$ export NM=nm
$ time sh -c 'find . -name "*.o" | grep -v vmlinux | xargs -n1
./scripts/check-local-export'
Without patch:
0m15.90s real 0m12.17s user 0m05.28s system
With patch:
dash + nawk
0m02.16s real 0m02.92s user 0m00.34s system
dash + busybox awk
0m02.36s real 0m03.36s user 0m00.34s system
dash + gawk
0m02.07s real 0m03.26s user 0m00.32s system
bash + gawk
0m03.55s real 0m05.00s user 0m00.54s system
Signed-off-by: Owen Rafferty <owen@owenrafferty.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This reverts commit [1] in the pre-git era.
I do not know what problem happened in the script when sh != bash
because there is no commit message.
Now that this script is much simpler than it used to be, let's revert
it, and let' see. (If this turns out to be problematic, fix the code
with proper commit description.)
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=11acbbbb8a50f4de7dbe4bc1b5acc440dfe81810
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Minimize the scope of LC_ALL=C like before commit 87c94bfb8ad3 ("kbuild:
override build timestamp & version").
Give LC_ALL=C to '$LD -v' to get the consistent version output, as commit
bcbcf50f5218 ("kbuild: fix ld-version.sh to not be affected by locale")
mentioned the LD version is affected by locale.
While I was here, I merged two sed invocations.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Now that UTS_VERSION was separated out, this header can be generated
much earlier, and probably the top Makefile is a better place to do it
than init/Makefile.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Kbuild builds init/built-in.a twice; first during the ordinary
directory descending, second from scripts/link-vmlinux.sh.
We do this because UTS_VERSION contains the build version and the
timestamp. We cannot update it during the normal directory traversal
since we do not yet know if we need to update vmlinux. UTS_VERSION is
temporarily calculated, but omitted from the update check. Otherwise,
vmlinux would be rebuilt every time.
When Kbuild results in running link-vmlinux.sh, it increments the
version number in the .version file and takes the timestamp at that
time to really fix UTS_VERSION.
However, updating the same file twice is a footgun. To avoid nasty
timestamp issues, all build artifacts that depend on init/built-in.a
are atomically generated in link-vmlinux.sh, where some of them do not
need rebuilding.
To fix this issue, this commit changes as follows:
[1] Split UTS_VERSION out to include/generated/utsversion.h from
include/generated/compile.h
include/generated/utsversion.h is generated just before the
vmlinux link. It is generated under include/generated/ because
some decompressors (s390, x86) use UTS_VERSION.
[2] Split init_uts_ns and linux_banner out to init/version-timestamp.c
from init/version.c
init_uts_ns and linux_banner contain UTS_VERSION. During the ordinary
directory descending, they are compiled with __weak and used to
determine if vmlinux needs relinking. Just before the vmlinux link,
they are compiled without __weak to embed the real version and
timestamp.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
This is unneeded since commit 073a9ecb3a73 ("init/version.c: remove
Version_<LINUX_VERSION_CODE> symbol").
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Unify the code between in-tree builds and external module builds.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove the potentially invalid modules.order instead of using
the temporary file.
Also, KBUILD_MODULES is don't care for single builds. No need to
cancel it.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The dependency, "modules: modules_check" is specified twice.
Commit 1a998be620a1 ("kbuild: check module name conflict for external
modules as well") missed to clean it up.
'PHONY += modules' also appears twice.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Use obj-y to clean up Makefile.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
The AWK code was added to deduplicate modules.order in case $(obj-m)
contains the same module multiple times, but it is actually unneeded
since commit b2c885549122 ("kbuild: update modules.order only when
contained modules are updated").
The list is already deduplicated before being processed by AWK because
$^ is the deduplicated list of prerequisites.
(Please note the real-prereqs macro uses $^)
Yet, modules.order will contain duplication if two different Makefiles
build the same module:
foo/Makefile:
obj-m += bar/baz.o
foo/bar/Makefile:
obj-m += baz.o
However, the parallel builds cannot properly handle this case in the
first place. So, it is better to let it fail (as already done by
scripts/modules-check.sh).
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
It is unneeded to check the sha1sum every time.
Create the timestamp files to manage it.
Add '.' to clean-dirs because 'make clean' must visit ./Kbuild to
clean up the timestamp files.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
My future plan is to list subdirectories in ./Kbuild. When it occurs,
$(vmlinux-alldirs) will not contain all subdirectories.
Let's hard-code the directory list until I get around to implementing
a more sophisticated way for generating a source tarball.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
missing-syscalls and old-atomics are meant to be phony targets.
Adding them to always-y is odd. (always-y should generate something).
Add a new phony target 'prepare', which depends on all the other.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When receiving some signal, GNU Make automatically deletes the target if
it has already been changed by the interrupted recipe.
If the target is possibly incomplete due to interruption, it must be
deleted so that it will be remade from scratch on the next run of make.
Otherwise, the target would remain corrupted permanently because its
timestamp had already been updated.
Thanks to this behavior of Make, you can stop the build any time by
pressing Ctrl-C, and just run 'make' to resume it.
Kbuild also relies on this feature, but it is equivalently important
for any build systems that make decisions based on timestamps (if you
want to support Ctrl-C reliably).
However, this does not always work as claimed; Make immediately dies
with Ctrl-C if its stderr goes into a pipe.
[Test Makefile]
foo:
echo hello > $@
sleep 3
echo world >> $@
[Test Result]
$ make # hit Ctrl-C
echo hello > foo
sleep 3
^Cmake: *** Deleting file 'foo'
make: *** [Makefile:3: foo] Interrupt
$ make 2>&1 | cat # hit Ctrl-C
echo hello > foo
sleep 3
^C$ # 'foo' is often left-over
The reason is because SIGINT is sent to the entire process group.
In this example, SIGINT kills 'cat', and 'make' writes the message to
the closed pipe, then dies with SIGPIPE before cleaning the target.
A typical bad scenario (as reported by [1], [2]) is to save build log
by using the 'tee' command:
$ make 2>&1 | tee log
This can be problematic for any build systems based on Make, so I hope
it will be fixed in GNU Make. The maintainer of GNU Make stated this is
a long-standing issue and difficult to fix [3]. It has not been fixed
yet as of writing.
So, we cannot rely on Make cleaning the target. We can do it by
ourselves, in signal traps.
As far as I understand, Make takes care of SIGHUP, SIGINT, SIGQUIT, and
SITERM for the target removal. I added the traps for them, and also for
SIGPIPE just in case cmd_* rule prints something to stdout or stderr
(but I did not observe an actual case where SIGPIPE was triggered).
[Note 1]
The trap handler might be worth explaining.
rm -f $@; trap - $(sig); kill -s $(sig) $$
This lets the shell kill itself by the signal it caught, so the parent
process can tell the child has exited on the signal. Generally, this is
a proper manner for handling signals, in case the calling program (like
Bash) may monitor WIFSIGNALED() and WTERMSIG() for WCE although this may
not be a big deal here because GNU Make handles SIGHUP, SIGINT, SIGQUIT
in WUE and SIGTERM in IUE.
IUE - Immediate Unconditional Exit
WUE - Wait and Unconditional Exit
WCE - Wait and Cooperative Exit
For details, see "Proper handling of SIGINT/SIGQUIT" [4].
[Note 2]
Reverting 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd
files") would directly address [1], but it only saves if_changed_dep.
As reported in [2], all commands that use redirection can potentially
leave an empty (i.e. broken) target.
[Note 3]
Another (even safer) approach might be to always write to a temporary
file, and rename it to $@ at the end of the recipe.
<command> > $(tmp-target)
mv $(tmp-target) $@
It would require a lot of Makefile changes, and result in ugly code,
so I did not take it.
[Note 4]
A little more thoughts about a pattern rule with multiple targets (or
a grouped target).
%.x %.y: %.z
<recipe>
When interrupted, GNU Make deletes both %.x and %.y, while this solution
only deletes $@. Probably, this is not a big deal. The next run of make
will execute the rule again to create $@ along with the other files.
[1]: https://lore.kernel.org/all/YLeot94yAaM4xbMY@gmail.com/
[2]: https://lore.kernel.org/all/20220510221333.2770571-1-robh@kernel.org/
[3]: https://lists.gnu.org/archive/html/help-make/2021-06/msg00001.html
[4]: https://www.cons.org/cracauer/sigint.html
Fixes: 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd files")
Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
|
|
The "hmem" platform-devices that are created to represent the
platform-advertised "Soft Reserved" memory ranges end up inserting a
resource that causes the iomem_resource tree to look like this:
340000000-43fffffff : hmem.0
340000000-43fffffff : Soft Reserved
340000000-43fffffff : dax0.0
This is because insert_resource() reparents ranges when they completely
intersect an existing range.
This matters because code that uses region_intersects() to scan for a
given IORES_DESC will only check that top-level 'hmem.0' resource and
not the 'Soft Reserved' descendant.
So, to support EINJ (via einj_error_inject()) to inject errors into
memory hosted by a dax-device, be sure to describe the memory as
IORES_DESC_SOFT_RESERVED. This is a follow-on to:
commit b13a3e5fd40b ("ACPI: APEI: Fix _EINJ vs EFI_MEMORY_SP")
...that fixed EINJ support for "Soft Reserved" ranges in the first
instance.
Fixes: 262b45ae3ab4 ("x86/efi: EFI soft reservation to E820 enumeration")
Reported-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
Tested-by: Ricardo Sandoval Torres <ricardo.sandoval.torres@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Omar Avelar <omar.avelar@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mark Gross <markgross@kernel.org>
Link: https://lore.kernel.org/r/166397075670.389916.7435722208896316387.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Alexey reported that the fraction of unknown filename instances in
kallsyms grew from ~0.3% to ~10% recently; Bill and Greg tracked it down
to assembler defined symbols, which regressed as a result of:
commit b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
In that commit, I allude to restoring debug info for assembler defined
symbols in a follow up patch, but it seems I forgot to do so in
commit a66049e2cf0e ("Kbuild: make DWARF version a choice")
Link: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=31bf18645d98b4d3d7357353be840e320649a67d
Fixes: b8a9092330da ("Kbuild: do not emit debug info for assembly with LLVM_IAS=1")
Reported-by: Alexey Alexandrov <aalexand@google.com>
Reported-by: Bill Wendling <morbo@google.com>
Reported-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Dmitrii, Fangrui, and Mashahiro note:
Before GCC 11 and Clang 12 -gsplit-dwarf implicitly uses -g2.
Fix CONFIG_DEBUG_INFO_SPLIT for gcc-11+ & clang-12+ which now need -g
specified in order for -gsplit-dwarf to work at all.
-gsplit-dwarf has been mutually exclusive with -g since support for
CONFIG_DEBUG_INFO_SPLIT was introduced in
commit 866ced950bcd ("kbuild: Support split debug info v4")
I don't think it ever needed to be.
Link: https://lore.kernel.org/lkml/20220815013317.26121-1-dmitrii.bundin.a@gmail.com/
Link: https://lore.kernel.org/lkml/CAK7LNARPAmsJD5XKAw7m_X2g7Fi-CAAsWDQiP7+ANBjkg7R7ng@mail.gmail.com/
Link: https://reviews.llvm.org/D80391
Cc: Andi Kleen <ak@linux.intel.com>
Reported-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com>
Reported-by: Fangrui Song <maskray@google.com>
Reported-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Dmitrii Bundin <dmitrii.bundin.a@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
io_uring caches task references to avoid doing atomics for each of them
per request. If a request is put from the same task that allocated it,
then we can maintain a per-ctx cache of them. This obviously relies
on io_uring always pruning caches in a reliable way, and there's
currently a case off io_uring fd release where we can miss that.
One example is a ring setup with IOPOLL, which relies on the task
polling for completions, which will free them. However, if such a task
submits a request and then exits or closes the ring without reaping
the completion, then ring release will reap and put. If release happens
from that very same task, the completed request task refs will get
put back into the cache pool. This is problematic, as we're now beyond
the point of pruning caches.
Manually drop these caches after doing an IOPOLL reap. This releases
references from the current task, which is enough. If another task
happens to be doing the release, then the caching will not be
triggered and there's no issue.
Cc: stable@vger.kernel.org
Fixes: e98e49b2bbf7 ("io_uring: extend task put optimisations")
Reported-by: Homin Rhee <hominlab@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit e90886291c7c ("certs: make system keyring depend on x509 parser")
is not the right fix because x509_load_certificate_list() can be modular.
The combination of CONFIG_SYSTEM_TRUSTED_KEYRING=y and
CONFIG_X509_CERTIFICATE_PARSER=m still results in the following error:
LD .tmp_vmlinux.kallsyms1
ld: certs/system_keyring.o: in function `load_system_certificate_list':
system_keyring.c:(.init.text+0x8c): undefined reference to `x509_load_certificate_list'
make: *** [Makefile:1169: vmlinux] Error 1
Fixes: e90886291c7c ("certs: make system keyring depend on x509 parser")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Adam Borowski <kilobyte@angband.pl>
|
|
There is nowhere calling `menu_get_root_menu` function,
so remove it.
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Remove unused imported 'os' module.
Signed-off-by: yangxingwu <xingwu.yang@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
cgroup has to be one kernfs dir, otherwise kernel panic is caused,
especially cgroup id is provide from userspace.
Reported-by: Marco Patalano <mpatalan@redhat.com>
Fixes: 6b658c4863c1 ("scsi: cgroup: Add cgroup_get_from_id()")
Cc: Muneendra <muneendra.kumar@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
Cc: stable@vger.kernel.org # v5.14+
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Due to undocumented, hysterical raisins on x86, the CFI jump-table
sections in .text are needlessly aligned to PMD_SIZE in the vmlinux
linker script. When compiling a CFI-enabled arm64 kernel with a 64KiB
page-size, a PMD maps 512MiB of virtual memory and so the .text section
increases to a whopping 940MiB and blows the final Image up to 960MiB.
Others report a link failure.
Since the CFI jump-table requires only instruction alignment, reduce the
alignment directives to function alignment for parity with other parts
of the .text section. This reduces the size of the .text section for the
aforementioned 64KiB page size arm64 kernel to 19MiB for a much more
reasonable total Image size of 39MiB.
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Mohan Rao .vanimina" <mailtoc.mohanrao@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/CAL_GTzigiNOMYkOPX1KDnagPhJtFNqSK=1USNbS0wUL4PW6-Uw@mail.gmail.com/
Fixes: cf68fffb66d6 ("add support for Clang CFI")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220922215715.13345-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
My linux.ie address is in a bad place.
also add dri-devel for agpgart.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Inject #UD when emulating XSETBV if CR4.OSXSAVE is not set. This also
covers the "XSAVE not supported" check, as setting CR4.OSXSAVE=1 #GPs if
XSAVE is not supported (and userspace gets to keep the pieces if it
forces incoherent vCPU state).
Add a comment to kvm_emulate_xsetbv() to call out that the CPU checks
CR4.OSXSAVE before checking for intercepts. AMD'S APM implies that #UD
has priority (says that intercepts are checked before #GP exceptions),
while Intel's SDM says nothing about interception priority. However,
testing on hardware shows that both AMD and Intel CPUs prioritize the #UD
over interception.
Fixes: 02d4160fbd76 ("x86: KVM: add xsetbv to the emulator")
Cc: stable@vger.kernel.org
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220824033057.3576315-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Allow FP and SSE state to be saved and restored via KVM_{G,SET}_XSAVE on
XSAVE-capable hosts even if their bits are not exposed to the guest via
XCR0.
Failing to allow FP+SSE first showed up as a QEMU live migration failure,
where migrating a VM from a pre-XSAVE host, e.g. Nehalem, to an XSAVE
host failed due to KVM rejecting KVM_SET_XSAVE. However, the bug also
causes problems even when migrating between XSAVE-capable hosts as
KVM_GET_SAVE won't set any bits in user_xfeatures if XSAVE isn't exposed
to the guest, i.e. KVM will fail to actually migrate FP+SSE.
Because KVM_{G,S}ET_XSAVE are designed to allowing migrating between
hosts with and without XSAVE, KVM_GET_XSAVE on a non-XSAVE (by way of
fpu_copy_guest_fpstate_to_uabi()) always sets the FP+SSE bits in the
header so that KVM_SET_XSAVE will work even if the new host supports
XSAVE.
Fixes: ad856280ddea ("x86/kvm/fpu: Limit guest user_xfeatures to supported bits of XCR0")
bz: https://bugzilla.redhat.com/show_bug.cgi?id=2079311
Cc: stable@vger.kernel.org
Cc: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
[sean: add comment, massage changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220824033057.3576315-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reinstate the per-vCPU guest_supported_xcr0 by partially reverting
commit 988896bb6182; the implicit assessment that guest_supported_xcr0 is
always the same as guest_fpu.fpstate->user_xfeatures was incorrect.
kvm_vcpu_after_set_cpuid() isn't the only place that sets user_xfeatures,
as user_xfeatures is set to fpu_user_cfg.default_features when guest_fpu
is allocated via fpu_alloc_guest_fpstate() => __fpstate_reset().
guest_supported_xcr0 on the other hand is zero-allocated. If userspace
never invokes KVM_SET_CPUID2, supported XCR0 will be '0', whereas the
allowed user XFEATURES will be non-zero.
Practically speaking, the edge case likely doesn't matter as no sane
userspace will live migrate a VM without ever doing KVM_SET_CPUID2. The
primary motivation is to prepare for KVM intentionally and explicitly
setting bits in user_xfeatures that are not set in guest_supported_xcr0.
Because KVM_{G,S}ET_XSAVE can be used to svae/restore FP+SSE state even
if the host doesn't support XSAVE, KVM needs to set the FP+SSE bits in
user_xfeatures even if they're not allowed in XCR0, e.g. because XCR0
isn't exposed to the guest. At that point, the simplest fix is to track
the two things separately (allowed save/restore vs. allowed XCR0).
Fixes: 988896bb6182 ("x86/kvm/fpu: Remove kvm_vcpu_arch.guest_supported_xcr0")
Cc: stable@vger.kernel.org
Cc: Leonardo Bras <leobras@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220824033057.3576315-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The update to statistic max_mmu_rmap_size is unintentionally removed by
commit 4293ddb788c1 ("KVM: x86/mmu: Remove redundant spte present check
in mmu_set_spte"). Add missing update to it or max_mmu_rmap_size will
always be nonsensical 0.
Fixes: 4293ddb788c1 ("KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Message-Id: <20220907080657.42898-1-linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|