linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2021-06-17	kbuild: mkcompile_h: consider timestamp if KBUILD_BUILD_TIMESTAMP is set	Matthias Maennich	1	-3/+11
	To avoid unnecessary recompilations, mkcompile_h does not regenerate compile.h if just the timestamp changed. Though, if KBUILD_BUILD_TIMESTAMP is set, an explicit timestamp for the build was requested, in which case we should not ignore it. If a user follows the documentation for reproducible builds [1] and defines KBUILD_BUILD_TIMESTAMP as the git commit timestamp, a clean build will have the correct timestamp. A subsequent cherry-pick (or amend) changes the commit timestamp and if an incremental build is done with a different KBUILD_BUILD_TIMESTAMP now, that new value is not taken into consideration. But it should for reproducibility. Hence, whenever KBUILD_BUILD_TIMESTAMP is explicitly set, do not ignore UTS_VERSION when making a decision about whether the regenerated version of compile.h should be moved into place. [1] https://www.kernel.org/doc/html/latest/kbuild/reproducible-builds.html Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-06-17	kbuild: modpost: Explicitly warn about unprototyped symbols	Mark Brown	1	-2/+5
	One common cause of modpost version generation failures is a failure to prototype exported assembly functions - the tooling requires this for exported functions even if they are not and should not be called from C code in order to do the version mangling for symbols. Unfortunately the error message is currently rather abstruse, simply saying that "version generation failed" and even diving into the code doesn't directly show what's going on since there's several steps between the problem and it being observed. Provide an explicit hint as to the likely cause of a version generation failure to help anyone who runs into this in future more readily diagnose and fix the problem. Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-06-17	kbuild: remove trailing slashes from $(KBUILD_EXTMOD)	Masahiro Yamada	1	-0/+5
	M= (or KBUILD_EXTMOD) generally expects a directory path without any trailing slashes, like M=a/b/c. If you add a trailing slash, like M=a/b/c/, you will get ugly build logs (two slashes in a series), but it still works fine as long as it is consistent between 'make modules' and 'make modules_install'. The following commands correctly build and install the modules. $ make M=a/b/c/ modules $ sudo make M=a/b/c/ modules_install Since commit ccae4cfa7bfb ("kbuild: refactor scripts/Makefile.modinst"), a problem happens if you add a trailing slash only for modules_install. $ make M=a/b/c modules $ sudo make M=a/b/c/ modules_install No module is installed in this case, Johannes Berg reported. [1] Trim any trailing slashes from $(KBUILD_EXTMOD). I used the 'dirname' command to remove all the trailing slashes in case someone adds more slashes like M=a/b/c/////. The Make's built-in function, $(dir ...) cannot take care of such a case. [1]: https://lore.kernel.org/lkml/10cc8522b27a051e6a9c3e158a4c4b6414fd04a0.camel@sipsolutions.net/ Fixes: ccae4cfa7bfb ("kbuild: refactor scripts/Makefile.modinst") Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-06-05	kconfig.h: explain IS_MODULE(), IS_ENABLED()	Bjorn Helgaas	1	-2/+4
	Extend IS_MODULE() and IS_ENABLED comments to explain why one might use "#if IS_ENABLED(CONFIG_FOO)" instead of "#ifdef CONFIG_FOO". To wit, "#ifdef CONFIG_FOO" is true only for CONFIG_FOO=y, while "#if IS_ENABLED(CONFIG_FOO)" is true for both CONFIG_FOO=y and CONFIG_FOO=m. This is because "CONFIG_FOO=m" in .config does not result in "CONFIG_FOO" being defined. The actual definitions are in autoconf.h, where: CONFIG_FOO=y results in #define CONFIG_FOO 1 CONFIG_FOO=m results in #define CONFIG_FOO_MODULE 1 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-06-05	kconfig: constify long_opts	Masahiro Yamada	1	-1/+1
	getopt_long() does not modify the long_opts structure. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-27	scripts/setlocalversion: simplify the short version part	Masahiro Yamada	1	-5/+3
	Reduce the indentation. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Nico Schottelius <nico-linuxsetlocalversion@schottelius.org>
2021-05-27	scripts/setlocalversion: factor out 12-chars hash construction	Masahiro Yamada	1	-17/+5
	Both of if and else parts append exactly 12 hex chars, but in different ways. Factor out the else part because we need to support it without relying on git-describe. Remove the --abbrev=12 option since we do not use the hash from git-describe anyway. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Nico Schottelius <nico-linuxsetlocalversion@schottelius.org>
2021-05-27	scripts/setlocalversion: add more comments to -dirty flag detection	Masahiro Yamada	1	-0/+4
	This script stumbled on the read-only source tree over again: - a2bb90a08cb3 ("kbuild: fix delay in setlocalversion on readonly source") - cdf2bc632ebc ("scripts/setlocalversion on write-protected source tree") - 8ef14c2c41d9 ("Revert "scripts/setlocalversion: git: Make -dirty check more robust"") - ff64dd485730 ("scripts/setlocalversion: Improve -dirty check with git-status --no-optional-locks") Add comments to clarify that this script should never ever try to write to the source tree. 'git describe --dirty' might look as a simple solution for appending the -dirty string, but we cannot use it because it creates the .git/index.lock file. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Nico Schottelius <nico-linuxsetlocalversion@schottelius.org>
2021-05-27	scripts/setlocalversion: remove workaround for old make-kpkg	Masahiro Yamada	1	-1/+1
	This reverts commit b052ce4c840e ("kbuild: fix false positive -dirty tag caused by make-kpkg"). If I understand correctly, this problem occurred in very old versions of make-kpkg. When I tried a newer version, make-kpkg did not touch scripts/package/Makefile. Anyway, Debian uses 'make deb-pkg' instead of make-kpkg these days. Debian handbook [1] mentions it as "the good old days": "CULTURE The good old days of kernel-package Before the Linux build system gained the ability to build proper Debian packages, the recommended way to build such packages was to use make-kpkg from the kernel-package package." [1]: https://debian-handbook.info/browse/stable/sect.kernel-compilation.html Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Nico Schottelius <nico-linuxsetlocalversion@schottelius.org>
2021-05-27	scripts/setlocalversion: remove mercurial, svn and git-svn supports	Masahiro Yamada	1	-41/+0
	The mercurial, svn, git-svn supports were added by the following commits: - 3dce174cfcba ("kbuild: support mercurial in setlocalversion") - ba3d05fb6369 ("kbuild: add svn revision information to setlocalversion") - ff80aa97c9b4 ("setlocalversion: add git-svn support") They did not explain why they are useful for the kernel source tree. Let's revert all of them, and see if somebody will complain about it. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Nico Schottelius <nico-linuxsetlocalversion@schottelius.org>
2021-05-27	kbuild: clean up ${quiet} checks in shell scripts	Masahiro Yamada	3	-9/+3
	There were efforts to make 'make -s' really silent when it is a warning-free build. The conventional way was to let a shell script check ${quiet}, and if it is 'silent_', suppress the stdout by itself. With the previous commit, the 'cmd' takes care of it now. The 'cmd' is also invoked from if_changed, if_changed_dep, and if_changed_rule. You can omit ${quiet} checks in shell scripts when they are invoked from the 'cmd' macro. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-27	kbuild: sink stdout from cmd for silent build	Masahiro Yamada	1	-1/+6
	When building with 'make -s', no output to stdout should be printed. As Arnd Bergmann reported [1], mkimage shows the detailed information of the generated images. I think this should be suppressed by the 'cmd' macro instead of by individual scripts. Insert 'exec >/dev/null;' in order to redirect stdout to /dev/null for silent builds. [Note about this implementation] 'exec >/dev/null;' may look somewhat tricky, but this has a reason. Appending '>/dev/null' at the end of command line is a common way for redirection, so I first tried this: cmd = @set -e; $(echo-cmd) $(cmd_$(1)) >/dev/null ... but it would not work if $(cmd_$(1)) itself contains a redirection. For example, cmd_wrap in scripts/Makefile.asm-generic redirects the output from the 'echo' command into the target file. It would be expanded into: echo "#include <asm-generic/$.h>" > $@ >/dev/null Then, the target file gets empty because the string will go to /dev/null instead of $@. Next, I tried this: cmd = @set -e; $(echo-cmd) { $(cmd_$(1)); } >/dev/null The form above would be expanded into: { echo "#include <asm-generic/$.h>" > $@; } >/dev/null This works as expected. However, it would be a syntax error if $(cmd_$(1)) is empty. When CONFIG_TRIM_UNUSED_KSYMS is disabled, $(call cmd,gen_ksymdeps) in scripts/Makefile.build would be expanded into: set -e; { ; } >/dev/null ..., which causes an syntax error. I also tried this: cmd = @set -e; $(echo-cmd) ( $(cmd_$(1)) ) >/dev/null ... but this causes a syntax error for the same reason. So, finally I adopted: cmd = @set -e; $(echo-cmd) exec >/dev/null; $(cmd_$(1)) [1]: https://lore.kernel.org/lkml/20210514135752.2910387-1-arnd@kernel.org/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-27	init: use $(call cmd,) for generating include/generated/compile.h	Masahiro Yamada	1	-6/+6
	The 'cmd' macro shows the short log only when $(quiet) is quiet_. Do not do it manually. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-27	kbuild: merge scripts/mkmakefile to top Makefile	Masahiro Yamada	2	-19/+9
	scripts/mkmakefile is simple enough to be merged in the Makefile. Use $(call cmd,...) to show the log instead of doing it in the shell script. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-26	sh: move core-y in arch/sh/Makefile to arch/sh/Kbuild	Masahiro Yamada	2	-5/+3
	Use obj-y to clean up Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-26	hexagon: move core-y in arch/hexagon/Makefile to arch/hexagon/Kbuild	Masahiro Yamada	2	-4/+1
	Use obj-y to clean up Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-26	h8300: move core-y in arch/h8300/Makefile to arch/h8300/Kbuild	Masahiro Yamada	2	-3/+1
	Use obj-y to clean up Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-26	alpha: move core-y in arch/alpha/Makefile to arch/alpha/Kbuild	Masahiro Yamada	2	-2/+2
	Use obj-y to clean up Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-26	kbuild: require all architectures to have arch/$(SRCARCH)/Kbuild	Masahiro Yamada	24	-24/+13
	arch/$(SRCARCH)/Kbuild is useful for Makefile cleanups because you can use the obj-y syntax. Add an empty file if it is missing in arch/$(SRCARCH)/. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-24	kbuild: remove libelf checks from top Makefile	Masahiro Yamada	3	-60/+22
	I do not see a good reason why only the libelf development package must be so carefully checked. Kbuild generally does not check host tools or libraries. For example, x86_64 defconfig fails to build with no libssl development package installed. scripts/extract-cert.c:21:10: fatal error: openssl/bio.h: No such file or directory 21 \| #include <openssl/bio.h> \| ^~~~~~~~~~~~~~~ To solve the build error, you need to install libssl-dev or openssl-devel package, depending on your distribution. 'apt-file search', 'dnf provides', etc. is your frined to find a proper package to install. This commit removes all the libelf checks from the top Makefile. If libelf is missing, objtool will fail to build in a similar pattern: .../linux/tools/objtool/include/objtool/elf.h:10:10: fatal error: gelf.h: No such file or directory 10 \| #include <gelf.h> You need to install libelf-dev, libelf-devel, or elfutils-libelf-devel to proceed. Another remarkable change is, CONFIG_STACK_VALIDATION (without CONFIG_UNWINDER_ORC) previously continued to build with a warning, but now it will treat missing libelf as an error. This is just a one-time installation, so it should not hurt to break a build and make a user install the package. BTW, the traditional way to handle such checks is autotool, but according to [1], I do not expect the kernel build would have similar scripting like './configure' does. [1]: https://lore.kernel.org/lkml/CA+55aFzr2HTZVOuzpHYDwmtRJLsVzE-yqg2DHpHi_9ePsYp5ug@mail.gmail.com/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org>
2021-05-24	kbuild: hide tools/ build targets from external module builds	Masahiro Yamada	1	-14/+16
	The tools/ directory only exists in the kernel source tree, not in external modules. Do not expose the meaningless targets to external modules. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-24	Makefile: extend 32B aligned debug option to 64B aligned	Feng Tang	2	-4/+4
	Commit 09c60546f04f ("./Makefile: add debug option to enable function aligned on 32 bytes") was introduced to help debugging strange kernel performance changes caused by code alignment change. Recently we found 2 similar cases [1][2] caused by code-alignment changes, which can only be identified by forcing 64 bytes aligned for all functions. Originally, 32 bytes was used mainly for not wasting too much text space, but this option is only for debug anyway where text space is not a big concern. So extend the alignment to 64 bytes to cover more similar cases. [1].https://lore.kernel.org/lkml/20210427090013.GG32408@xsang-OptiPlex-9020/ [2].https://lore.kernel.org/lkml/20210420030837.GB31773@xsang-OptiPlex-9020/ Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-05-23	Linux 5.13-rc3	Linus Torvalds	1	-1/+1

2021-05-22	userfaultfd: hugetlbfs: fix new flag usage in error path	Mike Kravetz	2	-15/+15
	In commit d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") the use of PagePrivate to indicate a reservation count should be restored at free time was changed to the hugetlb specific flag HPageRestoreReserve. Changes to a userfaultfd error path as well as a VM_BUG_ON() in remove_inode_hugepages() were overlooked. Users could see incorrect hugetlb reserve counts if they experience an error with a UFFDIO_COPY operation. Specifically, this would be the result of an unlikely copy_huge_page_from_user error. There is not an increased chance of hitting the VM_BUG_ON. Link: https://lkml.kernel.org/r/20210521233952.236434-1-mike.kravetz@oracle.com Fixes: d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasry.mina@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mina Almasry <almasrymina@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	lib: kunit: suppress a compilation warning of frame size	Zhen Lei	1	-0/+1
	lib/bitfield_kunit.c: In function `test_bitfields_constants': lib/bitfield_kunit.c:93:1: warning: the frame size of 7456 bytes is larger than 2048 bytes [-Wframe-larger-than=] } ^ As the description of BITFIELD_KUNIT in lib/Kconfig.debug, it "Only useful for kernel devs running the KUnit test harness, and not intended for inclusion into a production build". Therefore, it is not worth modifying variable 'test_bitfields_constants' to clear this warning. Just suppress it. Link: https://lkml.kernel.org/r/20210518094533.7652-1-thunder.leizhen@huawei.com Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Vitor Massaru Iha <vitor@massaru.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	proc: remove Alexey from MAINTAINERS	Alexey Dobriyan	1	-1/+0
	People Cc me and I don't have time. Link: https://lkml.kernel.org/r/YKarMxHJBIhMHQIh@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	linux/bits.h: fix compilation error with GENMASK	Rikard Falkeborn	5	-10/+20
	GENMASK() has an input check which uses __builtin_choose_expr() to enable a compile time sanity check of its inputs if they are known at compile time. However, it turns out that __builtin_constant_p() does not always return a compile time constant [0]. It was thought this problem was fixed with gcc 4.9 [1], but apparently this is not the case [2]. Switch to use __is_constexpr() instead which always returns a compile time constant, regardless of its inputs. Link: https://lore.kernel.org/lkml/42b4342b-aefc-a16a-0d43-9f9c0d63ba7a@rasmusvillemoes.dk [0] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19449 [1] Link: https://lore.kernel.org/lkml/1ac7bbc2-45d9-26ed-0b33-bf382b8d858b@I-love.SAKURA.ne.jp [2] Link: https://lkml.kernel.org/r/20210511203716.117010-1-rikard.falkeborn@gmail.com Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Yury Norov <yury.norov@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	watchdog: reliable handling of timestamps	Petr Mladek	1	-14/+20
	Commit 9bf3bc949f8a ("watchdog: cleanup handling of false positives") tried to handle a virtual host stopped by the host a more straightforward and cleaner way. But it introduced a risk of false softlockup reports. The virtual host might be stopped at any time, for example between kvm_check_and_clear_guest_paused() and is_softlockup(). As a result, is_softlockup() might read the updated jiffies and detects a softlockup. A solution might be to put back kvm_check_and_clear_guest_paused() after is_softlockup() and detect it. But it would put back the cycle that complicates the logic. In fact, the handling of all the timestamps is not reliable. The code does not guarantee when and how many times the timestamps are read. For example, "period_ts" might be touched anytime also from NMI and re-read in is_softlockup(). It works just by chance. Fix all the problems by making the code even more explicit. 1. Make sure that "now" and "period_ts" timestamps are read only once. They might be changed at anytime by NMI or when the virtual guest is stopped by the host. Note that "now" timestamp does this implicitly because "jiffies" is marked volatile. 2. "now" time must be read first. The state of "period_ts" will decide whether it will be used or the period will get restarted. 3. kvm_check_and_clear_guest_paused() must be called before reading "period_ts". It touches the variable when the guest was stopped. As a result, "now" timestamp is used only when the watchdog was not touched and the guest not stopped in the meantime. "period_ts" is restarted in all other situations. Link: https://lkml.kernel.org/r/YKT55gw+RZfyoFf7@alley Fixes: 9bf3bc949f8aeefeacea4b ("watchdog: cleanup handling of false positives") Signed-off-by: Petr Mladek <pmladek@suse.com> Reported-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	kasan: slab: always reset the tag in get_freepointer_safe()	Alexander Potapenko	1	-0/+1
	With CONFIG_DEBUG_PAGEALLOC enabled, the kernel should also untag the object pointer, as done in get_freepointer(). Failing to do so reportedly leads to SLUB freelist corruptions that manifest as boot-time crashes. Link: https://lkml.kernel.org/r/20210514072228.534418-1-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Elliot Berman <eberman@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	tools/testing/selftests/exec: fix link error	Yang Yingliang	1	-3/+3
	Fix the link error by adding '-static': gcc -Wall -Wl,-z,max-page-size=0x1000 -pie load_address.c -o /home/yang/linux/tools/testing/selftests/exec/load_address_4096 /usr/bin/ld: /tmp/ccopEGun.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against symbol `stderr@@GLIBC_2.17' which may bind externally can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: /tmp/ccopEGun.o(.text+0x158): unresolvable R_AARCH64_ADR_PREL_PG_HI21 relocation against symbol `stderr@@GLIBC_2.17' /usr/bin/ld: final link failed: bad value collect2: error: ld returned 1 exit status make: *** [Makefile:25: tools/testing/selftests/exec/load_address_4096] Error 1 Link: https://lkml.kernel.org/r/20210514092422.2367367-1-yangyingliang@huawei.com Fixes: 206e22f01941 ("tools/testing/selftests: add self-test for verifying load alignment") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Cc: Chris Kennelly <ckennelly@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry	Varad Gautam	3	-6/+12
	do_mq_timedreceive calls wq_sleep with a stack local address. The sender (do_mq_timedsend) uses this address to later call pipelined_send. This leads to a very hard to trigger race where a do_mq_timedreceive call might return and leave do_mq_timedsend to rely on an invalid address, causing the following crash: RIP: 0010:wake_q_add_safe+0x13/0x60 Call Trace: __x64_sys_mq_timedsend+0x2a9/0x490 do_syscall_64+0x80/0x680 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f5928e40343 The race occurs as: 1. do_mq_timedreceive calls wq_sleep with the address of `struct ext_wait_queue` on function stack (aliased as `ewq_addr` here) - it holds a valid `struct ext_wait_queue ` as long as the stack has not been overwritten. 2. `ewq_addr` gets added to info->e_wait_q[RECV].list in wq_add, and do_mq_timedsend receives it via wq_get_first_waiter(info, RECV) to call __pipelined_op. 3. Sender calls __pipelined_op::smp_store_release(&this->state, STATE_READY). Here is where the race window begins. (`this` is `ewq_addr`.) 4. If the receiver wakes up now in do_mq_timedreceive::wq_sleep, it will see `state == STATE_READY` and break. 5. do_mq_timedreceive returns, and `ewq_addr` is no longer guaranteed to be a `struct ext_wait_queue ` since it was on do_mq_timedreceive's stack. (Although the address may not get overwritten until another function happens to touch it, which means it can persist around for an indefinite time.) 6. do_mq_timedsend::__pipelined_op() still believes `ewq_addr` is a `struct ext_wait_queue *`, and uses it to find a task_struct to pass to the wake_q_add_safe call. In the lucky case where nothing has overwritten `ewq_addr` yet, `ewq_addr->task` is the right task_struct. In the unlucky case, __pipelined_op::wake_q_add_safe gets handed a bogus address as the receiver's task_struct causing the crash. do_mq_timedsend::__pipelined_op() should not dereference `this` after setting STATE_READY, as the receiver counterpart is now free to return. Change __pipelined_op to call wake_q_add_safe on the receiver's task_struct returned by get_task_struct, instead of dereferencing `this` which sits on the receiver's stack. As Manfred pointed out, the race potentially also exists in ipc/msg.c::expunge_all and ipc/sem.c::wake_up_sem_queue_prepare. Fix those in the same way. Link: https://lkml.kernel.org/r/20210510102950.12551-1-varad.gautam@suse.com Fixes: c5b2cbdbdac563 ("ipc/mqueue.c: update/document memory barriers") Fixes: 8116b54e7e23ef ("ipc/sem.c: document and update memory barriers") Fixes: 0d97a82ba830d8 ("ipc/msg.c: update and document memory barriers") Signed-off-by: Varad Gautam <varad.gautam@suse.com> Reported-by: Matthias von Faber <matthias.vonfaber@aox-tech.de> Acked-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Manfred Spraul <manfred@colorfullife.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	Revert "mm/gup: check page posion status for coredump."	Michal Hocko	2	-24/+0
	While reviewing [1] I came across commit d3378e86d182 ("mm/gup: check page posion status for coredump.") and noticed that this patch is broken in two ways. First it doesn't really prevent hwpoison pages from being dumped because hwpoison pages can be marked asynchornously at any time after the check. Secondly, and more importantly, the patch introduces a ref count leak because get_dump_page takes a reference on the page which is not released. It also seems that the patch was merged incorrectly because there were follow up changes not included as well as discussions on how to address the underlying problem [2] Therefore revert the original patch. Link: http://lkml.kernel.org/r/20210429122519.15183-4-david@redhat.com [1] Link: http://lkml.kernel.org/r/57ac524c-b49a-99ec-c1e4-ef5027bfb61b@redhat.com [2] Link: https://lkml.kernel.org/r/20210505135407.31590-1-mhocko@kernel.org Fixes: d3378e86d182 ("mm/gup: check page posion status for coredump.") Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-22	mm/shuffle: fix section mismatch warning	Arnd Bergmann	1	-2/+2
	clang sometimes decides not to inline shuffle_zone(), but it calls a __meminit function. Without the extra __meminit annotation we get this warning: WARNING: modpost: vmlinux.o(.text+0x2a86d4): Section mismatch in reference from the function shuffle_zone() to the function .meminit.text:__shuffle_zone() The function shuffle_zone() references the function __meminit __shuffle_zone(). This is often because shuffle_zone lacks a __meminit annotation or the annotation of __shuffle_zone is wrong. shuffle_free_memory() did not show the same problem in my tests, but it could happen in theory as well, so mark both as __meminit. Link: https://lkml.kernel.org/r/20210514135952.2928094-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-21	xen-pciback: reconfigure also from backend watch handler	Jan Beulich	1	-5/+17
	When multiple PCI devices get assigned to a guest right at boot, libxl incrementally populates the backend tree. The writes for the first of the devices trigger the backend watch. In turn xen_pcibk_setup_backend() will set the XenBus state to Initialised, at which point no further reconfigures would happen unless a device got hotplugged. Arrange for reconfigure to also get triggered from the backend watch handler. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/2337cbd6-94b9-4187-9862-c03ea12e0c61@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-21	xen-pciback: redo VF placement in the virtual topology	Jan Beulich	1	-6/+8
	The commit referenced below was incomplete: It merely affected what would get written to the vdev-<N> xenstore node. The guest would still find the function at the original function number as long as __xen_pcibk_get_pci_dev() wouldn't be in sync. The same goes for AER wrt __xen_pcibk_get_pcifront_dev(). Undo overriding the function to zero and instead make sure that VFs at function zero remain alone in their slot. This has the added benefit of improving overall capacity, considering that there's only a total of 32 slots available right now (PCI segment and bus can both only ever be zero at present). Fixes: 8a5248fe10b1 ("xen PV passthru: assign SR-IOV virtual functions to separate virtual slots") Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/8def783b-404c-3452-196d-3f3fd4d72c9e@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-21	x86/Xen: swap NX determination and GDT setup on BSP	Jan Beulich	1	-4/+4
	xen_setup_gdt(), via xen_load_gdt_boot(), wants to adjust page tables. For this to work when NX is not available, x86_configure_nx() needs to be called first. [jgross] Note that this is a revert of 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established"), which is possible now that we no longer support running as PV guest in 32-bit mode. Cc: <stable.vger.kernel.org> # 5.9 Fixes: 36104cb9012a82e73 ("x86/xen: Delay get_cpu_cap until stack canary is established") Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/12a866b0-9e89-59f7-ebeb-a2a6cec0987a@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2021-05-20	Fix KASAN identified use-after-free issue.	Rohith Surabattula	2	-9/+23
	[ 612.157429] ================================================================== [ 612.158275] BUG: KASAN: use-after-free in process_one_work+0x90/0x9b0 [ 612.158801] Read of size 8 at addr ffff88810a31ca60 by task kworker/2:9/2382 [ 612.159611] CPU: 2 PID: 2382 Comm: kworker/2:9 Tainted: G OE 5.13.0-rc2+ #98 [ 612.159623] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 [ 612.159640] Workqueue: 0x0 (deferredclose) [ 612.159669] Call Trace: [ 612.159685] dump_stack+0xbb/0x107 [ 612.159711] print_address_description.constprop.0+0x18/0x140 [ 612.159733] ? process_one_work+0x90/0x9b0 [ 612.159743] ? process_one_work+0x90/0x9b0 [ 612.159754] kasan_report.cold+0x7c/0xd8 [ 612.159778] ? lock_is_held_type+0x80/0x130 [ 612.159789] ? process_one_work+0x90/0x9b0 [ 612.159812] kasan_check_range+0x145/0x1a0 [ 612.159834] process_one_work+0x90/0x9b0 [ 612.159877] ? pwq_dec_nr_in_flight+0x110/0x110 [ 612.159914] ? spin_bug+0x90/0x90 [ 612.159967] worker_thread+0x3b6/0x6c0 [ 612.160023] ? process_one_work+0x9b0/0x9b0 [ 612.160038] kthread+0x1dc/0x200 [ 612.160051] ? kthread_create_worker_on_cpu+0xd0/0xd0 [ 612.160092] ret_from_fork+0x1f/0x30 [ 612.160399] Allocated by task 2358: [ 612.160757] kasan_save_stack+0x1b/0x40 [ 612.160768] __kasan_kmalloc+0x9b/0xd0 [ 612.160778] cifs_new_fileinfo+0xb0/0x960 [cifs] [ 612.161170] cifs_open+0xadf/0xf20 [cifs] [ 612.161421] do_dentry_open+0x2aa/0x6b0 [ 612.161432] path_openat+0xbd9/0xfa0 [ 612.161441] do_filp_open+0x11d/0x230 [ 612.161450] do_sys_openat2+0x115/0x240 [ 612.161460] __x64_sys_openat+0xce/0x140 When mod_delayed_work is called to modify the delay of pending work, it might return false and queue a new work when pending work is already scheduled or when try to grab pending work failed. So, Increase the reference count when new work is scheduled to avoid use-after-free. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-05-20	xfs: restore old ioctl definitions	Darrick J. Wong	1	-0/+4
	These ioctl definitions in xfs_fs.h are part of the userspace ABI and were mistakenly removed during the 5.13 merge window. Fixes: 9fefd5db08ce ("xfs: convert to fileattr") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2021-05-20	xfs: fix deadlock retry tracepoint arguments	Darrick J. Wong	1	-1/+3
	sc->ip is the inode that's being scrubbed, which means that it's not set for scrub types that don't involve inodes. If one of those scrubbers (e.g. inode btrees) returns EDEADLOCK, we'll trip over the null pointer. Fix that by reporting either the file being examined or the file that was used to call scrub. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>
2021-05-20	xfs: retry allocations when locality-based search fails	Darrick J. Wong	1	-1/+14
	If a realtime allocation fails because we can't find a sufficiently large free extent satisfying locality rules, relax the locality rules and try again. This reduces the occurrence of short writes to realtime files when the write size is large and the free space is fragmented. This was originally discovered by running generic/186 with the realtime reflink patchset and a 128k cow extent size hint, but the short write symptoms can manifest with a 128k extent size hint and no reflink, so apply the fix now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2021-05-21	powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls	Nicholas Piggin	2	-35/+52
	The scv implementation missed updating syscall return value and error value get/set functions to deal with the changed register ABI. This broke ptrace PTRACE_GET_SYSCALL_INFO as well as some kernel auditing and tracing functions. Fix. tools/testing/selftests/ptrace/get_syscall_info now passes when scv is used. Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org # v5.9+ Reported-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520111931.2597127-2-npiggin@gmail.com
2021-05-21	powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls	Nicholas Piggin	2	-9/+28
	The sc and scv 0 system calls have different ABI conventions, and ptracers need to know which system call type is being used if they want to look at the syscall registers. Document that pt_regs.trap can be used for this, and fix one in-tree user to work with scv 0 syscalls. Fixes: 7fa95f9adaee ("powerpc/64s: system call support for scv/rfscv instructions") Cc: stable@vger.kernel.org # v5.9+ Reported-by: "Dmitry V. Levin" <ldv@altlinux.org> Suggested-by: "Dmitry V. Levin" <ldv@altlinux.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520111931.2597127-1-npiggin@gmail.com
2021-05-20	block: fix a race between del_gendisk and BLKRRPART	Gulam Mohamed	1	-0/+3
	When BLKRRPART is called concurrently with del_gendisk, the partitions rescan can create a stale partition that will never be be cleaned up. Fix this by checking the the disk is up before rescanning partitions while under bd_mutex. Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210514131842.1600568-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-20	block: prevent block device lookups at the beginning of del_gendisk	Christoph Hellwig	3	-22/+6
	As an artifact of how gendisk lookup used to work in earlier kernels, GENHD_FL_UP is only cleared very late in del_gendisk, and a global lock is used to prevent opens from succeeding while del_gendisk is tearing down the gendisk. Switch to clearing the flag early and under bd_mutex so that callers can use bd_mutex to stabilize the flag, which removes the need for the global mutex. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210514131842.1600568-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-20	btrfs: zoned: fix parallel compressed writes	Johannes Thumshirn	1	-4/+38
	When multiple processes write data to the same block group on a compressed zoned filesystem, the underlying device could report I/O errors and data corruption is possible. This happens because on a zoned file system, compressed data writes where sent to the device via a REQ_OP_WRITE instead of a REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel submission it cannot be guaranteed that the data is always submitted aligned to the underlying zone's write pointer. The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a zoned filesystem is non intrusive on a regular file system or when submitting to a conventional zone on a zoned filesystem, as it is guarded by btrfs_use_zone_append. Reported-by: David Sterba <dsterba@suse.com> Fixes: 9d294a685fbc ("btrfs: zoned: enable to mount ZONED incompat flag") CC: stable@vger.kernel.org # 5.12.x: e380adfc213a13: btrfs: zoned: pass start block to btrfs_use_zone_append CC: stable@vger.kernel.org # 5.12.x Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-20	btrfs: zoned: pass start block to btrfs_use_zone_append	Johannes Thumshirn	4	-7/+6
	btrfs_use_zone_append only needs the passed in extent_map's block_start member, so there's no need to pass in the full extent map. This also enables the use of btrfs_use_zone_append in places where we only have a start byte but no extent_map. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-20	io_uring: fortify tctx/io_wq cleanup	Pavel Begunkov	1	-4/+4
	We don't want anyone poking into tctx->io_wq awhile it's being destroyed by io_wq_put_and_exit(), and even though it shouldn't even happen, if buggy would be preferable to get a NULL-deref instead of subtle delayed failure or UAF. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/827b021de17926fd807610b3e53a5a5fa8530856.1621513214.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-05-20	platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet	Hans de Goede	1	-0/+35
	Add touchscreen info for the Chuwi Hi10 Pro (CWI529) tablet. This includes info for getting the firmware directly from the UEFI, so that the user does not need to manually install the firmware in /lib/firmware/silead. This change will make the touchscreen on these devices work OOTB, without requiring any manual setup. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210520093228.7439-1-hdegoede@redhat.com
2021-05-20	dma-buf: fix unintended pin/unpin warnings	Christian König	1	-5/+5
	DMA-buf internal users call the pin/unpin functions without having a dynamic attachment. Avoid the warning and backtrace in the logs. Signed-off-by: Christian König <christian.koenig@amd.com> Bugs: https://gitlab.freedesktop.org/drm/intel/-/issues/3481 Fixes: c545781e1c55 ("dma-buf: doc polish for pin/unpin") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> CC: stable@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20210517115705.2141-1-christian.koenig@amd.com
2021-05-20	powerpc: Fix early setup to make early_ioremap() work	Alexey Kardashevskiy	1	-2/+2
	The immediate problem is that after commit 0bd3f9e953bd ("powerpc/legacy_serial: Use early_ioremap()") the kernel silently reboots on some systems. The reason is that early_ioremap() returns broken addresses as it uses slot_virt[] array which initialized with offsets from FIXADDR_TOP == IOREMAP_END+FIXADDR_SIZE == KERN_IO_END - FIXADDR_SIZ + FIXADDR_SIZE == __kernel_io_end which is 0 when early_ioremap_setup() is called. __kernel_io_end is initialized little bit later in early_init_mmu(). This fixes the initialization by swapping early_ioremap_setup() and early_init_mmu(). Fixes: 265c3491c4bc ("powerpc: Add support for GENERIC_EARLY_IOREMAP") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Drop unrelated cleanup & cleanup change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210520032919.358935-1-aik@ozlabs.ru