aboutsummaryrefslogtreecommitdiffstats
path: root/init/Kconfig (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-09-07kbuild: Only default to -Werror if COMPILE_TESTMarco Elver1-1/+1
The cross-product of the kernel's supported toolchains, architectures, and configuration options is large. So large, that it's generally accepted to be infeasible to enumerate and build+test them all (many compile-testers rely on randomly generated configs). Without the possibility to enumerate all possible combinations of toolchains, architectures, and configuration options, it is inevitable that compiler warnings in this space exist. With -Werror, this means that an innumerable set of kernels are now broken, yet had been perfectly usable before (confused compilers, code with warnings unused, or luck). Distributors will necessarily pick a point in the toolchain X arch X config space, and if unlucky, will have a broken build. Granted, those will likely disable CONFIG_WERROR and move on. The kernel's default configuration is unlikely to be suitable for all users, but it's inappropriate to force many users to set CONFIG_WERROR=n. This also holds for CI systems which are focused on runtime testing, where the odd warning in some subsystem will disrupt testing of the rest of the kernel. Many of those runtime-focused CI systems run tests or fuzz the kernel using runtime debugging tools. Runtime testing of different subsystems can proceed in parallel, and potentially uncover serious bugs; halting runtime testing of the entire kernel because of the odd warning (now error) in a subsystem or driver is simply inappropriate. Therefore, runtime-focused CI systems will likely choose CONFIG_WERROR=n as well. The appropriate usecase for -Werror is therefore compile-test focused builds (often done by developers or CI systems). Reflect this in the Kconfig option by making the default value of WERROR match COMPILE_TEST. Signed-off-by: Marco Elver <elver@google.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviwed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-05Enable '-Werror' by default for all kernel buildsLinus Torvalds1-0/+14
... but make it a config option so that broken environments can disable it when required. We really should always have a clean build, and will disable specific over-eager warnings as required, if we can't fix them. But while I fairly religiously enforce that in my own tree, it doesn't get enforced by various build robots that don't necessarily report warnings. So this just makes '-Werror' a default compiler flag, but allows people to disable it for their configuration if they have some particular issues. Occasionally, new compiler versions end up enabling new warnings, and it can take a while before we have them fixed (or the warnings disabled if that is what it takes), so the config option allows for that situation. Hopefully this will mean that I get fewer pull requests that have new warnings that were not noticed by various automation we have in place. Knock wood. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-01Merge tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linuxLinus Torvalds1-5/+14
Pull printk updates from Petr Mladek: - Optionally, provide an index of possible printk messages via <debugfs>/printk/index/. It can be used when monitoring important kernel messages on a farm of various hosts. The monitor has to be updated when some messages has changed or are not longer available by a newly deployed kernel. - Add printk.console_no_auto_verbose boot parameter. It allows to generate crash dump even with slow consoles in a reasonable time frame. - Remove printk_safe buffers. The messages are always stored directly to the main logbuffer, even in NMI or recursive context. Also it allows to serialize syslog operations by a mutex instead of a spin lock. - Misc clean up and build fixes. * tag 'printk-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/index: Fix -Wunused-function warning lib/nmi_backtrace: Serialize even messages about idle CPUs printk: Add printk.console_no_auto_verbose boot parameter printk: Remove console_silent() lib/test_scanf: Handle n_bits == 0 in random tests printk: syslog: close window between wait and read printk: convert @syslog_lock to mutex printk: remove NMI tracking printk: remove safe buffers printk: track/limit recursion lib/nmi_backtrace: explicitly serialize banner and regs printk: Move the printk() kerneldoc comment to its new home printk/index: Fix warning about missing prototypes MIPS/asm/printk: Fix build failure caused by printk printk: index: Add indexing support to dev_printk printk: Userspace format indexing support printk: Rework parse_prefix into printk_parse_prefix printk: Straighten out log_flags into printk_info_flags string_helpers: Escape double quotes in escape_special printk/console: Check consistent sequence number when handling race in console_unlock()
2021-08-30Merge branch 'rework/printk_safe-removal' into for-linusPetr Mladek1-5/+0
2021-07-26printk: remove NMI trackingJohn Ogness1-5/+0
All NMI contexts are handled the same as the safe context: store the message and defer printing. There is no need to have special NMI context tracking for this. Using in_nmi() is enough. There are several parts of the kernel that are manually calling into the printk NMI context tracking in order to cause general printk deferred printing: arch/arm/kernel/smp.c arch/powerpc/kexec/crash.c kernel/trace/trace.c For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new function pair printk_deferred_enter/exit that explicitly achieves the same objective. For ftrace, remove the printk context manipulation completely. It was added in commit 03fc7f9c99c1 ("printk/nmi: Prevent deadlock when accessing the main log buffer in NMI"). The purpose was to enforce storing messages directly into the ring buffer even in NMI context. It really should have only modified the behavior in NMI context. There is no need for a special behavior any longer. All messages are always stored directly now. The console deferring is handled transparently in vprintk(). Signed-off-by: John Ogness <john.ogness@linutronix.de> [pmladek@suse.com: Remove special handling in ftrace.c completely. Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
2021-07-19printk: Userspace format indexing supportChris Down1-0/+14
We have a number of systems industry-wide that have a subset of their functionality that works as follows: 1. Receive a message from local kmsg, serial console, or netconsole; 2. Apply a set of rules to classify the message; 3. Do something based on this classification (like scheduling a remediation for the machine), rinse, and repeat. As a couple of examples of places we have this implemented just inside Facebook, although this isn't a Facebook-specific problem, we have this inside our netconsole processing (for alarm classification), and as part of our machine health checking. We use these messages to determine fairly important metrics around production health, and it's important that we get them right. While for some kinds of issues we have counters, tracepoints, or metrics with a stable interface which can reliably indicate the issue, in order to react to production issues quickly we need to work with the interface which most kernel developers naturally use when developing: printk. Most production issues come from unexpected phenomena, and as such usually the code in question doesn't have easily usable tracepoints or other counters available for the specific problem being mitigated. We have a number of lines of monitoring defence against problems in production (host metrics, process metrics, service metrics, etc), and where it's not feasible to reliably monitor at another level, this kind of pragmatic netconsole monitoring is essential. As one would expect, monitoring using printk is rather brittle for a number of reasons -- most notably that the message might disappear entirely in a new version of the kernel, or that the message may change in some way that the regex or other classification methods start to silently fail. One factor that makes this even harder is that, under normal operation, many of these messages are never expected to be hit. For example, there may be a rare hardware bug which one wants to detect if it was to ever happen again, but its recurrence is not likely or anticipated. This precludes using something like checking whether the printk in question was printed somewhere fleetwide recently to determine whether the message in question is still present or not, since we don't anticipate that it should be printed anywhere, but still need to monitor for its future presence in the long-term. This class of issue has happened on a number of occasions, causing unhealthy machines with hardware issues to remain in production for longer than ideal. As a recent example, some monitoring around blk_update_request fell out of date and caused semi-broken machines to remain in production for longer than would be desirable. Searching through the codebase to find the message is also extremely fragile, because many of the messages are further constructed beyond their callsite (eg. btrfs_printk and other module-specific wrappers, each with their own functionality). Even if they aren't, guessing the format and formulation of the underlying message based on the aesthetics of the message emitted is not a recipe for success at scale, and our previous issues with fleetwide machine health checking demonstrate as much. This provides a solution to the issue of silently changed or deleted printks: we record pointers to all printk format strings known at compile time into a new .printk_index section, both in vmlinux and modules. At runtime, this can then be iterated by looking at <debugfs>/printk/index/<module>, which emits the following format, both readable by humans and able to be parsed by machines: $ head -1 vmlinux; shuf -n 5 vmlinux # <level[,flags]> filename:line function "format" <5> block/blk-settings.c:661 disk_stack_limits "%s: Warning: Device %s is misaligned\n" <4> kernel/trace/trace.c:8296 trace_create_file "Could not create tracefs '%s' entry\n" <6> arch/x86/kernel/hpet.c:144 _hpet_print_config "hpet: %s(%d):\n" <6> init/do_mounts.c:605 prepare_namespace "Waiting for root device %s...\n" <6> drivers/acpi/osl.c:1410 acpi_no_auto_serialize_setup "ACPI: auto-serialization disabled\n" This mitigates the majority of cases where we have a highly-specific printk which we want to match on, as we can now enumerate and check whether the format changed or the printk callsite disappeared entirely in userspace. This allows us to catch changes to printks we monitor earlier and decide what to do about it before it becomes problematic. There is no additional runtime cost for printk callers or printk itself, and the assembly generated is exactly the same. Signed-off-by: Chris Down <chris@chrisdown.name> Cc: Petr Mladek <pmladek@suse.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Jessica Yu <jeyu@kernel.org> # for module.{c,h} Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/e42070983637ac5e384f17fbdbe86d19c7b212a5.1623775748.git.chris@chrisdown.name
2021-07-17Revert "mm/slub: use stackdepot to save stack trace in objects"Linus Torvalds1-1/+0
This reverts commit 788691464c29455346dc613a3b43c2fb9e5757a4. It's not clear why, but it causes unexplained problems in entirely unrelated xfs code. The most likely explanation is some slab corruption, possibly triggered due to CONFIG_SLUB_DEBUG_ON. See [1]. It ends up having a few other problems too, like build errors on arch/arc, and Geert reporting it using much more memory on m68k [3] (it probably does so elsewhere too, but it is probably just more noticeable on m68k). The architecture issues (both build and memory use) are likely just because this change effectively force-enabled STACKDEPOT (along with a very bad default value for the stackdepot hash size). But together with the xfs issue, this all smells like "this commit was not ready" to me. Link: https://lore.kernel.org/linux-xfs/YPE3l82acwgI2OiV@infradead.org/ [1] Link: https://lore.kernel.org/lkml/202107150600.LkGNb4Vb-lkp@intel.com/ [2] Link: https://lore.kernel.org/lkml/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ [3] Reported-by: Christoph Hellwig <hch@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08mm/slub: use stackdepot to save stack trace in objectsOliver Glitta1-0/+1
Many stack traces are similar so there are many similar arrays. Stackdepot saves each unique stack only once. Replace field addrs in struct track with depot_stack_handle_t handle. Use stackdepot to save stack trace. The benefits are smaller memory overhead and possibility to aggregate per-cache statistics in the future using the stackdepot handle instead of matching stacks manually. [rdunlap@infradead.org: rename save_stack_trace()] Link: https://lkml.kernel.org/r/20210513051920.29320-1-rdunlap@infradead.org [vbabka@suse.cz: fix lockdep splat] Link: https://lkml.kernel.org/r/20210516195150.26740-1-vbabka@suse.czLink: https://lkml.kernel.org/r/20210414163434.4376-1-glittao@gmail.com Signed-off-by: Oliver Glitta <glittao@gmail.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-30Merge tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds1-0/+3
Pull clang feature updates from Kees Cook: - Add CC_HAS_NO_PROFILE_FN_ATTR in preparation for PGO support in the face of the noinstr attribute, paving the way for PGO and fixing GCOV. (Nick Desaulniers) - x86_64 LTO coverage is expanded to 32-bit x86. (Nathan Chancellor) - Small fixes to CFI. (Mark Rutland, Nathan Chancellor) * tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR compiler_attributes.h: cleanups for GCC 4.9+ compiler_attributes.h: define __no_profile, add to noinstr x86, lto: Enable Clang LTO for 32-bit as well CFI: Move function_nocfi() into compiler.h MAINTAINERS: Add Clang CFI section
2021-06-22Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTRNick Desaulniers1-0/+3
We don't want compiler instrumentation to touch noinstr functions, which are annotated with the no_profile_instrument_function function attribute. Add a Kconfig test for this and make GCOV depend on it, and in the future, PGO. If an architecture is using noinstr, it should denote that via this Kconfig value. That makes Kconfigs that depend on noinstr able to express dependencies in an architecturally agnostic way. Cc: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/lkml/YMTn9yjuemKFLbws@hirez.programming.kicks-ass.net/ Link: https://lore.kernel.org/lkml/YMcssV%2Fn5IBGv4f0@hirez.programming.kicks-ass.net/ Suggested-by: Nathan Chancellor <nathan@kernel.org> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210621231822.2848305-4-ndesaulniers@google.com
2021-05-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller1-40/+1
Daniel Borkmann says: ==================== pull-request: bpf 2021-05-11 The following pull-request contains BPF updates for your *net* tree. We've added 13 non-merge commits during the last 8 day(s) which contain a total of 21 files changed, 817 insertions(+), 382 deletions(-). The main changes are: 1) Fix multiple ringbuf bugs in particular to prevent writable mmap of read-only pages, from Andrii Nakryiko & Thadeu Lima de Souza Cascardo. 2) Fix verifier alu32 known-const subregister bound tracking for bitwise operations and/or/xor, from Daniel Borkmann. 3) Reject trampoline attachment for functions with variable arguments, and also add a deny list of other forbidden functions, from Jiri Olsa. 4) Fix nested bpf_bprintf_prepare() calls used by various helpers by switching to per-CPU buffers, from Florent Revest. 5) Fix kernel compilation with BTF debug info on ppc64 due to pahole missing TCP-CC functions like cubictcp_init, from Martin KaFai Lau. 6) Add a kconfig entry to provide an option to disallow unprivileged BPF by default, from Daniel Borkmann. 7) Fix libbpf compilation for older libelf when GELF_ST_VISIBILITY() macro is not available, from Arnaldo Carvalho de Melo. 8) Migrate test_tc_redirect to test_progs framework as prep work for upcoming skb_change_head() fix & selftest, from Jussi Maki. 9) Fix a libbpf segfault in add_dummy_ksym_var() if BTF is not present, from Ian Rogers. 10) Fix tx_only micro-benchmark in xdpsock BPF sample with proper frame size, from Magnus Karlsson. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-11bpf, kconfig: Add consolidated menu entry for bpf with core optionsDaniel Borkmann1-40/+1
Right now, all core BPF related options are scattered in different Kconfig locations mainly due to historic reasons. Moving forward, lets add a proper subsystem entry under ... General setup ---> BPF subsystem ---> ... in order to have all knobs in a single location and thus ease BPF related configuration. Networking related bits such as sockmap are out of scope for the general setup and therefore better suited to remain in net/Kconfig. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/f23f58765a4d59244ebd8037da7b6a6b2fb58446.1620765074.git.daniel@iogearbox.net
2021-05-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+12
Merge yet more updates from Andrew Morton: "This is everything else from -mm for this merge window. 90 patches. Subsystems affected by this patch series: mm (cleanups and slub), alpha, procfs, sysctl, misc, core-kernel, bitmap, lib, compat, checkpatch, epoll, isofs, nilfs2, hpfs, exit, fork, kexec, gcov, panic, delayacct, gdb, resource, selftests, async, initramfs, ipc, drivers/char, and spelling" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (90 commits) mm: fix typos in comments mm: fix typos in comments treewide: remove editor modelines and cruft ipc/sem.c: spelling fix fs: fat: fix spelling typo of values kernel/sys.c: fix typo kernel/up.c: fix typo kernel/user_namespace.c: fix typos kernel/umh.c: fix some spelling mistakes include/linux/pgtable.h: few spelling fixes mm/slab.c: fix spelling mistake "disired" -> "desired" scripts/spelling.txt: add "overflw" scripts/spelling.txt: Add "diabled" typo scripts/spelling.txt: add "overlfow" arm: print alloc free paths for address in registers mm/vmalloc: remove vwrite() mm: remove xlate_dev_kmem_ptr() drivers/char: remove /dev/kmem for good mm: fix some typos and code style problems ipc/sem.c: mundane typo fixes ...
2021-05-07modules: add CONFIG_MODPROBE_PATHRasmus Villemoes1-0/+12
Allow the developer to specifiy the initial value of the modprobe_path[] string. This can be used to set it to the empty string initially, thus effectively disabling request_module() during early boot until userspace writes a new value via the /proc/sys/kernel/modprobe interface. [1] When building a custom kernel (often for an embedded target), it's normal to build everything into the kernel that is needed for booting, and indeed the initramfs often contains no modules at all, so every such request_module() done before userspace init has mounted the real rootfs is a waste of time. This is particularly useful when combined with the previous patch, which made the initramfs unpacking asynchronous - for that to work, it had to make any usermodehelper call wait for the unpacking to finish before attempting to invoke the userspace helper. By eliminating all such (known-to-be-futile) calls of usermodehelper, the initramfs unpacking and the {device,late}_initcalls can proceed in parallel for much longer. For a relatively slow ppc board I'm working on, the two patches combined lead to 0.2s faster boot - but more importantly, the fact that the initramfs unpacking proceeds completely in the background while devices get probed means I get to handle the gpio watchdog in time without getting reset. [1] __request_module() already has an early -ENOENT return when modprobe_path is the empty string. Link: https://lkml.kernel.org/r/20210313212528.2956377-3-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Jessica Yu <jeyu@kernel.org> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Takashi Iwai <tiwai@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+5
Merge more updates from Andrew Morton: "The remainder of the main mm/ queue. 143 patches. Subsystems affected by this patch series (all mm): pagecache, hugetlb, userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap, kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and kfence" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits) kfence: use power-efficient work queue to run delayed work kfence: maximize allocation wait timeout duration kfence: await for allocation using wait_event kfence: zero guard page after out-of-bounds access mm/process_vm_access.c: remove duplicate include mm/mempool: minor coding style tweaks mm/highmem.c: fix coding style issue btrfs: use memzero_page() instead of open coded kmap pattern iov_iter: lift memzero_page() to highmem.h mm/zsmalloc: use BUG_ON instead of if condition followed by BUG. mm/zswap.c: switch from strlcpy to strscpy arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE mm,memory_hotplug: add kernel boot option to enable memmap_on_memory acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported mm,memory_hotplug: allocate memmap from the added memory range mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count() mm,memory_hotplug: relax fully spanned sections check drivers/base/memory: introduce memory_block_{online,offline} mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove ...
2021-05-05userfaultfd: add minor fault registration modeAxel Rasmussen1-0/+5
Patch series "userfaultfd: add minor fault handling", v9. Overview ======== This series adds a new userfaultfd feature, UFFD_FEATURE_MINOR_HUGETLBFS. When enabled (via the UFFDIO_API ioctl), this feature means that any hugetlbfs VMAs registered with UFFDIO_REGISTER_MODE_MISSING will *also* get events for "minor" faults. By "minor" fault, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s) (shared memory). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. We also add a new ioctl to resolve such faults: UFFDIO_CONTINUE. The idea is, userspace resolves the fault by either a) doing nothing if the contents are already correct, or b) updating the underlying contents using the second, non-UFFD mapping (via memcpy/memset or similar, or something fancier like RDMA, or etc...). In either case, userspace issues UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Use Case ======== Consider the use case of VM live migration (e.g. under QEMU/KVM): 1. While a VM is still running, we copy the contents of its memory to a target machine. The pages are populated on the target by writing to the non-UFFD mapping, using the setup described above. The VM is still running (and therefore its memory is likely changing), so this may be repeated several times, until we decide the target is "up to date enough". 2. We pause the VM on the source, and start executing on the target machine. During this gap, the VM's user(s) will *see* a pause, so it is desirable to minimize this window. 3. Between the last time any page was copied from the source to the target, and when the VM was paused, the contents of that page may have changed - and therefore the copy we have on the target machine is out of date. Although we can keep track of which pages are out of date, for VMs with large amounts of memory, it is "slow" to transfer this information to the target machine. We want to resume execution before such a transfer would complete. 4. So, the guest begins executing on the target machine. The first time it touches its memory (via the UFFD-registered mapping), userspace wants to intercept this fault. Userspace checks whether or not the page is up to date, and if not, copies the updated page from the source machine, via the non-UFFD mapping. Finally, whether a copy was performed or not, userspace issues a UFFDIO_CONTINUE ioctl to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". We don't have to do all of the final updates on-demand. The userfaultfd manager can, in the background, also copy over updated pages once it receives the map of which pages are up-to-date or not. Interaction with Existing APIs ============================== Because this is a feature, a registered VMA could potentially receive both missing and minor faults. I spent some time thinking through how the existing API interacts with the new feature: UFFDIO_CONTINUE cannot be used to resolve non-minor faults, as it does not allocate a new page. If UFFDIO_CONTINUE is used on a non-minor fault: - For non-shared memory or shmem, -EINVAL is returned. - For hugetlb, -EFAULT is returned. UFFDIO_COPY and UFFDIO_ZEROPAGE cannot be used to resolve minor faults. Without modifications, the existing codepath assumes a new page needs to be allocated. This is okay, since userspace must have a second non-UFFD-registered mapping anyway, thus there isn't much reason to want to use these in any case (just memcpy or memset or similar). - If UFFDIO_COPY is used on a minor fault, -EEXIST is returned. - If UFFDIO_ZEROPAGE is used on a minor fault, -EEXIST is returned (or -EINVAL in the case of hugetlb, as UFFDIO_ZEROPAGE is unsupported in any case). - UFFDIO_WRITEPROTECT simply doesn't work with shared memory, and returns -ENOENT in that case (regardless of the kind of fault). Future Work =========== This series only supports hugetlbfs. I have a second series in flight to support shmem as well, extending the functionality. This series is more mature than the shmem support at this point, and the functionality works fully on hugetlbfs, so this series can be merged first and then shmem support will follow. This patch (of 6): This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. This is implemented as a new registration mode, instead of an API feature. This is because the alternative implementation has significant drawbacks [1]. However, doing it this was requires we allocate a VM_* flag for the new registration mode. On 32-bit systems, there are no unused bits, so this feature is only supported on architectures with CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in MINOR mode on 32-bit architectures, we return -EINVAL. [1] https://lore.kernel.org/patchwork/patch/1380226/ [peterx@redhat.com: fix minor fault page leak] Link: https://lkml.kernel.org/r/20210322175132.36659-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210301222728.176417-1-axelrasmussen@google.com Link: https://lkml.kernel.org/r/20210301222728.176417-2-axelrasmussen@google.com Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Xu <peterx@redhat.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Oliver Upton <oupton@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-01Merge tag 'integrity-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrityLinus Torvalds1-3/+3
Pull IMA updates from Mimi Zohar: "In addition to loading the kernel module signing key onto the builtin keyring, load it onto the IMA keyring as well. Also six trivial changes and bug fixes" * tag 'integrity-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: ensure IMA_APPRAISE_MODSIG has necessary dependencies ima: Fix fall-through warnings for Clang integrity: Add declarations to init_once void arguments. ima: Fix function name error in comment. ima: enable loading of build time generated key on .ima keyring ima: enable signing of modules with build time generated key keys: cleanup build time module signing keys ima: Fix the error code for restoring the PCR value ima: without an IMA policy loaded, return quickly
2021-04-29Merge tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-11/+1
Pull Kconfig updates from Masahiro Yamada: - Change 'option defconfig' to the environment variable KCONFIG_DEFCONFIG_LIST - Refactor tinyconfig without using allnoconfig_y - Remove 'option allnoconfig_y' syntax - Change 'option modules' to 'modules' - Do not use /boot/config-* etc. as base config for cross-compilation - Fix a search bug in nconf - Various code cleanups * tag 'kconfig-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kconfig: refactor .gitignore kconfig: highlight xconfig 'comment' lines with '***' kconfig: highlight gconfig 'comment' lines with '***' kconfig: gconf: remove unused code kconfig: remove unused PACKAGE definition kconfig: nconf: stop endless search loops kconfig: split menu.c out of parser.y kconfig: nconf: refactor in print_in_middle() kconfig: nconf: remove meaningless wattrset() call from show_menu() kconfig: nconf: change set_config_filename() to void function kconfig: nconf: refactor attributes setup code kconfig: nconf: remove unneeded default for menu prompt kconfig: nconf: get rid of (void) casts from wattrset() calls kconfig: nconf: fix NORMAL attributes kconfig: mconf,nconf: remove unneeded '\0' termination after snprintf() kconfig: use /boot/config-* etc. as DEFCONFIG_LIST only for native build kconfig: change sym_change_count to a boolean flag kconfig: nconf: fix core dump when searching in empty menu kconfig: lxdialog: A spello fix and a punctuation added kconfig: streamline_config.pl: Couple of typo fixes ...
2021-04-29Merge tag 'kbuild-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-20/+45
Pull Kbuild updates from Masahiro Yamada: - Evaluate $(call cc-option,...) etc. only for build targets - Add CONFIG_VMLINUX_MAP to generate .map file when linking vmlinux - Remove unnecessary --gcc-toolchains Clang flag because the --prefix flag finds the toolchains - Do not pass Clang's --prefix flag when using the integrated as - Check the assembler version in Kconfig time - Add new CONFIG options, AS_VERSION, AS_IS_GNU, AS_IS_LLVM to clean up some dependencies in Kconfig - Fix invalid Module.symvers creation when building only modules without vmlinux - Fix false-positive modpost warnings when CONFIG_TRIM_UNUSED_KSYMS is set, but there is no module to build - Refactor module installation Makefile - Support zstd for module compression - Convert alpha and ia64 to use generic shell scripts to generate the syscall headers - Add a new elfnote to indicate if the kernel was built with LTO, which will be used by pahole - Flatten the directory structure under include/config/ so CONFIG options and filenames match - Change the deb source package name from linux-$(KERNELRELEASE) to linux-upstream * tag 'kbuild-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (42 commits) kbuild: Add $(KBUILD_HOSTLDFLAGS) to 'has_libelf' test kbuild: deb-pkg: change the source package name to linux-upstream tools: do not include scripts/Kbuild.include kbuild: redo fake deps at include/config/*.h kbuild: remove TMPO from try-run MAINTAINERS: add pattern for dummy-tools kbuild: add an elfnote for whether vmlinux is built with lto ia64: syscalls: switch to generic syscallhdr.sh ia64: syscalls: switch to generic syscalltbl.sh alpha: syscalls: switch to generic syscallhdr.sh alpha: syscalls: switch to generic syscalltbl.sh sysctl: use min() helper for namecmp() kbuild: add support for zstd compressed modules kbuild: remove CONFIG_MODULE_COMPRESS kbuild: merge scripts/Makefile.modsign to scripts/Makefile.modinst kbuild: move module strip/compression code into scripts/Makefile.modinst kbuild: refactor scripts/Makefile.modinst kbuild: rename extmod-prefix to extmod_prefix kbuild: check module name conflict for external modules as well kbuild: show the target directory for depmod log ...
2021-04-29Merge tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds1-0/+2
Pull networking updates from Jakub Kicinski: "Core: - bpf: - allow bpf programs calling kernel functions (initially to reuse TCP congestion control implementations) - enable task local storage for tracing programs - remove the need to store per-task state in hash maps, and allow tracing programs access to task local storage previously added for BPF_LSM - add bpf_for_each_map_elem() helper, allowing programs to walk all map elements in a more robust and easier to verify fashion - sockmap: support UDP and cross-protocol BPF_SK_SKB_VERDICT redirection - lpm: add support for batched ops in LPM trie - add BTF_KIND_FLOAT support - mostly to allow use of BTF on s390 which has floats in its headers files - improve BPF syscall documentation and extend the use of kdoc parsing scripts we already employ for bpf-helpers - libbpf, bpftool: support static linking of BPF ELF files - improve support for encapsulation of L2 packets - xdp: restructure redirect actions to avoid a runtime lookup, improving performance by 4-8% in microbenchmarks - xsk: build skb by page (aka generic zerocopy xmit) - improve performance of software AF_XDP path by 33% for devices which don't need headers in the linear skb part (e.g. virtio) - nexthop: resilient next-hop groups - improve path stability on next-hops group changes (incl. offload for mlxsw) - ipv6: segment routing: add support for IPv4 decapsulation - icmp: add support for RFC 8335 extended PROBE messages - inet: use bigger hash table for IP ID generation - tcp: deal better with delayed TX completions - make sure we don't give up on fast TCP retransmissions only because driver is slow in reporting that it completed transmitting the original - tcp: reorder tcp_congestion_ops for better cache locality - mptcp: - add sockopt support for common TCP options - add support for common TCP msg flags - include multiple address ids in RM_ADDR - add reset option support for resetting one subflow - udp: GRO L4 improvements - improve 'forward' / 'frag_list' co-existence with UDP tunnel GRO, allowing the first to take place correctly even for encapsulated UDP traffic - micro-optimize dev_gro_receive() and flow dissection, avoid retpoline overhead on VLAN and TEB GRO - use less memory for sysctls, add a new sysctl type, to allow using u8 instead of "int" and "long" and shrink networking sysctls - veth: allow GRO without XDP - this allows aggregating UDP packets before handing them off to routing, bridge, OvS, etc. - allow specifing ifindex when device is moved to another namespace - netfilter: - nft_socket: add support for cgroupsv2 - nftables: add catch-all set element - special element used to define a default action in case normal lookup missed - use net_generic infra in many modules to avoid allocating per-ns memory unnecessarily - xps: improve the xps handling to avoid potential out-of-bound accesses and use-after-free when XPS change race with other re-configuration under traffic - add a config knob to turn off per-cpu netdev refcnt to catch underflows in testing Device APIs: - add WWAN subsystem to organize the WWAN interfaces better and hopefully start driving towards more unified and vendor- independent APIs - ethtool: - add interface for reading IEEE MIB stats (incl. mlx5 and bnxt support) - allow network drivers to dump arbitrary SFP EEPROM data, current offset+length API was a poor fit for modern SFP which define EEPROM in terms of pages (incl. mlx5 support) - act_police, flow_offload: add support for packet-per-second policing (incl. offload for nfp) - psample: add additional metadata attributes like transit delay for packets sampled from switch HW (and corresponding egress and policy-based sampling in the mlxsw driver) - dsa: improve support for sandwiched LAGs with bridge and DSA - netfilter: - flowtable: use direct xmit in topologies with IP forwarding, bridging, vlans etc. - nftables: counter hardware offload support - Bluetooth: - improvements for firmware download w/ Intel devices - add support for reading AOSP vendor capabilities - add support for virtio transport driver - mac80211: - allow concurrent monitor iface and ethernet rx decap - set priority and queue mapping for injected frames - phy: add support for Clause-45 PHY Loopback - pci/iov: add sysfs MSI-X vector assignment interface to distribute MSI-X resources to VFs (incl. mlx5 support) New hardware/drivers: - dsa: mv88e6xxx: add support for Marvell mv88e6393x - 11-port Ethernet switch with 8x 1-Gigabit Ethernet and 3x 10-Gigabit interfaces. - dsa: support for legacy Broadcom tags used on BCM5325, BCM5365 and BCM63xx switches - Microchip KSZ8863 and KSZ8873; 3x 10/100Mbps Ethernet switches - ath11k: support for QCN9074 a 802.11ax device - Bluetooth: Broadcom BCM4330 and BMC4334 - phy: Marvell 88X2222 transceiver support - mdio: add BCM6368 MDIO mux bus controller - r8152: support RTL8153 and RTL8156 (USB Ethernet) chips - mana: driver for Microsoft Azure Network Adapter (MANA) - Actions Semi Owl Ethernet MAC - can: driver for ETAS ES58X CAN/USB interfaces Pure driver changes: - add XDP support to: enetc, igc, stmmac - add AF_XDP support to: stmmac - virtio: - page_to_skb() use build_skb when there's sufficient tailroom (21% improvement for 1000B UDP frames) - support XDP even without dedicated Tx queues - share the Tx queues with the stack when necessary - mlx5: - flow rules: add support for mirroring with conntrack, matching on ICMP, GTP, flex filters and more - support packet sampling with flow offloads - persist uplink representor netdev across eswitch mode changes - allow coexistence of CQE compression and HW time-stamping - add ethtool extended link error state reporting - ice, iavf: support flow filters, UDP Segmentation Offload - dpaa2-switch: - move the driver out of staging - add spanning tree (STP) support - add rx copybreak support - add tc flower hardware offload on ingress traffic - ionic: - implement Rx page reuse - support HW PTP time-stamping - octeon: support TC hardware offloads - flower matching on ingress and egress ratelimitting. - stmmac: - add RX frame steering based on VLAN priority in tc flower - support frame preemption (FPE) - intel: add cross time-stamping freq difference adjustment - ocelot: - support forwarding of MRP frames in HW - support multiple bridges - support PTP Sync one-step timestamping - dsa: mv88e6xxx, dpaa2-switch: offload bridge port flags like learning, flooding etc. - ipa: add IPA v4.5, v4.9 and v4.11 support (Qualcomm SDX55, SM8350, SC7280 SoCs) - mt7601u: enable TDLS support - mt76: - add support for 802.3 rx frames (mt7915/mt7615) - mt7915 flash pre-calibration support - mt7921/mt7663 runtime power management fixes" * tag 'net-next-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2451 commits) net: selftest: fix build issue if INET is disabled net: netrom: nr_in: Remove redundant assignment to ns net: tun: Remove redundant assignment to ret net: phy: marvell: add downshift support for M88E1240 net: dsa: ksz: Make reg_mib_cnt a u8 as it never exceeds 255 net/sched: act_ct: Remove redundant ct get and check icmp: standardize naming of RFC 8335 PROBE constants bpf, selftests: Update array map tests for per-cpu batched ops bpf: Add batched ops support for percpu array bpf: Implement formatted output helpers with bstr_printf seq_file: Add a seq_bprintf function sfc: adjust efx->xdp_tx_queue_count with the real number of initialized queues net:nfc:digital: Fix a double free in digital_tg_recv_dep_req net: fix a concurrency bug in l2tp_tunnel_register() net/smc: Remove redundant assignment to rc mpls: Remove redundant assignment to err llc2: Remove redundant assignment to rc net/tls: Remove redundant initialization of record rds: Remove redundant assignment to nr_sig dt-bindings: net: mdio-gpio: add compatible for microchip,mdio-smi0 ...
2021-04-27Merge branch 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds1-0/+14
Pull cgroup changes from Tejun Heo: "The only notable change is Vipin's new misc cgroup controller. This implements generic support for resources which can be controlled by simply counting and limiting the number of resource instances - ie there's X number of these on the system and this cgroup subtree can have upto Y of those. The first user is the address space IDs used for virtual machine memory encryption and expected future usages are similar - niche hardware features with concrete resource limits and simple usage models" * 'for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: use tsk->in_iowait instead of delayacct_is_task_waiting_on_io() cgroup/cpuset: fix typos in comments cgroup: misc: mark dummy misc_cg_res_total_usage() static inline svm/sev: Register SEV and SEV-ES ASIDs to the misc controller cgroup: Miscellaneous cgroup documentation. cgroup: Add misc cgroup controller
2021-04-27bpf: Implement formatted output helpers with bstr_printfFlorent Revest1-0/+1
BPF has three formatted output helpers: bpf_trace_printk, bpf_seq_printf and bpf_snprintf. Their signatures specify that all arguments are provided from the BPF world as u64s (in an array or as registers). All of these helpers are currently implemented by calling functions such as snprintf() whose signatures take a variable number of arguments, then placed in a va_list by the compiler to call vsnprintf(). "d9c9e4db bpf: Factorize bpf_trace_printk and bpf_seq_printf" introduced a bpf_printf_prepare function that fills an array of u64 sanitized arguments with an array of "modifiers" which indicate what the "real" size of each argument should be (given by the format specifier). The BPF_CAST_FMT_ARG macro consumes these arrays and casts each argument to its real size. However, the C promotion rules implicitely cast them all back to u64s. Therefore, the arguments given to snprintf are u64s and the va_list constructed by the compiler will use 64 bits for each argument. On 64 bit machines, this happens to work well because 32 bit arguments in va_lists need to occupy 64 bits anyway, but on 32 bit architectures this breaks the layout of the va_list expected by the called function and mangles values. In "88a5c690b6 bpf: fix bpf_trace_printk on 32 bit archs", this problem had been solved for bpf_trace_printk only with a "horrid workaround" that emitted multiple calls to trace_printk where each call had different argument types and generated different va_list layouts. One of the call would be dynamically chosen at runtime. This was ok with the 3 arguments that bpf_trace_printk takes but bpf_seq_printf and bpf_snprintf accept up to 12 arguments. Because this approach scales code exponentially, it is not a viable option anymore. Because the promotion rules are part of the language and because the construction of a va_list is an arch-specific ABI, it's best to just avoid variadic arguments and va_lists altogether. Thankfully the kernel's snprintf() has an alternative in the form of bstr_printf() that accepts arguments in a "binary buffer representation". These binary buffers are currently created by vbin_printf and used in the tracing subsystem to split the cost of printing into two parts: a fast one that only dereferences and remembers values, and a slower one, called later, that does the pretty-printing. This patch refactors bpf_printf_prepare to construct binary buffers of arguments consumable by bstr_printf() instead of arrays of arguments and modifiers. This gets rid of BPF_CAST_FMT_ARG and greatly simplifies the bpf_printf_prepare usage but there are a few gotchas that change how bpf_printf_prepare needs to do things. Currently, bpf_printf_prepare uses a per cpu temporary buffer as a generic storage for strings and IP addresses. With this refactoring, the temporary buffers now holds all the arguments in a structured binary format. To comply with the format expected by bstr_printf, certain format specifiers also need to be pre-formatted: %pB and %pi6/%pi4/%pI4/%pI6. Because vsnprintf subroutines for these specifiers are hard to expose, we pre-format these arguments with calls to snprintf(). Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Florent Revest <revest@chromium.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210427174313.860948-3-revest@chromium.org
2021-04-25kbuild: redo fake deps at include/config/*.hAlexey Dobriyan1-1/+1
Make include/config/foo/bar.h fake deps files generation simpler. * delete .h suffix those aren't header files, shorten filenames, * delete tolower() Linux filesystems can deal with both upper and lowercase filenames very well, * put everything in 1 directory Presumably 'mkdir -p' split is from dark times when filesystems handled huge directories badly, disks were round adding to seek times. x86_64 allmodconfig lists 12364 files in include/config. ../obj/include/config/ ├── 104_QUAD_8 ├── 60XX_WDT ├── 64BIT ... ├── ZSWAP_DEFAULT_ON ├── ZSWAP_ZPOOL_DEFAULT └── ZSWAP_ZPOOL_DEFAULT_ZBUD 0 directories, 12364 files Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-04-25kbuild: add support for zstd compressed modulesPiotr Gorski1-1/+7
kmod 28 supports modules compressed in zstd format so let's add this possibility to kernel. Signed-off-by: Piotr Gorski <lucjan.lucjanov@gmail.com> Reviewed-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-04-25kbuild: remove CONFIG_MODULE_COMPRESSMasahiro Yamada1-19/+26
CONFIG_MODULE_COMPRESS is only used to activate the choice for module compression algorithm. It will be simpler to make the choice always visible, and add CONFIG_MODULE_COMPRESS_NONE in the choice. This is more consistent with the "Kernel compression mode" and "Built-in initramfs compression mode" choices. CONFIG_KERNEL_UNCOMPRESSED and CONFIG_INITRAMFS_COMPRESSION_NONE are available to choose no compression. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
2021-04-25kbuild: check the minimum assembler version in KconfigMasahiro Yamada1-0/+12
Documentation/process/changes.rst defines the minimum assembler version (binutils version), but we have never checked it in the build time. Kbuild never invokes 'as' directly because all assembly files in the kernel tree are *.S, hence must be preprocessed. I do not expect raw assembly source files (*.s) would be added to the kernel tree. Therefore, we always use $(CC) as the assembler driver, and commit aa824e0c962b ("kbuild: remove AS variable") removed 'AS'. However, we are still interested in the version of the assembler acting behind. As usual, the --version option prints the version string. $ as --version | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1 But, we do not have $(AS). So, we can add the -Wa prefix so that $(CC) passes --version down to the backing assembler. $ gcc -Wa,--version | head -n 1 gcc: fatal error: no input files compilation terminated. OK, we need to input something to satisfy gcc. $ gcc -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1 The combination of Clang and GNU assembler works in the same way: $ clang -no-integrated-as -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1 Clang with the integrated assembler fails like this: $ clang -integrated-as -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 clang: error: unsupported argument '--version' to option 'Wa,' For the last case, checking the error message is fragile. If the proposal for -Wa,--version support [1] is accepted, this may not be even an error in the future. One easy way is to check if -integrated-as is present in the passed arguments. We did not pass -integrated-as to CLANG_FLAGS before, but we can make it explicit. Nathan pointed out -integrated-as is the default for all of the architectures/targets that the kernel cares about, but it goes along with "explicit is better than implicit" policy. [2] With all this in my mind, I implemented scripts/as-version.sh to check the assembler version in Kconfig time. $ scripts/as-version.sh gcc GNU 23501 $ scripts/as-version.sh clang -no-integrated-as GNU 23501 $ scripts/as-version.sh clang -integrated-as LLVM 0 [1]: https://github.com/ClangBuiltLinux/linux/issues/1320 [2]: https://lore.kernel.org/linux-kbuild/20210307044253.v3h47ucq6ng25iay@archlinux-ax161/ Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org>
2021-04-14kconfig: change "modules" from sub-option to first-level attributeMasahiro Yamada1-1/+1
Now "modules" is the only member of the "option" property. Remove "option", and move "modules" to the top level property. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-04-14kconfig: do not use allnoconfig_y optionMasahiro Yamada1-1/+0
allnoconfig_y is an ugly hack that sets a symbol to 'y' by allnoconfig. allnoconfig does not mean a minimal set of CONFIG options because a bunch of prompts are hidden by 'if EMBEDDED' or 'if EXPERT', but I do not like to hack Kconfig this way. Use the pre-existing feature, KCONFIG_ALLCONFIG, to provide a one liner config fragment. CONFIG_EMBEDDED=y is still forced when allnoconfig is invoked as a part of tinyconfig. No change in the .config file produced by 'make tinyconfig'. The output of 'make allnoconfig' will be changed; we will get CONFIG_EMBEDDED=n because allnoconfig literally sets all symbols to n. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-04-14kconfig: change defconfig_list option to environment variableMasahiro Yamada1-9/+0
"defconfig_list" is a weird option that defines a static symbol that declares the list of base config files in case the .config does not exist yet. This is quite different from other normal symbols; we just abused the "string" type and the "default" properties to list out the input files. They must be fixed values since these are searched for and loaded in the parse stage. It is an ugly hack, and should not exist in the first place. Providing this feature as an environment variable is a saner approach. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2021-04-09ima: enable signing of modules with build time generated keyNayna Jain1-3/+3
The kernel build process currently only signs kernel modules when MODULE_SIG is enabled. Also, sign the kernel modules at build time when IMA_APPRAISE_MODSIG is enabled. Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Acked-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2021-04-08add support for Clang CFISami Tolvanen1-1/+1
This change adds support for Clang’s forward-edge Control Flow Integrity (CFI) checking. With CONFIG_CFI_CLANG, the compiler injects a runtime check before each indirect function call to ensure the target is a valid function with the correct static type. This restricts possible call targets and makes it more difficult for an attacker to exploit bugs that allow the modification of stored function pointers. For more details, see: https://clang.llvm.org/docs/ControlFlowIntegrity.html Clang requires CONFIG_LTO_CLANG to be enabled with CFI to gain visibility to possible call targets. Kernel modules are supported with Clang’s cross-DSO CFI mode, which allows checking between independently compiled components. With CFI enabled, the compiler injects a __cfi_check() function into the kernel and each module for validating local call targets. For cross-module calls that cannot be validated locally, the compiler calls the global __cfi_slowpath_diag() function, which determines the target module and calls the correct __cfi_check() function. This patch includes a slowpath implementation that uses __module_address() to resolve call targets, and with CONFIG_CFI_CLANG_SHADOW enabled, a shadow map that speeds up module look-ups by ~3x. Clang implements indirect call checking using jump tables and offers two methods of generating them. With canonical jump tables, the compiler renames each address-taken function to <function>.cfi and points the original symbol to a jump table entry, which passes __cfi_check() validation. This isn’t compatible with stand-alone assembly code, which the compiler doesn’t instrument, and would result in indirect calls to assembly code to fail. Therefore, we default to using non-canonical jump tables instead, where the compiler generates a local jump table entry <function>.cfi_jt for each address-taken function, and replaces all references to the function with the address of the jump table entry. Note that because non-canonical jump table addresses are local to each component, they break cross-module function address equality. Specifically, the address of a global function will be different in each module, as it's replaced with the address of a local jump table entry. If this address is passed to a different module, it won’t match the address of the same function taken there. This may break code that relies on comparing addresses passed from other components. CFI checking can be disabled in a function with the __nocfi attribute. Additionally, CFI can be disabled for an entire compilation unit by filtering out CC_FLAGS_CFI. By default, CFI failures result in a kernel panic to stop a potential exploit. CONFIG_CFI_PERMISSIVE enables a permissive mode, where the kernel prints out a rate-limited warning instead, and allows execution to continue. This option is helpful for locating type mismatches, but should only be enabled during development. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20210408182843.1754385-2-samitolvanen@google.com
2021-04-04cgroup: Add misc cgroup controllerVipin Sharma1-0/+14
The Miscellaneous cgroup provides the resource limiting and tracking mechanism for the scalar resources which cannot be abstracted like the other cgroup resources. Controller is enabled by the CONFIG_CGROUP_MISC config option. A resource can be added to the controller via enum misc_res_type{} in the include/linux/misc_cgroup.h file and the corresponding name via misc_res_name[] in the kernel/cgroup/misc.c file. Provider of the resource must set its capacity prior to using the resource by calling misc_cg_set_capacity(). Once a capacity is set then the resource usage can be updated using charge and uncharge APIs. All of the APIs to interact with misc controller are in include/linux/misc_cgroup.h. Miscellaneous controller provides 3 interface files. If two misc resources (res_a and res_b) are registered then: misc.capacity A read-only flat-keyed file shown only in the root cgroup. It shows miscellaneous scalar resources available on the platform along with their quantities:: $ cat misc.capacity res_a 50 res_b 10 misc.current A read-only flat-keyed file shown in the non-root cgroups. It shows the current usage of the resources in the cgroup and its children:: $ cat misc.current res_a 3 res_b 0 misc.max A read-write flat-keyed file shown in the non root cgroups. Allowed maximum usage of the resources in the cgroup and its children.:: $ cat misc.max res_a max res_b 4 Limit can be set by:: # echo res_a 1 > misc.max Limit can be set to max by:: # echo res_a max > misc.max Limits can be set more than the capacity value in the misc.capacity file. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-6/+5
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-2/+1
Merge misc fixes from Andrew Morton: "28 patches. Subsystems affected by this series: mm (memblock, pagealloc, hugetlb, highmem, kfence, oom-kill, madvise, kasan, userfaultfd, memcg, and zram), core-kernel, kconfig, fork, binfmt, MAINTAINERS, kbuild, and ia64" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (28 commits) zram: fix broken page writeback zram: fix return value on writeback_store mm/memcg: set memcg when splitting page mm/memcg: rename mem_cgroup_split_huge_fixup to split_page_memcg and add nr_pages argument ia64: fix ptrace(PTRACE_SYSCALL_INFO_EXIT) sign ia64: fix ia64_syscall_get_set_arguments() for break-based syscalls mm/userfaultfd: fix memory corruption due to writeprotect kasan: fix KASAN_STACK dependency for HW_TAGS kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC mm/madvise: replace ptrace attach requirement for process_madvise include/linux/sched/mm.h: use rcu_dereference in in_vfork() kfence: fix reports if constant function prefixes exist kfence, slab: fix cache_alloc_debugcheck_after() for bulk allocations kfence: fix printk format for ptrdiff_t linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP* MAINTAINERS: exclude uapi directories in API/ABI section binfmt_misc: fix possible deadlock in bm_register_write mm/highmem.c: fix zero_user_segments() with start > end hugetlb: do early cow when page pinned on src mm mm: use is_cow_mapping() across tree where proper ...
2021-03-13init/Kconfig: make COMPILE_TEST depend on HAS_IOMEMMasahiro Yamada1-2/+1
I read the commit log of the following two: - bc083a64b6c0 ("init/Kconfig: make COMPILE_TEST depend on !UML") - 334ef6ed06fa ("init/Kconfig: make COMPILE_TEST depend on !S390") Both are talking about HAS_IOMEM dependency missing in many drivers. So, 'depends on HAS_IOMEM' seems the direct, sensible solution to me. This does not change the behavior of UML. UML still cannot enable COMPILE_TEST because it does not provide HAS_IOMEM. The current dependency for S390 is too strong. Under the condition of CONFIG_PCI=y, S390 provides HAS_IOMEM, hence can enable COMPILE_TEST. I also removed the meaningless 'default n'. Link: https://lkml.kernel.org/r/20210224140809.1067582-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KP Singh <kpsingh@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Quentin Perret <qperret@google.com> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: "Enrico Weigelt, metux IT consult" <lkml@metux.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-11kbuild: rebuild GCC plugins when the compiler is upgradedMasahiro Yamada1-4/+4
Linus reported a build error due to the GCC plugin incompatibility when the compiler is upgraded. [1] GCC plugins are tied to a particular GCC version. So, they must be rebuilt when the compiler is upgraded. This seems to be a long-standing flaw since the initial support of GCC plugins. Extend commit 8b59cd81dc5e ("kbuild: ensure full rebuild when the compiler is updated"), so that GCC plugins are covered by the compiler upgrade detection. [1]: https://lore.kernel.org/lkml/CAHk-=wieoN5ttOy7SnsGwZv+Fni3R6m-Ut=oxih6bbZ28G+4dw@mail.gmail.com/ Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org>
2021-03-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-0/+1
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-03-09 The following pull-request contains BPF updates for your *net-next* tree. We've added 90 non-merge commits during the last 17 day(s) which contain a total of 114 files changed, 5158 insertions(+), 1288 deletions(-). The main changes are: 1) Faster bpf_redirect_map(), from Björn. 2) skmsg cleanup, from Cong. 3) Support for floating point types in BTF, from Ilya. 4) Documentation for sys_bpf commands, from Joe. 5) Support for sk_lookup in bpf_prog_test_run, form Lorenz. 6) Enable task local storage for tracing programs, from Song. 7) bpf_for_each_map_elem() helper, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-28kbuild: fix UNUSED_KSYMS_WHITELIST for Clang LTOMasahiro Yamada1-1/+0
Commit fbe078d397b4 ("kbuild: lto: add a default list of used symbols") does not work as expected if the .config file has already specified CONFIG_UNUSED_KSYMS_WHITELIST="my/own/white/list" before enabling CONFIG_LTO_CLANG. So, the user-supplied whitelist and LTO-specific white list must be independent of each other. I refactored the shell script so CONFIG_MODVERSIONS and CONFIG_CLANG_LTO handle whitelists in the same way. Fixes: fbe078d397b4 ("kbuild: lto: add a default list of used symbols") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
2021-02-26bpf: Clean up sockmap related KconfigsCong Wang1-0/+1
As suggested by John, clean up sockmap related Kconfigs: Reduce the scope of CONFIG_BPF_STREAM_PARSER down to TCP stream parser, to reflect its name. Make the rest sockmap code simply depend on CONFIG_BPF_SYSCALL and CONFIG_INET, the latter is still needed at this point because of TCP/UDP proto update. And leave CONFIG_NET_SOCK_MSG untouched, as it is used by non-sockmap cases. Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-2-xiyou.wangcong@gmail.com
2021-02-26init/Kconfig: fix a typo in CC_VERSION_TEXT help textBhaskar Chowdhury1-1/+1
s/compier/compiler/ Link: https://lkml.kernel.org/r/20210224223325.29099-1-unixbhaskar@gmail.com Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-25Merge tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-12/+18
Pull Kbuild updates from Masahiro Yamada: - Fix false-positive build warnings for ARCH=ia64 builds - Optimize dictionary size for module compression with xz - Check the compiler and linker versions in Kconfig - Fix misuse of extra-y - Support DWARF v5 debug info - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x exceeded the limit - Add generic syscall{tbl,hdr}.sh for cleanups across arches - Minor cleanups of genksyms - Minor cleanups of Kconfig * tag 'kbuild-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (38 commits) initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD kbuild: remove deprecated 'always' and 'hostprogs-y/m' kbuild: parse C= and M= before changing the working directory kbuild: reuse this-makefile to define abs_srctree kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig kconfig: omit --oldaskconfig option for 'make config' kconfig: fix 'invalid option' for help option kconfig: remove dead code in conf_askvalue() kconfig: clean up nested if-conditionals in check_conf() kconfig: Remove duplicate call to sym_get_string_value() Makefile: Remove # characters from compiler string Makefile: reuse CC_VERSION_TEXT kbuild: check the minimum linker version in Kconfig kbuild: remove ld-version macro scripts: add generic syscallhdr.sh scripts: add generic syscalltbl.sh arch: syscalls: remove $(srctree)/ prefix from syscall tables arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work gen_compile_commands: prune some directories kbuild: simplify access to the kernel's version ...
2021-02-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-14/+0
Merge misc updates from Andrew Morton: "A few small subsystems and some of MM. 172 patches. Subsystems affected by this patch series: hexagon, scripts, ntfs, ocfs2, vfs, and mm (slab-generic, slab, slub, debug, pagecache, swap, memcg, pagemap, mprotect, mremap, page-reporting, vmalloc, kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction, mempolicy, oom-kill, hugetlbfs, and migration)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (172 commits) mm/migrate: remove unneeded semicolons hugetlbfs: remove unneeded return value of hugetlb_vmtruncate() hugetlbfs: fix some comment typos hugetlbfs: correct some obsolete comments about inode i_mutex hugetlbfs: make hugepage size conversion more readable hugetlbfs: remove meaningless variable avoid_reserve hugetlbfs: correct obsolete function name in hugetlbfs_read_iter() hugetlbfs: use helper macro default_hstate in init_hugetlbfs_fs hugetlbfs: remove useless BUG_ON(!inode) in hugetlbfs_setattr() hugetlbfs: remove special hugetlbfs_set_page_dirty() mm/hugetlb: change hugetlb_reserve_pages() to type bool mm, oom: fix a comment in dump_task() mm/mempolicy: use helper range_in_vma() in queue_pages_test_walk() numa balancing: migrate on fault among multiple bound nodes mm, compaction: make fast_isolate_freepages() stay within zone mm/compaction: fix misbehaviors of fast_find_migrateblock() mm/compaction: correct deferral logic for proactive compaction mm/compaction: remove duplicated VM_BUG_ON_PAGE !PageLocked mm/compaction: remove rcu_read_lock during page compaction z3fold: simplify the zhdr initialization code in init_z3fold_page() ...
2021-02-24mm, slub: remove slub_memcg_sysfs boot param and CONFIG_SLUB_MEMCG_SYSFS_ONVlastimil Babka1-14/+0
The boot param and config determine the value of memcg_sysfs_enabled, which is unused since commit 10befea91b61 ("mm: memcg/slab: use a single set of kmem_caches for all allocations") as there are no per-memcg kmem caches anymore. Link: https://lkml.kernel.org/r/20210127124745.7928-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24Kbuild: enable TRIM_UNUSED_KSYMS again, with some guardingLinus Torvalds1-2/+2
In commit 5cf0fd591f2e ("Kbuild: disable TRIM_UNUSED_KSYMS option") I disabled this option because it's hugely expensive at build time, and I questioned how much use it gets. Several people piped up and convinced me it's actually useful, so instead of disabling it entirely, it now depends on EXPERT and gets disabled by COMPILE_TEST builds so that 'allmodconfig' style things don't enable it. I still hope somebody will take a look at the build time issue, because as Arnd also noted: "However, the combination of thinlto and trim indeed has a steep cost in compile time, taking almost twice as long as a normal defconfig (gc-sections makes it slightly faster)" Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Cristoph Hellwig <hch@lst.de>, Cc: Miroslav Benes <mbenes@suse.cz> Cc: Emil Velikov <emil.l.velikov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-23Kbuild: disable TRIM_UNUSED_KSYMS optionLinus Torvalds1-0/+1
The removal of EXPORT_UNUSED_SYMBOL() in commit 367948220fce looks like (and was sold as) a no-op, but it actually had a rather serious and subtle side effect: the UNUSED_SYMBOLS option not only enabled the removed (unused) functionality, it also _disabled_ the TRIM_UNUSED_KSYMS functionality. And it turns out that TRIM_UNUSED_KSYMS is a huge time waste, and takes up a third of the kernel build time for me. For no actual upside, since no distro is likely to ever be able to enable it (because they all support external kernel modules). Rather than re-enable EXPORT_UNUSED_SYMBOL, this just disables the TRIM_UNUSED_KSYMS option by marking it broken. I'm tempted to just remove the support entirely, but maybe somebody has a use-case and can fix the behavior of it. I could have just disabled it for COMPILE_TEST, but it really smells like the TRIM_UNUSED_KSYMS option is badly done and not really useful, so this takes the more direct approach - let's see if anybody ever actually notices or complains. Cc: Miroslav Benes <mbenes@suse.cz> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jessica Yu <jeyu@kernel.org> Fixes: 367948220fce ("module: remove EXPORT_UNUSED_SYMBOL*") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-23Merge tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linuxLinus Torvalds1-17/+0
Pull module updates from Jessica Yu: - Retire EXPORT_UNUSED_SYMBOL() and EXPORT_SYMBOL_GPL_FUTURE(). These export types were introduced between 2006 - 2008. All the of the unused symbols have been long removed and gpl future symbols were converted to gpl quite a long time ago, and I don't believe these export types have been used ever since. So, I think it should be safe to retire those export types now (Christoph Hellwig) - Refactor and clean up some aged code cruft in the module loader (Christoph Hellwig) - Build {,module_}kallsyms_on_each_symbol only when livepatching is enabled, as it is the only caller (Christoph Hellwig) - Unexport find_module() and module_mutex and fix the last module callers to not rely on these anymore. Make module_mutex internal to the module loader (Christoph Hellwig) - Harden ELF checks on module load and validate ELF structures before checking the module signature (Frank van der Linden) - Fix undefined symbol warning for clang (Fangrui Song) - Fix smatch warning (Dan Carpenter) * tag 'modules-for-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: potential uninitialized return in module_kallsyms_on_each_symbol() module: remove EXPORT_UNUSED_SYMBOL* module: remove EXPORT_SYMBOL_GPL_FUTURE module: move struct symsearch to module.c module: pass struct find_symbol_args to find_symbol module: merge each_symbol_section into find_symbol module: remove each_symbol_in_section module: mark module_mutex static kallsyms: only build {,module_}kallsyms_on_each_symbol when required kallsyms: refactor {,module_}kallsyms_on_each_symbol module: use RCU to synchronize find_module module: unexport find_module and module_mutex drm: remove drm_fb_helper_modinit powerpc/powernv: remove get_cxl_module module: harden ELF info handling module: Ignore _GLOBAL_OFFSET_TABLE_ when warning for undefined symbols
2021-02-23Merge tag 'clang-lto-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linuxLinus Torvalds1-0/+1
Pull clang LTO updates from Kees Cook: "Clang Link Time Optimization. This is built on the work done preparing for LTO by arm64 folks, tracing folks, etc. This includes the core changes as well as the remaining pieces for arm64 (LTO has been the default build method on Android for about 3 years now, as it is the prerequisite for the Control Flow Integrity protections). While x86 LTO enablement is done, it depends on some pending objtool clean-ups. It's possible that I'll send a "part 2" pull request for LTO that includes x86 support. For merge log posterity, and as detailed in commit dc5723b02e52 ("kbuild: add support for Clang LTO"), here is the lt;dr to do an LTO build: make LLVM=1 LLVM_IAS=1 defconfig scripts/config -e LTO_CLANG_THIN make LLVM=1 LLVM_IAS=1 (To do a cross-compile of arm64, add "CROSS_COMPILE=aarch64-linux-gnu-" and "ARCH=arm64" to the "make" command lines.) Summary: - Clang LTO build infrastructure and arm64-specific enablement (Sami Tolvanen) - Recursive build CC_FLAGS_LTO fix (Alexander Lobakin)" * tag 'clang-lto-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kbuild: prevent CC_FLAGS_LTO self-bloating on recursive rebuilds arm64: allow LTO to be selected arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS arm64: vdso: disable LTO drivers/misc/lkdtm: disable LTO for rodata.o efi/libstub: disable LTO scripts/mod: disable LTO for empty.c modpost: lto: strip .lto from module names PCI: Fix PREL32 relocations for LTO init: lto: fix PREL32 relocations init: lto: ensure initcall ordering kbuild: lto: add a default list of used symbols kbuild: lto: merge module sections kbuild: lto: limit inlining kbuild: lto: fix module versioning kbuild: add support for Clang LTO tracing: move function tracer options to Kconfig
2021-02-22Merge tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-0/+11
Pull kcmp kconfig update from Daniel Vetter: "Make the kcmp syscall available independently of checkpoint/restore. drm userspaces uses this, systemd uses this, so makes sense to pull it out from the checkpoint-restore bundle. Kees reviewed this from security pov and is happy with the final version" Link: https://lwn.net/Articles/845448/ * tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm: kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
2021-02-22kbuild: check the minimum linker version in KconfigMasahiro Yamada1-8/+13
Unify the two scripts/ld-version.sh and scripts/lld-version.sh, and check the minimum linker version like scripts/cc-version.sh did. I tested this script for some corner cases reported in the past: - GNU ld version 2.25-15.fc23 as reported by commit 8083013fc320 ("ld-version: Fix it on Fedora") - GNU ld (GNU Binutils) 2.20.1.20100303 as reported by commit 0d61ed17dd30 ("ld-version: Drop the 4th and 5th version components") This script show an error message if the linker is too old: $ make LD=ld.lld-9 SYNC include/config/auto.conf *** *** Linker is too old. *** Your LLD version: 9.0.1 *** Minimum LLD version: 10.0.1 *** scripts/Kconfig.include:50: Sorry, this linker is not supported. make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1 make[1]: *** [Makefile:600: syncconfig] Error 2 make: *** [Makefile:708: include/config/auto.conf] Error 2 I also moved the check for gold to this script, so gold is still rejected: $ make LD=gold SYNC include/config/auto.conf gold linker is not supported as it is not capable of linking the kernel proper. scripts/Kconfig.include:50: Sorry, this linker is not supported. make[2]: *** [scripts/kconfig/Makefile:71: syncconfig] Error 1 make[1]: *** [Makefile:600: syncconfig] Error 2 make: *** [Makefile:708: include/config/auto.conf] Error 2 Thanks to David Laight for suggesting shell script improvements. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org>
2021-02-21Merge tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-1/+1
Pull scheduler updates from Ingo Molnar: "Core scheduler updates: - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the preempt=none/voluntary/full boot options (default: full), to allow distros to build a PREEMPT kernel but fall back to close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling behavior via a boot time selection. There's also the /debug/sched_debug switch to do this runtime. This feature is implemented via runtime patching (a new variant of static calls). The scope of the runtime patching can be best reviewed by looking at the sched_dynamic_update() function in kernel/sched/core.c. ( Note that the dynamic none/voluntary mode isn't 100% identical, for example preempt-RCU is available in all cases, plus the preempt count is maintained in all models, which has runtime overhead even with the code patching. ) The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority of distributions, are supposed to be unaffected. - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that was found via rcutorture triggering a hang. The bug is that rcu_idle_enter() may wake up a NOCB kthread, but this happens after the last generic need_resched() check. Some cpuidle drivers fix it by chance but many others don't. In true 2020 fashion the original bug fix has grown into a 5-patch scheduler/RCU fix series plus another 16 RCU patches to address the underlying issue of missed preemption events. These are the initial fixes that should fix current incarnations of the bug. - Clean up rbtree usage in the scheduler, by providing & using the following consistent set of rbtree APIs: partial-order; less() based: - rb_add(): add a new entry to the rbtree - rb_add_cached(): like rb_add(), but for a rb_root_cached total-order; cmp() based: - rb_find(): find an entry in an rbtree - rb_find_add(): find an entry, and add if not found - rb_find_first(): find the first (leftmost) matching entry - rb_next_match(): continue from rb_find_first() - rb_for_each(): iterate a sub-tree using the previous two - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass. This is a 4-commit series where each commit improves one aspect of the idle sibling scan logic. - Improve the cpufreq cooling driver by getting the effective CPU utilization metrics from the scheduler - Improve the fair scheduler's active load-balancing logic by reducing the number of active LB attempts & lengthen the load-balancing interval. This improves stress-ng mmapfork performance. - Fix CFS's estimated utilization (util_est) calculation bug that can result in too high utilization values Misc updates & fixes: - Fix the HRTICK reprogramming & optimization feature - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code - Reduce dl_add_task_root_domain() overhead - Fix uprobes refcount bug - Process pending softirqs in flush_smp_call_function_from_idle() - Clean up task priority related defines, remove *USER_*PRIO and USER_PRIO() - Simplify the sched_init_numa() deduplication sort - Documentation updates - Fix EAS bug in update_misfit_status(), which degraded the quality of energy-balancing - Smaller cleanups" * tag 'sched-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) sched,x86: Allow !PREEMPT_DYNAMIC entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point entry: Explicitly flush pending rcuog wakeup before last rescheduling point rcu/nocb: Trigger self-IPI on late deferred wake up before user resume rcu/nocb: Perform deferred wake up before last idle's need_resched() check rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers sched/features: Distinguish between NORMAL and DEADLINE hrtick sched/features: Fix hrtick reprogramming sched/deadline: Reduce rq lock contention in dl_add_task_root_domain() uprobes: (Re)add missing get_uprobe() in __find_uprobe() smp: Process pending softirqs in flush_smp_call_function_from_idle() sched: Harden PREEMPT_DYNAMIC static_call: Allow module use without exposing static_call_key sched: Add /debug/sched_preempt preempt/dynamic: Support dynamic preempt with preempt= boot option preempt/dynamic: Provide irqentry_exit_cond_resched() static call preempt/dynamic: Provide preempt_schedule[_notrace]() static calls preempt/dynamic: Provide cond_resched() and might_resched() static calls preempt: Introduce CONFIG_PREEMPT_DYNAMIC static_call: Provide DEFINE_STATIC_CALL_RET0() ...