aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2019-06-06arm64: Silence gcc warnings about arch ABI driftDave Martin1-0/+1
Since GCC 9, the compiler warns about evolution of the platform-specific ABI, in particular relating for the marshaling of certain structures involving bitfields. The kernel is a standalone binary, and of course nobody would be so stupid as to expose structs containing bitfields as function arguments in ABI. (Passing a pointer to such a struct, however inadvisable, should be unaffected by this change. perf and various drivers rely on that.) So these warnings do more harm than good: turn them off. We may miss warnings about future ABI drift, but that's too bad. Future ABI breaks of this class will have to be debugged and fixed the traditional way unless the compiler evolves finer-grained diagnostics. Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-06ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fixGeorge G. Davis1-1/+1
Fix a s/TIF_SECOMP/TIF_SECCOMP/ comment typo Cc: Jiri Kosina <trivial@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org Signed-off-by: George G. Davis <george_davis@mentor.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-05arm64: arch_timer: mark functions as __always_inlineAnders Roxell1-4/+4
If CONFIG_FUNCTION_GRAPH_TRACER is enabled function arch_counter_get_cntvct() is marked as notrace. However, function __arch_counter_get_cntvct is marked as inline. If CONFIG_OPTIMIZE_INLINING is set that will make the two functions tracable which they shouldn't. Rework so that functions __arch_counter_get_* are marked with __always_inline so they will be inlined even if CONFIG_OPTIMIZE_INLINING is turned on. Fixes: 0ea415390cd3 ("clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable counters") Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-05arm64: smp: Moved cpu_logical_map[] to smp.hFlorian Fainelli2-5/+6
asm/smp.h is included by linux/smp.h and some drivers, in particular irqchip drivers can access cpu_logical_map[] in order to perform SMP affinity tasks. Make arm64 consistent with other architectures here. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-06-05arm64: cpufeature: Fix missing ZFR0 in __read_sysreg_by_encoding()Dave Martin1-0/+1
In commit 06a916feca2b ("arm64: Expose SVE2 features for userspace"), new hwcaps are added that are detected via fields in the SVE-specific ID register ID_AA64ZFR0_EL1. In order to check compatibility of secondary cpus with the hwcaps established at boot, the cpufeatures code uses __read_sysreg_by_encoding() to read this ID register based on the sys_reg field of the arm64_elf_hwcaps[] table. This leads to a kernel splat if an hwcap uses an ID register that __read_sysreg_by_encoding() doesn't explicitly handle, as now happens when exercising cpu hotplug on an SVE2-capable platform. So fix it by adding the required case in there. Fixes: 06a916feca2b ("arm64: Expose SVE2 features for userspace") Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-29arm64: use the correct function type for __arm64_sys_ni_syscallSami Tolvanen2-10/+11
Calling sys_ni_syscall through a syscall_fn_t pointer trips indirect call Control-Flow Integrity checking due to a function type mismatch. Use SYSCALL_DEFINE0 for __arm64_sys_ni_syscall instead and remove the now unnecessary casts. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-29arm64: use the correct function type in SYSCALL_DEFINE0Sami Tolvanen1-9/+9
Although a syscall defined using SYSCALL_DEFINE0 doesn't accept parameters, use the correct function type to avoid indirect call type mismatches with Control-Flow Integrity checking. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-29arm64: fix syscall_fn_t typeSami Tolvanen1-1/+1
Syscall wrappers in <asm/syscall_wrapper.h> use const struct pt_regs * as the argument type. Use const in syscall_fn_t as well to fix indirect call type mismatches with Control-Flow Integrity checking. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-29signal/arm64: Use force_sig not force_sig_fault for SIGKILLEric W. Biederman1-1/+4
I don't think this is userspace visible but SIGKILL does not have any si_codes that use the fault member of the siginfo union. Correct this the simple way and call force_sig instead of force_sig_fault when the signal is SIGKILL. The two know places where synchronous SIGKILL are generated are do_bad_area and fpsimd_save. The call paths to force_sig_fault are: do_bad_area arm64_force_sig_fault force_sig_fault force_signal_inject arm64_notify_die arm64_force_sig_fault force_sig_fault Which means correcting this in arm64_force_sig_fault is enough to ensure the arm64 code is not misusing the generic code, which could lead to maintenance problems later. Cc: stable@vger.kernel.org Cc: Dave Martin <Dave.Martin@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-28arm64/module: revert to unsigned interpretation of ABS16/32 relocationsArd Biesheuvel1-8/+30
Commit 1cf24a2cc3fd ("arm64/module: deal with ambiguity in PRELxx relocation ranges") updated the overflow checking logic in the relocation handling code to ensure that PREL16/32 relocations don't overflow signed quantities. However, the same code path is used for absolute relocations, where the interpretation is the opposite: the only current use case for absolute relocations operating on non-native word size quantities is the CRC32 handling in the CONFIG_MODVERSIONS code, and these CRCs are unsigned 32-bit quantities, which are now being rejected by the module loader if bit 31 happens to be set. So let's use different ranges for quanties subject to absolute vs. relative relocations: - ABS16/32 relocations should be in the range [0, Uxx_MAX) - PREL16/32 relocations should be in the range [Sxx_MIN, Sxx_MAX) - otherwise, print an error since no other 16 or 32 bit wide data relocations are currently supported. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-28arm64: Fix the arm64_personality() syscall wrapper redirectionCatalin Marinas1-1/+1
Following commit 4378a7d4be30 ("arm64: implement syscall wrappers"), the syscall function names gained the '__arm64_' prefix. Ensure that we have the correct #define for redirecting a default syscall through a wrapper. Fixes: 4378a7d4be30 ("arm64: implement syscall wrappers") Cc: <stable@vger.kernel.org> # 4.19.x- Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-24arm64: insn: Add BUILD_BUG_ON() for invalid masksJean-Philippe Brucker1-5/+11
Detect invalid instruction masks at build time. Some versions of GCC can warn about the situation, but not all of them, it seems. Suggested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-24arm64: insn: Fix ldadd instruction encodingJean-Philippe Brucker1-1/+1
GCC 8.1.0 reports that the ldadd instruction encoding, recently added to insn.c, doesn't match the mask and couldn't possibly be identified: linux/arch/arm64/include/asm/insn.h: In function 'aarch64_insn_is_ldadd': linux/arch/arm64/include/asm/insn.h:280:257: warning: bitwise comparison always evaluates to false [-Wtautological-compare] Bits [31:30] normally encode the size of the instruction (1 to 8 bytes) and the current instruction value only encodes the 4- and 8-byte variants. At the moment only the BPF JIT needs this instruction, and doesn't require the 1- and 2-byte variants, but to be consistent with our other ldr and str instruction encodings, clear the size field in the insn value. Fixes: 34b8ab091f9ef57a ("bpf, arm64: use more scalable stadd over ldxr / stxr loop in xadd") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reported-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-24arm64: Kconfig: Make ARM64_PSEUDO_NMI depend on BROKEN for nowWill Deacon1-0/+1
Although we merged support for pseudo-nmi using interrupt priority masking in 5.1, we've since uncovered a number of non-trivial issues with the implementation. Although there are patches pending to address these problems, we're facing issues that prevent us from merging them at this current time: https://lkml.kernel.org/r/1556553607-46531-1-git-send-email-julien.thierry@arm.com For now, simply mark this optional feature as BROKEN in the hope that we can fix things properly in the near future. Cc: <stable@vger.kernel.org> # 5.1 Cc: Julien Thierry <julien.thierry@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-23arm64: Handle erratum 1418040 as a superset of erratum 1188873Marc Zyngier5-21/+24
We already mitigate erratum 1188873 affecting Cortex-A76 and Neoverse-N1 r0p0 to r2p0. It turns out that revisions r0p0 to r3p1 of the same cores are affected by erratum 1418040, which has the same workaround as 1188873. Let's expand the range of affected revisions to match 1418040, and repaint all occurences of 1188873 to 1418040. Whilst we're there, do a bit of reformating in silicon-errata.txt and drop a now unnecessary dependency on ARM_ARCH_TIMER_OOL_WORKAROUND. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-23arm64/module: deal with ambiguity in PRELxx relocation rangesArd Biesheuvel1-2/+14
The R_AARCH64_PREL16 and R_AARCH64_PREL32 relocations are documented as permitting a range of [-2^15 .. 2^16), resp. [-2^31 .. 2^32). It is also documented that this means we cannot detect overflow in some cases, which is bad. Since we always interpret the targets of these relocations as signed quantities (e.g., in the ksymtab handling code), let's tighten the overflow checks so that targets that are out of range for our signed interpretation of the relocated quantity get flagged. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-23ACPI/IORT: Fix build error when IOMMU_SUPPORT is disabledLorenzo Pieralisi1-118/+120
If IOMMU_SUPPORT is not enabled (and therefore IOMMU_API is not selected), struct iommu_fwspec is an empty struct and IOMMU_FWSPEC_PCI_RC_ATS is not defined, resulting in the following compilation errors: drivers/acpi/arm64/iort.c: In function iort_iommu_configure: drivers/acpi/arm64/iort.c:1079:21: error: struct iommu_fwspec has no member named flag: dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS; ^~ drivers/acpi/arm64/iort.c:1079:32: error: IOMMU_FWSPEC_PCI_RC_ATS undeclared (first use in this function) dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS; ^~~~~~~~~~~~~~~~~~~~~~~ drivers/acpi/arm64/iort.c:1079:32: note: each undeclared identifier is reported only once for each function it appears in Move iort_iommu_configure() (and the helpers functions it relies on) into CONFIG_IOMMU_API preprocessor guarded code so that when CONFIG_IOMMU_SUPPORT is not enabled we prevent compiling code that is basically equivalent to no-OP, fixing the build errors. Cc: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/linux-arm-kernel/20190515034253.79348-1-wangkefeng.wang@huawei.com/ Fixes: 5702ee24182f ("ACPI/IORT: Check ATS capability in root complex nodes") Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-23arm64/kernel: kaslr: reduce module randomization range to 2 GBArd Biesheuvel2-4/+4
The following commit 7290d5809571 ("module: use relative references for __ksymtab entries") updated the ksymtab handling of some KASLR capable architectures so that ksymtab entries are emitted as pairs of 32-bit relative references. This reduces the size of the entries, but more importantly, it gets rid of statically assigned absolute addresses, which require fixing up at boot time if the kernel is self relocating (which takes a 24 byte RELA entry for each member of the ksymtab struct). Since ksymtab entries are always part of the same module as the symbol they export, it was assumed at the time that a 32-bit relative reference is always sufficient to capture the offset between a ksymtab entry and its target symbol. Unfortunately, this is not always true: in the case of per-CPU variables, a per-CPU variable's base address (which usually differs from the actual address of any of its per-CPU copies) is allocated in the vicinity of the ..data.percpu section in the core kernel (i.e., in the per-CPU reserved region which follows the section containing the core kernel's statically allocated per-CPU variables). Since we randomize the module space over a 4 GB window covering the core kernel (based on the -/+ 4 GB range of an ADRP/ADD pair), we may end up putting the core kernel out of the -/+ 2 GB range of 32-bit relative references of module ksymtab entries that refer to per-CPU variables. So reduce the module randomization range a bit further. We lose 1 bit of randomization this way, but this is something we can tolerate. Cc: <stable@vger.kernel.org> # v4.19+ Signed-off-by: Ard Biesheuvel <ard.biesheuvel@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-23arm64: errata: Add workaround for Cortex-A76 erratum #1463225Will Deacon6-1/+109
Revisions of the Cortex-A76 CPU prior to r4p0 are affected by an erratum that can prevent interrupts from being taken when single-stepping. This patch implements a software workaround to prevent userspace from effectively being able to disable interrupts. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-23arm64: Remove useless message during oopsWill Deacon1-4/+0
During an oops, we print the name of the current task and its pid twice. We also helpfully advertise its stack limit as "0x(____ptrval____)". Drop these useless messages. Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-19Linux 5.2-rc1Linus Torvalds1-2/+2
2019-05-19Merge tag 'upstream-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifsLinus Torvalds3-4/+8
Pull UBIFS fixes from Richard Weinberger: - build errors wrt xattrs - mismerge which lead to a wrong Kconfig ifdef - missing endianness conversion * tag 'upstream-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Convert xattr inum to host order ubifs: Use correct config name for encryption ubifs: Fix build error without CONFIG_UBIFS_FS_XATTR
2019-05-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds10-257/+889
Merge yet more updates from Andrew Morton: "A few final bits: - large changes to vmalloc, yielding large performance benefits - tweak the console-flush-on-panic code - a few fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: panic: add an option to replay all the printk message in buffer initramfs: don't free a non-existent initrd fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount mm/compaction.c: correct zone boundary handling when isolating pages from a pageblock mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macro mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macro mm/vmalloc.c: keep track of free blocks for vmap allocation
2019-05-19Merge tag 'kbuild-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds74-172/+178
Pull more Kbuild updates from Masahiro Yamada: - remove unneeded use of cc-option, cc-disable-warning, cc-ldoption - exclude tracked files from .gitignore - re-enable -Wint-in-bool-context warning - refactor samples/Makefile - stop building immediately if syncconfig fails - do not sprinkle error messages when $(CC) does not exist - move arch/alpha/defconfig to the configs subdirectory - remove crappy header search path manipulation - add comment lines to .config to clarify the end of menu blocks - check uniqueness of module names (adding new warnings intentionally) * tag 'kbuild-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (24 commits) kconfig: use 'else ifneq' for Makefile to improve readability kbuild: check uniqueness of module names kconfig: Terminate menu blocks with a comment in the generated config kbuild: add LICENSES to KBUILD_ALLDIRS kbuild: remove 'addtree' and 'flags' magic for header search paths treewide: prefix header search paths with $(srctree)/ media: prefix header search paths with $(srctree)/ media: remove unneeded header search paths alpha: move arch/alpha/defconfig to arch/alpha/configs/defconfig kbuild: terminate Kconfig when $(CC) or $(LD) is missing kbuild: turn auto.conf.cmd into a mandatory include file .gitignore: exclude .get_maintainer.ignore and .gitattributes kbuild: add all Clang-specific flags unconditionally kbuild: Don't try to add '-fcatch-undefined-behavior' flag kbuild: add some extra warning flags unconditionally kbuild: add -Wvla flag unconditionally arch: remove dangling asm-generic wrappers samples: guard sub-directories with CONFIG options kbuild: re-enable int-in-bool-context warning MAINTAINERS: kbuild: Add pattern for scripts/*vmlinux* ...
2019-05-19Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linuxLinus Torvalds3-13/+111
Pull i2c updates from Wolfram Sang: "Some I2C core API additions which are kind of simple but enhance error checking for users a lot, especially by returning errno now. There are wrappers to still support the old API but it will be removed once all users are converted" * 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: core: add device-managed version of i2c_new_dummy i2c: core: improve return value handling of i2c_new_device and i2c_new_dummy
2019-05-19Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds13-65/+107
Pull ext4 fixes from Ted Ts'o: "Some bug fixes, and an update to the URL's for the final version of Unicode 12.1.0" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid panic during forced reboot due to aborted journal ext4: fix block validity checks for journal inodes using indirect blocks unicode: update to Unicode 12.1.0 final unicode: add missing check for an error return from utf8lookup() ext4: fix miscellaneous sparse warnings ext4: unsigned int compared against zero ext4: fix use-after-free in dx_release() ext4: fix data corruption caused by overlapping unaligned and aligned IO jbd2: fix potential double free ext4: zero out the unused memory region in the extent tree block
2019-05-19Merge tag '5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds8-51/+173
Pull cifs fixes from Steve French: "Minor cleanup and fixes, one for stable, four rdma (smbdirect) related. Also adds SEEK_HOLE support" * tag '5.2-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: add support for SEEK_DATA and SEEK_HOLE Fixed https://bugzilla.kernel.org/show_bug.cgi?id=202935 allow write on the same file cifs: Allocate memory for all iovs in smb2_ioctl cifs: Don't match port on SMBDirect transport cifs:smbd Use the correct DMA direction when sending data cifs:smbd When reconnecting to server, call smbd_destroy() after all MIDs have been called cifs: use the right include for signal_pending() smb3: trivial cleanup to smb2ops.c cifs: cleanup smb2ops.c and normalize strings smb3: display session id in debug data
2019-05-19Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds94-214/+5414
Pull perf tooling updates from Ingo Molnar: "perf.data: - Streaming compression of perf ring buffer into PERF_RECORD_COMPRESSED user space records, resulting in ~3-5x perf.data file size reduction on variety of tested workloads what saves storage space on larger server systems where perf.data size can easily reach several tens or even hundreds of GiBs, especially when profiling with DWARF-based stacks and tracing of context switches. perf record: - Improve -user-regs/intr-regs suggestions to overcome errors perf annotate: - Remove hist__account_cycles() from callback, speeding up branch processing (perf record -b) perf stat: - Add a 'percore' event qualifier, e.g.: -e cpu/event=0,umask=0x3,percore=1/, that sums up the event counts for both hardware threads in a core. We can already do this with --per-core, but it's often useful to do this together with other metrics that are collected per hardware thread. I.e. now its possible to do this per-event, and have it mixed with other events not aggregated by core. arm64: - Map Brahma-B53 CPUID to cortex-a53 events. - Add Cortex-A57 and Cortex-A72 events. csky: - Add DWARF register mappings for libdw, allowing --call-graph=dwarf to work on the C-SKY arch. x86: - Add support for recording and printing XMM registers, available, for instance, on Icelake. - Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON support. UPI replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP. Intel PT: - Fix instructions sampling rate. - Timestamp fixes. - Improve exported-sql-viewer GUI, allowing, for instance, to copy'n'paste the trees, useful for e-mailing" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits) perf stat: Support 'percore' event qualifier perf stat: Factor out aggregate counts printing perf tools: Add a 'percore' event qualifier perf docs: Add description for stderr perf intel-pt: Fix sample timestamp wrt non-taken branches perf intel-pt: Fix improved sample timestamp perf intel-pt: Fix instructions sampling rate perf regs x86: Add X86 specific arch__intr_reg_mask() perf parse-regs: Add generic support for arch__intr/user_reg_mask() perf parse-regs: Split parse_regs perf vendor events arm64: Add Cortex-A57 and Cortex-A72 events perf vendor events arm64: Map Brahma-B53 CPUID to cortex-a53 events perf vendor events arm64: Remove [[:xdigit:]] wildcard perf jevents: Remove unused variable perf test zstd: Fixup verbose mode output perf tests: Implement Zstd comp/decomp integration test perf inject: Enable COMPRESSED record decompression perf report: Implement perf.data record decompression perf record: Implement -z,--compression_level[=<n>] option perf report: Add stub processing of compressed events for -D ...
2019-05-19Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds12-136/+202
Pull clocksource updates from Ingo Molnar: "Misc clocksource/clockevent driver updates that came in a bit late but are ready for v5.2" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: misc: atmel_tclib: Do not probe already used TCBs clocksource/drivers/timer-atmel-tcb: Convert tc_clksrc_suspend|resume() to static clocksource/drivers/tcb_clksrc: Rename the file for consistency clocksource/drivers/timer-atmel-pit: Rework Kconfig option clocksource/drivers/tcb_clksrc: Move Kconfig option ARM: at91: Implement clocksource selection clocksource/drivers/tcb_clksrc: Use tcb as sched_clock clocksource/drivers/tcb_clksrc: Stop depending on atmel_tclib ARM: at91: move SoC specific definitions to SoC folder clocksource/drivers/timer-milbeaut: Cleanup common register accesses clocksource/drivers/timer-milbeaut: Add shutdown function clocksource/drivers/timer-milbeaut: Fix to enable one-shot timer clocksource/drivers/tegra: Rework for compensation of suspend time clocksource/drivers/sp804: Add COMPILE_TEST to CONFIG_ARM_TIMER_SP804 clocksource/drivers/sun4i: Add a compatible for suniv dt-bindings: timer: Add Allwinner suniv timer
2019-05-19Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds38-229/+2511
Pull IRQ chip updates from Ingo Molnar: "A late irqchips update: - New TI INTR/INTA set of drivers - Rewrite of the stm32mp1-exti driver as a platform driver - Update the IOMMU MSI mapping API to be RT friendly - A number of cleanups and other low impact fixes" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) iommu/dma-iommu: Remove iommu_dma_map_msi_msg() irqchip/gic-v3-mbi: Don't map the MSI page in mbi_compose_m{b, s}i_msg() irqchip/ls-scfg-msi: Don't map the MSI page in ls_scfg_msi_compose_msg() irqchip/gic-v3-its: Don't map the MSI page in its_irq_compose_msi_msg() irqchip/gicv2m: Don't map the MSI page in gicv2m_compose_msi_msg() iommu/dma-iommu: Split iommu_dma_map_msi_msg() in two parts genirq/msi: Add a new field in msi_desc to store an IOMMU cookie arm64: arch_k3: Enable interrupt controller drivers irqchip/ti-sci-inta: Add msi domain support soc: ti: Add MSI domain bus support for Interrupt Aggregator irqchip/ti-sci-inta: Add support for Interrupt Aggregator driver dt-bindings: irqchip: Introduce TISCI Interrupt Aggregator bindings irqchip/ti-sci-intr: Add support for Interrupt Router driver dt-bindings: irqchip: Introduce TISCI Interrupt router bindings gpio: thunderx: Use the default parent apis for {request,release}_resources genirq: Introduce irq_chip_{request,release}_resource_parent() apis firmware: ti_sci: Add helper apis to manage resources firmware: ti_sci: Add RM mapping table for am654 firmware: ti_sci: Add support for IRQ management firmware: ti_sci: Add support for RM core ops ...
2019-05-19Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-2/+6
Pull EFI fix from Ingo Molnar: "Fix an EFI-fb regression that affects certain x86 systems" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fbdev/efifb: Ignore framebuffer memmap entries that lack any memory types
2019-05-19Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds9-26/+27
Pull core fixes from Ingo Molnar: "This fixes a particularly thorny munmap() bug with MPX, plus fixes a host build environment assumption in objtool" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Allow AR to be overridden with HOSTAR x86/mpx, mm/core: Fix recursive munmap() corruption
2019-05-19Merge tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds25-124/+277
Pull ARM SoC late updates from Olof Johansson: "This is some material that we picked up into our tree late. Most of it are smaller fixes and additions, some defconfig updates due to recent development, etc. Code-wise the largest portion is a series of PM updates for the at91 platform, and those have been in linux-next a while through the at91 tree before we picked them up" * tag 'armsoc-late' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits) arm64: dts: sprd: Add clock properties for serial devices Opt out of scripts/get_maintainer.pl ARM: ixp4xx: Remove duplicated include from common.c soc: ixp4xx: qmgr: Fix an NULL vs IS_ERR() check in probe arm64: tegra: Disable XUSB support on Jetson TX2 arm64: tegra: Enable SMMU translation for PCI on Tegra186 arm64: tegra: Fix insecure SMMU users for Tegra186 arm64: tegra: Select ARM_GIC_PM amba: tegra-ahb: Mark PM functions as __maybe_unused ARM: dts: logicpd-som-lv: Fix MMC1 card detect ARM: mvebu: drop return from void function ARM: mvebu: prefix coprocessor operand with p ARM: mvebu: drop unnecessary label ARM: mvebu: fix a leaked reference by adding missing of_node_put ARM: socfpga_defconfig: enable LTC2497 ARM: mvebu: kirkwood: remove error message when retrieving mac address ARM: at91: sama5: make ov2640 as a module ARM: OMAP1: ams-delta: fix early boot crash when LED support is disabled ARM: at91: remove HAVE_FB_ATMEL for sama5 SoC as they use DRM soc/fsl/qe: Fix an error code in qe_pin_request() ...
2019-05-19Merge tag 'powerpc-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds4-5/+7
Pull powerpc fixes from Michael Ellerman: "One fix going back to stable, for a bug on 32-bit introduced when we added support for THREAD_INFO_IN_TASK. A fix for a typo in a recent rework of our hugetlb code that leads to crashes on 64-bit when using hugetlbfs with a 4K PAGE_SIZE. Two fixes for our recent rework of the address layout on 64-bit hash CPUs, both only triggered when userspace tries to access addresses outside the user or kernel address ranges. Finally a fix for a recently introduced double free in an error path in our cacheinfo code. Thanks to: Aneesh Kumar K.V, Christophe Leroy, Sachin Sant, Tobin C. Harding" * tag 'powerpc-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/cacheinfo: Remove double free powerpc/mm/hash: Fix get_region_id() for invalid addresses powerpc/mm: Drop VM_BUG_ON in get_region_id() powerpc/mm: Fix crashes with hugepages & 4K pages powerpc/32s: fix flush_hash_pages() on SMP
2019-05-19Merge tag 'mips_5.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds18-659/+831
Pull a few more MIPS updates from Paul Burton: "Some SGI IP27 specific PCI rework and a batch of fixes: - A build fix for BMIPS5000 configurations with CONFIG_HW_PERF_EVENTS=y, which also neatly removes some #ifdefery. - A fix to report supported ISAs correctly on older Ingenic SoCs which incorrectly indicate MIPSr2 support in their cop0 Config register. - Some PCI modernization for SGI IP27 systems as part of ongoing work to support some other SGI systems. - A fix allowing use of appended DTB files with generic kernels. - DMA mask fixes for SGI IP22 & Alchemy systems" * tag 'mips_5.2_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Alchemy: add DMA masks for on-chip ethernet MIPS: SGI-IP22: provide missing dma_mask/coherent_dma_mask generic: fix appended dtb support MIPS: SGI-IP27: abstract chipset irq from bridge MIPS: SGI-IP27: use generic PCI driver MIPS: Fix Ingenic SoCs sometimes reporting wrong ISA MIPS: perf: Fix build with CONFIG_CPU_BMIPS5000 enabled
2019-05-19Merge tag 'riscv-for-linus-5.2-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linuxLinus Torvalds36-321/+635
Pull RISC-V updates from Palmer Dabbelt: "This contains an assortment of RISC-V related patches that I'd like to target for the 5.2 merge window. Most of the patches are cleanups, but there are a handful of user-visible changes: - The nosmp and nr_cpus command-line arguments are now supported, which work like normal. - The SBI console no longer installs itself as a preferred console, we rely on standard mechanisms (/chosen, command-line, hueristics) instead. - sfence_remove_sfence_vma{,_asid} now pass their arguments along to the SBI call. - Modules now support BUG(). - A missing sfence.vma during boot has been added. This bug only manifests during boot. - The arch/riscv support for SiFive's L2 cache controller has been merged, which should un-block the EDAC framework work. I've only tested this on QEMU again, as I didn't have time to get things running on the Unleashed. The latest master from this morning merges in cleanly and passes the tests as well" * tag 'riscv-for-linus-5.2-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: (31 commits) riscv: fix locking violation in page fault handler RISC-V: sifive_l2_cache: Add L2 cache controller driver for SiFive SoCs RISC-V: Add DT documentation for SiFive L2 Cache Controller RISC-V: Avoid using invalid intermediate translations riscv: Support BUG() in kernel module riscv: Add the support for c.ebreak check in is_valid_bugaddr() riscv: support trap-based WARN() riscv: fix sbi_remote_sfence_vma{,_asid}. riscv: move switch_mm to its own file riscv: move flush_icache_{all,mm} to cacheflush.c tty: Don't force RISCV SBI console as preferred console RISC-V: Access CSRs using CSR numbers RISC-V: Add interrupt related SCAUSE defines in asm/csr.h RISC-V: Use tabs to align macro values in asm/csr.h RISC-V: Fix minor checkpatch issues. RISC-V: Support nr_cpus command line option. RISC-V: Implement nosmp commandline option. RISC-V: Add RISC-V specific arch_match_cpu_phys_id riscv: vdso: drop unnecessary cc-ldoption riscv: call pm_power_off from machine_halt / machine_power_off ...
2019-05-19kconfig: use 'else ifneq' for Makefile to improve readabilityMasahiro Yamada1-3/+1
'ifeq ... else ifneq ... endif' notation is supported by GNU Make 3.81 or later, which is the requirement for building the kernel since commit 37d69ee30808 ("docs: bump minimal GNU Make version to 3.81"). Use it to improve the readability. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-18panic: add an option to replay all the printk message in bufferFeng Tang5-4/+24
Currently on panic, kernel will lower the loglevel and print out pending printk msg only with console_flush_on_panic(). Add an option for users to configure the "panic_print" to replay all dmesg in buffer, some of which they may have never seen due to the loglevel setting, which will help panic debugging . [feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()] Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com [feng.tang@intel.com: use logbuf lock to protect the console log index] Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang <feng.tang@intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Aaro Koskinen <aaro.koskinen@nokia.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18initramfs: don't free a non-existent initrdSteven Price1-1/+1
Since commit 54c7a8916a88 ("initramfs: free initrd memory if opening /initrd.image fails"), the kernel has unconditionally attempted to free the initrd even if it doesn't exist. In the non-existent case this causes a boot-time splat if CONFIG_DEBUG_VIRTUAL is enabled due to a call to virt_to_phys() with a NULL address. Instead we should check that the initrd actually exists and only attempt to free it if it does. Link: http://lkml.kernel.org/r/20190516143125.48948-1-steven.price@arm.com Fixes: 54c7a8916a88 ("initramfs: free initrd memory if opening /initrd.image fails") Signed-off-by: Steven Price <steven.price@arm.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umountJiufei Xue1-3/+8
synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb switch may not go to the workqueue after synchronize_rcu(). Thus previous scheduled switches was not finished even flushing the workqueue, which will cause a NULL pointer dereferenced followed below. VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day... BUG: unable to handle kernel NULL pointer dereference at 0000000000000278 evict+0xb3/0x180 iput+0x1b0/0x230 inode_switch_wbs_work_fn+0x3c0/0x6a0 worker_thread+0x4e/0x490 ? process_one_work+0x410/0x410 kthread+0xe6/0x100 ret_from_fork+0x39/0x50 Replace the synchronize_rcu() call with a rcu_barrier() to wait for all pending callbacks to finish. And inc isw_nr_in_flight after call_rcu() in inode_switch_wbs() to make more sense. Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Acked-by: Tejun Heo <tj@kernel.org> Suggested-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18mm/compaction.c: correct zone boundary handling when isolating pages from a pageblockMel Gorman1-2/+2
syzbot reported the following error from a tree with a head commit of baf76f0c58ae ("slip: make slhc_free() silently accept an error pointer") BUG: unable to handle kernel paging request at ffffea0003348000 #PF error: [normal kernel read fault] PGD 12c3f9067 P4D 12c3f9067 PUD 12c3f8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 28916 Comm: syz-executor.2 Not tainted 5.1.0-rc6+ #89 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:constant_test_bit arch/x86/include/asm/bitops.h:314 [inline] RIP: 0010:PageCompound include/linux/page-flags.h:186 [inline] RIP: 0010:isolate_freepages_block+0x1c0/0xd40 mm/compaction.c:579 Code: 01 d8 ff 4d 85 ed 0f 84 ef 07 00 00 e8 29 00 d8 ff 4c 89 e0 83 85 38 ff ff ff 01 48 c1 e8 03 42 80 3c 38 00 0f 85 31 0a 00 00 <4d> 8b 2c 24 31 ff 49 c1 ed 10 41 83 e5 01 44 89 ee e8 3a 01 d8 ff RSP: 0018:ffff88802b31eab8 EFLAGS: 00010246 RAX: 1ffffd4000669000 RBX: 00000000000cd200 RCX: ffffc9000a235000 RDX: 000000000001ca5e RSI: ffffffff81988cc7 RDI: 0000000000000001 RBP: ffff88802b31ebd8 R08: ffff88805af700c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffea0003348000 R13: 0000000000000000 R14: ffff88802b31f030 R15: dffffc0000000000 FS: 00007f61648dc700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea0003348000 CR3: 0000000037c64000 CR4: 00000000001426e0 Call Trace: fast_isolate_around mm/compaction.c:1243 [inline] fast_isolate_freepages mm/compaction.c:1418 [inline] isolate_freepages mm/compaction.c:1438 [inline] compaction_alloc+0x1aee/0x22e0 mm/compaction.c:1550 There is no reproducer and it is difficult to hit -- 1 crash every few days. The issue is very similar to the fix in commit 6b0868c820ff ("mm/compaction.c: correct zone boundary handling when resetting pageblock skip hints"). When isolating free pages around a target pageblock, the boundary handling is off by one and can stray into the next pageblock. Triggering the syzbot error requires that the end of pageblock is section or zone aligned, and that the next section is unpopulated. A more subtle consequence of the bug is that pageblocks were being improperly used as migration targets which potentially hurts fragmentation avoidance in the long-term one page at a time. A debugging patch revealed that it's definitely possible to stray outside of a pageblock which is not intended. While syzbot cannot be used to verify this patch, it was confirmed that the debugging warning no longer triggers with this patch applied. It has also been confirmed that the THP allocation stress tests are not degraded by this patch. Link: http://lkml.kernel.org/r/20190510182124.GI18914@techsingularity.net Fixes: e332f741a8dd ("mm, compaction: be selective about what pageblocks to clear skip hints") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: syzbot+d84c80f9fe26a0f7a734@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Qian Cai <cai@lca.pw> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> # v5.1+ Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18mm/vmap: add DEBUG_AUGMENT_LOWEST_MATCH_CHECK macroUladzislau Rezki (Sony)1-0/+43
This macro adds some debug code to check that vmap allocations are happened in ascending order. By default this option is set to 0 and not active. It requires recompilation of the kernel to activate it. Set to 1, compile the kernel. [urezki@gmail.com: v4] Link: http://lkml.kernel.org/r/20190406183508.25273-4-urezki@gmail.com Link: http://lkml.kernel.org/r/20190402162531.10888-4-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joel Fernandes <joelaf@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18mm/vmap: add DEBUG_AUGMENT_PROPAGATE_CHECK macroUladzislau Rezki (Sony)1-0/+48
This macro adds some debug code to check that the augment tree is maintained correctly, meaning that every node contains valid subtree_max_size value. By default this option is set to 0 and not active. It requires recompilation of the kernel to activate it. Set to 1, compile the kernel. [urezki@gmail.com: v4] Link: http://lkml.kernel.org/r/20190406183508.25273-3-urezki@gmail.com Link: http://lkml.kernel.org/r/20190402162531.10888-3-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Joel Fernandes <joelaf@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18mm/vmalloc.c: keep track of free blocks for vmap allocationUladzislau Rezki (Sony)2-247/+763
Patch series "improve vmap allocation", v3. Objective --------- Please have a look for the description at: https://lkml.org/lkml/2018/10/19/786 but let me also summarize it a bit here as well. The current implementation has O(N) complexity. Requests with different permissive parameters can lead to long allocation time. When i say "long" i mean milliseconds. Description ----------- This approach organizes the KVA memory layout into free areas of the 1-ULONG_MAX range, i.e. an allocation is done over free areas lookups, instead of finding a hole between two busy blocks. It allows to have lower number of objects which represent the free space, therefore to have less fragmented memory allocator. Because free blocks are always as large as possible. It uses the augment tree where all free areas are sorted in ascending order of va->va_start address in pair with linked list that provides O(1) access to prev/next elements. Since the tree is augment, we also maintain the "subtree_max_size" of VA that reflects a maximum available free block in its left or right sub-tree. Knowing that, we can easily traversal toward the lowest (left most path) free area. Allocation: ~O(log(N)) complexity. It is sequential allocation method therefore tends to maximize locality. The search is done until a first suitable block is large enough to encompass the requested parameters. Bigger areas are split. I copy paste here the description of how the area is split, since i described it in https://lkml.org/lkml/2018/10/19/786 <snip> A free block can be split by three different ways. Their names are FL_FIT_TYPE, LE_FIT_TYPE/RE_FIT_TYPE and NE_FIT_TYPE, i.e. they correspond to how requested size and alignment fit to a free block. FL_FIT_TYPE - in this case a free block is just removed from the free list/tree because it fully fits. Comparing with current design there is an extra work with rb-tree updating. LE_FIT_TYPE/RE_FIT_TYPE - left/right edges fit. In this case what we do is just cutting a free block. It is as fast as a current design. Most of the vmalloc allocations just end up with this case, because the edge is always aligned to 1. NE_FIT_TYPE - Is much less common case. Basically it happens when requested size and alignment does not fit left nor right edges, i.e. it is between them. In this case during splitting we have to build a remaining left free area and place it back to the free list/tree. Comparing with current design there are two extra steps. First one is we have to allocate a new vmap_area structure. Second one we have to insert that remaining free block to the address sorted list/tree. In order to optimize a first case there is a cache with free_vmap objects. Instead of allocating from slab we just take an object from the cache and reuse it. Second one is pretty optimized. Since we know a start point in the tree we do not do a search from the top. Instead a traversal begins from a rb-tree node we split. <snip> De-allocation. ~O(log(N)) complexity. An area is not inserted straight away to the tree/list, instead we identify the spot first, checking if it can be merged around neighbors. The list provides O(1) access to prev/next, so it is pretty fast to check it. Summarizing. If merged then large coalesced areas are created, if not the area is just linked making more fragments. There is one more thing that i should mention here. After modification of VA node, its subtree_max_size is updated if it was/is the biggest area in its left or right sub-tree. Apart of that it can also be populated back to upper levels to fix the tree. For more details please have a look at the __augment_tree_propagate_from() function and the description. Tests and stressing ------------------- I use the "test_vmalloc.sh" test driver available under "tools/testing/selftests/vm/" since 5.1-rc1 kernel. Just trigger "sudo ./test_vmalloc.sh" to find out how to deal with it. Tested on different platforms including x86_64/i686/ARM64/x86_64_NUMA. Regarding last one, i do not have any physical access to NUMA system, therefore i emulated it. The time of stressing is days. If you run the test driver in "stress mode", you also need the patch that is in Andrew's tree but not in Linux 5.1-rc1. So, please apply it: http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/commit/?id=e0cf7749bade6da318e98e934a24d8b62fab512c After massive testing, i have not identified any problems like memory leaks, crashes or kernel panics. I find it stable, but more testing would be good. Performance analysis -------------------- I have used two systems to test. One is i5-3320M CPU @ 2.60GHz and another is HiKey960(arm64) board. i5-3320M runs on 4.20 kernel, whereas Hikey960 uses 4.15 kernel. I have both system which could run on 5.1-rc1 as well, but the results have not been ready by time i an writing this. Currently it consist of 8 tests. There are three of them which correspond to different types of splitting(to compare with default). We have 3 ones(see above). Another 5 do allocations in different conditions. a) sudo ./test_vmalloc.sh performance When the test driver is run in "performance" mode, it runs all available tests pinned to first online CPU with sequential execution test order. We do it in order to get stable and repeatable results. Take a look at time difference in "long_busy_list_alloc_test". It is not surprising because the worst case is O(N). # i5-3320M How many cycles all tests took: CPU0=646919905370(default) cycles vs CPU0=193290498550(patched) cycles # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_performance_patched.txt # Hikey960 8x CPUs How many cycles all tests took: CPU0=3478683207 cycles vs CPU0=463767978 cycles # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/HiKey960_performance_patched.txt b) time sudo ./test_vmalloc.sh test_repeat_count=1 With this configuration, all tests are run on all available online CPUs. Before running each CPU shuffles its tests execution order. It gives random allocation behaviour. So it is rough comparison, but it puts in the picture for sure. # i5-3320M <default> vs <patched> real 101m22.813s real 0m56.805s user 0m0.011s user 0m0.015s sys 0m5.076s sys 0m0.023s # See detailed table with results here: ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_default.txt ftp://vps418301.ovh.net/incoming/vmap_test_results_v2/i5-3320M_test_repeat_count_1_patched.txt # Hikey960 8x CPUs <default> vs <patched> real unknown real 4m25.214s user unknown user 0m0.011s sys unknown sys 0m0.670s I did not manage to complete this test on "default Hikey960" kernel version. After 24 hours it was still running, therefore i had to cancel it. That is why real/user/sys are "unknown". This patch (of 3): Currently an allocation of the new vmap area is done over busy list iteration(complexity O(n)) until a suitable hole is found between two busy areas. Therefore each new allocation causes the list being grown. Due to over fragmented list and different permissive parameters an allocation can take a long time. For example on embedded devices it is milliseconds. This patch organizes the KVA memory layout into free areas of the 1-ULONG_MAX range. It uses an augment red-black tree that keeps blocks sorted by their offsets in pair with linked list keeping the free space in order of increasing addresses. Nodes are augmented with the size of the maximum available free block in its left or right sub-tree. Thus, that allows to take a decision and traversal toward the block that will fit and will have the lowest start address, i.e. it is sequential allocation. Allocation: to allocate a new block a search is done over the tree until a suitable lowest(left most) block is large enough to encompass: the requested size, alignment and vstart point. If the block is bigger than requested size - it is split. De-allocation: when a busy vmap area is freed it can either be merged or inserted to the tree. Red-black tree allows efficiently find a spot whereas a linked list provides a constant-time access to previous and next blocks to check if merging can be done. In case of merging of de-allocated memory chunk a large coalesced area is created. Complexity: ~O(log(N)) [urezki@gmail.com: v3] Link: http://lkml.kernel.org/r/20190402162531.10888-2-urezki@gmail.com [urezki@gmail.com: v4] Link: http://lkml.kernel.org/r/20190406183508.25273-2-urezki@gmail.com Link: http://lkml.kernel.org/r/20190321190327.11813-2-urezki@gmail.com Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Roman Gushchin <guro@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Garnier <thgarnie@google.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-18Merge tag 'perf-core-for-mingo-5.2-20190517' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/coreIngo Molnar94-214/+5414
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf.data: Alexey Budankov: - Streaming compression of perf ring buffer into PERF_RECORD_COMPRESSED user space records, resulting in ~3-5x perf.data file size reduction on variety of tested workloads what saves storage space on larger server systems where perf.data size can easily reach several tens or even hundreds of GiBs, especially when profiling with DWARF-based stacks and tracing of context switches. perf record: Arnaldo Carvalho de Melo - Improve -user-regs/intr-regs suggestions to overcome errors. perf annotate: Jin Yao: - Remove hist__account_cycles() from callback, speeding up branch processing (perf record -b). perf stat: - Add a 'percore' event qualifier, e.g.: -e cpu/event=0,umask=0x3,percore=1/, that sums up the event counts for both hardware threads in a core. We can already do this with --per-core, but it's often useful to do this together with other metrics that are collected per hardware thread. I.e. now its possible to do this per-event, and have it mixed with other events not aggregated by core. core libraries: Donald Yandt: - Check for errors when doing fgets(/proc/version). Jiri Olsa: - Speed up report for perf compiled with linbunwind. tools headers: Arnaldo Carvalho de Melo - Update memcpy_64.S, x86's kvm.h and pt_regs.h. arm64: Florian Fainelli: - Map Brahma-B53 CPUID to cortex-a53 events. - Add Cortex-A57 and Cortex-A72 events. csky: Mao Han: - Add DWARF register mappings for libdw, allowing --call-graph=dwarf to work on the C-SKY arch. x86: Andi Kleen/Kan Liang: - Add support for recording and printing XMM registers, available, for instance, on Icelake. Kan Liang: - Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON support. UPI replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP. Intel PT: Adrian Hunter . Fix instructions sampling rate. . Timestamp fixes. . Improve exported-sql-viewer GUI, allowing, for instance, to copy'n'paste the trees, useful for e-mailing. Documentation: Thomas Richter: - Add description for 'perf --debug stderr=1', which redirects stderr to stdout. libtraceevent: Tzvetomir Stoyanov: - Add man pages for the various APIs. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-18kbuild: check uniqueness of module namesMasahiro Yamada2-0/+17
In the recent build test of linux-next, Stephen saw a build error caused by a broken .tmp_versions/*.mod file: https://lkml.org/lkml/2019/5/13/991 drivers/net/phy/asix.ko and drivers/net/usb/asix.ko have the same basename, and there is a race in generating .tmp_versions/asix.mod Kbuild has not checked this before, and it suddenly shows up with obscure error messages when this kind of race occurs. Non-unique module names cause various sort of problems, but it is not trivial to catch them by eyes. Hence, this script. It checks not only real modules, but also built-in modules (i.e. controlled by tristate CONFIG option, but currently compiled with =y). Non-unique names for built-in modules also cause problems because /sys/modules/ would fall over. For the latest kernel, I tested "make allmodconfig all" (or more quickly "make allyesconfig modules"), and it detected the following: warning: same basename if the following are built as modules: drivers/regulator/88pm800.ko drivers/mfd/88pm800.ko warning: same basename if the following are built as modules: drivers/gpu/drm/bridge/adv7511/adv7511.ko drivers/media/i2c/adv7511.ko warning: same basename if the following are built as modules: drivers/net/phy/asix.ko drivers/net/usb/asix.ko warning: same basename if the following are built as modules: fs/coda/coda.ko drivers/media/platform/coda/coda.ko warning: same basename if the following are built as modules: drivers/net/phy/realtek.ko drivers/net/dsa/realtek.ko Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
2019-05-18kconfig: Terminate menu blocks with a comment in the generated configAlexander Popov1-1/+12
Currently menu blocks start with a pretty header but end with nothing in the generated config. So next config options stick together with the options from the menu block. Let's terminate menu blocks in the generated config with a comment and a newline if needed. Example: ... CONFIG_BPF_STREAM_PARSER=y CONFIG_NET_FLOW_LIMIT=y # # Network testing # CONFIG_NET_PKTGEN=y CONFIG_NET_DROP_MONITOR=y # end of Network testing # end of Networking options CONFIG_HAMRADIO=y ... Signed-off-by: Alexander Popov <alex.popov@linux.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-18kbuild: add LICENSES to KBUILD_ALLDIRSMasahiro Yamada1-1/+1
For *-pkg targets, the LICENSES directory should be included in the source tarball. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-18kbuild: remove 'addtree' and 'flags' magic for header search pathsMasahiro Yamada3-33/+13
The 'addtree' and 'flags' in scripts/Kbuild.include are so compilecated and ugly. As I mentioned in [1], Kbuild should stop automatic prefixing of header search path options. I fixed up (almost) all Makefiles in the kernel. Now 'addtree' and 'flags' have been removed. Kbuild still caters to add $(srctree)/$(src) and $(objtree)/$(obj) to the header search path for O= building, but never touches extra compiler options from ccflags-y etc. [1]: https://patchwork.kernel.org/patch/9632347/ Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-05-18treewide: prefix header search paths with $(srctree)/Masahiro Yamada17-26/+25
Currently, the Kbuild core manipulates header search paths in a crazy way [1]. To fix this mess, I want all Makefiles to add explicit $(srctree)/ to the search paths in the srctree. Some Makefiles are already written in that way, but not all. The goal of this work is to make the notation consistent, and finally get rid of the gross hacks. Having whitespaces after -I does not matter since commit 48f6e3cf5bc6 ("kbuild: do not drop -I without parameter"). [1]: https://patchwork.kernel.org/patch/9632347/ Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>