aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm64/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-10-04arm64: Use larger stacks when KASAN is selectedMark Rutland1-3/+6
AddressSanitizer instrumentation can significantly bloat the stack, and with GCC 7 this can result in stack overflows at boot time in some configurations. We can avoid this by doubling our stack size when KASAN is in use, as is already done on x86 (and has been since KASAN was introduced). Regardless of other patches to decrease KASAN's stack utilization, kernels built with KASAN will always require more stack space than those built without, and we should take this into account. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-29arm64: mm: Use READ_ONCE when dereferencing pointer to pte tableWill Deacon1-1/+1
On kernels built with support for transparent huge pages, different CPUs can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk and they must take care to use READ_ONCE to avoid value tearing or caching of stale values by the compiler. Unfortunately, these functions call into our pgtable macros, which don't use READ_ONCE, and compiler caching has been observed to cause the following crash during ext4 writeback: PC is at check_pte+0x20/0x170 LR is at page_vma_mapped_walk+0x2e0/0x540 [...] Process doio (pid: 2463, stack limit = 0xffff00000f2e8000) Call trace: [<ffff000008233328>] check_pte+0x20/0x170 [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540 [<ffff000008234adc>] page_mkclean_one+0xac/0x278 [<ffff000008234d98>] rmap_walk_file+0xf0/0x238 [<ffff000008236e74>] rmap_walk+0x64/0xa0 [<ffff0000082370c8>] page_mkclean+0x90/0xa8 [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8 [<ffff00000832f984>] mpage_submit_page+0x34/0x98 [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170 [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8 [<ffff00000833530c>] ext4_writepages+0x484/0xe30 [<ffff0000081f6ab4>] do_writepages+0x44/0xe8 [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110 [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8 [<ffff000008324310>] ext4_sync_file+0x80/0x4b8 [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0 [<ffff0000082332b4>] SyS_msync+0x194/0x1e8 This is because page_vma_mapped_walk loads the PMD twice before calling pte_offset_map: the first time without READ_ONCE (where it gets all zeroes due to a concurrent pmdp_invalidate) and the second time with READ_ONCE (where it sees a valid table pointer due to a concurrent pmd_populate). However, the compiler inlines everything and caches the first value in a register, which is subsequently used in pte_offset_phys which returns a junk pointer that is later dereferenced when attempting to access the relevant pte. This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure that a stale value is not used. Whilst this is a point fix for a known failure (and simple to backport), a full fix moving all of our page table accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in page_vma_mapped_walk is in the works for a future kernel release. Cc: Jon Masters <jcm@redhat.com> Cc: Timur Tabi <timur@codeaurora.org> Cc: <stable@vger.kernel.org> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-18arm64: relax assembly code alignment from 16 byte to 4 byteMasahiro Yamada1-2/+2
Aarch64 instructions must be word aligned. The current 16 byte alignment is more than enough. Relax it into 4 byte alignment. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-08Merge tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-5/+19
Pull KVM updates from Radim Krčmář: "First batch of KVM changes for 4.14 Common: - improve heuristic for boosting preempted spinlocks by ignoring VCPUs in user mode ARM: - fix for decoding external abort types from guests - added support for migrating the active priority of interrupts when running a GICv2 guest on a GICv3 host - minor cleanup PPC: - expose storage keys to userspace - merge kvm-ppc-fixes with a fix that missed 4.13 because of vacations - fixes s390: - merge of kvm/master to avoid conflicts with additional sthyi fixes - wire up the no-dat enhancements in KVM - multiple epoch facility (z14 feature) - Configuration z/Architecture Mode - more sthyi fixes - gdb server range checking fix - small code cleanups x86: - emulate Hyper-V TSC frequency MSRs - add nested INVPCID - emulate EPTP switching VMFUNC - support Virtual GIF - support 5 level page tables - speedup nested VM exits by packing byte operations - speedup MMIO by using hardware provided physical address - a lot of fixes and cleanups, especially nested" * tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) KVM: arm/arm64: Support uaccess of GICC_APRn KVM: arm/arm64: Extract GICv3 max APRn index calculation KVM: arm/arm64: vITS: Drop its_ite->lpi field KVM: arm/arm64: vgic: constify seq_operations and file_operations KVM: arm/arm64: Fix guest external abort matching KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fd KVM: s390: vsie: cleanup mcck reinjection KVM: s390: use WARN_ON_ONCE only for checking KVM: s390: guestdbg: fix range check KVM: PPC: Book3S HV: Report storage key support to userspace KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9 KVM: PPC: Book3S HV: Fix invalid use of register expression KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER KVM: PPC: e500mc: Fix a NULL dereference KVM: PPC: e500: Fix some NULL dereferences on error KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list KVM: s390: we are always in czam mode KVM: s390: expose no-DAT to guest and migration support KVM: s390: sthyi: remove invalid guest write access ...
2017-09-08Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpcRadim Krčmář2-4/+4
This fix was intended for 4.13, but didn't get in because both maintainers were on vacation. Paul Mackerras: "It adds mutual exclusion between list_add_rcu and list_del_rcu calls on the kvm->arch.spapr_tce_tables list. Without this, userspace could potentially trigger corruption of the list and cause a host crash or worse."
2017-09-07Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+3
Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Transparently fall back to other poweroff method(s) if EFI poweroff fails (and returns) - Use separate PE/COFF section headers for the RX and RW parts of the ARM stub loader so that the firmware can use strict mapping permissions - Add support for requesting the firmware to wipe RAM at warm reboot - Increase the size of the random seed obtained from UEFI so CRNG fast init can complete earlier - Update the EFI framebuffer address if it points to a BAR that gets moved by the PCI resource allocation code - Enable "reset attack mitigation" of TPM environments: this is enabled if the kernel is configured with CONFIG_RESET_ATTACK_MITIGATION=y. - Clang related fixes - Misc cleanups, constification, refactoring, etc" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/bgrt: Use efi_mem_type() efi: Move efi_mem_type() to common code efi/reboot: Make function pointer orig_pm_power_off static efi/random: Increase size of firmware supplied randomness efi/libstub: Enable reset attack mitigation firmware/efi/esrt: Constify attribute_group structures firmware/efi: Constify attribute_group structures firmware/dcdbas: Constify attribute_group structures arm/efi: Split zImage code and data into separate PE/COFF sections arm/efi: Replace open coded constants with symbolic ones arm/efi: Remove pointless dummy .reloc section arm/efi: Remove forbidden values from the PE/COFF header drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it efi/reboot: Fall back to original power-off method if EFI_RESET_SHUTDOWN returns efi/arm/arm64: Add missing assignment of efi.config_table efi/libstub/arm64: Set -fpie when building the EFI stub efi/libstub/arm64: Force 'hidden' visibility for section markers efi/libstub/arm64: Use hidden attribute for struct screen_info reference efi/arm: Don't mark ACPI reclaim memory as MEMBLOCK_NOMAP
2017-09-05Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds34-281/+475
Pull arm64 updates from Catalin Marinas: - VMAP_STACK support, allowing the kernel stacks to be allocated in the vmalloc space with a guard page for trapping stack overflows. One of the patches introduces THREAD_ALIGN and changes the generic alloc_thread_stack_node() to use this instead of THREAD_SIZE (no functional change for other architectures) - Contiguous PTE hugetlb support re-enabled (after being reverted a couple of times). We now have the semantics agreed in the generic mm layer together with API improvements so that the architecture code can detect between contiguous and non-contiguous huge PTEs - Initial support for persistent memory on ARM: DC CVAP instruction exposed to user space (HWCAP) and the in-kernel pmem API implemented - raid6 improvements for arm64: faster algorithm for the delta syndrome and implementation of the recovery routines using Neon - FP/SIMD refactoring and removal of support for Neon in interrupt context. This is in preparation for full SVE support - PTE accessors converted from inline asm to cmpxchg so that we can use LSE atomics if available (ARMv8.1) - Perf support for Cortex-A35 and A73 - Non-urgent fixes and cleanups * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (75 commits) arm64: cleanup {COMPAT_,}SET_PERSONALITY() macro arm64: introduce separated bits for mm_context_t flags arm64: hugetlb: Cleanup setup_hugepagesz arm64: Re-enable support for contiguous hugepages arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepages arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepages arm64: hugetlb: Handle swap entries in huge_pte_offset() for contiguous hugepages arm64: hugetlb: Add break-before-make logic for contiguous entries arm64: hugetlb: Spring clean huge pte accessors arm64: hugetlb: Introduce pte_pgprot helper arm64: hugetlb: set_huge_pte_at Add WARN_ON on !pte_present arm64: kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores arm64: dma-mapping: Mark atomic_pool as __ro_after_init arm64: dma-mapping: Do not pass data to gen_pool_set_algo() arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code paths arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect() arm64: Move PTE_RDONLY bit handling out of set_pte_at() kvm: arm64: Convert kvm_set_s2pte_readonly() from inline asm to cmpxchg() arm64: Convert pte handling from inline asm to using (cmp)xchg arm64: neon/efi: Make EFI fpsimd save/restore variables static ...
2017-09-05KVM: arm/arm64: Fix guest external abort matchingJames Morse1-5/+19
The ARM-ARM has two bits in the ESR/HSR relevant to external aborts. A range of {I,D}FSC values (of which bit 5 is always set) and bit 9 'EA' which provides: > an IMPLEMENTATION DEFINED classification of External Aborts. This bit is in addition to the {I,D}FSC range, and has an implementation defined meaning. KVM should always ignore this bit when handling external aborts from a guest. Remove the ESR_ELx_EA definition and rewrite its helper kvm_vcpu_dabt_isextabt() to check the {I,D}FSC range. This merges kvm_vcpu_dabt_isextabt() and the recently added is_abort_sea() helper. CC: Tyler Baicar <tbaicar@codeaurora.org> Reported-by: gengdongjiu <gengdj.1984@gmail.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-09-04Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-0/+7
Pull irq updates from Thomas Gleixner: "The interrupt subsystem delivers this time: - Refactoring of the GIC-V3 driver to prepare for the GIC-V4 support - Initial GIC-V4 support - Consolidation of the FSL MSI support - Utilize the effective affinity interface in various ARM irqchip drivers - Yet another interrupt chip driver (UniPhier AIDET) - Bulk conversion of the irq chip driver to use %pOF - The usual small fixes and improvements all over the place" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) irqchip/ls-scfg-msi: Add MSI affinity support irqchip/ls-scfg-msi: Add LS1043a v1.1 MSI support irqchip/ls-scfg-msi: Add LS1046a MSI support arm64: dts: ls1046a: Add MSI dts node arm64: dts: ls1043a: Share all MSIs arm: dts: ls1021a: Share all MSIs arm64: dts: ls1043a: Fix typo of MSI compatible string arm: dts: ls1021a: Fix typo of MSI compatible string irqchip/ls-scfg-msi: Fix typo of MSI compatible strings irqchip/irq-bcm7120-l2: Use correct I/O accessors for irq_fwd_mask irqchip/mmp: Make mmp_intc_conf const irqchip/gic: Make irq_chip const irqchip/gic-v3: Advertise GICv4 support to KVM irqchip/gic-v4: Enable low-level GICv4 operations irqchip/gic-v4: Add some basic documentation irqchip/gic-v4: Add VLPI configuration interface irqchip/gic-v4: Add VPE command interface irqchip/gic-v4: Add per-VM VPE domain creation irqchip/gic-v3-its: Set implementation defined bit to enable VLPIs irqchip/gic-v3-its: Allow doorbell interrupts to be injected/cleared ...
2017-09-04Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-31/+6
Pull locking updates from Ingo Molnar: - Add 'cross-release' support to lockdep, which allows APIs like completions, where it's not the 'owner' who releases the lock, to be tracked. It's all activated automatically under CONFIG_PROVE_LOCKING=y. - Clean up (restructure) the x86 atomics op implementation to be more readable, in preparation of KASAN annotations. (Dmitry Vyukov) - Fix static keys (Paolo Bonzini) - Add killable versions of down_read() et al (Kirill Tkhai) - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini) - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra) - Remove smp_mb__before_spinlock() and convert its usages, introduce smp_mb__after_spinlock() (Peter Zijlstra) * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits) locking/lockdep/selftests: Fix mixed read-write ABBA tests sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK() acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures smp: Avoid using two cache lines for struct call_single_data locking/lockdep: Untangle xhlock history save/restore from task independence locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being futex: Remove duplicated code and fix undefined behaviour Documentation/locking/atomic: Finish the document... locking/lockdep: Fix workqueue crossrelease annotation workqueue/lockdep: 'Fix' flush_work() annotation locking/lockdep/selftests: Add mixed read-write ABBA tests mm, locking/barriers: Clarify tlb_flush_pending() barriers locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive locking/lockdep: Explicitly initialize wq_barrier::done::map locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING locking/refcounts, x86/asm: Implement fast refcount overflow protection locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease ...
2017-09-04Merge branch 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-1/+6
Pull syscall updates from Ingo Molnar: "Improve the security of set_fs(): we now check the address limit on a number of key platforms (x86, arm, arm64) before returning to user-space - without adding overhead to the typical system call fast path" * 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/syscalls: Check address limit on user-mode return arm/syscalls: Check address limit on user-mode return x86/syscalls: Check address limit on user-mode return
2017-09-04Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-7/+0
Pull perf updates from Ingo Molnar: "Kernel side changes: - Add branch type profiling/tracing support. (Jin Yao) - Add the PERF_SAMPLE_PHYS_ADDR ABI to allow the tracing/profiling of physical memory addresses, where the PMU supports it. (Kan Liang) - Export some PMU capability details in the new /sys/bus/event_source/devices/cpu/caps/ sysfs directory. (Andi Kleen) - Aux data fixes and updates (Will Deacon) - kprobes fixes and updates (Masami Hiramatsu) - AMD uncore PMU driver fixes and updates (Janakarajan Natarajan) On the tooling side, here's a (limited!) list of highlights - there were many other changes that I could not list, see the shortlog and git history for details: UI improvements: - Implement a visual marker for fused x86 instructions in the annotate TUI browser, available now in 'perf report', more work needed to have it available as well in 'perf top' (Jin Yao) Further explanation from one of Jin's patches: │ ┌──cmpl $0x0,argp_program_version_hook 81.93 │ ├──je 20 │ │ lock cmpxchg %esi,0x38a9a4(%rip) │ │↓ jne 29 │ │↓ jmp 43 11.47 │20:└─→cmpxch %esi,0x38a999(%rip) That means the cmpl+je is a fused instruction pair and they should be considered together. - Record the branch type and then show statistics and info about in callchain entries (Jin Yao) Example from one of Jin's patches: # perf record -g -j any,save_type # perf report --branch-history --stdio --no-children 38.50% div.c:45 [.] main div | ---main div.c:42 (RET CROSS_2M cycles:2) compute_flag div.c:28 (cycles:2) compute_flag div.c:27 (RET CROSS_2M cycles:1) rand rand.c:28 (cycles:1) rand rand.c:28 (RET CROSS_2M cycles:1) __random random.c:298 (cycles:1) __random random.c:297 (COND_BWD CROSS_2M cycles:1) __random random.c:295 (cycles:1) __random random.c:295 (COND_BWD CROSS_2M cycles:1) __random random.c:295 (cycles:1) __random random.c:295 (RET CROSS_2M cycles:9) namespaces support: - Add initial support for namespaces, using setns to access files in namespaces, grabbing their build-ids, etc. (Krister Johansen) perf trace enhancements: - Beautify pkey_{alloc,free,mprotect} arguments in 'perf trace' (Arnaldo Carvalho de Melo) - Add initial 'clone' syscall args beautifier in 'perf trace' (Arnaldo Carvalho de Melo) - Ignore 'fd' and 'offset' args for MAP_ANONYMOUS in 'perf trace' (Arnaldo Carvalho de Melo) - Beautifiers for the 'cmd' arg of several ioctl types, including: sound, DRM, KVM, vhost virtio and perf_events. (Arnaldo Carvalho de Melo) - Add PERF_SAMPLE_CALLCHAIN and PERF_RECORD_MMAP[2] to 'perf data' CTF conversion, allowing CTF trace visualization tools to show callchains and to resolve symbols (Geneviève Bastien) - Beautify the fcntl syscall, which is an interesting one in the sense that infrastructure had to be put in place to change the formatters of some arguments according to the value in a previous one, i.e. cmd dictates how arg and the syscall return will be formatted. (Arnaldo Carvalho de Melo perf stat enhancements: - Use group read for event groups in 'perf stat', reducing overhead when groups are defined in the event specification, i.e. when using {} to enclose a list of events, asking them to be read at the same time, e.g.: "perf stat -e '{cycles,instructions}'" (Jiri Olsa) pipe mode improvements: - Process tracing data in 'perf annotate' pipe mode (David Carrillo-Cisneros) - Add header record types to pipe-mode, now this command: $ perf record -o - -e cycles sleep 1 | perf report --stdio --header Will show the same as in non-pipe mode, i.e. involving a perf.data file (David Carrillo-Cisneros) Vendor specific hardware event support updates/enhancements: - Update POWER9 vendor events tables (Sukadev Bhattiprolu) - Add POWER9 PMU events Sukadev (Bhattiprolu) - Support additional POWER8+ PVR in PMU mapfile (Shriya) - Add Skylake server uncore JSON vendor events (Andi Kleen) - Support exporting Intel PT data to sqlite3 with python perf scripts, this is in addition to the postgresql support that was already there (Adrian Hunter)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (253 commits) perf symbols: Fix plt entry calculation for ARM and AARCH64 perf probe: Fix kprobe blacklist checking condition perf/x86: Fix caps/ for !Intel perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR perf/core, pt, bts: Get rid of itrace_started perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments tools headers: Sync cpu features kernel ABI headers with tooling headers perf tools: Pass full path of FEATURES_DUMP perf tools: Robustify detection of clang binary tools lib: Allow external definition of CC, AR and LD perf tools: Allow external definition of flex and bison binary names tools build tests: Don't hardcode gcc name perf report: Group stat values on global event id perf values: Zero value buffers perf values: Fix allocation check perf values: Fix thread index bug perf report: Add dump_read function perf record: Set read_format for inherit_stat perf c2c: Fix remote HITM detection for Skylake perf tools: Fix static build with newer toolchains ...
2017-09-04Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-53/+5
Pull RCU updates from Ingo Molnad: "The main RCU related changes in this cycle were: - Removal of spin_unlock_wait() - SRCU updates - RCU torture-test updates - RCU Documentation updates - Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant - Miscellaneous RCU fixes - CPU-hotplug fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) arch: Remove spin_unlock_wait() arch-specific definitions locking: Remove spin_unlock_wait() generic definitions drivers/ata: Replace spin_unlock_wait() with lock/unlock pair ipc: Replace spin_unlock_wait() with lock/unlock pair exit: Replace spin_unlock_wait() with lock/unlock pair completion: Replace spin_unlock_wait() with lock/unlock pair doc: Set down RCU's scheduling-clock-interrupt needs doc: No longer allowed to use rcu_dereference on non-pointers doc: Add RCU files to docbook-generation files doc: Update memory-barriers.txt for read-to-write dependencies doc: Update RCU documentation membarrier: Provide expedited private command rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter() rcu: Add warning to rcu_idle_enter() for irqs enabled rcu: Make rcu_idle_enter() rely on callers disabling irqs rcu: Add assertions verifying blocked-tasks list rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit() rcu: Add TPS() protection for _rcu_barrier_trace strings rcu: Use idle versions of swait to make idle-hack clear swait: Add idle variants which don't contribute to load average ...
2017-09-04Merge branch 'linus' into locking/core, to fix up conflictsIngo Molnar1-6/+0
Conflicts: mm/page_alloc.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-31KVM: update to new mmu_notifier semantic v2Jérôme Glisse1-6/+0
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Changed since v1 (Linus Torvalds) - remove now useless kvm_arch_mmu_notifier_invalidate_page() Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Mike Galbraith <efault@gmx.de> Tested-by: Adam Borowski <kilobyte@angband.pl> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31irqchip/gic-v3-its: Add VPE interrupt maskingMarc Zyngier1-0/+2
When masking/unmasking a doorbell interrupt, it is necessary to issue an invalidation to the corresponding redistributor. We use the DirectLPI feature by writting directly to the corresponding redistributor. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-08-31irqchip/gic-v3-its: Add VPENDBASER/VPROPBASER accessorsMarc Zyngier1-0/+5
V{PEND,PROP}BASER being 64bit registers, they need some ad-hoc accessors on 32bit, specially given that VPENDBASER contains a Valid bit, making the access a bit convoluted. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-08-25futex: Remove duplicated code and fix undefined behaviourJiri Slaby1-22/+4
There is code duplicated over all architecture's headers for futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr, and comparison of the result. Remove this duplication and leave up to the arches only the needed assembly which is now in arch_futex_atomic_op_inuser. This effectively distributes the Will Deacon's arm64 fix for undefined behaviour reported by UBSAN to all architectures. The fix was done in commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump. And as suggested by Thomas, check for negative oparg too, because it was also reported to cause undefined behaviour report. Note that s390 removed access_ok check in d12a29703 ("s390/uaccess: remove pointless access_ok() checks") as access_ok there returns true. We introduce it back to the helper for the sake of simplicity (it gets optimized away anyway). Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64] Cc: linux-mips@linux-mips.org Cc: Rich Felker <dalias@libc.org> Cc: linux-ia64@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: peterz@infradead.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: sparclinux@vger.kernel.org Cc: Jonas Bonn <jonas@southpole.se> Cc: linux-s390@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: linux-hexagon@vger.kernel.org Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: linux-snps-arc@lists.infradead.org Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-xtensa@linux-xtensa.org Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: openrisc@lists.librecores.org Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Stafford Horne <shorne@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Richard Henderson <rth@twiddle.net> Cc: Chris Zankel <chris@zankel.net> Cc: Michal Simek <monstr@monstr.eu> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-parisc@vger.kernel.org Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: linux-alpha@vger.kernel.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
2017-08-25Merge branch 'linus' into locking/core, to pick up fixesIngo Molnar2-4/+4
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-24Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar2-4/+4
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-22arm64: cleanup {COMPAT_,}SET_PERSONALITY() macroYury Norov2-2/+3
There is some work that should be done after setting the personality. Currently it's done in the macro, which is not the best idea. In this patch new arch_setup_new_exec() routine is introduced, and all setup code is moved there, as suggested by Catalin: https://lkml.org/lkml/2017/8/4/494 Cc: Pratyush Anand <panand@redhat.com> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> [catalin.marinas@arm.com: comments changed or removed] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-22arm64: introduce separated bits for mm_context_t flagsYury Norov2-2/+4
Currently mm->context.flags field uses thread_info flags which is not the best idea for many reasons. For example, mm_context_t doesn't need most of thread_info flags. And it would be difficult to add new mm-related flag if needed because it may easily interfere with TIF ones. To deal with it, the new MMCF_AARCH32 flag is introduced for mm_context_t->flags, where MMCF prefix stands for mm_context_t flags. Also, mm_context_t flag doesn't require atomicity and ordering of the access, so using set/clear_bit() is replaced with simple masks. Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-22arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepagesPunit Agrawal1-0/+3
The default implementation of set_huge_swap_pte_at() does not support hugepages consisting of contiguous ptes. Override it to add support for contiguous hugepages. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Cc: David Woods <dwoods@mellanox.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-22arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepagesPunit Agrawal1-1/+5
The default huge_pte_clear() implementation does not clear contiguous page table entries when it encounters contiguous hugepages that are supported on arm64. Fix this by overriding the default implementation to clear all the entries associated with contiguous hugepages. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Cc: David Woods <dwoods@mellanox.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: kexec: have own crash_smp_send_stop() for crash dump for nonpanic coresHoeun Ryu1-1/+1
Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly version in panic path) introduced crash_smp_send_stop() which is a weak function and can be overridden by architecture codes to fix the side effect caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_ notifiers" option). ARM64 architecture uses the weak version function and the problem is that the weak function simply calls smp_send_stop() which makes other CPUs offline and takes away the chance to save crash information for nonpanic CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel option is enabled. Calling smp_send_crash_stop() in machine_crash_shutdown() is useless because all nonpanic CPUs are already offline by smp_send_stop() in this case and smp_send_crash_stop() only works against online CPUs. The result is that secondary CPUs registers are not saved by crash_save_cpu() and the vmcore file misreports these CPUs as being offline. crash_smp_send_stop() is implemented to fix this problem by replacing the existing smp_send_crash_stop() and adding a check for multiple calling to the function. The function (strong symbol version) saves crash information for nonpanic CPUs and machine_crash_shutdown() tries to save crash information for nonpanic CPUs only when crash_kexec_post_notifiers kernel option is disabled. * crash_kexec_post_notifiers : false panic() __crash_kexec() machine_crash_shutdown() crash_smp_send_stop() <= save crash dump for nonpanic cores * crash_kexec_post_notifiers : true panic() crash_smp_send_stop() <= save crash dump for nonpanic cores __crash_kexec() machine_crash_shutdown() crash_smp_send_stop() <= just return. Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code pathsCatalin Marinas1-8/+1
Since the pte handling for hardware AF/DBM works even when the hardware feature is not present, make the pte accessors implementation permanent and remove the corresponding #ifdefs. The Kconfig option is kept as it can still be used to disable the feature at the hardware level. Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()Catalin Marinas1-8/+13
ptep_set_wrprotect() is only called on CoW mappings which are private (!VM_SHARED) with the pte either read-only (!PTE_WRITE && PTE_RDONLY) or writable and software-dirty (PTE_WRITE && !PTE_RDONLY && PTE_DIRTY). There is no race with the hardware update of the dirty state: clearing of PTE_RDONLY when PTE_WRITE (a.k.a. PTE_DBM) is set. This patch removes the code setting the software PTE_DIRTY bit in ptep_set_wrprotect() as superfluous. A VM_WARN_ONCE is introduced in case the above logic is wrong or the core mm code changes its use of ptep_set_wrprotect(). Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: Move PTE_RDONLY bit handling out of set_pte_at()Catalin Marinas2-34/+18
Currently PTE_RDONLY is treated as a hardware only bit and not handled by the pte_mkwrite(), pte_wrprotect() or the user PAGE_* definitions. The set_pte_at() function is responsible for setting this bit based on the write permission or dirty state. This patch moves the PTE_RDONLY handling out of set_pte_at into the pte_mkwrite()/pte_wrprotect() functions. The PAGE_* definitions to need to be updated to explicitly include PTE_RDONLY when !PTE_WRITE. The patch also removes the redundant PAGE_COPY(_EXEC) definitions as they are identical to the corresponding PAGE_READONLY(_EXEC). Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21kvm: arm64: Convert kvm_set_s2pte_readonly() from inline asm to cmpxchg()Catalin Marinas1-12/+9
To take advantage of the LSE atomic instructions and also make the code cleaner, convert the kvm_set_s2pte_readonly() function to use the more generic cmpxchg(). Cc: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: Convert pte handling from inline asm to using (cmp)xchgCatalin Marinas1-38/+33
With the support for hardware updates of the access and dirty states, the following pte handling functions had to be implemented using exclusives: __ptep_test_and_clear_young(), ptep_get_and_clear(), ptep_set_wrprotect() and ptep_set_access_flags(). To take advantage of the LSE atomic instructions and also make the code cleaner, convert these pte functions to use the more generic cmpxchg()/xchg(). Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcuIngo Molnar1-53/+5
Pull RCU updates from Paul E. McKenney: - Removal of spin_unlock_wait() - SRCU updates - Torture-test updates - Documentation updates - Miscellaneous fixes - CPU-hotplug fixes - Miscellaneous non-RCU fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-21efi/libstub/arm64: Use hidden attribute for struct screen_info referenceArd Biesheuvel1-0/+3
To prevent the compiler from emitting absolute references to screen_info when building position independent code, redeclare the symbol with hidden visibility. Tested-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170818194947.19347-3-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-20Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-2/+2
Pull timer fixes from Thomas Gleixner: "A few small fixes for timer drivers: - Prevent infinite recursion in the arm architected timer driver with ftrace - Propagate error codes to the caller in case of failure in EM STI driver - Adjust a bogus loop iteration in the arm architected timer driver - Add a missing Kconfig dependency to the pistachio clocksource to prevent build failures - Correctly check for IS_ERR() instead of NULL in the shared timer-of code" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is enabled clocksource/drivers/Kconfig: Fix CLKSRC_PISTACHIO dependencies clocksource/drivers/timer-of: Checking for IS_ERR() instead of NULL clocksource/drivers/em_sti: Fix error return codes in em_sti_probe() clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization
2017-08-18mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changesKees Cook1-2/+2
Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000 broke AddressSanitizer. This is a partial revert of: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") The AddressSanitizer tool has hard-coded expectations about where executable mappings are loaded. The motivation for changing the PIE base in the above commits was to avoid the Stack-Clash CVEs that allowed executable mappings to get too close to heap and stack. This was mainly a problem on 32-bit, but the 64-bit bases were moved too, in an effort to proactively protect those systems (proofs of concept do exist that show 64-bit collisions, but other recent changes to fix stack accounting and setuid behaviors will minimize the impact). The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC base), so only the 64-bit PIE base needs to be reverted to let x86 and arm64 ASan binaries run again. Future changes to the 64-bit PIE base on these architectures can be made optional once a more dynamic method for dealing with AddressSanitizer is found. (e.g. always loading PIE into the mmap region for marked binaries.) Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") Fixes: 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Kostya Serebryany <kcc@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18Merge branch 'for-next/kernel-mode-neon' into for-next/coreCatalin Marinas6-75/+75
* for-next/kernel-mode-neon: arm64: neon/efi: Make EFI fpsimd save/restore variables static arm64: neon: Forbid when irqs are disabled arm64: neon: Export kernel_neon_busy to loadable modules arm64: neon: Temporarily add a kernel_mode_begin_partial() definition arm64: neon: Remove support for nested or hardirq kernel-mode NEON arm64: neon: Allow EFI runtime services to use FPSIMD in irq context arm64: fpsimd: Consistently use __this_cpu_ ops where appropriate arm64: neon: Add missing header guard in <asm/neon.h> arm64: neon: replace generic definition of may_use_simd()
2017-08-17arch: Remove spin_unlock_wait() arch-specific definitionsPaul E. McKenney1-53/+5
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait() for all architectures providing them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-arch@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Boqun Feng <boqun.feng@gmail.com>
2017-08-15Merge branch 'arm64/vmap-stack' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux into for-next/coreCatalin Marinas9-48/+164
* 'arm64/vmap-stack' of git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux: arm64: add VMAP_STACK overflow detection arm64: add on_accessible_stack() arm64: add basic VMAP_STACK support arm64: use an irq stack pointer arm64: assembler: allow adr_this_cpu to use the stack pointer arm64: factor out entry stack manipulation efi/arm64: add EFI_KIMG_ALIGN arm64: move SEGMENT_ALIGN to <asm/memory.h> arm64: clean up irq stack definitions arm64: clean up THREAD_* definitions arm64: factor out PAGE_* and CONT_* definitions arm64: kernel: remove {THREAD,IRQ_STACK}_START_SP fork: allow arch-override of VMAP stack alignment arm64: remove __die()'s stack dump
2017-08-15arm64: add VMAP_STACK overflow detectionMark Rutland2-0/+18
This patch adds stack overflow detection to arm64, usable when vmap'd stacks are in use. Overflow is detected in a small preamble executed for each exception entry, which checks whether there is enough space on the current stack for the general purpose registers to be saved. If there is not enough space, the overflow handler is invoked on a per-cpu overflow stack. This approach preserves the original exception information in ESR_EL1 (and where appropriate, FAR_EL1). Task and IRQ stacks are aligned to double their size, enabling overflow to be detected with a single bit test. For example, a 16K stack is aligned to 32K, ensuring that bit 14 of the SP must be zero. On an overflow (or underflow), this bit is flipped. Thus, overflow (of less than the size of the stack) can be detected by testing whether this bit is set. The overflow check is performed before any attempt is made to access the stack, avoiding recursive faults (and the loss of exception information these would entail). As logical operations cannot be performed on the SP directly, the SP is temporarily swapped with a general purpose register using arithmetic operations to enable the test to be performed. This gives us a useful error message on stack overflow, as can be trigger with the LKDTM overflow test: [ 305.388749] lkdtm: Performing direct entry OVERFLOW [ 305.395444] Insufficient stack space to handle exception! [ 305.395482] ESR: 0x96000047 -- DABT (current EL) [ 305.399890] FAR: 0xffff00000a5e7f30 [ 305.401315] Task stack: [0xffff00000a5e8000..0xffff00000a5ec000] [ 305.403815] IRQ stack: [0xffff000008000000..0xffff000008004000] [ 305.407035] Overflow stack: [0xffff80003efce4e0..0xffff80003efcf4e0] [ 305.409622] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5 [ 305.412785] Hardware name: linux,dummy-virt (DT) [ 305.415756] task: ffff80003d051c00 task.stack: ffff00000a5e8000 [ 305.419221] PC is at recursive_loop+0x10/0x48 [ 305.421637] LR is at recursive_loop+0x38/0x48 [ 305.423768] pc : [<ffff00000859f330>] lr : [<ffff00000859f358>] pstate: 40000145 [ 305.428020] sp : ffff00000a5e7f50 [ 305.430469] x29: ffff00000a5e8350 x28: ffff80003d051c00 [ 305.433191] x27: ffff000008981000 x26: ffff000008f80400 [ 305.439012] x25: ffff00000a5ebeb8 x24: ffff00000a5ebeb8 [ 305.440369] x23: ffff000008f80138 x22: 0000000000000009 [ 305.442241] x21: ffff80003ce65000 x20: ffff000008f80188 [ 305.444552] x19: 0000000000000013 x18: 0000000000000006 [ 305.446032] x17: 0000ffffa2601280 x16: ffff0000081fe0b8 [ 305.448252] x15: ffff000008ff546d x14: 000000000047a4c8 [ 305.450246] x13: ffff000008ff7872 x12: 0000000005f5e0ff [ 305.452953] x11: ffff000008ed2548 x10: 000000000005ee8d [ 305.454824] x9 : ffff000008545380 x8 : ffff00000a5e8770 [ 305.457105] x7 : 1313131313131313 x6 : 00000000000000e1 [ 305.459285] x5 : 0000000000000000 x4 : 0000000000000000 [ 305.461781] x3 : 0000000000000000 x2 : 0000000000000400 [ 305.465119] x1 : 0000000000000013 x0 : 0000000000000012 [ 305.467724] Kernel panic - not syncing: kernel stack overflow [ 305.470561] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5 [ 305.473325] Hardware name: linux,dummy-virt (DT) [ 305.475070] Call trace: [ 305.476116] [<ffff000008088ad8>] dump_backtrace+0x0/0x378 [ 305.478991] [<ffff000008088e64>] show_stack+0x14/0x20 [ 305.481237] [<ffff00000895a178>] dump_stack+0x98/0xb8 [ 305.483294] [<ffff0000080c3288>] panic+0x118/0x280 [ 305.485673] [<ffff0000080c2e9c>] nmi_panic+0x6c/0x70 [ 305.486216] [<ffff000008089710>] handle_bad_stack+0x118/0x128 [ 305.486612] Exception stack(0xffff80003efcf3a0 to 0xffff80003efcf4e0) [ 305.487334] f3a0: 0000000000000012 0000000000000013 0000000000000400 0000000000000000 [ 305.488025] f3c0: 0000000000000000 0000000000000000 00000000000000e1 1313131313131313 [ 305.488908] f3e0: ffff00000a5e8770 ffff000008545380 000000000005ee8d ffff000008ed2548 [ 305.489403] f400: 0000000005f5e0ff ffff000008ff7872 000000000047a4c8 ffff000008ff546d [ 305.489759] f420: ffff0000081fe0b8 0000ffffa2601280 0000000000000006 0000000000000013 [ 305.490256] f440: ffff000008f80188 ffff80003ce65000 0000000000000009 ffff000008f80138 [ 305.490683] f460: ffff00000a5ebeb8 ffff00000a5ebeb8 ffff000008f80400 ffff000008981000 [ 305.491051] f480: ffff80003d051c00 ffff00000a5e8350 ffff00000859f358 ffff00000a5e7f50 [ 305.491444] f4a0: ffff00000859f330 0000000040000145 0000000000000000 0000000000000000 [ 305.492008] f4c0: 0001000000000000 0000000000000000 ffff00000a5e8350 ffff00000859f330 [ 305.493063] [<ffff00000808205c>] __bad_stack+0x88/0x8c [ 305.493396] [<ffff00000859f330>] recursive_loop+0x10/0x48 [ 305.493731] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494088] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494425] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494649] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494898] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495205] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495453] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495708] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496000] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496302] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496644] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496894] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.497138] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.497325] [<ffff00000859f3dc>] lkdtm_OVERFLOW+0x14/0x20 [ 305.497506] [<ffff00000859f314>] lkdtm_do_action+0x1c/0x28 [ 305.497786] [<ffff00000859f178>] direct_entry+0xe0/0x170 [ 305.498095] [<ffff000008345568>] full_proxy_write+0x60/0xa8 [ 305.498387] [<ffff0000081fb7f4>] __vfs_write+0x1c/0x128 [ 305.498679] [<ffff0000081fcc68>] vfs_write+0xa0/0x1b0 [ 305.498926] [<ffff0000081fe0fc>] SyS_write+0x44/0xa0 [ 305.499182] Exception stack(0xffff00000a5ebec0 to 0xffff00000a5ec000) [ 305.499429] bec0: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0 [ 305.499674] bee0: 574f4c465245564f 0000000000000000 0000000000000000 8000000080808080 [ 305.499904] bf00: 0000000000000040 0000000000000038 fefefeff1b4bc2ff 7f7f7f7f7f7fff7f [ 305.500189] bf20: 0101010101010101 0000000000000000 000000000047a4c8 0000000000000038 [ 305.500712] bf40: 0000000000000000 0000ffffa2601280 0000ffffc63f6068 00000000004b5000 [ 305.501241] bf60: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0 [ 305.501791] bf80: 0000000000000020 0000000000000000 00000000004b5000 000000001c4cc458 [ 305.502314] bfa0: 0000000000000000 0000ffffc63f7950 000000000040a3c4 0000ffffc63f70e0 [ 305.502762] bfc0: 0000ffffa2601268 0000000080000000 0000000000000001 0000000000000040 [ 305.503207] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 305.503680] [<ffff000008082fb0>] el0_svc_naked+0x24/0x28 [ 305.504720] Kernel Offset: disabled [ 305.505189] CPU features: 0x002082 [ 305.505473] Memory Limit: none [ 305.506181] ---[ end Kernel panic - not syncing: kernel stack overflow This patch was co-authored by Ard Biesheuvel and Mark Rutland. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: add on_accessible_stack()Mark Rutland1-0/+16
Both unwind_frame() and dump_backtrace() try to check whether a stack address is sane to access, with very similar logic. Both will need updating in order to handle overflow stacks. Factor out this logic into a helper, so that we can avoid further duplication when we add overflow stacks. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: add basic VMAP_STACK supportMark Rutland2-2/+28
This patch enables arm64 to be built with vmap'd task and IRQ stacks. As vmap'd stacks are mapped at page granularity, stacks must be a multiple of PAGE_SIZE. This means that a 64K page kernel must use stacks of at least 64K in size. To minimize the increase in Image size, IRQ stacks are dynamically allocated at boot time, rather than embedding the boot CPU's IRQ stack in the kernel image. This patch was co-authored by Ard Biesheuvel and Mark Rutland. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: use an irq stack pointerMark Rutland1-2/+5
We allocate our IRQ stacks using a percpu array. This allows us to generate our IRQ stack pointers with adr_this_cpu, but bloats the kernel Image with the boot CPU's IRQ stack. Additionally, these are packed with other percpu variables, and aren't guaranteed to have guard pages. When we enable VMAP_STACK we'll want to vmap our IRQ stacks also, in order to provide guard pages and to permit more stringent alignment requirements. Doing so will require that we use a percpu pointer to each IRQ stack, rather than allocating a percpu IRQ stack in the kernel image. This patch updates our IRQ stack code to use a percpu pointer to the base of each IRQ stack. This will allow us to change the way the stack is allocated with minimal changes elsewhere. In some cases we may try to backtrace before the IRQ stack pointers are initialised, so on_irq_stack() is updated to account for this. In testing with cyclictest, there was no measureable difference between using adr_this_cpu (for irq_stack) and ldr_this_cpu (for irq_stack_ptr) in the IRQ entry path. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: assembler: allow adr_this_cpu to use the stack pointerArd Biesheuvel1-1/+7
Given that adr_this_cpu already requires a temp register in addition to the destination register, tweak the instruction sequence so that sp may be used as well. This will simplify switching to per-cpu stacks in subsequent patches. While this limits the range of adr_this_cpu, to +/-4GiB, we don't currently use adr_this_cpu in modules, and this is not problematic for the main kernel image. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Mark: add more commit text] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15efi/arm64: add EFI_KIMG_ALIGNMark Rutland1-0/+3
The EFI stub is intimately coupled with the kernel, and takes advantage of this by relocating the kernel at a weaker alignment than the documented boot protocol mandates. However, it does so by assuming it can align the kernel to the segment alignment, and assumes that this is 64K. In subsequent patches, we'll have to consider other details to determine this de-facto alignment constraint. This patch adds a new EFI_KIMG_ALIGN definition that will track the kernel's de-facto alignment requirements. Subsequent patches will modify this as required. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk>
2017-08-15arm64: move SEGMENT_ALIGN to <asm/memory.h>Mark Rutland1-0/+19
Currently we define SEGMENT_ALIGN directly in our vmlinux.lds.S. This is unfortunate, as the EFI stub currently open-codes the same number, and in future we'll want to fiddle with this. This patch moves the definition to our <asm/memory.h>, where it can be used by both vmlinux.lds.S and the EFI stub code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: clean up irq stack definitionsMark Rutland3-25/+26
Before we add yet another stack to the kernel, it would be nice to ensure that we consistently organise stack definitions and related helper functions. This patch moves the basic IRQ stack defintions to <asm/memory.h> to live with their task stack counterparts. Helpers used for unwinding are moved into <asm/stacktrace.h>, where subsequent patches will add helpers for other stacks. Includes are fixed up accordingly. This patch is a pure refactoring -- there should be no functional changes as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: clean up THREAD_* definitionsMark Rutland2-8/+9
Currently we define THREAD_SIZE and THREAD_SIZE_ORDER separately, with the latter dependent on particular CONFIG_ARM64_*K_PAGES definitions. This is somewhat opaque, and will get in the way of future modifications to THREAD_SIZE. This patch cleans this up, defining both in terms of a common THREAD_SHIFT, and using PAGE_SHIFT to calculate THREAD_SIZE_ORDER, rather than using a number of definitions dependent on config symbols. Subsequent patches will make use of this to alter the stack size used in some configurations. At the same time, these are moved into <asm/memory.h>, which will avoid circular include issues in subsequent patches. To ensure that existing code isn't adversely affected, <asm/thread_info.h> is updated to transitively include these definitions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: factor out PAGE_* and CONT_* definitionsMark Rutland3-11/+36
Some headers rely on PAGE_* definitions from <asm/page.h>, but cannot include this due to potential circular includes. For example, a number of definitions in <asm/memory.h> rely on PAGE_SHIFT, and <asm/page.h> includes <asm/memory.h>. This requires users of these definitions to include both headers, which is fragile and error-prone. This patch ameliorates matters by moving the basic definitions out to a new header, <asm/page-def.h>. Both <asm/page.h> and <asm/memory.h> are updated to include this, avoiding this fragility, and avoiding the possibility of circular include dependencies. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: kernel: remove {THREAD,IRQ_STACK}_START_SPArd Biesheuvel3-5/+3
For historical reasons, we leave the top 16 bytes of our task and IRQ stacks unused, a practice used to ensure that the SP can always be masked to find the base of the current stack (historically, where thread_info could be found). However, this is not necessary, as: * When an exception is taken from a task stack, we decrement the SP by S_FRAME_SIZE and stash the exception registers before we compare the SP against the task stack. In such cases, the SP must be at least S_FRAME_SIZE below the limit, and can be safely masked to determine whether the task stack is in use. * When transitioning to an IRQ stack, we'll place a dummy frame onto the IRQ stack before enabling asynchronous exceptions, or executing code we expect to trigger faults. Thus, if an exception is taken from the IRQ stack, the SP must be at least 16 bytes below the limit. * We no longer mask the SP to find the thread_info, which is now found via sp_el0. Note that historically, the offset was critical to ensure that cpu_switch_to() found the correct stack for new threads that hadn't yet executed ret_from_fork(). Given that, this initial offset serves no purpose, and can be removed. This brings us in-line with other architectures (e.g. x86) which do not rely on this masking. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Mark: rebase, kill THREAD_START_SP, commit msg additions] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: numa: Remove the unused parent_node() macroDou Liyang1-3/+0
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in ARM64 platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-13Merge branch 'clockevents/4.13-fixes' of http://git.linaro.org/people/daniel.lezcano/linux into timers/urgentIngo Molnar1-2/+2
Pull clockevents fixes from Daniel Lezcano: " - Fix error check against IS_ERR() instead of NULL for the timer-of code (Dan Carpenter) - Fix infinite recusion with ftrace for the ARM architected timer (Ding Tianhong) - Fix the error code return in the em_sti's probe function (Gustavo A. R. Silva) - Fix Kconfig dependency for the pistachio driver (Matt Redfearn) - Fix mem frame loop initialization for the ARM architected timer (Matthias Kaehlcke)" Signed-off-by: Ingo Molnar <mingo@kernel.org>