aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm64/include (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2017-10-04arm64: Use larger stacks when KASAN is selectedMark Rutland1-3/+6
AddressSanitizer instrumentation can significantly bloat the stack, and with GCC 7 this can result in stack overflows at boot time in some configurations. We can avoid this by doubling our stack size when KASAN is in use, as is already done on x86 (and has been since KASAN was introduced). Regardless of other patches to decrease KASAN's stack utilization, kernels built with KASAN will always require more stack space than those built without, and we should take this into account. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-29arm64: mm: Use READ_ONCE when dereferencing pointer to pte tableWill Deacon1-1/+1
On kernels built with support for transparent huge pages, different CPUs can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk and they must take care to use READ_ONCE to avoid value tearing or caching of stale values by the compiler. Unfortunately, these functions call into our pgtable macros, which don't use READ_ONCE, and compiler caching has been observed to cause the following crash during ext4 writeback: PC is at check_pte+0x20/0x170 LR is at page_vma_mapped_walk+0x2e0/0x540 [...] Process doio (pid: 2463, stack limit = 0xffff00000f2e8000) Call trace: [<ffff000008233328>] check_pte+0x20/0x170 [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540 [<ffff000008234adc>] page_mkclean_one+0xac/0x278 [<ffff000008234d98>] rmap_walk_file+0xf0/0x238 [<ffff000008236e74>] rmap_walk+0x64/0xa0 [<ffff0000082370c8>] page_mkclean+0x90/0xa8 [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8 [<ffff00000832f984>] mpage_submit_page+0x34/0x98 [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170 [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8 [<ffff00000833530c>] ext4_writepages+0x484/0xe30 [<ffff0000081f6ab4>] do_writepages+0x44/0xe8 [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110 [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8 [<ffff000008324310>] ext4_sync_file+0x80/0x4b8 [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0 [<ffff0000082332b4>] SyS_msync+0x194/0x1e8 This is because page_vma_mapped_walk loads the PMD twice before calling pte_offset_map: the first time without READ_ONCE (where it gets all zeroes due to a concurrent pmdp_invalidate) and the second time with READ_ONCE (where it sees a valid table pointer due to a concurrent pmd_populate). However, the compiler inlines everything and caches the first value in a register, which is subsequently used in pte_offset_phys which returns a junk pointer that is later dereferenced when attempting to access the relevant pte. This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure that a stale value is not used. Whilst this is a point fix for a known failure (and simple to backport), a full fix moving all of our page table accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in page_vma_mapped_walk is in the works for a future kernel release. Cc: Jon Masters <jcm@redhat.com> Cc: Timur Tabi <timur@codeaurora.org> Cc: <stable@vger.kernel.org> Fixes: f27176cfc363 ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()") Tested-by: Richard Ruigrok <rruigrok@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-18arm64: relax assembly code alignment from 16 byte to 4 byteMasahiro Yamada1-2/+2
Aarch64 instructions must be word aligned. The current 16 byte alignment is more than enough. Relax it into 4 byte alignment. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-09-05KVM: arm/arm64: Fix guest external abort matchingJames Morse1-5/+19
The ARM-ARM has two bits in the ESR/HSR relevant to external aborts. A range of {I,D}FSC values (of which bit 5 is always set) and bit 9 'EA' which provides: > an IMPLEMENTATION DEFINED classification of External Aborts. This bit is in addition to the {I,D}FSC range, and has an implementation defined meaning. KVM should always ignore this bit when handling external aborts from a guest. Remove the ESR_ELx_EA definition and rewrite its helper kvm_vcpu_dabt_isextabt() to check the {I,D}FSC range. This merges kvm_vcpu_dabt_isextabt() and the recently added is_abort_sea() helper. CC: Tyler Baicar <tbaicar@codeaurora.org> Reported-by: gengdongjiu <gengdj.1984@gmail.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-08-31KVM: update to new mmu_notifier semantic v2Jérôme Glisse1-6/+0
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Changed since v1 (Linus Torvalds) - remove now useless kvm_arch_mmu_notifier_invalidate_page() Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Tested-by: Mike Galbraith <efault@gmx.de> Tested-by: Adam Borowski <kilobyte@angband.pl> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31irqchip/gic-v3-its: Add VPE interrupt maskingMarc Zyngier1-0/+2
When masking/unmasking a doorbell interrupt, it is necessary to issue an invalidation to the corresponding redistributor. We use the DirectLPI feature by writting directly to the corresponding redistributor. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-08-31irqchip/gic-v3-its: Add VPENDBASER/VPROPBASER accessorsMarc Zyngier1-0/+5
V{PEND,PROP}BASER being 64bit registers, they need some ad-hoc accessors on 32bit, specially given that VPENDBASER contains a Valid bit, making the access a bit convoluted. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-08-25futex: Remove duplicated code and fix undefined behaviourJiri Slaby1-22/+4
There is code duplicated over all architecture's headers for futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr, and comparison of the result. Remove this duplication and leave up to the arches only the needed assembly which is now in arch_futex_atomic_op_inuser. This effectively distributes the Will Deacon's arm64 fix for undefined behaviour reported by UBSAN to all architectures. The fix was done in commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump. And as suggested by Thomas, check for negative oparg too, because it was also reported to cause undefined behaviour report. Note that s390 removed access_ok check in d12a29703 ("s390/uaccess: remove pointless access_ok() checks") as access_ok there returns true. We introduce it back to the helper for the sake of simplicity (it gets optimized away anyway). Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64] Cc: linux-mips@linux-mips.org Cc: Rich Felker <dalias@libc.org> Cc: linux-ia64@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: peterz@infradead.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: sparclinux@vger.kernel.org Cc: Jonas Bonn <jonas@southpole.se> Cc: linux-s390@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: linux-hexagon@vger.kernel.org Cc: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: linux-snps-arc@lists.infradead.org Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-xtensa@linux-xtensa.org Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: openrisc@lists.librecores.org Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Stafford Horne <shorne@gmail.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Richard Henderson <rth@twiddle.net> Cc: Chris Zankel <chris@zankel.net> Cc: Michal Simek <monstr@monstr.eu> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-parisc@vger.kernel.org Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: linux-alpha@vger.kernel.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Cc: "David S. Miller" <davem@davemloft.net> Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
2017-08-22arm64: cleanup {COMPAT_,}SET_PERSONALITY() macroYury Norov2-2/+3
There is some work that should be done after setting the personality. Currently it's done in the macro, which is not the best idea. In this patch new arch_setup_new_exec() routine is introduced, and all setup code is moved there, as suggested by Catalin: https://lkml.org/lkml/2017/8/4/494 Cc: Pratyush Anand <panand@redhat.com> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> [catalin.marinas@arm.com: comments changed or removed] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-22arm64: introduce separated bits for mm_context_t flagsYury Norov2-2/+4
Currently mm->context.flags field uses thread_info flags which is not the best idea for many reasons. For example, mm_context_t doesn't need most of thread_info flags. And it would be difficult to add new mm-related flag if needed because it may easily interfere with TIF ones. To deal with it, the new MMCF_AARCH32 flag is introduced for mm_context_t->flags, where MMCF prefix stands for mm_context_t flags. Also, mm_context_t flag doesn't require atomicity and ordering of the access, so using set/clear_bit() is replaced with simple masks. Signed-off-by: Yury Norov <ynorov@caviumnetworks.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-22arm64: hugetlb: Override set_huge_swap_pte_at() to support contiguous hugepagesPunit Agrawal1-0/+3
The default implementation of set_huge_swap_pte_at() does not support hugepages consisting of contiguous ptes. Override it to add support for contiguous hugepages. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Cc: David Woods <dwoods@mellanox.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-22arm64: hugetlb: Override huge_pte_clear() to support contiguous hugepagesPunit Agrawal1-1/+5
The default huge_pte_clear() implementation does not clear contiguous page table entries when it encounters contiguous hugepages that are supported on arm64. Fix this by overriding the default implementation to clear all the entries associated with contiguous hugepages. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Cc: David Woods <dwoods@mellanox.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: kexec: have own crash_smp_send_stop() for crash dump for nonpanic coresHoeun Ryu1-1/+1
Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly version in panic path) introduced crash_smp_send_stop() which is a weak function and can be overridden by architecture codes to fix the side effect caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_ notifiers" option). ARM64 architecture uses the weak version function and the problem is that the weak function simply calls smp_send_stop() which makes other CPUs offline and takes away the chance to save crash information for nonpanic CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel option is enabled. Calling smp_send_crash_stop() in machine_crash_shutdown() is useless because all nonpanic CPUs are already offline by smp_send_stop() in this case and smp_send_crash_stop() only works against online CPUs. The result is that secondary CPUs registers are not saved by crash_save_cpu() and the vmcore file misreports these CPUs as being offline. crash_smp_send_stop() is implemented to fix this problem by replacing the existing smp_send_crash_stop() and adding a check for multiple calling to the function. The function (strong symbol version) saves crash information for nonpanic CPUs and machine_crash_shutdown() tries to save crash information for nonpanic CPUs only when crash_kexec_post_notifiers kernel option is disabled. * crash_kexec_post_notifiers : false panic() __crash_kexec() machine_crash_shutdown() crash_smp_send_stop() <= save crash dump for nonpanic cores * crash_kexec_post_notifiers : true panic() crash_smp_send_stop() <= save crash dump for nonpanic cores __crash_kexec() machine_crash_shutdown() crash_smp_send_stop() <= just return. Signed-off-by: Hoeun Ryu <hoeun.ryu@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: Remove the !CONFIG_ARM64_HW_AFDBM alternative code pathsCatalin Marinas1-8/+1
Since the pte handling for hardware AF/DBM works even when the hardware feature is not present, make the pte accessors implementation permanent and remove the corresponding #ifdefs. The Kconfig option is kept as it can still be used to disable the feature at the hardware level. Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()Catalin Marinas1-8/+13
ptep_set_wrprotect() is only called on CoW mappings which are private (!VM_SHARED) with the pte either read-only (!PTE_WRITE && PTE_RDONLY) or writable and software-dirty (PTE_WRITE && !PTE_RDONLY && PTE_DIRTY). There is no race with the hardware update of the dirty state: clearing of PTE_RDONLY when PTE_WRITE (a.k.a. PTE_DBM) is set. This patch removes the code setting the software PTE_DIRTY bit in ptep_set_wrprotect() as superfluous. A VM_WARN_ONCE is introduced in case the above logic is wrong or the core mm code changes its use of ptep_set_wrprotect(). Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: Move PTE_RDONLY bit handling out of set_pte_at()Catalin Marinas2-34/+18
Currently PTE_RDONLY is treated as a hardware only bit and not handled by the pte_mkwrite(), pte_wrprotect() or the user PAGE_* definitions. The set_pte_at() function is responsible for setting this bit based on the write permission or dirty state. This patch moves the PTE_RDONLY handling out of set_pte_at into the pte_mkwrite()/pte_wrprotect() functions. The PAGE_* definitions to need to be updated to explicitly include PTE_RDONLY when !PTE_WRITE. The patch also removes the redundant PAGE_COPY(_EXEC) definitions as they are identical to the corresponding PAGE_READONLY(_EXEC). Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21kvm: arm64: Convert kvm_set_s2pte_readonly() from inline asm to cmpxchg()Catalin Marinas1-12/+9
To take advantage of the LSE atomic instructions and also make the code cleaner, convert the kvm_set_s2pte_readonly() function to use the more generic cmpxchg(). Cc: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21arm64: Convert pte handling from inline asm to using (cmp)xchgCatalin Marinas1-38/+33
With the support for hardware updates of the access and dirty states, the following pte handling functions had to be implemented using exclusives: __ptep_test_and_clear_young(), ptep_get_and_clear(), ptep_set_wrprotect() and ptep_set_access_flags(). To take advantage of the LSE atomic instructions and also make the code cleaner, convert these pte functions to use the more generic cmpxchg()/xchg(). Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-21efi/libstub/arm64: Use hidden attribute for struct screen_info referenceArd Biesheuvel1-0/+3
To prevent the compiler from emitting absolute references to screen_info when building position independent code, redeclare the symbol with hidden visibility. Tested-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20170818194947.19347-3-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-18mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changesKees Cook1-2/+2
Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000 broke AddressSanitizer. This is a partial revert of: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") The AddressSanitizer tool has hard-coded expectations about where executable mappings are loaded. The motivation for changing the PIE base in the above commits was to avoid the Stack-Clash CVEs that allowed executable mappings to get too close to heap and stack. This was mainly a problem on 32-bit, but the 64-bit bases were moved too, in an effort to proactively protect those systems (proofs of concept do exist that show 64-bit collisions, but other recent changes to fix stack accounting and setuid behaviors will minimize the impact). The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC base), so only the 64-bit PIE base needs to be reverted to let x86 and arm64 ASan binaries run again. Future changes to the 64-bit PIE base on these architectures can be made optional once a more dynamic method for dealing with AddressSanitizer is found. (e.g. always loading PIE into the mmap region for marked binaries.) Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") Fixes: 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Kostya Serebryany <kcc@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-17arch: Remove spin_unlock_wait() arch-specific definitionsPaul E. McKenney1-53/+5
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes the underlying arch-specific arch_spin_unlock_wait() for all architectures providing them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: <linux-arch@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Boqun Feng <boqun.feng@gmail.com>
2017-08-15arm64: add VMAP_STACK overflow detectionMark Rutland2-0/+18
This patch adds stack overflow detection to arm64, usable when vmap'd stacks are in use. Overflow is detected in a small preamble executed for each exception entry, which checks whether there is enough space on the current stack for the general purpose registers to be saved. If there is not enough space, the overflow handler is invoked on a per-cpu overflow stack. This approach preserves the original exception information in ESR_EL1 (and where appropriate, FAR_EL1). Task and IRQ stacks are aligned to double their size, enabling overflow to be detected with a single bit test. For example, a 16K stack is aligned to 32K, ensuring that bit 14 of the SP must be zero. On an overflow (or underflow), this bit is flipped. Thus, overflow (of less than the size of the stack) can be detected by testing whether this bit is set. The overflow check is performed before any attempt is made to access the stack, avoiding recursive faults (and the loss of exception information these would entail). As logical operations cannot be performed on the SP directly, the SP is temporarily swapped with a general purpose register using arithmetic operations to enable the test to be performed. This gives us a useful error message on stack overflow, as can be trigger with the LKDTM overflow test: [ 305.388749] lkdtm: Performing direct entry OVERFLOW [ 305.395444] Insufficient stack space to handle exception! [ 305.395482] ESR: 0x96000047 -- DABT (current EL) [ 305.399890] FAR: 0xffff00000a5e7f30 [ 305.401315] Task stack: [0xffff00000a5e8000..0xffff00000a5ec000] [ 305.403815] IRQ stack: [0xffff000008000000..0xffff000008004000] [ 305.407035] Overflow stack: [0xffff80003efce4e0..0xffff80003efcf4e0] [ 305.409622] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5 [ 305.412785] Hardware name: linux,dummy-virt (DT) [ 305.415756] task: ffff80003d051c00 task.stack: ffff00000a5e8000 [ 305.419221] PC is at recursive_loop+0x10/0x48 [ 305.421637] LR is at recursive_loop+0x38/0x48 [ 305.423768] pc : [<ffff00000859f330>] lr : [<ffff00000859f358>] pstate: 40000145 [ 305.428020] sp : ffff00000a5e7f50 [ 305.430469] x29: ffff00000a5e8350 x28: ffff80003d051c00 [ 305.433191] x27: ffff000008981000 x26: ffff000008f80400 [ 305.439012] x25: ffff00000a5ebeb8 x24: ffff00000a5ebeb8 [ 305.440369] x23: ffff000008f80138 x22: 0000000000000009 [ 305.442241] x21: ffff80003ce65000 x20: ffff000008f80188 [ 305.444552] x19: 0000000000000013 x18: 0000000000000006 [ 305.446032] x17: 0000ffffa2601280 x16: ffff0000081fe0b8 [ 305.448252] x15: ffff000008ff546d x14: 000000000047a4c8 [ 305.450246] x13: ffff000008ff7872 x12: 0000000005f5e0ff [ 305.452953] x11: ffff000008ed2548 x10: 000000000005ee8d [ 305.454824] x9 : ffff000008545380 x8 : ffff00000a5e8770 [ 305.457105] x7 : 1313131313131313 x6 : 00000000000000e1 [ 305.459285] x5 : 0000000000000000 x4 : 0000000000000000 [ 305.461781] x3 : 0000000000000000 x2 : 0000000000000400 [ 305.465119] x1 : 0000000000000013 x0 : 0000000000000012 [ 305.467724] Kernel panic - not syncing: kernel stack overflow [ 305.470561] CPU: 0 PID: 1219 Comm: sh Not tainted 4.13.0-rc3-00021-g9636aea #5 [ 305.473325] Hardware name: linux,dummy-virt (DT) [ 305.475070] Call trace: [ 305.476116] [<ffff000008088ad8>] dump_backtrace+0x0/0x378 [ 305.478991] [<ffff000008088e64>] show_stack+0x14/0x20 [ 305.481237] [<ffff00000895a178>] dump_stack+0x98/0xb8 [ 305.483294] [<ffff0000080c3288>] panic+0x118/0x280 [ 305.485673] [<ffff0000080c2e9c>] nmi_panic+0x6c/0x70 [ 305.486216] [<ffff000008089710>] handle_bad_stack+0x118/0x128 [ 305.486612] Exception stack(0xffff80003efcf3a0 to 0xffff80003efcf4e0) [ 305.487334] f3a0: 0000000000000012 0000000000000013 0000000000000400 0000000000000000 [ 305.488025] f3c0: 0000000000000000 0000000000000000 00000000000000e1 1313131313131313 [ 305.488908] f3e0: ffff00000a5e8770 ffff000008545380 000000000005ee8d ffff000008ed2548 [ 305.489403] f400: 0000000005f5e0ff ffff000008ff7872 000000000047a4c8 ffff000008ff546d [ 305.489759] f420: ffff0000081fe0b8 0000ffffa2601280 0000000000000006 0000000000000013 [ 305.490256] f440: ffff000008f80188 ffff80003ce65000 0000000000000009 ffff000008f80138 [ 305.490683] f460: ffff00000a5ebeb8 ffff00000a5ebeb8 ffff000008f80400 ffff000008981000 [ 305.491051] f480: ffff80003d051c00 ffff00000a5e8350 ffff00000859f358 ffff00000a5e7f50 [ 305.491444] f4a0: ffff00000859f330 0000000040000145 0000000000000000 0000000000000000 [ 305.492008] f4c0: 0001000000000000 0000000000000000 ffff00000a5e8350 ffff00000859f330 [ 305.493063] [<ffff00000808205c>] __bad_stack+0x88/0x8c [ 305.493396] [<ffff00000859f330>] recursive_loop+0x10/0x48 [ 305.493731] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494088] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494425] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494649] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.494898] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495205] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495453] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.495708] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496000] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496302] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496644] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.496894] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.497138] [<ffff00000859f358>] recursive_loop+0x38/0x48 [ 305.497325] [<ffff00000859f3dc>] lkdtm_OVERFLOW+0x14/0x20 [ 305.497506] [<ffff00000859f314>] lkdtm_do_action+0x1c/0x28 [ 305.497786] [<ffff00000859f178>] direct_entry+0xe0/0x170 [ 305.498095] [<ffff000008345568>] full_proxy_write+0x60/0xa8 [ 305.498387] [<ffff0000081fb7f4>] __vfs_write+0x1c/0x128 [ 305.498679] [<ffff0000081fcc68>] vfs_write+0xa0/0x1b0 [ 305.498926] [<ffff0000081fe0fc>] SyS_write+0x44/0xa0 [ 305.499182] Exception stack(0xffff00000a5ebec0 to 0xffff00000a5ec000) [ 305.499429] bec0: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0 [ 305.499674] bee0: 574f4c465245564f 0000000000000000 0000000000000000 8000000080808080 [ 305.499904] bf00: 0000000000000040 0000000000000038 fefefeff1b4bc2ff 7f7f7f7f7f7fff7f [ 305.500189] bf20: 0101010101010101 0000000000000000 000000000047a4c8 0000000000000038 [ 305.500712] bf40: 0000000000000000 0000ffffa2601280 0000ffffc63f6068 00000000004b5000 [ 305.501241] bf60: 0000000000000001 000000001c4cf5e0 0000000000000009 000000001c4cf5e0 [ 305.501791] bf80: 0000000000000020 0000000000000000 00000000004b5000 000000001c4cc458 [ 305.502314] bfa0: 0000000000000000 0000ffffc63f7950 000000000040a3c4 0000ffffc63f70e0 [ 305.502762] bfc0: 0000ffffa2601268 0000000080000000 0000000000000001 0000000000000040 [ 305.503207] bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 305.503680] [<ffff000008082fb0>] el0_svc_naked+0x24/0x28 [ 305.504720] Kernel Offset: disabled [ 305.505189] CPU features: 0x002082 [ 305.505473] Memory Limit: none [ 305.506181] ---[ end Kernel panic - not syncing: kernel stack overflow This patch was co-authored by Ard Biesheuvel and Mark Rutland. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: add on_accessible_stack()Mark Rutland1-0/+16
Both unwind_frame() and dump_backtrace() try to check whether a stack address is sane to access, with very similar logic. Both will need updating in order to handle overflow stacks. Factor out this logic into a helper, so that we can avoid further duplication when we add overflow stacks. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: add basic VMAP_STACK supportMark Rutland2-2/+28
This patch enables arm64 to be built with vmap'd task and IRQ stacks. As vmap'd stacks are mapped at page granularity, stacks must be a multiple of PAGE_SIZE. This means that a 64K page kernel must use stacks of at least 64K in size. To minimize the increase in Image size, IRQ stacks are dynamically allocated at boot time, rather than embedding the boot CPU's IRQ stack in the kernel image. This patch was co-authored by Ard Biesheuvel and Mark Rutland. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: use an irq stack pointerMark Rutland1-2/+5
We allocate our IRQ stacks using a percpu array. This allows us to generate our IRQ stack pointers with adr_this_cpu, but bloats the kernel Image with the boot CPU's IRQ stack. Additionally, these are packed with other percpu variables, and aren't guaranteed to have guard pages. When we enable VMAP_STACK we'll want to vmap our IRQ stacks also, in order to provide guard pages and to permit more stringent alignment requirements. Doing so will require that we use a percpu pointer to each IRQ stack, rather than allocating a percpu IRQ stack in the kernel image. This patch updates our IRQ stack code to use a percpu pointer to the base of each IRQ stack. This will allow us to change the way the stack is allocated with minimal changes elsewhere. In some cases we may try to backtrace before the IRQ stack pointers are initialised, so on_irq_stack() is updated to account for this. In testing with cyclictest, there was no measureable difference between using adr_this_cpu (for irq_stack) and ldr_this_cpu (for irq_stack_ptr) in the IRQ entry path. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: assembler: allow adr_this_cpu to use the stack pointerArd Biesheuvel1-1/+7
Given that adr_this_cpu already requires a temp register in addition to the destination register, tweak the instruction sequence so that sp may be used as well. This will simplify switching to per-cpu stacks in subsequent patches. While this limits the range of adr_this_cpu, to +/-4GiB, we don't currently use adr_this_cpu in modules, and this is not problematic for the main kernel image. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Mark: add more commit text] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15efi/arm64: add EFI_KIMG_ALIGNMark Rutland1-0/+3
The EFI stub is intimately coupled with the kernel, and takes advantage of this by relocating the kernel at a weaker alignment than the documented boot protocol mandates. However, it does so by assuming it can align the kernel to the segment alignment, and assumes that this is 64K. In subsequent patches, we'll have to consider other details to determine this de-facto alignment constraint. This patch adds a new EFI_KIMG_ALIGN definition that will track the kernel's de-facto alignment requirements. Subsequent patches will modify this as required. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk>
2017-08-15arm64: move SEGMENT_ALIGN to <asm/memory.h>Mark Rutland1-0/+19
Currently we define SEGMENT_ALIGN directly in our vmlinux.lds.S. This is unfortunate, as the EFI stub currently open-codes the same number, and in future we'll want to fiddle with this. This patch moves the definition to our <asm/memory.h>, where it can be used by both vmlinux.lds.S and the EFI stub code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: clean up irq stack definitionsMark Rutland3-25/+26
Before we add yet another stack to the kernel, it would be nice to ensure that we consistently organise stack definitions and related helper functions. This patch moves the basic IRQ stack defintions to <asm/memory.h> to live with their task stack counterparts. Helpers used for unwinding are moved into <asm/stacktrace.h>, where subsequent patches will add helpers for other stacks. Includes are fixed up accordingly. This patch is a pure refactoring -- there should be no functional changes as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: clean up THREAD_* definitionsMark Rutland2-8/+9
Currently we define THREAD_SIZE and THREAD_SIZE_ORDER separately, with the latter dependent on particular CONFIG_ARM64_*K_PAGES definitions. This is somewhat opaque, and will get in the way of future modifications to THREAD_SIZE. This patch cleans this up, defining both in terms of a common THREAD_SHIFT, and using PAGE_SHIFT to calculate THREAD_SIZE_ORDER, rather than using a number of definitions dependent on config symbols. Subsequent patches will make use of this to alter the stack size used in some configurations. At the same time, these are moved into <asm/memory.h>, which will avoid circular include issues in subsequent patches. To ensure that existing code isn't adversely affected, <asm/thread_info.h> is updated to transitively include these definitions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: factor out PAGE_* and CONT_* definitionsMark Rutland3-11/+36
Some headers rely on PAGE_* definitions from <asm/page.h>, but cannot include this due to potential circular includes. For example, a number of definitions in <asm/memory.h> rely on PAGE_SHIFT, and <asm/page.h> includes <asm/memory.h>. This requires users of these definitions to include both headers, which is fragile and error-prone. This patch ameliorates matters by moving the basic definitions out to a new header, <asm/page-def.h>. Both <asm/page.h> and <asm/memory.h> are updated to include this, avoiding this fragility, and avoiding the possibility of circular include dependencies. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: kernel: remove {THREAD,IRQ_STACK}_START_SPArd Biesheuvel3-5/+3
For historical reasons, we leave the top 16 bytes of our task and IRQ stacks unused, a practice used to ensure that the SP can always be masked to find the base of the current stack (historically, where thread_info could be found). However, this is not necessary, as: * When an exception is taken from a task stack, we decrement the SP by S_FRAME_SIZE and stash the exception registers before we compare the SP against the task stack. In such cases, the SP must be at least S_FRAME_SIZE below the limit, and can be safely masked to determine whether the task stack is in use. * When transitioning to an IRQ stack, we'll place a dummy frame onto the IRQ stack before enabling asynchronous exceptions, or executing code we expect to trigger faults. Thus, if an exception is taken from the IRQ stack, the SP must be at least 16 bytes below the limit. * We no longer mask the SP to find the thread_info, which is now found via sp_el0. Note that historically, the offset was critical to ensure that cpu_switch_to() found the correct stack for new threads that hadn't yet executed ret_from_fork(). Given that, this initial offset serves no purpose, and can be removed. This brings us in-line with other architectures (e.g. x86) which do not rely on this masking. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Mark: rebase, kill THREAD_START_SP, commit msg additions] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com>
2017-08-15arm64: numa: Remove the unused parent_node() macroDou Liyang1-3/+0
Commit a7be6e5a7f8d ("mm: drop useless local parameters of __register_one_node()") removes the last user of parent_node(). The parent_node() macro in ARM64 platform is unnecessary. Remove it for cleanup. Reported-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-11clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is enabledDing Tianhong1-2/+2
On platforms with an arch timer erratum workaround, it's possible for arch_timer_reg_read_stable() to recurse into itself when certain tracing options are enabled, leading to stack overflows and related problems. For example, when PREEMPT_TRACER and FUNCTION_GRAPH_TRACER are selected, it's possible to trigger this with: $ mount -t debugfs nodev /sys/kernel/debug/ $ echo function_graph > /sys/kernel/debug/tracing/current_tracer The problem is that in such cases, preempt_disable() instrumentation attempts to acquire a timestamp via trace_clock(), resulting in a call back to arch_timer_reg_read_stable(), and hence recursion. This patch changes arch_timer_reg_read_stable() to use preempt_{disable,enable}_notrace(), which avoids this. This problem is similar to the fixed by upstream commit 96b3d28bf4 ("sched/clock: Prevent tracing recursion in sched_clock_cpu()"). Fixes: 6acc71ccac71 ("arm64: arch_timer: Allows a CPU-specific erratum to only affect a subset of CPUs") Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2017-08-10arm64: compat: Remove leftover variable declarationKevin Brodsky1-2/+0
Commit a1d5ebaf8ccd ("arm64: big-endian: don't treat code as data when copying sigret code") moved the 32-bit sigreturn trampoline code from the aarch32_sigret_code array to kuser32.S. The commit removed the array definition from signal32.c, but not its declaration in signal32.h. Remove the leftover declaration. Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-10irq: Make the irqentry text section unconditionalMasami Hiramatsu1-7/+0
Generate irqentry and softirqentry text sections without any Kconfig dependencies. This will add extra sections, but there should be no performace impact. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Chris Zankel <chris@zankel.net> Cc: David S . Miller <davem@davemloft.net> Cc: Francis Deslauriers <francis.deslauriers@efficios.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: linux-arch@vger.kernel.org Cc: linux-cris-kernel@axis.com Cc: mathieu.desnoyers@efficios.com Link: http://lkml.kernel.org/r/150172789110.27216.3955739126693102122.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking: Remove smp_mb__before_spinlock()Peter Zijlstra1-9/+0
Now that there are no users of smp_mb__before_spinlock() left, remove it entirely. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10locking: Introduce smp_mb__after_spinlock()Peter Zijlstra1-0/+2
Since its inception, our understanding of ACQUIRE, esp. as applied to spinlocks, has changed somewhat. Also, I wonder if, with a simple change, we cannot make it provide more. The problem with the comment is that the STORE done by spin_lock isn't itself ordered by the ACQUIRE, and therefore a later LOAD can pass over it and cross with any prior STORE, rendering the default WMB insufficient (pointed out by Alan). Now, this is only really a problem on PowerPC and ARM64, both of which already defined smp_mb__before_spinlock() as a smp_mb(). At the same time, we can get a much stronger construct if we place that same barrier _inside_ the spin_lock(). In that case we upgrade the RCpc spinlock to an RCsc. That would make all schedule() calls fully transitive against one another. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will.deacon@arm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-09arm64: neon: Forbid when irqs are disabledDave Martin1-1/+3
Currently, may_use_simd() can return true if IRQs are disabled. If the caller goes ahead and calls kernel_neon_begin(), this can result in use of local_bh_enable() in an unsafe context. In particular, __efi_fpsimd_begin() may do this when calling EFI as part of system shutdown. This patch ensures that callers don't think they can use kernel_neon_begin() in such a context. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09arm64: unwind: remove sp from struct stackframeArd Biesheuvel1-1/+0
The unwind code sets the sp member of struct stackframe to 'frame pointer + 0x10' unconditionally, without regard for whether doing so produces a legal value. So let's simply remove it now that we have stopped using it anyway. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com>
2017-08-09arm64: unwind: reference pt_regs via embedded stack frameArd Biesheuvel3-25/+6
As it turns out, the unwind code is slightly broken, and probably has been for a while. The problem is in the dumping of the exception stack, which is intended to dump the contents of the pt_regs struct at each level in the call stack where an exception was taken and routed to a routine marked as __exception (which means its stack frame is right below the pt_regs struct on the stack). 'Right below the pt_regs struct' is ill defined, though: the unwind code assigns 'frame pointer + 0x10' to the .sp member of the stackframe struct at each level, and dump_backtrace() happily dereferences that as the pt_regs pointer when encountering an __exception routine. However, the actual size of the stack frame created by this routine (which could be one of many __exception routines we have in the kernel) is not known, and so frame.sp is pretty useless to figure out where struct pt_regs really is. So it seems the only way to ensure that we can find our struct pt_regs when walking the stack frames is to put it at a known fixed offset of the stack frame pointer that is passed to such __exception routines. The simplest way to do that is to put it inside pt_regs itself, which is the main change implemented by this patch. As a bonus, doing this allows us to get rid of a fair amount of cruft related to walking from one stack to the other, which is especially nice since we intend to introduce yet another stack for overflow handling once we add support for vmapped stacks. It also fixes an inconsistency where we only add a stack frame pointing to ELR_EL1 if we are executing from the IRQ stack but not when we are executing from the task stack. To consistly identify exceptions regs even in the presence of exceptions taken from entry code, we must check whether the next frame was created by entry text, rather than whether the current frame was crated by exception text. To avoid backtracing using PCs that fall in the idmap, or are controlled by userspace, we must explcitly zero the FP and LR in startup paths, and must ensure that the frame embedded in pt_regs is zeroed upon entry from EL0. To avoid these NULL entries showin in the backtrace, unwind_frame() is updated to avoid them. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [Mark: compare current frame against .entry.text, avoid bogus PCs] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com>
2017-08-09arm64: uaccess: Implement *_flushcache variantsRobin Murphy2-0/+16
Implement the set of copy functions with guarantees of a clean cache upon completion necessary to support the pmem driver. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09arm64: Implement pmem API supportRobin Murphy3-1/+9
Add a clean-to-point-of-persistence cache maintenance helper, and wire up the basic architectural support for the pmem driver based on it. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> [catalin.marinas@arm.com: move arch_*_pmem() functions to arch/arm64/mm/flush.c] [catalin.marinas@arm.com: change dmb(sy) to dmb(osh)] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09arm64: Handle trapped DC CVAPRobin Murphy1-1/+2
Cache clean to PoP is subject to the same access controls as to PoC, so if we are trapping userspace cache maintenance with SCTLR_EL1.UCI, we need to be prepared to handle it. To avoid getting into complicated fights with binutils about ARMv8.2 options, we'll just cheat and use the raw SYS instruction rather than the 'proper' DC alias. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09arm64: Expose DC CVAP to userspaceRobin Murphy2-0/+2
The ARMv8.2-DCPoP feature introduces persistent memory support to the architecture, by defining a point of persistence in the memory hierarchy, and a corresponding cache maintenance operation, DC CVAP. Expose the support via HWCAP and MRS emulation. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09arm64: Convert __inval_cache_range() to area-basedRobin Murphy1-0/+1
__inval_cache_range() is already the odd one out among our data cache maintenance routines as the only remaining range-based one; as we're going to want an invalidation routine to call from C code for the pmem API, let's tweak the prototype and name to bring it in line with the clean operations, and to make its relationship with __dma_inv_area() neatly mirror that of __clean_dcache_area_poc() and __dma_clean_area(). The loop clearing the early page tables gets mildly massaged in the process for the sake of consistency. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09arm64: mm: Fix set_memory_valid() declarationRobin Murphy1-1/+1
Clearly, set_memory_valid() has never been seen in the same room as its declaration... Whilst the type mismatch is such that kexec probably wasn't broken in practice, fix it to match the definition as it should. Fixes: 9b0aa14e3155 ("arm64: mm: add set_memory_valid()") Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-08arm64: unwind: disregard frame.sp when validating frame pointerArd Biesheuvel1-1/+9
Currently, when unwinding the call stack, we validate the frame pointer of each frame against frame.sp, whose value is not clearly defined, and which makes it more difficult to link stack frames together across different stacks. It is far better to simply check whether the frame pointer itself points into a valid stack. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com>
2017-08-08arm64: unwind: avoid percpu indirection for irq stackMark Rutland1-3/+3
Our IRQ_STACK_PTR() and on_irq_stack() helpers both take a cpu argument, used to generate a percpu address. In all cases, they are passed {raw_,}smp_processor_id(), so this parameter is redundant. Since {raw_,}smp_processor_id() use a percpu variable internally, this approach means we generate a percpu offset to find the current cpu, then use this to index an array of percpu offsets, which we then use to find the current CPU's IRQ stack pointer. Thus, most of the work is redundant. Instead, we can consistently use raw_cpu_ptr() to generate the CPU's irq_stack pointer by simply adding the percpu offset to the irq_stack address, which is simpler in both respects. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com>
2017-08-08arm64: move non-entry code out of .entry.textMark Rutland1-0/+11
Currently, cpu_switch_to and ret_from_fork both live in .entry.text, though neither form the critical path for an exception entry. In subsequent patches, we will require that code in .entry.text is part of the critical path for exception entry, for which we can assume certain properties (e.g. the presence of exception regs on the stack). Neither cpu_switch_to nor ret_from_fork will meet these requirements, so we must move them out of .entry.text. To ensure that neither are kprobed after being moved out of .entry.text, we must explicitly blacklist them, requiring a new NOKPROBE() asm helper. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com>