path: root/arch
Age | Commit message | Author | Files | Lines
2019-03-06 | perf/x86/intel: Implement support for TSX Force Abort | Peter Zijlstra (Intel) | 2 files, -3/+77
Skylake (and later) will receive a microcode update to address a TSX erratum. This microcode will, on execution of a TSX instruction (speculative or not), use (clobber) PMC3. This update will also provide a new MSR to change this behaviour, along with a CPUID bit to enumerate the presence of this new MSR. When the MSR gets set, the microcode will no longer use PMC3 but will Force Abort every TSX transaction (upon executing COMMIT). When TSX Force Abort (TFA) is allowed (the default), the MSR gets set when PMC3 gets scheduled and cleared when, after scheduling, PMC3 is unused. When TFA is not allowed, clear PMC3 from all constraints so that it will not get used. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
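A sketch of the scheduling-side MSR toggle this describes (illustrative only; the macro names and the tfa_shadow field are assumptions based on the text above, not quoted from the patch):

    /* Write the TFA MSR only when the cached (shadow) value changes. */
    static void intel_set_tfa(struct cpu_hw_events *cpuc, bool on)
    {
            u64 val = on ? MSR_TFA_RTM_FORCE_ABORT : 0;

            if (cpuc->tfa_shadow != val) {
                    cpuc->tfa_shadow = val;
                    wrmsrl(MSR_TSX_FORCE_ABORT, val);
            }
    }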
2019-03-06 | x86: Add TSX Force Abort CPUID/MSR | Peter Zijlstra (Intel) | 2 files, -0/+7
Skylake systems will receive a microcode update to address a TSX erratum. This microcode will (by default) clobber PMC3 when TSX instructions are (speculatively or not) executed. It also provides an MSR to cause all TSX transactions to abort and preserve PMC3. Add the CPUID enumeration and MSR definition. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
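The additions amount to a feature flag and two MSR constants; roughly (the exact encodings below are stated as assumptions, not quoted from the patch):

    /* CPUID.(EAX=7,ECX=0):EDX[13]: TSX_FORCE_ABORT (assumed encoding) */
    #define X86_FEATURE_TSX_FORCE_ABORT     (18*32 + 13)

    /* assumed MSR number and control bit */
    #define MSR_TSX_FORCE_ABORT             0x0000010F
    #define MSR_TFA_RTM_FORCE_ABORT         BIT_ULL(0)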
2019-03-06 | perf/x86/intel: Generalize dynamic constraint creation | Peter Zijlstra (Intel) | 1 file, -21/+30
Such that we can re-use it. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-03-06 | perf/x86/intel: Make cpuc allocations consistent | Peter Zijlstra (Intel) | 3 files, -22/+31
The cpuc data structure allocation is different between fake and real cpuc's; use the same code to init/free both. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-03-05 | docs/core-api/mm: fix user memory accessors formatting | Mike Rapoport | 2 files, -16/+16
The descriptions of userspace memory access functions had minor issues with formatting that made kernel-doc unable to properly detect the function/macro names and the return value sections:

  ./arch/x86/include/asm/uaccess.h:80: info: Scanning doc for
  ./arch/x86/include/asm/uaccess.h:139: info: Scanning doc for
  ./arch/x86/include/asm/uaccess.h:231: info: Scanning doc for
  ./arch/x86/include/asm/uaccess.h:505: info: Scanning doc for
  ./arch/x86/include/asm/uaccess.h:530: info: Scanning doc for
  ./arch/x86/lib/usercopy_32.c:58: info: Scanning doc for
  ./arch/x86/lib/usercopy_32.c:69: warning: No description found for return value of 'clear_user'
  ./arch/x86/lib/usercopy_32.c:78: info: Scanning doc for
  ./arch/x86/lib/usercopy_32.c:90: warning: No description found for return value of '__clear_user'

Fix the formatting. Link: http://lkml.kernel.org/r/1549549644-4903-3-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
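For reference, a kernel-doc comment in the shape the tool expects looks like this (an illustrative example for clear_user, not the exact hunk from the patch):

    /**
     * clear_user - Zero a block of memory in user space.
     * @to: Destination address, in user space.
     * @n:  Number of bytes to zero.
     *
     * Zero a block of memory in user space.
     *
     * Return: number of bytes that could not be cleared.
     * On success, this will be zero.
     */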
2019-03-05 | numa: make "nr_node_ids" unsigned int | Alexey Dobriyan | 4 files, -5/+5
Number of NUMA nodes can't be negative. This saves a few bytes on x86_64:

  add/remove: 0/0 grow/shrink: 4/21 up/down: 27/-265 (-238)
  Function                        old     new   delta
  hv_synic_alloc.cold              88     110     +22
  prealloc_shrinker               260     262      +2
  bootstrap                       249     251      +2
  sched_init_numa                1566    1567      +1
  show_slab_objects               778     777      -1
  s_show                         1201    1200      -1
  kmem_cache_init                 346     345      -1
  __alloc_workqueue_key          1146    1145      -1
  mem_cgroup_css_alloc           1614    1612      -2
  __do_sys_swapon                4702    4699      -3
  __list_lru_init                 655     651      -4
  nic_probe                      2379    2374      -5
  store_user_store                118     111      -7
  red_zone_store                  106      99      -7
  poison_store                    106      99      -7
  wq_numa_init                    348     338     -10
  __kmem_cache_empty               75      65     -10
  task_numa_free                  186     173     -13
  merge_across_nodes_store        351     336     -15
  irq_create_affinity_masks      1261    1246     -15
  do_numa_crng_init               343     321     -22
  task_numa_fault                4760    4737     -23
  swapfile_init                   179     156     -23
  hv_synic_alloc                  536     492     -44
  apply_wqattrs_prepare           746     695     -51

Link: http://lkml.kernel.org/r/20190201223029.GA15820@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
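The substantive change is only the type in the declaration; roughly (a sketch, assuming the declaration lives in a header such as include/linux/nodemask.h):

    -extern int nr_node_ids;
    +extern unsigned int nr_node_ids;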
2019-03-05 | powerpc/mm/iommu: allow large IOMMU page size only for hugetlb backing | Aneesh Kumar K.V | 1 file, -17/+7
THP pages can get split during different code paths. An incremented reference count does imply we will not split the compound page. But the pmd entry can be converted to level 4 pte entries. Keep the code simpler by allowing a large IOMMU page size only if the guest RAM is backed by hugetlb pages. Link: http://lkml.kernel.org/r/20190114095438.32470-6-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | powerpc/mm/iommu: allow migration of cma allocated pages during mm_iommu_do_alloc | Aneesh Kumar K.V | 1 file, -87/+38
The current code doesn't do page migration if the page allocated is a compound page. With HugeTLB migration support, we can end up allocating hugetlb pages from the CMA region. Also, THP pages can be allocated from the CMA region. This patch updates the code to handle compound pages correctly. The patch also switches to a single get_user_pages with the right count, instead of doing one get_user_pages per page. That avoids reading the page table multiple times. This is done by using get_user_pages_longterm, because that also takes care of DAX backed pages. The lifetime of DAX pages is dictated by file system rules, and as such we need to make sure that we free these pages on operations like truncate and punch hole. If we have a long-term pin on these pages, which are mostly returned to userspace with an elevated page count, the entity holding the long-term pin may not be aware that the file got truncated and the file system blocks possibly got reused. That can result in corruption. The patch also converts the hpas member of mm_iommu_table_group_mem_t to a union. We use the same storage location to store pointers to struct page. We cannot update all the code paths to use struct page *, because we access hpas in real mode and we can't do the struct page * to pfn conversion in real mode. [aneesh.kumar@linux.ibm.com: address review feedback, update changelog] Link: http://lkml.kernel.org/r/20190227144736.5872-4-aneesh.kumar@linux.ibm.com Link: http://lkml.kernel.org/r/20190114095438.32470-5-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | ia64: perfmon: don't mark buffer pages as PG_reserved | David Hildenbrand | 1 file, -55/+4
In the old days, remap_pfn_range() required pages to be marked as PG_reserved, so they would e.g. never get swapped out. This was required for special mappings. Nowadays, this is fully handled via the VMA (VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP inside remap_pfn_range() to be precise). PG_reserved is no longer required but only a relic from the past, so only architecture specific MM handling might require it (e.g. to detect them as MMIO pages). As there are no architecture specific checks for PageReserved() apart from MCA handling in ia64 code, this can go. Use simple vzalloc()/vfree() instead. Note that before calling vzalloc(), size has already been aligned to PAGE_SIZE, so there is no need to align again. Link: http://lkml.kernel.org/r/20190114125903.24845-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | arm64: kdump: no need to mark crashkernel pages manually PG_reserved | David Hildenbrand | 2 files, -28/+1
The crashkernel is reserved via memblock_reserve(). memblock_free_all() will call free_low_memory_core_early(), which will go over all reserved memblocks, marking the pages as PG_reserved. So manually marking pages as PG_reserved is not necessary; they are already in the desired state (otherwise they would have been handed over to the buddy as free pages and bad things would happen). Link: http://lkml.kernel.org/r/20190114125903.24845-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthias Brugger <mbrugger@suse.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Stefan Agner <stefan@agner.ch> Cc: Laura Abbott <labbott@redhat.com> Cc: Greg Hackmann <ghackmann@android.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kristina Martsenko <kristina.martsenko@arm.com> Cc: CHANDAN VN <chandan.vn@samsung.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | arm64: kexec: no need to ClearPageReserved() | David Hildenbrand | 1 file, -1/+0
This will be done by free_reserved_page(). Link: http://lkml.kernel.org/r/20190114125903.24845-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: James Morse <james.morse@arm.com> Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | m68k/mm: use __ClearPageReserved() | David Hildenbrand | 1 file, -1/+1
The PG_reserved flag is cleared from memory that is part of the kernel image (and therefore marked as PG_reserved). Avoid using PG_reserved directly. Link: http://lkml.kernel.org/r/20190114125903.24845-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | riscv/vdso: don't clear PG_reserved | David Hildenbrand | 1 file, -1/+0
The VDSO is part of the kernel image and therefore the struct pages are marked as reserved during boot. As we install a special mapping, the actual struct pages will never be exposed to MM via the page tables. We can therefore leave the pages marked as reserved. Link: http://lkml.kernel.org/r/20190114125903.24845-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Palmer Dabbelt <palmer@sifive.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Tobias Klauser <tklauser@distanz.ch> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | powerpc/vdso: don't clear PG_reserved | David Hildenbrand | 1 file, -2/+0
The VDSO is part of the kernel image and therefore the struct pages are marked as reserved during boot. As we install a special mapping, the actual struct pages will never be exposed to MM via the page tables. We can therefore leave the pages marked as reserved. Link: http://lkml.kernel.org/r/20190114125903.24845-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | s390/vdso: don't clear PG_reserved | David Hildenbrand | 1 file, -2/+0
The VDSO is part of the kernel image and therefore the struct pages are marked as reserved during boot. As we install a special mapping, the actual struct pages will never be exposed to MM via the page tables. We can therefore leave the pages marked as reserved. Link: http://lkml.kernel.org/r/20190114125903.24845-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | arch/powerpc/mm/hugetlb: NestMMU workaround for hugetlb mprotect RW upgrade | Aneesh Kumar K.V | 3 files, -0/+54
NestMMU requires us to mark the pte invalid and flush the tlb when we do a RW upgrade of pte. We fixed a variant of this in the fault path in bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang"). Link: http://lkml.kernel.org/r/20190116085035.29729-6-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | arch/powerpc/mm: Nest MMU workaround for mprotect RW upgrade | Aneesh Kumar K.V | 4 files, -0/+65
NestMMU requires us to mark the pte invalid and flush the tlb when we do a RW upgrade of pte. We fixed a variant of this in the fault path in bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang"). Do the same for mprotect upgrades. Hugetlb is handled in the next patch. Link: http://lkml.kernel.org/r/20190116085035.29729-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | mm: update ptep_modify_prot_commit to take old pte value as arg | Aneesh Kumar K.V | 3 files, -3/+4
Architectures like ppc64 need to do a conditional tlb flush based on the old and new values of the pte. Enable that by passing the old pte value as an argument. Link: http://lkml.kernel.org/r/20190116085035.29729-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
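With this change the commit helper ends up with roughly the following generic declaration (a sketch; per-architecture details may differ):

    void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
                                 pte_t *ptep, pte_t old_pte, pte_t pte);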
2019-03-05 | mm: update ptep_modify_prot_start/commit to take vm_area_struct as arg | Aneesh Kumar K.V | 6 files, -17/+21
Patch series "NestMMU pte upgrade workaround for mprotect", v5. We can upgrade pte access (R -> RW transition) via mprotect. We need to make sure we follow the recommended pte update sequence as outlined in commit bd5050e38aec ("powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang") for such updates. This patch series does that. This patch (of 5): Some architectures may want to call flush_tlb_range from these helpers. Link: http://lkml.kernel.org/r/20190116085035.29729-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages | Anshuman Khandual | 2 files, -0/+25
Let arm64 subscribe to the previously added framework in which an architecture can inform whether a given huge page size is supported for migration. This just overrides the default function arch_hugetlb_migration_supported() and enables migration for all possible HugeTLB page sizes on arm64. With this, HugeTLB migration support on arm64 now covers all possible HugeTLB options.

          CONT PTE    PMD    CONT PMD    PUD
          --------    ---    --------    ---
    4K:        64K     2M         32M     1G
    16K:        2M    32M          1G
    64K:        2M   512M         16G

Link: http://lkml.kernel.org/r/1545121450-1663-6-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
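The override is compact; a sketch of its likely shape on arm64 (the real implementation may differ in detail):

    bool arch_hugetlb_migration_supported(struct hstate *h)
    {
            unsigned long pagesize = huge_page_size(h);

            switch (pagesize) {
    #ifdef CONFIG_ARM64_4K_PAGES
            case PUD_SIZE:
    #endif
            case PMD_SIZE:
            case CONT_PMD_SIZE:
            case CONT_PTE_SIZE:
                    return true;    /* migratable huge page sizes */
            }
            return false;
    }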
2019-03-05 | arm64/mm: enable HugeTLB migration | Anshuman Khandual | 1 file, -0/+4
Let arm64 subscribe to the generic HugeTLB page migration framework. Right now this only works on the following PMD and PUD level HugeTLB page sizes with various kernel base page size combinations.

          CONT PTE    PMD    CONT PMD    PUD
          --------    ---    --------    ---
    4K:         NA     2M          NA     1G
    16K:        NA    32M          NA
    64K:        NA   512M          NA

Link: http://lkml.kernel.org/r/1545121450-1663-5-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Steve Capper <steve.capper@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | mm: replace all open encodings for NUMA_NO_NODE | Anshuman Khandual | 16 files, -29/+41
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review.

    1. git grep "nid == -1"
    2. git grep "node == -1"
    3. git grep "nid = -1"
    4. git grep "node = -1"

This patch (of 2): At present there are multiple places where an invalid node number is encoded as -1. Even though it is implicitly understood, it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places, redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | sh: remove nargs from __SYSCALL | Firoz Khan | 2 files, -3/+3
The __SYSCALL macro's arguments are the system call number, the system call entry name and the number of arguments for the system call. The nargs argument in __SYSCALL(nr, entry, nargs) is neither calculated nor used anywhere, so it would be better to keep the implementation as __SYSCALL(nr, entry). This unifies the implementation with some other architectures too. Link: http://lkml.kernel.org/r/1546443445-21075-2-git-send-email-firoz.khan@linaro.org Signed-off-by: Firoz Khan <firoz.khan@linaro.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Simon Horman <horms+renesas@verge.net.au> Cc: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Kate Stewart <kstewart@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
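Conceptually the change just drops the unused parameter; for sh, which builds its syscall table in assembly, the table macro changes roughly like this (a sketch):

    /* before */
    #define __SYSCALL(nr, entry, nargs)     .long entry
    /* after */
    #define __SYSCALL(nr, entry)            .long entry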
2019-03-05 | kasan: remove use after scope bugs detection. | Andrey Ryabinin | 1 file, -4/+0
The use-after-scope bugs detector seems to be almost entirely useless for the Linux kernel. It has existed for over two years, but I've seen only one valid bug so far [1], and that bug was fixed before it was reported. There were some other use-after-scope reports, but they were false positives due to different reasons, like incompatibility with the structleak plugin. This feature significantly increases stack usage, especially with GCC versions < 9, and causes a 32K stack overflow. It probably adds a performance penalty too. Given all that, let's remove the use-after-scope detector entirely. While preparing this patch I've noticed that we mistakenly enable use-after-scope detection for the clang compiler regardless of the CONFIG_KASAN_EXTRA setting. This is also fixed now. [1] http://lkml.kernel.org/r/<20171129052106.rhgbjhhis53hkgfn@wfg-t540p.sh.intel.com> Link: http://lkml.kernel.org/r/20190111185842.13978-1-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Will Deacon <will.deacon@arm.com> [arm64] Cc: Qian Cai <cai@lca.pw> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | x86: Deprecate a.out support | Borislav Petkov | 2 files, -2/+1
Linux has supported ELF binaries for ~25 years now. a.out coredumping has bitrotted quite significantly and would need some fixing to get it into shape again, but considering how even the toolchains cannot create a.out executables in their default configuration, let's deprecate a.out support and remove it a couple of releases later, instead. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Richard Weinberger <richard@nod.at> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Jann Horn <jannh@google.com> Cc: <linux-api@vger.kernel.org> Cc: <linux-fsdevel@vger.kernel.org> Cc: lkml <linux-kernel@vger.kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <x86@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | a.out: remove core dumping support | Linus Torvalds | 5 files, -402/+0
We're (finally) phasing out a.out support for good. As Borislav Petkov points out, we've supported ELF binaries for about 25 years by now, and coredumping in particular has bitrotted over the years. None of the tool chains even support generating a.out binaries any more, and the plan is to deprecate a.out support entirely for the kernel. But I want to start with just removing the core dumping code, because I can still imagine that somebody actually might want to support a.out as a simpler binary format. Particularly if you generate some random binaries on the fly, ELF is a much more complicated format (admittedly ELF also does have a lot of toolchain support, mitigating that complexity a lot, and you really should have moved over in the last 25 years). So it's at least somewhat possible that somebody out there has some workflow that still involves generating and running a.out executables. In contrast, it's very unlikely that anybody depends on debugging any legacy a.out core files. But regardless, I want this phase-out to be done in two steps, so that we can resurrect a.out support (if needed) without having to resurrect the core file dumping that is almost certainly not needed. Jann Horn pointed to the <asm/a.out-core.h> file that my first trivial cut at this had missed. And Alan Cox points out that the a.out binary loader _could_ be done in user space if somebody wants to, but we might keep just the loader in the kernel if somebody really wants it, since the loader isn't that big and has no really odd special cases like the core dumping does. Acked-by: Borislav Petkov <bp@alien8.de> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jannh@google.com> Cc: Richard Weinberger <richard@nod.at> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 | x86/ftrace: Fix warning and consolidate ftrace_jmp_replace() and ftrace_call_replace() | Steven Rostedt (VMware) | 1 file, -25/+17
Arnd reported the following compiler warning:

  arch/x86/kernel/ftrace.c:669:23: error: 'ftrace_jmp_replace' defined but not used [-Werror=unused-function]

The ftrace_jmp_replace() function now only has a single user and could simply be moved into that user. But looking at the code, it shows that ftrace_jmp_replace() is similar to ftrace_call_replace() except that instead of using the opcode 0xe8 it uses 0xe9. It makes more sense to consolidate the two into one implementation that both ftrace_jmp_replace() and ftrace_call_replace() use, by passing in the opcode separately. The structure in ftrace_code_union is also modified to replace the "e8" field with the more appropriate name "op". Cc: stable@vger.kernel.org Reported-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: http://lkml.kernel.org/r/20190304200748.1418790-1-arnd@arndb.de Fixes: d2a68c4effd8 ("x86/ftrace: Do not call function graph from dynamic trampolines") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
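The consolidated helper then looks roughly like this (a sketch of the pattern the changelog describes; the kernel's exact code may differ):

    static unsigned char *
    ftrace_text_replace(unsigned char op, unsigned long ip, unsigned long addr)
    {
            static union ftrace_code_union calc;

            calc.op     = op;   /* 0xe8 for a call, 0xe9 for a jmp */
            calc.offset = ftrace_calc_offset(ip + MCOUNT_INSN_SIZE, addr);

            return calc.code;
    }

    static unsigned char *ftrace_call_replace(unsigned long ip, unsigned long addr)
    {
            return ftrace_text_replace(0xe8, ip, addr);
    }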
2019-03-05 | xen: remove pre-xen3 fallback handlers | Arnd Bergmann | 1 file, -11/+2
The legacy hypercall handlers were originally added with a comment explaining that "copying the argument structures in HYPERVISOR_event_channel_op() and HYPERVISOR_physdev_op() into the local variable is sufficiently safe" and only made sure to not write past the end of the argument structure; the checks in linux/string.h disagree with that when link-time optimizations are used:

  In function 'memcpy',
      inlined from 'pirq_query_unmask' at drivers/xen/fallback.c:53:2,
      inlined from '__startup_pirq' at drivers/xen/events/events_base.c:529:2,
      inlined from 'restore_pirqs' at drivers/xen/events/events_base.c:1439:3,
      inlined from 'xen_irq_resume' at drivers/xen/events/events_base.c:1581:2:
  include/linux/string.h:350:3: error: call to '__read_overflow2' declared with attribute error: detected read beyond size of object passed as 2nd parameter
    __read_overflow2();
    ^

Further research showed that only Xen 3.0.2 or earlier required the fallback at all, while all versions in use today don't need it. As far as I can tell, it is not even possible to run a mainline kernel on those old Xen releases; at the time when they were in use, only a patched kernel was supported anyway. Fixes: cf47a83fb06e ("xen/hypercall: fix hypercall fallback code for very old hypervisors") Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jan Beulich <JBeulich@suse.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Juergen Gross <jgross@suse.com>
2019-03-04 | arch: riscv: fix logic error in parse_dtb | Andreas Schwab | 1 file, -1/+1
The function early_init_dt_scan returns true if a DTB was detected. Fixes: 8fd6e05c7463 ("arch: riscv: support kernel command line forcing when no DTB passed") Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com> Tested-by: Paul Walmsley <paul.walmsley@sifive.com> # FU540 HiFive-U BBL Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
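The fix inverts the sense of the test; the function is small enough to sketch in full (a simplified sketch that omits the command-line forcing fallback):

    void __init parse_dtb(unsigned int hartid, void *dtb)
    {
            /* early_init_dt_scan() returns true when a valid DTB was found */
            if (early_init_dt_scan(__va(dtb)))
                    return;

            pr_err("No DTB passed to the kernel\n");
    }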
2019-03-04 | get rid of legacy 'get_ds()' function | Linus Torvalds | 28 files, -44/+9
Every in-kernel use of this function defined it to KERNEL_DS (either as an actual define, or as an inline function). It's an entirely historical artifact, and long long long ago used to actually read the segment selector value of '%ds' on x86. Which in the kernel is always KERNEL_DS. Inspired by a patch from Jann Horn that just did this for a very small subset of users (the ones in fs/), along with Al who suggested a script. I then just took it to the logical extreme and removed all the remaining gunk. Roughly scripted with

  git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
  git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'

plus manual fixups to remove a few unusual usage patterns, the couple of inline function cases and to fix up a comment that had become stale. The 'get_ds()' function remains in an x86 kvm selftest, since in user space it actually does something relevant. Inspired-by: Jann Horn <jannh@google.com> Inspired-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-04 | RISC-V: Assign hwcap as per common capabilities. | Atish Patra | 1 file, -19/+22
Currently, we set hwcap based on the first valid hart from the DT. This may not always be correct, as that hart might not be the current booting cpu and may have a different capability. Set hwcap to the capabilities supported by all possible harts with "okay" status. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
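A sketch of the implied intersection logic (the function name fill_hwcap_common and several details here are assumptions, not the patch's exact identifiers; isa2hwcap is assumed to map an ISA letter to its hwcap bit):

    static void __init fill_hwcap_common(void)
    {
            struct device_node *node;
            unsigned long all = 0;

            for_each_of_cpu_node(node) {
                    unsigned long this_hwcap = 0;
                    const char *isa;
                    size_t i;

                    if (riscv_of_processor_hartid(node) < 0)
                            continue;       /* hart not usable ("okay" check failed) */
                    if (of_property_read_string(node, "riscv,isa", &isa))
                            continue;

                    for (i = 0; i < strlen(isa); i++)
                            this_hwcap |= isa2hwcap[(unsigned char)isa[i]];

                    /* keep only capabilities common to all harts seen so far */
                    all = all ? (all & this_hwcap) : this_hwcap;
            }

            elf_hwcap = all;
    }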
2019-03-04 | RISC-V: Compare cpuid with NR_CPUS before mapping. | Atish Patra | 1 file, -0/+5
We should never have a cpuid greater than NR_CPUS. Compare with NR_CPUS before creating the mapping between logical and physical CPU ids. This is also mandatory, as the NR_CPUS check is removed from riscv_of_processor_hartid. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
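The added guard sits in the hart enumeration loop; roughly (a sketch, with an assumed warning message):

    if (cpuid >= NR_CPUS) {
            pr_warn("Invalid cpuid [%d] for hartid [%d]\n", cpuid, hart);
            continue;       /* cannot map this hart, skip it */
    }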
2019-03-04 | RISC-V: Allow hartid-to-cpuid function to fail. | Atish Patra | 1 file, -1/+0
It is perfectly okay to call riscv_hartid_to_cpuid for a hartid that is not mapped to a CPU id. It can happen if the calling function retrieves the hartid from the DT but that hartid was never brought online by the firmware or the kernel for any reason. There is no need to BUG() in that case; a negative error return is sufficient, and the calling function should always check the return value. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2019-03-04 | RISC-V: Remove NR_CPUs check during hartid search from DT | Atish Patra | 1 file, -4/+0
In a non-smp configuration, the hartid can be higher than NR_CPUS, so riscv_of_processor_hartid should not compare the hartid against NR_CPUS in that case. Moreover, this function checks all the DT properties of a hart node; the NR_CPUS comparison seems out of place there. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2019-03-04 | RISC-V: Move cpuid to hartid mapping to SMP. | Atish Patra | 3 files, -14/+22
Currently, the logical CPU id to physical hartid mapping is defined for both smp and non-smp configurations. This is not required, as we need it only for smp configurations. For the non-smp case, the mapping function can simply return boot_cpu_hartid directly. The reverse mapping function, i.e. hartid to cpuid, can be called for any valid but not booted hart, so it should return the default cpu 0 only for the boot hartid. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2019-03-04 | RISC-V: Do not wait indefinitely in __cpu_up | Atish Patra | 1 file, -3/+12
In the SMP path, __cpu_up waits indefinitely for the other CPU to come online. This is wrong, as the other CPU might be disabled in machine mode while still being listed as a possible CPU in the DT. Introduce a completion variable and wait only for one second. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
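The bounded wait follows the usual completion pattern; roughly (a sketch, assuming the secondary hart calls complete(&cpu_running) once it is online):

    static DECLARE_COMPLETION(cpu_running);

    /* in __cpu_up(), after kicking the secondary hart: */
    if (!wait_for_completion_timeout(&cpu_running,
                                     msecs_to_jiffies(1000))) {
            pr_crit("CPU%u: failed to come online\n", cpu);
            return -EIO;
    }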
2019-03-04 | x86-64: add warning for non-canonical user access address dereferences | Linus Torvalds | 1 file, -0/+1
This adds a warning (once) for any kernel dereference that has a user exception handler, but accesses a non-canonical address. It basically is a simpler - and more limited - version of commit 9da3f2b74054 ("x86/fault: BUG() when uaccess helpers fault on kernel addresses") that got reverted. Note that unlike that original commit, this only causes a warning, because there are real situations where we currently can do this (notably speculative argument fetching for uprobes etc). Also, unlike that original commit, this _only_ triggers for #GP accesses, so the cases of valid kernel pointers that cross into a non-mapped page aren't affected. The intent of this is two-fold:

 - the uprobe/tracing accesses really do need to be more careful. In particular, from a portability standpoint it's just wrong to think that "a pointer is a pointer", and use the same logic for any random pointer value you find on the stack. It may _work_ on x86-64, but it doesn't necessarily work on other architectures (where the same pointer value can be either a kernel pointer _or_ a user pointer, and you really need to be much more careful in how you try to access it). The warning can hopefully end up being a reminder that just any random pointer access won't do.

 - Kees in particular wanted a way to actually report invalid uses of wild pointers to user space accessors, instead of just silently failing them. Automated fuzzers want a way to get reports if the kernel ever uses invalid values that the fuzzer fed it. The non-canonical address range is a fair chunk of the address space, and with this you can teach syzkaller to feed in invalid pointer values and find cases where we do not properly validate user addresses (possibly due to bad uses of "set_fs()").

Acked-by: Kees Cook <keescook@chromium.org> Cc: Jann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
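Shape-wise the change is essentially a single WARN_ONCE in the uaccess exception-table handler; roughly (a sketch; the handler and helper names are taken from the x86 extable code as assumptions):

    __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
                                      struct pt_regs *regs, int trapnr)
    {
            /* #GP on a uaccess fixup usually means a non-canonical address */
            WARN_ONCE(trapnr == X86_TRAP_GP,
                      "General protection fault in user access. Non-canonical address?");
            regs->ip = ex_fixup_addr(fixup);
            return true;
    }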
2019-03-05 | powerpc/32: Clear on-stack exception marker upon exception return | Christophe Leroy | 1 file, -0/+9
Clear the on-stack STACK_FRAME_REGS_MARKER on exception exit in order to avoid a confusing stacktrace like the one below:

  Call Trace:
  [c0e9dca0] [c01c42a0] print_address_description+0x64/0x2bc (unreliable)
  [c0e9dcd0] [c01c4684] kasan_report+0xfc/0x180
  [c0e9dd10] [c0895130] memchr+0x24/0x74
  [c0e9dd30] [c00a9e38] msg_print_text+0x124/0x574
  [c0e9dde0] [c00ab710] console_unlock+0x114/0x4f8
  [c0e9de40] [c00adc60] vprintk_emit+0x188/0x1c4
  --- interrupt: c0e9df00 at 0x400f330 LR = init_stack+0x1f00/0x2000
  [c0e9de80] [c00ae3c4] printk+0xa8/0xcc (unreliable)
  [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
  [c0e9df50] [c0c15434] start_kernel+0x310/0x488
  [c0e9dff0] [00003484] 0x3484

With this patch the trace becomes:

  Call Trace:
  [c0e9dca0] [c01c42c0] print_address_description+0x64/0x2bc (unreliable)
  [c0e9dcd0] [c01c46a4] kasan_report+0xfc/0x180
  [c0e9dd10] [c0895150] memchr+0x24/0x74
  [c0e9dd30] [c00a9e58] msg_print_text+0x124/0x574
  [c0e9dde0] [c00ab730] console_unlock+0x114/0x4f8
  [c0e9de40] [c00adc80] vprintk_emit+0x188/0x1c4
  [c0e9de80] [c00ae3e4] printk+0xa8/0xcc
  [c0e9df20] [c0c27e44] early_irq_init+0x38/0x108
  [c0e9df50] [c0c15434] start_kernel+0x310/0x488
  [c0e9dff0] [00003484] 0x3484

Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-02 | powerpc: Remove export of save_stack_trace_tsk_reliable() | Joe Lawrence | 1 file, -1/+0
As tglx points out, there are no in-tree module users of save_stack_trace_tsk_reliable() and its x86 counterpart is not exported, so remove the powerpc symbol export. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-02 | powerpc/mm: fix "section_base" set but not used | Qian Cai | 1 file, -2/+0
The commit 24b6d4164348 ("mm: pass the vmem_altmap to vmemmap_free") removed a line in vmemmap_free():

  altmap = to_vmem_altmap((unsigned long) section_base);

but left a variable no longer used:

  arch/powerpc/mm/init_64.c: In function 'vmemmap_free':
  arch/powerpc/mm/init_64.c:277:16: error: variable 'section_base' set but not used [-Werror=unused-but-set-variable]

Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-02 | powerpc/mm: Fix "sz" set but not used warning | Qian Cai | 1 file, -2/+1
Fix compiler warning:

  arch/powerpc/mm/hugetlbpage-hash64.c: In function '__hash_page_huge':
  arch/powerpc/mm/hugetlbpage-hash64.c:29:28: warning: variable 'sz' set but not used [-Wunused-but-set-variable]

mpe: The last usage of sz was removed in 0895ecda7942 ("powerpc/mm: Bring hugepage PTE accessor functions back into sync with normal accessors"). Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-02 | powerpc/mm: Check secondary hash page table | Rashmica Gupta | 1 file, -1/+1
We were always calling base_hpte_find() with primary = true, even when we wanted to check the secondary table. mpe: I broke this when refactoring Rashmica's original patch. Fixes: 1515ab932156 ("powerpc/mm: Dump hash table") Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-02 | powerpc: remove nargs from __SYSCALL | Firoz Khan | 3 files, -6/+6
The __SYSCALL macro's arguments are the system call number, the system call entry name and the number of arguments for the system call. The nargs argument in __SYSCALL(nr, entry, nargs) is neither calculated nor used anywhere, so it would be better to keep the implementation as __SYSCALL(nr, entry). This unifies the implementation with some other architectures too. Signed-off-by: Firoz Khan <firoz.khan@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-02 | MIPS: eBPF: Fix icache flush end address | Paul Burton | 1 file, -1/+1
The MIPS eBPF JIT calls flush_icache_range() in order to ensure the icache observes the code that we just wrote. Unfortunately it gets the end address calculation wrong due to some bad pointer arithmetic. The struct jit_ctx target field is of type pointer to u32, and as such adding one to it will increment the address being pointed to by 4 bytes. Therefore in order to find the address of the end of the code we simply need to add the number of 4 byte instructions emitted, but we mistakenly add the number of instructions multiplied by 4. This results in the call to flush_icache_range() operating on a memory region 4x larger than intended, which is always wasteful and can cause crashes if we overrun into an unmapped page. Fix this by correcting the pointer arithmetic to remove the bogus multiplication, and use braces to remove the need for a set of brackets whilst also making it obvious that the target field is a pointer. Signed-off-by: Paul Burton <paul.burton@mips.com> Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.") Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: netdev@vger.kernel.org Cc: bpf@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
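The before/after is conceptually (a sketch; target and idx are the struct jit_ctx members described above):

    /* before: since target is a u32 *, the end pointer overshoots by 4x */
    flush_icache_range((unsigned long)ctx.target,
                       (unsigned long)(ctx.target + ctx.idx * 4));

    /* after: plain array indexing computes the correct end address */
    flush_icache_range((unsigned long)ctx.target,
                       (unsigned long)&ctx.target[ctx.idx]);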
2019-03-01 | arm64: dts: fsl: ls1028a-rdb: Add ENETC external eth ports for the LS1028A RDB board | Claudiu Manoil | 1 file, -0/+17
The LS1028A RDB board features an Atheros PHY connected over SGMII to the ENETC PF0 (or Port0). ENETC Port1 (PF1) has no external connection on this board, so it can be disabled for now. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 | arm64: dts: fsl: ls1028a: Add PCI IERC node and ENETC endpoints | Claudiu Manoil | 1 file, -0/+35
The LS1028A SoC features a PCI Integrated Endpoint Root Complex (IERC) defining several integrated PCI devices, including the ENETC ethernet controller integrated endpoints (IEPs). The IERC implements ECAM (Enhanced Configuration Access Mechanism) to provide access to the PCIe config space of the IEPs. This means that the IEPs (including ENETC) do not support the standard PCIe BARs; instead, the Enhanced Allocation (EA) capability structures in the ECAM space are used to fix the base addresses in the system, and the PCI subsystem uses these structures for device enumeration and discovery. The "ranges" entries contain basic information from these EA capability structures required by the kernel for device enumeration. The current patch also enables the first 2 ENETC PFs (Physical Functions) and the associated VFs (Virtual Functions), 2 VFs for each PF. Each of these ENETC PFs has an external ethernet port on the LS1028A SoC. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01 | arm64: mmu: drop paging_init comments | Peng Fan | 1 file, -4/+0
The comments no longer reflect the code, and it is easy to see what this function does from a straight-line reading of it, so let's drop the comments. Signed-off-by: Peng Fan <peng.fan@nxp.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-01 | arm64: debug: Ensure debug handlers check triggering exception level | Will Deacon | 2 files, -4/+16
Debug exception handlers may be called for exceptions generated both by user and kernel code. In many cases, this is checked explicitly, but in other cases things either happen to work by happy accident or they go slightly wrong. For example, executing 'brk #4' from userspace will enter the kprobes code and be ignored, but the instruction will be retried forever in userspace instead of delivering a SIGTRAP. Fix this issue in the most stable-friendly fashion by simply adding explicit checks of the triggering exception level to all of our debug exception handlers. Cc: <stable@vger.kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
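The pattern added to each handler is a short guard; for example, for a kgdb BRK hook, roughly (a sketch):

    static int kgdb_brk_fn(struct pt_regs *regs, unsigned int esr)
    {
            if (user_mode(regs))
                    return DBG_HOOK_ERROR;  /* came from EL0, not ours */

            kgdb_handle_exception(1, SIGTRAP, 0, regs);
            return DBG_HOOK_HANDLED;
    }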
2019-03-01 | arm64: debug: Don't propagate UNKNOWN FAR into si_code for debug signals | Will Deacon | 1 file, -4/+5
FAR_EL1 is UNKNOWN for all debug exceptions other than those caused by taking a hardware watchpoint. Unfortunately, if a debug handler returns a non-zero value, then we will propagate the UNKNOWN FAR value to userspace via the si_addr field of the SIGTRAP siginfo_t. Instead, let's set si_addr to take on the PC of the faulting instruction, which we have available in the current pt_regs. Cc: <stable@vger.kernel.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-03-01 | s390/suspend: fix prefix register reset in swsusp_arch_resume | Martin Schwidefsky | 1 file, -3/+3
The reset of the prefix to zero in swsusp_arch_resume uses a 4 byte stack slot. With CONFIG_VMAP_STACK=y this is now in the vmalloc area, this works only with DAT enabled. Move the DAT disable in swsusp_arch_resume after the prefix reset. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>