aboutsummaryrefslogtreecommitdiffstats
path: root/arch/s390/mm (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-05-20s390/mm: fix set_huge_pte_at() for empty ptesGerald Schaefer1-3/+6
On s390, the layout of normal and large ptes (i.e. pmds/puds) differs. Therefore, set_huge_pte_at() does a conversion from a normal pte to the corresponding large pmd/pud. So, when converting an empty pte, this should result in an empty pmd/pud, which would return true for pmd/pud_none(). However, after conversion we also mark the pmd/pud as large, and therefore present. For empty ptes, this will result in an empty pmd/pud that is also marked as large, and pmd/pud_none() would not return true. There is currently no issue with this behaviour, as set_huge_pte_at() does not seem to be called for empty ptes. It would be valid though, so let's fix this by not marking empty ptes as large in set_huge_pte_at(). This was found by testing a patch from from Anshuman Khandual, which is currently discussed on LKML ("mm/debug: Add more arch page table helper tests"). Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-04-28Merge tag 'cve-2020-11884' from emailed bundleLinus Torvalds1-2/+14
Pull s390 fix from Christian Borntraeger: "Fix a race between page table upgrade and uaccess on s390. This fixes CVE-2020-11884 which allows for a local kernel crash or code execution" * tag 'cve-2020-11884' from emailed bundle: s390/mm: fix page table upgrade vs 2ndary address mode accesses
2020-04-21s390/mm: fix page table upgrade vs 2ndary address mode accessesChristian Borntraeger1-2/+14
A page table upgrade in a kernel section that uses secondary address mode will mess up the kernel instructions as follows: Consider the following scenario: two threads are sharing memory. On CPU1 thread 1 does e.g. strnlen_user(). That gets to old_fs = enable_sacf_uaccess(); len = strnlen_user_srst(src, size); and " la %2,0(%1)\n" " la %3,0(%0,%1)\n" " slgr %0,%0\n" " sacf 256\n" "0: srst %3,%2\n" in strnlen_user_srst(). At that point we are in secondary space mode, control register 1 points to kernel page table and instruction fetching happens via c1, rather than usual c13. Interrupts are not disabled, for obvious reasons. On CPU2 thread 2 does MAP_FIXED mmap(), forcing the upgrade of page table from 3-level to e.g. 4-level one. We'd allocated new top-level table, set it up and now we hit this: notify = 1; spin_unlock_bh(&mm->page_table_lock); } if (notify) on_each_cpu(__crst_table_upgrade, mm, 0); OK, we need to actually change over to use of new page table and we need that to happen in all threads that are currently running. Which happens to include the thread 1. IPI is delivered and we have static void __crst_table_upgrade(void *arg) { struct mm_struct *mm = arg; if (current->active_mm == mm) set_user_asce(mm); __tlb_flush_local(); } run on CPU1. That does static inline void set_user_asce(struct mm_struct *mm) { S390_lowcore.user_asce = mm->context.asce; OK, user page table address updated... __ctl_load(S390_lowcore.user_asce, 1, 1); ... and control register 1 set to it. clear_cpu_flag(CIF_ASCE_PRIMARY); } IPI is run in home space mode, so it's fine - insns are fetched using c13, which always points to kernel page table. But as soon as we return from the interrupt, previous PSW is restored, putting CPU1 back into secondary space mode, at which point we no longer get the kernel instructions from the kernel mapping. The fix is to only fixup the control registers that are currently in use for user processes during the page table update. We must also disable interrupts in enable_sacf_uaccess to synchronize the cr and thread.mm_segment updates against the on_each-cpu. Fixes: 0aaba41b58bc ("s390: remove all code using the access register mode") Cc: stable@vger.kernel.org # 4.15+ Reported-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> References: CVE-2020-11884 Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-04-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-4/+7
Merge yet more updates from Andrew Morton: - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc, gup, hugetlb, pagemap, memremap) - Various other things (hfs, ocfs2, kmod, misc, seqfile) * akpm: (34 commits) ipc/util.c: sysvipc_find_ipc() should increase position index kernel/gcov/fs.c: gcov_seq_next() should increase position index fs/seq_file.c: seq_read(): add info message about buggy .next functions drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings change email address for Pali Rohár selftests: kmod: test disabling module autoloading selftests: kmod: fix handling test numbers above 9 docs: admin-guide: document the kernel.modprobe sysctl fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() kmod: make request_module() return an error when autoloading is disabled mm/memremap: set caching mode for PCI P2PDMA memory to WC mm/memory_hotplug: add pgprot_t to mhp_params powerpc/mm: thread pgprot_t through create_section_mapping() x86/mm: introduce __set_memory_prot() x86/mm: thread pgprot_t through init_memory_mapping() mm/memory_hotplug: rename mhp_restrictions to mhp_params mm/memory_hotplug: drop the flags field from struct mhp_restrictions mm/special: create generic fallbacks for pte_special() and pte_mkspecial() mm/vma: introduce VM_ACCESS_FLAGS mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS ...
2020-04-10mm/memory_hotplug: add pgprot_t to mhp_paramsLogan Gunthorpe1-0/+3
devm_memremap_pages() is currently used by the PCI P2PDMA code to create struct page mappings for IO memory. At present, these mappings are created with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on x86, an mtrr register will typically override this and force the cache type to be UC-. In the case firmware doesn't set this register it is effectively WB and will typically result in a machine check exception when it's accessed. Other arches are not currently likely to function correctly seeing they don't have any MTRR registers to fall back on. To solve this, provide a way to specify the pgprot value explicitly to arch_add_memory(). Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple change to pass the pgprot_t down to their respective functions which set up the page tables. For x86_32, set the page tables explicitly using _set_memory_prot() (seeing they are already mapped). For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this should be fine, for now, seeing these architectures don't support ZONE_DEVICE. A check in __add_pages() is also added to ensure the pgprot parameter was set for all arches. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Badger <ebadger@gigaio.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200306170846.9333-7-logang@deltatee.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10mm/memory_hotplug: rename mhp_restrictions to mhp_paramsLogan Gunthorpe1-3/+3
The mhp_restrictions struct really doesn't specify anything resembling a restriction anymore so rename it to be mhp_params as it is a list of extended parameters. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Eric Badger <ebadger@gigaio.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Link: http://lkml.kernel.org/r/20200306170846.9333-3-logang@deltatee.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10mm/vma: introduce VM_ACCESS_FLAGSAnshuman Khandual1-1/+1
There are many places where all basic VMA access flags (read, write, exec) are initialized or checked against as a group. One such example is during page fault. Existing vma_is_accessible() wrapper already creates the notion of VMA accessibility as a group access permissions. Hence lets just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which will not only reduce code duplication but also extend the VMA accessibility concept in general. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Mark Salter <msalter@redhat.com> Cc: Nick Hu <nickhu@andestech.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rob Springer <rspringer@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/1583391014-8170-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-10Merge tag 's390-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds1-2/+0
Pull more s390 updates from Vasily Gorbik: "Second round of s390 fixes and features for 5.7: - The rest of fallthrough; annotations conversion - Couple of fixes for ADD uevents in the common I/O layer - Minor refactoring of the queued direct I/O code" * tag 's390-5.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/cio: generate delayed uevent for vfio-ccw subchannels s390/cio: avoid duplicated 'ADD' uevents s390/qdio: clear DSCI early for polling drivers s390/qdio: inline shared_ind() s390/qdio: remove cdev from init_data s390/qdio: allow for non-contiguous SBAL array in init_data zfcp: inline zfcp_qdio_setup_init_data() s390/qdio: cleanly split alloc and establish s390/mm: use fallthrough;
2020-04-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+6
Pull more kvm updates from Paolo Bonzini: "s390: - nested virtualization fixes x86: - split svm.c - miscellaneous fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: fix crash cleanup when KVM wasn't used KVM: X86: Filter out the broadcast dest for IPI fastpath KVM: s390: vsie: Fix possible race when shadowing region 3 tables KVM: s390: vsie: Fix delivery of addressing exceptions KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks KVM: nVMX: don't clear mtf_pending when nested events are blocked KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenter KVM: SVM: Split svm_vcpu_run inline assembly to separate file KVM: SVM: Move SEV code to separate file KVM: SVM: Move AVIC code to separate file KVM: SVM: Move Nested SVM Implementation to nested.c kVM SVM: Move SVM related files to own sub-directory
2020-04-07KVM: s390: vsie: Fix possible race when shadowing region 3 tablesDavid Hildenbrand1-0/+1
We have to properly retry again by returning -EINVAL immediately in case somebody else instantiated the table concurrently. We missed to add the goto in this function only. The code now matches the other, similar shadowing functions. We are overwriting an existing region 2 table entry. All allocated pages are added to the crst_list to be freed later, so they are not lost forever. However, when unshadowing the region 2 table, we wouldn't trigger unshadowing of the original shadowed region 3 table that we replaced. It would get unshadowed when the original region 3 table is modified. As it's not connected to the page table hierarchy anymore, it's not going to get used anymore. However, for a limited time, this page table will stick around, so it's in some sense a temporary memory leak. Identified by manual code inspection. I don't think this classifies as stable material. Fixes: 998f637cc4b9 ("s390/mm: avoid races on region/segment/page table shadowing") Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-4-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-04-07KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checksDavid Hildenbrand1-1/+5
In case we have a region 1 the following calculation (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11) results in 64. As shifts beyond the size are undefined the compiler is free to use instructions like sllg. sllg will only use 6 bits of the shift value (here 64) resulting in no shift at all. That means that ALL addresses will be rejected. The can result in endless loops, e.g. when prefix cannot get mapped. Fixes: 4be130a08420 ("s390/mm: add shadow gmap support") Tested-by: Janosch Frank <frankja@linux.ibm.com> Reported-by: Janosch Frank <frankja@linux.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20200403153050.20569-2-david@redhat.com Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: fix patch description, remove WARN_ON_ONCE] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-04-06s390/mm: use fallthrough;Joe Perches1-2/+0
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-04-04Merge tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds7-163/+83
Pull s390 updates from Vasily Gorbik: - Update maintainers. Niklas Schnelle takes over zpci and Vineeth Vijayan common io code. - Extend cpuinfo to include topology information. - Add new extended counters for IBM z15 and sampling buffer allocation rework in perf code. - Add control over zeroing out memory during system restart. - CCA protected key block version 2 support and other fixes/improvements in crypto code. - Convert to new fallthrough; annotations. - Replace zero-length arrays with flexible-arrays. - QDIO debugfs and other small improvements. - Drop 2-level paging support optimization for compat tasks. Varios mm cleanups. - Remove broken and unused hibernate / power management support. - Remove fake numa support which does not bring any benefits. - Exclude offline CPUs from CPU topology masks to be more consistent with other architectures. - Prevent last branching instruction address leaking to userspace. - Other small various fixes and improvements all over the code. * tag 's390-5.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (57 commits) s390/mm: cleanup init_new_context() callback s390/mm: cleanup virtual memory constants usage s390/mm: remove page table downgrade support s390/qdio: set qdio_irq->cdev at allocation time s390/qdio: remove unused function declarations s390/ccwgroup: remove pm support s390/ap: remove power management code from ap bus and drivers s390/zcrypt: use kvmalloc instead of kmalloc for 256k alloc s390/mm: cleanup arch_get_unmapped_area() and friends s390/ism: remove pm support s390/cio: use fallthrough; s390/vfio: use fallthrough; s390/zcrypt: use fallthrough; s390: use fallthrough; s390/cpum_sf: Fix wrong page count in error message s390/diag: fix display of diagnose call statistics s390/ap: Remove ap device suspend and resume callbacks s390/pci: Improve handling of unset UID s390/pci: Fix zpci_alloc_domain() over allocation s390/qdio: pass ISC as parameter to chsc_sadc() ...
2020-04-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2-13/+137
Pull kvm updates from Paolo Bonzini: "ARM: - GICv4.1 support - 32bit host removal PPC: - secure (encrypted) using under the Protected Execution Framework ultravisor s390: - allow disabling GISA (hardware interrupt injection) and protected VMs/ultravisor support. x86: - New dirty bitmap flag that sets all bits in the bitmap when dirty page logging is enabled; this is faster because it doesn't require bulk modification of the page tables. - Initial work on making nested SVM event injection more similar to VMX, and less buggy. - Various cleanups to MMU code (though the big ones and related optimizations were delayed to 5.8). Instead of using cr3 in function names which occasionally means eptp, KVM too has standardized on "pgd". - A large refactoring of CPUID features, which now use an array that parallels the core x86_features. - Some removal of pointer chasing from kvm_x86_ops, which will also be switched to static calls as soon as they are available. - New Tigerlake CPUID features. - More bugfixes, optimizations and cleanups. Generic: - selftests: cleanups, new MMU notifier stress test, steal-time test - CSV output for kvm_stat" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits) x86/kvm: fix a missing-prototypes "vmread_error" KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y KVM: VMX: Add a trampoline to fix VMREAD error handling KVM: SVM: Annotate svm_x86_ops as __initdata KVM: VMX: Annotate vmx_x86_ops as __initdata KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup() KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes KVM: VMX: Configure runtime hooks using vmx_x86_ops KVM: VMX: Move hardware_setup() definition below vmx_x86_ops KVM: x86: Move init-only kvm_x86_ops to separate struct KVM: Pass kvm_init()'s opaque param to additional arch funcs s390/gmap: return proper error code on ksm unsharing KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move() KVM: Fix out of range accesses to memslots KVM: X86: Micro-optimize IPI fastpath delay KVM: X86: Delay read msr data iff writes ICR MSR KVM: PPC: Book3S HV: Add a capability for enabling secure guests KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs ...
2020-04-02mm: allow VM_FAULT_RETRY for multiple timesPeter Xu1-4/+1
The idea comes from a discussion between Linus and Andrea [1]. Before this patch we only allow a page fault to retry once. We achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing handle_mm_fault() the second time. This was majorly used to avoid unexpected starvation of the system by looping over forever to handle the page fault on a single page. However that should hardly happen, and after all for each code path to return a VM_FAULT_RETRY we'll first wait for a condition (during which time we should possibly yield the cpu) to happen before VM_FAULT_RETRY is really returned. This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means that the page fault handler now can retry the page fault for multiple times if necessary without the need to generate another page fault event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so page fault handler can still identify whether a page fault is the first attempt or not. Then we'll have these combinations of fault flags (only considering ALLOW_RETRY flag and TRIED flag): - ALLOW_RETRY and !TRIED: this means the page fault allows to retry, and this is the first try - ALLOW_RETRY and TRIED: this means the page fault allows to retry, and this is not the first try - !ALLOW_RETRY and !TRIED: this means the page fault does not allow to retry at all - !ALLOW_RETRY and TRIED: this is forbidden and should never be used In existing code we have multiple places that has taken special care of the first condition above by checking against (fault_flags & FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect the first retry of a page fault by checking against both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now even the 2nd try will have the ALLOW_RETRY set, then use that helper in all existing special paths. One example is in __lock_page_or_retry(), now we'll drop the mmap_sem only in the first attempt of page fault and we'll keep it in follow up retries, so old locking behavior will be retained. This will be a nice enhancement for current code [2] at the same time a supporting material for the future userfaultfd-writeprotect work, since in that work there will always be an explicit userfault writeprotect retry for protected pages, and if that cannot resolve the page fault (e.g., when userfaultfd-writeprotect is used in conjunction with swapped pages) then we'll possibly need a 3rd retry of the page fault. It might also benefit other potential users who will have similar requirement like userfault write-protection. GUP code is not touched yet and will be covered in follow up patch. Please read the thread below for more information. [1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/ Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: introduce FAULT_FLAG_DEFAULTPeter Xu1-1/+1
Although there're tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, merely all of the page fault handlers would allow the fault to be retried, and they also allow the fault to respond to SIGKILL. Let's define a default value for the fault flags to replace those initial page fault flags that were copied over. With this, it'll be far easier to introduce new fault flag that can be used by all the architectures instead of touching all the archs. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02mm: introduce fault_signal_pending()Peter Xu1-2/+1
For most architectures, we've got a quick path to detect fatal signal after a handle_mm_fault(). Introduce a helper for that quick path. It cleans the current codes a bit so we don't need to duplicate the same check across archs. More importantly, this will be an unified place that we handle the signal immediately right after an interrupted page fault, so it'll be much easier for us if we want to change the behavior of handling signals later on for all the archs. Note that currently only part of the archs are using this new helper, because some archs have their own way to handle signals. In the follow up patches, we'll try to apply this helper to all the rest of archs. Another note is that the "regs" parameter in the new helper is not used yet. It'll be used very soon. Now we kept it in this patch only to avoid touching all the archs again in the follow up patches. [peterx@redhat.com: fix sparse warnings] Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1 Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-28s390/mm: cleanup virtual memory constants usageAlexander Gordeev1-2/+2
Remove duplicate definitions and consolidate usage of virutal and address translation constants. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-28s390/mm: remove page table downgrade supportAlexander Gordeev1-24/+0
This update consolidates page table handling code. Because there are hardly any 31-bit binaries left we do not need to optimize for that. No extra efforts are needed to ensure that a compat task does not map anything above 2GB. The TASK_SIZE limit for 31-bit tasks is 2GB already and the generic code does check that a resulting map address would not surpass that limit. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-27s390/gmap: return proper error code on ksm unsharingChristian Borntraeger1-4/+5
If a signal is pending we might return -ENOMEM instead of -EINTR. We should propagate the proper error during KSM unsharing. unmerge_ksm_pages returns -ERESTARTSYS on signal_pending. This gets translated by entry.S to -EINTR. It is important to get this error code so that userspace can retry. To make this clearer we also add -EINTR to the documentation of the PV_ENABLE call, which calls unmerge_ksm_pages. Fixes: 3ac8e38015d4 ("s390/mm: disable KSM for storage key enabled pages") Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-03-27s390/mm: cleanup arch_get_unmapped_area() and friendsAlexander Gordeev2-38/+13
Factor out check_asce_limit() function and fix few style defects in arch_get_unmapped_area() family of functions. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> [heiko.carstens@de.ibm.com: small coding style changes] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-25s390: use fallthrough;Joe Perches2-7/+6
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23KVM: s390: Use fallthrough;Joe Perches1-3/+3
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com Signed-off-by: Joe Perches <joe@perches.com> Link: https://lore.kernel.org/r/d63c86429f3e5aa806aa3e185c97d213904924a5.1583896348.git.joe@perches.com [borntrager@de.ibm.com: Fix link to tool and subject] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-03-23s390: remove broken hibernate / power management supportHeiko Carstens2-59/+3
Hibernation is known to be broken for many years on s390. Given that there aren't any real use cases, remove the code instead of spending time to fix and maintain it. Without hibernate support it doesn't make too much sense to keep power management support; therefore remove it completely. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-23s390/mm: optimize page table upgrade routineAlexander Gordeev1-34/+56
There is a maximum of two new tables allocated on page table upgrade. Because we know that a loop the current implementation is based on could be unrolled with some improvements: * upgrade from 3 to 5 levels happens in one go - without an unnecessary re-take of page_table_lock in-between; * page tables initialization moved out of the atomic code; Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-10s390: prevent leaking kernel address in BEARSven Schnelle1-0/+4
When userspace executes a syscall or gets interrupted, BEAR contains a kernel address when returning to userspace. This make it pretty easy to figure out where the kernel is mapped even with KASLR enabled. To fix this, add lpswe to lowcore and always execute it there, so userspace sees only the lowcore address of lpswe. For this we have to extend both critical_cleanup and the SWITCH_ASYNC macro to also check for lpswe addresses in lowcore. Fixes: b2d24b97b2a9 ("s390/kernel: add support for kernel address space layout randomization (KASLR)") Cc: <stable@vger.kernel.org> # v5.2+ Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-03-04s390/mm: mark private defines for vm_fault_t as suchChristian Borntraeger1-5/+5
This fixes several sparse warnings for fault.c: arch/s390/mm/fault.c:336:36: warning: restricted vm_fault_t degrades to integer arch/s390/mm/fault.c:573:23: warning: incorrect type in assignment (different base types) arch/s390/mm/fault.c:573:23: expected restricted vm_fault_t [usertype] fault arch/s390/mm/fault.c:573:23: got int Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-02-27KVM: s390/mm: Make pages accessible before destroying the guestChristian Borntraeger1-0/+35
Before we destroy the secure configuration, we better make all pages accessible again. This also happens during reboot, where we reboot into a non-secure guest that then can go again into secure mode. As this "new" secure guest will have a new ID we cannot reuse the old page state. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2020-02-27KVM: s390: protvirt: Secure memory is not mergeableJanosch Frank1-10/+20
KSM will not work on secure pages, because when the kernel reads a secure page, it will be encrypted and hence no two pages will look the same. Let's mark the guest pages as unmergeable when we transition to secure mode. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-02-27s390/mm: add (non)secure page access exceptions handlersVasily Gorbik1-0/+78
Add exceptions handlers performing transparent transition of non-secure pages to secure (import) upon guest access and secure pages to non-secure (export) upon hypervisor access. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> [frankja@linux.ibm.com: adding checks for failures] Signed-off-by: Janosch Frank <frankja@linux.ibm.com> [imbrenda@linux.ibm.com: adding a check for gmap fault] Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> [borntraeger@de.ibm.com: patch merging, splitting, fixing] Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2020-01-30s390/mm: fix dynamic pagetable upgrade for hugetlbfsGerald Schaefer1-1/+99
Commit ee71d16d22bb ("s390/mm: make TASK_SIZE independent from the number of page table levels") changed the logic of TASK_SIZE and also removed the arch_mmap_check() implementation for s390. This combination has a subtle effect on how get_unmapped_area() for hugetlbfs pages works. It is now possible that a user process establishes a hugetlbfs mapping at an address above 4 TB, without triggering a dynamic pagetable upgrade from 3 to 4 levels. This is because hugetlbfs mappings will not use mm->get_unmapped_area, but rather file->f_op->get_unmapped_area, which currently is the generic implementation of hugetlb_get_unmapped_area() that does not know about s390 dynamic pagetable upgrades, but with the new definition of TASK_SIZE, it will now allow mappings above 4 TB. Subsequent access to such a mapped address above 4 TB will result in a page fault loop, because the CPU cannot translate such a large address with 3 pagetable levels. The fault handler will try to map in a hugepage at the address, but due to the folded pagetable logic it will end up with creating entries in the 3 level pagetable, possibly overwriting existing mappings, and then it all repeats when the access is retried. Apart from the page fault loop, this can have various nasty effects, e.g. kernel panic from one of the BUG_ON() checks in memory management code, or even data loss if an existing mapping gets overwritten. Fix this by implementing HAVE_ARCH_HUGETLB_UNMAPPED_AREA support for s390, providing an s390 version for hugetlb_get_unmapped_area() with pagetable upgrade support similar to arch_get_unmapped_area(), which will then be used instead of the generic version. Fixes: ee71d16d22bb ("s390/mm: make TASK_SIZE independent from the number of page table levels") Cc: <stable@vger.kernel.org> # 4.12+ Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-04mm/memory_hotplug: shrink zones when offlining memoryDavid Hildenbrand1-3/+1
We currently try to shrink a single zone when removing memory. We use the zone of the first page of the memory we are removing. If that memmap was never initialized (e.g., memory was never onlined), we will read garbage and can trigger kernel BUGs (due to a stale pointer): BUG: unable to handle page fault for address: 000000000000353d #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:clear_zone_contiguous+0x5/0x10 Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840 RSP: 0018:ffffad2400043c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000 RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40 RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680 FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __remove_pages+0x4b/0x640 arch_remove_memory+0x63/0x8d try_remove_memory+0xdb/0x130 __remove_memory+0xa/0x11 acpi_memory_device_remove+0x70/0x100 acpi_bus_trim+0x55/0x90 acpi_device_hotplug+0x227/0x3a0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x221/0x550 worker_thread+0x50/0x3b0 kthread+0x105/0x140 ret_from_fork+0x3a/0x50 Modules linked in: CR2: 000000000000353d Instead, shrink the zones when offlining memory or when onlining failed. Introduce and use remove_pfn_range_from_zone(() for that. We now properly shrink the zones, even if we have DIMMs whereby - Some memory blocks fall into no zone (never onlined) - Some memory blocks fall into multiple zones (offlined+re-onlined) - Multiple memory blocks that fall into different zones Drop the zone parameter (with a potential dubious value) from __remove_pages() and __remove_section(). Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-11s390/kasan: add KASAN_VMALLOC supportVasily Gorbik1-12/+56
Add KASAN_VMALLOC support which now enables vmalloc memory area access checks as well as enables usage of VMAP_STACK under kasan. KASAN_VMALLOC changes the way vmalloc and modules areas shadow memory is handled. With this new approach only top level page tables are pre-populated and lower levels are filled dynamically upon memory allocation. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-30s390: disable preemption when switching to nodat stack with CALL_ON_STACKVasily Gorbik1-3/+9
Make sure preemption is disabled when temporary switching to nodat stack with CALL_ON_STACK helper, because nodat stack is per cpu. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-25Merge tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds1-4/+8
Pull s390 updates from Vasily Gorbik: - Adjust PMU device drivers registration to avoid WARN_ON and few other perf improvements. - Enhance tracing in vfio-ccw. - Few stack unwinder fixes and improvements, convert get_wchan custom stack unwinding to generic api usage. - Fixes for mm helpers issues uncovered with tests validating architecture page table helpers. - Fix noexec bit handling when hardware doesn't support it. - Fix memleak and unsigned value compared with zero bugs in crypto code. Minor code simplification. - Fix crash during kdump with kasan enabled kernel. - Switch bug and alternatives from asm to asm_inline to improve inlining decisions. - Use 'depends on cc-option' for MARCH and TUNE options in Kconfig, add z13s and z14 ZR1 to TUNE descriptions. - Minor head64.S simplification. - Fix physical to logical CPU map for SMT. - Several cleanups in qdio code. - Other minor cleanups and fixes all over the code. * tag 's390-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits) s390/cpumf: Adjust registration of s390 PMU device drivers s390/smp: fix physical to logical CPU map for SMT s390/early: move access registers setup in C code s390/head64: remove unnecessary vdso_per_cpu_data setup s390/early: move control registers setup in C code s390/kasan: support memcpy_real with TRACE_IRQFLAGS s390/crypto: Fix unsigned variable compared with zero s390/pkey: use memdup_user() to simplify code s390/pkey: fix memory leak within _copy_apqns_from_user() s390/disassembler: don't hide instruction addresses s390/cpum_sf: Assign error value to err variable s390/cpum_sf: Replace function name in debug statements s390/cpum_sf: Use consistant debug print format for sampling s390/unwind: drop unnecessary code around calling ftrace_graph_ret_addr() s390: add error handling to perf_callchain_kernel s390: always inline current_stack_pointer() s390/mm: add mm_pxd_folded() checks to pxd_free() s390/mm: properly clear _PAGE_NOEXEC bit when it is not supported s390/mm: simplify page table helpers for large entries s390/mm: make pmd/pud_bad() report large entries as bad ...
2019-11-25Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds1-0/+1
Pull arm64 updates from Catalin Marinas: "Apart from the arm64-specific bits (core arch and perf, new arm64 selftests), it touches the generic cow_user_page() (reviewed by Kirill) together with a macro for x86 to preserve the existing behaviour on this architecture. Summary: - On ARMv8 CPUs without hardware updates of the access flag, avoid failing cow_user_page() on PFN mappings if the pte is old. The patches introduce an arch_faults_on_old_pte() macro, defined as false on x86. When true, cow_user_page() makes the pte young before attempting __copy_from_user_inatomic(). - Covert the synchronous exception handling paths in arch/arm64/kernel/entry.S to C. - FTRACE_WITH_REGS support for arm64. - ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4 - Several kselftest cases specific to arm64, together with a MAINTAINERS update for these files (moved to the ARM64 PORT entry). - Workaround for a Neoverse-N1 erratum where the CPU may fetch stale instructions under certain conditions. - Workaround for Cortex-A57 and A72 errata where the CPU may speculatively execute an AT instruction and associate a VMID with the wrong guest page tables (corrupting the TLB). - Perf updates for arm64: additional PMU topologies on HiSilicon platforms, support for CCN-512 interconnect, AXI ID filtering in the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2. - GICv3 optimisation to avoid a heavy barrier when accessing the ICC_PMR_EL1 register. - ELF HWCAP documentation updates and clean-up. - SMC calling convention conduit code clean-up. - KASLR diagnostics printed during boot - NVIDIA Carmel CPU added to the KPTI whitelist - Some arm64 mm clean-ups: use generic free_initrd_mem(), remove stale macro, simplify calculation in __create_pgd_mapping(), typos. - Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for endinanness to help with allmodconfig" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64: Kconfig: add a choice for endianness kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous" arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry arm64: kaslr: Check command line before looking for a seed arm64: kaslr: Announce KASLR status on boot kselftest: arm64: fake_sigreturn_misaligned_sp kselftest: arm64: fake_sigreturn_bad_size kselftest: arm64: fake_sigreturn_duplicated_fpsimd kselftest: arm64: fake_sigreturn_missing_fpsimd kselftest: arm64: fake_sigreturn_bad_size_for_magic0 kselftest: arm64: fake_sigreturn_bad_magic kselftest: arm64: add helper get_current_context kselftest: arm64: extend test_init functionalities kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht] kselftest: arm64: mangle_pstate_invalid_daif_bits kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils kselftest: arm64: extend toplevel skeleton Makefile drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform arm64: mm: reserve CMA and crashkernel in ZONE_DMA32 ...
2019-11-20s390/kasan: support memcpy_real with TRACE_IRQFLAGSVasily Gorbik1-4/+8
Currently if the kernel is built with CONFIG_TRACE_IRQFLAGS and KASAN and used as crash kernel it crashes itself due to trace_hardirqs_off/trace_hardirqs_on being called with DAT off. This happens because trace_hardirqs_off/trace_hardirqs_on are instrumented and kasan code tries to perform access to shadow memory to validate memory accesses. Kasan shadow memory is populated with vmemmap, so all accesses require DAT on. memcpy_real could be called with DAT on or off (with kasan enabled DAT is set even before early code is executed). Make sure that trace_hardirqs_off/trace_hardirqs_on are called with DAT on and only actual __memcpy_real is called with DAT off. Also annotate __memcpy_real and _memcpy_real with __no_sanitize_address to avoid further problems due to switching DAT off. Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-11-01dma/direct: turn ARCH_ZONE_DMA_BITS into a variableNicolas Saenz Julienne1-0/+1
Some architectures, notably ARM, are interested in tweaking this depending on their runtime DMA addressing limitations. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2019-10-31s390/cmm: fix information leak in cmm_timeout_handler()Yihui ZENG1-6/+6
The problem is that we were putting the NUL terminator too far: buf[sizeof(buf) - 1] = '\0'; If the user input isn't NUL terminated and they haven't initialized the whole buffer then it leads to an info leak. The NUL terminator should be: buf[len - 1] = '\0'; Signed-off-by: Yihui Zeng <yzeng56@asu.edu> Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> [heiko.carstens@de.ibm.com: keep semantics of how *lenp and *ppos are handled] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-09-26mm: treewide: clarify pgtable_page_{ctor,dtor}() namingMark Rutland1-3/+3
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few people, and until recently arm64 used these erroneously/pointlessly for other levels of page table. To make it incredibly clear that these only apply to the PTE level, and to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them to pgtable_pte_page_{ctor,dtor}(). These changes were generated with the following shell script: ---- git grep -lw 'pgtable_page_.tor' | while read FILE; do sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE; sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE; done ---- ... with the documentation re-flowed to remain under 80 columns, and whitespace fixed up in macros to keep backslashes aligned. There should be no functional change as a result of this patch. Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.com Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-21Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-19/+16
Pull hmm updates from Jason Gunthorpe: "This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits) libnvdimm: Enable unit test infrastructure compile checks mm, notifier: Catch sleeping/blocking for !blockable kernel.h: Add non_block_start/end() drm/radeon: guard against calling an unpaired radeon_mn_unregister() csky: add missing brackets in a macro for tlb.h pagewalk: use lockdep_assert_held for locking validation pagewalk: separate function pointers from iterator data mm: split out a new pagewalk.h header from mm.h mm/mmu_notifiers: annotate with might_sleep() mm/mmu_notifiers: prime lockdep mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports mm/hmm: hmm_range_fault() infinite loop mm/hmm: hmm_range_fault() NULL pointer bug mm/hmm: fix hmm_range_fault()'s handling of swapped out pages mm/mmu_notifiers: remove unregister_no_release RDMA/odp: remove ib_ucontext from ib_umem RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address ...
2019-09-20Merge tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-6/+1
Pull powerpc updates from Michael Ellerman: "This is a bit late, partly due to me travelling, and partly due to a power outage knocking out some of my test systems *while* I was travelling. - Initial support for running on a system with an Ultravisor, which is software that runs below the hypervisor and protects guests against some attacks by the hypervisor. - Support for building the kernel to run as a "Secure Virtual Machine", ie. as a guest capable of running on a system with an Ultravisor. - Some changes to our DMA code on bare metal, to allow devices with medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of DMA space. - Support for firmware assisted crash dumps on bare metal (powernv). - Two series fixing bugs in and refactoring our PCI EEH code. - A large series refactoring our exception entry code to use gas macros, both to make it more readable and also enable some future optimisations. As well as many cleanups and other minor features & fixups. Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand, Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar, Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm, Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu, Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom Lendacky, Vasant Hegde" * tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits) powerpc/mm/mce: Keep irqs disabled during lockless page table walk powerpc: Use ftrace_graph_ret_addr() when unwinding powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR ftrace: Look up the address of return_to_handler() using helpers powerpc: dump kernel log before carrying out fadump or kdump docs: powerpc: Add missing documentation reference powerpc/xmon: Fix output of XIVE IPI powerpc/xmon: Improve output of XIVE interrupts powerpc/mm/radix: remove useless kernel messages powerpc/fadump: support holes in kernel boot memory area powerpc/fadump: remove RMA_START and RMA_END macros powerpc/fadump: update documentation about option to release opalcore powerpc/fadump: consider f/w load area powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel powerpc/fadump: improve how crashed kernel's memory is reserved powerpc/fadump: consider reserved ranges while releasing memory powerpc/fadump: make crash memory ranges array allocation generic ...
2019-09-17Merge tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds4-32/+17
Pull s390 updates from Vasily Gorbik: - Add support for IBM z15 machines. - Add SHA3 and CCA AES cipher key support in zcrypt and pkey refactoring. - Move to arch_stack_walk infrastructure for the stack unwinder. - Various kasan fixes and improvements. - Various command line parsing fixes. - Improve decompressor phase debuggability. - Lift no bss usage restriction for the early code. - Use refcount_t for reference counters for couple of places in mm code. - Logging improvements and return code fix in vfio-ccw code. - Couple of zpci fixes and minor refactoring. - Remove some outdated documentation. - Fix secure boot detection. - Other various minor code clean ups. * tag 's390-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits) s390: remove pointless drivers-y in drivers/s390/Makefile s390/cpum_sf: Fix line length and format string s390/pci: fix MSI message data s390: add support for IBM z15 machines s390/crypto: Support for SHA3 via CPACF (MSA6) s390/startup: add pgm check info printing s390/crypto: xts-aes-s390 fix extra run-time crypto self tests finding vfio-ccw: fix error return code in vfio_ccw_sch_init() s390: vfio-ap: fix warning reset not completed s390/base: remove unused s390_base_mcck_handler s390/sclp: Fix bit checked for has_sipl s390/zcrypt: fix wrong handling of cca cipher keygenflags s390/kasan: add kdump support s390/setup: avoid using strncmp with hardcoded length s390/sclp: avoid using strncmp with hardcoded length s390/module: avoid using strncmp with hardcoded length s390/pci: avoid using strncmp with hardcoded length s390/kaslr: reserve memory for kasan usage s390/mem_detect: provide single get_mem_detect_end s390/cmma: reuse kstrtobool for option value parsing ...
2019-09-07pagewalk: separate function pointers from iterator dataChristoph Hellwig1-18/+15
The mm_walk structure currently mixed data and code. Split out the operations vectors into a new mm_walk_ops structure, and while we are changing the API also declare the mm_walk structure inside the walk_page_range and walk_page_vma functions. Based on patch from Linus Torvalds. Link: https://lore.kernel.org/r/20190828141955.22210-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-07mm: split out a new pagewalk.h header from mm.hChristoph Hellwig1-1/+1
Add a new header for the two handful of users of the walk_page_range / walk_page_vma interface instead of polluting all users of mm.h with it. Link: https://lore.kernel.org/r/20190828141955.22210-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-29s390/kasan: add kdump supportVasily Gorbik1-0/+2
If kasan enabled kernel is used as crash kernel it crashes itself with program check loop during kdump execution. The reason for that is that kasan shadow memory backed by pages beyond OLDMEM_SIZE. Make kasan memory allocator respect physical memory limit imposed by kdump. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-26s390/mem_detect: provide single get_mem_detect_endVasily Gorbik1-12/+0
get_mem_detect_end is already used in couple of places with potential to be utilized in more cases. Provide single get_mem_detect_end implementation in asm/mem_detect.h to be used by kasan and startup code. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-26s390/cmma: reuse kstrtobool for option value parsingVasily Gorbik1-10/+4
"cmma" option setup already recognises some textual values. Yet kstrtobool is a more common way to parse boolean values, reuse it to unify option value parsing behavior and simplify code a bit. While at it, __setup value parsing callbacks are expected to return 1 when an option is recognized, and returning any other value won't trigger any error message currently, so simply return 1. Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-21s390/mm: use refcount_t for refcountChuhong Yuan1-5/+5
Reference counters are preferred to use refcount_t instead of atomic_t. This is because the implementation of refcount_t can prevent overflows and detect possible use-after-free. So convert atomic_t ref counters to refcount_t. Link: http://lkml.kernel.org/r/20190808071826.6649-1-hslester96@gmail.com Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2019-08-21s390/extmem: use refcount_t for refcountChuhong Yuan1-5/+6
Reference counters are preferred to use refcount_t instead of atomic_t. This is because the implementation of refcount_t can prevent overflows and detect possible use-after-free. So convert atomic_t ref counters to refcount_t. Link: http://lkml.kernel.org/r/20190808071817.6595-1-hslester96@gmail.com Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>