aboutsummaryrefslogtreecommitdiffstats
path: root/arch/s390/include/asm/pgalloc.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-01-04mm: treewide: remove unused address argument from pte_alloc functionsJoel Fernandes (Google)1-2/+2
Patch series "Add support for fast mremap". This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle architecture related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since its unused. This patch therefore removes this argument tree-wide resulting in a nice negative diff as well. Also ensuring along the way that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization. Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing. The changes were obtained by applying the following Coccinelle script. (thanks Julia for answering all Coccinelle questions!). Following fix ups were done manually: * Removal of address argument from pte_fragment_alloc * Removal of pte_alloc_one_fast definitions from m68k and microblaze. // Options: --include-headers --no-includes // Note: I split the 'identifier fn' line, so if you are manually // running it, please unsplit it so it runs for you. virtual patch @pte_alloc_func_def depends on patch exists@ identifier E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; type T2; @@ fn(... - , T2 E2 ) { ... } @pte_alloc_func_proto_noarg depends on patch exists@ type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1, T2); + T3 fn(T1); | - T3 fn(T1, T2, T4); + T3 fn(T1, T2); ) @pte_alloc_func_proto depends on patch exists@ identifier E1, E2, E4; type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1 E1, T2 E2); + T3 fn(T1 E1); | - T3 fn(T1 E1, T2 E2, T4 E4); + T3 fn(T1 E1, T2 E2); ) @pte_alloc_func_call depends on patch exists@ expression E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ fn(... -, E2 ) @pte_alloc_macro depends on patch exists@ identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; identifier a, b, c; expression e; position p; @@ ( - #define fn(a, b, c) e + #define fn(a, b) e | - #define fn(a, b) e + #define fn(a) e ) Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-02s390/mm: fix mis-accounting of pgtable_bytesMartin Schwidefsky1-3/+3
In case a fork or a clone system fails in copy_process and the error handling does the mmput() at the bad_fork_cleanup_mm label, the following warning messages will appear on the console: BUG: non-zero pgtables_bytes on freeing mm: 16384 The reason for that is the tricks we play with mm_inc_nr_puds() and mm_inc_nr_pmds() in init_new_context(). A normal 64-bit process has 3 levels of page table, the p4d level and the pud level are folded. On process termination the free_pud_range() function in mm/memory.c will subtract 16KB from pgtable_bytes with a mm_dec_nr_puds() call, but there actually is not really a pud table. One issue with this is the fact that pgtable_bytes is usually off by a few kilobytes, but the more severe problem is that for a failed fork or clone the free_pgtables() function is not called. In this case there is no mm_dec_nr_puds() or mm_dec_nr_pmds() that go together with the mm_inc_nr_puds() and mm_inc_nr_pmds in init_new_context(). The pgtable_bytes will be off by 16384 or 32768 bytes and we get the BUG message. The message itself is purely cosmetic, but annoying. To fix this override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded function to check for the true size of the address space. Reported-by: Li Wang <liwang@redhat.com> Tested-by: Li Wang <liwang@redhat.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-02-22s390/mm: provide base_asce_alloc() / base_asce_free() helper functionsHeiko Carstens1-0/+3
Provide base_asce_alloc() and base_asce_free() helper functions which can be used to allocate an ASCE and all required region, segment and page tables required to access memory regions of the virtual kernel address space. Both, the ASCE and all tables, do not use any features that correspond to e.g. enhanced DAT features. This is required for some I/O functions that pass an ASCE, like e.g. some service call requests, but which may not use any enhanced features. Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-13Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds1-16/+2
Pull s390 updates from Heiko Carstens: "Since Martin is on vacation you get the s390 pull request for the v4.15 merge window this time from me. Besides a lot of cleanups and bug fixes these are the most important changes: - a new regset for runtime instrumentation registers - hardware accelerated AES-GCM support for the aes_s390 module - support for the new CEX6S crypto cards - support for FORTIFY_SOURCE - addition of missing z13 and new z14 instructions to the in-kernel disassembler - generate opcode tables for the in-kernel disassembler out of a simple text file instead of having to manually maintain those tables - fast memset16, memset32 and memset64 implementations - removal of named saved segment support - hardware counter support for z14 - queued spinlocks and queued rwlocks implementations for s390 - use the stack_depth tracking feature for s390 BPF JIT - a new s390_sthyi system call which emulates the sthyi (store hypervisor information) instruction - removal of the old KVM virtio transport - an s390 specific CPU alternatives implementation which is used in the new spinlock code" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (88 commits) MAINTAINERS: add virtio-ccw.h to virtio/s390 section s390/noexec: execute kexec datamover without DAT s390: fix transactional execution control register handling s390/bpf: take advantage of stack_depth tracking s390: simplify transactional execution elf hwcap handling s390/zcrypt: Rework struct ap_qact_ap_info. s390/virtio: remove unused header file kvm_virtio.h s390: avoid undefined behaviour s390/disassembler: generate opcode tables from text file s390/disassembler: remove insn_to_mnemonic() s390/dasd: avoid calling do_gettimeofday() s390: vfio-ccw: Do not attempt to free no-op, test and tic cda. s390: remove named saved segment support s390/archrandom: Reconsider s390 arch random implementation s390/pci: do not require AIS facility s390/qdio: sanitize put_indicator s390/qdio: use atomic_cmpxchg s390/nmi: avoid using long-displacement facility s390: pass endianness info to sparse s390/decompressor: remove informational messages ...
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman1-0/+1
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-09s390/mm: use memset64 instead of clear_tableHeiko Carstens1-16/+2
Use memset64 instead of the (now) open-coded variant clear_table. Performance wise there is no difference. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-26s390/mm,vmem: simplify region and segment table allocation codeHeiko Carstens1-0/+4
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-26s390/mm: use new mm defines instead of magic valuesHeiko Carstens1-6/+6
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-07-26s390/mm: introduce defines to reflect the hardware mmuHeiko Carstens1-0/+2
Add various defines like e.g. _REGION1_SHIFT to reflect the hardware mmu. We have quite a bit code that does not make use of the Linux memory management primitives but directly modifies page, segment and region values. Most of this is open-coded like e.g. "1UL << 53". In order to clean this up introduce a couple of new defines. The existing Linux memory management defines are changed, so the mapping to the hardware implementation is reflected. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-06-12s390/mm: implement 5 level pages tablesMartin Schwidefsky1-4/+21
Add the logic to upgrade the page table for a 64-bit process to five levels. This increases the TASK_SIZE from 8PB to 16EB-4K. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-12-07s390: Remove VLAIS in ptff() and clear_table()Michael Holzheu1-11/+11
The ptff() and clear_table() functions use the gcc extension "variable length arrays in structures" (VLAIS) to define in the inline assembler constraints the area of the clobbered memory. This extension will most likely never be supported by LLVM/Clang. Since currently BPF programs are compiled with LLVM, this leads to the following compile errors: $ cd samples/bpf $ make In file included from /root/linux-master/samples/bpf/tracex1_kern.c:8: In file included from ./include/linux/netdevice.h:44: ... In file included from ./arch/s390/include/asm/mmu_context.h:10: ./arch/s390/include/asm/pgalloc.h:30:24: error: fields must have a constant size: 'variable length array in structure' extension will never be supported typedef struct { char _[n]; } addrtype; In file included from /root/linux-master/samples/bpf/tracex1_kern.c:7: In file included from ./include/linux/skbuff.h:18: ... In file included from ./include/linux/jiffies.h:8: In file included from ./include/linux/timex.h:65: ./arch/s390/include/asm/timex.h:105:24: error: fields must have a constant size: 'variable length array in structure' extension will never be supported typedef struct { char _[len]; } addrtype; To fix this do the following: - Convert ptff() into a macro that then uses a fixed size array when expanded. - Convert the clear_table() function and use an inline assembly with fixed size array in a loop. The runtime performance of the new version is even better than the old version (tested with EC12/z13 and gcc 4.8.5/6.2.1 with "-march=z196 -O2"). Reported-by: Zvonko Kosic <zvonko.kosic@de.ibm.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-20s390/mm: add shadow gmap supportMartin Schwidefsky1-0/+2
For a nested KVM guest the outer KVM host needs to create shadow page tables for the nested guest. This patch adds the basic support to the guest address space (gmap) code. For each guest address space the inner KVM host creates, the first outer KVM host needs to create shadow page tables. The address space is identified by the ASCE loaded into the control register 1 at the time the inner SIE instruction for the second nested KVM guest is executed. The outer KVM host creates the shadow tables starting with the table identified by the ASCE on a on-demand basis. The outer KVM host will get repeated faults for all the shadow tables needed to run the second KVM guest. While a shadow page table for the second KVM guest is active the access to the origin region, segment and page tables needs to be restricted for the first KVM guest. For region and segment and page tables the first KVM guest may read the memory, but write attempt has to lead to an unshadow. This is done using the page invalid and read-only bits in the page table of the first KVM guest. If the first guest re-accesses one of the origin pages of a shadow, it gets a fault and the affected parts of the shadow page table hierarchy needs to be removed again. PGSTE tables don't have to be shadowed, as all interpretation assist can't deal with the invalid bits in the shadow pte being set differently than the original ones provided by the first KVM guest. Many bug fixes and improvements by David Hildenbrand. Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-04-21s390/mm: fix asce_bits handling with dynamic pagetable levelsGerald Schaefer1-2/+2
There is a race with multi-threaded applications between context switch and pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a pagetable upgrade on another thread in crst_table_upgrade() could already have set new asce_bits, but not yet the new mm->pgd. This would result in a corrupt user_asce in switch_mm(), and eventually in a kernel panic from a translation exception. Fix this by storing the complete asce instead of just the asce_bits, which can then be read atomically from switch_mm(), so that it either sees the old value or the new value, but no mixture. Both cases are OK. Having the old value would result in a page fault on access to the higher level memory, but the fault handler would see the new mm->pgd, if it was a valid access after the mmap on the other thread has completed. So as worst-case scenario we would have a page fault loop for the racing thread until the next time slice. Also remove dead code and simplify the upgrade/downgrade path, there are no upgrades from 2 levels, and only downgrades from 3 levels for compat tasks. There are also no concurrent upgrades, because the mmap_sem is held with down_write() in do_mmap, so the flush and table checks during upgrade can be removed. Reported-by: Michael Munday <munday@ca.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-16Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linuxLinus Torvalds1-4/+0
Pull s390 updates from Martin Schwidefsky: - Add the CPU id for the new z13s machine - Add a s390 specific XOR template for RAID-5 checksumming based on the XC instruction. Remove all other alternatives, XC is always faster - The merge of our four different stack tracers into a single one - Tidy up the code related to page tables, several large inline functions are now out-of-line. Bloat-o-meter reports ~11K text size reduction - A binary interface for the priviledged CLP instruction to retrieve the hardware view of the installed PCI functions - Improvements for the dasd format code - Bug fixes and cleanups * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (31 commits) s390/pci: enforce fmb page boundary rule s390: fix floating pointer register corruption (again) s390/cpumf: add missing lpp magic initialization s390: Fix misspellings in comments s390/mm: split arch/s390/mm/pgtable.c s390/mm: uninline pmdp_xxx functions from pgtable.h s390/mm: uninline ptep_xxx functions from pgtable.h s390/pci: add ioctl interface for CLP s390: Use pr_warn instead of pr_warning s390/dasd: remove casts to dasd_*_private s390/dasd: Refactor dasd format functions s390/dasd: Simplify code in format logic s390/dasd: Improve dasd format code s390/percpu: remove this_cpu_cmpxchg_double_4 s390/cpumf: Improve guest detection heuristics s390/fault: merge report_user_fault implementations s390/dis: use correct escape sequence for '%' character s390/kvm: simplify set_guest_storage_key s390/oprofile: add z13/z13s model numbers s390: add z13s model number to z13 elf platform ...
2016-03-10s390/mm: four page table levels vs. forkMartin Schwidefsky1-5/+19
The fork of a process with four page table levels is broken since git commit 6252d702c5311ce9 "[S390] dynamic page tables." All new mm contexts are created with three page table levels and an asce limit of 4TB. If the parent has four levels dup_mmap will add vmas to the new context which are outside of the asce limit. The subsequent call to copy_page_range will walk the three level page table structure of the new process with non-zero pgd and pud indexes. This leads to memory clobbers as the pgd_index *and* the pud_index is added to the mm->pgd pointer without a pgd_deref in between. The init_new_context() function is selecting the number of page table levels for a new context. The function is used by mm_init() which in turn is called by dup_mm() and mm_alloc(). These two are used by fork() and exec(). The init_new_context() function can distinguish the two cases by looking at mm->context.asce_limit, for fork() the mm struct has been copied and the number of page table levels may not change. For exec() the mm_alloc() function set the new mm structure to zero, in this case a three-level page table is created as the temporary stack space is located at STACK_TOP_MAX = 4TB. This fixes CVE-2016-2143. Reported-by: Marcin Koƛcielnicki <koriakin@0x04.net> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08s390/mm: split arch/s390/mm/pgtable.cMartin Schwidefsky1-4/+0
The pgtable.c file is quite big, before it grows any larger split it into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related header definitions into the new gmap.h header and all of the pgste helpers from pgtable.h to pgtable.c. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-23s390/kvm: remove delayed reallocation of page tables for KVMMartin Schwidefsky1-0/+1
Replacing a 2K page table with a 4K page table while a VMA is active for the affected memory region is fundamentally broken. Rip out the page table reallocation code and replace it with a simple system control 'vm.allocate_pgste'. If the system control is set the page tables for all processes are allocated as full 4K pages, even for processes that do not need it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-03-25s390: remove 31 bit supportHeiko Carstens1-24/+0
Remove the 31 bit support in order to reduce maintenance cost and effectively remove dead code. Since a couple of years there is no distribution left that comes with a 31 bit kernel. The 31 bit kernel also has been broken since more than a year before anybody noticed. In addition I added a removal warning to the kernel shown at ipl for 5 minutes: a960062e5826 ("s390: add 31 bit warning message") which let everybody know about the plan to remove 31 bit code. We didn't get any response. Given that the last 31 bit only machine was introduced in 1999 let's remove the code. Anybody with 31 bit user space code can still use the compat mode. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-18Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+1
Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...
2014-11-28KVM: S390: Create helper function get_guest_storage_keyJason J. Herne1-0/+1
Define get_guest_storage_key which can be used to get the value of a guest storage key. This compliments the functionality provided by the helper function set_guest_storage_key. Both functions are needed for live migration of s390 guests that use storage keys. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-10-27s390/mm: recfactor global pgste updatesDominik Dingel1-2/+0
Replace the s390 specific page table walker for the pgste updates with a call to the common code walk_page_range function. There are now two pte modification functions, one for the reset of the CMMA state and another one for the initialization of the storage keys. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-08-26KVM: s390/mm: use radix trees for guest to host mappingsMartin Schwidefsky1-4/+4
Store the target address for the gmap segments in a radix tree instead of using invalid segment table entries. gmap_translate becomes a simple radix_tree_lookup, gmap_fault is split into the address translation with gmap_translate and the part that does the linking of the gmap shadow page table with the process page table. A second radix tree is used to keep the pointers to the segment table entries for segments that are mapped in the guest address space. On unmap of a segment the pointer is retrieved from the radix tree and is used to carry out the segment invalidation in the gmap shadow page table. As the radix tree can only store one pointer, each host segment may only be mapped to exactly one guest location. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-04-22KVM: s390: Clear storage keysDominik Dingel1-1/+2
page_table_reset_pgste() already does a complete page table walk to reset the pgste. Enhance it to initialize the storage keys to PAGE_DEFAULT_KEY if requested by the caller. This will be used for lazy storage key handling. Also provide an empty stub for !CONFIG_PGSTE Lets adopt the current code (diag 308) to not clear the keys. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-02-21s390/mm: enable split page table lock for PMD levelMartin Schwidefsky1-3/+14
Add the pgtable_pmd_page_ctor/pgtable_pmd_page_dtor calls to the pmd allocation and free functions and enable ARCH_ENABLE_SPLIT_PMD_PTLOCK for 64 bit. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-02-21s390/kvm: set guest page states to stable on re-iplMartin Schwidefsky1-0/+1
The guest page state needs to be reset to stable for all pages on initial program load via diagnose 0x308. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-06-26s390/kvm: Provide function for setting the guest storage keyChristian Borntraeger1-0/+3
From time to time we need to set the guest storage key. Lets provide a helper function that handles the changes with all the right locking and checking. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-07-20s390/comments: unify copyright messages and remove file namesHeiko Carstens1-3/+1
Remove the file name from the comment at top of many files. In most cases the file name was wrong anyway, so it's rather pointless. Also unify the IBM copyright statement. We did have a lot of sightly different statements and wanted to change them one after another whenever a file gets touched. However that never happened. Instead people start to take the old/"wrong" statements to use as a template for new files. So unify all of them in one go. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-05-24s390/headers: replace __s390x__ with CONFIG_64BIT where possibleHeiko Carstens1-3/+3
Replace __s390x__ with CONFIG_64BIT in all places that are not exported to userspace or guarded with #ifdef __KERNEL__. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2012-04-11[S390] fix tlb flushing for page table pagesMartin Schwidefsky1-3/+0
Git commit 36409f6353fc2d7b6516e631415f938eadd92ffa "use generic RCU page-table freeing code" introduced a tlb flushing bug. Partially revert the above git commit and go back to s390 specific page table flush code. For s390 the TLB can contain three types of entries, "normal" TLB page-table entries, TLB combined region-and-segment-table (CRST) entries and real-space entries. Linux does not use real-space entries which leaves normal TLB entries and CRST entries. The CRST entries are intermediate steps in the page-table translation called translation paths. For example a 4K page access in a three-level page table setup will create two CRST TLB entries and one page-table TLB entry. The advantage of that approach is that a page access next to the previous one can reuse the CRST entries and needs just a single read from memory to create the page-table TLB entry. The disadvantage is that the TLB flushing rules are more complicated, before any page-table may be freed the TLB needs to be flushed. In short: the generic RCU page-table freeing code is incorrect for the CRST entries, in particular the check for mm_users < 2 is troublesome. This is applicable to 3.0+ kernels. Cc: <stable@vger.kernel.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm guest address space mappingMartin Schwidefsky1-3/+4
Add code that allows KVM to control the virtual memory layout that is seen by a guest. The guest address space uses a second page table that shares the last level pte-tables with the process page table. If a page is unmapped from the process page table it is automatically unmapped from the guest page table as well. The guest address space mapping starts out empty, KVM can map any individual 1MB segments from the process virtual memory to any 1MB aligned location in the guest virtual memory. If a target segment in the process virtual memory does not exist or is unmapped while a guest mapping exists the desired target address is stored as an invalid segment table entry in the guest page table. The population of the guest page table is fault driven. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-06-06[S390] use generic RCU page-table freeing codeMartin Schwidefsky1-4/+4
Replace the s390 specific rcu page-table freeing code with the generic variant. This requires to duplicate the definition for the struct mmu_table_batch as s390 does not use the generic tlb flush code. While we are at it remove the restriction that page table fragments can not be reused after a single fragment has been freed with rcu and split out allocation and freeing of page tables with pgstes. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23[S390] refactor page table functions for better pgste supportMartin Schwidefsky1-24/+5
Rework the architecture page table functions to access the bits in the page table extension array (pgste). There are a number of changes: 1) Fix missing pgste update if the attach_count for the mm is <= 1. 2) For every operation that affects the invalid bit in the pte or the rcp byte in the pgste the pcl lock needs to be acquired. The function pgste_get_lock gets the pcl lock and returns the current pgste value for a pte pointer. The function pgste_set_unlock stores the pgste and releases the lock. Between these two calls the bits in the pgste can be shuffled. 3) Define two software bits in the pte _PAGE_SWR and _PAGE_SWC to avoid calling SetPageDirty and SetPageReferenced from pgtable.h. If the host reference backup bit or the host change backup bit has been set the dirty/referenced state is transfered to the pte. The common code will pick up the state from the pte. 4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect. 5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate. 6) Rename kvm_s390_test_and_clear_page_dirty to ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young. 7) Define mm_exclusive() and mm_has_pgste() helper to improve readability. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23[S390] Remove data execution protectionMartin Schwidefsky1-24/+4
The noexec support on s390 does not rely on a bit in the page table entry but utilizes the secondary space mode to distinguish between memory accesses for instructions vs. data. The noexec code relies on the assumption that the cpu will always use the secondary space page table for data accesses while it is running in the secondary space mode. Up to the z9-109 class machines this has been the case. Unfortunately this is not true anymore with z10 and later machines. The load-relative-long instructions lrl, lgrl and lgfrl access the memory operand using the same addressing-space mode that has been used to fetch the instruction. This breaks the noexec mode for all user space binaries compiled with march=z10 or later. The only option is to remove the current noexec support. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25[S390] lockless get_user_pages_fast()Martin Schwidefsky1-0/+4
Implement get_user_pages_fast without locking in the fastpath on s390. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-12-07[S390] Improve address space mode selection.Martin Schwidefsky1-1/+2
Introduce user_mode to replace the two variables switch_amode and s390_noexec. There are three valid combinations of the old values: 1) switch_amode == 0 && s390_noexec == 0 2) switch_amode == 1 && s390_noexec == 0 3) switch_amode == 1 && s390_noexec == 1 They get replaced by 1) user_mode == HOME_SPACE_MODE 2) user_mode == PRIMARY_SPACE_MODE 3) user_mode == SECONDARY_SPACE_MODE The new kernel parameter user_mode=[primary,secondary,home] lets you choose the address space mode the user space processes should use. In addition the CONFIG_S390_SWITCH_AMODE config option is removed. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11[S390] fix recursive locking on page_table_lockMartin Schwidefsky1-0/+1
Suzuki Poulose reported the following recursive locking bug on s390: Here is the stack trace : (see Appendix I for more info) [<0000000000406ed6>] _spin_lock+0x52/0x94 [<0000000000103bde>] crst_table_free+0x14e/0x1a4 [<00000000001ba684>] __pmd_alloc+0x114/0x1ec [<00000000001be8d0>] handle_mm_fault+0x2cc/0xb80 [<0000000000407d62>] do_dat_exception+0x2b6/0x3a0 [<0000000000114f8c>] sysc_return+0x0/0x8 [<00000200001642b2>] 0x200001642b2 The page_table_lock is already acquired in __pmd_alloc (mm/memory.c) and it tries to populate the pud/pgd with a new pmd allocated. If another thread populates it before we get a chance, we free the pmd using pmd_free(). On s390x, pmd_free(even pud_free ) is #defined to crst_table_free(), which acquires the page_table_lock to protect the crst_table index updates. Hence this ends up in a recursive locking of the page_table_lock. The solution suggested by Dave Hansen is to use a new spin lock in the mmu context to protect the access to the crst_list and the pgtable_list. Reported-by: Suzuki Poulose <suzuki@in.ibm.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-25[S390] clear_table inline assembly contraintsMartin Schwidefsky1-1/+4
Tell the compile that the clear_table inline assembly writes to the memory referenced by *s. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-08-01[S390] move include/asm-s390 to arch/s390/include/asmMartin Schwidefsky1-0/+174
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>