path: root/arch/s390/mm/hugetlbpage.c
Age | Commit message | Author | Files | Lines
2022-09-11 | s390/hugetlb: switch to generic version of follow_huge_pud() | Gerald Schaefer | 1 | -10/+0
When pud-sized hugepages were introduced for s390, the generic version of follow_huge_pud() was using pte_page() instead of pud_page(). This would be wrong for s390, see also commit 97534127012f ("mm/hugetlb: use pmd_page() in follow_huge_pmd()"). Therefore, and probably because not all archs were supporting pud_page() at that time, a private version of follow_huge_pud() was added for s390, correctly using pud_page(). Since commit 3a194f3f8ad01 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry"), the generic version of follow_huge_pud() is now also using pud_page(), and in general behaves similar to follow_huge_pmd(). Therefore we can now switch to the generic version and get rid of the s390-specific follow_huge_pud(). Link: https://lkml.kernel.org/r/20220818135717.609eef8a@thinkpad Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Haiyue Wang <haiyue.wang@intel.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-03-01 | s390/mm,hugetlb: don't use pte_val()/pXd_val() as lvalue | Heiko Carstens | 1 | -21/+13
Convert pgtable code so pte_val()/pXd_val() aren't used as lvalue anymore. This allows a later step to convert pte_val()/pXd_val() to functions, which in turn makes it impossible to use these macros to modify page table entries like they have been used before. Therefore a construct like this: pte_val(*pte) = __pa(addr) | prot; which would directly write into a page table, isn't possible anymore with the last step of this series. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
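The difference between writing through an lvalue macro and going through a setter can be sketched in plain C. This is a user-space illustration only: the `pte_t` wrapper, `pte_val()`, `pte_from_val()`, `set_pte()` and `map_entry()` below are simplified stand-ins named after the commit message, not the s390 kernel definitions.

```c
#include <stdint.h>

/* Simplified stand-in for a page table entry type (assumption). */
typedef struct { uint64_t val; } pte_t;

/* Function form: it returns a value, so it cannot be assigned to. */
static inline uint64_t pte_val(pte_t pte)
{
	return pte.val;
}

/* Hypothetical constructor: build an entry from a raw value. */
static inline pte_t pte_from_val(uint64_t val)
{
	pte_t pte = { .val = val };
	return pte;
}

/* All modifications go through one explicit helper. */
static inline void set_pte(pte_t *ptep, pte_t pte)
{
	*ptep = pte;
}

/* Old style would have been: pte_val(*ptep) = pa | prot; */
static inline void map_entry(pte_t *ptep, uint64_t pa, uint64_t prot)
{
	set_pte(ptep, pte_from_val(pa | prot));
}
```

With `pte_val()` as a function, an assignment such as `pte_val(*ptep) = ...` no longer compiles, so every write is forced through `set_pte()`.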
2022-03-01 | s390/mm: use set_pXd()/set_pte() helper functions everywhere | Heiko Carstens | 1 | -1/+1
Use the new set_pXd()/set_pte() helper functions at all places where page table entries are modified. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-12-16 | add includes masked by cgroup -> bpf dependency | Jakub Kicinski | 1 | -0/+1
cgroup pulls in BPF which pulls in a lot of includes. We're about to break that chain so fix those who were depending on it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20211216025538.1649516-2-kuba@kernel.org
2021-05-05 | hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share() | Peter Xu | 1 | -1/+1
Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4. This series tries to disable huge pmd unshare of hugetlbfs backed memory for uffd-wp. Although uffd-wp of hugetlbfs is still in the RFC stage, the idea of this series may be needed for multiple tasks (Axel's uffd minor fault series, and Mike's soft dirty series), so I picked it out from the larger series. This patch (of 4): It is a preparation work to be able to behave differently in the per architecture huge_pte_alloc() according to different VMA attributes. Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call. [peterx@redhat.com: build fix] Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1 Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Adam Ruprecht <ruprecht@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Cannon Matthews <cannonmatthews@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chinwen Chang <chinwen.chang@mediatek.com> Cc: David Rientjes <rientjes@google.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Lokesh Gidra <lokeshgidra@google.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Michel Lespinasse <walken@google.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Mina Almasry <almasrymina@google.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oliver Upton <oupton@google.com> Cc: Shaohua Li <shli@fb.com> Cc: Shawn Anastasio <shawn@anastas.io> Cc: Steven Price <steven.price@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-07-09 | s390/mm: fix huge pte soft dirty copying | Janosch Frank | 1 | -1/+1
If the pmd is soft dirty we must mark the pte as soft dirty (and not dirty). This fixes some cases for guest migration with huge page backings. Cc: <stable@vger.kernel.org> # 4.8 Fixes: bc29b7ac1d9f ("s390/mm: clean up pte/pmd encoding") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
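A minimal sketch of the intended mapping, assuming the bit positions given in the 2015-10-14 entry of this log (bit 2**1 of the pte, bit 2**14 of the pmd). The macro names, the helper `pmd_soft_dirty_to_pte()` and the position chosen for the plain dirty bit are hypothetical and only serve the illustration.

```c
#include <stdint.h>

#define PMD_SOFT_DIRTY (UINT64_C(1) << 14) /* per the 2015-10-14 entry */
#define PTE_SOFT_DIRTY (UINT64_C(1) << 1)  /* per the 2015-10-14 entry */
#define PTE_DIRTY      (UINT64_C(1) << 3)  /* hypothetical position */

/*
 * Carry the soft dirty state of a pmd over to the converted pte.
 * The result must use the pte soft dirty bit, never the dirty bit.
 */
static inline uint64_t pmd_soft_dirty_to_pte(uint64_t pmd)
{
	return (pmd & PMD_SOFT_DIRTY) ? PTE_SOFT_DIRTY : 0;
}
```

The bug class described above corresponds to returning `PTE_DIRTY` here, which would lose the soft dirty information that change tracking relies on.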
2020-06-03 | hugetlbfs: move hugepagesz= parsing to arch independent code | Mike Kravetz | 1 | -18/+0
Now that architectures provide arch_hugetlb_valid_size(), parsing of "hugepagesz=" can be done in architecture independent code. Create a single routine to handle hugepagesz= parsing and remove all arch specific routines. We can also remove the interface hugetlb_bad_size() as this is no longer used outside arch independent code. This also provides consistent behavior of hugetlbfs command line options. The hugepagesz= option should only be specified once for a specific size, but some architectures allow multiple instances. This appears to be more of an oversight when code was added by some architectures to set up ALL huge pages sizes. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Sandipan Das <sandipan@linux.ibm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Mina Almasry <almasrymina@google.com> Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Will Deacon <will@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Longpeng <longpeng2@huawei.com> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Qian Cai <cai@lca.pw> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200417185049.275845-3-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200428205614.246260-3-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03 | hugetlbfs: add arch_hugetlb_valid_size | Mike Kravetz | 1 | -4/+12
Patch series "Clean up hugetlb boot command line processing", v4. Longpeng(Mike) reported a weird message from hugetlb command line processing and proposed a solution [1]. While the proposed patch does address the specific issue, there are other related issues in command line processing. As hugetlbfs evolved, updates to command line processing have been made to meet immediate needs and not necessarily in a coordinated manner. The result is that some processing is done in arch specific code, some is done in arch independent code and coordination is problematic. Semantics can vary between architectures. The patch series does the following: - Define arch specific arch_hugetlb_valid_size routine used to validate passed huge page sizes. - Move hugepagesz= command line parsing out of arch specific code and into an arch independent routine. - Clean up command line processing to follow desired semantics and document those semantics. [1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpeng2@huawei.com This patch (of 3): The architecture independent routine hugetlb_default_setup sets up the default huge pages size. It has no way to verify if the passed value is valid, so it accepts it and attempts to validate at a later time. This requires undocumented cooperation between the arch specific and arch independent code. For architectures that support more than one huge page size, provide a routine arch_hugetlb_valid_size to validate a huge page size. hugetlb_default_setup can use this to validate passed values. arch_hugetlb_valid_size will also be used in a subsequent patch to move processing of the "hugepagesz=" in arch specific code to a common routine in arch independent code. 
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Longpeng <longpeng2@huawei.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Nitesh Narayan Lal <nitesh@redhat.com> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Qian Cai <cai@lca.pw> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: http://lkml.kernel.org/r/20200428205614.246260-1-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200428205614.246260-2-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200417185049.275845-1-mike.kravetz@oracle.com Link: http://lkml.kernel.org/r/20200417185049.275845-2-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
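What an architecture-side size check of this kind might look like can be sketched as follows. This is not the s390 implementation: the function name `hugetlb_valid_size_sketch()` and its boolean parameters are hypothetical, the 1 MB and 2 GB sizes are taken from other entries in this log, and the pairing of those sizes with the EDAT1 and EDAT2 facilities is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_HPAGE_SIZE (UINT64_C(1) << 20) /* 1 MB, shift of 20 */
#define REGION3_HPAGE_SIZE (UINT64_C(1) << 31) /* 2 GB */

/*
 * Accept a huge page size only if the machine offers the facility
 * that backs it. The facility flags are plain parameters here; a
 * kernel would query machine capabilities instead.
 */
static inline bool hugetlb_valid_size_sketch(uint64_t size,
					     bool has_edat1, bool has_edat2)
{
	if (has_edat1 && size == SEGMENT_HPAGE_SIZE)
		return true;
	if (has_edat2 && size == REGION3_HPAGE_SIZE)
		return true;
	return false;
}
```

Generic command line code can then reject an unsupported "hugepagesz=" value up front instead of relying on later, undocumented cooperation with arch code.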
2020-05-20 | s390/mm: fix set_huge_pte_at() for empty ptes | Gerald Schaefer | 1 | -3/+6
On s390, the layout of normal and large ptes (i.e. pmds/puds) differs. Therefore, set_huge_pte_at() does a conversion from a normal pte to the corresponding large pmd/pud. So, when converting an empty pte, this should result in an empty pmd/pud, which would return true for pmd/pud_none(). However, after conversion we also mark the pmd/pud as large, and therefore present. For empty ptes, this will result in an empty pmd/pud that is also marked as large, and pmd/pud_none() would not return true. There is currently no issue with this behaviour, as set_huge_pte_at() does not seem to be called for empty ptes. It would be valid though, so let's fix this by not marking empty ptes as large in set_huge_pte_at(). This was found by testing a patch from Anshuman Khandual, which is currently discussed on LKML ("mm/debug: Add more arch page table helper tests"). Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
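The effect can be reproduced with a small user-space sketch. The encodings `ENTRY_EMPTY` and `ENTRY_LARGE` and both helper names are hypothetical; only the rule "mark as large only when the source pte is present" is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encodings, chosen for illustration only. */
#define ENTRY_EMPTY UINT64_C(0x20)  /* value of a completely empty entry */
#define ENTRY_LARGE UINT64_C(0x400) /* "this is a large entry" flag */

/* An entry counts as none only if it equals the empty value exactly. */
static inline bool entry_none(uint64_t entry)
{
	return entry == ENTRY_EMPTY;
}

/*
 * Last step of the pte to pmd/pud conversion: set the large flag
 * only for present ptes, so an empty pte stays an empty entry.
 */
static inline uint64_t finish_huge_entry(uint64_t converted, bool pte_present)
{
	if (pte_present)
		converted |= ENTRY_LARGE;
	return converted;
}
```

Setting the flag unconditionally would yield `ENTRY_EMPTY | ENTRY_LARGE`, for which `entry_none()` returns false.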
2020-03-27 | s390/mm: cleanup arch_get_unmapped_area() and friends | Alexander Gordeev | 1 | -9/+2
Factor out check_asce_limit() function and fix a few style defects in arch_get_unmapped_area() family of functions. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> [heiko.carstens@de.ibm.com: small coding style changes] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-01-30 | s390/mm: fix dynamic pagetable upgrade for hugetlbfs | Gerald Schaefer | 1 | -1/+99
Commit ee71d16d22bb ("s390/mm: make TASK_SIZE independent from the number of page table levels") changed the logic of TASK_SIZE and also removed the arch_mmap_check() implementation for s390. This combination has a subtle effect on how get_unmapped_area() for hugetlbfs pages works. It is now possible that a user process establishes a hugetlbfs mapping at an address above 4 TB, without triggering a dynamic pagetable upgrade from 3 to 4 levels. This is because hugetlbfs mappings will not use mm->get_unmapped_area, but rather file->f_op->get_unmapped_area, which currently is the generic implementation of hugetlb_get_unmapped_area() that does not know about s390 dynamic pagetable upgrades, but with the new definition of TASK_SIZE, it will now allow mappings above 4 TB. Subsequent access to such a mapped address above 4 TB will result in a page fault loop, because the CPU cannot translate such a large address with 3 pagetable levels. The fault handler will try to map in a hugepage at the address, but due to the folded pagetable logic it will end up with creating entries in the 3 level pagetable, possibly overwriting existing mappings, and then it all repeats when the access is retried. Apart from the page fault loop, this can have various nasty effects, e.g. kernel panic from one of the BUG_ON() checks in memory management code, or even data loss if an existing mapping gets overwritten. Fix this by implementing HAVE_ARCH_HUGETLB_UNMAPPED_AREA support for s390, providing an s390 version for hugetlb_get_unmapped_area() with pagetable upgrade support similar to arch_get_unmapped_area(), which will then be used instead of the generic version. Fixes: ee71d16d22bb ("s390/mm: make TASK_SIZE independent from the number of page table levels") Cc: <stable@vger.kernel.org> # 4.12+ Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
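The address arithmetic behind the upgrade decision can be sketched as below. The 4 TB limit for 3 levels comes from this commit message and the 8 PB limit for 4 levels from the 2017-06-12 entry in this log; the helper `levels_needed()` is hypothetical and only models the check, not the kernel's upgrade path.

```c
#include <stdint.h>

#define LIMIT_3_LEVELS (UINT64_C(1) << 42) /* 4 TB */
#define LIMIT_4_LEVELS (UINT64_C(1) << 53) /* 8 PB */

/*
 * Return how many page table levels are needed to translate every
 * address in [addr, addr + len), or -1 if the range wraps around.
 */
static inline int levels_needed(uint64_t addr, uint64_t len)
{
	uint64_t end;

	if (len > UINT64_MAX - addr)
		return -1;
	end = addr + len;
	if (end <= LIMIT_3_LEVELS)
		return 3;
	if (end <= LIMIT_4_LEVELS)
		return 4;
	return 5;
}
```

A get_unmapped_area() implementation that is unaware of this check can hand out an address for which `levels_needed()` exceeds the current table depth, which is the situation the fix addresses.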
2018-07-30 | s390/mm: Clear skeys for newly mapped huge guest pmds | Janosch Frank | 1 | -0/+24
Similarly to the pte skey handling, where we set the storage key to the default key for each newly mapped pte, we have to also do that for huge pmds. With the PG_arch_1 flag we keep track if the area has already been cleared of its skeys. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-11-02 | License cleanup: add SPDX GPL-2.0 license identifier to files with no license | Greg Kroah-Hartman | 1 | -0/+1
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information in it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). 
- when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there were new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In the initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-06 | mm/hugetlb: add size parameter to huge_pte_offset() | Punit Agrawal | 1 | -1/+2
A poisoned or migrated hugepage is stored as a swap entry in the page tables. On architectures that support hugepages consisting of contiguous page table entries (such as on arm64) this leads to ambiguity in determining the page table entry to return in huge_pte_offset() when a poisoned entry is encountered. Let's remove the ambiguity by adding a size parameter to convey additional information about the requested address. Also fixup the definition/usage of huge_pte_offset() throughout the tree. Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.com Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Steve Capper <steve.capper@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE) Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS) Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-12 | s390/mm: implement 5 level pages tables | Martin Schwidefsky | 1 | -11/+19
Add the logic to upgrade the page table for a 64-bit process to five levels. This increases the TASK_SIZE from 8PB to 16EB-4K. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2017-02-23 | s390/mm: use _SEGMENT_ENTRY_EMPTY in the code | Dominik Dingel | 1 | -1/+1
_SEGMENT_ENTRY_INVALID denotes the invalid bit in a segment table entry whereas _SEGMENT_ENTRY_EMPTY means that the value of the whole entry is only the invalid bit, as the entry is completely empty. Therefore we use _SEGMENT_ENTRY_INVALID only to check and set the invalid bit with bitwise operations. _SEGMENT_ENTRY_EMPTY is only used to check for (un)equality. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
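The two usage rules can be illustrated with a short sketch. The numeric value of the invalid bit and the `SEG_` prefixed names are assumptions made for this example; only the split between bitwise tests and equality tests is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_ENTRY_INVALID UINT64_C(0x20)    /* assumed bit position */
#define SEG_ENTRY_EMPTY   SEG_ENTRY_INVALID /* whole entry is just that bit */

/* The invalid bit is tested with a bitwise operation. */
static inline bool seg_entry_is_invalid(uint64_t entry)
{
	return (entry & SEG_ENTRY_INVALID) != 0;
}

/* Emptiness is tested by comparing the whole entry. */
static inline bool seg_entry_is_empty(uint64_t entry)
{
	return entry == SEG_ENTRY_EMPTY;
}
```

An entry that carries other bits in addition to the invalid bit is invalid but not empty, which is why the two constants are kept apart even though they share a value.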
2017-02-08 | s390: add no-execute support | Martin Schwidefsky | 1 | -1/+9
Bit 0x100 of a page table, segment table or region table entry can be used to disallow code execution for the virtual addresses associated with the entry. There is one tricky bit, the system call to return from a signal is part of the signal frame written to the user stack. With a non-executable stack this would stop working. To avoid breaking things the protection fault handler checks the opcode that caused the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) and injects a system call. This is preferable to the alternative solution with a stub function in the vdso because it works for vdso=off and statically linked binaries as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
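The opcode check can be sketched in isolation. The two opcode values are quoted from the commit message; the helper name `sigreturn_svc_number()` is hypothetical, and reading the low byte as the system call number relies on the assumption that both opcodes are SVC instructions with an 8-bit immediate.

```c
#include <stdint.h>

#define INSN_SVC_SIGRETURN    0x0a77 /* sys_sigreturn, from the message */
#define INSN_SVC_RT_SIGRETURN 0x0aad /* sys_rt_sigreturn, from the message */

/*
 * Given the two-byte instruction that faulted on a non-executable
 * page, return the system call number to inject, or -1 if the
 * instruction is not one of the two sigreturn variants.
 */
static inline int sigreturn_svc_number(uint16_t insn)
{
	if (insn == INSN_SVC_SIGRETURN || insn == INSN_SVC_RT_SIGRETURN)
		return insn & 0xff;
	return -1;
}
```

Any other faulting instruction yields -1, in which case a handler would fall through to normal protection fault processing.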
2016-10-17 | s390/mm: use hugetlb_bad_size() | Shyam Saini | 1 | -0/+1
Update setup_hugepagesz() to call hugetlb_bad_size() when an unsupported hugepage size is found. Signed-off-by: Shyam Saini <mayhs11saini@gmail.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 | s390/mm: clean up pte/pmd encoding | Gerald Schaefer | 1 | -14/+38
The hugetlbfs pte<->pmd conversion functions currently assume that the pmd bit layout is consistent with the pte layout, which is not really true. The SW read and write bits are encoded as the sequence "wr" in a pte, but in a pmd it is "rw". The hugetlbfs conversion assumes that the sequence is identical in both cases, which results in swapped read and write bits in the pmd. In practice this is not a problem, because those pmd bits are only relevant for THP pmds and not for hugetlbfs pmds. The hugetlbfs code works on (fake) ptes, and the converted pte bits are correct. There is another variation in pte/pmd encoding which affects dirty prot-none ptes/pmds. In this case, a pmd has both its HW read-only and invalid bit set, while it is only the invalid bit for a pte. This also has no effect in practice, but it should better be consistent. This patch fixes both inconsistencies by changing the SW read/write bit layout for pmds as well as the PAGE_NONE encoding for ptes. It also makes the hugetlbfs conversion functions more robust by introducing a move_set_bit() macro that uses the pte/pmd bit #defines instead of constant shifts. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
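A helper of the kind named here can be sketched as follows. This is a user-space approximation and not the kernel macro: the mask values are invented for the example, and `__builtin_ctzll()` (a GCC/Clang builtin) stands in for whatever bit-index helper the kernel uses.

```c
#include <stdint.h>

/* Illustrative single-bit masks; real values live in the s390 headers. */
#define PMD_FLAG_READ UINT64_C(0x0002)
#define PTE_FLAG_READ UINT64_C(0x0010)

/* Index of the set bit in a one-bit mask (mask must be non-zero). */
static inline unsigned int mask_shift(uint64_t mask)
{
	return (unsigned int)__builtin_ctzll(mask);
}

/* If bit `from` is set in x, return `to`, otherwise return 0. */
static inline uint64_t move_set_bit(uint64_t x, uint64_t from, uint64_t to)
{
	return ((x & from) >> mask_shift(from)) << mask_shift(to);
}
```

Because source and target are named by their masks, a later change to either bit layout does not require touching the conversion code, which is the robustness gain over constant shifts.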
2016-07-06 | s390/mm: add support for 2GB hugepages | Gerald Schaefer | 1 | -39/+90
This adds support for 2GB hugetlbfs pages on s390. Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-08 | s390/mm: uninline pmdp_xxx functions from pgtable.h | Martin Schwidefsky | 1 | -4/+3
The pmdp_xxx functions are smaller than their ptep_xxx counterparts, but to keep things symmetrical uninline them as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 | s390/mm: implement soft-dirty bits for user memory change tracking | Martin Schwidefsky | 1 | -0/+2
Use bit 2**1 of the pte and bit 2**14 of the pmd for the soft dirty bit. The fault mechanism to do dirty tracking is already in place. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-06-25 | s390/mm: forward check for huge pmds to pmd_large() | Dominik Dingel | 1 | -4/+1
We already do the check in pmd_large, so we can just forward the call. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-25 | s390/hugetlb: remove dead code for sw emulated huge pages | Dominik Dingel | 1 | -57/+3
We now support only hugepages on hardware with EDAT1 support. So we remove the prepare/release_hugepage hooks and simplify set_huge_pte_at and huge_ptep_get. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 | mm/hugetlb: reduce arch dependent code about huge_pmd_unshare | Zhang Zhen | 1 | -5/+0
Currently we have many duplicates in definitions of huge_pmd_unshare. In all architectures this function just returns 0 when CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N. This patch puts the default implementation in mm/hugetlb.c and lets these architectures use the common code. Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Rientjes <rientjes@google.com> Cc: James Yang <James.Yang@freescale.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-23 | s390/mm: change swap pte encoding and pgtable cleanup | Martin Schwidefsky | 1 | -28/+34
After the file ptes have been removed the bit combination used to encode non-linear mappings can be reused for the swap ptes. This frees up a precious pte software bit. Reflect the change in the swap encoding in the comments and do some cleanup while we are at it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-04-23 | s390/mm: correct transfer of dirty & young bits in __pmd_to_pte | Martin Schwidefsky | 1 | -2/+2
The dirty & young bit from the pmd is not copied correctly to the pseudo pte in __pmd_to_pte. In fact it is not copied at all, the bits get lost. As the old style huge page currently does not need the dirty & young information this has no effect, but may be needed in the future. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-02-11 | mm/hugetlb: reduce arch dependent code around follow_huge_* | Naoya Horiguchi | 1 | -20/+0
Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove them. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETLB), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architecture (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd|pud)() are effectively identical with the common code, so this patch lets these architecture use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code.
This patch forces these archs to use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
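The two ideas in the message above (a weak default that an architecture may override, and the shift arithmetic that makes HPAGE_MASK and PMD_MASK interchangeable) can be sketched in user-space C. This is only an illustration under stated assumptions: the names follow_huge_addr_model, mips_hpage_shift, mips_pmd_shift and mask_from_shift are hypothetical, the weak attribute is the GCC/Clang spelling, and a plain negative errno value stands in for ERR_PTR(-EINVAL).

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for the weak default described in the commit:
 * an architecture that needs its own version would provide a strong
 * definition with the same name, which the linker then prefers.
 * Returns -EINVAL in place of the kernel's ERR_PTR(-EINVAL).
 */
__attribute__((weak)) long follow_huge_addr_model(unsigned long address)
{
	(void)address;
	return -EINVAL;
}

/* HPAGE_SHIFT on mips as quoted in the message: PAGE_SHIFT + PAGE_SHIFT - 3 */
static unsigned int mips_hpage_shift(unsigned int page_shift)
{
	return page_shift + page_shift - 3;
}

/* PMD_SHIFT on mips as quoted: PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3 */
static unsigned int mips_pmd_shift(unsigned int page_shift, unsigned int pte_order)
{
	return page_shift + page_shift + pte_order - 3;
}

/* Build an alignment mask from a shift, the way *_MASK macros usually do. */
static unsigned long mask_from_shift(unsigned int shift)
{
	assert(shift < 8 * sizeof(unsigned long));
	return ~((1UL << shift) - 1);
}
```

With PTE_ORDER equal to 0 the two mips shifts agree for any page size, so the derived masks agree as well; for s390 the message gives 20 for both shifts, which corresponds to mask_from_shift(20).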
2014-09-25s390/mm: remove change bit override supportHeiko Carstens1-1/+1
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-08-01s390/mm: implement dirty bits for large segment table entriesMartin Schwidefsky1-56/+47
The large segment table entry format has block of bits for the ACC/F values for the large page. These bits are valid only if another bit (AV bit 0x10000) of the segment table entry is set. The ACC/F bits do not have a meaning if the AV bit is off. This allows to put the THP splitting bit, the segment young bit and the new segment dirty bit into the ACC/F bits as long as the AV bit stays off. The dirty and young information is only available if the pmd is large. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
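A minimal sketch of the gating rule described above, assuming a 64-bit segment table entry. Only the AV bit value 0x10000 comes from the commit message; the positions chosen for the software dirty and young bits inside the ACC/F block, and all function names, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEG_AV        UINT64_C(0x10000) /* AV bit, value from the commit message */
#define SEG_SW_DIRTY  UINT64_C(0x2000)  /* hypothetical slot inside the ACC/F bits */
#define SEG_SW_YOUNG  UINT64_C(0x1000)  /* hypothetical slot inside the ACC/F bits */

/* The ACC/F bits are free for software use only while AV is off. */
static bool seg_sw_bits_usable(uint64_t entry)
{
	return (entry & SEG_AV) == 0;
}

/* Mark a large segment entry dirty; the caller must keep AV cleared. */
static uint64_t seg_mkdirty(uint64_t entry)
{
	assert(seg_sw_bits_usable(entry));
	return entry | SEG_SW_DIRTY;
}

/* Mark a large segment entry young; the caller must keep AV cleared. */
static uint64_t seg_mkyoung(uint64_t entry)
{
	assert(seg_sw_bits_usable(entry));
	return entry | SEG_SW_YOUNG;
}

/* With AV set the same bits mean ACC/F, so they never count as dirty. */
static bool seg_dirty(uint64_t entry)
{
	return seg_sw_bits_usable(entry) && (entry & SEG_SW_DIRTY) != 0;
}

/* Same rule for the young bit. */
static bool seg_young(uint64_t entry)
{
	return seg_sw_bits_usable(entry) && (entry & SEG_SW_YOUNG) != 0;
}
```

The point of the design is that no new bits are needed: the software state reuses a field the hardware ignores whenever AV is off.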
2014-06-04hugetlb: restrict hugepage_migration_support() to x86_64Naoya Horiguchi1-5/+0
Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there're bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is supported in vma_migratable(). Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03s390/mm,tlb: optimize TLB flushing for zEC12Martin Schwidefsky1-4/+1
The zEC12 machines introduced the local-clearing control for the IDTE and IPTE instruction. If the control is set only the TLB of the local CPU is cleared of entries, either all entries of a single address space for IDTE, or the entry for a single page-table entry for IPTE. Without the local-clearing control the TLB flush is broadcast to all CPUs in the configuration, which is expensive. The reset of the bit mask of the CPUs that need flushing after a non-local IDTE is tricky. As TLB entries for an address space remain in the TLB even if the address space is detached a new bit field is required to keep track of attached CPUs vs. CPUs in need of a flush. After a non-local flush with IDTE the bit-field of attached CPUs is copied to the bit-field of CPUs in need of a flush. The ordering of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is such that an underindication in mm_cpumask(mm) is prevented but an overindication in mm_cpumask(mm) is possible. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
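The bookkeeping described above can be modeled with two bit masks. This is a simplified single-threaded sketch, not the kernel code: struct mm_model and the mm_* helpers are hypothetical, flush_mask plays the role of mm_cpumask(mm), attach_mask plays the role of cpu_attach_mask, and the ordering and atomicity concerns the message mentions are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two masks, limited to 64 CPUs. */
struct mm_model {
	uint64_t attach_mask; /* CPUs that currently have the address space attached */
	uint64_t flush_mask;  /* CPUs that may still hold TLB entries for it */
};

static uint64_t cpu_bit(unsigned int cpu)
{
	assert(cpu < 64);
	return UINT64_C(1) << cpu;
}

/* Attaching makes the CPU both attached and a potential holder of entries. */
static void mm_attach(struct mm_model *mm, unsigned int cpu)
{
	mm->attach_mask |= cpu_bit(cpu);
	mm->flush_mask |= cpu_bit(cpu);
}

/* Detaching does not purge the TLB, so the flush bit stays set. */
static void mm_detach(struct mm_model *mm, unsigned int cpu)
{
	mm->attach_mask &= ~cpu_bit(cpu);
}

/* A local-only flush is safe when no other CPU can hold entries. */
static bool mm_can_flush_local(const struct mm_model *mm, unsigned int cpu)
{
	return mm->flush_mask == cpu_bit(cpu);
}

/* After a broadcast flush only the attached CPUs can repopulate entries. */
static void mm_flush_global(struct mm_model *mm)
{
	mm->flush_mask = mm->attach_mask;
}
```

In this model a CPU that detached stays in flush_mask until the next broadcast flush, which mirrors the overindication the message says is tolerated.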
2013-09-11mm: migrate: check movability of hugepage in unmap_and_move_huge_page()Naoya Horiguchi1-0/+5
Currently hugepage migration works well only for pmd-based hugepages (mainly due to lack of testing,) so we had better not enable migration of other levels of hugepages until we are ready for it. Some users of hugepage migration (mbind, move_pages, and migrate_pages) do page table walk and check pud/pmd_huge() there, so they are safe. But the other users (softoffline and memory hotremove) don't do this, so without this patch they can try to migrate unexpected types of hugepages. To prevent this, we introduce hugepage_migration_support() as an architecture dependent check of whether hugepage are implemented on a pmd basis or not. And on some architecture multiple sizes of hugepages are available, so hugepage_migration_support() also checks hugepage size. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
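The check described above combines two conditions: the architecture implements hugepages at the pmd level, and the hugepage size in question is the pmd size. A minimal sketch, assuming hypothetical names (struct hstate_shift_model, hugepage_migration_support_model) and passing the architecture properties in as plain parameters rather than reading them from kernel headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced hstate: only the page-size shift matters here. */
struct hstate_shift_model {
	unsigned int shift; /* log2 of the hugepage size in bytes */
};

/*
 * Migration is allowed only when the architecture has pmd-based
 * hugepages and this particular hugepage size is the pmd size.
 */
static bool hugepage_migration_support_model(bool arch_has_pmd_hugepages,
					     unsigned int pmd_shift,
					     const struct hstate_shift_model *h)
{
	assert(h != NULL);
	return arch_has_pmd_hugepages && h->shift == pmd_shift;
}
```

For example, with a pmd shift of 21 a 2 MB hugepage (shift 21) passes the check and a 1 GB hugepage (shift 30) does not.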
2013-08-29s390/mm: implement software referenced bitsMartin Schwidefsky1-19/+39
The last remaining use for the storage key of the s390 architecture is reference counting. The alternative is to make page table entries invalid while they are old. On access the fault handler marks the pte/pmd as young which makes the pte/pmd valid if the access rights allow read access. The pte/pmd invalidations required for software managed reference bits cost a bit of performance, on the other hand the RRBE/RRBM instructions to read and reset the referenced bits are quite expensive as well. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
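The scheme described above can be sketched as a small state machine on one entry: making an entry old also makes it invalid, and the fault taken on the next access marks it young and revalidates it if reads are permitted. The flag values and function names below are hypothetical and do not reflect the real s390 pte layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for a modeled page table entry. */
#define PTE_INVALID UINT64_C(0x1) /* hardware treats the entry as not present */
#define PTE_YOUNG   UINT64_C(0x2) /* software referenced bit */
#define PTE_READ    UINT64_C(0x4) /* access rights allow reading */

/* Aging an entry: drop the young bit and force the next access to fault. */
static uint64_t pte_mkold_model(uint64_t pte)
{
	return (pte & ~PTE_YOUNG) | PTE_INVALID;
}

/* Fault-handler side: mark young, and revalidate only if reads are allowed. */
static uint64_t pte_mkyoung_model(uint64_t pte)
{
	pte |= PTE_YOUNG;
	if (pte & PTE_READ)
		pte &= ~PTE_INVALID;
	return pte;
}

static bool pte_young_model(uint64_t pte)
{
	return (pte & PTE_YOUNG) != 0;
}

static bool pte_valid_model(uint64_t pte)
{
	return (pte & PTE_INVALID) == 0;
}
```

The extra fault per aging cycle is the cost the message refers to; it replaces the RRBE/RRBM instructions that would otherwise read and reset the hardware referenced bits.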
2013-08-22s390/mm: cleanup page table definitionsMartin Schwidefsky1-9/+95
Improve the encoding of the different pte types and the naming of the page, segment table and region table bits. Due to the different pte encoding the hugetlbfs primitives need to be adapted as well. To improve compatibility with common code make the huge ptes use the encoding of normal ptes. The conversion between the pte and pmd encoding for a huge pte is done with set_huge_pte_at and huge_ptep_get. Overall the code is now easier to understand. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2013-04-29mm/hugetlb: add more arch-defined huge_pte functionsGerald Schaefer1-1/+1
Commit abf09bed3cce ("s390/mm: implement software dirty bits") introduced another difference in the pte layout vs. the pmd layout on s390, thoroughly breaking the s390 support for hugetlbfs. This requires replacing some more pte_xxx functions in mm/hugetlbfs.c with a huge_pte_xxx version. This patch introduces those huge_pte_xxx functions and their generic implementation in asm-generic/hugetlb.h, which will now be included on all architectures supporting hugetlbfs apart from s390. This change will be a no-op for those architectures. [akpm@linux-foundation.org: fix warning] Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Hillf Danton <dhillf@gmail.com> Acked-by: Michal Hocko <mhocko@suse.cz> [for !s390 parts] Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-20s390/comments: unify copyright messages and remove file namesHeiko Carstens1-1/+1
Remove the file name from the comment at top of many files. In most cases the file name was wrong anyway, so it's rather pointless. Also unify the IBM copyright statement. We did have a lot of slightly different statements and wanted to change them one after another whenever a file gets touched. However that never happened. Instead people start to take the old/"wrong" statements to use as a template for new files. So unify all of them in one go. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2012-05-16s390/hugepages: clear page table for sw large page emulationGerald Schaefer1-0/+2
The software large page emulation on s390 did not clear the pre-allocated page table in arch_release_hugepage() before freeing it. This could trigger the WARN_ON(!pte_none(*pte)) in mm/vmalloc.c:106 and make vmap_pte_range() fail, because the page table could be reused in page_table_alloc(). This is fixed now by calling clear_table() before page_table_free(). Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm guest address space mappingMartin Schwidefsky1-1/+1
Add code that allows KVM to control the virtual memory layout that is seen by a guest. The guest address space uses a second page table that shares the last level pte-tables with the process page table. If a page is unmapped from the process page table it is automatically unmapped from the guest page table as well. The guest address space mapping starts out empty, KVM can map any individual 1MB segments from the process virtual memory to any 1MB aligned location in the guest virtual memory. If a target segment in the process virtual memory does not exist or is unmapped while a guest mapping exists the desired target address is stored as an invalid segment table entry in the guest page table. The population of the guest page table is fault driven. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-05-23[S390] Remove data execution protectionMartin Schwidefsky1-10/+0
The noexec support on s390 does not rely on a bit in the page table entry but utilizes the secondary space mode to distinguish between memory accesses for instructions vs. data. The noexec code relies on the assumption that the cpu will always use the secondary space page table for data accesses while it is running in the secondary space mode. Up to the z9-109 class machines this has been the case. Unfortunately this is not true anymore with z10 and later machines. The load-relative-long instructions lrl, lgrl and lgfrl access the memory operand using the same addressing-space mode that has been used to fetch the instruction. This breaks the noexec mode for all user space binaries compiled with march=z10 or later. The only option is to remove the current noexec support. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2010-10-25[S390] lockless get_user_pages_fast()Martin Schwidefsky1-1/+1
Implement get_user_pages_fast without locking in the fastpath on s390. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-07-24hugetlb: introduce pud_hugeAndi Kleen1-0/+5
Straightforward extensions for huge pages located in the PUD instead of PMDs. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24hugetlb: modular state for hugetlb page sizeAndi Kleen1-1/+2
The goal of this patchset is to support multiple hugetlb page sizes. This is achieved by introducing a new struct hstate structure, which encapsulates the important hugetlb state and constants (eg. huge page size, number of huge pages currently allocated, etc). The hstate structure is then passed around the code which requires these fields, they will do the right thing regardless of the exact hstate they are operating on. This patch adds the hstate structure, with a single global instance of it (default_hstate), and does the basic work of converting hugetlb to use the hstate. Future patches will add more hstate structures to allow for different hugetlbfs mounts to have different page sizes. [akpm@linux-foundation.org: coding-style fixes] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
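The idea of bundling per-size hugetlb state into one structure that is passed around can be sketched as follows. This is a reduced user-space model, not the kernel definition: struct hstate_model, its fields, the helper names and the fixed MODEL_PAGE_SHIFT of 12 are all assumptions made for illustration.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12 /* assumed base page size of 4 KB */

/* Hypothetical reduced hstate: size information plus one counter. */
struct hstate_model {
	unsigned int order;          /* hugepage size as a power-of-two count of base pages */
	unsigned long mask;          /* alignment mask for this hugepage size */
	unsigned long nr_huge_pages; /* pages of this size currently allocated */
};

static void hstate_init_model(struct hstate_model *h, unsigned int order)
{
	assert(order < 8 * sizeof(unsigned long) - MODEL_PAGE_SHIFT);
	h->order = order;
	h->mask = ~((1UL << (order + MODEL_PAGE_SHIFT)) - 1);
	h->nr_huge_pages = 0;
}

/* Code that takes an hstate works for any size without global constants. */
static unsigned long huge_page_size_model(const struct hstate_model *h)
{
	return 1UL << (h->order + MODEL_PAGE_SHIFT);
}

static unsigned long huge_page_align_down_model(const struct hstate_model *h,
						unsigned long addr)
{
	return addr & h->mask;
}
```

With a single global instance this behaves like the old compile-time constants; adding a second instance with a different order is what later allows several hugepage sizes side by side.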
2008-04-30[S390] System z large page support.Gerald Schaefer1-0/+134
This adds hugetlbfs support on System z, using both hardware large page support if available and software large page emulation on older hardware. Shared (large) page tables are implemented in software emulation mode, by using page->index of the first tail page from a compound large page to store page table information. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>