aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/mm/pgtable_32.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-23powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWXChristophe Leroy1-1/+9
Today, STRICT_KERNEL_RWX is based on the use of regular pages to map kernel pages. On Book3s 32, it has three consequences: - Using pages instead of BAT for mapping kernel linear memory severely impacts performance. - Exec protection is not effective because no-execute cannot be set at page level (except on 603 which doesn't have hash tables) - Write protection is not effective because PP bits do not provide RO mode for kernel-only pages (except on 603 which handles it in software via PAGE_DIRTY) On the 603+, we have: - Independent IBAT and DBAT allowing limitation of exec parts. - NX bit can be set in segment registers to forbit execution on memory mapped by pages. - RO mode on DBATs even for kernel-only blocks. On the 601, there is nothing much we can do other than warn the user about it, because: - BATs are common to instructions and data. - BAT do not provide RO mode for kernel-only blocks. - segment registers don't have the NX bit. In order to use IBAT for exec protection, this patch: - Aligns _etext to BAT block sizes (128kb) - Set NX bit in kernel segment register (Except on vmalloc area when CONFIG_MODULES is selected) - Maps kernel text with IBATs. In order to use DBAT for exec protection, this patch: - Aligns RW DATA to BAT block sizes (4M) - Maps kernel RO area with write prohibited DBATs - Maps remaining memory with remaining DBATs Here is what we get with this patch on a 832x when activating STRICT_KERNEL_RWX: Symbols: c0000000 T _stext c0680000 R __start_rodata c0680000 R _etext c0800000 T __init_begin c0800000 T _sinittext ~# cat /sys/kernel/debug/block_address_translation ---[ Instruction Block Address Translation ]--- 0: 0xc0000000-0xc03fffff 0x00000000 Kernel EXEC coherent 1: 0xc0400000-0xc05fffff 0x00400000 Kernel EXEC coherent 2: 0xc0600000-0xc067ffff 0x00600000 Kernel EXEC coherent 3: - 4: - 5: - 6: - 7: - ---[ Data Block Address Translation ]--- 0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent 1: 0xc0800000-0xc0ffffff 0x00800000 Kernel RW coherent 2: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent 3: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent 4: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent 5: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent 6: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent 7: - ~# cat /sys/kernel/debug/segment_registers ---[ User Segments ]--- 0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xa085d0 0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xa086e1 0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xa087f2 0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xa08903 0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xa08a14 0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xa08b25 0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xa08c36 0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xa08d47 0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xa08e58 0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xa08f69 0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xa0907a 0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xa0918b ---[ Kernel Segments ]--- 0xc0000000-0xcfffffff Kern key 0 User key 1 No Exec VSID 0x000ccc 0xd0000000-0xdfffffff Kern key 0 User key 1 No Exec VSID 0x000ddd 0xe0000000-0xefffffff Kern key 0 User key 1 No Exec VSID 0x000eee 0xf0000000-0xffffffff Kern key 0 User key 1 No Exec VSID 0x000fff Aligning _etext to 128kb allows to map up to 32Mb text with 8 IBATs: 16Mb + 8Mb + 4Mb + 2Mb + 1Mb + 512kb + 256kb + 128kb (+ 128kb) = 32Mb (A 9th IBAT is unneeded as 32Mb would need only a single 32Mb block) Aligning data to 4M allows to map up to 512Mb data with 8 DBATs: 16Mb + 8Mb + 4Mb + 4Mb + 32Mb + 64Mb + 128Mb + 256Mb = 512Mb Because some processors only have 4 BATs and because some targets need DBATs for mapping other areas, the following patch will allow to modify _etext and data alignment. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/32: always populate page tables for Abatron BDI.Christophe Leroy1-1/+4
When CONFIG_BDI_SWITCH is set, the page tables have to be populated allthough large TLBs are used, because the BDI switch knows nothing about those large TLBs which are handled directly in TLB miss logic. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/mm/32s: use generic mmu_mapin_ram() for all blocks.Christophe Leroy1-18/+9
Now that mmu_mapin_ram() is able to handle other blocks than the one starting at 0, the WII can use it for all its blocks. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/mm/32: add base address to mmu_mapin_ram()Christophe Leroy1-3/+3
At the time being, mmu_mapin_ram() always maps RAM from the beginning. But some platforms like the WII have to map a second block of RAM. This patch adds to mmu_mapin_ram() the base address of the block. At the moment, only base address 0 is supported. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-04mm: treewide: remove unused address argument from pte_alloc functionsJoel Fernandes (Google)1-4/+4
Patch series "Add support for fast mremap". This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle architecture related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since its unused. This patch therefore removes this argument tree-wide resulting in a nice negative diff as well. Also ensuring along the way that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization. Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing. The changes were obtained by applying the following Coccinelle script. (thanks Julia for answering all Coccinelle questions!). Following fix ups were done manually: * Removal of address argument from pte_fragment_alloc * Removal of pte_alloc_one_fast definitions from m68k and microblaze. // Options: --include-headers --no-includes // Note: I split the 'identifier fn' line, so if you are manually // running it, please unsplit it so it runs for you. virtual patch @pte_alloc_func_def depends on patch exists@ identifier E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; type T2; @@ fn(... - , T2 E2 ) { ... } @pte_alloc_func_proto_noarg depends on patch exists@ type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1, T2); + T3 fn(T1); | - T3 fn(T1, T2, T4); + T3 fn(T1, T2); ) @pte_alloc_func_proto depends on patch exists@ identifier E1, E2, E4; type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1 E1, T2 E2); + T3 fn(T1 E1); | - T3 fn(T1 E1, T2 E2, T4 E4); + T3 fn(T1 E1, T2 E2); ) @pte_alloc_func_call depends on patch exists@ expression E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ fn(... -, E2 ) @pte_alloc_macro depends on patch exists@ identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; identifier a, b, c; expression e; position p; @@ ( - #define fn(a, b, c) e + #define fn(a, b) e | - #define fn(a, b) e + #define fn(a) e ) Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: Kirill A. Shutemov <kirill@shutemov.name> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-19powerpc: implement CONFIG_DEBUG_VIRTUALChristophe Leroy1-1/+1
This patch implements CONFIG_DEBUG_VIRTUAL to warn about incorrect use of virt_to_phys() and page_to_phys() Below is the result of test_debug_virtual: [ 1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 test_debug_virtual_init+0x3c/0xd4 [ 1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc5-00560-g6bfb52e23a00-dirty #532 [ 1.457259] NIP: c066c550 LR: c0650ccc CTR: c066c514 [ 1.462257] REGS: c900bdb0 TRAP: 0700 Not tainted (4.20.0-rc5-00560-g6bfb52e23a00-dirty) [ 1.471184] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 48000422 XER: 20000000 [ 1.477811] [ 1.477811] GPR00: c0650ccc c900be60 c60d0000 00000000 006000c0 c9000000 00009032 c7fa0020 [ 1.477811] GPR08: 00002400 00000001 09000000 00000000 c07b5d04 00000000 c00037d8 00000000 [ 1.477811] GPR16: 00000000 00000000 00000000 00000000 c0760000 c0740000 00000092 c0685bb0 [ 1.477811] GPR24: c065042c c068a734 c0685b8c 00000006 00000000 c0760000 c075c3c0 ffffffff [ 1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4 [ 1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc [ 1.523163] Call Trace: [ 1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable) [ 1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc [ 1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc [ 1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110 [ 1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c [ 1.553971] Instruction dump: [ 1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 554893be [ 1.564566] 7d294010 7d294910 90010034 39290001 <0f090000> 7c3e0b78 955e0008 3fe0c062 [ 1.572425] ---[ end trace 6f6984225b280ad6 ]--- [ 1.577467] PA: 0x09000000 for VA: 0xc9000000 [ 1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04powerpc/mm: Extend pte_fragment functionality to PPC32Christophe Leroy1-21/+4
In order to allow the 8xx to handle pte_fragments, this patch extends the use of pte_fragments to PPC32 platforms. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-26powerpc: change CONFIG_PPC_STD_MMU_32 to CONFIG_PPC_BOOK3S_32Christophe Leroy1-1/+1
Today we have: config PPC_BOOK3S_32 bool "512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx" [depends on PPC32 within a choice] config PPC_BOOK3S def_bool y depends on PPC_BOOK3S_32 || PPC_BOOK3S_64 config PPC_STD_MMU def_bool y depends on PPC_BOOK3S config PPC_STD_MMU_32 def_bool y depends on PPC_STD_MMU && PPC32 PPC_STD_MMU_32 is therefore redundant with PPC_BOOK3S_32. In order to make the code clearer, lets use preferably PPC_BOOK3S_32. This will allow to remove CONFIG_PPC_STD_MMU_32 in a later patch. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-31memblock: rename memblock_alloc{_nid,_try_nid} to memblock_phys_alloc*Mike Rapoport1-1/+1
Make it explicit that the caller gets a physical address rather than a virtual one. This will also allow using meblock_alloc prefix for memblock allocations returning virtual address, which is done in the following patches. The conversion is done using the following semantic patch: @@ expression e1, e2, e3; @@ ( - memblock_alloc(e1, e2) + memblock_phys_alloc(e1, e2) | - memblock_alloc_nid(e1, e2, e3) + memblock_phys_alloc_nid(e1, e2, e3) | - memblock_alloc_try_nid(e1, e2, e3) + memblock_phys_alloc_try_nid(e1, e2, e3) ) Link: http://lkml.kernel.org/r/1536927045-23536-7-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Serge Semin <fancer.lancer@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-14powerpc/mm: use pte helpers in generic codeChristophe Leroy1-7/+8
Get rid of platform specific _PAGE_XXXX in powerpc common code and use helpers instead. mm/dump_linuxpagetables.c will be handled separately Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/mm: don't use _PAGE_EXEC for calling hash_preload()Christophe Leroy1-1/+1
The 'access' parameter of hash_preload() is either 0 or _PAGE_EXEC. Among the two versions of hash_preload(), only the PPC64 one is doing something with this 'access' parameter. In order to remove the use of _PAGE_EXEC outside platform code, 'access' parameter is replaced by 'is_exec' which will be either true of false, and the PPC64 version of hash_preload() creates the access flag based on 'is_exec'. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc: handover page flags with a pgprot_t parameterChristophe Leroy1-20/+17
In order to avoid multiple conversions, handover directly a pgprot_t to map_kernel_page() as already done for radix. Do the same for __ioremap_caller() and __ioremap_at(). Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-14powerpc/mm: properly set PAGE_KERNEL flags in ioremap()Christophe Leroy1-16/+12
Set PAGE_KERNEL directly in the caller and do not rely on a hack adding PAGE_KERNEL flags when _PAGE_PRESENT is not set. As already done for PPC64, use pgprot_cache() helpers instead of _PAGE_XXX flags in PPC32 ioremap() derived functions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-10-13powerpc/32: Add ioremap_wt() and ioremap_coherent()Christophe Leroy1-0/+16
Other arches have ioremap_wt() to map IO areas write-through. Implement it on PPC as well in order to avoid drivers using __ioremap(_PAGE_WRITETHRU) Also implement ioremap_coherent() to avoid drivers using __ioremap(_PAGE_COHERENT) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/mm/32: Remove the reserved memory hackJonathan Neuschäfer1-2/+1
This hack, introduced in commit c5df7f775148 ("powerpc: allow ioremap within reserved memory regions") is now unnecessary. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-01powerpc/mm/32: Use page_is_ram to check for RAMJonathan Neuschäfer1-0/+1
On systems where there is MMIO space between different blocks of RAM in the physical address space, __ioremap_caller did not allow mapping these MMIO areas, because they were below the end RAM and thus considered RAM as well. Use the memblock-based page_is_ram function, which returns false for such MMIO holes. v2: Keep the check for p < virt_to_phys(high_memory). On 32-bit systems with high memory (memory above physical address 4GiB), the high memory is expected to be available though ioremap. The high_memory variable marks the end of low memory; comparing against it means that only ioremap requests for low RAM will be denied. Reported by Michael Ellerman. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-16powerpc/mm: extend _PAGE_PRIVILEGED to all CPUsChristophe Leroy1-8/+1
commit ac29c64089b74 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64 This patch generalises _PAGE_PRIVILEGED for all CPUs, allowing to have either _PAGE_PRIVILEGED or _PAGE_USER or both. PPC_8xx has a _PAGE_SHARED flag which is set for and only for all non user pages. Lets rename it _PAGE_PRIVILEGED to remove confusion as it has nothing to do with Linux shared pages. On BookE, there's a _PAGE_BAP_SR which has to be set for kernel pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR will make this generic Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04powerpc/mm: Call flush_tlb_kernel_range with interrupts enabledGuenter Roeck1-1/+1
flush_tlb_kernel_range() may call smp_call_function_many() which expects interrupts to be enabled. This results in a traceback. WARNING: CPU: 0 PID: 1 at kernel/smp.c:416 smp_call_function_many+0xcc/0x2fc CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.0-rc1-00009-g0666f56 #1 task: cf830000 task.stack: cf82e000 NIP: c00a93c8 LR: c00a9634 CTR: 00000001 REGS: cf82fde0 TRAP: 0700 Not tainted (4.14.0-rc1-00009-g0666f56) MSR: 00021000 <CE,ME> CR: 24000082 XER: 00000000 GPR00: c00a9634 cf82fe90 cf830000 c050ad3c c0015a54 00000000 00000001 00000001 GPR08: 00000001 00000000 00000000 cf82e000 24000084 00000000 c0003150 00000000 GPR16: 00000000 00000000 00000000 00000000 00000000 00000001 00000000 c0510000 GPR24: 00000000 c0015a54 00000000 c050ad3c c051823c c050ad3c 00000025 00000000 NIP [c00a93c8] smp_call_function_many+0xcc/0x2fc LR [c00a9634] smp_call_function+0x3c/0x50 Call Trace: [cf82fe90] [00000010] 0x10 (unreliable) [cf82fed0] [c00a9634] smp_call_function+0x3c/0x50 [cf82fee0] [c0015d2c] flush_tlb_kernel_range+0x20/0x38 [cf82fef0] [c001524c] mark_initmem_nx+0x154/0x16c [cf82ff20] [c001484c] free_initmem+0x20/0x4c [cf82ff30] [c000316c] kernel_init+0x1c/0x108 [cf82ff40] [c000f3a8] ret_from_kernel_thread+0x5c/0x64 Instruction dump: 7c0803a6 7d808120 38210040 4e800020 3d20c052 812981a0 2f890000 40beffac 3d20c051 8929ac64 2f890000 40beff9c <0fe00000> 4bffff94 7fc3f378 7f64db78 Fixes: 3184cc4b6f6a ("powerpc/mm: Fix kernel RAM protection after freeing ...") Fixes: e611939fc8ec ("powerpc/mm: Ensure change_page_attr() doesn't ...") Cc: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: Simplify __set_fixmap()Christophe Leroy1-15/+0
__set_fixmap() uses __fix_to_virt() then does the boundary checks by it self. Instead, we can use fix_to_virt() which does the verification at build time. For this, we need to use it inline so that GCC can see the real value of idx at buildtime. In the meantime, we remove the 'fixmaps' variable. This variable is set but has never been used from the beginning (commit 2c419bdeca1d9 ("[POWERPC] Port fixmap from x86 and use for kmap_atomic")) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: declare some local functions staticChristophe Leroy1-2/+2
get_pteptr() and __mapin_ram_chunk() are only used locally, so define them static Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32Christophe Leroy1-0/+24
This patch implements STRICT_KERNEL_RWX on PPC32. As for CONFIG_DEBUG_PAGEALLOC, it deactivates BAT and LTLB mappings in order to allow page protection setup at the level of each page. As BAT/LTLB mappings are deactivated, there might be a performance impact. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: Fix kernel RAM protection after freeing unused memory on PPC32Christophe Leroy1-3/+10
As seen below, allthough the init sections have been freed, the associated memory area is still marked as executable in the page tables. ~ dmesg [ 5.860093] Freeing unused kernel memory: 592K (c0570000 - c0604000) ~ cat /sys/kernel/debug/kernel_page_tables ---[ Start of kernel VM ]--- 0xc0000000-0xc0497fff 4704K rw X present dirty accessed shared 0xc0498000-0xc056ffff 864K rw present dirty accessed shared 0xc0570000-0xc059ffff 192K rw X present dirty accessed shared 0xc05a0000-0xc7ffffff 125312K rw present dirty accessed shared ---[ vmalloc() Area ]--- This patch fixes that. The implementation is done by reusing the change_page_attr() function implemented for CONFIG_DEBUG_PAGEALLOC Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15powerpc/mm: Ensure change_page_attr() doesn't invalidate pinned TLBsChristophe Leroy1-4/+6
__change_page_attr() uses flush_tlb_page(). flush_tlb_page() uses tlbie instruction, which also invalidates pinned TLBs, which is not what we expect. This patch modifies the implementation to use flush_tlb_kernel_range() instead. This will make use of tlbia which will preserve pinned TLBs. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05powerpc/mm: Rename map_page() to map_kernel_page() on 32-bitChristophe Leroy1-4/+4
These two functions implement the same semantics, so unify their naming so we can share code that calls them. The longer name is more descriptive so use it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05powerpc/mm/book(e)(3s)/32: Add page table accountingBalbir Singh1-1/+1
Add support in pte_alloc_one() and pgd_alloc() by passing __GFP_ACCOUNT in the flags Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02powerpc/mm: Remove __this_fixmap_does_not_exist()Christophe Leroy1-5/+0
This function has not been used since commit 9494a1e8428ea ("powerpc: use generic fixmap.h) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-12-09powerpc: port 64 bits pgtable_cache to 32 bitsChristophe Leroy1-37/+0
Today powerpc64 uses a set of pgtable_caches while powerpc32 uses standard pages when using 4k pages and a single pgtable_cache if using other size pages. In preparation of implementing huge pages on the 8xx, this patch replaces the specific powerpc32 handling by the 64 bits approach. This is done by: * moving 64 bits pgtable_cache_add() and pgtable_cache_init() in a new file called init-common.c * modifying pgtable_cache_init() to also handle the case without PMD * removing the 32 bits version of pgtable_cache_add() and pgtable_cache_init() * copying related header contents from 64 bits into both the book3s/32 and nohash/32 header files On the 8xx, the following cache sizes will be used: * 4k pages mode: - PGT_CACHE(10) for PGD - PGT_CACHE(3) for 512k hugepage tables * 16k pages mode: - PGT_CACHE(6) for PGD - PGT_CACHE(7) for 512k hugepage tables - PGT_CACHE(3) for 8M hugepage tables Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Scott Wood <oss@buserror.net>
2016-08-02treewide: replace obsolete _refok by __refFabian Frederick1-1/+1
There was only one use of __initdata_refok and __exit_refok __init_refok was used 46 times against 82 for __ref. Those definitions are obsolete since commit 312b1485fb50 ("Introduce new section reference annotations tags: __ref, __refdata, __refconst") This patch removes the following compatibility definitions and replaces them treewide. /* compatibility defines */ #define __init_refok __ref #define __initdata_refok __refdata #define __exit_refok __ref I can also provide separate patches if necessary. (One patch per tree and check in 1 month or 2 to remove old definitions) [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Ingo Molnar <mingo@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-24tree wide: get rid of __GFP_REPEAT for order-0 allocations part IMichal Hocko1-2/+2
This is the third version of the patchset previously sent [1]. I have basically only rebased it on top of 4.7-rc1 tree and dropped "dm: get rid of superfluous gfp flags" which went through dm tree. I am sending it now because it is tree wide and chances for conflicts are reduced considerably when we want to target rc2. I plan to send the next step and rename the flag and move to a better semantic later during this release cycle so we will have a new semantic ready for 4.8 merge window hopefully. Motivation: While working on something unrelated I've checked the current usage of __GFP_REPEAT in the tree. It seems that a majority of the usage is and always has been bogus because __GFP_REPEAT has always been about costly high order allocations while we are using it for order-0 or very small orders very often. It seems that a big pile of them is just a copy&paste when a code has been adopted from one arch to another. I think it makes some sense to get rid of them because they are just making the semantic more unclear. Please note that GFP_REPEAT is documented as * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt * _might_ fail. This depends upon the particular VM implementation. while !costly requests have basically nofail semantic. So one could reasonably expect that order-0 request with __GFP_REPEAT will not loop for ever. This is not implemented right now though. I would like to move on with __GFP_REPEAT and define a better semantic for it. $ git grep __GFP_REPEAT origin/master | wc -l 111 $ git grep __GFP_REPEAT | wc -l 36 So we are down to the third after this patch series. The remaining places really seem to be relying on __GFP_REPEAT due to large allocation requests. This still needs some double checking which I will do later after all the simple ones are sorted out. I am touching a lot of arch specific code here and I hope I got it right but as a matter of fact I even didn't compile test for some archs as I do not have cross compiler for them. Patches should be quite trivial to review for stupid compile mistakes though. The tricky parts are usually hidden by macro definitions and thats where I would appreciate help from arch maintainers. [1] http://lkml.kernel.org/r/1461849846-27209-1-git-send-email-mhocko@kernel.org This patch (of 19): __GFP_REPEAT has a rather weak semantic but since it has been introduced around 2.6.12 it has been ignored for low order allocations. Yet we have the full kernel tree with its usage for apparently order-0 allocations. This is really confusing because __GFP_REPEAT is explicitly documented to allow allocation failures which is a weaker semantic than the current order-0 has (basically nofail). Let's simply drop __GFP_REPEAT from those places. This would allow to identify place which really need allocator to retry harder and formulate a more specific semantic for what the flag is supposed to do actually. Link: http://lkml.kernel.org/r/1464599699-30131-2-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> [for tile] Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: John Crispin <blogic@openwrt.org> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-11powerpc32: PAGE_EXEC required for inittextChristophe Leroy1-2/+3
PAGE_EXEC is required for inittext, otherwise CONFIG_DEBUG_PAGEALLOC ends up with an Oops [ 0.000000] Inode-cache hash table entries: 8192 (order: 1, 32768 bytes) [ 0.000000] Sorting __ex_table... [ 0.000000] bootmem::free_all_bootmem_core nid=0 start=0 end=2000 [ 0.000000] Unable to handle kernel paging request for instruction fetch [ 0.000000] Faulting instruction address: 0xc045b970 [ 0.000000] Oops: Kernel access of bad area, sig: 11 [#1] [ 0.000000] PREEMPT DEBUG_PAGEALLOC CMPC885 [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.25-local-dirty #1673 [ 0.000000] task: c04d83d0 ti: c04f8000 task.ti: c04f8000 [ 0.000000] NIP: c045b970 LR: c045b970 CTR: 0000000a [ 0.000000] REGS: c04f9ea0 TRAP: 0400 Not tainted (3.18.25-local-dirty) [ 0.000000] MSR: 08001032 <ME,IR,DR,RI> CR: 39955d35 XER: a000ff40 [ 0.000000] GPR00: c045b970 c04f9f50 c04d83d0 00000000 ffffffff c04dcdf4 00000048 c04f6b10 GPR08: c04f6ab0 00000001 c0563488 c04f6ab0 c04f8000 00000000 00000000 b6db6db7 GPR16: 00003474 00000180 00002000 c7fec000 00000000 000003ff 00000176 c0415014 GPR24: c0471018 c0414ee8 c05304e8 c03aeaac c0510000 c0471018 c0471010 00000000 [ 0.000000] NIP [c045b970] free_all_bootmem+0x164/0x228 [ 0.000000] LR [c045b970] free_all_bootmem+0x164/0x228 [ 0.000000] Call Trace: [ 0.000000] [c04f9f50] [c045b970] free_all_bootmem+0x164/0x228 (unreliable) [ 0.000000] [c04f9fa0] [c0454044] mem_init+0x3c/0xd0 [ 0.000000] [c04f9fb0] [c045080c] start_kernel+0x1f4/0x390 [ 0.000000] [c04f9ff0] [c0002214] start_here+0x38/0x98 [ 0.000000] Instruction dump: [ 0.000000] 2f150000 7f968840 72a90001 3ad60001 56b5f87e 419a0028 419e0024 41a20018 [ 0.000000] 807cc20c 38800000 7c638214 4bffd2f5 <3a940001> 3a100024 4bffffc8 7e368b78 [ 0.000000] ---[ end trace dc8fa200cb88537f ]--- Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc32: remove ioremap_baseChristophe Leroy1-2/+1
ioremap_base is not initialised and is nowhere used so remove it Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() togetherChristophe Leroy1-38/+6
x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of purpose, and are never defined at the same time. So rename them x_block_mapped() and define them in the relevant places Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
2015-04-10powerpc: Replace mem_init_done with slab_is_available()Michael Ellerman1-5/+4
We have a powerpc specific global called mem_init_done which is "set on boot once kmalloc can be called". But that's not *quite* true. We set it at the bottom of mem_init(), and rely on the fact that mm_init() calls kmem_cache_init() immediately after that, and nothing is running in parallel. So replace it with the generic and 100% correct slab_is_available(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-10powerpc: Fix compile errors with STRICT_MM_TYPECHECKS enabledMichael Ellerman1-2/+2
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Fix the 32-bit code also] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-07powerpc/mm: Remove duplicate declaration of setbat()Michael Ellerman1-3/+0
This is already declared in mmu_decl.h, so we don't need a second version in the C file. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23powerpc/32: %pF is only for function pointersScott Wood1-1/+1
Use %pS for actual addresses, otherwise you'll get bad output on arches like ppc64 where %pF expects a function descriptor. Even on other architectures, refrain from setting a bad example that people copy. Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-29powerpc32: Use kmem_cache memory for PGDIRLEROY Christophe1-2/+14
When pages are not 4K, PGDIR table is allocated with kmalloc(). In order to optimise TLB handlers, aligned memory is needed. kmalloc() doesn't provide aligned memory blocks, so lets use a kmem_cache pool instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29powerpc32: adds handling of _PAGE_ROLEROY Christophe1-1/+1
Some powerpc like the 8xx don't have a RW bit in PTE bits but a RO (Read Only) bit. This patch implements the handling of a _PAGE_RO flag to be used in place of _PAGE_RW Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood@freescale.com: fix whitespace] Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-01-29powerpc: Remove duplicate tlbcam_index declarationsEmil Medve1-1/+0
They seem to be leftovers from '14cf11a powerpc: Merge enough to start building in arch/powerpc' Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
2014-12-13mm/debug-pagealloc: make debug-pagealloc boottime configurableJoonsoo Kim1-1/+1
Now, we have prepared to avoid using debug-pagealloc in boottime. So introduce new kernel-parameter to disable debug-pagealloc in boottime, and makes related functions to be disabled in this case. Only non-intuitive part is change of guard page functions. Because guard page is effective only if debug-pagealloc is enabled, turning off according to debug-pagealloc is reasonable thing to do. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-10powerpc: Remove bootmem allocatorAnton Blanchard1-2/+1
At the moment we transition from the memblock alloctor to the bootmem allocator. Gitting rid of the bootmem allocator removes a bunch of complicated code (most of which I owe the dubious honour of being responsible for writing). Signed-off-by: Anton Blanchard <anton@samba.org> Tested-by: Emil Medve <Emilian.Medve@Freescale.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-07-28powerpc: Remove CONFIG_POWER3Michael Ellerman1-1/+1
Now that we have dropped power3 support we can remove CONFIG_POWER3. The usage in pgtable_32.c was already dead code as CONFIG_POWER3 was not selectable on PPC32. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-09powerpc: add barrier after writing kernel PTEScott Wood1-0/+1
There is no barrier between something like ioremap() writing to a PTE, and returning the value to a caller that may then store the pointer in a place that is visible to other CPUs. Such callers generally don't perform barriers of their own. Even if callers of ioremap() and similar things did use barriers, the most logical choise would be smp_wmb(), which is not architecturally sufficient when BookE hardware tablewalk is used. A full sync is specified by the architecture. For userspace mappings, OTOH, we generally already have an lwsync due to locking, and if we occasionally take a spurious fault due to not having a full sync with hardware tablewalk, it will not be fatal because we will retry rather than oops. Signed-off-by: Scott Wood <scottwood@freescale.com>
2013-11-15powerpc: handle pgtable_page_ctor() failKirill A. Shutemov1-1/+4
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-03-28Disintegrate asm/system.h for PowerPCDavid Howells1-0/+1
Disintegrate asm/system.h for PowerPC. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> cc: linuxppc-dev@lists.ozlabs.org
2012-03-07powerpc: Use vsprintf extention %pf with builtin_return_addressJoe Perches1-1/+1
Emit the function name not the address when possible. builtin_return_address() gives an address. When building a kernel with CONFIG_KALLSYMS, emit the actual function name not the address. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-19powerpc: Remove ioremap_flagsAnton Blanchard1-2/+2
We have a confusing number of ioremap functions. Make things just a bit simpler by merging ioremap_flags and ioremap_prot. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-19powerpc: Add ioremap_wcAnton Blanchard1-0/+8
Add ioremap_wc so drivers can request write combining on kernel mappings. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-12-09powerpc: Remove unnecessary casts of void ptrJesper Juhl1-1/+1
Hi, The [vk][cmz]alloc(_node) family of functions return void pointers which it's completely unnecessary/pointless to cast to other pointer types since that happens implicitly. This patch removes such casts from arch/powerpc/ Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-12-09powerpc: Record vma->phys_addr in ioremap()Michael Ellerman1-0/+1
The vmalloc code can track the physical address of a vma, when the vma is used for ioremap, if set it is displayed in /proc/vmallocinfo. Because get_vm_area_caller() doesn't know it's being called for ioremap() it's up to the arch code to set the phys_addr. A bunch of other arch's do this, I'm not sure why powerpc doesn't? Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>