linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-09-25	arm64/mm: move runtime pgds to rodata	Jun Yao	2	-21/+24
	Now that deliberate writes to swapper_pg_dir are made via the fixmap, we can defend against errant writes by moving it into the rodata section. Since tramp_pg_dir and reserved_ttbr0 must be at a fixed offset from swapper_pg_dir, and are not modified at runtime, these are also moved into the rodata section. Likewise, idmap_pg_dir is not modified at runtime, and is moved into rodata. Signed-off-by: Jun Yao <yaojun8558363@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> [Mark: simplify linker script, commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-25	arm64/mm: use fixmap to modify swapper_pg_dir	Jun Yao	2	-8/+53
	Once swapper_pg_dir is in the rodata section, it will not be possible to modify it directly, but we will need to modify it in some cases. To enable this, we can use the fixmap when deliberately modifying swapper_pg_dir. As the pgd is only transiently mapped, this provides some resilience against illicit modification of the pgd, e.g. for Kernel Space Mirror Attack (KSMA). Signed-off-by: Jun Yao <yaojun8558363@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> [Mark: simplify ifdeffery, commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-25	arm64/mm: Separate boot-time page tables from swapper_pg_dir	Jun Yao	6	-33/+21
	Since the address of swapper_pg_dir is fixed for a given kernel image, it is an attractive target for manipulation via an arbitrary write. To mitigate this we'd like to make it read-only by moving it into the rodata section. We require that swapper_pg_dir is at a fixed offset from tramp_pg_dir and reserved_ttbr0, so these will also need to move into rodata. However, swapper_pg_dir is allocated along with some transient page tables used for boot which we do not want to move into rodata. As a step towards this, this patch separates the boot-time page tables into a new init_pg_dir, and reduces swapper_pg_dir to the single page it needs to be. This allows us to retain the relationship between swapper_pg_dir, tramp_pg_dir, and swapper_pg_dir, while cleanly separating these from the boot-time page tables. The init_pg_dir holds all of the pgd/pud/pmd/pte levels needed during boot, and all of these levels will be freed when we switch to the swapper_pg_dir, which is initialized by the existing code in paging_init(). Since we start off on the init_pg_dir, we no longer need to allocate a transient page table in paging_init() in order to ensure that swapper_pg_dir isn't live while we initialize it. There should be no functional change as a result of this patch. Signed-off-by: Jun Yao <yaojun8558363@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> [Mark: place init_pg_dir after BSS, fold mm changes, commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-25	arm64/mm: Pass ttbr1 as a parameter to __enable_mmu()	Jun Yao	2	-9/+12
	In subsequent patches we'll use a transient pgd during the primary cpu's boot process. To make this work while allowing secondary cpus to use the swapper_pg_dir, we need to pass the relevant TTBR1 pgd as a parameter to __enable_mmu(). This patch updates __enable__mmu() to take this as a parameter, updating callsites to pass swapper_pg_dir for now. There should be no functional change as a result of this patch. Signed-off-by: Jun Yao <yaojun8558363@gmail.com> Reviewed-by: James Morse <james.morse@arm.com> [Mark: simplify assembly, clarify commit message] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-24	arm64: lse: remove -fcall-used-x0 flag	Tri Vo	1	-1/+1
	x0 is not callee-saved in the PCS. So there is no need to specify -fcall-used-x0. Clang doesn't currently support -fcall-used flags. This patch will help building the kernel with clang. Tested-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Tri Vo <trong@android.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-21	arm64: Remove unused VGA console support	Andrew Murray	1	-4/+0
	Support for VGA_CONSOLE is not allowable due to commit ee23794b8668 ("video: vgacon: Don't build on arm64"), thus remove the associated unused code. Whilst PCI on arm64 would support VGA a valid screen_info structure is missing. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-21	arm64: Kconfig: Remove ARCH_HAS_HOLES_MEMORYMODEL	James Morse	3	-8/+1
	include/linux/mmzone.h describes ARCH_HAS_HOLES_MEMORYMODEL as relevant when parts the memmap have been free()d. This would happen on systems where memory is smaller than a sparsemem-section, and the extra struct pages are expensive. pfn_valid() on these systems returns true for the whole sparsemem-section, so an extra memmap_valid_within() check is needed. On arm64 we have nomap memory, so always provide pfn_valid() to test for nomap pages. This means ARCH_HAS_HOLES_MEMORYMODEL's extra checks are already rolled up into pfn_valid(). Remove it. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-21	arm64/cpufeatures: Emulate MRS instructions by parsing ESR_ELx.ISS	Anshuman Khandual	2	-0/+29
	Armv8.4-A extension enables MRS instruction encodings inside ESR_ELx.ISS during exception class ESR_ELx_EC_SYS64 (0x18). This encoding can be used to emulate MRS instructions which can avoid fetch/decode from user space thus improving performance. This adds a new sys64_hook structure element with applicable ESR mask/value pair for MRS instructions on various system registers but constrained by sysreg encodings which is currently allowed to be emulated. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-21	arm64/cpufeatures: Factorize emulate_mrs()	Anshuman Khandual	2	-10/+16
	MRS emulation gets triggered with exception class (0x00 or 0x18) eventually calling the function emulate_mrs() which fetches the user space instruction and analyses it's encodings (OP0, OP1, OP2, CRN, CRM, RT). The kernel tries to emulate the given instruction looking into the encoding details. Going forward these encodings can also be parsed from ESR_ELx.ISS fields without requiring to fetch/decode faulting userspace instruction which can improve performance. This factorizes emulate_mrs() function in a way that it can be called directly with MRS encodings (OP0, OP1, OP2, CRN, CRM) for any given target register which can then be used directly from 0x18 exception class. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-21	arm64/cpufeatures: Introduce ESR_ELx_SYS64_ISS_RT()	Anshuman Khandual	3	-5/+7
	Extracting target register from ESR.ISS encoding has already been required at multiple instances. Just make it a macro definition and replace all the existing use cases. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-19	arm64: cpu_errata: Remove ARM64_MISMATCHED_CACHE_LINE_SIZE	Will Deacon	3	-20/+7
	There's no need to treat mismatched cache-line sizes reported by CTR_EL0 differently to any other mismatched fields that we treat as "STRICT" in the cpufeature code. In both cases we need to trap and emulate EL0 accesses to the register, so drop ARM64_MISMATCHED_CACHE_LINE_SIZE and rely on ARM64_MISMATCHED_CACHE_TYPE instead. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> [catalin.marinas@arm.com: move ARM64_HAS_CNP in the empty cpucaps.h slot] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-18	arm64: KVM: Enable Common Not Private translations	Vladimir Murzin	6	-2/+17
	We rely on cpufeature framework to detect and enable CNP so for KVM we need to patch hyp to set CNP bit just before TTBR0_EL2 gets written. For the guest we encode CNP bit while building vttbr, so we don't need to bother with that in a world switch. Reviewed-by: James Morse <james.morse@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-18	arm64: mm: Support Common Not Private translations	Vladimir Murzin	9	-6/+88
	Common Not Private (CNP) is a feature of ARMv8.2 extension which allows translation table entries to be shared between different PEs in the same inner shareable domain, so the hardware can use this fact to optimise the caching of such entries in the TLB. CNP occupies one bit in TTBRx_ELy and VTTBR_EL2, which advertises to the hardware that the translation table entries pointed to by this TTBR are the same as every PE in the same inner shareable domain for which the equivalent TTBR also has CNP bit set. In case CNP bit is set but TTBR does not point at the same translation table entries for a given ASID and VMID, then the system is mis-configured, so the results of translations are UNPREDICTABLE. For kernel we postpone setting CNP till all cpus are up and rely on cpufeature framework to 1) patch the code which is sensitive to CNP and 2) update TTBR1_EL1 with CNP bit set. TTBR1_EL1 can be reprogrammed as result of hibernation or cpuidle (via __enable_mmu). For these two cases we restore CnP bit via __cpu_suspend_exit(). There are a few cases we need to care of changes in TTBR0_EL1: - a switch to idmap - software emulated PAN we rule out latter via Kconfig options and for the former we make sure that CNP is set for non-zero ASIDs only. Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> [catalin.marinas@arm.com: default y for CONFIG_ARM64_CNP] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-17	arm64: sysreg: Clean up instructions for modifying PSTATE fields	Suzuki K Poulose	2	-13/+23
	Instructions for modifying the PSTATE fields which were not supported in the older toolchains (e.g, PAN, UAO) are generated using macros. We have so far used the normal sys_reg() helper for defining the PSTATE fields. While this works fine, it is really difficult to correlate the code with the Arm ARM definition. As per Arm ARM, the PSTATE fields are defined only using Op1, Op2 fields, with fixed values for Op0, CRn. Also the CRm field has been reserved for the Immediate value for the instruction. So using the sys_reg() looks quite confusing. This patch cleans up the instruction helpers by bringing them in line with the Arm ARM definitions to make it easier to correlate code with the document. No functional changes. Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	arm64: fix for bad_mode() handler to always result in panic	Hari Vyas	1	-1/+0
	The bad_mode() handler is called if we encounter an uunknown exception, with the expectation that the subsequent call to panic() will halt the system. Unfortunately, if the exception calling bad_mode() is taken from EL0, then the call to die() can end up killing the current user task and calling schedule() instead of falling through to panic(). Remove the die() call altogether, since we really want to bring down the machine in this "impossible" case. Signed-off-by: Hari Vyas <hari.vyas@broadcom.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	arm64: force_signal_inject: WARN if called from kernel context	Will Deacon	1	-1/+4
	force_signal_inject() is designed to send a fatal signal to userspace, so WARN if the current pt_regs indicates a kernel context. This can currently happen for the undefined instruction trap, so patch that up so we always BUG() if we didn't have a handler. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	arm64: cpu: Move errata and feature enable callbacks closer to callers	Will Deacon	5	-29/+28
	The cpu errata and feature enable callbacks are only called via their respective arm64_cpu_capabilities structure and therefore shouldn't exist in the global namespace. Move the PAN, RAS and cache maintenance emulation enable callbacks into the same files as their corresponding arm64_cpu_capabilities structures, making them static in the process. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	KVM: arm64: Set SCTLR_EL2.DSSBS if SSBD is forcefully disabled and !vhe	Will Deacon	2	-0/+22
	When running without VHE, it is necessary to set SCTLR_EL2.DSSBS if SSBD has been forcefully disabled on the kernel command-line. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	arm64: ssbd: Add support for PSTATE.SSBS rather than trapping to EL3	Will Deacon	8	-2/+106
	On CPUs with support for PSTATE.SSBS, the kernel can toggle the SSBD state without needing to call into firmware. This patch hooks into the existing SSBD infrastructure so that SSBS is used on CPUs that support it, but it's all made horribly complicated by the very real possibility of big/little systems that don't uniformly provide the new capability. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	arm64: entry: Allow handling of undefined instructions from EL1	Will Deacon	2	-5/+8
	Rather than panic() when taking an undefined instruction exception from EL1, allow a hook to be registered in case we want to emulate the instruction, like we will for the SSBS PSTATE manipulation instructions. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	arm64: ssbd: Drop #ifdefs for PR_SPEC_STORE_BYPASS	Will Deacon	1	-3/+0
	Now that we're all merged nicely into mainline, there's no need to check to see if PR_SPEC_STORE_BYPASS is defined. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	arm64: cpufeature: Detect SSBS and advertise to userspace	Will Deacon	5	-7/+33
	Armv8.5 introduces a new PSTATE bit known as Speculative Store Bypass Safe (SSBS) which can be used as a mitigation against Spectre variant 4. Additionally, a CPU may provide instructions to manipulate PSTATE.SSBS directly, so that userspace can toggle the SSBS control without trapping to the kernel. This patch probes for the existence of SSBS and advertise the new instructions to userspace if they exist. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-14	arm64: Fix silly typo in comment	Will Deacon	1	-1/+1
	I was passing through and figuered I'd fix this up: featuer -> feature Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: tlb: Rewrite stale comment in asm/tlbflush.h	Will Deacon	1	-25/+55
	Peter Z asked me to justify the barrier usage in asm/tlbflush.h, but actually that whole block comment needs to be rewritten. Reported-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: tlb: Avoid synchronous TLBIs when freeing page tables	Will Deacon	3	-14/+1
	By selecting HAVE_RCU_TABLE_INVALIDATE, we can rely on tlb_flush() being called if we fail to batch table pages for freeing. This in turn allows us to postpone walk-cache invalidation until tlb_finish_mmu(), which avoids lots of unnecessary DSBs and means we can shoot down the ASID if the range is large enough. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: tlb: Adjust stride and type of TLBI according to mmu_gather	Will Deacon	1	-9/+10
	Now that the core mmu_gather code keeps track of both the levels of page table cleared and also whether or not these entries correspond to intermediate entries, we can use this in our tlb_flush() callback to reduce the number of invalidations we issue as well as their scope. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: tlb: Remove redundant !CONFIG_HAVE_RCU_TABLE_FREE code	Will Deacon	1	-9/+3
	If there's one thing the RCU-based table freeing doesn't need, it's more ifdeffery. Remove the redundant !CONFIG_HAVE_RCU_TABLE_FREE code, since this option is unconditionally selected in our Kconfig. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: tlbflush: Allow stride to be specified for __flush_tlb_range()	Will Deacon	2	-7/+10
	When we are unmapping intermediate page-table entries or huge pages, we don't need to issue a TLBI instruction for every PAGE_SIZE chunk in the VA range being unmapped. Allow the invalidation stride to be passed to __flush_tlb_range(), and adjust our "just nuke the ASID" heuristic to take this into account. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: tlb: Justify non-leaf invalidation in flush_tlb_range()	Will Deacon	1	-0/+4
	Add a comment to explain why we can't get away with last-level invalidation in flush_tlb_range() Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d()	Will Deacon	1	-2/+8
	Now that our walk-cache invalidation routines imply a DSB before the invalidation, we no longer need one when we are clearing an entry during unmap. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: tlb: Add DSB ISHST prior to TLBI in __flush_tlb_[kernel_]pgtable()	Will Deacon	1	-0/+2
	__flush_tlb_[kernel_]pgtable() rely on set_pXd() having a DSB after writing the new table entry and therefore avoid the barrier prior to the TLBI instruction. In preparation for delaying our walk-cache invalidation on the unmap() path, move the DSB into the TLB invalidation routines. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-11	arm64: tlb: Use last-level invalidation in flush_tlb_kernel_range()	Will Deacon	1	-1/+1
	flush_tlb_kernel_range() is only ever used to invalidate last-level entries, so we can restrict the scope of the TLB invalidation instruction. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-10	arm64: uaccess: implement unsafe accessors	Julien Thierry	1	-15/+46
	Current implementation of get/put_user_unsafe default to get/put_user which toggle PAN before each access, despite having been told by the caller that multiple accesses to user memory were about to happen. Provide implementations for user_access_begin/end to turn PAN off/on and implement unsafe accessors that assume PAN was already turned off. Tested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-10	arm64: dump: Use consistent capitalisation for page-table dumps	Will Deacon	1	-3/+3
	Being consistent in our capitalisation for page-table dumps helps when grepping for things like "end". Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-10	arm64/lib: add accelerated crc32 routines	Ard Biesheuvel	3	-0/+63
	Unlike crc32c(), which is wired up to the crypto API internally so the optimal driver is selected based on the platform's capabilities, crc32_le() is implemented as a library function using a slice-by-8 table based C implementation. Even though few of the call sites may be bottlenecks, calling a time variant implementation with a non-negligible D-cache footprint is a bit of a waste, given that ARMv8.1 and up mandates support for the CRC32 instructions that were optional in ARMv8.0, but are already widely available, even on the Cortex-A53 based Raspberry Pi. So implement routines that use these instructions if available, and fall back to the existing generic routines otherwise. The selection is based on alternatives patching. Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE, this just codifies the status quo. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-10	arm64: cpufeature: add feature for CRC32 instructions	Ard Biesheuvel	2	-1/+11
	Add a CRC32 feature bit and wire it up to the CPU id register so we will be able to use alternatives patching for CRC32 operations. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-10	lib/crc32: make core crc32() routines weak so they can be overridden	Ard Biesheuvel	1	-4/+7
	Allow architectures to drop in accelerated CRC32 routines by making the crc32_le/__crc32c_le entry points weak, and exposing non-weak aliases for them that may be used by the accelerated versions as fallbacks in case the instructions they rely upon are not available. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-09-07	MAINTAINERS: Add entry for MMU GATHER AND TLB INVALIDATION	Will Deacon	1	-0/+13
	We recently had to debug a TLB invalidation problem on the munmap() path, which was made more difficult than necessary because: (a) The MMU gather code had changed without people realising (b) Many people subtly misunderstood the operation of the MMU gather code and its interactions with RCU and arch-specific TLB invalidation (c) Untangling the intended behaviour involved educated guesswork and plenty of discussion Hopefully, we can avoid getting into this mess again by designating a cross-arch group of people to look after this code. It is not intended that they will have a separate tree, but they at least provide a point of contact for anybody working in this area and can co-ordinate any proposed future changes to the internal API. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-07	mm/memory: Move mmu_gather and TLB invalidation code into its own file	Peter Zijlstra	4	-252/+265
	In preparation for maintaining the mmu_gather code as its own entity, move the implementation out of memory.c and into its own file. Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-04	asm-generic/tlb: Track which levels of the page tables have been cleared	Will Deacon	2	-9/+53
	It is common for architectures with hugepage support to require only a single TLB invalidation operation per hugepage during unmap(), rather than iterating through the mapping at a PAGE_SIZE increment. Currently, however, the level in the page table where the unmap() operation occurs is not stored in the mmu_gather structure, therefore forcing architectures to issue additional TLB invalidation operations or to give up and over-invalidate by e.g. invalidating the entire TLB. Ideally, we could add an interval rbtree to the mmu_gather structure, which would allow us to associate the correct mapping granule with the various sub-mappings within the range being invalidated. However, this is costly in terms of book-keeping and memory management, so instead we approximate by keeping track of the page table levels that are cleared and provide a means to query the smallest granule required for invalidation. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-04	asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather	Peter Zijlstra	1	-8/+23
	Some architectures require different TLB invalidation instructions depending on whether it is only the last-level of page table being changed, or whether there are also changes to the intermediate (directory) entries higher up the tree. Add a new bit to the flags bitfield in struct mmu_gather so that the architecture code can operate accordingly if it's the intermediate levels being invalidated. Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-04	asm-generic/tlb: Guard with #ifdef CONFIG_MMU	Will Deacon	1	-0/+4
	The inner workings of the mmu_gather-based TLB invalidation mechanism are not relevant to nommu configurations, so guard them with an #ifdef. This allows us to implement future functions using static inlines without breaking the build. Acked-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-09-02	Linux 4.19-rc2	Linus Torvalds	1	-1/+1

2018-09-02	x86/pti: Fix section mismatch warning/error	Randy Dunlap	1	-1/+1
	Fix the section mismatch warning in arch/x86/mm/pti.c: WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte() The function pti_clone_pgtable() references the function __init pti_user_pagetable_walk_pte(). This is often because pti_clone_pgtable lacks a __init annotation or the annotation of pti_user_pagetable_walk_pte is wrong. FATAL: modpost: Section mismatches detected. Fixes: 85900ea51577 ("x86/pti: Map the vsyscall page if needed") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org
2018-09-01	x86/vdso: Fix lsl operand order	Samuel Neves	1	-1/+1
	In the __getcpu function, lsl is using the wrong target and destination registers. Luckily, the compiler tends to choose %eax for both variables, so it has been working so far. Fixes: a582c540ac1b ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by: Samuel Neves <sneves@dei.uc.pt> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
2018-09-01	x86/mce: Fix set_mce_nospec() to avoid #GP fault	LuckTony	1	-1/+24
	The trick with flipping bit 63 to avoid loading the address of the 1:1 mapping of the poisoned page while the 1:1 map is updated used to work when unmapping the page. But it falls down horribly when attempting to directly set the page as uncacheable. The problem is that when the cache mode is changed to uncachable, the pages needs to be flushed from the cache first. But the decoy address is non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a #GP fault. Add code to change_page_attr_set_clr() to fix the address before calling flush. Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
2018-08-31	x86/efi: Load fixmap GDT in efi_call_phys_epilog()	Joerg Roedel	1	-6/+2
	When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap for the simple reason that this address is also mapped for user-space. The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT to call EFI runtime services and switch back to the kernel GDT when they return. But the switch-back uses the writable GDT, not the fixmap GDT. When that happened and and the CPU returns to user-space it switches to the user %cr3 and tries to restore user segment registers. This fails because the writable GDT is not mapped in the user page-table, and without a GDT the fault handlers also can't be launched. The result is a triple fault and reboot of the machine. Fix that by restoring the GDT back to the fixmap GDT which is also mapped in the user page-table. Fixes: 7757d607c6b3 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: hpa@zytor.com Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
2018-08-31	x86/nmi: Fix NMI uaccess race against CR3 switching	Andy Lutomirski	4	-1/+53
	A NMI can hit in the middle of context switching or in the middle of switch_mm_irqs_off(). In either case, CR3 might not match current->mm, which could cause copy_from_user_nmi() and friends to read the wrong memory. Fix it by adding a new nmi_uaccess_okay() helper and checking it in copy_from_user_nmi() and in __copy_from_user_nmi()'s callers. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Rik van Riel <riel@surriel.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
2018-08-31	x86: Allow generating user-space headers without a compiler	Ben Hutchings	1	-4/+7
	When bootstrapping an architecture, it's usual to generate the kernel's user-space headers (make headers_install) before building a compiler. Move the compiler check (for asm goto support) to the archprepare target so that it is only done when building code for the target. Fixes: e501ce957a78 ("x86: Force asm-goto") Reported-by: Helmut Grohne <helmutg@debian.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
2018-08-31	x86/dumpstack: Don't dump kernel memory based on usermode RIP	Jann Horn	3	-5/+15
	show_opcodes() is used both for dumping kernel instructions and for dumping user instructions. If userspace causes #PF by jumping to a kernel address, show_opcodes() can be reached with regs->ip controlled by the user, pointing to kernel code. Make sure that userspace can't trick us into dumping kernel memory into dmesg. Fixes: 7cccf0725cf7 ("x86/dumpstack: Add a show_ip() function") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: security@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com