aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm64/include/asm (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-11-09arm64: mm: tidy up top of kernel VA spaceArd Biesheuvel2-3/+3
Tidy up the way the top of the kernel VA space is organized, by mirroring the 256 MB region we have below the vmalloc space, and populating it top down with the PCI I/O space, some guard regions, and the fixmap region. The latter region is itself populated top down, and today only covers about 4 MB, and so 224 MB is ample, and no guard region is therefore required. The resulting layout is identical between 48-bit/4k and 52-bit/64k configurations. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Link: https://lore.kernel.org/r/20201008153602.9467-5-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-09arm64: mm: make vmemmap region a projection of the linear regionArd Biesheuvel1-8/+6
Now that we have reverted the introduction of the vmemmap struct page pointer and the separate physvirt_offset, we can simplify things further, and place the vmemmap region in the VA space in such a way that virtual to page translations and vice versa can be implemented using a single arithmetic shift. One happy coincidence resulting from this is that the 48-bit/4k and 52-bit/64k configurations (which are assumed to be the two most prevalent) end up with the same placement of the vmemmap region. In a subsequent patch, we will take advantage of this, and unify the memory maps even more. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Link: https://lore.kernel.org/r/20201008153602.9467-4-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-09arm64: mm: extend linear region for 52-bit VA configurationsArd Biesheuvel1-7/+5
For historical reasons, the arm64 kernel VA space is configured as two equally sized halves, i.e., on a 48-bit VA build, the VA space is split into a 47-bit vmalloc region and a 47-bit linear region. When support for 52-bit virtual addressing was added, this equal split was kept, resulting in a substantial waste of virtual address space in the linear region: 48-bit VA 52-bit VA 0xffff_ffff_ffff_ffff +-------------+ +-------------+ | vmalloc | | vmalloc | 0xffff_8000_0000_0000 +-------------+ _PAGE_END(48) +-------------+ | linear | : : 0xffff_0000_0000_0000 +-------------+ : : : : : : : : : : : : : : : : : currently : : unusable : : : : : : unused : : by : : : : : : : : hardware : : : : : : : 0xfff8_0000_0000_0000 : : _PAGE_END(52) +-------------+ : : | | : : | | : : | | : : | | : : | | : unusable : | | : : | linear | : by : | | : : | region | : hardware : | | : : | | : : | | : : | | : : | | : : | | : : | | 0xfff0_0000_0000_0000 +-------------+ PAGE_OFFSET +-------------+ As illustrated above, the 52-bit VA kernel uses 47 bits for the vmalloc space (as before), to ensure that a single 64k granule kernel image can support any 64k granule capable system, regardless of whether it supports the 52-bit virtual addressing extension. However, due to the fact that the VA space is still split in equal halves, the linear region is only 2^51 bytes in size, wasting almost half of the 52-bit VA space. Let's fix this, by abandoning the equal split, and simply assigning all VA space outside of the vmalloc region to the linear region. The KASAN shadow region is reconfigured so that it ends at the start of the vmalloc region, and grows downwards. That way, the arrangement of the vmalloc space (which contains kernel mappings, modules, BPF region, the vmemmap array etc) is identical between non-KASAN and KASAN builds, which aids debugging. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Link: https://lore.kernel.org/r/20201008153602.9467-3-ardb@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-11-06Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds3-1/+4
Pull arm64 fixes from Will Deacon: "Here's the weekly batch of fixes for arm64. Not an awful lot here, but there are still a few unresolved issues relating to CPU hotplug, RCU and IRQ tracing that I hope to queue fixes for next week. Summary: - Fix early use of kprobes - Fix kernel placement in kexec_file_load() - Bump maximum number of NUMA nodes" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kexec_file: try more regions if loading segments fails arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-line arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4
2020-11-03arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-lineJean-Philippe Brucker3-1/+4
Commit 36dadef23fcc ("kprobes: Init kprobes in early_initcall") enabled using kprobes from early_initcall. Unfortunately at this point the hardware debug infrastructure is not operational. The OS lock may still be locked, and the hardware watchpoints may have unknown values when kprobe enables debug monitors to single-step instructions. Rather than using hardware single-step, append a BRK instruction after the instruction to be executed out-of-line. Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall") Suggested-by: Will Deacon <will@kernel.org> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/r/20201103134900.337243-1-jean-philippe@linaro.org Signed-off-by: Will Deacon <will@kernel.org>
2020-11-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds3-14/+36
Pull kvm fixes from Paolo Bonzini: "ARM: - selftest fix - force PTE mapping on device pages provided via VFIO - fix detection of cacheable mapping at S2 - fallback to PMD/PTE mappings for composite huge pages - fix accounting of Stage-2 PGD allocation - fix AArch32 handling of some of the debug registers - simplify host HYP entry - fix stray pointer conversion on nVHE TLB invalidation - fix initialization of the nVHE code - simplify handling of capabilities exposed to HYP - nuke VCPUs caught using a forbidden AArch32 EL0 x86: - new nested virtualization selftest - miscellaneous fixes - make W=1 fixes - reserve new CPUID bit in the KVM leaves" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: vmx: remove unused variable KVM: selftests: Don't require THP to run tests KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again KVM: selftests: test behavior of unmapped L2 APIC-access address KVM: x86: Fix NULL dereference at kvm_msr_ignored_check() KVM: x86: replace static const variables with macros KVM: arm64: Handle Asymmetric AArch32 systems arm64: cpufeature: upgrade hyp caps to final arm64: cpufeature: reorder cpus_have_{const, final}_cap() KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code() KVM: arm64: Force PTE mapping on fault resulting in a device mapping KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes KVM: arm64: Fix masks in stage2_pte_cacheable() KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
2020-10-30Merge tag 'kvmarm-fixes-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini3-14/+36
KVM/arm64 fixes for 5.10, take #1 - Force PTE mapping on device pages provided via VFIO - Fix detection of cacheable mapping at S2 - Fallback to PMD/PTE mappings for composite huge pages - Fix accounting of Stage-2 PGD allocation - Fix AArch32 handling of some of the debug registers - Simplify host HYP entry - Fix stray pointer conversion on nVHE TLB invalidation - Fix initialization of the nVHE code - Simplify handling of capabilities exposed to HYP - Nuke VCPUs caught using a forbidden AArch32 EL0
2020-10-30arm64: cpufeature: upgrade hyp caps to finalMark Rutland2-14/+24
We finalize caps before initializing kvm hyp code, and any use of cpus_have_const_cap() in kvm hyp code generates redundant and potentially unsound code to read the cpu_hwcaps array. A number of helper functions used in both hyp context and regular kernel context use cpus_have_const_cap(), as some regular kernel code runs before the capabilities are finalized. It's tedious and error-prone to write separate copies of these for hyp and non-hyp code. So that we can avoid the redundant code, let's automatically upgrade cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context. With this change, there's never a reason to access to cpu_hwcaps array from hyp code, and we don't need to create an NVHE alias for this. This should have no effect on non-hyp code. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-4-mark.rutland@arm.com
2020-10-30arm64: cpufeature: reorder cpus_have_{const, final}_cap()Mark Rutland1-8/+8
In a subsequent patch we'll modify cpus_have_const_cap() to call cpus_have_final_cap(), and hence we need to define cpus_have_final_cap() first. To make subsequent changes easier to follow, this patch reorders the two without making any other changes. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-3-mark.rutland@arm.com
2020-10-30KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()Mark Rutland1-5/+16
Currently has_vhe() detects whether it is being compiled for VHE/NVHE hyp code based on preprocessor definitions, and uses this knowledge to avoid redundant runtime checks. There are other cases where we'd like to use this knowledge, so let's factor the preprocessor checks out into separate helpers. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Brazdil <dbrazdil@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201026134931.28246-2-mark.rutland@arm.com
2020-10-29KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCRMarc Zyngier1-0/+1
The DBGD{CCINT,SCRext} and DBGVCR register entries in the cp14 array are missing their target register, resulting in all accesses being targetted at the guard sysreg (indexed by __INVALID_SYSREG__). Point the emulation code at the actual register entries. Fixes: bdfb4b389c8d ("arm64: KVM: add trap handlers for AArch32 debug registers") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20201029172409.2768336-1-maz@kernel.org
2020-10-29arm64: Add workaround for Arm Cortex-A77 erratum 1508412Rob Herring2-1/+11
On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load and a store exclusive or PAR_EL1 read can cause a deadlock. The workaround requires a DMB SY before and after a PAR_EL1 register read. In addition, it's possible an interrupt (doing a device read) or KVM guest exit could be taken between the DMB and PAR read, so we also need a DMB before returning from interrupt and before returning to a guest. A deadlock is still possible with the workaround as KVM guests must also have the workaround. IOW, a malicious guest can deadlock an affected systems. This workaround also depends on a firmware counterpart to enable the h/w to insert DMB SY after load and store exclusive instructions. See the errata document SDEN-1152370 v10 [1] for more information. [1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Marc Zyngier <maz@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-10-29arm64: Add part number for Arm Cortex-A77Rob Herring1-0/+2
Add the MIDR part number info for the Arm Cortex-A77. Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201028182839.166037-1-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-10-28arm64: avoid -Woverride-init warningArnd Bergmann1-0/+1
The icache_policy_str[] definition causes a warning when extra warning flags are enabled: arch/arm64/kernel/cpuinfo.c:38:26: warning: initialized field overwritten [-Woverride-init] 38 | [ICACHE_POLICY_VIPT] = "VIPT", | ^~~~~~ arch/arm64/kernel/cpuinfo.c:38:26: note: (near initialization for 'icache_policy_str[2]') arch/arm64/kernel/cpuinfo.c:39:26: warning: initialized field overwritten [-Woverride-init] 39 | [ICACHE_POLICY_PIPT] = "PIPT", | ^~~~~~ arch/arm64/kernel/cpuinfo.c:39:26: note: (near initialization for 'icache_policy_str[3]') arch/arm64/kernel/cpuinfo.c:40:27: warning: initialized field overwritten [-Woverride-init] 40 | [ICACHE_POLICY_VPIPT] = "VPIPT", | ^~~~~~~ arch/arm64/kernel/cpuinfo.c:40:27: note: (near initialization for 'icache_policy_str[0]') There is no real need for the default initializer here, as printing a NULL string is harmless. Rewrite the logic to have an explicit reserved value for the only one that uses the default value. This partially reverts the commit that removed ICACHE_POLICY_AIVIVT. Fixes: 155433cb365e ("arm64: cache: Remove support for ASID-tagged VIVT I-caches") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20201026193807.3816388-1-arnd@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-10-25treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches1-1/+1
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds12-588/+560
Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
2020-10-23Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds2-5/+4
Pull more arm64 updates from Will Deacon: "A small selection of further arm64 fixes and updates. Most of these are fixes that came in during the merge window, with the exception of the HAVE_MOVE_PMD mremap() speed-up which we discussed back in 2018 and somehow forgot to enable upstream. - Improve performance of Spectre-v2 mitigation on Falkor CPUs (if you're lucky enough to have one) - Select HAVE_MOVE_PMD. This has been shown to improve mremap() performance, which is used heavily by the Android runtime GC, and it seems we forgot to enable this upstream back in 2018. - Ensure linker flags are consistent between LLVM and BFD - Fix stale comment in Spectre mitigation rework - Fix broken copyright header - Fix KASLR randomisation of the linear map - Prevent arm64-specific prctl()s from compat tasks (return -EINVAL)" Link: https://lore.kernel.org/kvmarm/20181108181201.88826-3-joelaf@google.com/ * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: proton-pack: Update comment to reflect new function name arm64: spectre-v2: Favour CPU-specific mitigation at EL2 arm64: link with -z norelro regardless of CONFIG_RELOCATABLE arm64: Fix a broken copyright header in gen_vdso_offsets.sh arm64: mremap speedup - Enable HAVE_MOVE_PMD arm64: mm: use single quantity to represent the PA to VA translation arm64: reject prctl(PR_PAC_RESET_KEYS) on compat tasks
2020-10-22Merge tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuildLinus Torvalds1-0/+7
Pull Kbuild updates from Masahiro Yamada: - Support 'make compile_commands.json' to generate the compilation database more easily, avoiding stale entries - Support 'make clang-analyzer' and 'make clang-tidy' for static checks using clang-tidy - Preprocess scripts/modules.lds.S to allow CONFIG options in the module linker script - Drop cc-option tests from compiler flags supported by our minimal GCC/Clang versions - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y - Use sha1 build id for both BFD linker and LLD - Improve deb-pkg for reproducible builds and rootless builds - Remove stale, useless scripts/namespace.pl - Turn -Wreturn-type warning into error - Fix build error of deb-pkg when CONFIG_MODULES=n - Replace 'hostname' command with more portable 'uname -n' - Various Makefile cleanups * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kbuild: Use uname for LINUX_COMPILE_HOST detection kbuild: Only add -fno-var-tracking-assignments for old GCC versions kbuild: remove leftover comment for filechk utility treewide: remove DISABLE_LTO kbuild: deb-pkg: clean up package name variables kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n kbuild: enforce -Werror=return-type scripts: remove namespace.pl builddeb: Add support for all required debian/rules targets builddeb: Enable rootless builds builddeb: Pass -n to gzip for reproducible packages kbuild: split the build log of kallsyms kbuild: explicitly specify the build id style scripts/setlocalversion: make git describe output more reliable kbuild: remove cc-option test of -Werror=date-time kbuild: remove cc-option test of -fno-stack-check kbuild: remove cc-option test of -fno-strict-overflow kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan kbuild: do not create built-in objects for external module builds ...
2020-10-20Merge tag 'kvmarm-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEADPaolo Bonzini18-794/+618
KVM/arm64 updates for Linux 5.10 - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation
2020-10-18mm/madvise: introduce process_madvise() syscall: an external memory hinting APIMinchan Kim2-1/+3
There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. It also supports vector address range because Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma.(With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. ince it could affect other process's address range, only privileged process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. So finally, the API is as follows, ssize_t process_madvise(int pidfd, const struct iovec *iovec, unsigned long vlen, int advice, unsigned int flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by pidfd is external. MADV_COLD MADV_PAGEOUT Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. FAQ: Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/ [minchan@kernel.org: fix process_madvise build break for arm64] Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com [minchan@kernel.org: fix build error for mips of process_madvise] Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com [akpm@linux-foundation.org: fix patch ordering issue] [akpm@linux-foundation.org: fix arm64 whoops] [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian] [akpm@linux-foundation.org: fix i386 build] [sfr@canb.auug.org.au: fix syscall numbering] Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au [sfr@canb.auug.org.au: madvise.c needs compat.h] Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au [minchan@kernel.org: fix mips build] Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com [yuehaibing@huawei.com: remove duplicate header which is included twice] Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com [minchan@kernel.org: do not use helper functions for process_madvise] Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com [akpm@linux-foundation.org: pidfd_get_pid() gained an argument] [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"] Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-15arm64: mm: use single quantity to represent the PA to VA translationArd Biesheuvel2-5/+4
On arm64, the global variable memstart_addr represents the physical address of PAGE_OFFSET, and so physical to virtual translations or vice versa used to come down to simple additions or subtractions involving the values of PAGE_OFFSET and memstart_addr. When support for 52-bit virtual addressing was introduced, we had to deal with PAGE_OFFSET potentially being outside of the region that can be covered by the virtual range (as the 52-bit VA capable build needs to be able to run on systems that are only 48-bit VA capable), and for this reason, another translation was introduced, and recorded in the global variable physvirt_offset. However, if we go back to the original definition of memstart_addr, i.e., the physical address of PAGE_OFFSET, it turns out that there is no need for two separate translations: instead, we can simply subtract the size of the unaddressable VA space from memstart_addr to make the available physical memory appear in the 48-bit addressable VA region. This simplifies things, but also fixes a bug on KASLR builds, which may update memstart_addr later on in arm64_memblock_init(), but fails to update vmemmap and physvirt_offset accordingly. Fixes: 5383cc6efed1 ("arm64: mm: Introduce vabits_actual") Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Steve Capper <steve.capper@arm.com> Link: https://lore.kernel.org/r/20201008153602.9467-2-ardb@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2020-10-14Merge tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommuLinus Torvalds2-0/+2
Pull iommu updates from Joerg Roedel: - ARM-SMMU Updates from Will: - Continued SVM enablement, where page-table is shared with CPU - Groundwork to support integrated SMMU with Adreno GPU - Allow disabling of MSI-based polling on the kernel command-line - Minor driver fixes and cleanups (octal permissions, error messages, ...) - Secure Nested Paging Support for AMD IOMMU. The IOMMU will fault when a device tries DMA on memory owned by a guest. This needs new fault-types as well as a rewrite of the IOMMU memory semaphore for command completions. - Allow broken Intel IOMMUs (wrong address widths reported) to still be used for interrupt remapping. - IOMMU UAPI updates for supporting vSVA, where the IOMMU can access address spaces of processes running in a VM. - Support for the MT8167 IOMMU in the Mediatek IOMMU driver. - Device-tree updates for the Renesas driver to support r8a7742. - Several smaller fixes and cleanups all over the place. * tag 'iommu-updates-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (57 commits) iommu/vt-d: Gracefully handle DMAR units with no supported address widths iommu/vt-d: Check UAPI data processed by IOMMU core iommu/uapi: Handle data and argsz filled by users iommu/uapi: Rename uapi functions iommu/uapi: Use named union for user data iommu/uapi: Add argsz for user filled data docs: IOMMU user API iommu/qcom: add missing put_device() call in qcom_iommu_of_xlate() iommu/arm-smmu-v3: Add SVA device feature iommu/arm-smmu-v3: Check for SVA features iommu/arm-smmu-v3: Seize private ASID iommu/arm-smmu-v3: Share process page tables iommu/arm-smmu-v3: Move definitions to a header iommu/io-pgtable-arm: Move some definitions to a header iommu/arm-smmu-v3: Ensure queue is read after updating prod pointer iommu/amd: Re-purpose Exclusion range registers to support SNP CWWB iommu/amd: Add support for RMP_PAGE_FAULT and RMP_HW_ERR iommu/amd: Use 4K page for completion wait write-back semaphore iommu/tegra-smmu: Allow to group clients in same swgroup iommu/tegra-smmu: Fix iova->phys translation ...
2020-10-14Merge tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds1-0/+2
Pull power management updates from Rafael Wysocki: "These rework the collection of cpufreq statistics to allow it to take place if fast frequency switching is enabled in the governor, rework the frequency invariance handling in the cpufreq core and drivers, add new hardware support to a couple of cpufreq drivers, fix a number of assorted issues and clean up the code all over. Specifics: - Rework cpufreq statistics collection to allow it to take place when fast frequency switching is enabled in the governor (Viresh Kumar). - Make the cpufreq core set the frequency scale on behalf of the driver and update several cpufreq drivers accordingly (Ionela Voinescu, Valentin Schneider). - Add new hardware support to the STI and qcom cpufreq drivers and improve them (Alain Volmat, Manivannan Sadhasivam). - Fix multiple assorted issues in cpufreq drivers (Jon Hunter, Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan Gerhold, Viresh Kumar). - Fix several assorted issues in the operating performance points (OPP) framework (Stephan Gerhold, Viresh Kumar). - Allow devfreq drivers to fetch devfreq instances by DT enumeration instead of using explicit phandles and modify the devfreq core code to support driver-specific devfreq DT bindings (Leonard Crestez, Chanwoo Choi). - Improve initial hardware resetting in the tegra30 devfreq driver and clean up the tegra cpuidle driver (Dmitry Osipenko). - Update the cpuidle core to collect state entry rejection statistics and expose them via sysfs (Lina Iyer). - Improve the ACPI _CST code handling diagnostics (Chen Yu). - Update the PSCI cpuidle driver to allow the PM domain initialization to occur in the OSI mode as well as in the PC mode (Ulf Hansson). - Rework the generic power domains (genpd) core code to allow domain power off transition to be aborted in the absence of the "power off" domain callback (Ulf Hansson). - Fix two suspend-to-idle issues in the ACPI EC driver (Rafael Wysocki). - Fix the handling of timer_expires in the PM-runtime framework on 32-bit systems and the handling of device links in it (Grygorii Strashko, Xiang Chen). - Add IO requests batching support to the hibernate image saving and reading code and drop a bogus get_gendisk() from there (Xiaoyi Chen, Christoph Hellwig). - Allow PCIe ports to be put into the D3cold power state if they are power-manageable via ACPI (Lukas Wunner). - Add missing header file include to a power capping driver (Pujin Shi). - Clean up the qcom-cpr AVS driver a bit (Liu Shixin). - Kevin Hilman steps down as designated reviwer of adaptive voltage scaling (AVS) drivers (Kevin Hilman)" * tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits) cpufreq: stats: Fix string format specifier mismatch arm: disable frequency invariance for CONFIG_BL_SWITCHER cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale() cpufreq: stats: Add memory barrier to store_reset() cpufreq: schedutil: Simplify sugov_fast_switch() ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe() ACPI: EC: PM: Flush EC work unconditionally after wakeup PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI PM: hibernate: remove the bogus call to get_gendisk() in software_resume() cpufreq: Move traces and update to policy->cur to cpufreq core cpufreq: stats: Enable stats for fast-switch as well cpufreq: stats: Mark few conditionals with unlikely() cpufreq: stats: Remove locking cpufreq: stats: Defer stats update to cpufreq_stats_record_transition() PM: domains: Allow to abort power off when no ->power_off() callback PM: domains: Rename power state enums for genpd PM / devfreq: tegra30: Improve initial hardware resetting PM / devfreq: event: Change prototype of devfreq_event_get_edev_by_phandle function PM / devfreq: Change prototype of devfreq_get_devfreq_by_phandle function PM / devfreq: Add devfreq_get_devfreq_by_node function ...
2020-10-14Merge tag 'for-linus-5.10b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-0/+6
Pull xen updates from Juergen Gross: - two small cleanup patches - avoid error messages when initializing MCA banks in a Xen dom0 - a small series for converting the Xen gntdev driver to use pin_user_pages*() instead of get_user_pages*() - intermediate fix for running as a Xen guest on Arm with KPTI enabled (the final solution will need new Xen functionality) * tag 'for-linus-5.10b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Fix typo in xen_pagetable_p2m_free() x86/xen: disable Firmware First mode for correctable memory errors xen/arm: do not setup the runstate info page if kpti is enabled xen: remove redundant initialization of variable ret xen/gntdev.c: Convert get_user_pages*() to pin_user_pages*() xen/gntdev.c: Mark pages as dirty
2020-10-12Merge branch 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+1
Pull compat mount cleanups from Al Viro: "The last remnants of mount(2) compat buried by Christoph. Buried into NFS, that is. Generally I'm less enthusiastic about "let's use in_compat_syscall() deep in call chain" kind of approach than Christoph seems to be, but in this case it's warranted - that had been an NFS-specific wart, hopefully not to be repeated in any other filesystems (read: any new filesystem introducing non-text mount options will get NAKed even if it doesn't mess the layout up). IOW, not worth trying to grow an infrastructure that would avoid that use of in_compat_syscall()..." * 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove compat_sys_mount fs,nfs: lift compat nfs4 mount data handling into the nfs code nfs: simplify nfs4_parse_monolithic
2020-10-12Merge branch 'work.quota-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-2/+0
Pull compat quotactl cleanups from Al Viro: "More Christoph's compat cleanups: quotactl(2)" * 'work.quota-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: quota: simplify the quotactl compat handling compat: add a compat_need_64bit_alignment_fixup() helper compat: lift compat_s64 and compat_u64 to <asm-generic/compat.h>
2020-10-12Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-5/+5
Pull compat iovec cleanups from Al Viro: "Christoph's series around import_iovec() and compat variant thereof" * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: security/keys: remove compat_keyctl_instantiate_key_iov mm: remove compat_process_vm_{readv,writev} fs: remove compat_sys_vmsplice fs: remove the compat readv/writev syscalls fs: remove various compat readv/writev helpers iov_iter: transparently handle compat iovecs in import_iovec iov_iter: refactor rw_copy_check_uvector and import_iovec iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c compat.h: fix a spelling error in <linux/compat.h>
2020-10-12Merge tag 'efi-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-3/+2
Pull EFI changes from Ingo Molnar: - Preliminary RISC-V enablement - the bulk of it will arrive via the RISCV tree. - Relax decompressed image placement rules for 32-bit ARM - Add support for passing MOK certificate table contents via a config table rather than a EFI variable. - Add support for 18 bit DIMM row IDs in the CPER records. - Work around broken Dell firmware that passes the entire Boot#### variable contents as the command line - Add definition of the EFI_MEMORY_CPU_CRYPTO memory attribute so we can identify it in the memory map listings. - Don't abort the boot on arm64 if the EFI RNG protocol is available but returns with an error - Replace slashes with exclamation marks in efivarfs file names - Split efi-pstore from the deprecated efivars sysfs code, so we can disable the latter on !x86. - Misc fixes, cleanups and updates. * tag 'efi-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) efi: mokvar: add missing include of asm/early_ioremap.h efi: efivars: limit availability to X86 builds efi: remove some false dependencies on CONFIG_EFI_VARS efi: gsmi: fix false dependency on CONFIG_EFI_VARS efi: efivars: un-export efivars_sysfs_init() efi: pstore: move workqueue handling out of efivars efi: pstore: disentangle from deprecated efivars module efi: mokvar-table: fix some issues in new code efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure efivarfs: Replace invalid slashes with exclamation marks in dentries. efi: Delete deprecated parameter comments efi/libstub: Fix missing-prototypes in string.c efi: Add definition of EFI_MEMORY_CPU_CRYPTO and ability to report it cper,edac,efi: Memory Error Record: bank group/address and chip id edac,ghes,cper: Add Row Extension to Memory Error Record efi/x86: Add a quirk to support command line arguments on Dell EFI firmware efi/libstub: Add efi_warn and *_once logging helpers integrity: Load certs from the EFI MOK config table integrity: Move import of MokListRT certs to a separate routine efi: Support for MOK variable config table ...
2020-10-12Merge tag 'irq-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds5-28/+23
Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core: - Allow trimming of interrupt hierarchy to support odd hardware setups where only a subset of the interrupts requires the full hierarchy. - Allow the retrigger mechanism to follow a hierarchy to simplify driver code. - Provide a mechanism to force enable wakeup interrrupts on suspend. - More infrastructure to handle IPIs in the core code Architectures: - Convert ARM/ARM64 IPI handling to utilize the interrupt core code. Drivers: - The usual pile of new interrupt chips (MStar, Actions Owl, TI PRUSS, Designware ICTL) - ARM(64) IPI related conversions - Wakeup support for Qualcom PDC - Prevent hierarchy corruption in the NVIDIA Tegra driver - The usual small fixes, improvements and cleanups all over the place" * tag 'irq-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits) dt-bindings: interrupt-controller: Add MStar interrupt controller irqchip/irq-mst: Add MStar interrupt controller support soc/tegra: pmc: Don't create fake interrupt hierarchy levels soc/tegra: pmc: Allow optional irq parent callbacks gpio: tegra186: Allow optional irq parent callbacks genirq/irqdomain: Allow partial trimming of irq_data hierarchy irqchip/qcom-pdc: Reset PDC interrupts during init irqchip/qcom-pdc: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag pinctrl: qcom: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag genirq/PM: Introduce IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag pinctrl: qcom: Use return value from irq_set_wake() call pinctrl: qcom: Set IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flags ARM: Handle no IPI being registered in show_ipi_list() MAINTAINERS: Add entries for Actions Semi Owl SIRQ controller irqchip: Add Actions Semi Owl SIRQ controller dt-bindings: interrupt-controller: Add Actions SIRQ controller binding dt-bindings: dw-apb-ictl: Update binding to describe use as primary interrupt controller irqchip/dw-apb-ictl: Add primary interrupt controller support irqchip/dw-apb-ictl: Refactor priot to introducing hierarchical irq domains genirq: Add stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER ...
2020-10-12Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linuxLinus Torvalds37-294/+526
Pull arm64 updates from Will Deacon: "There's quite a lot of code here, but much of it is due to the addition of a new PMU driver as well as some arm64-specific selftests which is an area where we've traditionally been lagging a bit. In terms of exciting features, this includes support for the Memory Tagging Extension which narrowly missed 5.9, hopefully allowing userspace to run with use-after-free detection in production on CPUs that support it. Work is ongoing to integrate the feature with KASAN for 5.11. Another change that I'm excited about (assuming they get the hardware right) is preparing the ASID allocator for sharing the CPU page-table with the SMMU. Those changes will also come in via Joerg with the IOMMU pull. We do stray outside of our usual directories in a few places, mostly due to core changes required by MTE. Although much of this has been Acked, there were a couple of places where we unfortunately didn't get any review feedback. Other than that, we ran into a handful of minor conflicts in -next, but nothing that should post any issues. Summary: - Userspace support for the Memory Tagging Extension introduced by Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11. - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context switching. - Fix and subsequent rewrite of our Spectre mitigations, including the addition of support for PR_SPEC_DISABLE_NOEXEC. - Support for the Armv8.3 Pointer Authentication enhancements. - Support for ASID pinning, which is required when sharing page-tables with the SMMU. - MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op. - Perf/PMU driver updates, including addition of the ARM CMN PMU driver and also support to handle CPU PMU IRQs as NMIs. - Allow prefetchable PCI BARs to be exposed to userspace using normal non-cacheable mappings. - Implementation of ARCH_STACKWALK for unwinding. - Improve reporting of unexpected kernel traps due to BPF JIT failure. - Improve robustness of user-visible HWCAP strings and their corresponding numerical constants. - Removal of TEXT_OFFSET. - Removal of some unused functions, parameters and prototypes. - Removal of MPIDR-based topology detection in favour of firmware description. - Cleanups to handling of SVE and FPSIMD register state in preparation for potential future optimisation of handling across syscalls. - Cleanups to the SDEI driver in preparation for support in KVM. - Miscellaneous cleanups and refactoring work" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits) Revert "arm64: initialize per-cpu offsets earlier" arm64: random: Remove no longer needed prototypes arm64: initialize per-cpu offsets earlier kselftest/arm64: Check mte tagged user address in kernel kselftest/arm64: Verify KSM page merge for MTE pages kselftest/arm64: Verify all different mmap MTE options kselftest/arm64: Check forked child mte memory accessibility kselftest/arm64: Verify mte tag inclusion via prctl kselftest/arm64: Add utilities and a test to validate mte memory perf: arm-cmn: Fix conversion specifiers for node type perf: arm-cmn: Fix unsigned comparison to less than zero arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option arm64: Pull in task_stack_page() to Spectre-v4 mitigation code KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled arm64: Get rid of arm64_ssbd_state KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state() KVM: arm64: Get rid of kvm_arm_have_ssbd() KVM: arm64: Simplify handling of ARCH_WORKAROUND_2 ...
2020-10-11Merge tag 'irqchip-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/coreThomas Gleixner5-28/+23
Pull irqchip updates from Marc Zyngier: Core changes: - Allow irq retriggering to follow a hierarchy - Allow interrupt hierarchies to be trimmed at allocation time - Allow interrupts to be hidden from /proc/interrupts (IPIs) - Introduce stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER - New per-cpu IPI handling flow Architecture changes: - Move arm/arm64 IPI handling to the core interrupt code, removing the home brewed accounting Driver updates: - New driver for the MStar (and more recently Mediatek) platforms - New driver for the Actions Owl SIRQ controller - New driver for the TI PRUSS infrastructure - Wake-up support for the Qualcomm PDC controller - Primary interrupt controller support for the Designware APB ICTL - Convert the IPI code for GIC, GICv3, hip04, armada-270-xp and bcm2836 to using standard interrupts - Improve GICv3 pseudo-NMI support to deal with both non-secure and secure priorities on arm64 - Convert the GIC/GICv3 drivers to using HW-based irq retrigger - A sprinkling of dev_err_probe() conversion - A set of NVIDIA Tegra fixes for interrupt hierarchy corruption - A reset fix for the Loongson HTVEC driver - A couple of error handling fixes in the TI SCI drivers
2020-10-09Revert "arm64: initialize per-cpu offsets earlier"Will Deacon1-2/+0
This reverts commit 353e228eb355be5a65a3c0996c774a0f46737fda. Qian Cai reports that TX2 no longer boots with his .config as it appears that task_cpu() gets instrumented and used before KASAN has been initialised. Although Mark has a proposed fix, let's take the safe option of reverting this for now and sorting it out properly later. Link: https://lore.kernel.org/r/711bc57a314d8d646b41307008db2845b7537b3d.camel@redhat.com Reported-by: Qian Cai <cai@redhat.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
2020-10-08cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()Ionela Voinescu1-0/+1
Compared to other arch_* functions, arch_set_freq_scale() has an atypical weak definition that can be replaced by a strong architecture specific implementation. The more typical support for architectural functions involves defining an empty stub in a header file if the symbol is not already defined in architecture code. Some examples involve: - #define arch_scale_freq_capacity topology_get_freq_scale - #define arch_scale_freq_invariant topology_scale_freq_invariant - #define arch_scale_cpu_capacity topology_get_cpu_scale - #define arch_update_cpu_topology topology_update_cpu_topology - #define arch_scale_thermal_pressure topology_get_thermal_pressure - #define arch_set_thermal_pressure topology_set_thermal_pressure Bring arch_set_freq_scale() in line with these functions by renaming it to topology_set_freq_scale() in the arch topology driver, and by defining the arch_set_freq_scale symbol to point to the new function for arm and arm64. While there are other users of the arch_topology driver, this patch defines arch_set_freq_scale for arm and arm64 only, due to their existing definitions of arch_scale_freq_capacity. This is the getter function of the frequency invariance scale factor and without a getter function, the setter function - arch_set_freq_scale() has not purpose. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-10-07Merge branch 'for-next/late-arrivals' into for-next/coreWill Deacon2-5/+2
Late patches for 5.10: MTE selftests, minor KCSAN preparation and removal of some unused prototypes. (Amit Daniel Kachhap and others) * for-next/late-arrivals: arm64: random: Remove no longer needed prototypes arm64: initialize per-cpu offsets earlier kselftest/arm64: Check mte tagged user address in kernel kselftest/arm64: Verify KSM page merge for MTE pages kselftest/arm64: Verify all different mmap MTE options kselftest/arm64: Check forked child mte memory accessibility kselftest/arm64: Verify mte tag inclusion via prctl kselftest/arm64: Add utilities and a test to validate mte memory
2020-10-07arm64: random: Remove no longer needed prototypesAndre Przywara1-5/+0
Commit 9bceb80b3cc4 ("arm64: kaslr: Use standard early random function") removed the direct calls of the __arm64_rndr() and __early_cpu_has_rndr() functions, but left the dummy prototypes in the #else branch of the #ifdef CONFIG_ARCH_RANDOM guard. Remove the redundant prototypes, as they have no users outside of this header file. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20201006194453.36519-1-andre.przywara@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-10-07Merge branches 'arm/allwinner', 'arm/mediatek', 'arm/renesas', 'arm/tegra', 'arm/qcom', 'arm/smmu', 'ppc/pamu', 'x86/amd', 'x86/vt-d' and 'core' into nextJoerg Roedel4-1/+15
2020-10-05arm64: initialize per-cpu offsets earlierMark Rutland1-0/+2
The current initialization of the per-cpu offset register is difficult to follow and this initialization is not always early enough for upcoming instrumentation with KCSAN, where the instrumentation callbacks use the per-cpu offset. To make it possible to support KCSAN, and to simplify reasoning about early bringup code, let's initialize the per-cpu offset earlier, before we run any C code that may consume it. To do so, this patch adds a new init_this_cpu_offset() helper that's called before the usual primary/secondary start functions. For consistency, this is also used to re-initialize the per-cpu offset after the runtime per-cpu areas have been allocated (which can change CPU0's offset). So that init_this_cpu_offset() isn't subject to any instrumentation that might consume the per-cpu offset, it is marked with noinstr, preventing instrumentation. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20201005164303.21389-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2020-10-05Merge back cpufreq material for 5.10.Rafael J. Wysocki1-0/+1
2020-10-04xen/arm: do not setup the runstate info page if kpti is enabledStefano Stabellini1-0/+6
The VCPUOP_register_runstate_memory_area hypercall takes a virtual address of a buffer as a parameter. The semantics of the hypercall are such that the virtual address should always be valid. When KPTI is enabled and we are running userspace code, the virtual address is not valid, thus, Linux is violating the semantics of VCPUOP_register_runstate_memory_area. Do not call VCPUOP_register_runstate_memory_area when KPTI is enabled. Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> CC: Bertrand Marquis <Bertrand.Marquis@arm.com> CC: boris.ostrovsky@oracle.com CC: jgross@suse.com Link: https://lore.kernel.org/r/20200924234955.15455-1-sstabellini@kernel.org Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2020-10-03mm: remove compat_process_vm_{readv,writev}Christoph Hellwig1-2/+2
Now that import_iovec handles compat iovecs, the native syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-03fs: remove compat_sys_vmspliceChristoph Hellwig1-1/+1
Now that import_iovec handles compat iovecs, the native vmsplice syscall can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-03fs: remove the compat readv/writev syscallsChristoph Hellwig1-2/+2
Now that import_iovec handles compat iovecs, the native readv and writev syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-10-02Merge branch 'for-next/mte' into for-next/coreWill Deacon13-24/+293
Add userspace support for the Memory Tagging Extension introduced by Armv8.5. (Catalin Marinas and others) * for-next/mte: (30 commits) arm64: mte: Fix typo in memory tagging ABI documentation arm64: mte: Add Memory Tagging Extension documentation arm64: mte: Kconfig entry arm64: mte: Save tags when hibernating arm64: mte: Enable swap of tagged pages mm: Add arch hooks for saving/restoring tags fs: Handle intra-page faults in copy_mount_options() arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks arm64: mte: Restore the GCR_EL1 register after a suspend arm64: mte: Allow user control of the generated random tags via prctl() arm64: mte: Allow user control of the tag check mode via prctl() mm: Allow arm64 mmap(PROT_MTE) on RAM-based files arm64: mte: Validate the PROT_MTE request via arch_validate_flags() mm: Introduce arch_validate_flags() arm64: mte: Add PROT_MTE support to mmap() and mprotect() mm: Introduce arch_calc_vm_flag_bits() arm64: mte: Tags-aware aware memcmp_pages() implementation arm64: Avoid unnecessary clear_user_page() indirection ...
2020-10-02Merge branch 'for-next/ghostbusters' into for-next/coreWill Deacon9-206/+58
Fix and subsequently rewrite Spectre mitigations, including the addition of support for PR_SPEC_DISABLE_NOEXEC. (Will Deacon and Marc Zyngier) * for-next/ghostbusters: (22 commits) arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option arm64: Pull in task_stack_page() to Spectre-v4 mitigation code KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled arm64: Get rid of arm64_ssbd_state KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state() KVM: arm64: Get rid of kvm_arm_have_ssbd() KVM: arm64: Simplify handling of ARCH_WORKAROUND_2 arm64: Rewrite Spectre-v4 mitigation code arm64: Move SSBD prctl() handler alongside other spectre mitigation code arm64: Rename ARM64_SSBD to ARM64_SPECTRE_V4 arm64: Treat SSBS as a non-strict system feature arm64: Group start_thread() functions together KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2 arm64: Rewrite Spectre-v2 mitigation code arm64: Introduce separate file for spectre mitigations and reporting arm64: Rename ARM64_HARDEN_BRANCH_PREDICTOR to ARM64_SPECTRE_V2 KVM: arm64: Simplify install_bp_hardening_cb() KVM: arm64: Replace CONFIG_KVM_INDIRECT_VECTORS with CONFIG_RANDOMIZE_BASE arm64: Remove Spectre-related CONFIG_* options arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs ...
2020-10-02Merge branches 'for-next/acpi', 'for-next/boot', 'for-next/bpf', 'for-next/cpuinfo', 'for-next/fpsimd', 'for-next/misc', 'for-next/mm', 'for-next/pci', 'for-next/perf', 'for-next/ptrauth', 'for-next/sdei', 'for-next/selftests', 'for-next/stacktrace', 'for-next/svm', 'for-next/topology', ↵Will Deacon25-59/+175
'for-next/tpyos' and 'for-next/vdso' into for-next/core Remove unused functions and parameters from ACPI IORT code. (Zenghui Yu via Lorenzo Pieralisi) * for-next/acpi: ACPI/IORT: Remove the unused inline functions ACPI/IORT: Drop the unused @ops of iort_add_device_replay() Remove redundant code and fix documentation of caching behaviour for the HVC_SOFT_RESTART hypercall. (Pingfan Liu) * for-next/boot: Documentation/kvm/arm: improve description of HVC_SOFT_RESTART arm64/relocate_kernel: remove redundant code Improve reporting of unexpected kernel traps due to BPF JIT failure. (Will Deacon) * for-next/bpf: arm64: Improve diagnostics when trapping BRK with FAULT_BRK_IMM Improve robustness of user-visible HWCAP strings and their corresponding numerical constants. (Anshuman Khandual) * for-next/cpuinfo: arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitions Cleanups to handling of SVE and FPSIMD register state in preparation for potential future optimisation of handling across syscalls. (Julien Grall) * for-next/fpsimd: arm64/sve: Implement a helper to load SVE registers from FPSIMD state arm64/sve: Implement a helper to flush SVE registers arm64/fpsimdmacros: Allow the macro "for" to be used in more cases arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN arm64/signal: Update the comment in preserve_sve_context arm64/fpsimd: Update documentation of do_sve_acc Miscellaneous changes. (Tian Tao and others) * for-next/misc: arm64/mm: return cpu_all_mask when node is NUMA_NO_NODE arm64: mm: Fix missing-prototypes in pageattr.c arm64/fpsimd: Fix missing-prototypes in fpsimd.c arm64: hibernate: Remove unused including <linux/version.h> arm64/mm: Refactor {pgd, pud, pmd, pte}_ERROR() arm64: Remove the unused include statements arm64: get rid of TEXT_OFFSET arm64: traps: Add str of description to panic() in die() Memory management updates and cleanups. (Anshuman Khandual and others) * for-next/mm: arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op arm64/mm: Unify CONT_PMD_SHIFT arm64/mm: Unify CONT_PTE_SHIFT arm64/mm: Remove CONT_RANGE_OFFSET arm64/mm: Enable THP migration arm64/mm: Change THP helpers to comply with generic MM semantics arm64/mm/ptdump: Add address markers for BPF regions Allow prefetchable PCI BARs to be exposed to userspace using normal non-cacheable mappings. (Clint Sbisa) * for-next/pci: arm64: Enable PCI write-combine resources under sysfs Perf/PMU driver updates. (Julien Thierry and others) * for-next/perf: perf: arm-cmn: Fix conversion specifiers for node type perf: arm-cmn: Fix unsigned comparison to less than zero arm_pmu: arm64: Use NMIs for PMU arm_pmu: Introduce pmu_irq_ops KVM: arm64: pmu: Make overflow handler NMI safe arm64: perf: Defer irq_work to IPI_IRQ_WORK arm64: perf: Remove PMU locking arm64: perf: Avoid PMXEV* indirection arm64: perf: Add missing ISB in armv8pmu_enable_counter() perf: Add Arm CMN-600 PMU driver perf: Add Arm CMN-600 DT binding arm64: perf: Add support caps under sysfs drivers/perf: thunderx2_pmu: Fix memory resource error handling drivers/perf: xgene_pmu: Fix uninitialized resource struct perf: arm_dsu: Support DSU ACPI devices arm64: perf: Remove unnecessary event_idx check drivers/perf: hisi: Add missing include of linux/module.h arm64: perf: Add general hardware LLC events for PMUv3 Support for the Armv8.3 Pointer Authentication enhancements. (By Amit Daniel Kachhap) * for-next/ptrauth: arm64: kprobe: clarify the comment of steppable hint instructions arm64: kprobe: disable probe of fault prone ptrauth instruction arm64: cpufeature: Modify address authentication cpufeature to exact arm64: ptrauth: Introduce Armv8.3 pointer authentication enhancements arm64: traps: Allow force_signal_inject to pass esr error code arm64: kprobe: add checks for ARMv8.3-PAuth combined instructions Tonnes of cleanup to the SDEI driver. (Gavin Shan) * for-next/sdei: firmware: arm_sdei: Remove _sdei_event_unregister() firmware: arm_sdei: Remove _sdei_event_register() firmware: arm_sdei: Introduce sdei_do_local_call() firmware: arm_sdei: Cleanup on cross call function firmware: arm_sdei: Remove while loop in sdei_event_unregister() firmware: arm_sdei: Remove while loop in sdei_event_register() firmware: arm_sdei: Remove redundant error message in sdei_probe() firmware: arm_sdei: Remove duplicate check in sdei_get_conduit() firmware: arm_sdei: Unregister driver on error in sdei_init() firmware: arm_sdei: Avoid nested statements in sdei_init() firmware: arm_sdei: Retrieve event number from event instance firmware: arm_sdei: Common block for failing path in sdei_event_create() firmware: arm_sdei: Remove sdei_is_err() Selftests for Pointer Authentication and FPSIMD/SVE context-switching. (Mark Brown and Boyan Karatotev) * for-next/selftests: selftests: arm64: Add build and documentation for FP tests selftests: arm64: Add wrapper scripts for stress tests selftests: arm64: Add utility to set SVE vector lengths selftests: arm64: Add stress tests for FPSMID and SVE context switching selftests: arm64: Add test for the SVE ptrace interface selftests: arm64: Test case for enumeration of SVE vector lengths kselftests/arm64: add PAuth tests for single threaded consistency and differently initialized keys kselftests/arm64: add PAuth test for whether exec() changes keys kselftests/arm64: add nop checks for PAuth tests kselftests/arm64: add a basic Pointer Authentication test Implementation of ARCH_STACKWALK for unwinding. (Mark Brown) * for-next/stacktrace: arm64: Move console stack display code to stacktrace.c arm64: stacktrace: Convert to ARCH_STACKWALK arm64: stacktrace: Make stack walk callback consistent with generic code stacktrace: Remove reliable argument from arch_stack_walk() callback Support for ASID pinning, which is required when sharing page-tables with the SMMU. (Jean-Philippe Brucker) * for-next/svm: arm64: cpufeature: Export symbol read_sanitised_ftr_reg() arm64: mm: Pin down ASIDs for sharing mm with devices Rely on firmware tables for establishing CPU topology. (Valentin Schneider) * for-next/topology: arm64: topology: Stop using MPIDR for topology information Spelling fixes. (Xiaoming Ni and Yanfei Xu) * for-next/tpyos: arm64/numa: Fix a typo in comment of arm64_numa_init arm64: fix some spelling mistakes in the comments by codespell vDSO cleanups. (Will Deacon) * for-next/vdso: arm64: vdso: Fix unusual formatting in *setup_additional_pages() arm64: vdso32: Remove a bunch of #ifdef CONFIG_COMPAT_VDSO guards
2020-10-01arm64: mm: Make flush_tlb_fix_spurious_fault() a no-opWill Deacon1-0/+8
Our use of broadcast TLB maintenance means that spurious page-faults that have been handled already by another CPU do not require additional TLB maintenance. Make flush_tlb_fix_spurious_fault() a no-op and rely on the existing TLB invalidation instead. Add an explicit flush_tlb_page() when making a page dirty, as the TLB is permitted to cache the old read-only entry. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20200728092220.GA21800@willie-the-truck Signed-off-by: Will Deacon <will@kernel.org>
2020-09-30Merge branch 'kvm-arm64/hyp-pcpu' into kvmarm-master/nextMarc Zyngier12-267/+193
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-30Merge remote-tracking branch 'arm64/for-next/ghostbusters' into kvm-arm64/hyp-pcpuMarc Zyngier9-198/+58
Signed-off-by: Marc Zyngier <maz@kernel.org>
2020-09-30kvm: arm64: Remove unnecessary hyp mappingsDavid Brazdil1-20/+0
With all nVHE per-CPU variables being part of the hyp per-CPU region, mapping them individual is not necessary any longer. They are mapped to hyp as part of the overall per-CPU region. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Andrew Scull <ascull@google.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200922204910.7265-11-dbrazdil@google.com
2020-09-30kvm: arm64: Set up hyp percpu data for nVHEDavid Brazdil1-2/+17
Add hyp percpu section to linker script and rename the corresponding ELF sections of hyp/nvhe object files. This moves all nVHE-specific percpu variables to the new hyp percpu section. Allocate sufficient amount of memory for all percpu hyp regions at global KVM init time and create corresponding hyp mappings. The base addresses of hyp percpu regions are kept in a dynamically allocated array in the kernel. Add NULL checks in PMU event-reset code as it may run before KVM memory is initialized. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200922204910.7265-10-dbrazdil@google.com