aboutsummaryrefslogtreecommitdiffstats
path: root/arch (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-11-18KVM: nVMX: don't use vcpu->arch.efer when checking host state on nested state loadMaxim Levitsky1-5/+17
When loading nested state, don't use check vcpu->arch.efer to get the L1 host's 64-bit vs. 32-bit state and don't check it for consistency with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in vCPU may be stale when KVM_SET_NESTED_STATE is called---and architecturally does not exist. When restoring L2 state in KVM, the CPU is placed in non-root where nested VMX code has no snapshot of L1 host state: VMX (conditionally) loads host state fields loaded on VM-exit, but they need not correspond to the state before entry. A simple case occurs in KVM itself, where the host RIP field points to vmx_vmexit rather than the instruction following vmlaunch/vmresume. However, for the particular case of L1 being in 32- or 64-bit mode on entry, the exit controls can be treated instead as the source of truth regarding the state of L1 on entry, and can be used to check that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if vmcs12.VM_EXIT_LOAD_IA32_EFER is set. The consistency check on CPU EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only on VM-Enter. That's because, again, there's conceptually no "current" L1 EFER to check on KVM_SET_NESTED_STATE. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18KVM: Fix steal time asm constraintsDavid Woodhouse1-3/+3
In 64-bit mode, x86 instruction encoding allows us to use the low 8 bits of any GPR as an 8-bit operand. In 32-bit mode, however, we can only use the [abcd] registers. For which, GCC has the "q" constraint instead of the less restrictive "r". Also fix st->preempted, which is an input/output operand rather than an input. Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <89bf72db1b859990355f9c40713a34e0d2d86c98.camel@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-18cpuid: kvm_find_kvm_cpuid_features() should be declared 'static'Paul Durrant1-1/+1
The lack a static declaration currently results in: arch/x86/kvm/cpuid.c:128:26: warning: no previous prototype for function 'kvm_find_kvm_cpuid_features' when compiling with "W=1". Reported-by: kernel test robot <lkp@intel.com> Fixes: 760849b1476c ("KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES") Signed-off-by: Paul Durrant <pdurrant@amazon.com> Message-Id: <20211115144131.5943-1-pdurrant@amazon.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-17Merge tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds6-1/+16
Pull MIPS fixes from Thomas Bogendoerfer: - wire futex_waitv syscall - build fixes for lantiq and bcm63xx configs - yamon-dt bugfix * tag 'mips-fixes_5.16_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: mips: lantiq: add support for clk_get_parent() mips: bcm63xx: add support for clk_get_parent() MIPS: generic/yamon-dt: fix uninitialized variable error MIPS: syscalls: Wire up futex_waitv syscall
2021-11-17x86/perf: Fix snapshot_branch_stack warning in VMSong Liu1-2/+0
When running in VM intel_pmu_snapshot_branch_stack triggers WRMSR warning like: [ ] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0000000000000000) at rIP: 0xffffffff81011a5b (intel_pmu_snapshot_branch_stack+0x3b/0xd0) This can be triggered with BPF selftests: tools/testing/selftests/bpf/test_progs -t get_branch_snapshot This warning is caused by __intel_pmu_pebs_disable_all() in the VM. Since it is not necessary to disable PEBS for LBR, remove it from intel_pmu_snapshot_branch_stack and intel_pmu_snapshot_arch_branch_stack. Fixes: c22ac2a3d4bd ("perf: Enable branch record for software events") Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20211112054510.2667030-1-songliubraving@fb.com
2021-11-17perf/x86/intel/uncore: Fix IIO event constraints for SnowridgeAlexander Antonov1-0/+8
According to the latest uncore document, DATA_REQ_OF_CPU (0x83), DATA_REQ_BY_CPU (0xc0) and COMP_BUF_OCCUPANCY (0xd5) events have constraints. Add uncore IIO constraints for Snowridge. Fixes: 210cc5f9db7a ("perf/x86/intel/uncore: Add uncore support for Snow Ridge server") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20211115090334.3789-4-alexander.antonov@linux.intel.com
2021-11-17perf/x86/intel/uncore: Fix IIO event constraints for Skylake ServerAlexander Antonov1-0/+1
According to the latest uncore document, COMP_BUF_OCCUPANCY (0xd5) event can be collected on 2-3 counters. Update uncore IIO event constraints for Skylake Server. Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20211115090334.3789-3-alexander.antonov@linux.intel.com
2021-11-17perf/x86/intel/uncore: Fix filter_tid mask for CHA events on Skylake ServerAlexander Antonov1-0/+3
According Uncore Reference Manual: any of the CHA events may be filtered by Thread/Core-ID by using tid modifier in CHA Filter 0 Register. Update skx_cha_hw_config() to follow Uncore Guide. Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Signed-off-by: Alexander Antonov <alexander.antonov@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20211115090334.3789-2-alexander.antonov@linux.intel.com
2021-11-17Documentation: update vcpu-requests.rst referenceMauro Carvalho Chehab1-1/+1
Changeset 2f5947dfcaec ("Documentation: move Documentation/virtual to Documentation/virt") renamed: Documentation/virtual/kvm/vcpu-requests.rst to: Documentation/virt/kvm/vcpu-requests.rst. Update its cross-reference accordingly. Fixes: 2f5947dfcaec ("Documentation: move Documentation/virtual to Documentation/virt") Reviewed-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2021-11-17powerpc/xive: Change IRQ domain to a tree domainCédric Le Goater2-3/+1
Commit 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains exclusive") introduced an IRQ_DOMAIN_FLAG_NO_MAP flag to isolate the 'nomap' domains still in use under the powerpc arch. With this new flag, the revmap_tree of the IRQ domain is not used anymore. This change broke the support of shared LSIs [1] in the XIVE driver because it was relying on a lookup in the revmap_tree to query previously mapped interrupts. Linux now creates two distinct IRQ mappings on the same HW IRQ which can lead to unexpected behavior in the drivers. The XIVE IRQ domain is not a direct mapping domain and its HW IRQ interrupt number space is rather large : 1M/socket on POWER9 and POWER10, change the XIVE driver to use a 'tree' domain type instead. [1] For instance, a linux KVM guest with virtio-rng and virtio-balloon devices. Fixes: 4f86a06e2d6e ("irqdomain: Make normal and nomap irqdomains exclusive") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211116134022.420412-1-clg@kaod.org
2021-11-16x86/sgx: Fix free page accountingReinette Chatre1-6/+6
The SGX driver maintains a single global free page counter, sgx_nr_free_pages, that reflects the number of free pages available across all NUMA nodes. Correspondingly, a list of free pages is associated with each NUMA node and sgx_nr_free_pages is updated every time a page is added or removed from any of the free page lists. The main usage of sgx_nr_free_pages is by the reclaimer that runs when it (sgx_nr_free_pages) goes below a watermark to ensure that there are always some free pages available to, for example, support efficient page faults. With sgx_nr_free_pages accessed and modified from a few places it is essential to ensure that these accesses are done safely but this is not the case. sgx_nr_free_pages is read without any protection and updated with inconsistent protection by any one of the spin locks associated with the individual NUMA nodes. For example: CPU_A CPU_B ----- ----- spin_lock(&nodeA->lock); spin_lock(&nodeB->lock); ... ... sgx_nr_free_pages--; /* NOT SAFE */ sgx_nr_free_pages--; spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock); Since sgx_nr_free_pages may be protected by different spin locks while being modified from different CPUs, the following scenario is possible: CPU_A CPU_B ----- ----- {sgx_nr_free_pages = 100} spin_lock(&nodeA->lock); spin_lock(&nodeB->lock); sgx_nr_free_pages--; sgx_nr_free_pages--; /* LOAD sgx_nr_free_pages = 100 */ /* LOAD sgx_nr_free_pages = 100 */ /* sgx_nr_free_pages-- */ /* sgx_nr_free_pages-- */ /* STORE sgx_nr_free_pages = 99 */ /* STORE sgx_nr_free_pages = 99 */ spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock); In the above scenario, sgx_nr_free_pages is decremented from two CPUs but instead of sgx_nr_free_pages ending with a value that is two less than it started with, it was only decremented by one while the number of free pages were actually reduced by two. The consequence of sgx_nr_free_pages not being protected is that its value may not accurately reflect the actual number of free pages on the system, impacting the availability of free pages in support of many flows. The problematic scenario is when the reclaimer does not run because it believes there to be sufficient free pages while any attempt to allocate a page fails because there are no free pages available. In the SGX driver the reclaimer's watermark is only 32 pages so after encountering the above example scenario 32 times a user space hang is possible when there are no more free pages because of repeated page faults caused by no free pages made available. The following flow was encountered: asm_exc_page_fault ... sgx_vma_fault() sgx_encl_load_page() sgx_encl_eldu() // Encrypted page needs to be loaded from backing // storage into newly allocated SGX memory page sgx_alloc_epc_page() // Allocate a page of SGX memory __sgx_alloc_epc_page() // Fails, no free SGX memory ... if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) // Wake reclaimer wake_up(&ksgxd_waitq); return -EBUSY; // Return -EBUSY giving reclaimer time to run return -EBUSY; return -EBUSY; return VM_FAULT_NOPAGE; The reclaimer is triggered in above flow with the following code: static bool sgx_should_reclaim(unsigned long watermark) { return sgx_nr_free_pages < watermark && !list_empty(&sgx_active_page_list); } In the problematic scenario there were no free pages available yet the value of sgx_nr_free_pages was above the watermark. The allocation of SGX memory thus always failed because of a lack of free pages while no free pages were made available because the reclaimer is never started because of sgx_nr_free_pages' incorrect value. The consequence was that user space kept encountering VM_FAULT_NOPAGE that caused the same address to be accessed repeatedly with the same result. Change the global free page counter to an atomic type that ensures simultaneous updates are done safely. While doing so, move the updating of the variable outside of the spin lock critical section to which it does not belong. Cc: stable@vger.kernel.org Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()") Suggested-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/a95a40743bbd3f795b465f30922dde7f1ea9e0eb.1637004094.git.reinette.chatre@intel.com
2021-11-16KVM: x86: Fix uninitialized eoi_exit_bitmap usage in vcpu_load_eoi_exitmap()黄乐1-2/+6
In vcpu_load_eoi_exitmap(), currently the eoi_exit_bitmap[4] array is initialized only when Hyper-V context is available, in other path it is just passed to kvm_x86_ops.load_eoi_exitmap() directly from on the stack, which would cause unexpected interrupt delivery/handling issues, e.g. an *old* linux kernel that relies on PIT to do clock calibration on KVM might randomly fail to boot. Fix it by passing ioapic_handled_vectors to load_eoi_exitmap() when Hyper-V context is not available. Fixes: f2bc14b69c38 ("KVM: x86: hyper-v: Prepare to meet unallocated Hyper-V context") Cc: stable@vger.kernel.org Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Huang Le <huangle1@jd.com> Message-Id: <62115b277dab49ea97da5633f8522daf@jd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-16s390: wire up sys_futex_waitv system callVasily Gorbik1-0/+1
Tested with futex kselftests. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16s390/vdso: filter out -mstack-guard and -mstack-sizeSven Schnelle2-6/+9
When CONFIG_VMAP_STACK is disabled, the user can enable CONFIG_STACK_CHECK, which adds a stack overflow check to each C function in the kernel. This is also done for functions in the vdso page. These functions are run in user context and user stack sizes are usually different to what the kernel uses. This might trigger the stack check although the stack size is valid. Therefore filter the -mstack-guard and -mstack-size flags when compiling vdso C files. Cc: stable@kernel.org # 5.10+ Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO") Reported-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16s390/vdso: remove -nostdlib compiler flagMasahiro Yamada2-2/+2
The -nostdlib option requests the compiler to not use the standard system startup files or libraries when linking. It is effective only when $(CC) is used as a linker driver. Since commit 2b2a25845d53 ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO"), $(LD) is directly used, hence -nostdlib is unneeded. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20211107162111.323701-1-masahiroy@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16s390/boot: simplify and fix kernel memory layout setupVasily Gorbik2-58/+32
Initial KASAN shadow memory range was picked to preserve original kernel modules area position. With protected execution support, which might impose addressing limitation on vmalloc area and hence affect modules area position, current fixed KASAN shadow memory range is only making kernel memory layout setup more complex. So move it to the very end of available virtual space and simplify calculations. At the same time return to previous kernel address space split. In particular commit 0c4f2623b957 ("s390: setup kernel memory layout early") introduced precise identity map size calculation and keeping vmemmap left most starting from a fresh region table entry. This didn't take into account additional mapping region requirement for potential DCSS mapping above available physical memory. So go back to virtual space split between 1:1 mapping & vmemmap array once vmalloc area size is subtracted. Cc: stable@vger.kernel.org Fixes: 0c4f2623b957 ("s390: setup kernel memory layout early") Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16s390/setup: re-arrange memblock setupVasily Gorbik1-5/+4
- Avoid using ULONG_MAX in memblock_remove, it has no functional change but makes memblock_dbg output a range which makes sense. - Actually finish memblock memory setup before doing amode31/cr/uv setup. - Move memblock_dump_all() debug output after memblock memory setup is complete. This gives us final "memory" regions if they were trimmed due to addressing limits and still "physmem" regions as original info which came from mem_detect. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16s390/setup: avoid using memblock_enforce_memory_limitVasily Gorbik1-3/+0
There is a difference in how architectures treat "mem=" option. For some that is an amount of online memory, for s390 and x86 this is the limiting max address. Some memblock api like memblock_enforce_memory_limit() take limit argument and explicitly treat it as the size of online memory, and use __find_max_addr to convert it to an actual max address. Current s390 usage: memblock_enforce_memory_limit(memblock_end_of_DRAM()); yields different results depending on presence of memory holes (offline memory blocks in between online memory). If there are no memory holes limit == max_addr in memblock_enforce_memory_limit() and it does trim online memory and reserved memory regions. With memory holes present it actually does nothing. Since we already use memblock_remove() explicitly to trim online memory regions to potential limit (think mem=, kdump, addressing limits, etc.) drop the usage of memblock_enforce_memory_limit() altogether. Trimming reserved regions should not be required, since we now use memblock_set_current_limit() to limit allocations and any explicit memory reservations above the limit is an actual problem we should not hide. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16s390/setup: avoid reserving memory above identity mappingVasily Gorbik1-9/+1
Such reserved memory region, if not cleaned up later causes problems when memblock_free_all() is called to release free pages to the buddy allocator and those reserved regions are carried over to reserve_bootmem_region() which marks the pages as PageReserved. Instead use memblock_set_current_limit() to make sure memblock allocations do not go over identity mapping (which could happen when "mem=" option is used or during kdump). Cc: stable@vger.kernel.org Fixes: 73045a08cf55 ("s390: unify identity mapping limits handling") Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2021-11-16powerpc/8xx: Fix pinned TLBs with CONFIG_STRICT_KERNEL_RWXChristophe Leroy1-6/+7
As spotted and explained in commit c12ab8dbc492 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST"), the selection of STRICT_KERNEL_RWX without selecting DEBUG_RODATA_TEST has spotted the lack of the DIRTY bit in the pinned kernel data TLBs. This problem should have been detected a lot earlier if things had been working as expected. But due to an incredible level of chance or mishap, this went undetected because of a set of bugs: In fact the DTLBs were not pinned, because instead of setting the reserve bit in MD_CTR, it was set in MI_CTR that is the register for ITLBs. But then, another huge bug was there: the physical address was reset to 0 at the boundary between RO and RW areas, leading to the same physical space being mapped at both 0xc0000000 and 0xc8000000. This had by miracle no consequence until now because the entry was not really pinned so it was overwritten soon enough to go undetected. Of course, now that we really pin the DTLBs, it must be fixed as well. Fixes: f76c8f6d257c ("powerpc/8xx: Add function to set pinned TLBs") Cc: stable@vger.kernel.org # v5.8+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Depends-on: c12ab8dbc492 ("powerpc/8xx: Fix Oops with STRICT_KERNEL_RWX without DEBUG_RODATA_TEST") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a21e9a057fe2d247a535aff0d157a54eefee017a.1636963688.git.christophe.leroy@csgroup.eu
2021-11-16powerpc/signal32: Fix sigset_t copyChristophe Leroy1-2/+8
The conversion from __copy_from_user() to __get_user() by commit d3ccc9781560 ("powerpc/signal: Use __get_user() to copy sigset_t") introduced a regression in __get_user_sigset() for powerpc/32. The bug was subsequently moved into unsafe_get_user_sigset(). The bug is due to the copied 64 bit value being truncated to 32 bits while being assigned to dst->sig[0] The regression was reported by users of the Xorg packages distributed in Debian/powerpc -- "The symptoms are that the fb screen goes blank, with the backlight remaining on and no errors logged in /var/log; wdm (or startx) run with no effect (I tried logging in in the blind, with no effect). And they are hard to kill, requiring 'kill -KILL ...'" Fix the regression by copying each word of the sigset, not only the first one. __get_user_sigset() was tentatively optimised to copy 64 bits at once in order to minimise KUAP unlock/lock impact, but the unsafe variant doesn't suffer that, so it can just copy words. Fixes: 887f3ceb51cd ("powerpc/signal32: Convert do_setcontext[_tm]() to user access block") Cc: stable@vger.kernel.org # v5.13+ Reported-by: Finn Thain <fthain@linux-m68k.org> Reported-and-tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/99ef38d61c0eb3f79c68942deb0c35995a93a777.1636966353.git.christophe.leroy@csgroup.eu
2021-11-16powerpc/book3e: Fix TLBCAM preset at bootChristophe Leroy2-3/+3
Commit 52bda69ae8b5 ("powerpc/fsl_booke: Tell map_mem_in_cams() if init is done") was supposed to just add an additional parameter to map_mem_in_cams() and always set it to 'true' at that time. But a few call sites were messed up. Fix them. Fixes: 52bda69ae8b5 ("powerpc/fsl_booke: Tell map_mem_in_cams() if init is done") Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d319f2a9367d4d08fd2154e506101bd5f100feeb.1636967119.git.christophe.leroy@csgroup.eu
2021-11-16mips: lantiq: add support for clk_get_parent()Randy Dunlap1-0/+6
Provide a simple implementation of clk_get_parent() in the lantiq subarch so that callers of it will build without errors. Fixes this build error: ERROR: modpost: "clk_get_parent" [drivers/iio/adc/ingenic-adc.ko] undefined! Fixes: 171bb2f19ed6 ("MIPS: Lantiq: Add initial support for Lantiq SoCs") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: linux-mips@vger.kernel.org Cc: John Crispin <john@phrozen.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Jonathan Cameron <jic23@kernel.org> Cc: linux-iio@vger.kernel.org Cc: Russell King <linux@armlinux.org.uk> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: John Crispin <john@phrozen.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-11-16mips: bcm63xx: add support for clk_get_parent()Randy Dunlap1-0/+6
BCM63XX selects HAVE_LEGACY_CLK but does not provide/support clk_get_parent(), so add a simple implementation of that function so that callers of it will build without errors. Fixes these build errors: mips-linux-ld: drivers/iio/adc/ingenic-adc.o: in function `jz4770_adc_init_clk_div': ingenic-adc.c:(.text+0xe4): undefined reference to `clk_get_parent' mips-linux-ld: drivers/iio/adc/ingenic-adc.o: in function `jz4725b_adc_init_clk_div': ingenic-adc.c:(.text+0x1b8): undefined reference to `clk_get_parent' Fixes: e7300d04bd08 ("MIPS: BCM63xx: Add support for the Broadcom BCM63xx family of SOCs." ) Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: kernel test robot <lkp@intel.com> Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Cc: Artur Rojek <contact@artur-rojek.eu> Cc: Paul Cercueil <paul@crapouillou.net> Cc: linux-mips@vger.kernel.org Cc: Jonathan Cameron <jic23@kernel.org> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: linux-iio@vger.kernel.org Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: bcm-kernel-feedback-list@broadcom.com Cc: Jonas Gorski <jonas.gorski@gmail.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-11-16MIPS: generic/yamon-dt: fix uninitialized variable errorColin Ian King1-1/+1
In the case where fw_getenv returns an error when fetching values for ememsizea and memsize then variable phys_memsize is not assigned a variable and will be uninitialized on a zero check of phys_memsize. Fix this by initializing phys_memsize to zero. Cleans up cppcheck error: arch/mips/generic/yamon-dt.c:100:7: error: Uninitialized variable: phys_memsize [uninitvar] Fixes: f41d2430bbd6 ("MIPS: generic/yamon-dt: Support > 256MB of RAM") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-11-16MIPS: syscalls: Wire up futex_waitv syscallWang Haojun3-0/+3
Wire up the futex_waitv syscall. Fix Build warning: #warning syscall futex_waitv not implemented [-Wcpp] Signed-off-by: Wang Haojun <wanghaojun@loongson.cn> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2021-11-15x86/hyperv: Move required MSRs check to initial platform probingSean Christopherson2-13/+16
Explicitly check for MSR_HYPERCALL and MSR_VP_INDEX support when probing for running as a Hyper-V guest instead of waiting until hyperv_init() to detect the bogus configuration. Add messages to give the admin a heads up that they are likely running on a broken virtual machine setup. At best, silently disabling Hyper-V is confusing and difficult to debug, e.g. the kernel _says_ it's using all these fancy Hyper-V features, but always falls back to the native versions. At worst, the half baked setup will crash/hang the kernel. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20211104182239.1302956-3-seanjc@google.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-11-15x86/hyperv: Fix NULL deref in set_hv_tscchange_cb() if Hyper-V setup failsSean Christopherson1-0/+3
Check for a valid hv_vp_index array prior to derefencing hv_vp_index when setting Hyper-V's TSC change callback. If Hyper-V setup failed in hyperv_init(), the kernel will still report that it's running under Hyper-V, but will have silently disabled nearly all functionality. BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc2+ #75 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:set_hv_tscchange_cb+0x15/0xa0 Code: <8b> 04 82 8b 15 12 17 85 01 48 c1 e0 20 48 0d ee 00 01 00 f6 c6 08 ... Call Trace: kvm_arch_init+0x17c/0x280 kvm_init+0x31/0x330 vmx_init+0xba/0x13a do_one_initcall+0x41/0x1c0 kernel_init_freeable+0x1f2/0x23b kernel_init+0x16/0x120 ret_from_fork+0x22/0x30 Fixes: 93286261de1b ("x86/hyperv: Reenlightenment notifications support") Cc: stable@vger.kernel.org Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20211104182239.1302956-2-seanjc@google.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
2021-11-15x86/boot: Pull up cmdline preparation and early param parsingBorislav Petkov1-27/+39
Dan reports that Anjaneya Chagam can no longer use the efi=nosoftreserve kernel command line parameter to suppress "soft reservation" behavior. This is due to the fact that the following call-chain happens at boot: early_reserve_memory |-> efi_memblock_x86_reserve_range |-> efi_fake_memmap_early which does if (!efi_soft_reserve_enabled()) return; and that would have set EFI_MEM_NO_SOFT_RESERVE after having parsed "nosoftreserve". However, parse_early_param() gets called *after* it, leading to the boot cmdline not being taken into account. Therefore, carve out the command line preparation into a separate function which does the early param parsing too. So that it all goes together. And then call that function before early_reserve_memory() so that the params would have been parsed by then. Fixes: 8aa83e6395ce ("x86/setup: Call early_reserve_memory() earlier") Reported-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Anjaneya Chagam <anjaneya.chagam@intel.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/e8dd8993c38702ee6dd73b3c11f158617e665607.camel@intel.com
2021-11-15powerpc/pseries/ddw: Do not try direct mapping with persistent memory and one windowAlexey Kardashevskiy1-2/+4
There is a possibility of having just one DMA window available with a limited capacity which the existing code does not handle that well. If the window is big enough for the system RAM but less than MAX_PHYSMEM_BITS (which we want when persistent memory is present), we create 1:1 window and leave persistent memory without DMA. This disables 1:1 mapping entirely if there is persistent memory and either: - the huge DMA window does not cover the entire address space; - the default DMA window is removed. This relies on reverted 54fc3c681ded ("powerpc/pseries/ddw: Extend upper limit for huge DMA window for persistent memory") to return the actual amount RAM in ddw_memory_hotplug_max() (posted separately). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211108040320.3857636-4-aik@ozlabs.ru
2021-11-15powerpc/pseries/ddw: simplify enable_ddw()Alexey Kardashevskiy1-7/+4
This drops rather useless ddw_enabled flag as direct_mapping implies it anyway. While at this, fix indents in enable_ddw(). This should not cause any behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211108040320.3857636-3-aik@ozlabs.ru
2021-11-15powerpc/pseries/ddw: Revert "Extend upper limit for huge DMA window for persistent memory"Alexey Kardashevskiy1-9/+0
This reverts commit 54fc3c681ded9437e4548e2501dc1136b23cfa9a which does not allow 1:1 mapping even for the system RAM which is usually possible. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211108040320.3857636-2-aik@ozlabs.ru
2021-11-15powerpc/pseries: Fix numa FORM2 parsing fallback codeNicholas Piggin1-16/+12
In case the FORM2 distance table from firmware is not the expected size, there is fallback code that just populates the lookup table as local vs remote. However it then continues on to use the distance table. Fix. Fixes: 1c6b5a7e7405 ("powerpc/pseries: Add support for FORM2 associativity") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109064900.2041386-2-npiggin@gmail.com
2021-11-15powerpc/pseries: rename numa_dist_table to form2_distancesNicholas Piggin1-9/+9
The name of the local variable holding the "form2" property address conflicts with the numa_distance_table global. This patch does 's/numa_dist_table/form2_distances/g' over the function, which also renames numa_dist_table_length to form2_distances_length. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109064900.2041386-1-npiggin@gmail.com
2021-11-15powerpc: clean vdso32 and vdso64 directoriesMasahiro Yamada1-0/+3
Since commit bce74491c300 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o"), "make ARCH=powerpc clean" does not clean up the arch/powerpc/kernel/{vdso32,vdso64} directories. Use the subdir- trick to let "make clean" descend into them. Fixes: bce74491c300 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211109185015.615517-1-masahiroy@kernel.org
2021-11-15powerpc/83xx/mpc8349emitx: Drop unused variableUwe Kleine-König1-1/+0
Commit 5d354dc35ebb ("powerpc/83xx/mpc8349emitx: Make mcu_gpiochip_remove() return void") removed the usage of the variable ret, but failed to remove the variable itself, resulting in: arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c: In function ‘mcu_remove’: arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c:189:6: error: unused variable ‘ret’ [-Werror=unused-variable] 189 | int ret; | ^~~ So remove the variable now. Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211110110739.1072634-1-u.kleine-koenig@pengutronix.de
2021-11-15KVM: PPC: Book3S HV: Use GLOBAL_TOC for kvmppc_h_set_dabr/xdabr()Michael Ellerman1-2/+2
kvmppc_h_set_dabr(), and kvmppc_h_set_xdabr() which jumps into it, need to use _GLOBAL_TOC to setup the kernel TOC pointer, because kvmppc_h_set_dabr() uses LOAD_REG_ADDR() to load dawr_force_enable. When called from hcall_try_real_mode() we have the kernel TOC in r2, established near the start of kvmppc_interrupt_hv(), so there is no issue. But they can also be called from kvmppc_pseries_do_hcall() which is module code, so the access ends up happening with the kvm-hv module's r2, which will not point at dawr_force_enable and could even cause a fault. With the current code layout and compilers we haven't observed a fault in practice, the load hits somewhere in kvm-hv.ko and silently returns some bogus value. Note that we we expect p8/p9 guests to use the DAWR, but SLOF uses h_set_dabr() to test if sc1 works correctly, see SLOF's lib/libhvcall/brokensc1.c. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Daniel Axtens <dja@axtens.net> Link: https://lore.kernel.org/r/20210923151031.72408-1-mpe@ellerman.id.au
2021-11-14Merge tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linuxLinus Torvalds5-6/+14
Pull more parisc fixes from Helge Deller: "Fix a build error in stracktrace.c, fix resolving of addresses to function names in backtraces, fix single-stepping in assembly code and flush userspace pte's when using set_pte_at()" * tag 'for-5.16/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc/entry: fix trace test in syscall exit path parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page parisc: Fix implicit declaration of function '__kernel_text_address' parisc: Fix backtrace to always include init funtion names
2021-11-14Merge tag 'sh-for-5.16' of git://git.libc.org/linux-shLinus Torvalds20-179/+74
Pull arch/sh updates from Rich Felker. * tag 'sh-for-5.16' of git://git.libc.org/linux-sh: sh: pgtable-3level: Fix cast to pointer from integer of different size sh: fix READ/WRITE redefinition warnings sh: define __BIG_ENDIAN for math-emu sh: math-emu: drop unused functions sh: fix kconfig unmet dependency warning for FRAME_POINTER sh: Cleanup about SPARSE_IRQ sh: kdump: add some attribute to function maple: fix wrong return value of maple_bus_init(). sh: boot: avoid unneeded rebuilds under arch/sh/boot/compressed/ sh: boot: add intermediate vmlinux.bin* to targets instead of extra-y sh: boards: Fix the cacography in irq.c sh: check return code of request_irq sh: fix trivial misannotations
2021-11-14Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds2-13/+13
Pull ARM fixes from Russell King: - Fix early_iounmap - Drop cc-option fallbacks for architecture selection * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9156/1: drop cc-option fallbacks for architecture selection ARM: 9155/1: fix early early_iounmap()
2021-11-14Merge tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linuxLinus Torvalds6-6/+6
Pull devicetree fixes from Rob Herring: - Two fixes due to DT node name changes on Arm, Ltd. boards - Treewide rename of Ingenic CGU headers - Update ST email addresses - Remove Netlogic DT bindings - Dropping few more cases of redundant 'maxItems' in schemas - Convert toshiba,tc358767 bridge binding to schema * tag 'devicetree-fixes-for-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: watchdog: sunxi: fix error in schema bindings: media: venus: Drop redundant maxItems for power-domain-names dt-bindings: Remove Netlogic bindings clk: versatile: clk-icst: Ensure clock names are unique of: Support using 'mask' in making device bus id dt-bindings: treewide: Update @st.com email address to @foss.st.com dt-bindings: media: Update maintainers for st,stm32-hwspinlock.yaml dt-bindings: media: Update maintainers for st,stm32-cec.yaml dt-bindings: mfd: timers: Update maintainers for st,stm32-timers dt-bindings: timer: Update maintainers for st,stm32-timer dt-bindings: i2c: imx: hardware do not restrict clock-frequency to only 100 and 400 kHz dt-bindings: display: bridge: Convert toshiba,tc358767.txt to yaml dt-bindings: Rename Ingenic CGU headers to ingenic,*.h
2021-11-14Merge tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-4/+11
Pull x86 static call update from Thomas Gleixner: "A single fix for static calls to make the trampoline patching more robust by placing explicit signature bytes after the call trampoline to prevent patching random other jumps like the CFI jump table entries" * tag 'locking-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: static_call,x86: Robustify trampoline patching
2021-11-14Merge tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-16/+15
Pull scheduler fixes from Borislav Petkov: - Avoid touching ~100 config files in order to be able to select the preemption model - clear cluster CPU masks too, on the CPU unplug path - prevent use-after-free in cfs - Prevent a race condition when updating CPU cache domains - Factor out common shared part of smp_prepare_cpus() into a common helper which can be called by both baremetal and Xen, in order to fix a booting of Xen PV guests * tag 'sched_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: preempt: Restore preemption model selection configs arch_topology: Fix missing clear cluster_cpumask in remove_cpu_topology() sched/fair: Prevent dead task groups from regaining cfs_rq's sched/core: Mitigate race cpus_share_cache()/update_top_cache_domain() x86/smp: Factor out parts of native_smp_prepare_cpus()
2021-11-14Merge tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-1/+5
Pull perf fixes from Borislav Petkov: - Prevent unintentional page sharing by checking whether a page reference to a PMU samples page has been acquired properly before that - Make sure the LBR_SELECT MSR is saved/restored too - Reset the LBR_SELECT MSR when resetting the LBR PMU to clear any residual data left * tag 'perf_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Avoid put_page() when GUP fails perf/x86/vlbr: Add c->flags to vlbr event constraints perf/x86/lbr: Reset LBR_SELECT during vlbr reset
2021-11-14Merge tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-4/+60
Pull x86 fixes from Borislav Petkov: - Add the model number of a new, Raptor Lake CPU, to intel-family.h - Do not log spurious corrected MCEs on SKL too, due to an erratum - Clarify the path of paravirt ops patches upstream - Add an optimization to avoid writing out AMX components to sigframes when former are in init state * tag 'x86_urgent_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Raptor Lake to Intel family x86/mce: Add errata workaround for Skylake SKX37 MAINTAINERS: Add some information to PARAVIRT_OPS entry x86/fpu: Optimize out sigframe xfeatures when in init state
2021-11-13parisc/entry: fix trace test in syscall exit pathSven Schnelle1-1/+1
commit 8779e05ba8aa ("parisc: Fix ptrace check on syscall return") fixed testing of TI_FLAGS. This uncovered a bug in the test mask. syscall_restore_rfi is only used when the kernel needs to exit to usespace with single or block stepping and the recovery counter enabled. The test however used _TIF_SYSCALL_TRACE_MASK, which includes a lot of bits that shouldn't be tested here. Fix this by using TIF_SINGLESTEP and TIF_BLOCKSTEP directly. I encountered this bug by enabling syscall tracepoints. Both in qemu and on real hardware. As soon as i enabled the tracepoint (sys_exit_read, but i guess it doesn't really matter which one), i got random page faults in userspace almost immediately. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-13parisc: Flush kernel data mapping in set_pte_at() when installing pte for user pageJohn David Anglin2-4/+10
For years, there have been random segmentation faults in userspace on SMP PA-RISC machines. It occurred to me that this might be a problem in set_pte_at(). MIPS and some other architectures do cache flushes when installing PTEs with the present bit set. Here I have adapted the code in update_mmu_cache() to flush the kernel mapping when the kernel flush is deferred, or when the kernel mapping may alias with the user mapping. This simplifies calls to update_mmu_cache(). I also changed the barrier in set_pte() from a compiler barrier to a full memory barrier. I know this change is not sufficient to fix the problem. It might not be needed. I have had a few days of operation with 5.14.16 to 5.15.1 and haven't seen any random segmentation faults on rp3440 or c8000 so far. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.12+
2021-11-13parisc: Fix implicit declaration of function '__kernel_text_address'Helge Deller1-0/+1
Signed-off-by: Helge Deller <deller@gmx.de>
2021-11-13parisc: Fix backtrace to always include init funtion namesHelge Deller1-1/+2
I noticed that sometimes at kernel startup the backtraces did not included the function names of init functions. Their address were not resolved to function names and instead only the address was printed. Debugging shows that the culprit is is_ksym_addr() which is called by the backtrace functions to check if an address belongs to a function in the kernel. The problem occurs only for CONFIG_KALLSYMS_ALL=y. When looking at is_ksym_addr() one can see that for CONFIG_KALLSYMS_ALL=y the function only tries to resolve the address via is_kernel() function, which checks like this: if (addr >= _stext && addr <= _end) return 1; On parisc the init functions are located before _stext, so this check fails. Other platforms seem to have all functions (including init functions) behind _stext. The following patch moves the _stext symbol at the beginning of the kernel and thus includes the init section. This fixes the check and does not seem to have any negative side effects on where the kernel mapping happens in the map_pages() function in arch/parisc/mm/init.c. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.4+
2021-11-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds38-352/+822
Pull more kvm updates from Paolo Bonzini: "New x86 features: - Guest API and guest kernel support for SEV live migration - SEV and SEV-ES intra-host migration Bugfixes and cleanups for x86: - Fix misuse of gfn-to-pfn cache when recording guest steal time / preempted status - Fix selftests on APICv machines - Fix sparse warnings - Fix detection of KVM features in CPUID - Cleanups for bogus writes to MSR_KVM_PV_EOI_EN - Fixes and cleanups for MSR bitmap handling - Cleanups for INVPCID - Make x86 KVM_SOFT_MAX_VCPUS consistent with other architectures Bugfixes for ARM: - Fix finalization of host stage2 mappings - Tighten the return value of kvm_vcpu_preferred_target() - Make sure the extraction of ESR_ELx.EC is limited to architected bits" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (34 commits) KVM: SEV: unify cgroup cleanup code for svm_vm_migrate_from KVM: x86: move guest_pv_has out of user_access section KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS KVM: Move INVPCID type check from vmx and svm to the common kvm_handle_invpcid() KVM: VMX: Add a helper function to retrieve the GPR index for INVPCID, INVVPID, and INVEPT KVM: nVMX: Clean up x2APIC MSR handling for L2 KVM: VMX: Macrofy the MSR bitmap getters and setters KVM: nVMX: Handle dynamic MSR intercept toggling KVM: nVMX: Query current VMCS when determining if MSR bitmaps are in use KVM: x86: Don't update vcpu->arch.pv_eoi.msr_val when a bogus value was written to MSR_KVM_PV_EOI_EN KVM: x86: Rename kvm_lapic_enable_pv_eoi() KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES KVM: x86: Add helper to consolidate core logic of SET_CPUID{2} flows kvm: mmu: Use fast PF path for access tracking of huge pages when possible KVM: x86/mmu: Properly dereference rcu-protected TDP MMU sptep iterator KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active kvm: x86: Convert return type of *is_valid_rdpmc_ecx() to bool KVM: x86: Fix recording of guest steal time / preempted status selftest: KVM: Add intra host migration tests selftest: KVM: Add open sev dev helper ...