aboutsummaryrefslogtreecommitdiffstats
path: root/arch (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-21KVM: Fix some wrong function names in commentMiaohe Lin1-1/+1
Fix some wrong function names in comment. mmu_check_roots is a typo for mmu_check_root, vmcs_read_any should be vmcs12_read_any and so on. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21KVM: x86: check kvm_pit outside kvm_vm_ioctl_reinject()Miaohe Lin1-3/+3
check kvm_pit outside kvm_vm_ioctl_reinject() to keep codestyle consistent with other kvm_pit func and prepare for futher cleanups. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21KVM: LAPIC: micro-optimize fixed mode ipi deliveryWanpeng Li1-3/+3
This patch optimizes redundancy logic before fixed mode ipi is delivered in the fast path, broadcast handling needs to go slow path, so the delivery mode repair can be delayed to before slow path. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpathWanpeng Li5-11/+78
ICR and TSCDEADLINE MSRs write cause the main MSRs write vmexits in our product observation, multicast IPIs are not as common as unicast IPI like RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR etc. This patch introduce a mechanism to handle certain performance-critical WRMSRs in a very early stage of KVM VMExit handler. This mechanism is specifically used for accelerating writes to x2APIC ICR that attempt to send a virtual IPI with physical destination-mode, fixed delivery-mode and single target. Which was found as one of the main causes of VMExits for Linux workloads. The reason this mechanism significantly reduce the latency of such virtual IPIs is by sending the physical IPI to the target vCPU in a very early stage of KVM VMExit handler, before host interrupts are enabled and before expensive operations such as reacquiring KVM’s SRCU lock. Latency is reduced even more when KVM is able to use APICv posted-interrupt mechanism (which allows to deliver the virtual IPI directly to target vCPU without the need to kick it to host). Testing on Xeon Skylake server: The virtual IPI latency from sender send to receiver receive reduces more than 200+ cpu cycles. Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: WARN if root_hpa is invalid when handling a page faultSean Christopherson1-3/+1
WARN if root_hpa is invalid when handling a page fault. The check on root_hpa exists for historical reasons that no longer apply to the current KVM code base. Remove an equivalent debug-only warning in direct_page_fault(), whose existence more or less confirms that root_hpa should always be valid when handling a page fault. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: WARN on an invalid root_hpaSean Christopherson2-2/+2
WARN on the existing invalid root_hpa checks in __direct_map() and FNAME(fetch). The "legitimate" path that invalidated root_hpa in the middle of a page fault is long since gone, i.e. it should no longer be impossible to invalidate in the middle of a page fault[*]. The root_hpa checks were added by two related commits 989c6b34f6a94 ("KVM: MMU: handle invalid root_hpa at __direct_map") 37f6a4e237303 ("KVM: x86: handle invalid root_hpa everywhere") to fix a bug where nested_vmx_vmexit() could be called *in the middle* of a page fault. At the time, vmx_interrupt_allowed(), which was and still is used by kvm_can_do_async_pf() via ->interrupt_allowed(), directly invoked nested_vmx_vmexit() to switch from L2 to L1 to emulate a VM-Exit on a pending interrupt. Emulating the nested VM-Exit resulted in root_hpa being invalidated by kvm_mmu_reset_context() without explicitly terminating the page fault. Now that root_hpa is checked for validity by kvm_mmu_page_fault(), WARN on an invalid root_hpa to detect any flows that reset the MMU while handling a page fault. The broken vmx_interrupt_allowed() behavior has long since been fixed and resetting the MMU during a page fault should not be considered legal behavior. [*] It's actually technically possible in FNAME(page_fault)() because it calls inject_page_fault() when the guest translation is invalid, but in that case the page fault handling is immediately terminated. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Move root_hpa validity checks to top of page fault handlerSean Christopherson1-10/+4
Add a check on root_hpa at the beginning of the page fault handler to consolidate several checks on root_hpa that are scattered throughout the page fault code. This is a preparatory step towards eventually removing such checks altogether, or at the very least WARNing if an invalid root is encountered. Remove only the checks that can be easily audited to confirm that root_hpa cannot be invalidated between their current location and the new check in kvm_mmu_page_fault(), and aren't currently protected by mmu_lock, i.e. keep the checks in __direct_map() and FNAME(fetch) for the time being. The root_hpa checks that are consolidate were all added by commit 37f6a4e237303 ("KVM: x86: handle invalid root_hpa everywhere") which was a follow up to a bug fix for __direct_map(), commit 989c6b34f6a94 ("KVM: MMU: handle invalid root_hpa at __direct_map") At the time, nested VMX had, in hindsight, crazy handling of nested interrupts and would trigger a nested VM-Exit in ->interrupt_allowed(), and thus unexpectedly reset the MMU in flows such as can_do_async_pf(). Now that the wonky nested VM-Exit behavior is gone, the root_hpa checks are bogus and confusing, e.g. it's not at all obvious what they actually protect against, and at first glance they appear to be broken since many of them run without holding mmu_lock. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Move calls to thp_adjust() down a levelSean Christopherson2-24/+18
Move the calls to thp_adjust() down a level from the page fault handlers to the map/fetch helpers and remove the page count shuffling done in thp_adjust(). Despite holding a reference to the underlying page while processing a page fault, the page fault flows don't actually rely on holding a reference to the page when thp_adjust() is called. At that point, the fault handlers hold mmu_lock, which prevents mmu_notifier from completing any invalidations, and have verified no invalidations from mmu_notifier have occurred since the page reference was acquired (which is done prior to taking mmu_lock). The kvm_release_pfn_clean()/kvm_get_pfn() dance in thp_adjust() is a quirk that is necessitated because thp_adjust() modifies the pfn that is consumed by its caller. Because the page fault handlers call kvm_release_pfn_clean() on said pfn, thp_adjust() needs to transfer the reference to the correct pfn purely for correctness when the pfn is released. Calling thp_adjust() from __direct_map() and FNAME(fetch) means the pfn adjustment doesn't change the pfn as seen by the page fault handlers, i.e. the pfn released by the page fault handlers is the same pfn that was returned by gfn_to_pfn(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Move transparent_hugepage_adjust() above __direct_map()Sean Christopherson1-38/+38
Move thp_adjust() above __direct_map() in preparation of calling thp_adjust() from __direct_map() and FNAME(fetch). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Consolidate tdp_page_fault() and nonpaging_page_fault()Sean Christopherson1-71/+27
Consolidate the direct MMU page fault handlers into a common helper, direct_page_fault(). Except for unique max level conditions, the tdp and nonpaging fault handlers are functionally identical. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Rename lpage_disallowed to account_disallowed_nx_lpageSean Christopherson1-2/+2
Rename __direct_map()'s param that controls whether or not a disallowed NX large page should be accounted to match what it actually does. The nonpaging_page_fault() case unconditionally passes %false for the param even though it locally sets lpage_disallowed. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_levelSean Christopherson1-10/+18
Persist the max page level calculated via gfn_lpage_is_disallowed() to the max level "returned" by mapping_level() so that its naturally taken into account by the max level check that conditions calling transparent_hugepage_adjust(). Drop the gfn_lpage_is_disallowed() check in thp_adjust() as it's now handled by mapping_level() and its callers. Add a comment to document the behavior of host_mapping_level() and its interaction with max level and transparent huge pages. Note, transferring the gfn_lpage_is_disallowed() from thp_adjust() to mapping_level() superficially affects how changes to a memslot's disallow_lpage count will be handled due to thp_adjust() being run while holding mmu_lock. In the more common case where a different vCPU increments the count via account_shadowed(), gfn_lpage_is_disallowed() is rechecked by set_spte() to ensure a writable large page isn't created. In the less common case where the count is decremented to zero due to all shadow pages in the memslot being zapped, THP behavior now matches hugetlbfs behavior in the sense that a small page will be created when a large page could be used if the count reaches zero in the miniscule window between mapping_level() and acquiring mmu_lock. Lastly, the new THP behavior also follows hugetlbfs behavior in the absurdly unlikely scenario of a memslot being moved such that the memslot's compatibility with respect to large pages changes, but without changing the validity of the gpf->pfn walk. I.e. if a memslot is moved between mapping_level() and snapshotting mmu_seq, it's theoretically possible to consume a stale disallow_lpage count. But, since KVM zaps all shadow pages when moving a memslot and forces all vCPUs to reload a new MMU, the inserted spte will always be thrown away prior to completing the memslot move, i.e. whether or not the spte accurately reflects disallow_lpage is irrelevant. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Incorporate guest's page level into max level for shadow MMUSean Christopherson1-12/+8
Restrict the max level for a shadow page based on the guest's level instead of capping the level after the fact for host-mapped huge pages, e.g. hugetlbfs pages. Explicitly capping the max level using the guest mapping level also eliminates FNAME(page_fault)'s subtle dependency on THP only supporting 2mb pages. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Refactor handling of forced 4k pages in page faultsSean Christopherson2-32/+29
Refactor the page fault handlers and mapping_level() to track the max allowed page level instead of only tracking if a 4k page is mandatory due to one restriction or another. This paves the way for cleanly consolidating tdp_page_fault() and nonpaging_page_fault(), and for eliminating a redundant check on mmu_gfn_lpage_is_disallowed(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Refactor the per-slot level calculation in mapping_level()Sean Christopherson1-5/+5
Invert the loop which adjusts the allowed page level based on what's compatible with the associated memslot to use a largest-to-smallest page size walk. This paves the way for passing around a "max level" variable instead of having redundant checks and/or multiple booleans. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Refactor handling of cache consistency with TDPSean Christopherson1-16/+14
Pre-calculate the max level for a TDP page with respect to MTRR cache consistency in preparation of replacing force_pt_level with max_level, and eventually combining the bulk of nonpaging_page_fault() and tdp_page_fault() into a common helper. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Move nonpaging_page_fault() below try_async_pf()Sean Christopherson1-53/+49
Move nonpaging_page_fault() below try_async_pf() to eliminate the forward declaration of try_async_pf() and to prepare for combining the bulk of nonpaging_page_fault() and tdp_page_fault() into a common helper. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Fold nonpaging_map() into nonpaging_page_fault()Sean Christopherson1-57/+49
Fold nonpaging_map() into its sole caller, nonpaging_page_fault(), in preparation for combining the bulk of nonpaging_page_fault() and tdp_page_fault() into a common helper. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86/mmu: Move definition of make_mmu_pages_available() upSean Christopherson1-21/+20
Move make_mmu_pages_available() above its first user to put it closer to related code and eliminate a forward declaration. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86: Use gpa_t for cr2/gpa to fix TDP support on 32-bit KVMSean Christopherson6-70/+86
Convert a plethora of parameters and variables in the MMU and page fault flows from type gva_t to gpa_t to properly handle TDP on 32-bit KVM. Thanks to PSE and PAE paging, 32-bit kernels can access 64-bit physical addresses. When TDP is enabled, the fault address is a guest physical address and thus can be a 64-bit value, even when both KVM and its guest are using 32-bit virtual addressing, e.g. VMX's VMCS.GUEST_PHYSICAL is a 64-bit field, not a natural width field. Using a gva_t for the fault address means KVM will incorrectly drop the upper 32-bits of the GPA. Ditto for gva_to_gpa() when it is used to translate L2 GPAs to L1 GPAs. Opportunistically rename variables and parameters to better reflect the dual address modes, e.g. use "cr2_or_gpa" for fault addresses and plain "addr" instead of "vaddr" when the address may be either a GVA or an L2 GPA. Similarly, use "gpa" in the nonpaging_page_fault() flows to avoid a confusing "gpa_t gva" declaration; this also sets the stage for a future patch to combing nonpaging_page_fault() and tdp_page_fault() with minimal churn. Sprinkle in a few comments to document flows where an address is known to be a GVA and thus can be safely truncated to a 32-bit value. Add WARNs in kvm_handle_page_fault() and FNAME(gva_to_gpa_nested)() to help document such cases and detect bugs. Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86: Add a WARN on TIF_NEED_FPU_LOAD in kvm_load_guest_fpu()Sean Christopherson1-0/+7
WARN once in kvm_load_guest_fpu() if TIF_NEED_FPU_LOAD is observed, as that would mean that KVM is corrupting userspace's FPU by saving unknown register state into arch.user_fpu. Add a comment to explain why KVM WARNs on TIF_NEED_FPU_LOAD instead of implementing logic similar to fpu__copy(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86: Fix potential put_fpu() w/o load_fpu() on MPX platformSean Christopherson1-0/+4
Unlike most state managed by XSAVE, MPX is initialized to zero on INIT. Because INITs are usually recognized in the context of a VCPU_RUN call, kvm_vcpu_reset() puts the guest's FPU so that the FPU state is resident in memory, zeros the MPX state, and reloads FPU state to hardware. But, in the unlikely event that an INIT is recognized during kvm_arch_vcpu_ioctl_get_mpstate() via kvm_apic_accept_events(), kvm_vcpu_reset() will call kvm_put_guest_fpu() without a preceding kvm_load_guest_fpu() and corrupt the guest's FPU state (and possibly userspace's FPU state as well). Given that MPX is being removed from the kernel[*], fix the bug with the simple-but-ugly approach of loading the guest's FPU during KVM_GET_MP_STATE. [*] See commit f240652b6032b ("x86/mpx: Remove MPX APIs"). Fixes: f775b13eedee2 ("x86,kvm: move qemu/guest FPU switching out to vcpu_run") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08kvm: nVMX: Aesthetic cleanup of handle_vmread and handle_vmwriteJim Mattson1-36/+34
Apply reverse fir tree declaration order, shorten some variable names to avoid line wrap, reformat a block comment, delete an extra blank line, and use BIT(10) instead of (1u << 10). Signed-off-by: Jim Mattson <jmattson@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Jon Cargille <jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08kvm: nVMX: VMWRITE checks unsupported field before read-only fieldJim Mattson1-5/+6
According to the SDM, VMWRITE checks to see if the secondary source operand corresponds to an unsupported VMCS field before it checks to see if the secondary source operand corresponds to a VM-exit information field and the processor does not support writing to VM-exit information fields. Fixes: 49f705c5324aa ("KVM: nVMX: Implement VMREAD and VMWRITE") Signed-off-by: Jim Mattson <jmattson@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Jon Cargille <jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08kvm: nVMX: VMWRITE checks VMCS-link pointer before VMCS fieldJim Mattson1-34/+25
According to the SDM, a VMWRITE in VMX non-root operation with an invalid VMCS-link pointer results in VMfailInvalid before the validity of the VMCS field in the secondary source operand is checked. For consistency, modify both handle_vmwrite and handle_vmread, even though there was no problem with the latter. Fixes: 6d894f498f5d1 ("KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2") Signed-off-by: Jim Mattson <jmattson@google.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Jon Cargille <jcargill@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTINGXiaoyao Li3-8/+8
The mis-spelling is found by checkpatch.pl, so fix them. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: VMX: Rename NMI_PENDING to NMI_WINDOWXiaoyao Li3-9/+9
Rename the NMI-window exiting related definitions to match the latest Intel SDM. No functional changes. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOWXiaoyao Li4-14/+14
Rename interrupt-windown exiting related definitions to match the latest Intel SDM. No functional changes. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86: Fix some comment typosMiaohe Lin2-2/+2
Fix some typos in comment. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: X86: Convert the last users of "shorthand = 0" to use macrosPeter Xu3-4/+4
Change the last users of "shorthand = 0" to use APIC_DEST_NOSHORT. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: X86: Fix callers of kvm_apic_match_dest() to use correct macrosPeter Xu4-8/+12
Callers of kvm_apic_match_dest() should always pass in APIC_DEST_* macros for either dest_mode and short_hand parameters. Fix up all the callers of kvm_apic_match_dest() that are not following the rule. Since at it, rename the parameter from short_hand to shorthand in kvm_apic_match_dest(), as suggested by Vitaly. Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: X86: Drop KVM_APIC_SHORT_MASK and KVM_APIC_DEST_MASKPeter Xu3-7/+5
We have both APIC_SHORT_MASK and KVM_APIC_SHORT_MASK defined for the shorthand mask. Similarly, we have both APIC_DEST_MASK and KVM_APIC_DEST_MASK defined for the destination mode mask. Drop the KVM_APIC_* macros and replace the only user of them to use the APIC_DEST_* macros instead. At the meantime, move APIC_SHORT_MASK and APIC_DEST_MASK from lapic.c to lapic.h. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: X86: Use APIC_DEST_* macros properly in kvm_lapic_irq.dest_modePeter Xu4-7/+16
We were using either APIC_DEST_PHYSICAL|APIC_DEST_LOGICAL or 0|1 to fill in kvm_lapic_irq.dest_mode. It's fine only because in most cases when we check against dest_mode it's against APIC_DEST_PHYSICAL (which equals to 0). However, that's not consistent. We'll have problem when we want to start checking against APIC_DEST_LOGICAL, which does not equals to 1. This patch firstly introduces kvm_lapic_irq_dest_mode() helper to take any boolean of destination mode and return the APIC_DEST_* macro. Then, it replaces the 0|1 settings of irq.dest_mode with the helper. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: X86: Move irrelevant declarations out of ioapic.hPeter Xu4-7/+5
kvm_apic_match_dest() is declared in both ioapic.h and lapic.h. Remove the declaration in ioapic.h. kvm_apic_compare_prio() is declared in ioapic.h but defined in lapic.c. Move the declaration to lapic.h. kvm_irq_delivery_to_apic() is declared in ioapic.h but defined in irq_comm.c. Move the declaration to irq.h. hyperv.c needs to use kvm_irq_delivery_to_apic(). Include irq.h in hyperv.c. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: X86: Fix kvm_bitmap_or_dest_vcpus() to use irq shorthandPeter Xu1-1/+1
The 3rd parameter of kvm_apic_match_dest() is the irq shorthand, rather than the irq delivery mode. Fixes: 7ee30bc132c6 ("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs") Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: explicitly set rmap_head->val to 0 in pte_list_desc_remove_entry()Miaohe Lin1-1/+1
When we reach here, we have desc->sptes[j] = NULL with j = 0. So we can replace desc->sptes[0] with 0 to make it more clear. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: vmx: remove unreachable statement in vmx_get_msr_feature()Miaohe Lin1-2/+0
We have no way to reach the final statement, remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08KVM: x86: use CPUID to locate host page table reserved bitsPaolo Bonzini1-8/+12
The comment in kvm_get_shadow_phys_bits refers to MKTME, but the same is actually true of SME and SEV. Just use CPUID[0x8000_0008].EAX[7:0] unconditionally if available, it is simplest and works even if memory is not encrypted. Cc: stable@vger.kernel.org Reported-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-05Merge tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds6-19/+35
Pull RISC-V fixes from Paul Walmsley: "Several fixes for RISC-V: - Fix function graph trace support - Prefix the CSR IRQ_* macro names with "RV_", to avoid collisions with macros elsewhere in the Linux kernel tree named "IRQ_TIMER" - Use __pa_symbol() when computing the physical address of a kernel symbol, rather than __pa() - Mark the RISC-V port as supporting GCOV One DT addition: - Describe the L2 cache controller in the FU540 DT file One documentation update: - Add patch acceptance guideline documentation" * tag 'riscv/for-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: Documentation: riscv: add patch acceptance guidelines riscv: prefix IRQ_ macro names with an RV_ namespace clocksource: riscv: add notrace to riscv_sched_clock riscv: ftrace: correct the condition logic in function graph tracer riscv: dts: Add DT support for SiFive L2 cache controller riscv: gcov: enable gcov for RISC-V riscv: mm: use __pa_symbol for kernel symbols
2020-01-04riscv: prefix IRQ_ macro names with an RV_ namespacePaul Walmsley2-12/+12
"IRQ_TIMER", used in the arch/riscv CSR header file, is a sufficiently generic macro name that it's used by several source files across the Linux code base. Some of these other files ultimately include the arch/riscv CSR include file, causing collisions. Fix by prefixing the RISC-V csr.h IRQ_ macro names with an RV_ prefix. Fixes: a4c3733d32a72 ("riscv: abstract out CSR names for supervisor vs machine mode") Reported-by: Olof Johansson <olof@lixom.net> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2020-01-04Merge branch 'akpm' (patches from Andrew)Linus Torvalds15-46/+32
Merge misc fixes from Andrew Morton: "17 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: hexagon: define ioremap_uc ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less ocfs2: call journal flush to mark journal as empty after journal recovery when mount mm/hugetlb: defer freeing of huge pages if in non-task context mm/gup: fix memory leak in __gup_benchmark_ioctl mm/oom: fix pgtables units mismatch in Killed process message fs/posix_acl.c: fix kernel-doc warnings hexagon: work around compiler crash hexagon: parenthesize registers in asm predicates fs/namespace.c: make to_mnt_ns() static fs/nsfs.c: include headers for missing declarations fs/direct-io.c: include fs/internal.h for missing prototype mm: move_pages: return valid node id in status if the page is already on the target node memcg: account security cred as well to kmemcg kcov: fix struct layout for kcov_remote_arg mm/zsmalloc.c: fix the migrated zspage statistics. mm/memory_hotplug: shrink zones when offlining memory
2020-01-04Merge tag 'mips_fixes_5.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds8-18/+72
Pull MIPS fixes from Paul Burton: "A collection of MIPS fixes: - Fill the struct cacheinfo shared_cpu_map field with sensible values, notably avoiding issues with perf which was unhappy in the absence of these values. - A boot fix for Loongson 2E & 2F machines which was fallout from some refactoring performed this cycle. - A Kconfig dependency fix for the Loongson CPU HWMon driver. - A couple of VDSO fixes, ensuring gettimeofday() behaves appropriately for kernel configurations that don't include support for a clocksource the VDSO can use & fixing the calling convention for the n32 & n64 VDSOs which would previously clobber the $gp/$28 register. - A build fix for vmlinuz compressed images which were inappropriately building with -fsanitize-coverage despite not being part of the kernel proper, then failing to link due to the missing __sanitizer_cov_trace_pc() function. - A couple of eBPF JIT fixes, including disabling it for MIPS32 due to a large number of issues with the code generated there & reflecting ISA dependencies in Kconfig to enforce that systems which don't support the JIT must include the interpreter" * tag 'mips_fixes_5.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Avoid VDSO ABI breakage due to global register variable MIPS: BPF: eBPF JIT: check for MIPS ISA compliance in Kconfig MIPS: BPF: Disable MIPS32 eBPF JIT MIPS: Prevent link failure with kcov instrumentation MIPS: Kconfig: Use correct form for 'depends on' mips: Fix gettimeofday() in the vdso library MIPS: Fix boot on Fuloong2 systems mips: cacheinfo: report shared CPU map
2020-01-04hexagon: define ioremap_ucNick Desaulniers1-0/+1
Similar to commit 38e45d81d14e ("sparc64: implement ioremap_uc") define ioremap_uc for hexagon to avoid errors from -Wimplicit-function-definition. Link: http://lkml.kernel.org/r/20191209222956.239798-2-ndesaulniers@google.com Link: https://github.com/ClangBuiltLinux/linux/issues/797 Fixes: e537654b7039 ("lib: devres: add a helper function for ioremap_uc") Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Brian Cain <bcain@codeaurora.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Tuowen Zhao <ztuowen@gmail.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexios Zavras <alexios.zavras@intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Will Deacon <will@kernel.org> Cc: Richard Fontana <rfontana@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04hexagon: work around compiler crashNick Desaulniers1-3/+1
Clang cannot translate the string "r30" into a valid register yet. Link: https://github.com/ClangBuiltLinux/linux/issues/755 Link: http://lkml.kernel.org/r/20191028155722.23419-1-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Sid Manning <sidneym@quicinc.com> Reviewed-by: Brian Cain <bcain@codeaurora.org> Cc: Allison Randal <allison@lohutok.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Richard Fontana <rfontana@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04hexagon: parenthesize registers in asm predicatesNick Desaulniers6-23/+23
Hexagon requires that register predicates in assembly be parenthesized. Link: https://github.com/ClangBuiltLinux/linux/issues/754 Link: http://lkml.kernel.org/r/20191209222956.239798-3-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Sid Manning <sidneym@codeaurora.org> Acked-by: Brian Cain <bcain@codeaurora.org> Cc: Lee Jones <lee.jones@linaro.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Tuowen Zhao <ztuowen@gmail.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alexios Zavras <alexios.zavras@intel.com> Cc: Allison Randal <allison@lohutok.net> Cc: Will Deacon <will@kernel.org> Cc: Richard Fontana <rfontana@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04mm/memory_hotplug: shrink zones when offlining memoryDavid Hildenbrand7-20/+7
We currently try to shrink a single zone when removing memory. We use the zone of the first page of the memory we are removing. If that memmap was never initialized (e.g., memory was never onlined), we will read garbage and can trigger kernel BUGs (due to a stale pointer): BUG: unable to handle page fault for address: 000000000000353d #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:clear_zone_contiguous+0x5/0x10 Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840 RSP: 0018:ffffad2400043c98 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000 RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40 RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000 R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680 FS: 0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __remove_pages+0x4b/0x640 arch_remove_memory+0x63/0x8d try_remove_memory+0xdb/0x130 __remove_memory+0xa/0x11 acpi_memory_device_remove+0x70/0x100 acpi_bus_trim+0x55/0x90 acpi_device_hotplug+0x227/0x3a0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x221/0x550 worker_thread+0x50/0x3b0 kthread+0x105/0x140 ret_from_fork+0x3a/0x50 Modules linked in: CR2: 000000000000353d Instead, shrink the zones when offlining memory or when onlining failed. Introduce and use remove_pfn_range_from_zone(() for that. We now properly shrink the zones, even if we have DIMMs whereby - Some memory blocks fall into no zone (never onlined) - Some memory blocks fall into multiple zones (offlined+re-onlined) - Multiple memory blocks that fall into different zones Drop the zone parameter (with a potential dubious value) from __remove_pages() and __remove_section(). Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") [visible after d0dc12e86b319] Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: <stable@vger.kernel.org> [5.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-03Merge tag 'powerpc-5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds2-2/+3
Pull powerpc fixes from Michael Ellerman: "Two more powerpc fixes for 5.5: - One commit to fix a build error when CONFIG_JUMP_LABEL=n, introduced by our recent fix to is_shared_processor(). - A commit marking some SLB related functions as notrace, as tracing them triggers warnings. Thanks to Jason A Donenfeld" * tag 'powerpc-5.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/spinlocks: Include correct header for static key powerpc/mm: Mark get_slice_psize() & slice_addr_is_low() as notrace
2020-01-03riscv: ftrace: correct the condition logic in function graph tracerZong Li1-1/+1
The condition should be logical NOT to assign the hook address to parent address. Because the return value 0 of function_graph_enter upon success. Fixes: e949b6db51dc (riscv/function_graph: Simplify with function_graph_enter()) Signed-off-by: Zong Li <zong.li@sifive.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: stable@vger.kernel.org Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2020-01-03riscv: dts: Add DT support for SiFive L2 cache controllerYash Shah1-0/+15
Add the L2 cache controller DT node in SiFive FU540 soc-specific DT file Signed-off-by: Yash Shah <yash.shah@sifive.com> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
2020-01-03riscv: gcov: enable gcov for RISC-VZong Li1-0/+1
This patch enables GCOV code coverage measurement on RISC-V. Lightly tested on QEMU and Hifive Unleashed board, seems to work as expected. Signed-off-by: Zong Li <zong.li@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Acked-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>