aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2023-03-24KVM: selftests: Assert that XTILE_DATA is set in IA32_XFD on #NMMingwei Zhang1-0/+2
Add an extra check to IA32_XFD to ensure that XTILE_DATA is actually set, i.e. is consistent with the AMX architecture. In addition, repeat the checks after the guest/host world switch to ensure the values of IA32_XFD and IA32_XFD_ERR are well preserved. Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230221163655.920289-7-mizhang@google.com [sean: massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Add check of CR0.TS in the #NM handler in amx_testMingwei Zhang1-0/+1
Be extra paranoid and assert that CR0.TS is clear when verifying the #NM in the AMX test is due to the expected XFeature Disable error, i.e. that the #NM isn't due to CR0.TS=1. Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230221163655.920289-6-mizhang@google.com [sean: reword changelog to make it clear this is pure paranoia] Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Enable checking on xcomp_bv in amx_testMingwei Zhang1-0/+1
After tilerelease instruction, AMX tiles are in INIT state. According to Intel SDM vol 1. 13.10: "If RFBM[i] = 1, XSTATE_BV[i] is set to the value of XINUSE[i].", XSTATE_BV[18] should be cleared after xsavec. On the other hand, according to Intel SDM vol 1. 13.4.3: "If XCOMP_BV[i] = 1, state component i is located at a byte offset locationI from the base address of the XSAVE area". Since at the time of xsavec, XCR0[18] is set indicating AMX tile data component is still enabled, xcomp_bv[18] should be set. Complete the checks by adding the assert to xcomp_bv[18] after xsavec. Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230221163655.920289-5-mizhang@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Fix an error in comment of amx_testMingwei Zhang1-1/+4
After the execution of __tilerelease(), AMX component will be in INIT state. Therefore, execution of XSAVEC saving the AMX state into memory will cause the xstate_bv[18] cleared in xheader. However, the xcomp_bv[18] will remain set. Fix the error in comment. Also, update xsavec() to XSAVEC because xcomp_bv[18] is set due to the instruction, not the function. Finally, use XTILEDATA instead 'bit 18' in comments. Cc: Jim Mattson <jmattson@google.com> Cc: Venkatesh Srinivas <venkateshs@google.com> Cc: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230221163655.920289-4-mizhang@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Add a fully functional "struct xstate" for x86Mingwei Zhang2-25/+23
Add a working xstate data structure for the usage of AMX and potential future usage on other xstate components. AMX selftest requires checking both the xstate_bv and xcomp_bv. Existing code relies on pointer arithmetics to fetch xstate_bv and does not support xcomp_bv. So, add a working xstate data structure into processor.h for x86. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Link: https://lore.kernel.org/r/20230221163655.920289-3-mizhang@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Add 'malloc' failure check in vcpu_save_stateIvan Orlov1-0/+1
There is a 'malloc' call in vcpu_save_state function, which can be unsuccessful. This patch will add the malloc failure checking to avoid possible null dereference and give more information about test fail reasons. Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Link: https://lore.kernel.org/r/20230322144528.704077-1-ivan.orlov0322@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Adjust VM's initial stack address to align with SysV ABI specAckerley Tng1-1/+17
Align the guest stack to match calling sequence requirements in section "The Stack Frame" of the System V ABI AMD64 Architecture Processor Supplement, which requires the value (%rsp + 8), NOT %rsp, to be a multiple of 16 when control is transferred to the function entry point. I.e. in a normal function call, %rsp needs to be 16-byte aligned _before_ CALL, not after. This fixes unexpected #GPs in guest code when the compiler uses SSE instructions, e.g. to initialize memory, as many SSE instructions require memory operands (including those on the stack) to be 16-byte-aligned. Signed-off-by: Ackerley Tng <ackerleytng@google.com> Link: https://lore.kernel.org/r/20230227180601.104318-1-ackerleytng@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Report enable_pmu module value when test is skippedLike Xu2-0/+2
Running x86_64/pmu_event_filter_test or x86_64/vmx_pmu_caps_test with enable_pmu globally disabled will report the following into: 1..0 # SKIP - Requirement not met: use_intel_pmu() || use_amd_pmu() or 1..0 # SKIP - Requirement not met: kvm_cpu_has(X86_FEATURE_PDCM) this can be confusing, so add a check on kvm.enable_pmu. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230313085311.25327-3-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Add a helper to read kvm boolean module parametersLike Xu2-0/+6
Add a helper function for reading kvm boolean module parameters values. No functional change intended. Signed-off-by: Like Xu <likexu@tencent.com> Link: https://lore.kernel.org/r/20230214084920.59787-2-likexu@tencent.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-24KVM: selftests: Fix nsec to sec conversion in demand_paging_testAnish Moorthy1-1/+1
demand_paging_test uses 1E8 as the denominator to convert nanoseconds to seconds, which is wrong. Use NSEC_PER_SEC instead to fix the issue and make the conversion obvious. Reported-by: James Houghton <jthoughton@google.com> Signed-off-by: Anish Moorthy <amoorthy@google.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230223001805.2971237-1-amoorthy@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2023-03-16KVM: Change return type of kvm_arch_vm_ioctl() to "int"Thomas Huth7-15/+9
All kvm_arch_vm_ioctl() implementations now only deal with "int" types as return values, so we can change the return type of these functions to use "int" instead of "long". Signed-off-by: Thomas Huth <thuth@redhat.com> Acked-by: Anup Patel <anup@brainfault.org> Message-Id: <20230208140105.655814-7-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: Standardize on "int" return types instead of "long" in kvm_main.cThomas Huth1-2/+2
KVM functions use "long" return values for functions that are wired up to "struct file_operations", but otherwise use "int" return values for functions that can return 0/-errno in order to avoid unintentional divergences between 32-bit and 64-bit kernels. Some code still uses "long" in unnecessary spots, though, which can cause a little bit of confusion and unnecessary size casts. Let's change these spots to use "int" types, too. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-6-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: arm64: Limit length in kvm_vm_ioctl_mte_copy_tags() to INT_MAXThomas Huth3-5/+10
In case of success, this function returns the amount of handled bytes. However, this does not work for large values: The function is called from kvm_arch_vm_ioctl() (which still returns a long), which in turn is called from kvm_vm_ioctl() in virt/kvm/kvm_main.c. And that function stores the return value in an "int r" variable. So the upper 32-bits of the "long" return value are lost there. KVM ioctl functions should only return "int" values, so let's limit the amount of bytes that can be requested here to INT_MAX to avoid the problem with the truncated return value. We can then also change the return type of the function to "int" to make it clearer that it is not possible to return a "long" here. Fixes: f0376edb1ddc ("KVM: arm64: Add ioctl to fetch/store tags in a guest") Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Steven Price <steven.price@arm.com> Message-Id: <20230208140105.655814-5-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: x86: Remove the KVM_GET_NR_MMU_PAGES ioctlThomas Huth3-10/+2
The KVM_GET_NR_MMU_PAGES ioctl is quite questionable on 64-bit hosts since it fails to return the full 64 bits of the value that can be set with the corresponding KVM_SET_NR_MMU_PAGES call. Its "long" return value is truncated into an "int" in the kvm_arch_vm_ioctl() function. Since this ioctl also never has been used by userspace applications (QEMU, Google's internal VMM, kvmtool and CrosVM have been checked), it's likely the best if we remove this badly designed ioctl before anybody really tries to use it. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230208140105.655814-4-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: s390: Use "int" as return type for kvm_s390_get/set_skeys()Thomas Huth1-2/+2
These two functions only return normal integers, so it does not make sense to declare the return type as "long" here. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-3-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16KVM: PPC: Standardize on "int" return types in the powerpc KVM codeThomas Huth5-21/+21
Most functions that are related to kvm_arch_vm_ioctl() already use "int" as return type to pass error values back to the caller. Some outlier functions use "long" instead for no good reason (they do not really require long values here). Let's standardize on "int" here to avoid casting the values back and forth between the two types. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230208140105.655814-2-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16kvm: x86: Advertise FLUSH_L1D to user spaceEmanuele Giuseppe Esposito1-1/+1
FLUSH_L1D was already added in 11e34e64e4103, but the feature is not visible to userspace yet. The bit definition: CPUID.(EAX=7,ECX=0):EDX[bit 28] If the feature is supported by the host, kvm should support it too so that userspace can choose whether to expose it to the guest or not. One disadvantage of not exposing it is that the guest will report a non existing vulnerability in /sys/devices/system/cpu/vulnerabilities/mmio_stale_data because the mitigation is present only if the guest supports (FLUSH_L1D and MD_CLEAR) or FB_CLEAR. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230201132905.549148-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16kvm: svm: Add IA32_FLUSH_CMD guest supportEmanuele Giuseppe Esposito1-13/+30
Expose IA32_FLUSH_CMD to the guest if the guest CPUID enumerates support for this MSR. As with IA32_PRED_CMD, permission for unintercepted writes to this MSR will be granted to the guest after the first non-zero write. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230201132905.549148-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-16kvm: vmx: Add IA32_FLUSH_CMD guest supportEmanuele Giuseppe Esposito2-25/+46
Expose IA32_FLUSH_CMD to the guest if the guest CPUID enumerates support for this MSR. As with IA32_PRED_CMD, permission for unintercepted writes to this MSR will be granted to the guest after the first non-zero write. Co-developed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230201132905.549148-2-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Rename "KVM is using eVMCS" static key to match its wrapperSean Christopherson3-4/+4
Rename enable_evmcs to __kvm_is_using_evmcs to match its wrapper, and to avoid confusion with enabling eVMCS for nested virtualization, i.e. have "enable eVMCS" be reserved for "enable eVMCS support for L1". No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230211003534.564198-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Stub out enable_evmcs static key for CONFIG_HYPERV=nSean Christopherson4-23/+28
Wrap enable_evmcs in a helper and stub it out when CONFIG_HYPERV=n in order to eliminate the static branch nop placeholders. clang-14 is clever enough to elide the nop, but gcc-12 is not. Stubbing out the key reduces the size of kvm-intel.ko by ~7.5% (200KiB) when compiled with gcc-12 (there are a _lot_ of VMCS accesses throughout KVM). Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230211003534.564198-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: nVMX: Move EVMCS1_SUPPORT_* macros to hyperv.cSean Christopherson2-105/+105
Move the macros that define the set of VMCS controls that are supported by eVMCS1 from hyperv.h to hyperv.c, i.e. make them "private". The macros should never be consumed directly by KVM at-large since the "final" set of supported controls depends on guest CPUID. No functional change intended. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230211003534.564198-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: x86/mmu: Remove FNAME(is_self_change_mapping)Lai Jiangshan1-44/+7
Drop FNAME(is_self_change_mapping) and instead rely on kvm_mmu_hugepage_adjust() to adjust the hugepage accordingly. Prior to commit 4cd071d13c5c ("KVM: x86/mmu: Move calls to thp_adjust() down a level"), the hugepage adjustment was done before allocating new shadow pages, i.e. failed to restrict the hugepage sizes if a new shadow page resulted in account_shadowed() changing the disallowed hugepage tracking. Removing FNAME(is_self_change_mapping) fixes a bug reported by Huang Hang where KVM unnecessarily forces a 4KiB page. FNAME(is_self_change_mapping) has a defect in that it blindly disables _all_ hugepage mappings rather than trying to reduce the size of the hugepage. If the guest is writing to a 1GiB page and the 1GiB is self-referential but a 2MiB page is not, then KVM can and should create a 2MiB mapping. Add a comment above the call to kvm_mmu_hugepage_adjust() to call out the new dependency on adjusting the hugepage size after walking indirect PTEs. Reported-by: Huang Hang <hhuang@linux.alibaba.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com [sean: rework changelog after separating out the emulator change] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230202182817.407394-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: x86/mmu: Detect write #PF to shadow pages during FNAME(fetch) walkLai Jiangshan1-7/+5
Move the detection of write #PF to shadow pages, i.e. a fault on a write to a page table that is being shadowed by KVM that is used to translate the write itself, from FNAME(is_self_change_mapping) to FNAME(fetch). There is no need to detect the self-referential write before kvm_faultin_pfn() as KVM does not consume EMULTYPE_WRITE_PF_TO_SP for accesses that resolve to "error or no-slot" pfns, i.e. KVM doesn't allow retrying MMIO accesses or writes to read-only memslots. Detecting the EMULTYPE_WRITE_PF_TO_SP scenario in FNAME(fetch) will allow dropping FNAME(is_self_change_mapping) entirely, as the hugepage interaction can be deferred to kvm_mmu_hugepage_adjust(). Cc: Huang Hang <hhuang@linux.alibaba.com> Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Link: https://lore.kernel.org/r/20221213125538.81209-1-jiangshanlai@gmail.com [sean: split to separate patch, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230202182817.407394-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: x86/mmu: Use EMULTYPE flag to track write #PFs to shadow pagesSean Christopherson5-36/+37
Use a new EMULTYPE flag, EMULTYPE_WRITE_PF_TO_SP, to track page faults on self-changing writes to shadowed page tables instead of propagating that information to the emulator via a semi-persistent vCPU flag. Using a flag in "struct kvm_vcpu_arch" is confusing, especially as implemented, as it's not at all obvious that clearing the flag only when emulation actually occurs is correct. E.g. if KVM sets the flag and then retries the fault without ever getting to the emulator, the flag will be left set for future calls into the emulator. But because the flag is consumed if and only if both EMULTYPE_PF and EMULTYPE_ALLOW_RETRY_PF are set, and because EMULTYPE_ALLOW_RETRY_PF is deliberately not set for direct MMUs, emulated MMIO, or while L2 is active, KVM avoids false positives on a stale flag since FNAME(page_fault) is guaranteed to be run and refresh the flag before it's ultimately consumed by the tail end of reexecute_instruction(). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230202182817.407394-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Sync KVM exit reasons in selftestsVipin Sharma1-2/+15
Add missing KVM_EXIT_* reasons in KVM selftests from include/uapi/linux/kvm.h Signed-off-by: Vipin Sharma <vipinsh@google.com> Message-Id: <20230204014547.583711-5-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Add macro to generate KVM exit reason stringsSean Christopherson1-26/+28
Add and use a macro to generate the KVM exit reason strings array instead of relying on developers to correctly copy+paste+edit each string. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204014547.583711-4-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Print expected and actual exit reason in KVM exit reason assertVipin Sharma1-1/+2
Print what KVM exit reason a test was expecting and what it actually got int TEST_ASSERT_KVM_EXIT_REASON(). Signed-off-by: Vipin Sharma <vipinsh@google.com> Message-Id: <20230204014547.583711-3-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Make vCPU exit reason test assertion commonVipin Sharma44-293/+69
Make TEST_ASSERT_KVM_EXIT_REASON() macro and replace all exit reason test assert statements with it. No functional changes intended. Signed-off-by: Vipin Sharma <vipinsh@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Message-Id: <20230204014547.583711-2-vipinsh@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_testDavid Woodhouse1-0/+27
When kvm_xen_evtchn_send() takes the slow path because the shinfo GPC needs to be revalidated, it used to violate the SRCU vs. kvm->lock locking rules and potentially cause a deadlock. Now that lockdep is learning to catch such things, make sure that code path is exercised by the selftest. Link: https://lore.kernel.org/all/20230113124606.10221-2-dwmw2@infradead.org Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204024151.1373296-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Use enum for test numbers in xen_shinfo_testDavid Woodhouse1-51/+82
The xen_shinfo_test started off with very few iterations, and the numbers we used in GUEST_SYNC() were precisely mapped to the RUNSTATE_xxx values anyway to start with. It has since grown quite a few more tests, and it's kind of awful to be handling them all as bare numbers. Especially when I want to add a new test in the middle. Define an enum for the test stages, and use it both in the guest code and the host switch statement. No functional change, if I can count to 24. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204024151.1373296-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Add helpers to make Xen-style VMCALL/VMMCALL hypercallsSean Christopherson3-54/+21
Add wrappers to do hypercalls using VMCALL/VMMCALL and Xen's register ABI (as opposed to full Xen-style hypercalls through a hypervisor provided page). Using the common helpers dedups a pile of code, and uses the native hypercall instruction when running on AMD. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204024151.1373296-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: selftests: Move the guts of kvm_hypercall() to a separate macroSean Christopherson1-12/+17
Extract the guts of kvm_hypercall() to a macro so that Xen hypercalls, which have a different register ABI, can reuse the VMCALL vs. VMMCALL logic. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230204024151.1373296-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: SVM: WARN if GATag generation drops VM or vCPU ID informationSean Christopherson1-3/+12
WARN if generating a GATag given a VM ID and vCPU ID doesn't yield the same IDs when pulling the IDs back out of the tag. Don't bother adding error handling to callers, this is very much a paranoid sanity check as KVM fully controls the VM ID and is supposed to reject too-big vCPU IDs. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: SVM: Modify AVIC GATag to support max number of 512 vCPUsSuravee Suthikulpanit1-8/+18
Define AVIC_VCPU_ID_MASK based on AVIC_PHYSICAL_MAX_INDEX, i.e. the mask that effectively controls the largest guest physical APIC ID supported by x2AVIC, instead of hardcoding the number of bits to 8 (and the number of VM bits to 24). The AVIC GATag is programmed into the AMD IOMMU IRTE to provide a reference back to KVM in case the IOMMU cannot inject an interrupt into a non-running vCPU. In such a case, the IOMMU notifies software by creating a GALog entry with the corresponded GATag, and KVM then uses the GATag to find the correct VM+vCPU to kick. Dropping bit 8 from the GATag results in kicking the wrong vCPU when targeting vCPUs with x2APIC ID > 255. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: SVM: Fix a benign off-by-one bug in AVIC physical table maskSean Christopherson1-5/+7
Define the "physical table max index mask" as bits 8:0, not 9:0. x2AVIC currently supports a max of 512 entries, i.e. the max index is 511, and the inputs to GENMASK_ULL() are inclusive. The bug is benign as bit 9 is reserved and never set by KVM, i.e. KVM is just clearing bits that are guaranteed to be zero. Note, as of this writing, APM "Rev. 3.39-October 2022" incorrectly states that bits 11:8 are reserved in Table B-1. VMCB Layout, Control Area. I.e. that table wasn't updated when x2AVIC support was added. Opportunistically fix the comment for the max AVIC ID to align with the code, and clean up comment formatting too. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Cc: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <20230207002156.521736-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14selftests: KVM: skip hugetlb tests if huge pages are not availablePaolo Bonzini1-9/+16
Right now, if KVM memory stress tests are run with hugetlb sources but hugetlb is not available (either in the kernel or because /proc/sys/vm/nr_hugepages is 0) the test will fail with a memory allocation error. This makes it impossible to add tests that default to hugetlb-backed memory, because on a machine with a default configuration they will fail. Therefore, check HugePages_Total as well and, if zero, direct the user to enable hugepages in procfs. Furthermore, return KSFT_SKIP whenever hugetlb is not available. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Use tabs instead of spaces for indentationRong Tao1-2/+2
Code indentation should use tabs where possible and miss a '*'. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_A492CB3F9592578451154442830EA1B02C07@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: VMX: Fix indentation coding style issueRong Tao1-6/+6
Code indentation should use tabs where possible. Signed-off-by: Rong Tao <rongtao@cestc.cn> Message-Id: <tencent_31E6ACADCB6915E157CF5113C41803212107@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: nVMX: remove unnecessary #ifdefPaolo Bonzini1-7/+1
nested_vmx_check_controls() has already run by the time KVM checks host state, so the "host address space size" exit control can only be set on x86-64 hosts. Simplify the condition at the cost of adding some dead code to 32-bit kernels. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-14KVM: nVMX: add missing consistency checks for CR0 and CR4Paolo Bonzini1-2/+8
The effective values of the guest CR0 and CR4 registers may differ from those included in the VMCS12. In particular, disabling EPT forces CR4.PAE=1 and disabling unrestricted guest mode forces CR0.PG=CR0.PE=1. Therefore, checks on these bits cannot be delegated to the processor and must be performed by KVM. Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-12Linux 6.3-rc2Linus Torvalds1-1/+1
2023-03-12wifi: cfg80211: Partial revert "wifi: cfg80211: Fix use after free for wext"Hector Martin1-2/+0
This reverts part of commit 015b8cc5e7c4 ("wifi: cfg80211: Fix use after free for wext") This commit broke WPA offload by unconditionally clearing the crypto modes for non-WEP connections. Drop that part of the patch. Signed-off-by: Hector Martin <marcan@marcan.st> Reported-by: Ilya <me@0upti.me> Reported-and-tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Eric Curtin <ecurtin@redhat.com> Fixes: 015b8cc5e7c4 ("wifi: cfg80211: Fix use after free for wext") Cc: stable@kernel.org Link: https://lore.kernel.org/linux-wireless/ZAx0TWRBlGfv7pNl@kroah.com/T/#m11e6e0915ab8fa19ce8bc9695ab288c0fe018edf Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-12tpm: disable hwrng for fTPM on some AMD designsMario Limonciello2-1/+132
AMD has issued an advisory indicating that having fTPM enabled in BIOS can cause "stuttering" in the OS. This issue has been fixed in newer versions of the fTPM firmware, but it's up to system designers to decide whether to distribute it. This issue has existed for a while, but is more prevalent starting with kernel 6.1 because commit b006c439d58db ("hwrng: core - start hwrng kthread also for untrusted sources") started to use the fTPM for hwrng by default. However, all uses of /dev/hwrng result in unacceptable stuttering. So, simply disable registration of the defective hwrng when detecting these faulty fTPM versions. As this is caused by faulty firmware, it is plausible that such a problem could also be reproduced by other TPM interactions, but this hasn't been shown by any user's testing or reports. It is hypothesized to be triggered more frequently by the use of the RNG because userspace software will fetch random numbers regularly. Intentionally continue to register other TPM functionality so that users that rely upon PCR measurements or any storage of data will still have access to it. If it's found later that another TPM functionality is exacerbating this problem a module parameter it can be turned off entirely and a module parameter can be introduced to allow users who rely upon fTPM functionality to turn it on even though this problem is present. Link: https://www.amd.com/en/support/kb/faq/pa-410 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216989 Link: https://lore.kernel.org/all/20230209153120.261904-1-Jason@zx2c4.com/ Fixes: b006c439d58d ("hwrng: core - start hwrng kthread also for untrusted sources") Cc: stable@vger.kernel.org Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Tested-by: reach622@mailcuk.com Tested-by: Bell <1138267643@qq.com> Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-03-12tpm/eventlog: Don't abort tpm_read_log on faulty ACPI addressMorten Linderud1-1/+5
tpm_read_log_acpi() should return -ENODEV when no eventlog from the ACPI table is found. If the firmware vendor includes an invalid log address we are unable to map from the ACPI memory and tpm_read_log() returns -EIO which would abort discovery of the eventlog. Change the return value from -EIO to -ENODEV when acpi_os_map_iomem() fails to map the event log. The following hardware was used to test this issue: Framework Laptop (Pre-production) BIOS: INSYDE Corp, Revision: 3.2 TPM Device: NTC, Firmware Revision: 7.2 Dump of the faulty ACPI TPM2 table: [000h 0000 4] Signature : "TPM2" [Trusted Platform Module hardware interface Table] [004h 0004 4] Table Length : 0000004C [008h 0008 1] Revision : 04 [009h 0009 1] Checksum : 2B [00Ah 0010 6] Oem ID : "INSYDE" [010h 0016 8] Oem Table ID : "TGL-ULT" [018h 0024 4] Oem Revision : 00000002 [01Ch 0028 4] Asl Compiler ID : "ACPI" [020h 0032 4] Asl Compiler Revision : 00040000 [024h 0036 2] Platform Class : 0000 [026h 0038 2] Reserved : 0000 [028h 0040 8] Control Address : 0000000000000000 [030h 0048 4] Start Method : 06 [Memory Mapped I/O] [034h 0052 12] Method Parameters : 00 00 00 00 00 00 00 00 00 00 00 00 [040h 0064 4] Minimum Log Length : 00010000 [044h 0068 8] Log Address : 000000004053D000 Fixes: 0cf577a03f21 ("tpm: Fix handling of missing event log") Tested-by: Erkki Eilonen <erkki@bearmetal.eu> Signed-off-by: Morten Linderud <morten@linderud.pw> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-03-12cpumask: relax sanity checking constraintsLinus Torvalds1-1/+1
The cpumask_check() was unnecessarily tight, and causes problems for the users of cpumask_next(). We have a number of users that take the previous return value of one of the bit scanning functions and subtract one to keep it in "range". But since the scanning functions end up returning up to 'small_cpumask_bits' instead of the tighter 'nr_cpumask_bits', the range really needs to be using that widened form. [ This "previous-1" behavior is also the reason we have all those comments about /* -1 is a legal arg here. */ and separate checks for that being ok. So we could have just made "small_cpumask_bits-1" be a similar special "don't check this" value. Tetsuo Handa even suggested a patch that only does that for cpumask_next(), since that seems to be the only actual case that triggers, but that all makes it even _more_ magical and special. So just relax the check ] One example of this kind of pattern being the 'c_start()' function in arch/x86/kernel/cpu/proc.c, but also duplicated in various forms on other architectures. Reported-by: syzbot+96cae094d90877641f32@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=96cae094d90877641f32 Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Link: https://lore.kernel.org/lkml/c1f4cc16-feea-b83c-82cf-1a1f007b7eb9@I-love.SAKURA.ne.jp/ Fixes: 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-11ubi: block: Fix missing blk_mq_end_requestRichard Weinberger1-1/+4
Switching to BLK_MQ_F_BLOCKING wrongly removed the call to blk_mq_end_request(). Add it back to have our IOs finished Fixes: 91cc8fbcc8c7 ("ubi: block: set BLK_MQ_F_BLOCKING") Analyzed-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Daniel Palmer <daniel@0x0f.com> Link: https://lore.kernel.org/linux-mtd/CAHk-=wi29bbBNh3RqJKu3PxzpjDN5D5K17gEVtXrb7-6bfrnMQ@mail.gmail.com/ Signed-off-by: Richard Weinberger <richard@nod.at> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Daniel Palmer <daniel@0x0f.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-11KVM: arm64: timers: Convert per-vcpu virtual offset to a global valueMarc Zyngier4-36/+29
Having a per-vcpu virtual offset is a pain. It needs to be synchronized on each update, and expands badly to a setup where different timers can have different offsets, or have composite offsets (as with NV). So let's start by replacing the use of the CNTVOFF_EL2 shadow register (which we want to reclaim for NV anyway), and make the virtual timer carry a pointer to a VM-wide offset. This simplifies the code significantly. It also addresses two terrible bugs: - The use of CNTVOFF_EL2 leads to some nice offset corruption when the sysreg gets reset, as reported by Joey. - The kvm mutex is taken from a vcpu ioctl, which goes against the locking rules... Reported-by: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230224173915.GA17407@e124191.cambridge.arm.com Tested-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20230224191640.3396734-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-03-11ext4: zero i_disksize when initializing the bootloader inodeZhihao Cheng1-0/+1
If the boot loader inode has never been used before, the EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the i_size to 0. However, if the "never before used" boot loader has a non-zero i_size, then i_disksize will be non-zero, and the inconsistency between i_size and i_disksize can trigger a kernel warning: WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa RIP: 0010:ext4_file_write_iter+0xbc7/0xd10 Call Trace: vfs_write+0x3b1/0x5c0 ksys_write+0x77/0x160 __x64_sys_write+0x22/0x30 do_syscall_64+0x39/0x80 Reproducer: 1. create corrupted image and mount it: mke2fs -t ext4 /tmp/foo.img 200 debugfs -wR "sif <5> size 25700" /tmp/foo.img mount -t ext4 /tmp/foo.img /mnt cd /mnt echo 123 > file 2. Run the reproducer program: posix_memalign(&buf, 1024, 1024) fd = open("file", O_RDWR | O_DIRECT); ioctl(fd, EXT4_IOC_SWAP_BOOT); write(fd, buf, 1024); Fix this by setting i_disksize as well as i_size to zero when initiaizing the boot loader inode. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159 Cc: stable@kernel.org Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11ext4: make sure fs error flag setted before clear journal errorYe Bin1-2/+4
Now, jounral error number maybe cleared even though ext4_commit_super() failed. This may lead to error flag miss, then fsck will miss to check file system deeply. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230307061703.245965-3-yebin@huaweicloud.com