path: root/arch/x86/events
Age | Commit message | Author | Files | Lines
2022-08-04 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 3 | -35/+159
Pull kvm updates from Paolo Bonzini:
 "Quite a large pull request due to a selftest API overhaul and some patches that had come in too late for 5.19.

  ARM:
   - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack
   - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to align with the rest of the infrastructure
   - Disaggregation of the vcpu flags into separate sets to better track their use model
   - A fix for the GICv2-on-v3 selftest
   - A small set of cosmetic fixes

  RISC-V:
   - Track ISA extensions used by Guest using bitmap
   - Added system instruction emulation framework
   - Added CSR emulation framework
   - Added gfp_custom flag in struct kvm_mmu_memory_cache
   - Added G-stage ioremap() and iounmap() functions
   - Added support for Svpbmt inside Guest

  s390:
   - add an interface to provide a hypervisor dump for secure guests
   - improve selftests to use TAP interface
   - enable interpretive execution of zPCI instructions (for PCI passthrough)
   - First part of deferred teardown
   - CPU Topology
   - PV attestation
   - Minor fixes

  x86:
   - Permit guests to ignore single-bit ECC errors
   - Intel IPI virtualization
   - Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
   - PEBS virtualization
   - Simplify PMU emulation by just using PERF_TYPE_RAW events
   - More accurate event reinjection on SVM (avoid retrying instructions)
   - Allow getting/setting the state of the speaker port data bit
   - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
   - "Notify" VM exit (detect microarchitectural hangs) for Intel
   - Use try_cmpxchg64 instead of cmpxchg64
   - Ignore benign host accesses to PMU MSRs when PMU is disabled
   - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
   - Allow NX huge page mitigation to be disabled on a per-vm basis
   - Port eager page splitting to shadow MMU as well
   - Enable CMCI capability by default and handle injected UCNA errors
   - Expose pid of vcpu threads in debugfs
   - x2AVIC support for AMD
   - cleanup PIO emulation
   - Fixes for LLDT/LTR emulation
   - Don't require refcounted "struct page" to create huge SPTEs
   - Miscellaneous cleanups:
      - MCE MSR emulation
      - Use separate namespaces for guest PTEs and shadow PTEs bitmasks
      - PIO emulation
      - Reorganize rmap API, mostly around rmap destruction
      - Do not work around very old KVM bugs for L0 that runs with nesting enabled
      - new selftests API for CPUID

  Generic:
   - Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
   - new selftests API using struct kvm_vcpu instead of a (vm, id) tuple"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits)
  selftests: kvm: set rax before vmcall
  selftests: KVM: Add exponent check for boolean stats
  selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test
  selftests: KVM: Check stat name before other fields
  KVM: x86/mmu: remove unused variable
  RISC-V: KVM: Add support for Svpbmt inside Guest/VM
  RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap()
  RISC-V: KVM: Add G-stage ioremap() and iounmap() functions
  KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache
  RISC-V: KVM: Add extensible CSR emulation framework
  RISC-V: KVM: Add extensible system instruction emulation framework
  RISC-V: KVM: Factor-out instruction emulation into separate sources
  RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run
  RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function
  RISC-V: KVM: Fix variable spelling mistake
  RISC-V: KVM: Improve ISA extension by using a bitmap
  KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs()
  KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog
  KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT
  KVM: x86: Do not block APIC write for non ICR registers
  ...
2022-08-01 | Merge tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 4 | -73/+226
Pull perf events updates from Ingo Molnar:
 - Fix Intel Alder Lake PEBS memory access latency & data source profiling info bugs.
 - Use Intel large-PEBS hardware feature in more circumstances, to reduce PMI overhead & reduce sampling data.
 - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI variant, which tells tooling the exact number of samples lost.
 - Add new IBS register bits definitions.
 - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements.

* tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/ibs: Add new IBS register bits into header
  perf/x86/intel: Fix PEBS data source encoding for ADL
  perf/x86/intel: Fix PEBS memory access info encoding for ADL
  perf/core: Add a new read format to get a number of lost samples
  perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments
  perf/x86/amd/uncore: Add PerfMonV2 DF event format
  perf/x86/amd/uncore: Detect available DF counters
  perf/x86/amd/uncore: Use attr_update for format attributes
  perf/x86/amd/uncore: Use dynamic events array
  x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE
2022-08-01 | Merge remote-tracking branch 'kvm/next' into kvm-next-5.20 | Paolo Bonzini | 3 | -35/+159
KVM/s390, KVM/x86 and common infrastructure changes for 5.20

x86:
* Permit guests to ignore single-bit ECC errors
* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
* Intel IPI virtualization
* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
* PEBS virtualization
* Simplify PMU emulation by just using PERF_TYPE_RAW events
* More accurate event reinjection on SVM (avoid retrying instructions)
* Allow getting/setting the state of the speaker port data bit
* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
* "Notify" VM exit (detect microarchitectural hangs) for Intel
* Cleanups for MCE MSR emulation

s390:
* add an interface to provide a hypervisor dump for secure guests
* improve selftests to use TAP interface
* enable interpretive execution of zPCI instructions (for PCI passthrough)
* First part of deferred teardown
* CPU Topology
* PV attestation
* Minor fixes

Generic:
* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple

x86:
* Use try_cmpxchg64 instead of cmpxchg64
* Bugfixes
* Ignore benign host accesses to PMU MSRs when PMU is disabled
* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis
* Port eager page splitting to shadow MMU as well
* Enable CMCI capability by default and handle injected UCNA errors
* Expose pid of vcpu threads in debugfs
* x2AVIC support for AMD
* cleanup PIO emulation
* Fixes for LLDT/LTR emulation
* Don't require refcounted "struct page" to create huge SPTEs

x86 cleanups:
* Use separate namespaces for guest PTEs and shadow PTEs bitmasks
* PIO emulation
* Reorganize rmap API, mostly around rmap destruction
* Do not work around very old KVM bugs for L0 that runs with nesting enabled
* new selftests API for CPUID
2022-07-20 | perf/x86/intel/lbr: Fix unchecked MSR access error on HSW | Kan Liang | 1 | -9/+10
The fuzzer triggers the below trace:

  [ 7763.384369] unchecked MSR access error: WRMSR to 0x689 (tried to write 0x1fffffff8101349e) at rIP: 0xffffffff810704a4 (native_write_msr+0x4/0x20)
  [ 7763.397420] Call Trace:
  [ 7763.399881] <TASK>
  [ 7763.401994] intel_pmu_lbr_restore+0x9a/0x1f0
  [ 7763.406363] intel_pmu_lbr_sched_task+0x91/0x1c0
  [ 7763.410992] __perf_event_task_sched_in+0x1cd/0x240

On a machine with the LBR format LBR_FORMAT_EIP_FLAGS2, when TSX is disabled, a TSX quirk is required to access LBR from registers. lbr_from_signext_quirk_needed() determines whether the TSX quirk should be applied. However, it is invoked before intel_pmu_lbr_init(), which parses the LBR format information. Without the correct LBR format information, the TSX quirk is never applied.

Move the lbr_from_signext_quirk_needed() call into intel_pmu_lbr_init(). Checking x86_pmu.lbr_has_tsx in lbr_from_signext_quirk_needed() is no longer required. Both LBR_FORMAT_EIP_FLAGS2 and LBR_FORMAT_INFO have the LBR_TSX flag, but only LBR_FORMAT_EIP_FLAGS2 requires the quirk. Update the comments accordingly.

Fixes: 1ac7fd8159a8 ("perf/x86/intel/lbr: Support LBR format V7")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220714182630.342107-1-kan.liang@linux.intel.com
2022-07-04 | perf/x86/intel: Fix PEBS data source encoding for ADL | Kan Liang | 3 | -14/+45
The PEBS data source encoding for the e-core is different from the p-core. Add pebs_data_source[] to struct x86_hybrid_pmu to store the data source encoding for each type of core. Add intel_pmu_pebs_data_source_grt() for the e-core. Nothing changes for the data source encoding of the p-core, which still reuses intel_pmu_pebs_data_source_skl().

Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20220629150840.2235741-2-kan.liang@linux.intel.com
2022-07-04 | perf/x86/intel: Fix PEBS memory access info encoding for ADL | Kan Liang | 3 | -33/+60
The PEBS memory access latency encoding for the e-core is slightly different from the p-core: bit 4 is Lock, while bit 5 is TLB access. Add a new flag to indicate the load/store latency event on a hybrid platform, and a new function pointer to retrieve the latency data for a hybrid platform. Only implement the new flag and function for the e-core on ADL; the p-core on ADL still uses the existing PERF_X86_EVENT_PEBS_LDLAT/STLAT flags. Factor out pebs_set_tlb_lock() to set the generic memory data source information of the TLB access and lock for both load and store latency. Move intel_get_event_constraints() ahead of the :ppp check, otherwise the new flag never gets a chance to be set for :ppp events.

Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20220629150840.2235741-1-kan.liang@linux.intel.com
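To make the e-core split above concrete, here is a minimal, runnable C sketch. Only the bit positions (Lock at bit 4, TLB access at bit 5 on the e-core) come from the commit text; the struct, field and function names are hypothetical.

    #include <stdio.h>

    /* Hypothetical decoder for the e-core PEBS latency field described
     * above; the names are illustrative, the bit positions are from the
     * commit text. */
    struct grt_mem_info { int lock; int tlb_access; };

    static struct grt_mem_info decode_grt_mem_info(unsigned long long val)
    {
            struct grt_mem_info mi;

            mi.lock       = (val >> 4) & 1; /* e-core: bit 4 is Lock */
            mi.tlb_access = (val >> 5) & 1; /* e-core: bit 5 is TLB access */
            return mi;
    }

    int main(void)
    {
            struct grt_mem_info mi = decode_grt_mem_info(0x30);

            printf("lock=%d tlb=%d\n", mi.lock, mi.tlb_access);
            return 0;
    }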
2022-06-13 | perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments | Sandipan Das | 1 | -0/+10
The current RDPMC assignment scheme maps four DF PMCs and six L3 PMCs from index 6 to 15. If AMD Performance Monitoring Version 2 (PerfMonV2) is supported, there may be additional DF counters available which are mapped starting from index 16 i.e. just after the L3 counters. Update the RDPMC assignments accordingly. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/1359379ef34da760f108b075ac138ab082caa3ba.1652954372.git.sandipan.das@amd.com
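The index arithmetic described above can be sketched in a few lines of runnable C; the helper names are hypothetical, only the mapping (four legacy DF counters at RDPMC index 6-9, six L3 counters at 10-15, extra PerfMonV2 DF counters from 16) comes from the commit text.

    #include <stdio.h>

    /* Hypothetical helpers mirroring the mapping described above. */
    static int amd_uncore_df_rdpmc_idx(int counter)
    {
            return counter < 4 ? 6 + counter : 16 + (counter - 4);
    }

    static int amd_uncore_l3_rdpmc_idx(int counter)
    {
            return 10 + counter;
    }

    int main(void)
    {
            for (int i = 0; i < 6; i++)
                    printf("DF%d -> rdpmc %d\n", i, amd_uncore_df_rdpmc_idx(i));
            for (int i = 0; i < 6; i++)
                    printf("L3%d -> rdpmc %d\n", i, amd_uncore_l3_rdpmc_idx(i));
            return 0;
    }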
2022-06-13 | perf/x86/amd/uncore: Add PerfMonV2 DF event format | Sandipan Das | 1 | -7/+17
If AMD Performance Monitoring Version 2 (PerfMonV2) is supported, use bits 0-7, 32-37 as EventSelect and bits 8-15, 24-27 as UnitMask for Data Fabric (DF) events. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/ffc24d5a3375b1d6e457d88e83241114de5c1942.1652954372.git.sandipan.das@amd.com
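A runnable sketch of this split encoding, packing a 14-bit EventSelect and a 12-bit UnitMask into the bit ranges quoted above; the helper name is illustrative.

    #include <stdio.h>

    /* PerfMonV2 DF layout from the commit text: EventSelect in bits 0-7
     * and 32-37, UnitMask in bits 8-15 and 24-27. */
    static unsigned long long df_v2_config(unsigned int event, unsigned int umask)
    {
            unsigned long long cfg = 0;

            cfg |= (unsigned long long)(event & 0xff);                /* EventSelect[7:0]  */
            cfg |= (unsigned long long)((event >> 8) & 0x3f) << 32;   /* EventSelect[13:8] */
            cfg |= (unsigned long long)(umask & 0xff) << 8;           /* UnitMask[7:0]     */
            cfg |= (unsigned long long)((umask >> 8) & 0xf) << 24;    /* UnitMask[11:8]    */
            return cfg;
    }

    int main(void)
    {
            printf("config=%#llx\n", df_v2_config(0x1f0f, 0xfff)); /* 0x1f0f00ff0f */
            return 0;
    }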
2022-06-13 | perf/x86/amd/uncore: Detect available DF counters | Sandipan Das | 1 | -0/+10
If AMD Performance Monitoring Version 2 (PerfMonV2) is supported, use CPUID leaf 0x80000022 EBX to detect the number of Data Fabric (DF) PMCs. This offers more flexibility if the counts change in later processor families. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/bac7b2806561e03f2acc7fdc9db94f102df80e1d.1652954372.git.sandipan.das@amd.com
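The detection can be tried from userspace on an AMD machine. The CPUID leaf number is from the commit text; the EBX field layout below (core PMC count in bits 3:0, DF PMC count in bits 15:10) and the PerfMonV2 bit in EAX[0] follow the kernel's cpuid_0x80000022 bitfield definitions and are assumptions here.

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
            unsigned int eax, ebx, ecx, edx;

            /* Returns 0 if leaf 0x80000022 is not supported. */
            if (!__get_cpuid_count(0x80000022, 0, &eax, &ebx, &ecx, &edx))
                    return 1;
            printf("PerfMonV2: %u\n", eax & 1);           /* assumed EAX[0]    */
            printf("core PMCs: %u\n", ebx & 0xf);         /* assumed EBX[3:0]  */
            printf("DF PMCs:   %u\n", (ebx >> 10) & 0x3f);/* assumed EBX[15:10] */
            return 0;
    }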
2022-06-13 | perf/x86/amd/uncore: Use attr_update for format attributes | Sandipan Das | 1 | -14/+54
Use the attr_update attribute group mechanism introduced by commit f3a3a8257e5a ("perf/core: Add attr_groups_update into struct pmu") and the is_visible() callback to populate the family-specific attributes for uncore events. The changes apply to attributes that are unique to particular families, such as slicemask for Family 17h and coreid for Family 19h. The addition of common attributes such as event and umask, whose formats change across families, remains unchanged.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/a5e4f4dd5c459199fc497e82b858ba09dc91c064.1652954372.git.sandipan.das@amd.com
2022-06-13 | perf/x86/amd/uncore: Use dynamic events array | Sandipan Das | 1 | -7/+31
If AMD Performance Monitoring Version 2 (PerfMonV2) is supported, the number of available counters for a given uncore PMU may not be fixed across families and models and has to be determined at runtime. The per-cpu uncore PMU data currently uses a fixed-sized array for event information. Make it dynamic based on the number of available counters. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/21eea0cb6de9d14f78d52d1d62637ae02bc900f5.1652954372.git.sandipan.das@amd.com
2022-06-13 | x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE | Like Xu | 1 | -1/+2
All the information required by PERF_SAMPLE_WEIGHT is available in the PEBS record, so large PEBS can be enabled for the PERF_SAMPLE_WEIGHT sample type to save PMI overhead, until some non-compatible flag such as PERF_SAMPLE_DATA_PAGE_SIZE (due to lack of munmap tracking) stops it. To cover the new weight extension, add PERF_SAMPLE_WEIGHT_TYPE to the guardian LARGE_PEBS_FLAGS.

Tested it with:

  $ perf mem record -c 1000 workload

Before: Captured and wrote 0.126 MB perf.data (958 samples) [958 PMIs]
After:  Captured and wrote 0.313 MB perf.data (4859 samples) [3 PMIs]

Reported-by: Yongchao Duan <yongduan@tencent.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220519151913.80545-1-likexu@tencent.com
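The "guardian" idea can be sketched in runnable C: large PEBS stays usable only while every requested sample_type bit is in a compatible set. The set below is a partial, illustrative stand-in for the driver's LARGE_PEBS_FLAGS, not the full list; a v5.12+ uapi header is assumed for PERF_SAMPLE_WEIGHT_TYPE.

    #include <stdio.h>
    #include <linux/perf_event.h>

    /* Illustrative subset of flags compatible with large PEBS. */
    static const unsigned long long LARGE_PEBS_OK =
            PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
            PERF_SAMPLE_ADDR | PERF_SAMPLE_PERIOD | PERF_SAMPLE_WEIGHT_TYPE;

    static int large_pebs_ok(unsigned long long sample_type)
    {
            return (sample_type & ~LARGE_PEBS_OK) == 0;
    }

    int main(void)
    {
            printf("%d\n", large_pebs_ok(PERF_SAMPLE_IP | PERF_SAMPLE_WEIGHT_TYPE)); /* 1 */
            printf("%d\n", large_pebs_ok(PERF_SAMPLE_DATA_PAGE_SIZE)); /* 0: munmap not tracked */
            return 0;
    }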
2022-06-08 | x86: events: Do not return bogus capabilities if PMU is broken | Paolo Bonzini | 1 | -2/+10
If the PMU is broken due to firmware issues, check_hw_exists() will return false but perf_get_x86_pmu_capability() will still return data from x86_pmu. Likewise, if some of the hotplug callbacks cannot be installed, the contents of x86_pmu will not be reverted. Handle the failure in both cases by clearing x86_pmu if init_hw_perf_events() fails or reverts to software events only.

Co-developed-by: Like Xu <likexu@tencent.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 | perf/x86/intel: Fix the comment about guest LBR support on KVM | Like Xu | 1 | -2/+1
Starting from v5.12, KVM reports guest LBR and extra_regs support when the host has relevant support. Just delete this part of the comment and fix a typo incidentally. Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Message-Id: <20220517154100.29983-2-weijiang.yang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 | perf: x86/core: Add interface to query perfmon_event_map[] directly | Like Xu | 1 | -0/+11
Currently, we have [intel|knc|p4|p6]_perfmon_event_map on the Intel platforms and amd_[f17h]_perfmon_event_map on the AMD platforms. Early, clumsy KVM code and other potential perf_event users may have hard-coded these perfmon maps (e.g., arch/x86/kvm/svm/pmu.c), so it does not make sense to program a common hardware event based on the generic "enum perf_hw_id" when the two tables do not match. Provide an interface for callers outside the perf subsystem to get the counter config based on the perfmon_event_map currently in use; this also helps to save bytes.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <likexu@tencent.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220518132512.37864-10-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
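A userspace analogue of such a query, a generic event index mapped to a raw counter config, can be sketched as below. The encodings are the familiar Intel core ones (they also appear in the sysfs listing further down this log); the function and enum names here are only illustrative analogues of the in-kernel interface.

    #include <stdio.h>

    enum hw_id { HW_CPU_CYCLES, HW_INSTRUCTIONS, HW_CACHE_REFS, HW_CACHE_MISSES,
                 HW_BRANCHES, HW_BRANCH_MISSES, HW_MAX };

    /* Generic event -> raw config, Intel core encodings. */
    static const unsigned long long intel_map[HW_MAX] = {
            [HW_CPU_CYCLES]    = 0x003c,
            [HW_INSTRUCTIONS]  = 0x00c0,
            [HW_CACHE_REFS]    = 0x4f2e,
            [HW_CACHE_MISSES]  = 0x412e,
            [HW_BRANCHES]      = 0x00c4,
            [HW_BRANCH_MISSES] = 0x00c5,
    };

    static unsigned long long get_hw_event_config(int hw_event)
    {
            return (hw_event >= 0 && hw_event < HW_MAX) ? intel_map[hw_event] : 0;
    }

    int main(void)
    {
            printf("cpu-cycles config: %#llx\n", get_hw_event_config(HW_CPU_CYCLES));
            return 0;
    }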
2022-06-08 | KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations | Like Xu | 1 | -2/+9
Guest PEBS will be disabled when some users try to profile KVM and its user-space through the same PEBS facility, OR when host perf doesn't schedule the guest PEBS counter in a one-to-one mapping manner (neither of these is a typical scenario). The PEBS records in the guest DS buffer remain accurate, and the above two restrictions are checked before each vm-entry only if guest PEBS is deemed to be enabled.

Suggested-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-15-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 | KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS | Like Xu | 1 | -0/+8
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, adaptive PEBS is supported. The PEBS_DATA_CFG MSR and the adaptive record enable bits (IA32_PERFEVTSELx.Adaptive_Record and IA32_FIXED_CTR_CTRL.FCx_Adaptive_Record) are also supported. Adaptive PEBS provides software the capability to configure the PEBS records to capture only the data of interest, keeping the record size compact. An overflow of PMCx results in generation of an adaptive PEBS record with state information based on the selections specified in MSR_PEBS_DATA_CFG. By default, the record contains only the Basic group.

When guest adaptive PEBS is enabled, the IA32_PEBS_ENABLE MSR will be added to the perf_guest_switch_msr() list and switched during the VMX transitions just like the CORE_PERF_GLOBAL_CTRL MSR. According to the Intel SDM, software is recommended to use PEBS Baseline when the following is true: IA32_PERF_CAPABILITIES.PEBS_BASELINE[14] && IA32_PERF_CAPABILITIES.PEBS_FMT[11:8] ≥ 4.

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-12-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
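The SDM rule quoted above reduces to a small, runnable predicate over the raw IA32_PERF_CAPABILITIES value; the function name is illustrative.

    #include <stdio.h>

    /* PEBS Baseline is recommended when PEBS_BASELINE[14] is set and
     * PEBS_FMT[11:8] >= 4, per the rule quoted in the commit above. */
    static int pebs_baseline_usable(unsigned long long perf_capabilities)
    {
            unsigned int fmt      = (perf_capabilities >> 8) & 0xf;
            unsigned int baseline = (perf_capabilities >> 14) & 1;

            return baseline && fmt >= 4;
    }

    int main(void)
    {
            printf("%d\n", pebs_baseline_usable(0x4400)); /* fmt=4, baseline=1 -> 1 */
            printf("%d\n", pebs_baseline_usable(0x0400)); /* fmt=4, baseline=0 -> 0 */
            return 0;
    }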
2022-06-08 | KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS | Like Xu | 1 | -1/+9
When CPUID.01H:EDX.DS[21] is set, the IA32_DS_AREA MSR exists and points to the linear address of the first byte of the DS buffer management area, which is used to manage the PEBS records. When guest PEBS is enabled, the MSR_IA32_DS_AREA MSR will be added to the perf_guest_switch_msr() list and switched during the VMX transitions just like the CORE_PERF_GLOBAL_CTRL MSR. A WRMSR to the IA32_DS_AREA MSR raises a #GP(0) if the source register contains a non-canonical address.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-11-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
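A minimal sketch of the canonical-address check behind that #GP(0), assuming 48-bit linear addresses (no LA57): bits 63:47 must all equal bit 47.

    #include <stdio.h>

    static int is_canonical_48(unsigned long long addr)
    {
            /* Sign-extend from bit 47 and compare with the original. */
            long long s = (long long)(addr << 16) >> 16;

            return (unsigned long long)s == addr;
    }

    int main(void)
    {
            printf("%d\n", is_canonical_48(0x00007fffffffffffULL)); /* 1 */
            printf("%d\n", is_canonical_48(0xffff800000000000ULL)); /* 1 */
            printf("%d\n", is_canonical_48(0x0000800000000000ULL)); /* 0: WRMSR would #GP(0) */
            return 0;
    }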
2022-06-08 | KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter | Like Xu | 1 | -1/+1
The PEBS-PDIR facility on Ice Lake server is supported on IA32_FIXED0 only. If the guest configures counter 32 and PEBS is enabled, the PEBS-PDIR facility is supposed to be used, in which case KVM adjusts attr.precise_ip to 3 and requests host perf to assign the exactly requested counter or fail. The CPU model check is also required since some platforms may place the PEBS-PDIR facility in another counter index.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-10-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 | KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS | Like Xu | 1 | -18/+57
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the IA32_PEBS_ENABLE MSR exists and all architecturally enumerated fixed and general-purpose counters have corresponding bits in IA32_PEBS_ENABLE that enable generation of PEBS records. The general-purpose counter bits start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at bit IA32_PEBS_ENABLE[32].

When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR will be added to the perf_guest_switch_msr() list and atomically switched during the VMX transitions just like the CORE_PERF_GLOBAL_CTRL MSR. Based on whether the platform supports x86_pmu.pebs_ept, the way more MSRs are added to arr[] in intel_guest_get_msrs() is also refactored for extensibility.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-8-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
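A runnable sketch of that bit layout, building a full-enable mask from counter counts; the helper name is illustrative, the positions (GP bits from IA32_PEBS_ENABLE[0], fixed bits from IA32_PEBS_ENABLE[32]) are from the commit text.

    #include <stdio.h>

    static unsigned long long pebs_enable_mask(unsigned int num_gp, unsigned int num_fixed)
    {
            unsigned long long gp    = (num_gp    >= 32) ? 0xffffffffULL : (1ULL << num_gp) - 1;
            unsigned long long fixed = (num_fixed >= 32) ? 0xffffffffULL : (1ULL << num_fixed) - 1;

            return gp | (fixed << 32); /* GP at bit 0, fixed at bit 32 */
    }

    int main(void)
    {
            /* e.g. 8 GP + 4 fixed counters */
            printf("%#llx\n", pebs_enable_mask(8, 4)); /* 0xf000000ff */
            return 0;
    }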
2022-06-08 | x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value | Peter Zijlstra (Intel) | 2 | -8/+7
The value of pebs_counter_mask is accessed frequently for repeated use in intel_guest_get_msrs(), so it can be computed once and cached instead of endlessly mucking about with branches.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-7-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 | perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values | Like Xu | 3 | -5/+5
Splitting the logic for determining the guest values is unnecessarily confusing, and potentially fragile. Perf should have full knowledge and control of what values are loaded for the guest. If we change .guest_get_msrs() to take a struct kvm_pmu pointer, then it can generate the full set of guest values by grabbing guest ds_area and pebs_data_cfg. Alternatively, .guest_get_msrs() could take the desired guest MSR values directly (ds_area and pebs_data_cfg), but kvm_pmu is vendor agnostic, so we don't see any reason to not just pass the pointer. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-4-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 | perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest | Like Xu | 1 | -0/+42
With PEBS virtualization, the guest PEBS records get delivered to the guest DS, and the host PMI handler uses perf_guest_cbs->is_in_guest() to distinguish whether the PMI comes from guest code, as is done for Intel PT. No matter how many guest PEBS counters have overflowed, triggering one fake event is enough. The fake event causes the KVM PMI callback to be called, thereby injecting the PEBS overflow PMI into the guest. KVM may inject the PMI with BUFFER_OVF set even if the guest DS is empty; that should really be harmless. The guest PEBS handler will thus retrieve the correct information from its own PEBS records buffer.

Cc: linux-perf-users@vger.kernel.org
Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08 | perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server | Like Xu | 3 | -1/+4
Add support for EPT-Friendly PEBS, a new CPU feature that enlightens PEBS to translate guest linear addresses through EPT, and facilitates handling VM-Exits that occur when accessing PEBS records. More information can be found in the December 2021 release of Intel's SDM, Volume 3, 18.9.5 "EPT-Friendly PEBS".

This new hardware facility, available on Intel Ice Lake Server platforms (and later), makes sure the guest PEBS records will not be lost. KVM will check for it through perf_get_x86_pmu_capability() instead of hard-coding the CPU models in the KVM code. If it is supported, the guest PEBS capability will be exposed to the guest. Guest PEBS can be enabled when and only when "EPT-Friendly PEBS" is supported and EPT is enabled.

Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220411101946.20262-2-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-05 | Merge tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -1/+1
Pull perf fixes from Thomas Gleixner:
 - Make the ICL event constraints match reality
 - Remove an unused local variable

* tag 'perf-urgent-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Remove unused local variable
  perf/x86/intel: Fix event constraints for ICL
2022-05-25 | perf/x86/intel: Fix event constraints for ICL | Kan Liang | 1 | -1/+1
According to the latest event list, the event encoding 0x55 INST_DECODED.DECODERS and 0x56 UOPS_DECODED.DEC0 are only available on the first 4 counters. Add them into the event constraints table. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220525133952.1660658-1-kan.liang@linux.intel.com
2022-05-25 | perf/x86/Kconfig: Fix indentation in the Kconfig file | Juerg Haefliger | 1 | -6/+6
The convention for indentation seems to be a single tab. Help text is further indented by an additional two spaces. Fix the lines that violate these rules.

Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20220525133949.53730-1-juerg.haefliger@canonical.com
2022-05-19 | perf/x86/amd/core: Fix reloading events for SVM | Sandipan Das | 1 | -4/+20
Commit 1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabled") addresses an issue in which the Host-Only bit in the counter control registers needs to be masked off when SVM is not enabled. The events need to be reloaded whenever SVM is enabled or disabled for a CPU and this requires the PERF_CTL registers to be reprogrammed using {enable,disable}_all(). However, PerfMonV2 variants of these functions do not reprogram the PERF_CTL registers. Hence, the legacy enable_all() function should also be called. Fixes: 9622e67e3980 ("perf/x86/amd/core: Add PerfMonV2 counter control") Reported-by: Like Xu <likexu@tencent.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220518084327.464005-1-sandipan.das@amd.com
2022-05-18 | perf/x86/amd: Run AMD BRS code only on supported hw | Borislav Petkov | 1 | -1/+4
This fires on a Fam16h machine here:

  unchecked MSR access error: WRMSR to 0xc000010f (tried to write 0x0000000000000018) \
     at rIP: 0xffffffff81007db1 (amd_brs_reset+0x11/0x50)
  Call Trace:
  <TASK>
   amd_pmu_cpu_starting
   ? x86_pmu_dead_cpu
   x86_pmu_starting_cpu
   cpuhp_invoke_callback
   ? x86_pmu_starting_cpu
   ? x86_pmu_dead_cpu
   cpuhp_issue_call
   ? x86_pmu_starting_cpu
   __cpuhp_setup_state_cpuslocked
   ? x86_pmu_dead_cpu
   ? x86_pmu_starting_cpu
   __cpuhp_setup_state
   ? map_vsyscall
   init_hw_perf_events
   ? map_vsyscall
   do_one_initcall
   ? _raw_spin_unlock_irqrestore
   ? try_to_wake_up
   kernel_init_freeable
   ? rest_init
   kernel_init
   ret_from_fork

because that CPU hotplug callback gets executed on any AMD CPU - not only on the BRS-enabled ones. Check the BRS feature bit properly.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-By: Stephane Eranian <eranian@google.com>
Link: https://lkml.kernel.org/r/20220516154838.7044-1-bp@alien8.de
2022-05-18 | perf/x86/amd: Fix AMD BRS period adjustment | Peter Zijlstra | 3 | -25/+13
There are two problems with the current amd_brs_adjust_period() code:

 - it isn't in fact AMD-specific and will always adjust the period;

 - it adjusts the period, while it should only adjust the event count, resulting in reporting a short period.

Fix this by using x86_pmu.limit_period; this makes it specific to the AMD BRS case and ensures only the event count is adjusted while the reported period is unmodified.

Fixes: ba2fe7500845 ("perf/x86/amd: Add AMD branch sampling period adjustment")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-05-12 | perf/x86/amd: Remove unused variable 'hwc' | Zucheng Zheng | 1 | -3/+0
'hwc' is never used in amd_pmu_enable_all(), so remove it. Signed-off-by: Zucheng Zheng <zhengzucheng@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220421111031.174698-1-zhengzucheng@huawei.com
2022-05-11 | perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute | Ravi Bangoria | 1 | -0/+21
A PMU driver can advertise certain features via a capability attribute (the 'caps' sysfs directory) which can be consumed by userspace tools like perf. Add a zen4_ibs_extensions capability attribute for the IBS PMUs. This attribute will be enabled when CPUID_Fn8000001B_EAX[11] is set.

With the patch on Zen4:

  $ ls /sys/bus/event_source/devices/ibs_op/caps
  zen4_ibs_extensions

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-5-ravi.bangoria@amd.com
2022-05-11 | perf/amd/ibs: Add support for L3 miss filtering | Ravi Bangoria | 1 | -7/+60
IBS L3 miss filtering works by tagging an instruction on IBS counter overflow and generating an NMI if the tagged instruction causes an L3 miss. Samples without an L3 miss are discarded and the counter is reset with a random value (between 1-15 for the fetch PMU and 1-127 for the op PMU). This helps reduce sampling overhead when the user is interested only in such samples. One use case of such filtered samples is to feed data to a page-migration daemon in tiered memory systems.

Add support for L3 miss filtering in the IBS driver via the new PMU attribute "l3missonly". Example usage:

  # perf record -a -e ibs_op/l3missonly=1/ --raw-samples sleep 5

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-4-ravi.bangoria@amd.com
2022-05-11 | perf/amd/ibs: Use ->is_visible callback for dynamic attributes | Ravi Bangoria | 1 | -24/+54
Currently, some attributes are added at build time whereas others are added at boot time depending on IBS PMU capabilities. Instead, we can just add all attribute groups at build time but hide individual groups at boot time using the more appropriate ->is_visible() callback. Also, struct perf_ibs has a bunch of fields for PMU attributes which just pass on the pointer and do nothing else. Remove them.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-3-ravi.bangoria@amd.com
2022-05-11 | perf/amd/ibs: Cascade pmu init functions' return value | Ravi Bangoria | 1 | -8/+29
The IBS PMU initialization code ignores the return values provided by callee functions. Fix it.

Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220509044914.1473-2-ravi.bangoria@amd.com
2022-05-11 | perf/x86/uncore: Add new Alder Lake and Raptor Lake support | Kan Liang | 2 | -0/+54
From the perspective of the uncore PMU, nothing has changed for the new Alder Lake N and Raptor Lake P. Add the new PCI IDs of the IMC.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220504194413.1003071-5-kan.liang@linux.intel.com
2022-05-11 | perf/x86/uncore: Clean up uncore_pci_ids[] | Kan Liang | 1 | -316/+86
The initialization code to assign PCI IDs for different platforms is similar. Add the new macros to reduce the redundant code. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220504194413.1003071-4-kan.liang@linux.intel.com
2022-05-11 | perf/x86/cstate: Add new Alder Lake and Raptor Lake support | Kan Liang | 1 | -0/+2
From the perspective of the Intel cstate residency counters, nothing has changed for the new Alder Lake N and Raptor Lake P.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220504194413.1003071-3-kan.liang@linux.intel.com
2022-05-11 | perf/x86/msr: Add new Alder Lake and Raptor Lake support | Kan Liang | 1 | -0/+2
The new Alder Lake N and Raptor Lake P also support PPERF and SMI_COUNT MSRs. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220504194413.1003071-2-kan.liang@linux.intel.com
2022-05-11 | perf/x86: Add new Alder Lake and Raptor Lake support | Kan Liang | 1 | -0/+2
From the PMU's perspective, there is no difference for the new Alder Lake N and Raptor Lake P.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220504194413.1003071-1-kan.liang@linux.intel.com
2022-05-11 | Merge branch 'v5.18-rc5' | Peter Zijlstra | 5 | -12/+42
Obtain the new INTEL_FAM6 stuff required. Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-05-10 | perf/amd/ibs: Use interrupt regs ip for stack unwinding | Ravi Bangoria | 1 | -0/+18
IbsOpRip is recorded when the IBS interrupt is triggered, but there is a skid from the time the IBS interrupt gets triggered to the time the interrupt is presented to the core. Meanwhile, the processor will have moved ahead, so IbsOpRip will be inconsistent with the rsp and rbp recorded as part of the interrupt regs. This causes issues while unwinding the stack using the ORC unwinder, which needs consistent rip, rsp and rbp. Fix this by using the rip from the interrupt regs instead of IbsOpRip for stack unwinding.

Fixes: ee9f8fce99640 ("x86/unwind: Add the ORC unwinder")
Reported-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220429051441.14251-1-ravi.bangoria@amd.com
2022-05-04 | perf/x86/amd/core: Add PerfMonV2 overflow handling | Sandipan Das | 1 | -11/+133
If AMD Performance Monitoring Version 2 (PerfMonV2) is supported, use a new scheme to process Core PMC overflows in the NMI handler using the new global control and status registers. This will be bypassed on unsupported hardware (x86_pmu.version < 2).

In x86_pmu_handle_irq(), overflows are detected by testing the contents of the PERF_CTR register for each active PMC in a loop. The new scheme instead inspects the overflow bits of the global status register. The Performance Counter Global Status (PerfCntrGlobalStatus) register has overflow (PerfCntrOvfl) bits for each PMC. This is, however, a read-only MSR. To acknowledge that overflows have been processed, the NMI handler must clear the bits by writing to the PerfCntrGlobalStatusClr register.

In x86_pmu_handle_irq(), PMCs counting the same event that are started and stopped at the same time record slightly different counts due to delays in between reads from the PERF_CTR registers. This is fixed by stopping and starting the PMCs at the same time with a single write to the Performance Counter Global Control (PerfCntrGlobalCtl) register upon entering and before exiting the NMI handler.

Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/f20b7e4da0b0a83bdbe05857f354146623bc63ab.1650515382.git.sandipan.das@amd.com
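The status-driven part of the scheme can be sketched as a runnable bit-walk over a PerfCntrGlobalStatus snapshot; handle_overflow() and the acknowledge step are stand-ins for driver internals, and the real code runs inside the NMI handler rather than userspace.

    #include <stdio.h>

    static void handle_overflow(int idx) /* stand-in per-PMC handler */
    {
            printf("overflow on PMC %d\n", idx);
    }

    /* Walk the set PerfCntrOvfl bits instead of polling every PERF_CTR. */
    static int process_global_status(unsigned long long status)
    {
            int handled = 0;

            while (status) {
                    int idx = __builtin_ctzll(status);

                    handle_overflow(idx);
                    status &= status - 1; /* clear lowest set bit */
                    handled++;
            }
            /* Driver equivalent: write the handled mask to the
             * PerfCntrGlobalStatusClr register to acknowledge. */
            return handled;
    }

    int main(void)
    {
            process_global_status(0x5); /* PMC 0 and PMC 2 overflowed */
            return 0;
    }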
2022-05-04 | perf/x86/amd/core: Add PerfMonV2 counter control | Sandipan Das | 1 | -5/+45
If AMD Performance Monitoring Version 2 (PerfMonV2) is supported, use a new scheme to manage the Core PMCs using the new global control and status registers. This will be bypassed on unsupported hardware (x86_pmu.version < 2). Currently, all PMCs have dedicated control (PERF_CTL) and counter (PERF_CTR) registers. For a given PMC, the enable (En) bit of its PERF_CTL register is used to start or stop counting. The Performance Counter Global Control (PerfCntrGlobalCtl) register has enable (PerfCntrEn) bits for each PMC. For a PMC to start counting, both PERF_CTL and PerfCntrGlobalCtl enable bits must be set. If either of those are cleared, the PMC stops counting. In x86_pmu_{en,dis}able_all(), the PERF_CTL registers of all active PMCs are written to in a loop. Ideally, PMCs counting the same event that were started and stopped at the same time should record the same counts. Due to delays in between writes to the PERF_CTL registers across loop iterations, the PMCs cannot be enabled or disabled at the same instant and hence, record slightly different counts. This is fixed by enabling or disabling all active PMCs at the same time with a single write to the PerfCntrGlobalCtl register. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/dfe8e934074aaabc6ba748dfaccd0a77c974bb82.1650515382.git.sandipan.das@amd.com
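The contrast between the two schemes can be sketched as below. wrmsr_stub() is a stand-in, and the bit positions (the PERF_CTL En bit at 22, one PerfCntrEn bit per PMC from bit 0 of PerfCntrGlobalCtl) are assumptions for illustration, not taken from the commit text.

    #include <stdio.h>

    static void wrmsr_stub(const char *msr, unsigned long long val)
    {
            printf("wrmsr %-18s <- %#llx\n", msr, val);
    }

    /* Legacy: one write per PMC, so enables are skewed in time. */
    static void legacy_enable_all(int num_counters)
    {
            char name[32];

            for (int i = 0; i < num_counters; i++) {
                    snprintf(name, sizeof(name), "PERF_CTL%d", i);
                    wrmsr_stub(name, 1ULL << 22); /* assumed En bit */
            }
    }

    /* PerfMonV2: a single write flips every enable bit at once. */
    static void v2_enable_all(int num_counters)
    {
            wrmsr_stub("PerfCntrGlobalCtl", (1ULL << num_counters) - 1);
    }

    int main(void)
    {
            legacy_enable_all(6);
            v2_enable_all(6);
            return 0;
    }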
2022-05-04 | perf/x86/amd/core: Detect available counters | Sandipan Das | 1 | -0/+6
If AMD Performance Monitoring Version 2 (PerfMonV2) is supported, use CPUID leaf 0x80000022 EBX to detect the number of Core PMCs. This offers more flexibility if the counts change in later processor families. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/68a6d9688df189267db26530378870edd34f7b06.1650515382.git.sandipan.das@amd.com
2022-05-04 | perf/x86/amd/core: Detect PerfMonV2 support | Sandipan Das | 1 | -0/+27
AMD Performance Monitoring Version 2 (PerfMonV2) introduces some new Core PMU features such as detection of the number of available PMCs and managing PMCs using global registers namely, PerfCntrGlobalCtl and PerfCntrGlobalStatus. Clearing PerfCntrGlobalCtl and PerfCntrGlobalStatus ensures that all PMCs are inactive and have no pending overflows when CPUs are onlined or offlined. The PMU version (x86_pmu.version) now indicates PerfMonV2 support and will be used to bypass the new features on unsupported processors. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/dc8672ecbddff394e088ca8abf94b089b8ecc2e7.1650515382.git.sandipan.das@amd.com
2022-04-19 | perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support | Zhang Rui | 1 | -3/+4
From the perspective of the Intel cstate residency counters, SAPPHIRERAPIDS_X is the same as ICELAKE_X. Share the code with it, and update the comments for SAPPHIRERAPIDS_X.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lkml.kernel.org/r/20220415104520.2737004-1-rui.zhang@intel.com
2022-04-05 | perf/x86: Unify format of events sysfs show | Yang Jihong | 1 | -1/+1
The sysfs show formats of the files in /sys/devices/cpu/events/ are not unified: some end with "\n" and some do not. Modify the sysfs show format of events defined by EVENT_ATTR_STR to end with "\n".

Before:

  $ ls /sys/devices/cpu/events/* | xargs -i sh -c 'echo -n "{}: "; cat -A {}; echo'
  branch-instructions: event=0xc4$
  branch-misses: event=0xc5$
  bus-cycles: event=0x3c,umask=0x01$
  cache-misses: event=0x2e,umask=0x41$
  cache-references: event=0x2e,umask=0x4f$
  cpu-cycles: event=0x3c$
  instructions: event=0xc0$
  ref-cycles: event=0x00,umask=0x03$
  slots: event=0x00,umask=0x4
  topdown-bad-spec: event=0x00,umask=0x81
  topdown-be-bound: event=0x00,umask=0x83
  topdown-fe-bound: event=0x00,umask=0x82
  topdown-retiring: event=0x00,umask=0x80

After:

  $ ls /sys/devices/cpu/events/* | xargs -i sh -c 'echo -n "{}: "; cat -A {}; echo'
  /sys/devices/cpu/events/branch-instructions: event=0xc4$
  /sys/devices/cpu/events/branch-misses: event=0xc5$
  /sys/devices/cpu/events/bus-cycles: event=0x3c,umask=0x01$
  /sys/devices/cpu/events/cache-misses: event=0x2e,umask=0x41$
  /sys/devices/cpu/events/cache-references: event=0x2e,umask=0x4f$
  /sys/devices/cpu/events/cpu-cycles: event=0x3c$
  /sys/devices/cpu/events/instructions: event=0xc0$
  /sys/devices/cpu/events/ref-cycles: event=0x00,umask=0x03$
  /sys/devices/cpu/events/slots: event=0x00,umask=0x4$

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220324031957.135595-1-yangjihong1@huawei.com
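A userspace sketch of the fix's shape: an EVENT_ATTR_STR-style show routine that always terminates the buffer with "\n", so every file under events/ reads uniformly. The function name and signature are illustrative, not the kernel's.

    #include <stdio.h>

    static int event_show(char *page, unsigned long size, const char *config)
    {
            /* Always newline-terminate, regardless of the config string. */
            return snprintf(page, size, "%s\n", config);
    }

    int main(void)
    {
            char page[4096];

            event_show(page, sizeof(page), "event=0x00,umask=0x80");
            fputs(page, stdout); /* "topdown-retiring" style value, now ending in '\n' */
            return 0;
    }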
2022-04-05 | perf/x86/amd: Add idle hooks for branch sampling | Stephane Eranian | 3 | -0/+38
On AMD Fam19h Zen3, the branch sampling (BRS) feature must be disabled before entering low power and re-enabled (if it was active) when returning from low power. Otherwise, the NMI interrupt may be held up for too long and cause problems. Stopping BRS will cause the NMI to be delivered if it was held up.

Define a perf_amd_brs_lopwr_cb() callback to stop/restart BRS. The callback is protected by a jump label which is enabled only when AMD BRS is detected. In all other cases, the callback is never called.

Signed-off-by: Stephane Eranian <eranian@google.com>
[peterz: static_call() and build fixes]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220322221517.2510440-10-eranian@google.com
2022-04-05 | perf/x86/amd: Make Zen3 branch sampling opt-in | Stephane Eranian | 3 | -11/+49
Add a kernel config option CONFIG_PERF_EVENTS_AMD_BRS to make the support for AMD Zen3 Branch Sampling (BRS) an opt-in compile time option. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220322221517.2510440-8-eranian@google.com