aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/events/intel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-11-02perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]Kan Liang1-0/+1
The intel_pebs_isolation quirk checks both model number and stepping. Cooper Lake has a different stepping (11) than the other Skylake Xeon. It cannot benefit from the optimization in commit 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering"). Add the stepping of Cooper Lake into the isolation_ucodes[] table. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154550.571663-1-kan.liang@linux.intel.com
2022-11-02perf/x86/intel: Fix pebs event constraints for SPRKan Liang1-2/+7
According to the latest event list, update the MEM_INST_RETIRED events which support the DataLA facility for SPR. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154119.571386-2-kan.liang@linux.intel.com
2022-11-02perf/x86/intel: Fix pebs event constraints for ICLKan Liang1-2/+7
According to the latest event list, update the MEM_INST_RETIRED events which support the DataLA facility. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Reported-by: Jannis Klinkenberg <jannis.klinkenberg@rwth-aachen.de> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221031154119.571386-1-kan.liang@linux.intel.com
2022-10-20perf/x86/intel/lbr: Use setup_clear_cpu_cap() instead of clear_cpu_cap()Maxim Levitsky1-1/+1
clear_cpu_cap(&boot_cpu_data) is very similar to setup_clear_cpu_cap() except that the latter also sets a bit in 'cpu_caps_cleared' which later clears the same cap in secondary cpus, which is likely what is meant here. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20220718141123.136106-2-mlevitsk@redhat.com
2022-10-10Merge tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds6-334/+134
Pull perf events updates from Ingo Molnar: "PMU driver updates: - Add AMD Last Branch Record Extension Version 2 (LbrExtV2) feature support for Zen 4 processors. - Extend the perf ABI to provide branch speculation information, if available, and use this on CPUs that have it (eg. LbrExtV2). - Improve Intel PEBS TSC timestamp handling & integration. - Add Intel Raptor Lake S CPU support. - Add 'perf mem' and 'perf c2c' memory profiling support on AMD CPUs by utilizing IBS tagged load/store samples. - Clean up & optimize various x86 PMU details. HW breakpoints: - Big rework to optimize the code for systems with hundreds of CPUs and thousands of breakpoints: - Replace the nr_bp_mutex global mutex with the bp_cpuinfo_sem per-CPU rwsem that is read-locked during most of the key operations. - Improve the O(#cpus * #tasks) logic in toggle_bp_slot() and fetch_bp_busy_slots(). - Apply micro-optimizations & cleanups. - Misc cleanups & enhancements" * tag 'perf-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) perf/hw_breakpoint: Annotate tsk->perf_event_mutex vs ctx->mutex perf: Fix pmu_filter_match() perf: Fix lockdep_assert_event_ctx() perf/x86/amd/lbr: Adjust LBR regardless of filtering perf/x86/utils: Fix uninitialized var in get_branch_type() perf/uapi: Define PERF_MEM_SNOOPX_PEER in kernel header file perf/x86/amd: Support PERF_SAMPLE_PHY_ADDR perf/x86/amd: Support PERF_SAMPLE_ADDR perf/x86/amd: Support PERF_SAMPLE_{WEIGHT|WEIGHT_STRUCT} perf/x86/amd: Support PERF_SAMPLE_DATA_SRC perf/x86/amd: Add IBS OP_DATA2 DataSrc bit definitions perf/mem: Introduce PERF_MEM_LVLNUM_{EXTN_MEM|IO} perf/x86/uncore: Add new Raptor Lake S support perf/x86/cstate: Add new Raptor Lake S support perf/x86/msr: Add new Raptor Lake S support perf/x86: Add new Raptor Lake S support bpf: Check flags for branch stack in bpf_read_branch_records helper perf, hw_breakpoint: Fix use-after-free if perf_event_open() fails perf: Use sample_flags for raw_data perf: Use sample_flags for addr ...
2022-09-29perf/x86/uncore: Add new Raptor Lake S supportKan Liang1-0/+1
From the perspective of the uncore PMU, the new Raptor Lake S is the same as the other hybrid {ALDER,RAPTOP}LAKE. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220928153331.3757388-4-kan.liang@linux.intel.com
2022-09-29perf/x86/cstate: Add new Raptor Lake S supportKan Liang1-0/+1
From the perspective of Intel cstate residency counters, the new Raptor Lake S is the same as the other hybrid {ALDER,RAPTOP}LAKE. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220928153331.3757388-3-kan.liang@linux.intel.com
2022-09-29perf/x86: Add new Raptor Lake S supportKan Liang1-0/+1
From PMU's perspective, the new Raptor Lake S is the same as the other of hybrid {ALDER,RAPTOP}LAKE. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220928153331.3757388-1-kan.liang@linux.intel.com
2022-09-29Merge branch 'v6.0-rc7'Peter Zijlstra4-8/+38
Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-09-27perf: Use sample_flags for addrNamhyung Kim1-2/+6
Use the new sample_flags to indicate whether the addr field is filled by the PMU driver. As most PMU drivers pass 0, it can set the flag only if it has a non-zero value. And use 0 in perf_sample_output() if it's not filled already. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220921220032.2858517-1-namhyung@kernel.org
2022-09-13perf: Kill __PERF_SAMPLE_CALLCHAIN_EARLYNamhyung Kim1-3/+0
There's no in-tree user anymore. Let's get rid of it. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220908214104.3851807-3-namhyung@kernel.org
2022-09-13perf: Use sample_flags for callchainNamhyung Kim1-2/+6
So that it can call perf_callchain() only if needed. Historically it used __PERF_SAMPLE_CALLCHAIN_EARLY but we can do that with sample_flags in the struct perf_sample_data. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220908214104.3851807-1-namhyung@kernel.org
2022-09-07perf/x86/intel: Optimize FIXED_CTR_CTRL accessKan Liang1-9/+13
All the fixed counters share a fixed control register. The current perf reads and re-writes the fixed control register for each fixed counter disable/enable, which is unnecessary. When changing the fixed control register, the entire PMU must be disabled via the global control register. The changing cannot be taken effect until the entire PMU is re-enabled. Only updating the fixed control register once right before the entire PMU re-enabling is enough. The read of the fixed control register is not necessary either. The value can be cached in the per CPU cpu_hw_events. Test results: Counting all the fixed counters with the perf bench sched pipe as below on a SPR machine. $perf stat -e cycles,instructions,ref-cycles,slots --no-inherit -- taskset -c 1 perf bench sched pipe The Total elapsed time reduces from 5.36s (without the patch) to 4.99s (with the patch), which is ~6.9% improvement. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220804140729.2951259-1-kan.liang@linux.intel.com
2022-09-07perf/x86/p4: Remove perfctr_second_write quirkPeter Zijlstra1-10/+27
Now that we have a x86_pmu::set_period() method, use it to remove the perfctr_second_write quirk from the generic code. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.839502514@infradead.org
2022-09-07perf/x86/intel: Remove x86_pmu::update_topdown_eventPeter Zijlstra1-10/+12
Now that it is all internal to the intel driver, remove x86_pmu::update_topdown_event. Assumes that is_topdown_count(event) can only be true when the hardware has topdown stuff and the function is set. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.771635301@infradead.org
2022-09-07perf/x86/intel: Remove x86_pmu::set_topdown_event_periodPeter Zijlstra1-6/+10
Now that it is all internal to the intel driver, remove x86_pmu::set_topdown_event_period. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.706354189@infradead.org
2022-09-07perf/x86: Change x86_pmu::limit_period signaturePeter Zijlstra1-11/+8
In preparation for making it a static_call, change the signature. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.573713839@infradead.org
2022-09-07perf/x86/intel: Move the topdown stuff into the intel driverPeter Zijlstra1-3/+23
Use the new x86_pmu::{set_period,update}() methods to push the topdown stuff into the Intel driver, where it belongs. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220829101321.505933457@infradead.org
2022-09-06perf: Use sample_flags for txnKan Liang1-2/+6
Use the new sample_flags to indicate whether the txn field is filled by the PMU driver. Remove the txn field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-7-kan.liang@linux.intel.com
2022-09-06perf: Use sample_flags for data_srcKan Liang1-2/+6
Use the new sample_flags to indicate whether the data_src field is filled by the PMU driver. Remove the data_src field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-6-kan.liang@linux.intel.com
2022-09-06perf: Use sample_flags for weightKan Liang1-3/+7
Use the new sample_flags to indicate whether the weight field is filled by the PMU driver. Remove the weight field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-5-kan.liang@linux.intel.com
2022-09-06perf: Use sample_flags for branch stackKan Liang2-2/+7
Use the new sample_flags to indicate whether the branch stack is filled by the PMU driver. Remove the br_stack from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-4-kan.liang@linux.intel.com
2022-09-06perf/x86/intel/pebs: Fix PEBS timestamps overwrittenKan Liang1-2/+6
The PEBS TSC-based timestamps do not appear correctly in the final perf.data output file from perf record. The data->time field setup by PEBS in the setup_pebs_fixed_sample_data() is later overwritten by perf_events generic code in perf_prepare_sample(). There is an ordering problem. Set the sample flags when the data->time is updated by PEBS. The data->time field will not be overwritten anymore. Reported-by: Andreas Kogler <andreas.kogler.0x@gmail.com> Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220901130959.1285717-3-kan.liang@linux.intel.com
2022-09-01Merge tag 'kvm-s390-master-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEADPaolo Bonzini4-7/+36
PCI interpretation compile fixes
2022-09-01perf/x86/core: Completely disable guest PEBS via guest's global_ctrlLike Xu1-1/+2
When a guest PEBS counter is cross-mapped by a host counter, software will remove the corresponding bit in the arr[global_ctrl].guest and expect hardware to perform a change of state "from enable to disable" via the msr_slot[] switch during the vmx transaction. The real world is that if user adjust the counter overflow value small enough, it still opens a tiny race window for the previously PEBS-enabled counter to write cross-mapped PEBS records into the guest's PEBS buffer, when arr[global_ctrl].guest has been prioritised (switch_msr_special stuff) to switch into the enabled state, while the arr[pebs_enable].guest has not. Close this window by clearing invalid bits in the arr[global_ctrl].guest. Cc: linux-perf-users@vger.kernel.org Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <seanjc@google.com> Fixes: 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220831033524.58561-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-09-01perf/x86/core: Completely disable guest PEBS via guest's global_ctrlLike Xu1-1/+2
When a guest PEBS counter is cross-mapped by a host counter, software will remove the corresponding bit in the arr[global_ctrl].guest and expect hardware to perform a change of state "from enable to disable" via the msr_slot[] switch during the vmx transaction. The real world is that if user adjust the counter overflow value small enough, it still opens a tiny race window for the previously PEBS-enabled counter to write cross-mapped PEBS records into the guest's PEBS buffer, when arr[global_ctrl].guest has been prioritised (switch_msr_special stuff) to switch into the enabled state, while the arr[pebs_enable].guest has not. Close this window by clearing invalid bits in the arr[global_ctrl].guest. Fixes: 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220831033524.58561-1-likexu@tencent.com
2022-09-01perf/x86/intel: Fix unchecked MSR access error for Alder Lake NKan Liang2-3/+46
For some Alder Lake N machine, the below unchecked MSR access error may be triggered. [ 0.088017] rcu: Hierarchical SRCU implementation. [ 0.088017] unchecked MSR access error: WRMSR to 0x38f (tried to write 0x0001000f0000003f) at rIP: 0xffffffffb5684de8 (native_write_msr+0x8/0x30) [ 0.088017] Call Trace: [ 0.088017] <TASK> [ 0.088017] __intel_pmu_enable_all.constprop.46+0x4a/0xa0 The Alder Lake N only has e-cores. The X86_FEATURE_HYBRID_CPU flag is not set. The perf cannot retrieve the correct CPU type via get_this_hybrid_cpu_type(). The model specific get_hybrid_cpu_type() is hardcode to p-core. The wrong CPU type is given to the PMU of the Alder Lake N. Since Alder Lake N isn't in fact a hybrid CPU, remove ALDERLAKE_N from the rest of {ALDER,RAPTOP}LAKE and create a non-hybrid PMU setup. The differences between Gracemont and the previous Tremont are, - Number of GP counters - Load and store latency Events - PEBS event_constraints - Instruction Latency support - Data source encoding - Memory access latency encoding Fixes: c2a960f7c574 ("perf/x86: Add new Alder Lake and Raptor Lake support") Reported-by: Jianfeng Gao <jianfeng.gao@intel.com> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220831142702.153110-1-kan.liang@linux.intel.com
2022-08-27perf/x86: Move branch classifierSandipan Das1-273/+0
Commit 3e702ff6d1ea ("perf/x86: Add LBR software filter support for Intel CPUs") introduces a software branch filter which complements the hardware branch filter and adds an x86 branch classifier. Move the branch classifier to arch/x86/events/ so that it can be utilized by other vendors for branch record filtering. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/bae5b95470d6bd49f40954bd379f414f5afcb965.1660211399.git.sandipan.das@amd.com
2022-08-27perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMUStephane Eranian1-1/+17
Existing code was generating bogus counts for the SNB IMC bandwidth counters: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000327813 1,024.03 MiB uncore_imc/data_reads/ 1.000327813 20.73 MiB uncore_imc/data_writes/ 2.000580153 261,120.00 MiB uncore_imc/data_reads/ 2.000580153 23.28 MiB uncore_imc/data_writes/ The problem was introduced by commit: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Where the read_counter callback was replace to point to the generic uncore_mmio_read_counter() function. The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in MMIO. But uncore_mmio_read_counter() is using a readq() call to read from MMIO therefore reading 64-bit from MMIO. Although this is okay for the uncore_perf_event_update() function because it is shifting the value based on the actual counter width to compute a delta, it is not okay for the uncore_pmu_event_start() which is simply reading the counter and therefore priming the event->prev_count with a bogus value which is responsible for causing bogus deltas in the perf stat command above. The fix is to reintroduce the custom callback for read_counter for the SNB IMC PMU and use readl() instead of readq(). With the change the output of perf stat is back to normal: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000120987 296.94 MiB uncore_imc/data_reads/ 1.000120987 138.42 MiB uncore_imc/data_writes/ 2.000403144 175.91 MiB uncore_imc/data_reads/ 2.000403144 68.50 MiB uncore_imc/data_writes/ Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com
2022-08-19perf/x86/intel: Fix pebs event constraints for ADLKan Liang1-1/+1
According to the latest event list, the LOAD_LATENCY PEBS event only works on the GP counter 0 and 1 for ADL and RPL. Update the pebs event constraints table. Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: Ammy Yi <ammy.yi@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220818184429.2355857-1-kan.liang@linux.intel.com
2022-08-19perf/x86/intel/ds: Fix precise store latency handlingStephane Eranian1-1/+9
With the existing code in store_latency_data(), the memory operation (mem_op) returned to the user is always OP_LOAD where in fact, it should be OP_STORE. This comes from the fact that the function is simply grabbing the information from a data source map which covers only load accesses. Intel 12th gen CPU offers precise store sampling that captures both the data source and latency. Therefore it can use the data source mapping table but must override the memory operation to reflect stores instead of loads. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220818054613.1548130-1-eranian@google.com
2022-08-19perf/x86/core: Set pebs_capable and PMU_FL_PEBS_ALL for the BaselinePeter Zijlstra2-4/+1
The SDM explicitly states that PEBS Baseline implies Extended PEBS. For cpu model forward compatibility (e.g. on ICX, SPR, ADL), it's safe to stop doing FMS table thing such as setting pebs_capable and PMU_FL_PEBS_ALL since it's already set in the intel_ds_init(). The Goldmont Plus is the only platform which supports extended PEBS but doesn't have Baseline. Keep the status quo. Reported-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20220816114057.51307-1-likexu@tencent.com
2022-08-19perf/x86/lbr: Enable the branch type for the Arch LBR by defaultKan Liang1-0/+8
On the platform with Arch LBR, the HW raw branch type encoding may leak to the perf tool when the SAVE_TYPE option is not set. In the intel_pmu_store_lbr(), the HW raw branch type is stored in lbr_entries[].type. If the SAVE_TYPE option is set, the lbr_entries[].type will be converted into the generic PERF_BR_* type in the intel_pmu_lbr_filter() and exposed to the user tools. But if the SAVE_TYPE option is NOT set by the user, the current perf kernel doesn't clear the field. The HW raw branch type leaks. There are two solutions to fix the issue for the Arch LBR. One is to clear the field if the SAVE_TYPE option is NOT set. The other solution is to unconditionally convert the branch type and expose the generic type to the user tools. The latter is implemented here, because - The branch type is valuable information. I don't see a case where you would not benefit from the branch type. (Stephane Eranian) - Not having the branch type DOES NOT save any space in the branch record (Stephane Eranian) - The Arch LBR HW can retrieve the common branch types from the LBR_INFO. It doesn't require the high overhead SW disassemble. Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-29/+131
Pull kvm updates from Paolo Bonzini: "Quite a large pull request due to a selftest API overhaul and some patches that had come in too late for 5.19. ARM: - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure - Disagregation of the vcpu flags in separate sets to better track their use model. - A fix for the GICv2-on-v3 selftest - A small set of cosmetic fixes RISC-V: - Track ISA extensions used by Guest using bitmap - Added system instruction emulation framework - Added CSR emulation framework - Added gfp_custom flag in struct kvm_mmu_memory_cache - Added G-stage ioremap() and iounmap() functions - Added support for Svpbmt inside Guest s390: - add an interface to provide a hypervisor dump for secure guests - improve selftests to use TAP interface - enable interpretive execution of zPCI instructions (for PCI passthrough) - First part of deferred teardown - CPU Topology - PV attestation - Minor fixes x86: - Permit guests to ignore single-bit ECC errors - Intel IPI virtualization - Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS - PEBS virtualization - Simplify PMU emulation by just using PERF_TYPE_RAW events - More accurate event reinjection on SVM (avoid retrying instructions) - Allow getting/setting the state of the speaker port data bit - Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent - "Notify" VM exit (detect microarchitectural hangs) for Intel - Use try_cmpxchg64 instead of cmpxchg64 - Ignore benign host accesses to PMU MSRs when PMU is disabled - Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior - Allow NX huge page mitigation to be disabled on a per-vm basis - Port eager page splitting to shadow MMU as well - Enable CMCI capability by default and handle injected UCNA errors - Expose pid of vcpu threads in debugfs - x2AVIC support for AMD - cleanup PIO emulation - Fixes for LLDT/LTR emulation - Don't require refcounted "struct page" to create huge SPTEs - Miscellaneous cleanups: - MCE MSR emulation - Use separate namespaces for guest PTEs and shadow PTEs bitmasks - PIO emulation - Reorganize rmap API, mostly around rmap destruction - Do not workaround very old KVM bugs for L0 that runs with nesting enabled - new selftests API for CPUID Generic: - Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache - new selftests API using struct kvm_vcpu instead of a (vm, id) tuple" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (606 commits) selftests: kvm: set rax before vmcall selftests: KVM: Add exponent check for boolean stats selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test selftests: KVM: Check stat name before other fields KVM: x86/mmu: remove unused variable RISC-V: KVM: Add support for Svpbmt inside Guest/VM RISC-V: KVM: Use PAGE_KERNEL_IO in kvm_riscv_gstage_ioremap() RISC-V: KVM: Add G-stage ioremap() and iounmap() functions KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache RISC-V: KVM: Add extensible CSR emulation framework RISC-V: KVM: Add extensible system instruction emulation framework RISC-V: KVM: Factor-out instruction emulation into separate sources RISC-V: KVM: move preempt_disable() call in kvm_arch_vcpu_ioctl_run RISC-V: KVM: Make kvm_riscv_guest_timer_init a void function RISC-V: KVM: Fix variable spelling mistake RISC-V: KVM: Improve ISA extension by using a bitmap KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs() KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT KVM: x86: Do not block APIC write for non ICR registers ...
2022-08-01Merge tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds2-46/+90
Pull perf events updates from Ingo Molnar: - Fix Intel Alder Lake PEBS memory access latency & data source profiling info bugs. - Use Intel large-PEBS hardware feature in more circumstances, to reduce PMI overhead & reduce sampling data. - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI variant, which tells tooling the exact number of samples lost. - Add new IBS register bits definitions. - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements. * tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/ibs: Add new IBS register bits into header perf/x86/intel: Fix PEBS data source encoding for ADL perf/x86/intel: Fix PEBS memory access info encoding for ADL perf/core: Add a new read format to get a number of lost samples perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments perf/x86/amd/uncore: Add PerfMonV2 DF event format perf/x86/amd/uncore: Detect available DF counters perf/x86/amd/uncore: Use attr_update for format attributes perf/x86/amd/uncore: Use dynamic events array x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE
2022-08-01Merge remote-tracking branch 'kvm/next' into kvm-next-5.20Paolo Bonzini1-29/+131
KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID
2022-07-20perf/x86/intel/lbr: Fix unchecked MSR access error on HSWKan Liang1-9/+10
The fuzzer triggers the below trace. [ 7763.384369] unchecked MSR access error: WRMSR to 0x689 (tried to write 0x1fffffff8101349e) at rIP: 0xffffffff810704a4 (native_write_msr+0x4/0x20) [ 7763.397420] Call Trace: [ 7763.399881] <TASK> [ 7763.401994] intel_pmu_lbr_restore+0x9a/0x1f0 [ 7763.406363] intel_pmu_lbr_sched_task+0x91/0x1c0 [ 7763.410992] __perf_event_task_sched_in+0x1cd/0x240 On a machine with the LBR format LBR_FORMAT_EIP_FLAGS2, when the TSX is disabled, a TSX quirk is required to access LBR from registers. The lbr_from_signext_quirk_needed() is introduced to determine whether the TSX quirk should be applied. However, the lbr_from_signext_quirk_needed() is invoked before the intel_pmu_lbr_init(), which parses the LBR format information. Without the correct LBR format information, the TSX quirk never be applied. Move the lbr_from_signext_quirk_needed() into the intel_pmu_lbr_init(). Checking x86_pmu.lbr_has_tsx in the lbr_from_signext_quirk_needed() is not required anymore. Both LBR_FORMAT_EIP_FLAGS2 and LBR_FORMAT_INFO have LBR_TSX flag, but only the LBR_FORMAT_EIP_FLAGS2 requirs the quirk. Update the comments accordingly. Fixes: 1ac7fd8159a8 ("perf/x86/intel/lbr: Support LBR format V7") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220714182630.342107-1-kan.liang@linux.intel.com
2022-07-04perf/x86/intel: Fix PEBS data source encoding for ADLKan Liang2-14/+39
The PEBS data source encoding for the e-core is different from the p-core. Add the pebs_data_source[] in the struct x86_hybrid_pmu to store the data source encoding for each type of the core. Add intel_pmu_pebs_data_source_grt() for the e-core. There is nothing changed for the data source encoding of the p-core, which still reuse the intel_pmu_pebs_data_source_skl(). Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20220629150840.2235741-2-kan.liang@linux.intel.com
2022-07-04perf/x86/intel: Fix PEBS memory access info encoding for ADLKan Liang2-33/+52
The PEBS memory access latency encoding for the e-core is slightly different from the p-core. The bit 4 is Lock, while the bit 5 is TLB access. Add a new flag to indicate the load/store latency event on a hybrid platform. Add a new function pointer to retrieve the latency data for a hybrid platform. Only implement the new flag and function for the e-core on ADL. Still use the existing PERF_X86_EVENT_PEBS_LDLAT/STLAT flag for the p-core on ADL. Factor out pebs_set_tlb_lock() to set the generic memory data source information of the TLB access and lock for both load and store latency. Move the intel_get_event_constraints() to ahead of the :ppp check, otherwise the new flag never gets a chance to be set for the :ppp events. Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20220629150840.2235741-1-kan.liang@linux.intel.com
2022-06-08perf/x86/intel: Fix the comment about guest LBR support on KVMLike Xu1-2/+1
Starting from v5.12, KVM reports guest LBR and extra_regs support when the host has relevant support. Just delete this part of the comment and fix a typo incidentally. Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Signed-off-by: Yang Weijiang <weijiang.yang@intel.com> Message-Id: <20220517154100.29983-2-weijiang.yang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Disable guest PEBS temporarily in two rare situationsLike Xu1-2/+9
The guest PEBS will be disabled when some users try to perf KVM and its user-space through the same PEBS facility OR when the host perf doesn't schedule the guest PEBS counter in a one-to-one mapping manner (neither of these are typical scenarios). The PEBS records in the guest DS buffer are still accurate and the above two restrictions will be checked before each vm-entry only if guest PEBS is deemed to be enabled. Suggested-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-15-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBSLike Xu1-0/+8
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the adaptive PEBS is supported. The PEBS_DATA_CFG MSR and adaptive record enable bits (IA32_PERFEVTSELx.Adaptive_Record and IA32_FIXED_CTR_CTRL. FCx_Adaptive_Record) are also supported. Adaptive PEBS provides software the capability to configure the PEBS records to capture only the data of interest, keeping the record size compact. An overflow of PMCx results in generation of an adaptive PEBS record with state information based on the selections specified in MSR_PEBS_DATA_CFG.By default, the record only contain the Basic group. When guest adaptive PEBS is enabled, the IA32_PEBS_ENABLE MSR will be added to the perf_guest_switch_msr() and switched during the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR. According to Intel SDM, software is recommended to PEBS Baseline when the following is true. IA32_PERF_CAPABILITIES.PEBS_BASELINE[14] && IA32_PERF_CAPABILITIES.PEBS_FMT[11:8] ≥ 4. Co-developed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220411101946.20262-12-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DSLike Xu1-1/+9
When CPUID.01H:EDX.DS[21] is set, the IA32_DS_AREA MSR exists and points to the linear address of the first byte of the DS buffer management area, which is used to manage the PEBS records. When guest PEBS is enabled, the MSR_IA32_DS_AREA MSR will be added to the perf_guest_switch_msr() and switched during the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR. The WRMSR to IA32_DS_AREA MSR brings a #GP(0) if the source register contains a non-canonical address. Originally-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-11-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counterLike Xu1-1/+1
The PEBS-PDIR facility on Ice Lake server is supported on IA31_FIXED0 only. If the guest configures counter 32 and PEBS is enabled, the PEBS-PDIR facility is supposed to be used, in which case KVM adjusts attr.precise_ip to 3 and request host perf to assign the exactly requested counter or fail. The CPU model check is also required since some platforms may place the PEBS-PDIR facility in another counter index. Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-10-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBSLike Xu1-18/+57
If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the IA32_PEBS_ENABLE MSR exists and all architecturally enumerated fixed and general-purpose counters have corresponding bits in IA32_PEBS_ENABLE that enable generation of PEBS records. The general-purpose counter bits start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at bit IA32_PEBS_ENABLE[32]. When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR will be added to the perf_guest_switch_msr() and atomically switched during the VMX transitions just like CORE_PERF_GLOBAL_CTRL MSR. Based on whether the platform supports x86_pmu.pebs_ept, it has also refactored the way to add more msrs to arr[] in intel_guest_get_msrs() for extensibility. Originally-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Co-developed-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-8-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK valuePeter Zijlstra (Intel)1-8/+6
The value of pebs_counter_mask will be accessed frequently for repeated use in the intel_guest_get_msrs(). So it can be optimized instead of endlessly mucking about with branches. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-7-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08perf/x86/core: Pass "struct kvm_pmu *" to determine the guest valuesLike Xu1-2/+2
Splitting the logic for determining the guest values is unnecessarily confusing, and potentially fragile. Perf should have full knowledge and control of what values are loaded for the guest. If we change .guest_get_msrs() to take a struct kvm_pmu pointer, then it can generate the full set of guest values by grabbing guest ds_area and pebs_data_cfg. Alternatively, .guest_get_msrs() could take the desired guest MSR values directly (ds_area and pebs_data_cfg), but kvm_pmu is vendor agnostic, so we don't see any reason to not just pass the pointer. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-Id: <20220411101946.20262-4-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08perf/x86/intel: Handle guest PEBS overflow PMI for KVM guestLike Xu1-0/+42
With PEBS virtualization, the guest PEBS records get delivered to the guest DS, and the host pmi handler uses perf_guest_cbs->is_in_guest() to distinguish whether the PMI comes from the guest code like Intel PT. No matter how many guest PEBS counters are overflowed, only triggering one fake event is enough. The fake event causes the KVM PMI callback to be called, thereby injecting the PEBS overflow PMI into the guest. KVM may inject the PMI with BUFFER_OVF set, even if the guest DS is empty. That should really be harmless. Thus guest PEBS handler would retrieve the correct information from its own PEBS records buffer. Cc: linux-perf-users@vger.kernel.org Originally-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220411101946.20262-3-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-06-08perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake ServerLike Xu1-0/+1
Add support for EPT-Friendly PEBS, a new CPU feature that enlightens PEBS to translate guest linear address through EPT, and facilitates handling VM-Exits that occur when accessing PEBS records. More information can be found in the December 2021 release of Intel's SDM, Volume 3, 18.9.5 "EPT-Friendly PEBS". This new hardware facility makes sure the guest PEBS records will not be lost, which is available on Intel Ice Lake Server platforms (and later). KVM will check this field through perf_get_x86_pmu_capability() instead of hard coding the CPU models in the KVM code. If it is supported, the guest PEBS capability will be exposed to the guest. Guest PEBS can be enabled when and only when "EPT-Friendly PEBS" is supported and EPT is enabled. Cc: linux-perf-users@vger.kernel.org Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220411101946.20262-2-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-25perf/x86/intel: Fix event constraints for ICLKan Liang1-1/+1
According to the latest event list, the event encoding 0x55 INST_DECODED.DECODERS and 0x56 UOPS_DECODED.DEC0 are only available on the first 4 counters. Add them into the event constraints table. Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support") Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220525133952.1660658-1-kan.liang@linux.intel.com