aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/util (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-11-08perf test: Fix skipping branch stack sampling testJames Clark1-1/+3
Commit f4a2aade6809c657 ("perf tests powerpc: Fix branch stack sampling test to include sanity check for branch filter") added a skip if certain branch options aren't available. But the change added both -b (--branch-any) and --branch-filter options at the same time, which will always result in a failure on any platform because the arguments can't be used together. Fix this by removing -b (--branch-any) and leaving --branch-filter which already specifies 'any'. Also add warning messages to the test and perf tool. Output on x86 before this fix: $ sudo ./perf test branch 108: Check branch stack sampling : Skip After: $ sudo ./perf test branch 108: Check branch stack sampling : Ok Fixes: f4a2aade6809c657 ("perf tests powerpc: Fix branch stack sampling test to include sanity check for branch filter") Signed-off-by: James Clark <james.clark@arm.com> Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman.Khandual@arm.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221028121913.745307-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-08perf stat: Fix printing os->prefix in CSV metrics outputAthira Rajeev1-1/+1
'perf stat' with CSV output option prints an extra empty string as first field in metrics output line. Sample output below: # ./perf stat -x, --per-socket -a -C 1 ls S0,1,1.78,msec,cpu-clock,1785146,100.00,0.973,CPUs utilized S0,1,26,,context-switches,1781750,100.00,0.015,M/sec S0,1,1,,cpu-migrations,1780526,100.00,0.561,K/sec S0,1,1,,page-faults,1779060,100.00,0.561,K/sec S0,1,875807,,cycles,1769826,100.00,0.491,GHz S0,1,85281,,stalled-cycles-frontend,1767512,100.00,9.74,frontend cycles idle S0,1,576839,,stalled-cycles-backend,1766260,100.00,65.86,backend cycles idle S0,1,288430,,instructions,1762246,100.00,0.33,insn per cycle ====> ,S0,1,,,,,,,2.00,stalled cycles per insn The above command line uses field separator as "," via "-x," option and per-socket option displays socket value as first field. But here the last line for "stalled cycles per insn" has "," in the beginning. Sample output using interval mode: # ./perf stat -I 1000 -x, --per-socket -a -C 1 ls 0.001813453,S0,1,1.87,msec,cpu-clock,1872052,100.00,0.002,CPUs utilized 0.001813453,S0,1,2,,context-switches,1868028,100.00,1.070,K/sec ------ 0.001813453,S0,1,85379,,instructions,1856754,100.00,0.32,insn per cycle ====> 0.001813453,,S0,1,,,,,,,1.34,stalled cycles per insn Above result also has an extra CSV separator after the timestamp. Patch addresses extra field separator in the beginning of the metric output line. The counter stats are displayed by function "perf_stat__print_shadow_stats" in code "util/stat-shadow.c". While printing the stats info for "stalled cycles per insn", function "new_line_csv" is used as new_line callback. The new_line_csv function has check for "os->prefix" and if prefix is not null, it will be printed along with cvs separator. Snippet from "new_line_csv": if (os->prefix) fprintf(os->fh, "%s%s", os->prefix, config->csv_sep); Here os->prefix gets printed followed by "," which is the cvs separator. The os->prefix is used in interval mode option ( -I ), to print time stamp on every new line. But prefix is already set to contain CSV separator when used in interval mode for CSV option. Reference: Function "static void print_interval" Snippet: sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, config->csv_sep); Also if prefix is not assigned (if not used with -I option), it gets set to empty string. Reference: function printout() in util/stat-display.c Snippet: .prefix = prefix ? prefix : "", Since prefix already set to contain cvs_sep in interval option, patch removes printing config->csv_sep in new_line_csv function to avoid printing extra field. After the patch: # ./perf stat -x, --per-socket -a -C 1 ls S0,1,2.04,msec,cpu-clock,2045202,100.00,1.013,CPUs utilized S0,1,2,,context-switches,2041444,100.00,979.289,/sec S0,1,0,,cpu-migrations,2040820,100.00,0.000,/sec S0,1,2,,page-faults,2040288,100.00,979.289,/sec S0,1,254589,,cycles,2036066,100.00,0.125,GHz S0,1,82481,,stalled-cycles-frontend,2032420,100.00,32.40,frontend cycles idle S0,1,113170,,stalled-cycles-backend,2031722,100.00,44.45,backend cycles idle S0,1,88766,,instructions,2030942,100.00,0.35,insn per cycle S0,1,,,,,,,1.27,stalled cycles per insn Fixes: 92a61f6412d3a09d ("perf stat: Implement CSV metrics output") Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Reviewed-By: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221018085605.63834-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-08perf stat: Fix crash with --per-node --metric-only in CSV modeNamhyung Kim1-1/+3
The following command will get segfault due to missing aggr_header_csv for AGGR_NODE: $ sudo perf stat -a --per-node -x, --metric-only true Committer testing: Before this patch: # perf stat -a --per-node -x, --metric-only true Segmentation fault (core dumped) # After: # gdb perf -bash: gdb: command not found # perf stat -a --per-node -x, --metric-only true node,Ghz,frontend cycles idle,backend cycles idle,insn per cycle,branch-misses of all branches, N0,32,0.335,2.10,0.65,0.69,0.03,1.92, # Fixes: 86895b480a2f10c7 ("perf stat: Add --per-node agregation support") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: http://lore.kernel.org/lkml/20221107213314.3239159-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-26perf auxtrace: Fix address filter symbol name match for modulesAdrian Hunter1-1/+9
For modules, names from kallsyms__parse() contain the module name which meant that module symbols did not match exactly by name. Fix by matching the name string up to the separating tab character. Fixes: 1b36c03e356936d6 ("perf record: Add support for using symbols in address filters") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221026072736.2982-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25tools headers: Update the copy of x86's memcpy_64.S used in 'perf bench'Arnaldo Carvalho de Melo1-0/+13
We also need to add SYM_TYPED_FUNC_START() to util/include/linux/linkage.h and update tools/perf/check_headers.sh to ignore the include cfi_types.h line when checking if the kernel original files drifted from the copies we carry. This is to get the changes from: ccace936eec7b805 ("x86: Add types to indirectly called assembly functions") Addressing these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/lib/memcpy_64.S' differs from latest version at 'arch/x86/lib/memcpy_64.S' diff -u tools/arch/x86/lib/memcpy_64.S arch/x86/lib/memcpy_64.S Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/lkml/Y1f3VRIec9EBgX6F@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf bpf: Fix build with libbpf 0.7.0 by checking if bpf_program__set_insns() is availableArnaldo Carvalho de Melo1-0/+18
During the transition to libbpf 1.0 some functions that perf used were deprecated and finally removed from libbpf, so bpf_program__set_insns() was introduced for perf to continue to use its bpf loader. But when build with LIBBPF_DYNAMIC=1 we now need to check if that function is available so that perf can build with older libbpf versions, even if the end result is emitting a warning to the user that the use of the perf BPF loader requires a newer libbpf, since bpf_program__set_insns() touches libbpf objects internal state. This affects only 'perf trace' when using bpf C code or pre-compiled bytecode as an event. Noticed on RHEL9, that has libbpf 0.7.0, where bpf_program__set_insns() isn't available. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-25perf bpf: Fix build with libbpf 0.7.0 by adding prototype for bpf_load_program()Arnaldo Carvalho de Melo1-0/+5
The bpf_load_program() prototype appeared in tools/lib/bpf/bpf.h as deprecated, but nowadays its completely removed, so add it back for building with the system libbpf when using 'make LIBBPF_DYNAMIC=1'. This is a stop gap hack till we do like tools/bpf does with bpftool, i.e. bootstrap the libbpf build and install it in the perf build directory when not using 'make LIBBPF_DYNAMIC=1'. That has to be done to all libraries in tools/lib/, so tha we can remove -Itools/lib/ from the tools/perf CFLAGS. Noticed when building with LIBBPF_DYNAMIC=1 and libbpf 0.7.0 on RHEL9. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packetQi Liu7-0/+396
Add support for using 'perf report --dump-raw-trace' to parse PTT packet. Example usage: Output will contain raw PTT data and its textual representation, such as (8DW format): 0 0 0x5810 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000 offset: 0 ref: 0xa5d50c725 idx: 0 tid: -1 cpu: 0 . . ... HISI PTT data: size 4194304 bytes . 00000000: 00 00 00 00 Prefix . 00000004: 08 20 00 60 Header DW0 . 00000008: ff 02 00 01 Header DW1 . 0000000c: 20 08 00 00 Header DW2 . 00000010: 10 e7 44 ab Header DW3 . 00000014: 2a a8 1e 01 Time . 00000020: 00 00 00 00 Prefix . 00000024: 01 00 00 60 Header DW0 . 00000028: 0f 1e 00 01 Header DW1 . 0000002c: 04 00 00 00 Header DW2 . 00000030: 40 00 81 02 Header DW3 . 00000034: ee 02 00 00 Time .... This patch only add basic parsing support according to the definition of the PTT packet described in Documentation/trace/hisi-ptt.rst. And the fields of each packet can be further decoded following the PCIe Spec's definition of TLP packet. Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: John Garry <john.garry@huawei.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-4-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driverQi Liu3-0/+18
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the PCIe link's events, and trace the TLP headers). This patch add support for PTT device in perf tool, so users could use 'perf record' to get TLP headers trace data. Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: John Garry <john.garry@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi6124@gmail.com> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Zeng Prime <prime.zeng@huawei.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf intel-pt: Fix segfault in intel_pt_print_info() with uClibcAdrian Hunter1-2/+7
uClibc segfaulted because NULL was passed as the format to fprintf(). That happened because one of the format strings was missing and intel_pt_print_info() didn't check that before calling fprintf(). Add the missing format string, and check format is not NULL before calling fprintf(). Fixes: 11fa7cb86b56d361 ("perf tools: Pass Intel PT information for decoding MTC and CYC") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221012082259.22394-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-15perf inject: Fix GEN_ELF_TEXT_OFFSET for jitAdrian Hunter1-1/+3
When a program header was added, it moved the text section but GEN_ELF_TEXT_OFFSET was not updated. Fix by adding the program header size and aligning. Fixes: babd04386b1df8c3 ("perf jit: Include program header in ELF files") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Lieven Hey <lieven.hey@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20221014170905.64069-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-14perf: Skip and warn on unknown format 'configN' attrsRob Herring5-13/+26
If the kernel exposes a new perf_event_attr field in a format attr, perf will return an error stating the specified PMU can't be found. For example, a format attr with 'config3:0-63' causes an error as config3 is unknown to perf. This causes a compatibility issue between a newer kernel with older perf tool. Before this change with a kernel adding 'config3' I get: $ perf record -e arm_spe// -- true event syntax error: 'arm_spe//' \___ Cannot find PMU `arm_spe'. Missing kernel support? Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events After this change, I get: $ perf record -e arm_spe// -- true WARNING: 'arm_spe_0' format 'inv_event_filter' requires 'perf_event_attr::config3' which is not supported by this version of perf! [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.091 MB perf.data ] To support unknown configN formats, rework the YACC implementation to pass any config[0-9]+ format to perf_pmu__new_format() to handle with a warning. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Rob Herring <robh@kernel.org> Tested-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220914-arm-perf-tool-spe1-2-v2-v4-1-83c098e6212e@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-14perf stat: Support old kernels for bperf cgroup countingNamhyung Kim1-1/+28
The recent change in the cgroup will break the backward compatiblity in the BPF program. It should support both old and new kernels using BPF CO-RE technique. Like the task_struct->__state handling in the offcpu analysis, we can check the field name in the cgroup struct. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Song Liu <songliubraving@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: bpf@vger.kernel.org Cc: cgroups@vger.kernel.org Cc: zefan li <lizefan.x@bytedance.com> Link: http://lore.kernel.org/lkml/20221011052808.282394-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-11Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linuxLinus Torvalds67-662/+1557
Pull perf tools updates from Arnaldo Carvalho de Melo: - Add support for AMD on 'perf mem' and 'perf c2c', the kernel enablement patches went via tip. Example: $ sudo perf mem record -- -c 10000 ^C[ perf record: Woken up 227 times to write data ] [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ] $ sudo perf mem report -F mem,sample,snoop Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762 Memory access Samples Snoop N/A 700620 N/A L1 hit 126675 N/A L2 hit 424 N/A L3 hit 664 HitM L3 hit 10 N/A Local RAM hit 2 N/A Remote RAM (1 hop) hit 8558 N/A Remote Cache (1 hop) hit 3 N/A Remote Cache (1 hop) hit 2 HitM Remote Cache (2 hops) hit 10 HitM Remote Cache (2 hops) hit 6 N/A Uncached hit 4 N/A $ - "perf lock" improvements: - Add -E/--entries option to limit the number of entries to display, say to ask for just the top 5 contended locks. - Add -q/--quiet option to suppress header and debug messages. - Add a 'perf test' kernel lock contention entry to test 'perf lock'. - "perf lock contention" improvements: - Ask BPF's bpf_get_stackid() to skip some callchain entries. The ones closer to the tooling are bpf related and not that interesting, the ones calling the locking function are the ones we're interested in, example of a full, unskipped callstack: - Allow changing the callstack depth and number of entries to skip. 1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffbb8b8e75 bpf_trace_run2+0x35 0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb 0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5 0xffffffffbc1c26ff _raw_spin_lock+0x1f 0xffffffffbb841015 tick_do_update_jiffies64+0x25 0xffffffffbb8409ee tick_irq_enter+0x9e - Show full callstack in verbose mode (-v option), sometimes this is desirable instead of showing just one callstack entry. - Allow multiple time ranges in 'perf record --delay' to help in reducing the amount of data collected from hardware tracing (Intel PT, etc) when there is a rough idea of periods of time where events of interest take time. - Add Intel PT to record only decoder debug messages when error happens. - Improve layout of Intel PT man page. - Add new branch types: alignment, data and inst faults and arch specific ones, such as fiq, debug_halt, debug_exit, debug_inst and debug_data on arm64. Kernel enablement went thru the tip tree. - Fix 'perf probe' error log check in 'perf test' when no debuginfo is available. - Fix 'perf stat' aggregation mode logic, it should be looking at the CPU not at the core number. - Fix flags parsing in 'perf trace' filters. - Introduce compact encoding of CPU range encoding on perf.data, to avoid having a bitmap with all the CPUs. - Improvements to the 'perf stat' metrics, including adding "core_wide", and computing "smt" from the CPU topology. - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format, that allows tooling to ask for the precise number of lost samples for a given event. - Add 'addr' sort key to see just the address of sampled instructions: $ perf record -o- true | perf report -i- -s addr [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # Samples: 12 of event 'cycles:u' # Event count (approx.): 252512 # # Overhead Address # ........ .................. 42.96% 0x7f96f08443d7 29.55% 0x7f96f0859b50 14.76% 0x7f96f0852e02 8.30% 0x7f96f0855028 4.43% 0xffffffff8de01087 perf annotate: Toggle full address <-> offset display - Add 'f' hotkey to the 'perf annotate' TUI interface when in 'disassembler output' mode ('o' hotkey) to toggle showing full virtual address or just the offset. - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for pre-existing threads, at the start of a 'perf record' session, speeding up that record startup phase. - Add a command line option to specify build ids in 'perf inject'. - Update JSON event files for the Intel alderlake, broadwell, broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids, skylake, skylakex, and tigerlake processors. - Update vendor JSON event files for the ARM Neoverse V1 and E1 platforms. - Add a 'perf test' entry for 'perf mem' where a struct has false sharing and this gets detected in the 'perf mem' output, tested with Intel, AMD and ARM64 systems. - Add a 'perf test' entry to test the resolution of java symbols, where an output like this is expected: 8.18% jshell jitted-50116-29.so [.] Interpreter 0.75% Thread-1 jitted-83602-1670.so [.] jdk.internal.jimage.BasicImageReader.getString(int) - Add tests for the ARM64 CoreSight hardware tracing feature, with specially crafted pureloop, memcpy, thread loop and unroll tread that then gets traced and the output compared with expected output. Documentation explaining it is also included. - Add per thread Intel PT 'perf test' entry to check that PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a mixture of per thread and per CPU events and mmaps, verify that this gets all recorded correctly. - Introduce pthread mutex wrappers to allow for building with clang's -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by" "lockable", "exclusive_lock_function", "exclusive_trylock_function", "exclusive_locks_required", and "no_thread_safety_analysis" compiler function attributes. - Fix empty version number when building outside of a git repo. - Improve feature detection display when multiple versions of a feature are present, such as for binutils libbfd, that has a mix of possible ways to detect according to the Linux distribution. Previously in some cases we had: Auto-detecting system features <SNIP> ... libbfd: [ on ] ... libbfd-liberty: [ on ] ... libbfd-liberty-z: [ on ] <SNIP> Now for this case we show just the main feature: Auto-detecting system features <SNIP> ... libbfd: [ on ] <SNIP> - Remove some unused structs, variables, macros, function prototypes and includes from various places. * tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits) perf script: Add missing fields in usage hint perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB perf mem/c2c: Avoid printing empty lines for unsupported events perf mem/c2c: Add load store event mappings for AMD perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO} perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout() perf test coresight: Add relevant documentation about ARM64 CoreSight testing perf test: Add git ignore for tmp and output files of ARM CoreSight tests perf test coresight: Add unroll thread test shell script perf test coresight: Add unroll thread test tool perf test coresight: Add thread loop test shell scripts perf test coresight: Add thread loop test tool perf test coresight: Add memcpy thread test shell script perf test coresight: Add memcpy thread test tool perf test: Add git ignore for perf data generated by the ARM CoreSight tests perf test: Add arm64 asm pureloop test shell script perf test: Add asm pureloop test tool ...
2022-10-10Merge tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds1-1/+1
Pull cgroup updates from Tejun Heo: - cpuset now support isolated cpus.partition type, which will enable dynamic CPU isolation - pids.peak added to remember the max number of pids used - holes in cgroup namespace plugged - internal cleanups * tag 'cgroup-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (25 commits) cgroup: use strscpy() is more robust and safer iocost_monitor: reorder BlkgIterator cgroup: simplify code in cgroup_apply_control cgroup: Make cgroup_get_from_id() prettier cgroup/cpuset: remove unreachable code cgroup: Remove CFTYPE_PRESSURE cgroup: Improve cftype add/rm error handling kselftest/cgroup: Add cpuset v2 partition root state test cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule cgroup/cpuset: Relocate a code block in validate_change() cgroup/cpuset: Show invalid partition reason string cgroup/cpuset: Add a new isolated cpus.partition type cgroup/cpuset: Relax constraints to partition & cpus changes cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective cgroup/cpuset: Miscellaneous cleanups & add helper functions cgroup/cpuset: Enable update_tasks_cpumask() on top_cpuset cgroup: add pids.peak interface for pids controller cgroup: Remove data-race around cgrp_dfl_visible cgroup: Fix build failure when CONFIG_SHRINKER_DEBUG ...
2022-10-06perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFBRavi Bangoria1-2/+2
A hw component to track outstanding L1 Data Cache misses is called LFB (Line Fill Buffer) on Intel and Arm. However similar component exists on other arch with different names, for ex, it's called MAB (Miss Address Buffer) on AMD. Use 'LFB/MAB' instead of just 'LFB'. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20221006153946.7816-8-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf mem/c2c: Avoid printing empty lines for unsupported eventsRavi Bangoria1-5/+6
The 'perf mem' and 'perf c2c' tools can be used with 3 different events: load, store and combined load-store. Some architectures might support only partial set of events in which case, perf prints an empty line for unsupported events. Avoid that. Ex, AMD Zen cpus supports only combined load-store event and does not support individual load and store event. Before patch: $ perf mem record -e list mem-ldst : available $ After patch: $ perf mem record -e list mem-ldst : available $ Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20221006153946.7816-7-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO}Ravi Bangoria1-0/+2
Add support for printing these new fields in perf mem report. Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ali Saidi <alisaidi@amazon.com> Cc: Ananth Narayan <ananth.narayan@amd.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Santosh Shukla <santosh.shukla@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Link: https://lore.kernel.org/r/20221006153946.7816-4-ravi.bangoria@amd.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()Athira Rajeev1-2/+2
'perf stat' has options to aggregate the counts in different modes like per socket, per core etc. The function "aggr_printout" in util/stat-display.c which is used to print the aggregates, has a check for cpu in case of AGGR_NONE. This check was originally using condition : "if (id.cpu.cpu > -1)". But this got changed after commit df936cadfb58 ("perf stat: Add JSON output option"), which added option to output json format for different aggregation modes. After this commit, the check in "aggr_printout" is using "if (id.core > -1)". The old code was using "id.cpu.cpu > -1" while the new code is using "id.core > -1". But since the value printed is id.cpu.cpu, fix this check to use cpu and not core. Suggested-by: Ian Rogers <irogers@google.com> Suggested-by: James Clark <james.clark@arm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Cc: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20221006114225.66303-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf record: Save DSO build-ID for synthesizingNamhyung Kim1-3/+22
When synthesizing MMAP2 with build-id, it'd read the same file repeatedly as it has no idea if it's done already. Maintain a dsos to check that and skip the file access if possible. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220920222822.2171056-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Rename to aggr_cpu_id.thread_idxNamhyung Kim3-11/+11
The aggr_cpu_id has a thread value but it's actually an index to the thread_map. To reduce possible confusion, rename it to thread_idx. Suggested-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Don't compare runtime stat for shadow statsNamhyung Kim1-12/+0
Now it always uses the global rt_stat. Let's get rid of the field from the saved_value. When the both evsels are NULL, it'd return 0 so remove the block in the saved_value_cmp. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Kill unused per-thread runtime statsNamhyung Kim1-2/+0
Now it's using the global rt_stat, no need to use per-thread stats. Let get rid of them. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Use thread map index for shadow statNamhyung Kim3-18/+12
When AGGR_THREAD is active, it aggregates the values for each thread. Previously it used cpu map index which is invalid for AGGR_THREAD so it had to use separate runtime stats with index 0. But it can just use the rt_stat with thread_map_index. Rename the first_shadow_map_idx() and make it return the thread index. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Rename saved_value->cpu_map_idxNamhyung Kim2-157/+157
The cpu_map_idx fields is just to differentiate values from other entries. It doesn't need to be strictly cpu map index. Actually we can pass thread map index or aggr map index. So rename the fields first. No functional change intended. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Don't call perf_stat_evsel_id_init() repeatedlyNamhyung Kim1-1/+1
evsel__reset_stat_priv() is called more than once if user gave -r option for multiple runs. But it doesn't need to re-initialize the id. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf stat: Convert perf_stat_evsel.res_stats arrayNamhyung Kim3-9/+5
It uses only one member, no need to have it as an array. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220930202110.845199-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf tools: Remove special handling of system-wide evselNamhyung Kim2-13/+2
For system-wide evsels, the thread map should be dummy - i.e. it has a single entry of -1. But the code guarantees such a thread map, so no need to handle it specially. No functional change intended. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221003204647.1481128-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf tools: Add evlist__add_sched_switch()Namhyung Kim2-0/+18
Add a help to create a system-wide sched_switch event. One merit is that it sets the system-wide bit before adding it to evlist so that the libperf can handle the cpu and thread maps correctly. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221003204647.1481128-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf tools: Get rid of evlist__add_on_all_cpus()Namhyung Kim1-27/+2
The cpu and thread maps are properly handled in libperf now. No need to do it in the perf tools anymore. Let's remove the logic. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221003204647.1481128-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf metrics: Don't scale counts going into metricsIan Rogers1-2/+7
Counts are scaled prior to going into saved_value, reverse the scaling so that metrics don't double scale values. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221004021612.325521-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf expr: Remove jevents case workaroundIan Rogers1-10/+1
jevents.py no longer lowercases metrics and altering the case can cause hashmap lookups to fail, so remove. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221004021612.325521-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf expr: Allow a double if expressionIan Rogers1-1/+1
Some TMA metrics have double if expressions like: ( CPU_CLK_UNHALTED.THREAD / 2 ) * ( 1 + CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_XCLK ) ) if #core_wide < 1 else ( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else CPU_CLK_UNHALTED.THREAD This currently fails to parse as the left hand side if expression needs to be in parentheses. By allowing the if expression to have a right hand side that is an if expression we can parse the expression above, with left to right evaluation order that matches languages like Python. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20221004021612.325521-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf parse-events: Remove unused macros __PERF_EVENT_FIELD()Chen Zhongjin1-8/+0
Unused macros reported by [-Wunused-macros]. This macros were introduced as __PERF_COUNTER_FIELD and used for reading the bit in config. cdd6c482c9ff9c55 ("perf: Do the big rename: Performance Counters -> Performance Events") Changes it to __PERF_EVENT_FIELD but at this commit there is already nowhere else using these macros, also no macros called PERF_EVENT_##name##_MASK/SHIFT. Now we are not reading type or id from config. These macros are useless and incomplete. So removing them for code cleaning. Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220926031440.28275-5-chenzhongjin@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf tools: Fix empty version number when building outside of a git repoWill Chandler1-2/+8
When perf is built in a full source tree that is not a git repository, e.g. from a kernel source tarball, `perf version` will print empty tag and commit strings: $ perf version perf version Currently the tag version is only generated from the root Makefile when building in a git repository. If PERF-VERSION-FILE has not been generated and the source tree is not in a git repository, then PERF-VERSION-GEN will return an empty version. The problem can be reproduced with the following steps: $ wget https://git.kernel.org/torvalds/t/linux-6.0-rc7.tar.gz $ tar -xf linux-6.0-rc7.tar.gz && cd linux-6.0-rc7 $ make -C tools/perf $ tools/perf/perf -v perf version Builds from tarballs generated with `make perf-tar-src-pkg` are not impacted by this issue as PERF-VERSION-FILE is included in the archive. The perf RPM provided by Fedora for 5.18+ is experiencing this problem. Package build logs[0] show that the build is attempting to fall back on PERF-VERSION-FILE, but it is not present. To resolve this, revert back to the previous logic of using the kernel Makefile version if not in a git repository and PERF-VERSION-FILE does not exist. [0] https://kojipkgs.fedoraproject.org/packages/kernel-tools/5.19.4/200.fc36/data/logs/x86_64/build.log Fixes: 7572733b84997d23 ("perf tools: Fix version kernel tag") Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Will Chandler <wfc@wfchandler.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220930151157.529674-1-wfc@wfchandler.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf lock: Remove unused struct lock_contention_keyYuan Can1-5/+0
The struct lock_contention_key is never used, remove it. Signed-off-by: Yuan Can <yuancan@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/linux-perf-users/20220927013931.110475-6-yuancan@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf jit: Remove unused struct debug_line_infoYuan Can1-7/+0
The struct debug_line_info is never used, remove it. Signed-off-by: Yuan Can <yuancan@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/linux-perf-users/20220927013931.110475-5-yuancan@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf metric: Remove unused struct metric_ref_nodeYuan Can1-11/+0
After commit 46bdc0bf8d21 ("perf metric: Simplify metric_refs calculation"), no one use struct metric_ref_node, so remove it. Signed-off-by: Yuan Can <yuancan@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/linux-perf-users/20220927013931.110475-4-yuancan@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf machine: Remove unused struct process_argsYuan Can1-4/+0
After commit a93f0e551af9 ("perf symbols: Get kernel start address by symbol name"), no one uses struct process_args any more, so remove it. Signed-off-by: Yuan Can <yuancan@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/linux-perf-users/20220927013931.110475-2-yuancan@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06perf lock contention: Fix a build error on 32-bitNamhyung Kim2-2/+2
It was reported that it failed to build the BPF lock contention skeleton on 32 bit arch due to the size of long. The lost count is used only for reporting errors due to lack of stackmap space through bad_hist which type is 'int'. Let's use int type then. Fixes: 6d499a6b3d90277d ("perf lock: Print the number of lost entries for BPF") Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: http://lore.kernel.org/lkml/20220926215638.3931222-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf string: Remove unused macro K()Chen Zhongjin1-1/+0
Unused macro reported by [-Wunused-macros]. This macro is introduced to calculate the 'unit' size, in: d2fb8b4151a92223 ("perf tools: Add new perf_atoll() function to parse string representing size in bytes") 8ba7f6c2faada3ad ("saner perf_atoll()") This commit has simplified the perf_atoll() function and remove the 'unit' variable. This macro is not deleted, but nowhere else is using it. A single letter macro is confusing and easy to be misused. So remove it for code cleaning. Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220926031440.28275-6-chenzhongjin@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf tools: Add debug messages and comments for testingAdrian Hunter1-0/+2
Add debug messages to enable scripts to track aspects of 'perf record' behaviour. The messages will be consumed after 'perf record' has run, with the exception of "perf record has started" which is consequently flushed. Put comments so developers know which messages are also being used by test scripts. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220912083412.7058-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf annotate: Toggle full address <-> offset displayNamhyung Kim2-2/+21
Handle 'f' key to toggle the display offset and full address. Obviously it only works when users set to see disassembler output ('o' key). It'd be useful when users want to see the full virtual address in the TUI annotate browser. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220923173142.805896-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf tools: Add 'addr' sort keyNamhyung Kim4-0/+41
Sometimes users want to see actual (virtual) address of sampled instructions. Add a new 'addr' sort key to display the raw addresses. $ perf record -o- true | perf report -i- -s addr # To display the perf.data header info, please use --header/--header-only options. # [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # # Total Lost Samples: 0 # # Samples: 12 of event 'cycles:u' # Event count (approx.): 252512 # # Overhead Address # ........ .................. # 42.96% 0x7f96f08443d7 29.55% 0x7f96f0859b50 14.76% 0x7f96f0852e02 8.30% 0x7f96f0855028 4.43% 0xffffffff8de01087 Note that it just compares and displays the sample ip. Each process can have a different memory layout and the ip will be different even if they run the same binary. So this sort key is mostly meaningful for per-process profile data. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220923173142.805896-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf genelf: Fix error code in jit_write_elf()Shang XiaoJing1-0/+1
The error code is set to -1 at the beginning of jit_write_elf(), but it is assigned by jit_add_eh_frame_info() in the middle, hence the following error can only return the error code of jit_add_eh_frame_info(). Reset the error code to the default value after being assigned by jit_add_eh_frame_info(). Fixes: 086f9f3d7897d808 ("perf jit: Generate .eh_frame/.eh_frame_hdr in DSO") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stefano Sanfilippo <ssanfilippo@chromium.org> Link: https://lore.kernel.org/r/20220922141438.22487-2-shangxiaojing@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf lock contention: Skip stack trace from BPFNamhyung Kim2-4/+6
Currently it collects stack traces to max size then skip entries. Because we don't have control how to skip perf callchains. But BPF can do it with bpf_get_stackid() with a flag. Say we have max-stack=4 and stack-skip=2, we get these stack traces. Before: After: .---> +---+ <--. .---> +---+ <--. | | | | | | | | | +---+ usable | +---+ | max | | | max | | | stack +---+ <--' stack +---+ usable | | X | | | | | | +---+ skip | +---+ | | | X | | | | | `---> +---+ `---> +---+ <--' <=== collection | X | +---+ skip | X | +---+ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20220912055314.744552-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf lock contention: Allow to change stack depth and skipNamhyung Kim2-3/+6
It needs stack traces to find callers of locks. To minimize the performance overhead it only collects up to 8 entries for each stack trace. And it skips first 3 entries as they came from BPF, tracepoint and lock functions which are not interested for most users. But it turned out that those numbers are different in some configuration. Using fixed number can result in non meaningful caller names. Let's make them adjustable with --stack-depth and --skip-stack options. On my setup, the default output is like below: # /perf lock con -ab -F contended,wait_total sleep 3 contended total wait type caller 28 4.55 ms rwlock:W __bpf_trace_contention_begin+0xb 33 1.67 ms rwlock:W __bpf_trace_contention_begin+0xb 12 580.28 us spinlock __bpf_trace_contention_begin+0xb 60 240.54 us rwsem:R __bpf_trace_contention_begin+0xb 27 64.45 us spinlock __bpf_trace_contention_begin+0xb If I change the stack skip to 5, the result will be like: # perf lock con -ab -F contended,wait_total --stack-skip 5 sleep 3 contended total wait type caller 32 715.45 us spinlock folio_lruvec_lock_irqsave+0x61 26 550.22 us spinlock folio_lruvec_lock_irqsave+0x61 15 486.93 us rwsem:R mmap_read_lock+0x13 12 139.66 us rwsem:W vm_mmap_pgoff+0x93 1 7.04 us spinlock tick_do_update_jiffies64+0x25 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20220912055314.744552-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf lock contention: Show full callstack with -v optionNamhyung Kim2-0/+10
Currently it shows a caller function for each entry, but users need to see the full call stacks sometimes. Use -v/--verbose option to do that. # perf lock con -a -b -v sleep 3 Looking at the vmlinux_path (8 entries long) symsrc__init: cannot get elf header. Using /proc/kcore for kernel data Using /proc/kallsyms for symbols contended total wait max wait avg wait type caller 1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffbb8b8e75 bpf_trace_run2+0x35 0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb 0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5 0xffffffffbc1c26ff _raw_spin_lock+0x1f 0xffffffffbb841015 tick_do_update_jiffies64+0x25 0xffffffffbb8409ee tick_irq_enter+0x9e 1 7.70 us 7.70 us 7.70 us spinlock __bpf_trace_contention_begin+0xb 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffbb8b8e75 bpf_trace_run2+0x35 0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb 0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5 0xffffffffbc1c26ff _raw_spin_lock+0x1f 0xffffffffbb7bc27e raw_spin_rq_lock_nested+0xe 0xffffffffbb7cef9c load_balance+0x66c Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20220912055314.744552-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf metrics: Wire up core_wideIan Rogers7-43/+120
Pass state necessary for core_wide into the expression parser. Add system_wide and user_requested_cpu_list to perf_stat_config to make it available at display time. evlist isn't used as the evlist__create_maps, that computes user_requested_cpus, needs the list of events which is generated by the metric. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220831174926.579643-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04perf stat: Delay metric parsingIan Rogers2-3/+2
Having metric parsing as part of argument processing causes issues as flags like metric-no-group may be specified later. It also denies the opportunity to optimize the events on SMT systems where fewer events may be possible if we know the target is system-wide. Move metric parsing to after command line option parsing. Because of how stat runs this moves the parsing after record/report which fail to work with metrics currently anyway. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220831174926.579643-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>