aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-04-05Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds91-615/+2987
Pull more perf updates from Thomas Gleixner: "Perf updates all over the place: core: - Support for cgroup tracking in samples to allow cgroup based analysis tools: - Support for cgroup analysis - Commandline option and hotkey for perf top to change the sort order - A set of fixes all over the place - Various build system related improvements - Updates of the X86 pmu event JSON data - Documentation updates" * tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) perf python: Fix clang detection to strip out options passed in $CC perf tools: Support Python 3.8+ in Makefile perf script: Fix invalid read of directory entry after closedir() perf script report: Fix SEGFAULT when using DWARF mode perf script: add -S/--symbols documentation perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric perf events parser: Add missing Intel CPU events to parser perf script: Allow --symbol to accept hexadecimal addresses perf report/top TUI: Fix title line formatting perf top: Support hotkey to change sort order perf top: Support --group-sort-idx to change the sort order perf symbols: Fix arm64 gap between kernel start and module end perf build-test: Honour JOBS to override detection of number of cores perf script: Add --show-cgroup-events option perf top: Add --all-cgroups option perf record: Add --all-cgroups option perf record: Support synthesizing cgroup events perf report: Add 'cgroup' sort key perf cgroup: Maintain cgroup hierarchy perf tools: Basic support for CGROUP event ...
2020-04-03Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdxLinus Torvalds2-0/+2
Pull SPDX updates from Greg KH: "Here are three SPDX patches for 5.7-rc1. One fixes up the SPDX tag for a single driver, while the other two go through the tree and add SPDX tags for all of the .gitignore files as needed. Nothing too complex, but you will get a merge conflict with your current tree, that should be trivial to handle (one file modified by two things, one file deleted.) All three of these have been in linux-next for a while, with no reported issues other than the merge conflict" * tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: ASoC: MT6660: make spdxcheck.py happy .gitignore: add SPDX License Identifier .gitignore: remove too obvious comments
2020-04-03perf python: Fix clang detection to strip out options passed in $CCArnaldo Carvalho de Melo1-1/+1
The clang check in the python setup.py file expected $CC to be just the name of the compiler, not the compiler + options, i.e. all options were expected to be passed in $CFLAGS, this ends up making it fail in systems where CC is set to, e.g.: "aarch64-linaro-linux-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot" Like this: $ python3 >>> from subprocess import Popen >>> a = Popen(["aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot", "-v"]) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib/python3.6/subprocess.py", line 729, in __init__ restore_signals, start_new_session) File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child raise child_exception_type(errno_num, err_msg, err_filename) FileNotFoundError: [Errno 2] No such file or directory: 'aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot': 'aarch64-linux-gnu-gcc --sysroot=/oe/build/tmp/work/juno-linaro-linux/perf/1.0-r9/recipe-sysroot' >>> Make it more robust, covering this case, by passing cc.split()[0] as the first arg to popen(). Fixes: a7ffd416d804 ("perf python: Fix clang detection when using CC=clang-version") Reported-by: Daniel Díaz <daniel.diaz@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Daniel Díaz <daniel.diaz@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ilie Halip <ilie.halip@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200401124037.GA12534@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf tools: Support Python 3.8+ in MakefileSam Lunt1-1/+10
Python 3.8 changed the output of 'python-config --ldflags' to no longer include the '-lpythonX.Y' flag (this apparently fixed an issue loading modules with a statically linked Python executable). The libpython feature check in linux/build/feature fails if the Python library is not included in FEATURE_CHECK_LDFLAGS-libpython variable. This adds a check in the Makefile to determine if PYTHON_CONFIG accepts the '--embed' flag and passes that flag alongside '--ldflags' if so. tools/perf is the only place the libpython feature check is used. Signed-off-by: Sam Lunt <samuel.j.lunt@gmail.com> Tested-by: He Zhe <zhe.he@windriver.com> Link: http://lore.kernel.org/lkml/c56be2e1-8111-9dfe-8298-f7d0f9ab7431@windriver.com Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: trivial@kernel.org Cc: stable@kernel.org Link: http://lore.kernel.org/lkml/20200131181123.tmamivhq4b7uqasr@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf script: Fix invalid read of directory entry after closedir()Andreas Gerstmayr1-1/+1
closedir(lang_dir) frees the memory of script_dirent->d_name, which gets accessed in the next line in a call to scnprintf(). Valgrind report: Invalid read of size 1 ==413557== at 0x483CBE6: strlen (vg_replace_strmem.c:461) ==413557== by 0x4DD45FD: __vfprintf_internal (vfprintf-internal.c:1688) ==413557== by 0x4DE6679: __vsnprintf_internal (vsnprintf.c:114) ==413557== by 0x53A037: vsnprintf (stdio2.h:80) ==413557== by 0x53A037: scnprintf (vsprintf.c:21) ==413557== by 0x435202: get_script_path (builtin-script.c:3223) ==413557== Address 0x52e7313 is 1,139 bytes inside a block of size 32,816 free'd ==413557== at 0x483AA0C: free (vg_replace_malloc.c:540) ==413557== by 0x4E303C0: closedir (closedir.c:50) ==413557== by 0x4351DC: get_script_path (builtin-script.c:3222) Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200402124337.419456-1-agerstmayr@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf script report: Fix SEGFAULT when using DWARF modeAndreas Gerstmayr1-0/+3
When running perf script report with a Python script and a callgraph in DWARF mode, intr_regs->regs can be 0 and therefore crashing the regs_map function. Added a check for this condition (same check as in builtin-script.c:595). Signed-off-by: Andreas Gerstmayr <agerstmayr@redhat.com> Tested-by: Kim Phillips <kim.phillips@amd.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: http://lore.kernel.org/lkml/20200402125417.422232-1-agerstmayr@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf script: add -S/--symbols documentationIan Rogers1-0/+8
Capture both that this option exists and that symbols can be hexadecimal addresses. Signed-off-by: Ian Rogers <irogers@google.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200402174130.140319-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metricJin Yao12-12/+12
The kernel utilization metric does multiplexing currently and is somewhat unreliable. The problem is that it uses two instances of the fixed counter, and the kernel has to multipleplex which causes errors. So should use CPU_CLK_UNHALTED.THREAD instead. Before: # perf stat -M Kernel_Utilization -- sleep 1 Performance counter stats for 'sleep 1': 1,419,425 cpu_clk_unhalted.ref_tsc:k <not counted> cpu_clk_unhalted.ref_tsc (0.00%) After: # perf stat -M Kernel_Utilization -- sleep 1 Performance counter stats for 'sleep 1': 746,688 cpu_clk_unhalted.thread:k # 0.7 Kernel_Utilization 1,088,348 cpu_clk_unhalted.thread Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200309013125.7559-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf events parser: Add missing Intel CPU events to parserAdrian Hunter1-5/+7
perf list expects CPU events to be parseable by name, e.g. # perf list | grep el-capacity-read el-capacity-read OR cpu/el-capacity-read/ [Kernel PMU event] But the event parser does not recognize them that way, e.g. # perf test -v "Parse event" <SNIP> running test 54 'cycles//u' running test 55 'cycles:k' running test 0 'cpu/config=10,config1,config2=3,period=1000/u' running test 1 'cpu/config=1,name=krava/u,cpu/config=2/u' running test 2 'cpu/config=1,call-graph=fp,time,period=100000/,cpu/config=2,call-graph=no,time=0,period=2000/' running test 3 'cpu/name='COMPLEX_CYCLES_NAME:orig=cycles,desc=chip-clock-ticks',period=0x1,event=0x2/ukp' -> cpu/event=0,umask=0x11/ -> cpu/event=0,umask=0x13/ -> cpu/event=0x54,umask=0x1/ failed to parse event 'el-capacity-read:u,cpu/event=el-capacity-read/u', err 1, str 'parser error' event syntax error: 'el-capacity-read:u,cpu/event=el-capacity-read/u' \___ parser error test child finished with 1 ---- end ---- Parse event definition strings: FAILED! This happens because the parser splits names by '-' in order to deal with cache events. For example 'L1-dcache' is a token in parse-events.l which is matched to 'L1-dcache-load-miss' by the following rule: PE_NAME_CACHE_TYPE '-' PE_NAME_CACHE_OP_RESULT '-' PE_NAME_CACHE_OP_RESULT opt_event_config And so there is special handling for 2-part PMU names i.e. PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc but no handling for 3-part names, which are instead added as tokens e.g. topdown-[a-z-]+ While it would be possible to add a rule for 3-part names, that would not work if the first parts were also a valid PMU name e.g. 'el-capacity-read' would be matched to 'el-capacity' before the parser reached the 3rd part. The parser would need significant change to rationalize all this, so instead fix for now by adding missing Intel CPU events with 3-part names to the event parser as tokens. Missing events were found by using: grep -r EVENT_ATTR_STR arch/x86/events/intel/core.c Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Link: http://lore.kernel.org/lkml/90c7ae07-c568-b6d3-f9c4-d0c1528a0610@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf script: Allow --symbol to accept hexadecimal addressesStephane Eranian1-4/+17
This patch extends the perf script --symbols option to filter on hexadecimal addresses in addition to symbol names. This makes it easier to handle cases where symbols are aliased. With this patch, it is possible to mix and match symbols and hexadecimal addresses using the --symbols option. $ perf script --symbols=noploop,0x4007a0 Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325220802.15039-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf report/top TUI: Fix title line formattingArnaldo Carvalho de Melo1-4/+4
In d10ec006dcd7 ("perf hists browser: Allow passing an initial hotkey") the hist_entry__title() call was cut'n'pasted to a function where the 'title' variable is a pointer, not an array, so the sizeof(title) continues syntactically valid but ends up reducing the real size of the buffer where to format the first line in the screen to 8 bytes, which makes the formatting at the title at each refresh to produce just the string "Samples ", duh, fix it by passing the size of the buffer. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: d10ec006dcd7 ("perf hists browser: Allow passing an initial hotkey") Link: http://lore.kernel.org/lkml/20200330154314.GB4576@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf top: Support hotkey to change sort orderJin Yao1-2/+9
It would be nice if we can use a hotkey in perf top browser to select a event for sorting. For example: perf top --group -e cycles,instructions,cache-misses Samples Overhead Shared Object Symbol 40.03% 45.71% 0.03% div [.] main 20.46% 14.67% 0.21% libc-2.27.so [.] __random_r 20.01% 19.54% 0.02% libc-2.27.so [.] __random 9.68% 10.68% 0.00% div [.] compute_flag 4.32% 4.70% 0.00% libc-2.27.so [.] rand 3.84% 3.43% 0.00% div [.] rand@plt 0.05% 0.05% 2.33% libc-2.27.so [.] __strcmp_sse2_unaligned 0.04% 0.08% 2.43% perf [.] perf_hpp__is_dynamic_en 0.04% 0.02% 6.64% perf [.] rb_next 0.04% 0.01% 3.87% perf [.] dso__find_symbol 0.04% 0.04% 1.77% perf [.] sort__dso_cmp When user press hotkey '2' (event index, starting from 0), it indicates to sort output by the third event in group (cache-misses). Samples Overhead Shared Object Symbol 4.07% 1.28% 6.68% perf [.] rb_next 3.57% 3.98% 4.11% perf [.] __hists__insert_output 3.67% 11.24% 3.60% perf [.] perf_hpp__is_dynamic_e 3.67% 3.20% 3.20% perf [.] hpp__sort_overhead 0.81% 0.06% 3.01% perf [.] dso__find_symbol 1.62% 5.47% 2.51% perf [.] hists__match 2.70% 1.86% 2.47% libc-2.27.so [.] _int_malloc 0.19% 0.00% 2.29% [kernel] [k] copy_page 0.41% 0.32% 1.98% perf [.] hists__decay_entries 1.84% 3.67% 1.68% perf [.] sort__dso_cmp 0.16% 0.00% 1.63% [kernel] [k] clear_page_erms Now the output is sorted by cache-misses. v2: --- Zero the history if hotkey is pressed. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200324220711.6025-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf top: Support --group-sort-idx to change the sort orderJin Yao2-0/+9
'perf report' supports the option --group-sort-idx, which sorts the output by the event at the index n in event group. For example: perf record -e cycles,instructions,cache-misses perf report --group --group-sort-idx 2 --stdio The perf-report output is sorted by cache-misses. This patch supports --group-sort-idx in perf-top. For example: perf top --group -e cycles,instructions,cache-misses --group-sort-idx 2 The perf-top output is sorted by cache-misses. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200324220711.6025-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf symbols: Fix arm64 gap between kernel start and module endKemeng Shi2-0/+28
During execution of command 'perf report' in my arm64 virtual machine, this error message is showed: failed to process sample __symbol__inc_addr_samples(860): ENOMEM! sym->name=__this_module, start=0x1477100, addr=0x147dbd8, end=0x80002000, func: 0 The error is caused with path: cmd_report __cmd_report perf_session__process_events __perf_session__process_events ordered_events__flush __ordered_events__flush oe->deliver (ordered_events__deliver_event) perf_session__deliver_event machines__deliver_event perf_evlist__deliver_sample tool->sample (process_sample_event) hist_entry_iter__add iter->add_entry_cb(hist_iter__report_callback) hist_entry__inc_addr_samples symbol__inc_addr_samples __symbol__inc_addr_samples h = annotated_source__histogram(src, evidx) (NULL) annotated_source__histogram failed is caused with path: ... hist_entry__inc_addr_samples symbol__inc_addr_samples symbol__hists annotated_source__alloc_histograms src->histograms = calloc(nr_hists, sizeof_sym_hist) (failed) Calloc failed as the symbol__size(sym) is too huge. As show in error message: start=0x1477100, end=0x80002000, size of symbol is about 2G. This is the same problem as 'perf annotate: Fix s390 gap between kernel end and module start (b9c0a64901d5bd)'. Perf gets symbol information from /proc/kallsyms in __dso__load_kallsyms. A part of symbol in /proc/kallsyms from my virtual machine is as follows: #cat /proc/kallsyms | sort ... ffff000001475080 d rpfilter_mt_reg [ip6t_rpfilter] ffff000001475100 d $d [ip6t_rpfilter] ffff000001475100 d __this_module [ip6t_rpfilter] ffff000080080000 t _head ffff000080080000 T _text ffff000080080040 t pe_header ... Take line 'ffff000001475100 d __this_module [ip6t_rpfilter]' as example. The start and end of symbol are both set to ffff000001475100 in dso__load_all_kallsyms. Then symbols__fixup_end will set the end of symbol to next big address to ffff000001475100 in /proc/kallsyms, ffff000080080000 in this example. Then sizeof of symbol will be about 2G and cause the problem. The start of module in my machine is ffff000000a62000 t $x [dm_mod] The start of kernel in my machine is ffff000080080000 t _head There is a big gap between end of module and begin of kernel if a samll amount of memory is used by module. And the last symbol in module will have a large address range as caotaining the big gap. Give that the module and kernel text segment sequence may change in the future, fix this by limiting range of last symbol in module and kernel to 4K in arch arm64. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hewenliang <hewenliang4@huawei.com> Cc: Hu Shiyuan <hushiyuan@huawei.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Link: http://lore.kernel.org/lkml/33fd24c4-0d5a-9d93-9b62-dffa97c992ca@huawei.com [ refreshed the patch on current codebase, added string.h include as strchr() is used ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf build-test: Honour JOBS to override detection of number of coresArnaldo Carvalho de Melo1-3/+7
When one does: $ make -C tools/perf build-test The makefile in tools/perf/tests/ will, just like the main one, detect how many cores are in the system and use it with -j. Sometimes we may need to override that, for instance, when using icecream or distcc to use multiple machines in the build process, then we need to, as with the main makefile, use: $ make JOBS=N -C tools/perf build-test Fix the tests makefile to honour that. Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200330130301.GA31702@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf script: Add --show-cgroup-events optionNamhyung Kim2-0/+44
The --show-cgroup-events option is to print CGROUP events in the output like others. Committer testing: [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.039 MB perf.data (487 samples) ] [root@seventh ~]# perf script --show-cgroup-events | grep PERF_RECORD_CGROUP -B2 -A2 swapper 0 0.000000: PERF_RECORD_CGROUP cgroup: 1 / perf 12145 11200.440730: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) perf 12145 11200.440733: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) -- cgtest 12145 11200.440739: 193472 cycles: ffffffffb90f6fbc commit_creds+0x1fc (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12145 11200.440790: 2691608 cycles: 7fa2cb43019b _dl_sysdep_start+0x7cb (/usr/lib64/ld-2.29.so) cgtest 12145 11200.440962: PERF_RECORD_CGROUP cgroup: 83 /sub cgtest 12147 11200.441054: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12147 11200.441057: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) -- cgtest 12148 11200.441103: 10227 cycles: ffffffffb9a0153d end_repeat_nmi+0x48 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12148 11200.441106: 273295 cycles: ffffffffb99ecbc7 copy_page+0x7 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12147 11200.441133: PERF_RECORD_CGROUP cgroup: 88 /sub/cgrp1 cgtest 12147 11200.441143: 2788845 cycles: ffffffffb94676c2 security_genfs_sid+0x102 (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) cgtest 12148 11200.441162: PERF_RECORD_CGROUP cgroup: 93 /sub/cgrp2 cgtest 12148 11200.441182: 2669546 cycles: 401020 _init+0x20 (/wb/cgtest) cgtest 12149 11200.441247: 1 cycles: ffffffffb900d58b __intel_pmu_enable_all.constprop.0+0x3b (/lib/modules/5.6.0-rc6-00008-gfe2413eefd7f/build/vmlinux) [root@seventh ~]# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf top: Add --all-cgroups optionNamhyung Kim2-0/+19
The --all-cgroups option is to enable cgroup profiling support. It tells kernel to record CGROUP events in the ring buffer so that 'perf top' can identify task/cgroup association later. Committer testing: Use: # perf top --all-cgroups -s cgroup_id,cgroup,pid Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-9-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf record: Add --all-cgroups optionNamhyung Kim5-2/+27
The --all-cgroups option is to enable cgroup profiling support. It tells kernel to record CGROUP events in the ring buffer so that perf report can identify task/cgroup association later. [root@seventh ~]# perf record --all-cgroups --namespaces /wb/cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.042 MB perf.data (558 samples) ] [root@seventh ~]# perf report --stdio -s cgroup_id,cgroup,pid # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 558 of event 'cycles' # Event count (approx.): 458017341 # # Overhead cgroup id (dev/inode) Cgroup Pid:Command # ........ ..................... .......... ............... # 33.15% 4/0xeffffffb /sub 9615:looper0 32.83% 4/0xf00002f5 /sub/cgrp2 9620:looper2 32.79% 4/0xf00002f4 /sub/cgrp1 9619:looper1 0.35% 4/0xf00002f5 /sub/cgrp2 9618:cgtest 0.34% 4/0xf00002f4 /sub/cgrp1 9617:cgtest 0.32% 4/0xeffffffb / 9615:looper0 0.11% 4/0xeffffffb /sub 9617:cgtest 0.10% 4/0xeffffffb /sub 9618:cgtest # # (Tip: Sample related events with: perf record -e '{cycles,instructions}:S') # [root@seventh ~]# Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-8-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf record: Support synthesizing cgroup eventsNamhyung Kim4-0/+129
Synthesize cgroup events by iterating cgroup filesystem directories. The cgroup event only saves the portion of cgroup path after the mount point and the cgroup id (which actually is a file handle). Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-7-namhyung@kernel.org Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ Extracted the HAVE_FILE_HANDLE from the followup patch, added missing __maybe_unused ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf report: Add 'cgroup' sort keyNamhyung Kim5-0/+54
The cgroup sort key is to show cgroup membership of each task. Currently it shows full path in the cgroupfs (not relative to the root of cgroup namespace) since it'd be more intuitive IMHO. Otherwise root cgroup in different namespaces will all show same name - "/". The cgroup sort key should come before cgroup_id otherwise sort_dimension__add() will match it to cgroup_id as it only matches with the given substring. For example it will look like following. Note that record patch adding --all-cgroups patch will come later. $ perf record -a --namespace --all-cgroups cgtest [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.208 MB perf.data (4090 samples) ] $ perf report -s cgroup_id,cgroup,pid ... # Overhead cgroup id (dev/inode) Cgroup Pid:Command # ........ ..................... .......... ............... # 93.96% 0/0x0 / 0:swapper 1.25% 3/0xeffffffb / 278:looper0 0.86% 3/0xf000015f /sub/cgrp1 280:cgtest 0.37% 3/0xf0000160 /sub/cgrp2 281:cgtest 0.34% 3/0xf0000163 /sub/cgrp3 282:cgtest 0.22% 3/0xeffffffb /sub 278:looper0 0.20% 3/0xeffffffb / 280:cgtest 0.15% 3/0xf0000163 /sub/cgrp3 285:looper3 Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf cgroup: Maintain cgroup hierarchyNamhyung Kim5-5/+109
Each cgroup is kept in the perf_env's cgroup_tree sorted by the cgroup id. Hist entries have cgroup id can compare it directly and later it can be used to find a group name using this tree. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf tools: Basic support for CGROUP eventNamhyung Kim12-1/+67
Implement basic functionality to support cgroup tracking. Each cgroup can be identified by inode number which can be read from userspace too. The actual cgroup processing will come in the later patch. Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> [ fix perf test failure on sampling parsing ] Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200325124536.2800725-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf tools: Add file-handle feature testNamhyung Kim1-0/+4
The file handle (FHANDLE) support is configurable so some systems might not have it. So add a config feature item to check it on build time so that we don't add the cgroup tracking feature based on that. Committer notes: Had to make the test use the same construct as its later use in synthetic-events.c, in the next patch in this series. i.e. make it be: struct { struct file_handle fh; uint64_t cgroup_id; } handle; To cope with: CC /tmp/build/perf/util/cloexec.o util/synthetic-events.c:428:22: error: field 'fh' with CC /tmp/build/perf/util/call-path.o variable sized type 'struct file_handle' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end] struct file_handle fh; ^ 1 error generated. Deal with this at some point, i.e. investigate if the right thing is to remove that -Wgnu-variable-sized-type-not-at-end from our CFLAGS, for now do the test the same way as it is used looks more sensible. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200402015249.3800462-1-namhyung@kernel.org [ split from a larger patch, removed blank line at EOF ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-03perf python: Include rwsem.c in the pythong bidingArnaldo Carvalho de Melo1-0/+1
We'll need it for the cgroup patches, and its better to have it in a separate patch in case we need to later revert the cgroup patches. I.e. without this we have: [root@five ~]# perf test -v python 19: 'import perf' in python : --- start --- test child forked, pid 148447 Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /tmp/build/perf/python/perf.cpython-37m-x86_64-linux-gnu.so: undefined symbol: down_write test child finished with -1 ---- end ---- 'import perf' in python: FAILED! [root@five ~]# Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200403123606.GC23243@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-31Merge branch 'x86-misc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds3-0/+544
Pull misc x86 updates from Ingo Molnar: - extend the decoder maps with CET instructions - fix !vDSO corner cases * 'x86-misc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/tests: Add CET instructions to the new instructions test x86/insn: Add Control-flow Enforcement (CET) instructions to the opcode map selftests/x86/ptrace_syscall_32: Fix no-vDSO segfault selftests/x86/vdso: Fix no-vDSO segfaults
2020-03-27perf script: Introduce --deltatime optionHagen Paul Pfeifer2-0/+20
For some kind of analysis a deltatime output is more human friendly and reduce the cognitive load for further analysis. The following output demonstrate the new option "deltatime": calculate the time difference in relation to the previous event. $ perf script --deltatime test 2525 [001] 0.000000: sdt_libev:ev_add: (5635e72a5ebd) test 2525 [001] 0.000091: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 1.000051: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.000685: sdt_libev:ev_add: (5635e72a5ebd) test 2525 [001] 0.000048: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 1.000104: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.003895: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 0.996034: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.000058: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 1.000004: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.000064: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 0.999934: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 test 2525 [001] 0.000056: sdt_libev:epoll_wait_enter: (5635e72a76a9) test 2525 [001] 0.999930: sdt_libev:epoll_wait_return: (5635e72a772e) arg1=1 Committer testing: So go from default output to --reltime and then this new --deltatime, to contrast the various timestamp presentation modes for a random perf.data file I had laying around: [root@five ~]# perf script --reltime | head perf 442394 [000] 0.000000: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000002: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000004: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000006: 128 cycles: ffffffff972415a1 perf_event_update_userpage+0x1 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000009: 2597 cycles: ffffffff97463785 cap_task_setscheduler+0x5 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000036: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000038: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000040: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000041: 224 cycles: ffffffff9700a53a perf_ibs_handle_irq+0x1da (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000044: 4439 cycles: ffffffff97120d85 put_prev_entity+0x45 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) [root@five ~]# perf script --deltatime | head perf 442394 [000] 0.000000: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000002: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000001: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000001: 128 cycles: ffffffff972415a1 perf_event_update_userpage+0x1 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 0.000002: 2597 cycles: ffffffff97463785 cap_task_setscheduler+0x5 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000027: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000002: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000001: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000001: 224 cycles: ffffffff9700a53a perf_ibs_handle_irq+0x1da (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 0.000002: 4439 cycles: ffffffff97120d85 put_prev_entity+0x45 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) [root@five ~]# perf script | head perf 442394 [000] 7600.157861: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 7600.157864: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 7600.157866: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 7600.157867: 128 cycles: ffffffff972415a1 perf_event_update_userpage+0x1 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [000] 7600.157870: 2597 cycles: ffffffff97463785 cap_task_setscheduler+0x5 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157897: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157900: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157901: 16 cycles: ffffffff9706e544 native_write_msr+0x4 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157903: 224 cycles: ffffffff9700a53a perf_ibs_handle_irq+0x1da (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) perf 442394 [001] 7600.157906: 4439 cycles: ffffffff97120d85 put_prev_entity+0x45 (/usr/lib/debug/lib/modules/5.5.10-200.fc31.x86_64/vmlinux) [root@five ~]# Andi suggested we better implement it as a new field, i.e. -F deltatime, like: [root@five ~]# perf script -F deltatime Invalid field requested. Usage: perf script [<options>] or: perf script [<options>] record <script> [<record-options>] <command> or: perf script [<options>] report <script> [script-args] or: perf script [<options>] <script> [<record-options>] <command> or: perf script [<options>] <top-script> [script-args] -F, --fields <str> comma separated output fields prepend with 'type:'. +field to add and -field to remove.Valid types: hw,sw,trace,raw,synth. Fields: comm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,srcline,period,iregs,uregs,brstack,brstacksym,flags,bpf-output,brstackinsn,brstackoff,callindent,insn,insnlen,synth,phys_addr,metric,misc,ipc [root@five ~]# I.e. we have -F for maximum flexibility: [root@five ~]# perf script -F comm,pid,cpu,time | head perf 442394 [000] 7600.157861: perf 442394 [000] 7600.157864: perf 442394 [000] 7600.157866: perf 442394 [000] 7600.157867: perf 442394 [000] 7600.157870: perf 442394 [001] 7600.157897: perf 442394 [001] 7600.157900: perf 442394 [001] 7600.157901: perf 442394 [001] 7600.157903: perf 442394 [001] 7600.157906: [root@five ~]# But since we already have --reltime, having --deltatime, documented one after the other is sensible. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20200204173709.489161-1-hagen@jauu.net [ Added 'perf script' man page entry for --deltatime ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-27perf test x86: Add CET instructions to the new instructions testAdrian Hunter3-0/+544
Add to the "x86 instruction decoder - new instructions" test the following instructions: incsspd incsspq rdsspd rdsspq saveprevssp rstorssp wrssd wrssq wrussd wrussq setssbsy clrssbsy endbr32 endbr64 And the "notrack" prefix for indirect calls and jumps. For information about the instructions, refer Intel Control-flow Enforcement Technology Specification May 2019 (334525-003). Committer testing: $ perf test instr 67: x86 instruction decoder - new instructions : Ok $ Then use verbose mode and check one of those new instructions: $ perf test -v instr |& grep saveprevssp Decoded ok: f3 0f 01 ea saveprevssp Decoded ok: f3 0f 01 ea saveprevssp $ Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi v. Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/20200204171425.28073-3-yu-cheng.yu@intel.com Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-26perf: Normalize gcc parameter when generating arch errno tableHe Zhe1-1/+1
The $(CC) passed to arch_errno_names.sh may include a series of parameters along with gcc itself. To avoid overwriting the following parameters of arch_errno_names.sh and break the build like below, we just pick up the first word of the $(CC). find: unknown predicate `-m64/arch' x86_64-wrs-linux-gcc: warning: '-x c' after last input file has no effect x86_64-wrs-linux-gcc: error: unrecognized command line option '-m64/include/uapi/asm-generic/errno.h' x86_64-wrs-linux-gcc: fatal error: no input files Signed-off-by: He Zhe <zhe.he@windriver.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1581618066-187262-2-git-send-email-zhe.he@windriver.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-26perf parse-events: Add defensive NULL checkIan Rogers1-6/+5
Terms may have a NULL config in which case a strcmp will SEGV. This can be reproduced with: perf stat -e '*/event=?,nr/' sleep 1 Add a NULL check to avoid this. This was caught by LLVM's libfuzzer. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: clang-built-linux@googlegroups.com Link: http://lore.kernel.org/lkml/20200325164022.41385-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-26perf/tests: Add CET instructions to the new instructions testAdrian Hunter3-0/+544
Add to the "x86 instruction decoder - new instructions" test the following instructions: incsspd incsspq rdsspd rdsspq saveprevssp rstorssp wrssd wrssq wrussd wrussq setssbsy clrssbsy endbr32 endbr64 And the notrack prefix for indirect calls and jumps. For information about the instructions, refer Intel Control-flow Enforcement Technology Specification May 2019 (334525-003). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lkml.kernel.org/r/20200204171425.28073-3-yu-cheng.yu@intel.com
2020-03-25perf callchain: Update docs regarding kernel/user space unwindingTony Jones2-12/+20
The method of unwinding for kernel space is defined by the kernel config, not by the value of --call-graph. Improve the documentation to reflect this. Signed-off-by: Tony Jones <tonyj@suse.de> Link: http://lore.kernel.org/lkml/20200325164053.10177-1-tonyj@suse.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-25.gitignore: add SPDX License IdentifierMasahiro Yamada2-0/+2
Add SPDX License Identifier to all .gitignore files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-24perf dso: Fix dso comparisonRavi Bangoria1-1/+21
Perf gets dso details from two different sources. 1st, from builid headers in perf.data and 2nd from MMAP2 samples. Dso from buildid header does not have dso_id detail. And dso from MMAP2 samples does not have buildid information. If detail of the same dso is present at both the places, filename is common. Previously, __dsos__findnew_link_by_longname_id() used to compare only long or short names, but Commit 0e3149f86b99 ("perf dso: Move dso_id from 'struct map' to 'struct dso'") also added a dso_id comparison. Because of that, now perf is creating two different dso objects of the same file, one from buildid header (with dso_id but without buildid) and second from MMAP2 sample (with buildid but without dso_id). This is causing issues with archive, buildid-list etc subcommands. Fix this by comparing dso_id only when it's present. And incase dso is present in 'dsos' list without dso_id, inject dso_id detail as well. Before: $ sudo ./perf buildid-list -H 0000000000000000000000000000000000000000 /usr/bin/ls 0000000000000000000000000000000000000000 /usr/lib64/ld-2.30.so 0000000000000000000000000000000000000000 /usr/lib64/libc-2.30.so $ ./perf archive perf archive: no build-ids found After: $ ./perf buildid-list -H b6b1291d0cead046ed0fa5734037fa87a579adee /usr/bin/ls 641f0c90cfa15779352f12c0ec3c7a2b2b6f41e8 /usr/lib64/ld-2.30.so 675ace3ca07a0b863df01f461a7b0984c65c8b37 /usr/lib64/libc-2.30.so $ ./perf archive Now please run: $ tar xvf perf.data.tar.bz2 -C ~/.debug wherever you need to run 'perf report' on. Committer notes: Renamed is_empty_dso_id() to dso_id__empty() and inject_dso_id() to dso__inject_id() to keep namespacing consistent. Fixes: 0e3149f86b99 ("perf dso: Move dso_id from 'struct map' to 'struct dso'") Reported-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20200324042424.68366-1-ravi.bangoria@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf cpumap: Fix snprintf overflow checkChristophe JAILLET1-5/+5
'snprintf' returns the number of characters which would be generated for the given input. If the returned value is *greater than* or equal to the buffer size, it means that the output has been truncated. Fix the overflow test accordingly. Fixes: 7780c25bae59f ("perf tools: Allow ability to map cpus to nodes easily") Fixes: 92a7e1278005b ("perf cpumap: Add cpu__max_present_cpu()") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Suggested-by: David Laight <David.Laight@ACULAB.COM> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Don Zickus <dzickus@redhat.com> Cc: He Zhe <zhe.he@windriver.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: http://lore.kernel.org/lkml/20200324070319.10901-1-christophe.jaillet@wanadoo.fr Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf test: Test pmu-events aliasesJohn Garry1-1/+147
Add creating event aliases to the pmu-events test. So currently we verify that the generated pmu-events.c is as expected for some test events. Now test that we generate aliases as expected for those events during normal operation. For that, we cycle through each HW PMU in the system, and use the test events to create aliases, and verify those against known, expected values. For core PMUs, we should create an alias for every event in test_cpu_events[]. However, for uncore PMUs, they need to be matched by the pmu_event.pmu member, so use test_uncore_events[]; so check the match beforehand with pmu_uncore_alias_match(). A sample run is as follows for my x86 machine: john@linux-3c19:~/linux> tools/perf/perf test -vv 10 10: PMU events : --- start --- ... testing PMU uncore_arb aliases: no events to match testing PMU cstate_pkg aliases: no events to match skipping testing PMU breakpoint testing aliases PMU uncore_cbox_1: matched event unc_cbo_xsnp_response.miss_eviction testing PMU uncore_cbox_1 aliases: pass testing PMU power aliases: no events to match testing aliases PMU cpu: matched event bp_l1_btb_correct testing aliases PMU cpu: matched event bp_l2_btb_correct testing aliases PMU cpu: matched event segment_reg_loads.any testing aliases PMU cpu: matched event dispatch_blocked.any testing aliases PMU cpu: matched event eist_trans testing PMU cpu aliases: pass testing PMU intel_pt aliases: no events to match skipping testing PMU software skipping testing PMU intel_bts testing PMU uncore_imc aliases: no events to match testing aliases PMU uncore_cbox_0: matched event unc_cbo_xsnp_response.miss_eviction testing PMU uncore_cbox_0 aliases: pass testing PMU cstate_core aliases: no events to match skipping testing PMU tracepoint testing PMU msr aliases: no events to match test child finished with 0 Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-8-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf pmu: Make pmu_uncore_alias_match() publicJohn Garry2-1/+2
The perf pmu-events test will want to use pmu_uncore_alias_match(), so make it public. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-7-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf pmu: Add is_pmu_core()John Garry2-0/+6
Add a function to decide whether a PMU is a core PMU. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-6-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf test: Add pmu-events testJohn Garry4-0/+239
The initial test will verify that the test tables in generated pmu-events.c match against known, expected values. For known events added in pmu-events/arch/test, we need to add an entry in test_cpu_aliases_events[] or test_uncore_events[]. A sample run is as follows for x86: john@linux-3c19:~/linux> tools/perf/perf test -vv 10 10: PMU event aliases : --- start --- test child forked, pid 5316 testing event table bp_l1_btb_correct: pass testing event table bp_l2_btb_correct: pass testing event table segment_reg_loads.any: pass testing event table dispatch_blocked.any: pass testing event table eist_trans: pass testing event table uncore_hisi_ddrc.flux_wcmd: pass testing event table unc_cbo_xsnp_response.miss_eviction: pass test child finished with 0 ---- end ---- PMU event aliases: Ok Signed-off-by: John Garry <john.garry@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com [ Fixup test_cpu_events[] and test_uncore_events[] sentinels to initialize one of its members to NULL, fixing the build in older compilers ] Link: http://lore.kernel.org/lkml/1584442939-8911-5-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf pmu: Refactor pmu_add_cpu_aliases()John Garry2-8/+16
Create pmu_add_cpu_aliases_map() from pmu_add_cpu_aliases(), so the caller can pass the map; the pmu-events test would use this since there would be no CPUID matching to a mapfile there. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf jevents: Support test events folderJohn Garry1-0/+30
With the goal of supporting pmu-events test case, introduce support for a test events folder. These test events can be used for testing generation of pmu-event tables and alias creation for any arch. When running the pmu-events test case, these test events will be used as the platform-agnostic events, so aliases can be created per-PMU and validated against known expected values. To support the test events, add a "testcpu" entry in pmu_events_map[]. The pmu-events test will be able to lookup the events map for "testcpu", to verify the generated tables against expected values. The resultant generated pmu-events.c will now look like the following: struct pmu_event pme_ampere_emag[] = { { .name = "ldrex_spec", .event = "event=0x6c", .desc = "Exclusive operation spe...", .topic = "intrinsic", .long_desc = "Exclusive operation ...", }, ... }; struct pmu_event pme_test_cpu[] = { { .name = "uncore_hisi_ddrc.flux_wcmd", .event = "event=0x2", .desc = "DDRC write commands. Unit: hisi_sccl,ddrc ", .topic = "uncore", .long_desc = "DDRC write commands", .pmu = "hisi_sccl,ddrc", }, { .name = "unc_cbo_xsnp_response.miss_eviction", .event = "umask=0x81,event=0x22", .desc = "Unit: uncore_cbox A cross-core snoop resulted ...", .topic = "uncore", .long_desc = "A cross-core snoop resulted from L3 ...", .pmu = "uncore_cbox", }, { .name = "eist_trans", .event = "umask=0x0,period=200000,event=0x3a", .desc = "Number of Enhanced Intel SpeedStep(R) ...", .topic = "other", }, { .name = 0, }, }; struct pmu_events_map pmu_events_map[] = { ... { .cpuid = "0x00000000500f0000", .version = "v1", .type = "core", .table = pme_ampere_emag }, ... { .cpuid = "testcpu", .version = "v1", .type = "core", .table = pme_test_cpu, }, { .cpuid = 0, .version = 0, .type = 0, .table = 0, }, }; Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-3-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf jevents: Add some test eventsJohn Garry3-0/+59
Add some test PMU events. The events are randomly chosen from x86 and arm64 JSONs. The events include CPU and uncore events. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: linuxarm@huawei.com Link: http://lore.kernel.org/lkml/1584442939-8911-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf tools: Unify a bit the build directory outputJiri Olsa1-4/+5
Removing the extra 'SUBDIR' line from clean and doc build output. Because it's annoying.. ;-) Before: $ make clean ... SUBDIR Documentation CLEAN Documentation After: $ make clean ... CLEAN Documentation Before: $ make doc BUILD: Doing 'make -j8' parallel build SUBDIR Documentation ASCIIDOC perf-stat.html ... After: $ make doc BUILD: Doing 'make -j8' parallel build ASCIIDOC perf-stat.html ... Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318204522.1200981-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf vendor events amd: Update Zen1 events to V2Vijay Thakkar7-136/+172
This patch updates the PMCs for AMD Zen1 core based processors (Family 17h; Models 0 through 2F) to be in accordance with PMCs as documented in the latest versions of the AMD Processor Programming Reference [1], [2] and [3]. Note that some events, such as FPU pipe assignment are missing in [1], and therefore [3] is included for full coverage of events. PMCs added: fpu_pipe_assignment.dual{0|1|2|3} fpu_pipe_assignment.total{0|1|2|3} ls_mab_alloc.dc_prefetcher ls_mab_alloc.stores ls_mab_alloc.loads bp_dyn_ind_pred bp_de_redirect PMC removed: ex_ret_cond_misp Cumulative counts, fpu_pipe_assignment.total and fpu_pipe_assignment.dual, existed in v1, but did expose port-level counters. ex_ret_cond_misp has been removed as it has been removed from the latest versions of the PPR, and when tested, always seems to sample zero as tested on a Ryzen 3400G system. [1]: Processor Programming Reference (PPR) for AMD Family 17h Models 01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019. [2]: Processor Programming Reference (PPR) for AMD Family 17h Model 18h, Revision B1 Processors, 55570-B1 Rev 3.14 - Sep 26, 2019. [3]: OSRR for AMD Family 17h processors, Models 00h-2Fh, 56255 Rev 3.03 - July, 2018 All of the PPRs can be found at: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: vijay thakkar <vijaythakkar@me.com> Link: http://lore.kernel.org/lkml/20200318190002.307290-4-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf vendor events amd: Add Zen2 eventsVijay Thakkar7-0/+1117
This patch adds PMU events for AMD Zen2 core based processors, namely, Matisse (model 71h), Castle Peak (model 31h) and Rome (model 2xh), as documented in the AMD Processor Programming Reference for Matisse [1]. The model number regex has been set to detect all the models under family 17 that do not match those of Zen1, as the range is larger for zen2. Zen2 adds some additional counters that are not present in Zen1 and events for them have been added in this patch. Some counters have also been removed for Zen2 thatwere previously present in Zen1 and have been confirmed to always sample zero on zen2. These added/removed counters have been omitted for brevity but can be found here: https://gist.github.com/thakkarV/5b12ca5fd7488eb2c42e451e40bdd5f3 Note that PPR for Zen2 [1] does not include some counters that were documented in the PPR for Zen1 based processors [2]. After having tested these counters, some of them that still work for zen2 systems have been preserved in the events for zen2. The counters that are omitted in [1] but are still measurable and non-zero on zen2 (tested on a Ryzen 3900X system) are the following: PMC 0x000 fpu_pipe_assignment.{total|total0|total1|total2|total3} PMC 0x004 fp_num_mov_elim_scal_op.* PMC 0x046 ls_tablewalker.* PMC 0x062 l2_latency.l2_cycles_waiting_on_fills PMC 0x063 l2_wcb_req.* PMC 0x06D l2_fill_pending.l2_fill_busy PMC 0x080 ic_fw32 PMC 0x081 ic_fw32_miss PMC 0x086 bp_snp_re_sync PMC 0x087 ic_fetch_stall.* PMC 0x08C ic_cache_inval.* PMC 0x099 bp_tlb_rel PMC 0x0C7 ex_ret_brn_resync PMC 0x28A ic_oc_mode_switch.* L3PMC 0x001 l3_request_g1.* L3PMC 0x006 l3_comb_clstr_state.* [1]: Processor Programming Reference (PPR) for AMD Family 17h Model 71h, Revision B0 Processors, 56176 Rev 3.06 - Jul 17, 2019 [2]: Processor Programming Reference (PPR) for AMD Family 17h Models 01h,08h, Revision B2 Processors, 54945 Rev 3.03 - Jun 14, 2019 All of the PPRs can be found at: https://bugzilla.kernel.org/show_bug.cgi?id=206537 Here are the results of running "fpu_pipe_assignment.total" events on my Ryzen 3900X family 17h model 71h system: Before this patch: $> perf list *fpu_pipe_assignment* List of pre-defined events (to be used in -e): After: $> perf list *fpu_pipe_assignment* floating point: fpu_pipe_assignment.total [Total number of fp uOps] fpu_pipe_assignment.total0 [Total number uOps assigned to pipe 0] fpu_pipe_assignment.total1 [Total number uOps assigned to pipe 1] fpu_pipe_assignment.total2 [Total number uOps assigned to pipe 2] fpu_pipe_assignment.total3 [Total number uOps assigned to pipe 3] Metric Groups: $> perf stat -e fpu_pipe_assignment.total sleep 1 Performance counter stats for 'sleep 1': 25,883 fpu_pipe_assignment.total 1.004145868 seconds time elapsed 0.001805000 seconds user 0.000000000 seconds sys Usage tests while running Linpackin the background: $> perf stat -I1000 -e fpu_pipe_assignment.total 1.000266796 79,313,191,516 fpu_pipe_assignment.total 2.000809630 68,091,474,430 fpu_pipe_assignment.total 3.001028115 52,925,023,174 fpu_pipe_assignment.total $> perf record -e fpu_pipe_assignment.total,fpu_pipe_assignment.total0 -a sleep 1 [ perf record: Woken up 9 times to write data ] [ perf record: Captured and wrote 4.031 MB perf.data (64764 samples) ] $> perf report --stdio --no-header | head -30 98.33% xhpl xhpl [.] dgemm_kernel 0.28% xhpl xhpl [.] dtrsm_kernel_LT 0.10% xhpl [kernel.kallsyms] [k] entry_SYSCALL_64 0.08% xhpl xhpl [.] idamax_k 0.07% baloo_file_extr liblmdb.so [.] mdb_mid2l_insert 0.06% xhpl xhpl [.] dgemm_itcopy 0.06% xhpl xhpl [.] dgemm_oncopy 0.06% xhpl [kernel.kallsyms] [k] __schedule 0.06% xhpl [kernel.kallsyms] [k] syscall_trace_enter 0.06% xhpl [kernel.kallsyms] [k] native_sched_clock 0.06% xhpl [kernel.kallsyms] [k] pick_next_task_fair 0.05% xhpl xhpl [.] blas_thread_server.llvm.15009391670273914865 0.04% xhpl [kernel.kallsyms] [k] do_syscall_64 0.04% xhpl [kernel.kallsyms] [k] yield_task_fair 0.04% xhpl libpthread-2.31.so [.] __pthread_mutex_unlock_usercnt 0.03% xhpl [kernel.kallsyms] [k] cpuacct_charge 0.03% xhpl [kernel.kallsyms] [k] syscall_return_via_sysret 0.03% xhpl libc-2.31.so [.] __sched_yield 0.03% xhpl [kernel.kallsyms] [k] __calc_delta $> perf annotate --stdio2 dgemm_kernel | egrep '^ {0,2}[0-9]+' -B2 -A2 sub $0x60,%rsp mov %rbx,(%rsp) 0.00 mov %rbp,0x8(%rsp) mov %r12,0x10(%rsp) 0.00 mov %r13,0x18(%rsp) mov %r14,0x20(%rsp) mov %r15,0x28(%rsp) -- mov %rdi,%r13 mov %rsi,0x28(%rsp) 0.00 mov %rdx,%r12 vmovsd %xmm0,0x30(%rsp) shl $0x3,%r10 mov 0x28(%rsp),%rax 0.00 xor %rdx,%rdx mov $0x18,%rdi div %rdi -- nop a0: mov %r12,%rax 0.00 shl $0x3,%rax mov %r8,%rdi lea (%r8,%rax,8),%r15 -- mov %r12,%rax nop 0.00 c0: vmovups (%rdi),%ymm1 0.09 vmovups 0x20(%rdi),%ymm2 0.02 vmovups (%r15),%ymm3 0.10 vmovups %ymm1,(%rsi) 0.07 vmovups %ymm2,0x20(%rsi) 0.07 vmovups %ymm3,0x40(%rsi) 0.06 add $0x40,%rdi add $0x40,%r15 add $0x60,%rsi 0.00 dec %rax ↑ jne c0 mov %r9,%r15 -- nop 110: lea 0x80(%rsp),%rsi 0.01 add $0x60,%rsi 0.03 mov %r12,%rax 0.00 sar $0x3,%rax cmp $0x2,%rax ↓ jl d26 prefetcht0 0x200(%rdi) 0.01 vmovups -0x60(%rsi),%ymm1 0.02 prefetcht0 0xa0(%rsi) 0.00 vbroadcastsd -0x80(%rdi),%ymm0 0.00 prefetcht0 0xe0(%rsi) 0.03 vmovups -0x40(%rsi),%ymm2 0.00 prefetcht0 0x120(%rsi) vmovups -0x20(%rsi),%ymm3 vmulpd %ymm0,%ymm1,%ymm4 0.01 prefetcht0 0x160(%rsi) vmulpd %ymm0,%ymm2,%ymm8 0.01 vmulpd %ymm0,%ymm3,%ymm12 0.02 prefetcht0 0x1a0(%rsi) 0.01 vbroadcastsd -0x78(%rdi),%ymm0 vmulpd %ymm0,%ymm1,%ymm5 0.01 vmulpd %ymm0,%ymm2,%ymm9 vmulpd %ymm0,%ymm3,%ymm13 0.01 vbroadcastsd -0x70(%rdi),%ymm0 vmulpd %ymm0,%ymm1,%ymm6 0.00 vmulpd %ymm0,%ymm2,%ymm10 0.00 add $0x60,%rsi ... snip ... nop 65e0: vmovddup -0x60(%rsi),%xmm2 0.00 vmovups -0x80(%rdi),%xmm0 vmovups -0x70(%rdi),%xmm1 0.00 vmovddup -0x58(%rsi),%xmm3 vfmadd231pd %xmm0,%xmm2,%xmm4 0.00 vfmadd231pd %xmm1,%xmm2,%xmm5 0.00 vfmadd231pd %xmm0,%xmm3,%xmm6 0.00 vfmadd231pd %xmm1,%xmm3,%xmm7 0.00 add $0x10,%rsi add $0x20,%rdi 0.00 dec %rax ↑ jne 65e0 nop nop 6620: vmovddup 0x30(%rsp),%xmm0 0.00 vmulpd %xmm0,%xmm4,%xmm4 0.00 vmulpd %xmm0,%xmm5,%xmm5 vmulpd %xmm0,%xmm6,%xmm6 vmulpd %xmm0,%xmm7,%xmm7 vaddpd (%r15),%xmm4,%xmm4 vaddpd 0x10(%r15),%xmm5,%xmm5 0.00 vaddpd (%r15,%r10,1),%xmm6,%xmm6 0.00 vaddpd 0x10(%r15,%r10,1),%xmm7,%xmm7 0.00 vmovups %xmm4,(%r15) vmovups %xmm5,0x10(%r15) 0.00 vmovups %xmm6,(%r15,%r10,1) vmovups %xmm7,0x10(%r15,%r10,1) add $0x20,%r15 -- lea (%r8,%rax,8),%r8 69d8: mov 0x20(%rsp),%r14 0.00 test $0x1,%r14 ↓ je 6d84 mov %r9,%r15 -- vbroadcastsd -0x28(%rsi),%ymm3 vfmadd231pd (%rdi),%ymm0,%ymm4 0.00 vfmadd231pd 0x20(%rdi),%ymm1,%ymm5 vfmadd231pd 0x40(%rdi),%ymm2,%ymm6 vfmadd231pd 0x60(%rdi),%ymm3,%ymm7 -- vmulpd %ymm0,%ymm4,%ymm4 vaddpd (%r15),%ymm4,%ymm4 0.00 vmovups %ymm4,(%r15) add $0x20,%r15 dec %r11 -- mov %rbx,%rsp mov (%rsp),%rbx 0.01 mov 0x8(%rsp),%rbp mov 0x10(%rsp),%r12 mov 0x18(%rsp),%r13 Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318190002.307290-3-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf vendor events amd: Restrict model detection for zen1 based processorsVijay Thakkar7-1/+1
This patch changes the previous blanket detection of AMD Family 17h processors to be more specific to Zen1 core based products only by replacing model detection regex pattern [[:xdigit:]]+ with ([12][0-9A-F]|[0-9A-F]), restricting to models 0 though 2f only. This change is required to allow for the addition of separate PMU events for Zen2 core based models in the following patches as those belong to family 17h but have different PMCs. Current PMU events directory has also been renamed to "amdzen1" from "amdfam17h" to reflect this specificity. Note that although this change does not break PMU counters for existing zen1 based systems, it does disable the current set of counters for zen2 based systems. Counters for zen2 have been added in the following patches in this patchset. Signed-off-by: Vijay Thakkar <vijaythakkar@me.com> Acked-by: Kim Phillips <kim.phillips@amd.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jon Grimm <jon.grimm@amd.com> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200318190002.307290-2-vijaythakkar@me.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf metricgroup: Fix printing event names of metric group with multiple events incase of overlapping eventsKajol Jain1-19/+30
Commit f01642e4912b ("perf metricgroup: Support multiple events for metricgroup") introduced support for multiple events in a metric group. But with the current upstream, metric events names are not printed properly incase we try to run multiple metric groups with overlapping event. With current upstream version, incase of overlapping metric events issue is, we always start our comparision logic from start. So, the events which already matched with some metric group also take part in comparision logic. Because of that when we have overlapping events, we end up matching current metric group event with already matched one. For example, in skylake machine we have metric event CoreIPC and Instructions. Both of them need 'inst_retired.any' event value. As events in Instructions is subset of events in CoreIPC, they endup in pointing to same 'inst_retired.any' value. In skylake platform: command:# ./perf stat -M CoreIPC,Instructions -C 0 sleep 1 Performance counter stats for 'CPU(s) 0': 1,254,992,790 inst_retired.any # 1254992790.0 Instructions # 1.3 CoreIPC 977,172,805 cycles 1,254,992,756 inst_retired.any 1.000802596 seconds time elapsed command:# sudo ./perf stat -M UPI,IPC sleep 1 Performance counter stats for 'sleep 1': 948,650 uops_retired.retire_slots 866,182 inst_retired.any # 0.7 IPC 866,182 inst_retired.any 1,175,671 cpu_clk_unhalted.thread Patch fixes the issue by adding a new bool pointer 'evlist_used' to keep track of events which already matched with some group by setting it true. So, we skip all used events in list when we start comparision logic. Patch also make some changes in comparision logic, incase we get a match miss, we discard the whole match and start again with first event id in metric event. With this patch: In skylake platform: command:# ./perf stat -M CoreIPC,Instructions -C 0 sleep 1 Performance counter stats for 'CPU(s) 0': 3,348,415 inst_retired.any # 0.3 CoreIPC 11,779,026 cycles 3,348,381 inst_retired.any # 3348381.0 Instructions 1.001649056 seconds time elapsed command:# ./perf stat -M UPI,IPC sleep 1 Performance counter stats for 'sleep 1': 1,023,148 uops_retired.retire_slots # 1.1 UPI 924,976 inst_retired.any 924,976 inst_retired.any # 0.6 IPC 1,489,414 cpu_clk_unhalted.thread 1.003064672 seconds time elapsed Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Link: http://lore.kernel.org/lkml/20200221101121.28920-1-kjain@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf stat: Align the output for interval aggregation modeJin Yao1-3/+3
There is a slight misalignment in -A -I output. For example: # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000 # time CPU counts unit events 1.000440863 CPU0 1,068,388 cpu/event=cpu-cycles/ 1.000440863 CPU1 875,954 cpu/event=cpu-cycles/ 1.000440863 CPU2 3,072,538 cpu/event=cpu-cycles/ 1.000440863 CPU3 4,026,870 cpu/event=cpu-cycles/ 1.000440863 CPU4 5,919,630 cpu/event=cpu-cycles/ 1.000440863 CPU5 2,714,260 cpu/event=cpu-cycles/ 1.000440863 CPU6 2,219,240 cpu/event=cpu-cycles/ 1.000440863 CPU7 1,299,232 cpu/event=cpu-cycles/ The value of counts is not aligned with the column "counts" and the event name is not aligned with the column "events". With this patch, the output is, # perf stat -e cpu/event=cpu-cycles/ -a -A -I 1000 # time CPU counts unit events 1.000423009 CPU0 997,421 cpu/event=cpu-cycles/ 1.000423009 CPU1 1,422,042 cpu/event=cpu-cycles/ 1.000423009 CPU2 484,651 cpu/event=cpu-cycles/ 1.000423009 CPU3 525,791 cpu/event=cpu-cycles/ 1.000423009 CPU4 1,370,100 cpu/event=cpu-cycles/ 1.000423009 CPU5 442,072 cpu/event=cpu-cycles/ 1.000423009 CPU6 205,643 cpu/event=cpu-cycles/ 1.000423009 CPU7 1,302,250 cpu/event=cpu-cycles/ Now output is aligned. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200218071614.25736-1-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf report/top TUI: Support hotkeys to let user select any event for sortingJin Yao1-1/+27
When performing "perf report --group", it shows the event group information together. In previous patch, we have supported a new option "--group-sort-idx" to sort the output by the event at the index n in event group. It would be nice if we can use a hotkey in browser to select a event to sort. For example, # perf report --group Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, ... Overhead Command Shared Object Symbol 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 1.56% 0.01% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494ce 1.56% 0.00% 0.00% 0.00% mgen [kernel.kallsyms] [k] task_tick_fair 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 0.00% 0.03% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] g_main_context_check 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] check_preempt_curr When user press hotkey '3' (event index, starting from 0), it indicates to sort output by the forth event in group. Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, ... Overhead Command Shared Object Symbol 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 0.00% 0.00% 0.00% 0.06% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] update_curr 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] native_apic_msr_eoi_write 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] __update_load_avg_se v6: --- Jiri provided a good improvement to eliminate unneeded refresh. This improvement is added to v6. v2: --- 1. Report warning at helpline when index is invalid. 2. Report warning at helpline when it's not group event. 3. Use "case '0' ... '9'" to refine the code 4. Split K_RELOAD implementation to another patch. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf report: Support a new key to reload the browserJin Yao4-3/+6
Sometimes we may need to reload the browser to update the output since some options are changed. This patch creates a new key K_RELOAD. Once the __cmd_report() returns K_RELOAD, it would repeat the whole process, such as, read samples from data file, sort the data and display in the browser. v5: --- 1. Fix the 'make NO_SLANG=1' error. Define K_RELOAD in util/hist.h. 2. Skip setup_sorting() in repeat path if last key is K_RELOAD. v4: --- Need to quit in perf_evsel_menu__run if key is K_RELOAD. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-03-24perf report: Allow specifying event to be used as sort key in --group outputJin Yao4-15/+94
When performing "perf report --group", it shows the event group information together. By default, the output is sorted by the first event in group. It would be nice for user to select any event for sorting. This patch introduces a new option "--group-sort-idx" to sort the output by the event at the index n in event group. For example, Before: # perf report --group --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1, # Event count (approx.): 6451235635 # # Overhead Command Shared Object Symbol # ................................ ......... ....................... ................................... # 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 1.56% 0.01% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494ce 1.56% 0.00% 0.00% 0.00% mgen [kernel.kallsyms] [k] task_tick_fair 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 0.00% 0.03% 0.00% 0.00% gsd-color libglib-2.0.so.0.5600.4 [.] g_main_context_check 0.00% 0.03% 0.00% 0.00% swapper [kernel.kallsyms] [k] apic_timer_interrupt ... After: # perf report --group --stdio --group-sort-idx 3 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 12K of events 'cpu/instructions,period=2000003/, cpu/cpu-cycles,period=200003/, BR_MISP_RETIRED.ALL_BRANCHES:pp, cpu/event=0xc0,umask=1,cmask=1, # Event count (approx.): 6451235635 # # Overhead Command Shared Object Symbol # ................................ ......... ....................... ................................... # 92.19% 98.68% 0.00% 93.30% mgen mgen [.] LOOP1 0.00% 0.13% 0.00% 6.08% swapper [kernel.kallsyms] [k] intel_idle 3.12% 0.29% 0.00% 0.16% gsd-color libglib-2.0.so.0.5600.4 [.] 0x0000000000049515 0.00% 0.00% 0.00% 0.06% swapper [kernel.kallsyms] [k] hrtimer_start_range_ns 1.56% 0.03% 0.00% 0.04% gsd-color libglib-2.0.so.0.5600.4 [.] 0x00000000000494b7 0.00% 0.15% 0.00% 0.04% perf [kernel.kallsyms] [k] smp_call_function_single 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] update_curr 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] apic_timer_interrupt 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] native_apic_msr_eoi_write 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] __update_load_avg_se 0.00% 0.00% 0.00% 0.02% mgen [kernel.kallsyms] [k] scheduler_tick Now the output is sorted by the fourth event in group. v7: --- Rebase to latest perf/core, no other change. v4: --- 1. Update Documentation/perf-report.txt to mention '--group-sort-idx' support multiple groups with different amount of events and it should be used on grouped events. 2. Update __hpp__group_sort_idx(), just return when the idx is out of limit. 3. Return failure on symbol_conf.group_sort_idx && !session->evlist->nr_groups. So now we don't need to use together with --group. v3: --- Refine the code in __hpp__group_sort_idx(). Before: for (i = 1; i < nr_members; i++) { if (i == idx) { ret = field_cmp(fields_a[i], fields_b[i]); if (ret) goto out; } } After: if (idx >= 1 && idx < nr_members) { ret = field_cmp(fields_a[idx], fields_b[idx]); if (ret) goto out; } Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20200220013616.19916-2-yao.jin@linux.intel.com [ Renamed pair_fields_alloc() to hist_entry__new_pair() and combined decl + assignment of vars ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>