diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-18 06:32:11 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-01-18 06:32:11 +0200 |
commit | 57d17378a4a042401b0c2fe211e5a0e3a276cb3d (patch) | |
tree | be65e51a7b11cccb903b44708b98b3af47477304 /tools/perf/util/sort.c | |
parent | Merge tag 'for-linus-5.17-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux (diff) | |
parent | perf record: Disable debuginfod by default (diff) | |
download | wireguard-linux-57d17378a4a042401b0c2fe211e5a0e3a276cb3d.tar.xz wireguard-linux-57d17378a4a042401b0c2fe211e5a0e3a276cb3d.zip |
Merge tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"New features:
- Add 'trace' subcommand for 'perf ftrace', setting the stage for
more 'perf ftrace' subcommands. Not using a subcommand yields the
previous behaviour of 'perf ftrace'.
- Add 'latency' subcommand to 'perf ftrace', that can use the
function graph tracer or a BPF optimized one, via the -b/--use-bpf
option.
E.g.:
$ sudo perf ftrace latency -a -T mutex_lock sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 4596 | ######################## |
1 - 2 us | 1680 | ######### |
2 - 4 us | 1106 | ##### |
4 - 8 us | 546 | ## |
8 - 16 us | 562 | ### |
16 - 32 us | 1 | |
32 - 64 us | 0 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
The original implementation of this command was in the bcc tool.
- Support --cputype option for hybrid events in 'perf stat'.
Improvements:
- Call chain improvements for ARM64.
- No need to do any affinity setup when profiling pids.
- Reduce multiplexing with duration_time in 'perf stat' metrics.
- Improve error message for uncore events, stating that some event
groups are can only be used in system wide (-a) mode.
- perf stat metric group leader fixes/improvements, including arch
specific changes to better support Intel topdown events.
- Probe non-deprecated sysfs path first, i.e. try the path
/sys/devices/system/cpu/cpuN/topology/thread_siblings first, then
the old /sys/devices/system/cpu/cpuN/topology/core_cpus.
- Disable debuginfod by default in 'perf record', to avoid stalls on
distros such as Fedora 35.
- Use unbuffered output in 'perf bench' when pipe/tee'ing to a file.
- Enable ignore_missing_thread in 'perf trace'
Fixes:
- Avoid TUI crash when navigating in the annotation of recursive
functions.
- Fix hex dump character output in 'perf script'.
- Fix JSON indentation to 4 spaces standard in the ARM vendor event
files.
- Fix use after free in metric__new().
- Fix IS_ERR_OR_NULL() usage in the perf BPF loader.
- Fix up cross-arch register support, i.e. when printing register
names take into account the architecture where the perf.data file
was collected.
- Fix SMT fallback with large core counts.
- Don't lower case MetricExpr when parsing JSON files so as not to
lose info such as the ":G" event modifier in metrics.
perf test:
- Add basic stress test for sigtrap handling to 'perf test'.
- Fix 'perf test' failures on s/390
- Enable system wide for metricgroups test in 'perf test´.
- Use 3 digits for test numbering now we can have more tests.
Arch specific:
- Add events for Arm Neoverse N2 in the ARM JSON vendor event files
- Support PERF_MEM_LVLNUM encodings in powerpc, that came from a
single patch series, where I incorrectly merged the kernel bits,
that were then reverted after coordination with Michael Ellerman
and Stephen Rothwell.
- Add ARM SPE total latency as PERF_SAMPLE_WEIGHT.
- Update AMD documentation, with info on raw event encoding.
- Add support for global and local variants of the "p_stage_cyc" sort
key, applicable to perf.data files collected on powerpc.
- Remove duplicate and incorrect aux size checks in the ARM CoreSight
ETM code.
Refactorings:
- Add a perf_cpu abstraction to disambiguate CPUs and CPU map
indexes, fixing problems along the way.
- Document CPU map methods.
UAPI sync:
- Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench
mem memcpy'
- Sync UAPI files with the kernel sources: drm, msr-index,
cpufeatures.
Build system
- Enable warnings through HOSTCFLAGS.
- Drop requirement for libstdc++.so for libopencsd check
libperf:
- Make libperf adopt perf_counts_values__scale() from tools/perf/util/.
- Add a stat multiplexing test to libperf"
* tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits)
perf record: Disable debuginfod by default
perf evlist: No need to do any affinity setup when profiling pids
perf cpumap: Add is_dummy() method
perf metric: Fix metric_leader
perf cputopo: Fix CPU topology reading on s/390
perf metricgroup: Fix use after free in metric__new()
libperf tests: Update a use of the new cpumap API
perf arm: Fix off-by-one directory path
tools arch x86: Sync the msr-index.h copy with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
tools headers UAPI: Update tools's copy of drm.h header
tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
perf pmu-events: Don't lower case MetricExpr
perf expr: Add debug logging for literals
perf tools: Probe non-deprecated sysfs path 1st
perf tools: Fix SMT fallback with large core counts
perf cpumap: Give CPUs their own type
perf stat: Correct first_shadow_cpu to return index
perf script: Fix flipped index and cpu
perf c2c: Use more intention revealing iterator
...
Diffstat (limited to 'tools/perf/util/sort.c')
-rw-r--r-- | tools/perf/util/sort.c | 34 |
1 files changed, 25 insertions, 9 deletions
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index d9a106f0edb2..cfba8c337783 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -37,7 +37,7 @@ const char default_parent_pattern[] = "^sys_|^do_page_fault"; const char *parent_pattern = default_parent_pattern; const char *default_sort_order = "comm,dso,symbol"; const char default_branch_sort_order[] = "comm,dso_from,symbol_from,symbol_to,cycles"; -const char default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,p_stage_cyc"; +const char default_mem_sort_order[] = "local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc"; const char default_top_sort_order[] = "dso,symbol"; const char default_diff_sort_order[] = "dso,symbol"; const char default_tracepoint_sort_order[] = "trace"; @@ -46,8 +46,8 @@ const char *field_order; regex_t ignore_callees_regex; int have_ignore_callees = 0; enum sort_mode sort__mode = SORT_MODE__NORMAL; -const char *dynamic_headers[] = {"local_ins_lat", "p_stage_cyc"}; -const char *arch_specific_sort_keys[] = {"p_stage_cyc"}; +static const char *const dynamic_headers[] = {"local_ins_lat", "ins_lat", "local_p_stage_cyc", "p_stage_cyc"}; +static const char *const arch_specific_sort_keys[] = {"local_p_stage_cyc", "p_stage_cyc"}; /* * Replaces all occurrences of a char used with the: @@ -1392,22 +1392,37 @@ struct sort_entry sort_global_ins_lat = { }; static int64_t -sort__global_p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right) +sort__p_stage_cyc_cmp(struct hist_entry *left, struct hist_entry *right) { return left->p_stage_cyc - right->p_stage_cyc; } +static int hist_entry__global_p_stage_cyc_snprintf(struct hist_entry *he, char *bf, + size_t size, unsigned int width) +{ + return repsep_snprintf(bf, size, "%-*u", width, + he->p_stage_cyc * he->stat.nr_events); +} + + static int hist_entry__p_stage_cyc_snprintf(struct hist_entry *he, char *bf, size_t size, unsigned int width) { return repsep_snprintf(bf, size, "%-*u", width, he->p_stage_cyc); } -struct sort_entry sort_p_stage_cyc = { - .se_header = "Pipeline Stage Cycle", - .se_cmp = sort__global_p_stage_cyc_cmp, +struct sort_entry sort_local_p_stage_cyc = { + .se_header = "Local Pipeline Stage Cycle", + .se_cmp = sort__p_stage_cyc_cmp, .se_snprintf = hist_entry__p_stage_cyc_snprintf, - .se_width_idx = HISTC_P_STAGE_CYC, + .se_width_idx = HISTC_LOCAL_P_STAGE_CYC, +}; + +struct sort_entry sort_global_p_stage_cyc = { + .se_header = "Pipeline Stage Cycle", + .se_cmp = sort__p_stage_cyc_cmp, + .se_snprintf = hist_entry__global_p_stage_cyc_snprintf, + .se_width_idx = HISTC_GLOBAL_P_STAGE_CYC, }; struct sort_entry sort_mem_daddr_sym = { @@ -1858,7 +1873,8 @@ static struct sort_dimension common_sort_dimensions[] = { DIM(SORT_CODE_PAGE_SIZE, "code_page_size", sort_code_page_size), DIM(SORT_LOCAL_INS_LAT, "local_ins_lat", sort_local_ins_lat), DIM(SORT_GLOBAL_INS_LAT, "ins_lat", sort_global_ins_lat), - DIM(SORT_PIPELINE_STAGE_CYC, "p_stage_cyc", sort_p_stage_cyc), + DIM(SORT_LOCAL_PIPELINE_STAGE_CYC, "local_p_stage_cyc", sort_local_p_stage_cyc), + DIM(SORT_GLOBAL_PIPELINE_STAGE_CYC, "p_stage_cyc", sort_global_p_stage_cyc), }; #undef DIM |