aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/util/hist.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-06perf evsel: Add output_resort_cb methodJiri Olsa1-2/+8
Add perf_evsel__output_resort_cb() so we have an interface with a callback for each hist entry. It will be used in the following patch. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190204141808.23031-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf hists: Add argument to hists__resort_cb_t callbackJiri Olsa1-5/+6
Add argument to hists__resort_cb_t so that we can pass data from upper layers to the callback function. It will be used in the following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190204141808.23031-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf tools: Add missing include <callchain.h> in various placesArnaldo Carvalho de Melo1-0/+1
Its getting it from hist.h and that will go away, as that header doesn't need callchain.h at all. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-6ebl3mwwiqocl79yts44qltu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf tools: Add missing include for symbols.hArnaldo Carvalho de Melo1-0/+1
Several places were using definitions found in symbols.h but not including it, getting it by sheer luck from some other headers that now are in the process of removing that include because they don't need it or because simply having struct forward declarations is enough, fix it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-xbcvvx296d70kpg9wb0qmeq9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-25perf hist: Use cached rbtreesDavidlohr Bueso1-87/+112
At the cost of an extra pointer, we can avoid the O(logN) cost of finding the first element in the tree (smallest node), which is something heavily required for histograms. Specifically, the following are converted to rb_root_cached, and users accordingly: hist::entries_in_array hist::entries_in hist::entries hist::entries_collapsed hist_entry::hroot_in hist_entry::hroot_out Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net [ Added some missing conversions to rb_first_cached() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17perf tools: Fix diverse comment typosIngo Molnar1-1/+1
Go over the tools/ files that are maintained in Arnaldo's tree and fix common typos: half of them were in comments, the other half in JSON files. No change in functionality intended. Committer notes: This was split from a larger patch as there are code that is, additionally, maintained outside the kernel tree, so to ease cherry-picking and/or backporting, split this into multiple patches. Just typos in comments, no need to backport, reducing the possibility of possible backporting artifacts. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20181203102200.GA104797@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-07perf hists: Reimplement hists__has_callchains()Arnaldo Carvalho de Melo1-2/+4
There are places where we have only access to struct hists and need to know if any of its hist_entries has callchains, like when drawing headers for the various output modes (stdio, TUI, etc), so, when adding a new hist_entry, check if it has callchains, storing this info for later use by hists__has_callchains(). This reimplementation is necessary because not always a 'struct hists' is allocated together with a 'struct perf evsel', so we can't go from 'hists' to 'perf_event_attr.sample_type & PERF_SAMPLE_CALLCHAIN'. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-hg5g7yddjio3ljwyqnnaj5dt@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-07perf hists: Save the callchain_size in struct hist_entryArnaldo Carvalho de Melo1-2/+4
So that we can figure out the real size of the struct and also be able to tell if callchains may be present in this histogram entry. Since we can't always guarantee that from hist_entry->hists we can use hists_to_evsel, to then look at evsel->attr.sample_type for PERF_SAMPLE_CALLCHAIN, like with the 'perf c2c' tool, that uses plain 'struct hists' instances, we need another way of deciding if a specific hist_entry instance has callchains associated with it, i.e. if its hist_entry->callchain[0] has space allocated for. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-ptvndealxs1k7myluvu9flnq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf hists: Check if a hist_entry has callchains before using themArnaldo Carvalho de Melo1-5/+6
So far if we use 'perf record -g' this will make symbol_conf.use_callchain 'true' and logic will assume that all events have callchains enabled, but ever since we added the possibility of setting up callchains for some events (e.g.: -e cycles/call-graph=dwarf/) while not for others, we limit usage scenarios by looking at that symbol_conf.use_callchain global boolean, we better look at each event attributes. On the road to that we need to look if a hist_entry has callchains, that is, to go from hist_entry->hists to the evsel that contains it, to then look at evsel->sample_type for PERF_SAMPLE_CALLCHAIN. The next step is to add a symbol_conf.ignore_callchains global, to use in the places where what we really want to know is if callchains should be ignored, even if present. Then -g will mean just to select a callchain mode to be applied to all events not explicitely setting some other callchain mode, i.e. a default callchain mode, and --no-call-graph will set symbol_conf.ignore_callchains with that clear intention. That too will at some point become a per evsel thing, that tools can set for all or just a few of its evsels. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-0sas5cm4dsw2obn75g7ruz69@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-05perf evsel: Add has_callchain() helper to make code more compact/clearArnaldo Carvalho de Melo1-1/+1
Its common to have the (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN), so add an evsel__has_callchain(evsel) helper. This will actually get more uses as we check that instead of symbol_conf.use_callchain in places where that produces the same result but makes this decision to be more fine grained, per evsel. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-145340oytbthatpfeaq1do18@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04perf tools: No need to check if the argument to __get() function is NULLArnaldo Carvalho de Melo1-1/+1
Those functions always check if the argument is NULL before trying to grab a reference count, and also will return the received object, so, to make code more compact, no need to check for NULL. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-i9wycjdxh0fwhryu55lmafks@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03perf hists: Move hists__scnprintf_title() away from the TUI codeArnaldo Carvalho de Melo1-0/+81
The previous patch made this function useful to non-TUI parts of the tools, but left it where the function from what it was carved, so that the patch showed more clearly the process. Now just move it outside the TUI parts so that we can finally use it, even when the TUI code doesn't get built/linked. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196935 Link: https://lkml.kernel.org/n/tip-hqj7hvcr3mu5lvcqp3cssio6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08perf tools: Add refcnt into struct mem_infoJiri Olsa1-2/+2
It's passed along several hists entries in --hierarchy mode, so it's better we keep track of it. The current fail I see is that it gets removed in hierarchy --mem-mode mode, where it's shared in the different hierarchies, but removed from the template hist entry, so the report crashes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180307155020.32613-6-jolsa@kernel.org [ Rename mem_info__aloc() to mem_info__new(), to fix the typo and use the convention for constructors ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16perf report: Fix memory corruption in --branch-history mode --branch-historyJiri Olsa1-3/+1
Jin Yao reported memory corrupton in perf report with branch info used for stack trace: > Following command lines will cause perf crash. > perf record -j call -g -a <application> > perf report --branch-history > > *** Error in `perf': double free or corruption (!prev): 0x00000000104aa040 *** > ======= Backtrace: ========= > /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] > /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] > /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] > perf[0x51b914] > perf(hist_entry_iter__add+0x1e5)[0x51f305] > perf[0x43cf01] > perf[0x4fa3bf] > perf[0x4fa923] > perf[0x4fd396] > perf[0x4f9614] > perf(perf_session__process_events+0x89e)[0x4fc38e] > perf(cmd_report+0x15d2)[0x43f202] > perf[0x4a059f] > perf(main+0x631)[0x427b71] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] > perf(_start+0x29)[0x427d89] For the cumulative output, we allocate the he_cache array based on the --max-stack option value and populate it with data from 'callchain_cursor'. The --max-stack option value does not ensure now the limit for number of callchain_cursor nodes, so the cumulative iter code will allocate smaller array than it's actually needed and cause above corruption. I think the --max-stack limit does not apply here anyway, because we add callchain data as normal hist entries, while the --max-stack control the limit of single entry callchain depth. Using the callchain_cursor.nr as he_cache array count to fix this. Also removing struct hist_entry_iter::max_stack, because there's no longer any use for it. We need more fixes to ensure that the branch stack code follows properly the logic of --max-stack, which is not the case at the moment. Original-patch-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reported-by: Jin Yao <yao.jin@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180216123619.GA9945@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-11-07Merge branch 'linus' into perf/core, to fix conflictsIngo Molnar1-0/+1
Conflicts: tools/perf/arch/arm/annotate/instructions.c tools/perf/arch/arm64/annotate/instructions.c tools/perf/arch/powerpc/annotate/instructions.c tools/perf/arch/s390/annotate/instructions.c tools/perf/arch/x86/tests/intel-cqm.c tools/perf/ui/tui/progress.c tools/perf/util/zlib.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-02License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman1-0/+1
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-25perf report: Use srcline from callchain for hist entriesMilian Wolff1-0/+2
This also removes the symbol name from the srcline column, more on this below. This ensures we use the correct srcline, which could originate from a potentially inlined function. The hist entries used to query for the srcline based purely on the IP, which leads to wrong results for inlined entries. Before: ~~~~~ perf report --inline -s srcline -g none --stdio ... # Children Self Source:Line # ........ ........ .................................................................................................................................. # 94.23% 0.00% __libc_start_main+18446603487898210537 94.23% 0.00% _start+41 44.58% 0.00% main+100 44.58% 0.00% std::_Norm_helper<true>::_S_do_it<double>+100 44.58% 0.00% std::__complex_abs+100 44.58% 0.00% std::abs<double>+100 44.58% 0.00% std::norm<double>+100 36.01% 0.00% hypot+18446603487892193300 25.81% 0.00% main+41 25.81% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41 25.81% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41 25.75% 25.75% random.h:143 18.39% 0.00% main+57 18.39% 0.00% std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57 18.39% 0.00% std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57 13.80% 13.80% random.tcc:3330 5.64% 0.00% ??:0 4.13% 4.13% __hypot_finite+163 4.13% 0.00% __hypot_finite+18446603487892193443 ... ~~~~~ After: ~~~~~ perf report --inline -s srcline -g none --stdio ... # Children Self Source:Line # ........ ........ ........................................... # 94.30% 1.19% main.cpp:39 94.23% 0.00% __libc_start_main+18446603487898210537 94.23% 0.00% _start+41 48.44% 1.70% random.h:1823 48.44% 0.00% random.h:1814 46.74% 2.53% random.h:185 44.68% 0.10% complex:589 44.68% 0.00% complex:597 44.68% 0.00% complex:654 44.68% 0.00% complex:664 40.61% 13.80% random.tcc:3330 36.01% 0.00% hypot+18446603487892193300 26.81% 0.00% random.h:151 26.81% 0.00% random.h:332 25.75% 25.75% random.h:143 5.64% 0.00% ??:0 4.13% 4.13% __hypot_finite+163 4.13% 0.00% __hypot_finite+18446603487892193443 ... ~~~~~ Note that this change removes the symbol from the source:line hist column. If this information is desired, users should explicitly query for it if needed. I.e. run this command instead: ~~~~~ perf report --inline -s sym,srcline -g none --stdio ... # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'cycles:uppp' # Event count (approx.): 1381229476 # # Children Self Symbol Source:Line # ........ ........ ................................................................................................................................... ........................................... # 94.30% 1.19% [.] main main.cpp:39 94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537 94.23% 0.00% [.] _start _start+41 48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1814 48.44% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) random.h:1823 46.74% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) random.h:185 44.68% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) complex:654 44.68% 0.00% [.] std::__complex_abs (inlined) complex:589 44.68% 0.00% [.] std::abs<double> (inlined) complex:597 44.68% 0.00% [.] std::norm<double> (inlined) complex:664 39.80% 13.59% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330 36.01% 0.00% [.] hypot hypot+18446603487892193300 26.81% 0.00% [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined) random.h:151 26.81% 0.00% [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined) random.h:332 25.75% 0.00% [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined) random.h:143 25.19% 25.19% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143 4.13% 4.13% [.] __hypot_finite __hypot_finite+163 4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443 ... ~~~~~ Compared to the old behavior, this reduces duplication in the output. Before we used to print the symbol name in the srcline column even when the sym column was explicitly requested. I.e. the output was: ~~~~~ perf report --inline -s sym,srcline -g none --stdio ... # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 1K of event 'cycles:uppp' # Event count (approx.): 1381229476 # # Children Self Symbol Source:Line # ........ ........ ................................................................................................................................... .................................................................................................................................. # 94.23% 0.00% [.] __libc_start_main __libc_start_main+18446603487898210537 94.23% 0.00% [.] _start _start+41 44.58% 0.00% [.] main main+100 44.58% 0.00% [.] std::_Norm_helper<true>::_S_do_it<double> (inlined) std::_Norm_helper<true>::_S_do_it<double>+100 44.58% 0.00% [.] std::__complex_abs (inlined) std::__complex_abs+100 44.58% 0.00% [.] std::abs<double> (inlined) std::abs<double>+100 44.58% 0.00% [.] std::norm<double> (inlined) std::norm<double>+100 36.01% 0.00% [.] hypot hypot+18446603487892193300 25.81% 0.00% [.] main main+41 25.81% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41 25.81% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41 25.69% 25.69% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.h:143 18.39% 0.00% [.] main main+57 18.39% 0.00% [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined) std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57 18.39% 0.00% [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined) std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57 13.80% 13.80% [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > random.tcc:3330 4.13% 4.13% [.] __hypot_finite __hypot_finite+163 4.13% 0.00% [.] __hypot_finite __hypot_finite+18446603487892193443 ... ~~~~~ Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-24perf report: Remove code to handle inline frames from browsersMilian Wolff1-5/+0
The follow-up commits will make inline frames first-class citizens in the callchain, thereby obsoleting all of this special code. Signed-off-by: Milian Wolff <milian.wolff@kdab.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yao Jin <yao.jin@linux.intel.com> Link: http://lkml.kernel.org/r/20171009203310.17362-2-milian.wolff@kdab.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01perf sort: Add sort option for physical addressKan Liang1-0/+4
Add a new sort option "phys_daddr" for --mem-mode sort. With this option applied, perf can sort and report by sample's physical address. Signed-off-by: Kan Liang <kan.liang@intel.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Stephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1504026672-7304-3-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-07-25perf report: Make --branch-history work without callgraphs(-g) option in perf recordJin Yao1-0/+2
perf record -b -g <command> perf report --branch-history This merges the LBRs with the callgraphs. However it would be nice if it also works without callgraphs (-g) set in perf record, so that only the LBRs are displayed. But currently perf report errors in this case. For example, perf record -b <command> perf report --branch-history Error: Selected -g or --branch-history but no callchain data. Did you call 'perf record' without -g? This patch displays the LBRs only even if callgraphs(-g) is not enabled in perf record. Change log: v2: According to Milian Wolff's comment, change the obsolete error message. Now the error message is: ┌─Error:─────────────────────────────────────┐ │Selected -g or --branch-history. │ │But no callchain or branch data. │ │Did you call 'perf record' without -g or -b?│ │ │ │ │ │Press any key... │ └────────────────────────────────────────────┘ When passing the last parameter to hists__fprintf, changes "|" to "||". hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout, symbol_conf.use_callchain || symbol_conf.show_branchflag_count); Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1494240182-28899-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-07-18perf report: Show branch type statistics for stdio modeJin Yao1-4/+1
Show the branch type statistics at the end of perf report --stdio. For example: perf report --stdio COND_FWD: 28.5% COND_BWD: 9.4% CROSS_4K: 0.7% CROSS_2M: 14.1% COND: 37.9% UNCOND: 0.2% IND: 6.7% CALL: 26.5% RET: 28.7% SYSRET: 0.0% The branch types are: COND_FWD: conditional forward COND_BWD: conditional backward COND: conditional branch UNCOND: unconditional branch IND: indirect CALL: function call IND_CALL: indirect function call RET: function return SYSCALL: syscall SYSRET: syscall return COND_CALL: conditional function call COND_RET: conditional function return CROSS_4K and CROSS_2M: They are the metrics checking for branches cross 4K or 2MB pages. It's an approximate computing. We don't know if the area is 4K or 2MB, so always compute both. To make the output simple, if a branch crosses 2M area, CROSS_4K will not be incremented. Change log v7: Since the common branch type definitions are changed, some tags/strings are updated accordingly. v6: Remove branch_type_stat_display() since it's moved to branch.c. v5: Remove the unnecessary sort__mode checking in hist_iter__branch_callback(). v4: Comparing to previous version, the major changes are: Add the computing of JCC forward/JCC backward and cross page checking by using the from and to addresses. Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1500379995-6449-7-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-24perf tools: Use just forward declarations for struct thread where possibleArnaldo Carvalho de Melo1-0/+1
Removing various instances of unnecessary includes, reducing the maze of header dependencies. Link: http://lkml.kernel.org/n/tip-hwu6eyuok9pc57alookyzmsf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-24perf tools: Include sys/param.h where neededArnaldo Carvalho de Melo1-0/+1
As it is going away from util.h, where it is not needed. This is mostly for things like MAXPATHLEN, MAX() and MIN(), these later two probably should go away in favor of its kernel sources replacements. Link: http://lkml.kernel.org/n/tip-z1666f3fl3fqobxvjr5o2r39@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19perf tools: Include errno.h where neededArnaldo Carvalho de Melo1-0/+1
Removing it from util.h, part of an effort to disentangle the includes hell, that makes changes to util.h or something included by it to cause a complete rebuild of the tools. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ztrjy52q1rqcchuy3rubfgt2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-19perf tools: Move srcline definitions to separate headerArnaldo Carvalho de Melo1-0/+1
Out of util.h into a new file, srcline.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ludnlm4djqcdjziekzr4s3u9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-30perf utils: Fix spelling mistake: "Invalud" -> "Invalid"Colin Ian King1-1/+1
Trivial fix to spelling mistake in pr_debug message. Signed-off-by: Colin King <colin.king@canonical.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Krister Johansen <kjlx@templeofstupid.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20170330095440.19444-1-colin.king@canonical.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-27perf report: Show inline stack for browser modeJin Yao1-0/+5
If the address belongs to an inlined function, the source information back to the first non-inlined function will be printed. For example: 1. Show inlined function name perf report -g function --inline - 0.69% 0.00% inline ld-2.23.so [.] dl_main - dl_main 0.56% _dl_relocate_object _dl_relocate_object (inline) elf_dynamic_do_Rela (inline) 2. Show the file/line information perf report -g address --inline - 0.69% 0.00% inline ld-2.23.so [.] _dl_start _dl_start rtld.c:307 /build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline) + _dl_sysdep_start dl-sysdep.c:250 Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/1490474069-15823-6-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-14perf tools: Add 'cgroup_id' sort order keywordHari Bathini1-0/+7
This patch introduces a cgroup identifier entry field in perf report to identify or distinguish data of different cgroups. It uses the device number and inode number of cgroup namespace, included in perf data with the new PERF_RECORD_NAMESPACES event, as cgroup identifier. With the assumption that each container is created with it's own cgroup namespace, this allows assessment/analysis of multiple containers at once. A simple test for this would be to clone a few processes passing SIGCHILD & CLONE_NEWCROUP flags to each of them, execute shell and run different workloads on each of those contexts, while running perf record command with --namespaces option. Shown below is the output of perf report, sorted with cgroup identifier, on perf.data generated with the above test scenario, clearly indicating one context's considerable use of kernel memory in comparison with others: $ perf report -s cgroup_id,sample --stdio # # Total Lost Samples: 0 # # Samples: 5K of event 'kmem:kmalloc' # Event count (approx.): 5965 # # Overhead cgroup id (dev/inode) Samples # ........ ..................... ............ # 81.27% 3/0xeffffffb 4848 16.24% 3/0xf00000d0 969 1.16% 3/0xf00000ce 69 0.82% 3/0xf00000cf 49 0.50% 0/0x0 30 While this is a start, there is further scope of improving this. For example, instead of cgroup namespace's device and inode numbers, dev and inode numbers of some or all namespaces may be used to distinguish which processes are running in a given container context. Also, scripts to map device and inode info to containers sounds plausible for better tracing of containers. Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/148891933338.25309.756882900782042645.stgit@hbathini.in.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-20perf utils: Check verbose flag properlyNamhyung Kim1-3/+3
It now can have negative value to suppress the message entirely. So it needs to check it being positive. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: kernel-team@lge.com Link: http://lkml.kernel.org/r/20170217081742.17417-3-namhyung@kernel.org [ Adjust fuzz on tools/perf/util/pmu.c, add > 0 checks in many other places ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-31perf callchain: Reference count mapsKrister Johansen1-0/+7
If dso__load_kcore frees all of the existing maps, but one has already been attached to a callchain cursor node, then we can get a SIGSEGV in any function that happens to try to use this invalid cursor. Use the existing map refcount mechanism to forestall cleanup of a map until the cursor iterates past the node. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@kernel.org Fixes: 84c2cafa2889 ("perf tools: Reference count struct map") Link: http://lkml.kernel.org/r/20170106062331.GB2707@templeofstupid.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-27perf tools: Propagate perf_config() errorsArnaldo Carvalho de Melo1-1/+3
Previously these were being ignored, sometimes silently. Stop doing that, emitting debug messages and handling the errors. Testing it: $ cat ~/.perfconfig cat: /home/acme/.perfconfig: No such file or directory $ perf stat -e cycles usleep 1 Performance counter stats for 'usleep 1': 938,996 cycles:u 0.003813731 seconds time elapsed $ perf top --stdio Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, <SNIP> [ perf record: Captured and wrote 0.019 MB perf.data (7 samples) ] [acme@jouet linux]$ perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # Overhead Command Shared Object Symbol # ........ ....... ................. ......................... 71.77% usleep libc-2.24.so [.] _dl_addr 27.07% usleep ld-2.24.so [.] _dl_next_ld_env_entry 1.13% usleep [kernel.kallsyms] [k] page_fault $ $ touch ~/.perfconfig $ ls -la ~/.perfconfig -rw-rw-r--. 1 acme acme 0 Jan 27 12:14 /home/acme/.perfconfig $ $ perf stat -e instructions usleep 1 Performance counter stats for 'usleep 1': 244,610 instructions:u 0.000805383 seconds time elapsed $ [root@jouet ~]# chown acme.acme ~/.perfconfig [root@jouet ~]# perf stat -e cycles usleep 1 Warning: File /root/.perfconfig not owned by current user or root, ignoring it. Performance counter stats for 'usleep 1': 937,615 cycles 0.000836931 seconds time elapsed # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-j2rq96so6xdqlr8p8rd6a3jx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-11-24Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar1-6/+6
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09perf hists: Fix column length on --hierarchyNamhyung Kim1-6/+6
Markus reported that there's a weird behavior on perf top --hierarchy regarding the column length. Looking at the code, I found a dubious code which affects the symptoms. When --hierarchy option is used, the last column length might be inaccurate since it skips to update the length on leaf entries. I cannot remember why it did and looks like a leftover from previous version during the development. Anyway, updating the column length often is not harmful. So let's move the code out. Reported-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 1a3906a7e6b9 ("perf hists: Resort hist entries with hierarchy") Link: http://lkml.kernel.org/r/20161108130833.9263-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-21perf c2c report: Limit the cachelines table entriesJiri Olsa1-0/+1
Add a limit for entries number of the cachelines table entries. By default now it's the 0.0005% minimum of remote HITMs. Also display only cachelines with remote hitm or store data. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-inykbom2f19difvsu1e18avr@git.kernel.org [ Disabled for now ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-20perf hists: Fix width computation for srcline sort entryJiri Olsa1-2/+4
Adding header size to width computation for srcline sort entry, because it's possible to get empty data with ':0' which set width of 2 which is lower than width needed to display column header. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1474290610-23241-62-git-send-email-jolsa@kernel.org [ Added declaration to sort.h ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf hist: Initialize hierarchy tree explicitlyNamhyung Kim1-0/+2
The hroot_in and hroot_out are roots of hierarchy trees of hist entries. But when a hist entry is initialized by copying existing template entry, it sometimes has non-empty tree and copies it incorrectly. This is a problem especially when an event group is used since it creates dummy entries from already-processed entries in other event members. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160913074552.13284-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf hists: Introduce hists__link_hierarchy()Namhyung Kim1-0/+95
The hists__link_hierarchy() is to support hierarchy reports with an event group. When it matches the leader event and the other members (using hists__match_hierarchy()), it also needs to link unmatched member entries with a dummy leader event so that it can show up in the output. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160913074552.13284-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-13perf hists: Introduce hists__match_hierarchy()Namhyung Kim1-0/+51
The hists__match_hierarchy() is to find matching hist entries in a group. A matching entry has the same values for all sort keys given. With an event group (e.g.: -e "{cycles,instructions}"), a leader event should show other members in a group. So each entry in the leader should be able to find its pair entries which have same values. With hierarchy mode, it needs to search all matching children in a hierarchy. An example output looks like: # Overhead Command / Shared Object / Symbol # ...................... .................................. # 25.74% 27.18% sh 19.96% 24.14% libc-2.24.so 9.55% 14.64% [.] __strcmp_sse2 1.54% 0.00% [.] __tfind 1.07% 1.13% [.] _int_malloc ... In the above example, two overheads are shown - one for the leader and another for the other group member. They were matched since their command, dso and symbol have the same values. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20160913074552.13284-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-08-02perf hists: Introduce output_resort_cb methodJiri Olsa1-3/+12
When dealing with nested hist entries it's helpful to have a way to resort those nested objects. Adding optional callback call into output_resort function and following new interface function: typedef int (*hists__resort_cb_t)(struct hist_entry *he); void hists__output_resort_cb(struct hists *hists, struct ui_progress *prog, hists__resort_cb_t cb); Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1470074555-24889-7-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-12perf hists: Introduce hists__add_entry_ops functionJiri Olsa1-7/+35
Introducing hists__add_entry_ops function to allow using the allocation callbacks externally. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1467701765-26194-4-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-12perf hists: Introduce hist_entry_opsJiri Olsa1-4/+27
Introducing allocation callbacks, that allows to extend current hist_entry object into objects with special needs without polluting the current hist_entry object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1467701765-26194-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-07-12perf hists: Introduce hist_entry__init functionJiri Olsa1-66/+73
Move the 'struct hist_entry' initialization code to a separate function. It'll be useful and more clear for the following patches that introduce allocation callbacks. Releasing the hist_entry object in hist_entry__new function (where it's allocated) rather than in hist_entry__init. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1467701765-26194-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-23perf evlist: Rename for_each() macros to for_each_entry()Arnaldo Carvalho de Melo1-1/+1
To match the semantics for list.h in the kernel, that are used to implement those macros. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qbcjlgj0ffxquxscahbpddi3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-22perf hists: Enlarge pid sort entry sizeJiri Olsa1-1/+1
The pid sort entry currently aligns pids with 5 digits, which is not enough for current 4 million pids limit. This leads to unaligned ':' header-data output when we display 7 digits pid: # Children Self Symbol Pid:Command # ........ ........ ...................... ..................... # 0.12% 0.12% [.] 0x0000000000147e0f 2052894:krava ... Adding 2 more digit to properly align the pid limit: # Children Self Symbol Pid:Command # ........ ........ ...................... ....................... # 0.12% 0.12% [.] 0x0000000000147e0f 2052894:krava Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1466459899-1166-9-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-21perf hists: Rename __hists__add_entry to hists__add_entryJiri Olsa1-17/+17
There's no reason we should suffer the '__' prefix for the base global function. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1465928361-2442-12-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-06-15perf hists: Replace perf_evsel arg perf_hpp_fmt's width callbackJiri Olsa1-1/+1
Replacing perf_evsel arg perf_hpp_fmt's width callback with hists object. This will be helpful in future for non evsel related hist browsers. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1465928361-2442-11-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-23perf report: Add srcline_from/to branch sort keysAndi Kleen1-0/+9
Add "srcline_from" and "srcline_to" branch sort keys that allow to show the source lines of a branch. That makes it much easier to track down where particular branches happen in the program, for example to examine branch mispredictions, or to associate it with cycle counts: % perf record -b -e cycles:p ./tcall % perf report --sort srcline_from,srcline_to,mispredict ... 15.10% tcall.c:18 tcall.c:10 N 14.83% tcall.c:11 tcall.c:5 N 14.12% tcall.c:7 tcall.c:12 N 14.04% tcall.c:12 tcall.c:5 N 12.42% tcall.c:17 tcall.c:18 N 12.39% tcall.c:7 tcall.c:13 N 12.27% tcall.c:13 tcall.c:17 N ... % perf report --sort srcline_from,srcline_to,cycles ... 17.12% tcall.c:18 tcall.c:11 1 17.01% tcall.c:12 tcall.c:6 1 16.98% tcall.c:11 tcall.c:6 1 15.91% tcall.c:17 tcall.c:18 1 6.38% tcall.c:7 tcall.c:17 7 4.80% tcall.c:7 tcall.c:12 8 4.21% tcall.c:7 tcall.c:17 8 2.67% tcall.c:7 tcall.c:12 7 2.62% tcall.c:7 tcall.c:12 10 2.10% tcall.c:7 tcall.c:17 9 1.58% tcall.c:7 tcall.c:12 6 1.44% tcall.c:7 tcall.c:12 5 1.38% tcall.c:7 tcall.c:12 9 1.06% tcall.c:7 tcall.c:17 13 1.05% tcall.c:7 tcall.c:12 4 1.01% tcall.c:7 tcall.c:17 6 Open issues: - Some kernel symbols get misresolved. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1463775308-32748-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-05perf hists: Move sort__need_collapse into struct perf_hpp_listJiri Olsa1-7/+7
Now we have sort dimensions private for struct hists, we need to make dimension booleans hists specific as well. Moving sort__need_collapse into struct perf_hpp_list. Adding hists__has macro to easily access this info perf struct hists object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1462276488-26683-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-25perf hists: Clear dummy entry accumulated periodKan Liang1-0/+2
The accumulated period for dummy entry should also be 0. Otherwise, the total overhead could be overcounted. $ perf record -e '{LLC-load-misses,cpu/instructions/}' --call-graph=lbr ./tchain $ perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 21K of event 'anon group { LLC-load-misses, cpu/instructions/ }' # Event count (approx.): 16313667937 # # Children Self Command Shared Object Symbol # ................ ................ ........... ................ ............................ # 4769.98% 0.01% 0.00% 0.01% tchain_edit [kernel.vmlinux] [k] update_fast_timekeeper 4356.18% 0.01% 0.00% 0.01% tchain_edit [kernel.vmlinux] [k] trigger_load_balance 3181.12% 0.01% 0.00% 0.01% tchain_edit [kernel.vmlinux] [k] irq_work_tick 1592.37% 0.00% 0.00% 0.00% tchain_edit [kernel.vmlinux] [k] cpu_needs_another_gp Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1461565689-5862-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-14perf callchain: Start moving away from global per thread cursorsArnaldo Carvalho de Melo1-1/+1
The recent perf_evsel__fprintf_callchain() move to evsel.c added several new symbol requirements to the python binding, for instance: # perf test -v python 16: Try 'import perf' in python, checking link problems : --- start --- test child forked, pid 18030 Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /tmp/build/perf/python/perf.so: undefined symbol: callchain_cursor test child finished with -1 ---- end ---- Try 'import perf' in python, checking link problems: FAILED! # This would require linking against callchain.c to access to the global callchain_cursor variables. Since lots of functions already receive as a parameter a callchain_cursor struct pointer, make that be the case for some more function so that we can start phasing out usage of yet another global variable. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-djko3097eyg2rn66v2qcqfvn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>