aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/util/intel-pt.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-03-09perf tools: Add hw_idx in struct branch_stackKan Liang1-0/+2
The low level index of raw branch records for the most recent branch can be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX branch_sample_type. Extend struct branch_stack to support it. However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and entries[] will be output by kernel. The pointer of entries[] could be wrong, since the output format is different with new struct branch_stack. Add a variable no_hw_idx in struct perf_sample to indicate whether the hw_idx is output. Add get_branch_entry() to return corresponding pointer of entries[0]. To make dummy branch sample consistent as new branch sample, add hw_idx in struct dummy_branch_stack for cs-etm and intel-pt. Apply the new struct branch_stack for synthetic events as well. Extend test case sample-parsing to support new struct branch_stack. Committer notes: Renamed get_branch_entries() to perf_sample__branch_entries() to have proper namespacing and pave the way for this to be moved to libperf, eventually. Add 'static' to that inline as it is in a header. Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build on arm64. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Pavel Gerasimov <pavel.gerasimov@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Stephane Eranian <eranian@google.com> Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com> Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-26perf maps: Merge 'struct maps' with 'struct map_groups'Arnaldo Carvalho de Melo1-1/+1
And pick the shortest name: 'struct maps'. The split existed because we used to have two groups of maps, one for functions and one for variables, but that only complicated things, sometimes we needed to figure out what was at some address and then had to first try it on the functions group and if that failed, fall back to the variables one. That split is long gone, so for quite a while we had only one struct maps per struct map_groups, simplify things by combining those structs. First patch is the minimum needed to merge both, follow up patches will rename 'thread->mg' to 'thread->maps', etc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-22perf intel-pt: Add support for decoding AUX area samplesAdrian Hunter1-3/+106
Add support for dumping, queuing and decoding AUX area samples. Decoding samples is the same as regular decoding, except in the case where there are no timestamps, in which case buffers are decoded immediately before the sample event. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20191115124225.5247-15-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Move 'ids' from 'struct evsel' to 'struct perf_evsel'Jiri Olsa1-1/+1
Move 'ids' from 'struct evsel' to libperf's 'struct perf_evsel'. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-26-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-25libperf: Move 'id' from 'struct evsel' to 'struct perf_evsel'Jiri Olsa1-4/+4
Move the 'id' array from 'struct evsel' to libperf's 'struct perf_evsel'. Committer note: Fix the tools/perf/util/cs-etm.c build, i.e. aarch64's CoreSight. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lore.kernel.org/lkml/20190913132355.21634-25-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-09-20perf tools: Move event synthesizing routines to separate headerArnaldo Carvalho de Melo1-0/+1
Those are the only routines using the perf_event__handler_t typedef and are all related, so move to a separate header to reduce the header dependency tree, lots of places were getting event.h and even stdio.h, limits.h indirectly, so fix those as well. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-yvx9u1mf7baq6cu1abfhbqgs@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31perf debug: Remove needless include directives from debug.hArnaldo Carvalho de Melo1-0/+1
All we need there is a forward declaration for 'union perf_event', so remove it from there and add missing header directives in places using things from this indirect include. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-7ftk0ztstqub1tirjj8o8xbl@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-29perf tools: Remove perf.h from source files not needing itArnaldo Carvalho de Melo1-1/+0
With the movement of lots of stuff out of perf.h to other headers we ended up not needing it in lots of places, remove it from those places. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-c718m0sxxwp73lp9d8vpihb4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-29libperf: Rename the PERF_RECORD_ structs to have a "perf" prefixJiri Olsa1-4/+4
Even more, to have a "perf_record_" prefix, so that they match the PERF_RECORD_ enum they map to. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190828135717.7245-23-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-29libperf: Add PERF_RECORD_AUXTRACE_INFO 'struct auxtrace_info_event' to perf/event.hJiri Olsa1-2/+2
Move the PERF_RECORD_AUXTRACE_INFO event definition to libperf's event.h. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u*' versions. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190828135717.7245-9-jolsa@kernel.org [ Fix cs_etm__print_auxtrace_info() arg to be __u64 too to fix the CORESIGHT=1 build ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-14perf intel-pt: Process options for PEBS event synthesisAdrian Hunter1-0/+18
Process synth_opts.other_events and attr.aux_output to set up for synthesizing PEBs via Intel PT events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190806084606.4021-6-alexander.shishkin@linux.intel.com Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> [ Fixed up libbperf clashes, i.e. some places using perf_evsel (now in libperf) need to use instead 'evsel' (a tools/perf only abstraction) ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29libperf: Move perf_event_attr field from perf's evsel to libperf's perf_evselJiri Olsa1-25/+25
Move the perf_event_attr struct fron 'struct evsel' to 'struct perf_evsel'. Committer notes: Fixed up these: tools/perf/arch/arm/util/auxtrace.c tools/perf/arch/arm/util/cs-etm.c tools/perf/arch/arm64/util/arm-spe.c tools/perf/arch/s390/util/auxtrace.c tools/perf/util/cs-etm.c Also cc1: warnings being treated as errors tests/sample-parsing.c: In function 'do_test': tests/sample-parsing.c:162: error: missing initializer tests/sample-parsing.c:162: error: (near initialization for 'evsel.core.cpus') struct evsel evsel = { .needs_swap = false, - .core.attr = { - .sample_type = sample_type, - .read_format = read_format, + .core = { + . attr = { + .sample_type = sample_type, + .read_format = read_format, + }, [perfbuilder@a70e4eeb5549 /]$ gcc --version |& head -1 gcc (GCC) 4.4.7 Also we don't need to include perf_event.h in tools/perf/lib/include/perf/evsel.h, forward declaring 'struct perf_event_attr' is enough. And this even fixes the build in some systems where things are used somewhere down the include path from perf_event.h without defining __always_inline. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-43-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29perf evlist: Rename struct perf_evlist to struct evlistJiri Olsa1-5/+5
Rename struct perf_evlist to struct evlist, so we don't have a name clash when we add struct perf_evlist in libperf. Committer notes: Added fixes to build on arm64, from Jiri and from me (tools/perf/util/cs-etm.c) Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29perf evsel: Rename struct perf_evsel to struct evselJiri Olsa1-18/+18
Rename struct perf_evsel to struct evsel, so we don't have a name clash when we add struct perf_evsel in libperf. Committer notes: Added fixes for arm64, provided by Jiri. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09perf intel-pt: Fix potential NULL pointer dereference found by the smatch toolLeo Yan1-8/+5
Based on the following report from Smatch, fix the potential NULL pointer dereference check. tools/perf/util/intel-pt.c:3200 intel_pt_process_auxtrace_info() error: we previously assumed 'session->itrace_synth_opts' could be null (see line 3196) tools/perf/util/intel-pt.c:3206 intel_pt_process_auxtrace_info() warn: variable dereferenced before check 'session->itrace_synth_opts' (see line 3200) tools/perf/util/intel-pt.c 3196 if (session->itrace_synth_opts && session->itrace_synth_opts->set) { 3197 pt->synth_opts = *session->itrace_synth_opts; 3198 } else { 3199 itrace_synth_opts__set_default(&pt->synth_opts, 3200 session->itrace_synth_opts->default_no_sample); ^^^^^^^^^^^^^^^^^^^^^^^^^^ 3201 if (!session->itrace_synth_opts->default_no_sample && 3202 !session->itrace_synth_opts->inject) { 3203 pt->synth_opts.branches = false; 3204 pt->synth_opts.callchain = true; 3205 } 3206 if (session->itrace_synth_opts) ^^^^^^^^^^^^^^^^^^^^^^^^^^ 3207 pt->synth_opts.thread_stack = 3208 session->itrace_synth_opts->thread_stack; 3209 } 'session->itrace_synth_opts' is impossible to be a NULL pointer in intel_pt_process_auxtrace_info(), thus this patch removes the NULL test for 'session->itrace_synth_opts'. Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/20190708143937.7722-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09tools lib: Adopt zalloc()/zfree() from tools/perfArnaldo Carvalho de Melo1-1/+1
Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-25perf intel-pt: Synthesize CBR events when last seen value changesAdrian Hunter1-24/+41
The first core-to-bus ratio (CBR) event will not be shown if --itrace 's' option (skip initial number of events) is used, nor if time intervals are specified that do not include the start of tracing. Change the logic to record the last CBR value seen by the user, and synthesize CBR events whenever that changes. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190622093248.581-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Add callchain to synthesized PEBS sampleAdrian Hunter1-0/+8
Like other synthesized events, if there is also an Intel PT branch trace, then a call stack can also be synthesized. Add that. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Add memory information to synthesized PEBS sampleAdrian Hunter1-0/+27
Add memory information from PEBS data in the Intel PT trace to the synthesized PEBS sample. This provides sample types PERF_SAMPLE_ADDR, PERF_SAMPLE_WEIGHT, and PERF_SAMPLE_TRANSACTION, but not PERF_SAMPLE_DATA_SRC. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Add LBR information to synthesized PEBS sampleAdrian Hunter1-0/+72
Add LBR information from PEBS data in the Intel PT trace to the synthesized PEBS sample. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Add XMM registers to synthesized PEBS sampleAdrian Hunter1-1/+29
Add XMM register information from PEBS data in the Intel PT trace to the synthesized PEBS sample. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Add gp registers to synthesized PEBS sampleAdrian Hunter1-0/+69
Add general purpose register information from PEBS data in the Intel PT trace to the synthesized PEBS sample. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Synthesize PEBS sample basic informationAdrian Hunter1-2/+50
Synthesize a PEBS sample using basic information (ip, timestamp) only. Other PEBS information will be added in later patches. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Factor out common sample preparation for re-useAdrian Hunter1-7/+16
Factor out common sample preparation for re-use when synthesizing PEBS samples. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Prepare to synthesize PEBS samplesAdrian Hunter1-0/+18
Add infrastructure to prepare for synthesizing PEBS samples but leave the actual synthesis to later patches. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Add new packets for PEBS via PTAdrian Hunter1-1/+2
Add 3 new packets to supports PEBS via PT, namely Block Begin Packet (BBP), Block Item Packet (BIP) and Block End Packet (BEP). PEBS data is encoded into multiple BIP packets that come between BBP and BEP. The BEP packet might be associated with a FUP packet. That is indicated by using a separate packet type (INTEL_PT_BEP_IP) similar to other packets types with the _IP suffix. Refer to the Intel SDM for more information about PEBS via PT: https://software.intel.com/en-us/articles/intel-sdm May 2019 version: Vol. 3B 18.5.5.2 PEBS output to IntelĀ® Processor Trace Decoding of BIP packets conflicts with single-byte TNT packets. Since BIP packets only occur in the context of a block (i.e. between BBP and BEP), that context must be recorded and passed to the packet decoder. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17Merge tag 'perf-core-for-mingo-5.3-20190611' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/coreIngo Molnar1-23/+331
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf record: Alexey Budankov: - Allow mixing --user-regs with --call-graph=dwarf, making sure that the minimal set of registers for DWARF unwinding is present in the set of user registers requested to be present in each sample, while warning the user that this may make callchains unreliable if more that the minimal set of registers is needed to unwind. yuzhoujian: - Add support to collect callchains from kernel or user space only, IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user} bits from the command line. perf trace: Arnaldo Carvalho de Melo: - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit} payloads, use instead the syscall numbers obtainer either by the arch specific syscalltbl generators or from audit-libs. - Allow 'perf trace' to ask for the number of bytes to collect for string arguments, for now ask for PATH_MAX, i.e. the whole pathnames, which ends up being just a way to speficy which syscall args are pathnames and thus should be read using bpf_probe_read_str(). - Skip unknown syscalls when expanding strace like syscall groups. This helps using the 'string' group of syscalls to work in arm64, where some of the syscalls present in x86_64 that deal with strings, for instance 'access', are deprecated and this should not be asked for tracing. Leo Yan: - Exit when failing to build eBPF program. perf config: Arnaldo Carvalho de Melo: - Bail out when a handler returns failure for a key-value pair. This helps with cases where processing a key-value pair is not just a matter of setting some tool specific knob, involving, for instance building a BPF program to then attach to the list of events 'perf trace' will use, e.g. augmented_raw_syscalls.c. perf.data: Kan Liang: - Read and store die ID information available in new Intel processors in CPUID.1F in the CPU topology written in the perf.data header. perf stat: Kan Liang: - Support per-die aggregation. Documentation: Arnaldo Carvalho de Melo: - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY, CLOCKID and DIR_FORMAT headers. Song Liu: - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF. Leo Yan: - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'. JVMTI: Jiri Olsa: - Address gcc string overflow warning for strncpy() core: - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd(). Intel PT: Adrian Hunter: - Add support for samples to contain IPC ratio, collecting cycles information from CYC packets, showing the IPC info periodically, because Intel PT does not update the cycle count on every branch or instruction, the incremental values will often be zero. When there are values, they will be the number of instructions and number of cycles since the last update, and thus represent the average IPC since the last IPC value. E.g.: # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001 rounding mmap pages size to 1024M (262144 pages) [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 2.208 MB perf.data ] # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid # <SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)> 1 cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f jnz 0x7f5219ac2af0 IPC: 0.81 (36/44) 2 cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45 cmp $0x1f, %rbp 3 cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49 jbe 0x7f5219ac2b00 4 cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f test $0x8, %al 5 cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51 jnz 0x7f5219ac2b00 6 cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57 movq 0x13c58a(%rip), %rcx 7 cc1 63501.650479626: 7f5219ac27de _int_free+0x5e mov %rdi, %r12 8 cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61 movq %fs:(%rcx), %rax 9 cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65 test %rax, %rax 10 cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68 jz 0x7f5219ac2821 11 cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a leaq -0x11(%rbp), %rdi 12 cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e mov %rdi, %rsi 13 cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71 shr $0x4, %rsi 14 cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75 cmpq %rsi, 0x13caf4(%rip) 15 cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c jbe 0x7f5219ac2821 16 cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1 cmpq 0x13f138(%rip), %rbp 17 cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8 jnbe 0x7f5219ac28d8 18 cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158 testb $0x2, 0x8(%rbx) 19 cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c jnz 0x7f5219ac2ab0 IPC: 6.00 (18/3) <SNIP> - Allow using time ranges with Intel PT, i.e. these features, already present but not optimially usable with Intel PT, should be now: Select the second 10% time slice: $ perf script --time 10%/2 Select from 0% to 10% time slice: $ perf script --time 0%-10% Select the first and second 10% time slices: $ perf script --time 10%/1,10%/2 Select from 0% to 10% and 30% to 40% slices: $ perf script --time 0%-10%,30%-40% cs-etm (ARM): Mathieu Poirier: - Add support for CPU-wide trace scenarios. s390: Thomas Richter: - Fix missing kvm module load for s390. - Fix OOM error in TUI mode on s390 - Support s390 diag event display when doing analysis on !s390 architectures. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-17Merge branch 'x86/cpu' into perf/core, to pick up dependent changesIngo Molnar1-10/+1
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-10perf intel-pt: Add support for efficient time interval filteringAdrian Hunter1-0/+208
Set up time ranges for efficient time interval filtering using the new "fast forward" facility. Because decoding is done in time order, intel_pt_time_filter() needs to look only at the next start or end timestamp - refer intel_pt_next_time(). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190604130017.31207-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-10perf intel-pt: Add support for lookaheadAdrian Hunter1-1/+58
Implement the lookahead callback to let the decoder access subsequent buffers. intel_pt_lookahead() manages the buffer lifetime and calls the decoder for each buffer until the decoder returns a non-zero value. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190604130017.31207-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-10perf intel-pt: Factor out intel_pt_get_buffer()Adrian Hunter1-23/+37
Factor out intel_pt_get_buffer() so it can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190604130017.31207-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288Thomas Gleixner1-10/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 263 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05perf intel-pt: Add support for samples to contain IPC ratioAdrian Hunter1-0/+29
Copy the incremental instruction count and cycle count onto 'instructions' and 'branches' samples. Because Intel PT does not update the cycle count on every branch or instruction, the incremental values will often be zero. When there are values, they will be the number of instructions and number of cycles since the last update, and thus represent the average IPC since the last IPC value. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-28perf intel-pt: Rationalize intel_pt_sync_switch()'s use of next_tidAdrian Hunter1-2/+2
Returning 1 from intel_pt_sync_switch() causes the current tid to be set. That negates the need to keep next_tid anymore. Rationalize the code to that effect. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190412113830.4126-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-28perf intel-pt: Improve sync_switch by processing PERF_RECORD_SWITCH* in eventsAdrian Hunter1-1/+39
sync_switch is a facility to synchronize decoding more closely with the point in the kernel when the context actually switched. Improve it by processing "context switch in" events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190412113830.4126-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-28perf intel-pt: Fix itrace defaults for perf scriptAdrian Hunter1-1/+2
Commit 4eb068157121 ("perf script: Make itrace script default to all calls") does not work because 'use_browser' is being used to determine whether to default to periodic sampling (i.e. better for perf report). The result is that nothing but CBR events display for perf script when no --itrace option is specified. Fix by using 'default_no_sample' and 'inject' instead. Example: Before: $ perf record -e intel_pt/cyc/u ls $ perf script > cmp1.txt $ perf script --itrace=cepwx > cmp2.txt $ diff -sq cmp1.txt cmp2.txt Files cmp1.txt and cmp2.txt differ After: $ perf script > cmp1.txt $ perf script --itrace=cepwx > cmp2.txt $ diff -sq cmp1.txt cmp2.txt Files cmp1.txt and cmp2.txt are identical Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v4.20+ Fixes: 90e457f7be08 ("perf tools: Add Intel PT support") Link: http://lkml.kernel.org/r/20190520113728.14389-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-01perf intel-pt: Fix divide by zero when TSC is not availableAdrian Hunter1-0/+2
When TSC is not available, "timeless" decoding is used but a divide by zero occurs if perf_time_to_tsc() is called. Ensure the divisor is not zero. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v4.9+ Link: https://lkml.kernel.org/n/tip-1i4j0wqoc8vlbkcizqqxpsf4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf auxtrace: Add timestamp to auxtrace errorsAdrian Hunter1-7/+16
The timestamp can use useful to find part of a trace that has an error without outputting all of the trace e.g. using the itrace 's' option to skip initial number of events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190206103947.15750-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-02perf thread-stack: Fix thread stack processing for the idle taskAdrian Hunter1-3/+3
perf creates a single 'struct thread' to represent the idle task. That is because threads are identified by PID and TID, and the idle task always has PID == TID == 0. However, there are actually separate idle tasks for each CPU. That creates a problem for thread stack processing which assumes that each thread has a single stack, not one stack per CPU. Fix that by passing through the CPU number, and in the case of the idle "thread", pick the thread stack from an array based on the CPU number. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20181221120620.9659-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05perf intel-pt: Add more event information to debug logAdrian Hunter1-3/+13
More event information is useful for debugging, especially MMAP events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/20181105073505.8129-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-31perf intel-pt/bts: Calculate cpumode for synthesized samplesAdrian Hunter1-9/+13
In the absence of a fallback, samples must provide a correct cpumode for the 'ip'. Do that now there is no fallback. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: stable@vger.kernel.org # 4.19 Link: http://lkml.kernel.org/r/20181031091043.23465-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-31perf intel-pt: Insert callchain context into synthesized callchainsAdrian Hunter1-2/+4
In the absence of a fallback, callchains must encode also the callchain context. Do that now there is no fallback. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: stable@vger.kernel.org # 4.19 Link: http://lkml.kernel.org/r/100ea2ec-ed14-b56d-d810-e0a6d2f4b069@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-24perf script: Make itrace script default to all callsAndi Kleen1-1/+2
By default 'perf script' for itrace outputs sampled instructions or branches. In my experience this is confusing to users because it's hard to correlate with real program behavior. The sampling makes sense for tools like 'perf report' that actually sample to reduce the run time, but run time is normally not a problem for 'perf script'. It's better to give an accurate representation of the program flow. Default 'perf script' to output all calls for itrace. That's a much saner default. The old behavior can be still requested with 'perf script' --itrace=ibxwpe100000 v2: Fix ETM build failure v3: Really fix ETM build failure (Kim Phillips) Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Leo Yan <leo.yan@linaro.org> Link: http://lkml.kernel.org/r/20180920180540.14039-3-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-20perf intel-pt: Add decoder flags for trace begin / endAdrian Hunter1-0/+5
Previously, the decoder would indicate begin / end by a branch from / to zero. That hides useful information, in particular when a trace ends with a call. To prepare for remedying that, add Intel PT decoder flags for trace begin / end and map them to the existing sample flags. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20180920130048.31432-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf intel-pt: Fix "Unexpected indirect branch" errorAdrian Hunter1-0/+4
Some Atom CPUs can produce FUP packets that contain NLIP (next linear instruction pointer) instead of CLIP (current linear instruction pointer). That will result in "Unexpected indirect branch" errors. Fix by comparing IP to NLIP in that case. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACINGAdrian Hunter1-0/+1
sync_switch is a facility to synchronize decoding more closely with the point in the kernel when the context actually switched. In one case, INTEL_PT_SS_NOT_TRACING state was not correctly transitioning to INTEL_PT_SS_TRACING state due to a missing case clause. Add it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1527762225-26024-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-26perf symbols: Shorten dso__(first|last)_symbol()Arnaldo Carvalho de Melo1-1/+1
All users want MAP__FUNCTION, and this split is going away. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-sm72zwt1f03ma5uw78l6zze0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-26perf thread: Make thread__find_map() return the mapArnaldo Carvalho de Melo1-4/+2
It was returning the searched map just on the addr_location passed, with the function itself returning void. Make it return the map so that we can make the code more compact. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-tzlrrzdeoof4i6ktyqv1t6ks@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-26perf thread: Introduce thread__find_map()Arnaldo Carvalho de Melo1-2/+2
Out of thread__find_add_map(..., MAP__FUNCTION, ...), idea here is to continue removing references to MAP__{FUNCTION,VARIABLE} ahead of getting both types of symbols in the same rbtree, as various places do two lookups, looking first at MAP__FUNCTION, then at MAP__VARIABLE. So thread__find_map() will eventually do just that, and 'struct symbol' will have the symbol type, for code that cares about that. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-q27xee34l4izpfau49w103s6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08perf intel-pt: Adjust overlap-checking to support sampling modeAdrian Hunter1-3/+4
Adjust overlap-checking to support sampling mode. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1520431349-30689-10-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>