aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/util/intel-pt-decoder (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-07-07perf intel-pt: Add a config for max loops without consuming a packetAdrian Hunter2-4/+10
The Intel PT decoder limits the number of unconditional branches (e.g. jmps) decoded without consuming any trace packets. Generally, a loop needs a conditional branch which generates a TNT packet, whereas a "ret" instruction will generate a TIP or TNT packet. So exceeding the limit is assumed to be a never-ending loop, which can happen if there has been a decoding error putting the decoder at the wrong place in the code. Up until now, the limit of 10000 has been enough but some analytic purposes have been reported to exceed that. Increase the limit to 100000, and make it configurable via perf config intel-pt.max-loops. Also amend the "Never-ending loop" message to mention the configuration entry. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20210701175132.3977-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-25Merge remote-tracking branch 'torvalds/master' into perf/coreArnaldo Carvalho de Melo1-1/+5
To pick up fixes from perf/urgent. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-19perf intel-pt: Fix transaction abort handlingAdrian Hunter1-1/+5
When adding support for power events, some handling of FUP packets was unified. That resulted in breaking reporting of TSX aborts, by not considering the associated TIP packet. Fix that. Example: A machine that supports TSX is required. It will have flag "rtm". Kernel parameter tsx=on may be required. # for w in `cat /proc/cpuinfo | grep -m1 flags `;do echo $w | grep rtm ; done rtm Test program: #include <stdio.h> #include <immintrin.h> int main() { int x = 0; if (_xbegin() == _XBEGIN_STARTED) { x = 1; _xabort(1); } else { printf("x = %d\n", x); } return 0; } Compile with -mrtm i.e. gcc -Wall -Wextra -mrtm xabort.c -o xabort Record: perf record -e intel_pt/cyc/u --filter 'filter main @ ./xabort' ./xabort Before: # perf script --itrace=be -F+flags,+addr,-period,-event --ns xabort 1478 [007] 92161.431348552: tr strt 0 [unknown] ([unknown]) => 400b6d main+0x0 (/root/xabort) xabort 1478 [007] 92161.431348624: jmp 400b96 main+0x29 (/root/xabort) => 400bae main+0x41 (/root/xabort) xabort 1478 [007] 92161.431348624: return 400bb4 main+0x47 (/root/xabort) => 400b87 main+0x1a (/root/xabort) xabort 1478 [007] 92161.431348637: jcc 400b8a main+0x1d (/root/xabort) => 400b98 main+0x2b (/root/xabort) xabort 1478 [007] 92161.431348644: tr end call 400ba9 main+0x3c (/root/xabort) => 40f690 printf+0x0 (/root/xabort) xabort 1478 [007] 92161.431360859: tr strt 0 [unknown] ([unknown]) => 400bae main+0x41 (/root/xabort) xabort 1478 [007] 92161.431360882: tr end return 400bb4 main+0x47 (/root/xabort) => 401139 __libc_start_main+0x309 (/root/xabort) After: # perf script --itrace=be -F+flags,+addr,-period,-event --ns xabort 1478 [007] 92161.431348552: tr strt 0 [unknown] ([unknown]) => 400b6d main+0x0 (/root/xabort) xabort 1478 [007] 92161.431348624: tx abrt 400b93 main+0x26 (/root/xabort) => 400b87 main+0x1a (/root/xabort) xabort 1478 [007] 92161.431348637: jcc 400b8a main+0x1d (/root/xabort) => 400b98 main+0x2b (/root/xabort) xabort 1478 [007] 92161.431348644: tr end call 400ba9 main+0x3c (/root/xabort) => 40f690 printf+0x0 (/root/xabort) xabort 1478 [007] 92161.431360859: tr strt 0 [unknown] ([unknown]) => 400bae main+0x41 (/root/xabort) xabort 1478 [007] 92161.431360882: tr end return 400bb4 main+0x47 (/root/xabort) => 401139 __libc_start_main+0x309 (/root/xabort) Fixes: a472e65fc490a ("perf intel-pt: Add decoder support for ptwrite and power event packets") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20210519074515.9262-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12perf intel-pt: Add VM Time Correlation to decoderAdrian Hunter3-0/+696
VM Time Correlation means determining if each TSC packet belongs to a VM Guest or the Host. When the trace is "in context" that is indicated by the NR flag in the PIP packet. However, when tracing kernel-only, userspace only, or using address filters, the trace can be "out of context" in which case timing packets are produced but not PIP packets. Nevertheless, it is very unlikely the VM Guest timestamps will be in the same range as the Host timestamps. Host time ranges are established by a starting side-band event timestamp, and subsequently by the buffer timestamp, written when the buffer is copied to the perf.data file. This patch supports updating the VM Guest timestamp packets, assuming an unchanging (during perf record) VMX TSC Offset and no VMX TSC scaling. Furthermore, it is possible to determine what the VMX TSC Offset is, although not necessarily at the start. The dry-run option lets that information be determined so that the user can pass it to a subsequent run. For more detail, refer to the example in the Intel PT documentation in a subsequent patch. VM Time Correlation is also performed on the TSC value in PEBs-via-PT records. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210430070309.17624-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12perf intel-pt: Better 7-byte timestamp wraparound logicAdrian Hunter1-3/+9
A timestamp should not go backwards. If it does it is assumed that the 7-byte TSC packet value has wrapped. Improve that logic so that it will not allow the timestamp to go past the buffer timestamp (which is recorded when the buffer is copied out) Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210430070309.17624-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12perf intel-pt: Pass the first timestamp to the decoderAdrian Hunter2-0/+12
VM Time Correlation will use time ranges to determine whether a TSC packet belongs to the Host or Guest. To start, the first non-zero timestamp is needed. Pass that to the decoder. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210430070309.17624-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12perf intel-pt: Add a tree for VMCS informationAdrian Hunter2-0/+13
Even when VMX TSC Offset is not changing (during perf record), different virtual machines can have different TSC Offsets. There is a Virtual Machine Control Structure (VMCS) for each virtual CPU, the address of which is reported to Intel PT in the VMCS packet. We do not know which VMCS belongs to which virtual machine, so use a tree to keep track of VMCS information. Then the decoder will be able to use the current VMCS value to look up the current TSC Offset. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210430070309.17624-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-05-12perf intel-pt: Let overlap detection handle VM timestampsAdrian Hunter2-5/+10
Intel PT timestamps are affected by virtualization. While TSC packets can still be considered to be unique, the TSC values need not be in order any more. Adjust the algorithm accordingly. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210430070309.17624-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-15tools/perf: Convert to insn_decode()Borislav Petkov1-7/+10
Simplify code, no functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lkml.kernel.org/r/20210304174237.31945-20-bp@alien8.de
2021-02-18perf intel-pt: Amend decoder to track the NR flagAdrian Hunter2-9/+53
The PIP packet NR (non-root) flag indicates whether or not a virtual machine is being traced (NR=1 => VM). Add support for tracking its value. In particular note that the PIP packet (outside of PSB+) will be associated with a TIP packet from which address the NR value takes effect. At that point, there is a branch from_ip, to_ip with corresponding from_nr and to_nr. In the event of VM-Entry failure, there should still PIP and TIP packets that can be followed in the same way. Also note that this assumes that a host VMM is not employing VMX controls that affect Intel PT, e.g. to hide the host from a guest using Intel PT. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18perf intel-pt: Retain the last PIP packet payload as isAdrian Hunter4-16/+14
Retain the PIP packet payload as is, instead of just the CR3, because it contains also the VMX NR flag which is needed to track VM-Entry. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18perf intel_pt: Add vmlaunch and vmresume as branchesAdrian Hunter2-0/+16
In preparation to support Intel PT decoding of virtual machine traces, add vmlaunch and vmresume as branch instructions. Note, sample flags will show "VMentry" even if the VM-Entry fails. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210218095801.19576-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18perf intel-pt: Add PSB eventsAdrian Hunter2-51/+183
Emitting a PSB+ can cause a CPU a slight delay. When doing timing analysis of code with Intel PT, it is useful to know if a timing bubble was caused by Intel PT or not. Add reporting of PSB events via perf script. PSB events are printed with the existing itrace 'p' option which also prints power and frequency changes. The PSB event contains the trace offset at which the PSB occurs, to allow easy reference back to the PSB+ packets. The PSB event timestamp is always the timestamp from the PSB+ TSC packet, and the ip is always the address from the PSB+ FUP packet. The code changes are non-trivial because the decoder must walk to the PSB+ FUP address before outputting the PSB event. Example: $ perf record -e intel_pt/cyc,psb_period=0/u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.046 MB perf.data ] $ perf script --itrace=p --ns perf 17981 [006] 25617.510820383: psb: psb offs: 0 0 [unknown] ([unknown]) perf 17981 [006] 25617.510820383: cbr: cbr: 42 freq: 4219 MHz (156%) 0 [unknown] ([unknown]) uname 17981 [006] 25617.510889753: psb: psb offs: 0xb50 7f78c12a212e __GI___tunables_init+0xee (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510899162: psb: psb offs: 0x12d0 7f78c128af1c dl_main+0x93c (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510939242: psb: psb offs: 0x1a50 7f78c128eefc _dl_map_object_from_fd+0x13c (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510981274: psb: psb offs: 0x21c8 7f78c1296307 _dl_relocate_object+0x927 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.510993034: psb: psb offs: 0x2948 7f78c12940e4 _dl_lookup_symbol_x+0x14 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511003871: psb: psb offs: 0x30c8 7f78c12937b3 do_lookup_x+0x2f3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511019854: psb: psb offs: 0x3850 7f78c1295eed _dl_relocate_object+0x50d (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511029015: psb: psb offs: 0x4390 7f78c12a855a strcmp+0xf6a (/usr/lib/x86_64-linux-gnu/ld-2.31.so) uname 17981 [006] 25617.511064876: psb: psb offs: 0x4b10 0 [unknown] ([unknown]) uname 17981 [006] 25617.511080762: psb: psb offs: 0x5290 7f78c11db53d _dl_addr+0x13d (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511086035: psb: psb offs: 0x5a08 7f78c11db538 _dl_addr+0x138 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511091381: psb: psb offs: 0x6190 7f78c11db534 _dl_addr+0x134 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511096681: psb: psb offs: 0x6910 7f78c11db4c3 _dl_addr+0xc3 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511119520: psb: psb offs: 0x7090 7f78c10ada5e _nl_intern_locale_data+0x12e (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511126584: psb: psb offs: 0x7818 7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511132775: psb: psb offs: 0x8358 7f78c10c20c0 getenv+0xa0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511134598: psb: psb offs: 0x8ad0 7f78c10ada09 _nl_intern_locale_data+0xd9 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511135685: psb: psb offs: 0x9258 7f78c10ada50 _nl_intern_locale_data+0x120 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511138322: psb: psb offs: 0x99d0 7f78c11fffd9 __strncmp_avx2+0x39 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) uname 17981 [006] 25617.511158907: psb: psb offs: 0xa150 0 [unknown] ([unknown]) Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18perf intel-pt: Fix IPC with CYC thresholdAdrian Hunter2-0/+28
The code assumed every CYC-eligible packet has a CYC packet, which is not the case when CYC thresholds are used. Fix by checking if a CYC packet is actually present in that case. Fixes: 5b1dc0fd1da06 ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18perf intel-pt: Fix premature IPCAdrian Hunter2-1/+11
The code assumed a change in cycle count means accurate IPC. That is not correct, for example when sampling both branches and instructions, or at a FUP packet (which is not CYC-eligible) address. Fix by using an explicit flag to indicate when IPC can be sampled. Fixes: 5b1dc0fd1da06 ("perf intel-pt: Add support for samples to contain IPC ratio") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/20210205175350.23817-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18perf intel-pt: Fix missing CYC processing in PSBAdrian Hunter1-0/+3
Add missing CYC packet processing when walking through PSB+. This improves the accuracy of timestamps that follow PSB+, until the next MTC. Fixes: 3d49807870f08 ("perf tools: Add new Intel PT packet definitions") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20210205175350.23817-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-06perf intel-pt: Add support for decoding PSB+ onlyAdrian Hunter1-0/+18
A single q option decodes ip from only FUP/TIP packets. Make it so that repeating the q option (i.e. qq) decodes only PSB+, getting ip if there is a FUP packet within PSB+ (i.e. between PSB and PSBEND). Example: $ perf record -e intel_pt//u grep -rI pudding drivers [ perf record: Woken up 52 times to write data ] [ perf record: Captured and wrote 57.870 MB perf.data ] $ time perf script --itrace=bi | wc -l 58948289 real 1m23.863s user 1m23.251s sys 0m7.452s $ time perf script --itrace=biq | wc -l 3385694 real 0m4.453s user 0m4.455s sys 0m0.328s $ time perf script --itrace=biqq | wc -l 1883 real 0m0.047s user 0m0.043s sys 0m0.009s Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200710151104.15137-13-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-06perf intel-pt: Add support for decoding FUP/TIP onlyAdrian Hunter2-4/+164
Use the new itrace 'q' option to add support for a mode of decoding that ignores TNT, does not walk object code, but gets the ip from FUP and TIP packets. Example: $ perf record -e intel_pt//u grep -rI pudding drivers [ perf record: Woken up 52 times to write data ] [ perf record: Captured and wrote 57.870 MB perf.data ] $ time perf script --itrace=bi | wc -l 58948289 real 1m23.863s user 1m23.251s sys 0m7.452s $ time perf script --itrace=biq | wc -l 3385694 real 0m4.453s user 0m4.455s sys 0m0.328s Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20200710151104.15137-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-06perf intel-pt: Fix duplicate branch after CBRAdrian Hunter1-2/+6
CBR events can result in a duplicate branch event, because the state type defaults to a branch. Fix by clearing the state type. Example: trace 'sleep' and hope for a frequency change Before: $ perf record -e intel_pt//u sleep 0.1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.034 MB perf.data ] $ perf script --itrace=bpe > before.txt After: $ perf script --itrace=bpe > after.txt $ diff -u before.txt after.txt --- before.txt 2020-07-07 14:42:18.191508098 +0300 +++ after.txt 2020-07-07 14:42:36.587891753 +0300 @@ -29673,7 +29673,6 @@ sleep 93431 [007] 15411.619905: 1 branches:u: 0 [unknown] ([unknown]) => 7f0818abb2e0 clock_nanosleep@@GLIBC_2.17+0x0 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) sleep 93431 [007] 15411.619905: 1 branches:u: 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 0 [unknown] ([unknown]) sleep 93431 [007] 15411.720069: cbr: cbr: 15 freq: 1507 MHz ( 56%) 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) - sleep 93431 [007] 15411.720069: 1 branches:u: 7f0818abb30c clock_nanosleep@@GLIBC_2.17+0x2c (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 0 [unknown] ([unknown]) sleep 93431 [007] 15411.720076: 1 branches:u: 0 [unknown] ([unknown]) => 7f0818abb30e clock_nanosleep@@GLIBC_2.17+0x2e (/usr/lib/x86_64-linux-gnu/libc-2.31.so) sleep 93431 [007] 15411.720077: 1 branches:u: 7f0818abb323 clock_nanosleep@@GLIBC_2.17+0x43 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 7f0818ac0eb7 __nanosleep+0x17 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) sleep 93431 [007] 15411.720077: 1 branches:u: 7f0818ac0ebf __nanosleep+0x1f (/usr/lib/x86_64-linux-gnu/libc-2.31.so) => 55cb7e4c2827 rpl_nanosleep+0x97 (/usr/bin/sleep) Fixes: 91de8684f1cff ("perf intel-pt: Cater for CBR change in PSB+") Fixes: abe5a1d3e4bee ("perf intel-pt: Decoder to output CBR changes immediately") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200710151104.15137-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-08-06perf intel-pt: Fix FUP packet stateAdrian Hunter1-14/+7
While walking code towards a FUP ip, the packet state is INTEL_PT_STATE_FUP or INTEL_PT_STATE_FUP_NO_TIP. That was mishandled resulting in the state becoming INTEL_PT_STATE_IN_SYNC prematurely. The result was an occasional lost EXSTOP event. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20200710151104.15137-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30perf tools: Remove unneeded semicolonsZou Wei1-1/+1
Fixes coccicheck warnings: tools/perf/builtin-diff.c:1565:2-3: Unneeded semicolon tools/perf/builtin-lock.c:778:2-3: Unneeded semicolon tools/perf/builtin-mem.c:126:2-3: Unneeded semicolon tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c:555:2-3: Unneeded semicolon tools/perf/util/ordered-events.c:317:2-3: Unneeded semicolon tools/perf/util/synthetic-events.c:1131:2-3: Unneeded semicolon tools/perf/util/trace-event-read.c:78:2-3: Unneeded semicolon Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1588065523-71423-1-git-send-email-zou_wei@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31perf intel-pt: Use shared x86 insn decoderJosh Poimboeuf9-2623/+7
Now that there's a common version of the decoder for all tools, use it instead of the local copy. Also use perf's check-headers.sh script to diff the decoder files to make sure they remain in sync with the kernel version. Objtool has a similar check. Committer notes: Had to keep this all pointing explicitely to x86 headers/files, i.e. instead of asm/isnn.h we had to use ../include/asm/insn.h when the files were in differemt dirs, or just replace "<asm/foo.h>" with "foo.h". This way we continue to be able to process perf.data files with Intel PT traces in distros other than x86. Also fixed up the awk script paths to use $(srcdir)/tools/arch instead or relative directories so that we keep detached tarballs (make help | grep perf) working. For now the include lines in these headers are being ignored so as not to flag false reports of kernel/tools out of sync. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/8a37e615d2880f039505d693d1e068a009358a2b.1567118001.git.jpoimboe@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31perf intel-pt: Remove inat.c from build dependency listJosh Poimboeuf1-1/+1
intel-pt-insn-decoder.c includes inat.c directly, so it already has an implicit dependency on inat.c. The Build file dependency is redundant. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: x86@kernel.org Link: http://lore.kernel.org/lkml/53776d6d29bc9eceb571d52df8fa32250c58a0f3.1567118001.git.jpoimboe@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31perf tools: Remove needless evlist.h include directivesArnaldo Carvalho de Melo1-1/+1
Remove the last unneeded use of cache.h in a header, we can check where it is really needed, i.e. we can remove it and be sure that it isn't being obtained indirectly. This is an old file, used by now incorrectly in many places, so it was providing includes needed indirectly, fixup this fallout. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-3x3l8gihoaeh7714os861ia7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09tools lib: Adopt zalloc()/zfree() from tools/perfArnaldo Carvalho de Melo1-1/+1
Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-25perf intel-pt: Add CBR value to decoder stateAdrian Hunter2-0/+2
For convenience, add the core-to-bus ratio (CBR) value to the decoder state. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190622093248.581-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-25perf intel-pt: Cater for CBR change in PSB+Adrian Hunter1-0/+7
PSB+ provides status information only so the core-to-bus ratio (CBR) in PSB+ will not have changed from its previous value. However, cater for the possibility of a another CBR change that gets caught up in the PSB+ anyway. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190622093248.581-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-25perf intel-pt: Decoder to output CBR changes immediatelyAdrian Hunter1-10/+6
The core-to-bus ratio (CBR) provides the CPU frequency. With branches enabled, the decoder was outputting CBR changes only when there was a branch. That loses the correct time of the change if the trace is not in context (e.g. not tracing kernel space). Change to output the CBR change immediately. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190622093248.581-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Add decoder support for PEBS via PTAdrian Hunter2-1/+214
PEBS data is encoded in Block Item Packets (BIP). Populate a new structure intel_pt_blk_items with the values and, upon a Block End Packet (BEP), report them as a new Intel PT sample type INTEL_PT_BLK_ITEMS. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17perf intel-pt: Add new packets for PEBS via PTAdrian Hunter3-8/+191
Add 3 new packets to supports PEBS via PT, namely Block Begin Packet (BBP), Block Item Packet (BIP) and Block End Packet (BEP). PEBS data is encoded into multiple BIP packets that come between BBP and BEP. The BEP packet might be associated with a FUP packet. That is indicated by using a separate packet type (INTEL_PT_BEP_IP) similar to other packets types with the _IP suffix. Refer to the Intel SDM for more information about PEBS via PT: https://software.intel.com/en-us/articles/intel-sdm May 2019 version: Vol. 3B 18.5.5.2 PEBS output to IntelĀ® Processor Trace Decoding of BIP packets conflicts with single-byte TNT packets. Since BIP packets only occur in the context of a block (i.e. between BBP and BEP), that context must be recorded and passed to the packet decoder. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190610072803.10456-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-17Merge tag 'perf-core-for-mingo-5.3-20190611' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/coreIngo Molnar2-43/+292
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf record: Alexey Budankov: - Allow mixing --user-regs with --call-graph=dwarf, making sure that the minimal set of registers for DWARF unwinding is present in the set of user registers requested to be present in each sample, while warning the user that this may make callchains unreliable if more that the minimal set of registers is needed to unwind. yuzhoujian: - Add support to collect callchains from kernel or user space only, IOW allow setting the perf_event_attr.exclude_callchain_{kernel,user} bits from the command line. perf trace: Arnaldo Carvalho de Melo: - Remove x86_64 specific syscall numbers from the augmented_raw_syscalls BPF in-kernel collector of augmented raw_syscalls:sys_{enter,exit} payloads, use instead the syscall numbers obtainer either by the arch specific syscalltbl generators or from audit-libs. - Allow 'perf trace' to ask for the number of bytes to collect for string arguments, for now ask for PATH_MAX, i.e. the whole pathnames, which ends up being just a way to speficy which syscall args are pathnames and thus should be read using bpf_probe_read_str(). - Skip unknown syscalls when expanding strace like syscall groups. This helps using the 'string' group of syscalls to work in arm64, where some of the syscalls present in x86_64 that deal with strings, for instance 'access', are deprecated and this should not be asked for tracing. Leo Yan: - Exit when failing to build eBPF program. perf config: Arnaldo Carvalho de Melo: - Bail out when a handler returns failure for a key-value pair. This helps with cases where processing a key-value pair is not just a matter of setting some tool specific knob, involving, for instance building a BPF program to then attach to the list of events 'perf trace' will use, e.g. augmented_raw_syscalls.c. perf.data: Kan Liang: - Read and store die ID information available in new Intel processors in CPUID.1F in the CPU topology written in the perf.data header. perf stat: Kan Liang: - Support per-die aggregation. Documentation: Arnaldo Carvalho de Melo: - Update perf.data documentation about the CPU_TOPOLOGY, MEM_TOPOLOGY, CLOCKID and DIR_FORMAT headers. Song Liu: - Add description of headers HEADER_BPF_PROG_INFO and HEADER_BPF_BTF. Leo Yan: - Update default value for llvm.clang-bpf-cmd-template in 'man perf-config'. JVMTI: Jiri Olsa: - Address gcc string overflow warning for strncpy() core: - Remove superfluous nthreads system_wide setup in perf_evsel__alloc_fd(). Intel PT: Adrian Hunter: - Add support for samples to contain IPC ratio, collecting cycles information from CYC packets, showing the IPC info periodically, because Intel PT does not update the cycle count on every branch or instruction, the incremental values will often be zero. When there are values, they will be the number of instructions and number of cycles since the last update, and thus represent the average IPC since the last IPC value. E.g.: # perf record --cpu 1 -m200000 -a -e intel_pt/cyc/u sleep 0.0001 rounding mmap pages size to 1024M (262144 pages) [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 2.208 MB perf.data ] # perf script --insn-trace --xed -F+ipc,-dso,-cpu,-tid # <SNIP + add line numbering to make sense of IPC counts e.g.: (18/3)> 1 cc1 63501.650479626: 7f5219ac27bf _int_free+0x3f jnz 0x7f5219ac2af0 IPC: 0.81 (36/44) 2 cc1 63501.650479626: 7f5219ac27c5 _int_free+0x45 cmp $0x1f, %rbp 3 cc1 63501.650479626: 7f5219ac27c9 _int_free+0x49 jbe 0x7f5219ac2b00 4 cc1 63501.650479626: 7f5219ac27cf _int_free+0x4f test $0x8, %al 5 cc1 63501.650479626: 7f5219ac27d1 _int_free+0x51 jnz 0x7f5219ac2b00 6 cc1 63501.650479626: 7f5219ac27d7 _int_free+0x57 movq 0x13c58a(%rip), %rcx 7 cc1 63501.650479626: 7f5219ac27de _int_free+0x5e mov %rdi, %r12 8 cc1 63501.650479626: 7f5219ac27e1 _int_free+0x61 movq %fs:(%rcx), %rax 9 cc1 63501.650479626: 7f5219ac27e5 _int_free+0x65 test %rax, %rax 10 cc1 63501.650479626: 7f5219ac27e8 _int_free+0x68 jz 0x7f5219ac2821 11 cc1 63501.650479626: 7f5219ac27ea _int_free+0x6a leaq -0x11(%rbp), %rdi 12 cc1 63501.650479626: 7f5219ac27ee _int_free+0x6e mov %rdi, %rsi 13 cc1 63501.650479626: 7f5219ac27f1 _int_free+0x71 shr $0x4, %rsi 14 cc1 63501.650479626: 7f5219ac27f5 _int_free+0x75 cmpq %rsi, 0x13caf4(%rip) 15 cc1 63501.650479626: 7f5219ac27fc _int_free+0x7c jbe 0x7f5219ac2821 16 cc1 63501.650479626: 7f5219ac2821 _int_free+0xa1 cmpq 0x13f138(%rip), %rbp 17 cc1 63501.650479626: 7f5219ac2828 _int_free+0xa8 jnbe 0x7f5219ac28d8 18 cc1 63501.650479626: 7f5219ac28d8 _int_free+0x158 testb $0x2, 0x8(%rbx) 19 cc1 63501.650479628: 7f5219ac28dc _int_free+0x15c jnz 0x7f5219ac2ab0 IPC: 6.00 (18/3) <SNIP> - Allow using time ranges with Intel PT, i.e. these features, already present but not optimially usable with Intel PT, should be now: Select the second 10% time slice: $ perf script --time 10%/2 Select from 0% to 10% time slice: $ perf script --time 0%-10% Select the first and second 10% time slices: $ perf script --time 10%/1,10%/2 Select from 0% to 10% and 30% to 40% slices: $ perf script --time 0%-10%,30%-40% cs-etm (ARM): Mathieu Poirier: - Add support for CPU-wide trace scenarios. s390: Thomas Richter: - Fix missing kvm module load for s390. - Fix OOM error in TUI mode on s390 - Support s390 diag event display when doing analysis on !s390 architectures. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-10perf intel-pt: Add intel_pt_fast_forward()Adrian Hunter2-0/+132
Intel PT decoding is done in time order. In order to support efficient time interval filtering, add a facility to "fast forward" towards a particular timestamp. That involves finding the right buffer, stepping to that buffer, and then stepping forward PSBs. Because decoding must begin at a PSB, "fast forward" stops at the last PSB that has a timestamp before the target timestamp. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190604130017.31207-9-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-10perf intel-pt: Add reposition parameter to intel_pt_get_data()Adrian Hunter1-8/+9
When the decoder gets the next trace buffer, some state is reset if the buffer is not consecutive to the previous buffer. Add a parameter 'reposition' so that can be done also to support a "fast forward" facility. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190604130017.31207-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-10perf intel-pt: Factor out intel_pt_reposition()Adrian Hunter1-4/+9
Factor out intel_pt_reposition() so it can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190604130017.31207-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-10perf intel-pt: Factor out intel_pt_8b_tsc()Adrian Hunter1-9/+17
Factor out intel_pt_8b_tsc() so it can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190604130017.31207-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-10perf intel-pt: Add lookahead callbackAdrian Hunter2-0/+5
Add a callback function to enable the decoder to lookahead at subsequent trace buffers. This will be used to implement a "fast forward" facility which will be needed to support efficient time interval filtering. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190604130017.31207-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288Thomas Gleixner8-80/+8
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms and conditions of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 263 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05perf intel-pt: Accumulate cycle count from TSC/TMA/MTC packetsAdrian Hunter1-0/+51
When CYC packets are not available, it is still possible to count cycles using TSC/TMA/MTC timestamps. As the timestamp increments in TSC ticks, convert to CPU cycles using the current core-to-bus ratio. Do not accumulate cycles when control flow packet generation is not enabled, nor when time has been "lost", typically due to mwait, which is indicated by a TSC/TMA packet that is not part of PSB+. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-12-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-05perf intel-pt: Re-factor TIP cases in intel_pt_walk_to_ipAdrian Hunter1-6/+17
To make it easier to add new code for different TIP cases, separate each case. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-05perf intel-pt: Record when decoding PSB+ packetsAdrian Hunter1-10/+31
In preparation for using MTC packets to count cycles, record whether decoding is between a PSB and PSBEND packets. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-10-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-05perf intel-pt: Accumulate cycle count from CYC packetsAdrian Hunter2-1/+14
In preparation for providing instructions-per-cycle (IPC) information, accumulate cycle count from CYC packets. Although CYC packets are optional (requires config term 'cyc' to enable cycle-accurate mode when recording), the simplest way to count cycles is with CYC packets. The first complication is that cycles must be counted only when also counting instructions. That means when control flow packet generation is enabled i.e. between TIP.PGE and TIP.PGD packets. Also, sampling the cycle count follows the same rules as sampling the timestamp, that is, not before the instruction to which the decoder is walking is reached. In addition, the cycle count is not accurate for any but the first branch of a TNT packet. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-05perf intel-pt: Factor out intel_pt_update_sample_timeAdrian Hunter1-8/+10
To eliminate some duplication and make the code more understandable, factor out intel_pt_update_sample_time. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190520113728.14389-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156Thomas Gleixner5-73/+5
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 1334 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Richard Fontana <rfontana@redhat.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-16perf intel-pt: Fix sample timestamp wrt non-taken branchesAdrian Hunter1-1/+4
The sample timestamp is updated to ensure that the timestamp represents the time of the sample and not a branch that the decoder is still walking towards. The sample timestamp is updated when the decoder returns, but the decoder does not return for non-taken branches. Update the sample timestamp then also. Note that commit 3f04d98e972b5 ("perf intel-pt: Improve sample timestamp") was also a stable fix and appears, for example, in v4.4 stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample timestamp"). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v4.4+ Fixes: 3f04d98e972b ("perf intel-pt: Improve sample timestamp") Link: http://lkml.kernel.org/r/20190510124143.27054-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-16perf intel-pt: Fix improved sample timestampAdrian Hunter1-3/+10
The decoder uses its current timestamp in samples. Usually that is a timestamp that has already passed, but in some cases it is a timestamp for a branch that the decoder is walking towards, and consequently hasn't reached. The intel_pt_sample_time() function decides which is which, but was not handling TNT packets exactly correctly. In the case of TNT, the timestamp applies to the first branch, so the decoder must first walk to that branch. That means intel_pt_sample_time() should return true for TNT, and this patch makes that change. However, if the first branch is a non-taken branch (i.e. a 'N'), then intel_pt_sample_time() needs to return false for subsequent taken branches in the same TNT packet. To handle that, introduce a new state INTEL_PT_STATE_TNT_CONT to distinguish the cases. Note that commit 3f04d98e972b5 ("perf intel-pt: Improve sample timestamp") was also a stable fix and appears, for example, in v4.4 stable tree as commit a4ebb58fd124 ("perf intel-pt: Improve sample timestamp"). Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org # v4.4+ Fixes: 3f04d98e972b5 ("perf intel-pt: Improve sample timestamp") Link: http://lkml.kernel.org/r/20190510124143.27054-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-16perf intel-pt: Fix instructions sampling rateAdrian Hunter1-3/+10
The timestamp used to determine if an instruction sample is made, is an estimate based on the number of instructions since the last known timestamp. A consequence is that it might go backwards, which results in extra samples. Change it so that a sample is only made when the timestamp goes forwards. Note this does not affect a sampling period of 0 or sampling periods specified as a count of instructions. Example: Before: $ perf script --itrace=i10us ls 13812 [003] 2167315.222583: 3270 instructions:u: 7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 30902 instructions:u: 7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 10 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 8 instructions:u: 7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 14 instructions:u: 7fac71e2d9ea _dl_cache_libcmp+0xa (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 6 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 14 instructions:u: 7fac71e2d9ff _dl_cache_libcmp+0x1f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 4 instructions:u: 7fac71e2dab2 _dl_cache_libcmp+0xd2 (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222728: 16423 instructions:u: 7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222734: 12731 instructions:u: 7fac71e27938 _dl_name_match_p+0x68 (/lib/x86_64-linux-gnu/ld-2.28.so) ... After: $ perf script --itrace=i10us ls 13812 [003] 2167315.222583: 3270 instructions:u: 7fac71e2e494 __GI___tunables_init+0xf4 (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222667: 30902 instructions:u: 7fac71e2da0f _dl_cache_libcmp+0x2f (/lib/x86_64-linux-gnu/ld-2.28.so) ls 13812 [003] 2167315.222728: 16479 instructions:u: 7fac71e2477a _dl_map_object_deps+0x1ba (/lib/x86_64-linux-gnu/ld-2.28.so) ... Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Fixes: f4aa081949e7b ("perf tools: Add Intel PT decoder") Link: http://lkml.kernel.org/r/20190510124143.27054-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-28perf intel-pt: Fix TSC slipAdrian Hunter1-12/+8
A TSC packet can slip past MTC packets so that the timestamp appears to go backwards. One estimate is that can be up to about 40 CPU cycles, which is certainly less than 0x1000 TSC ticks, but accept slippage an order of magnitude more to be on the safe side. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Fixes: 79b58424b821c ("perf tools: Add Intel PT support for decoding MTC packets") Link: http://lkml.kernel.org/r/20190325135135.18348-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-14perf tools: Rename build libperf to perfJiri Olsa1-1/+1
Rename build libperf to perf, because it's used to build perf. The libperf build object name will be used for libperf library. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190213123246.4015-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf intel-pt: Packet splitting can happen only on 32-bitAdrian Hunter1-1/+1
Data is copied when the trace is stopped, so packets are never split between buffers except when processing if the buffer cannot fit in the address space which can only happen on 32-bit systems. Change the logic to reflect that. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20190206103947.15750-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06perf intel-pt: Fix CYC timestamp calculation after OVFAdrian Hunter1-1/+0
CYC packet timestamp calculation depends upon CBR which was being cleared upon overflow (OVF). That can cause errors due to failing to synchronize with sideband events. Even if a CBR change has been lost, the old CBR is still a better estimate than zero. So remove the clearing of CBR. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>