<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/tools/perf/arch, branch master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/tools/perf/arch?h=master</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/tools/perf/arch?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-10-25T20:40:48Z</updated>
<entry>
<title>tools headers UAPI: Sync powerpc syscall tables with the kernel sources</title>
<updated>2022-10-25T20:40:48Z</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2021-09-08T19:09:08Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=88864611940ae7dd5a9d40667287c4c9fa455140'/>
<id>urn:sha1:88864611940ae7dd5a9d40667287c4c9fa455140</id>
<content type='text'>
To pick the changes in these csets:

  e237506238352f3b ("powerpc/32: fix syscall wrappers with 64-bit arguments of unaligned register-pairs")

That doesn't cause any changes in the perf tools.

As a reminder, this table is used in tools perf to allow features such as:

  [root@five ~]# perf trace -e set_mempolicy_home_node
  ^C[root@five ~]#
  [root@five ~]# perf trace -v -e set_mempolicy_home_node
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 253729 &amp;&amp; common_pid != 3585) &amp;&amp; (id == 450)
  mmap size 528384B
  ^C[root@five ~]
  [root@five ~]# perf trace -v -e set*  --max-events 5
  Using CPUID AuthenticAMD-25-21-0
  event qualifier tracepoint filter: (common_pid != 253734 &amp;&amp; common_pid != 3585) &amp;&amp; (id == 38 || id == 54 || id == 105 || id == 106 || id == 109 || id == 112 || id == 113 || id == 114 || id == 116 || id == 117 || id == 119 || id == 122 || id == 123 || id == 141 || id == 160 || id == 164 || id == 170 || id == 171 || id == 188 || id == 205 || id == 218 || id == 238 || id == 273 || id == 308 || id == 450)
  mmap size 528384B
       0.000 ( 0.008 ms): bash/253735 setpgid(pid: 253735 (bash), pgid: 253735 (bash))      = 0
    6849.011 ( 0.008 ms): bash/16046 setpgid(pid: 253736 (bash), pgid: 253736 (bash))       = 0
    6849.080 ( 0.005 ms): bash/253736 setpgid(pid: 253736 (bash), pgid: 253736 (bash))      = 0
    7437.718 ( 0.009 ms): gnome-shell/253737 set_robust_list(head: 0x7f34b527e920, len: 24) = 0
   13445.986 ( 0.010 ms): bash/16046 setpgid(pid: 253738 (bash), pgid: 253738 (bash))       = 0
  [root@five ~]#

That is the filter expression attached to the raw_syscalls:sys_{enter,exit}
tracepoints.

  $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -w set_mempolicy_home_node
  tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
  tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:450 	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
  tools/perf/arch/s390/entry/syscalls/syscall.tbl:450  common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
  tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
  $

  $ grep -w set_mempolicy_home_node /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c
	[450] = "set_mempolicy_home_node",
  $

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
  diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
  Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
  diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/powerpc/entry/syscalls/syscall.tbl' differs from latest version at 'arch/powerpc/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/s390/entry/syscalls/syscall.tbl' differs from latest version at 'arch/s390/kernel/syscalls/syscall.tbl'
  diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl
  Warning: Kernel ABI header at 'tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl' differs from latest version at 'arch/mips/kernel/syscalls/syscall_n64.tbl'
  diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Cc: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Link: https://lore.kernel.org/lkml/Y01HN2DGkWz8tC%2FJ@kernel.org/
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver</title>
<updated>2022-10-15T13:13:16Z</updated>
<author>
<name>Qi Liu</name>
<email>liuqi115@huawei.com</email>
</author>
<published>2022-09-27T08:13:59Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=057381a7ece1b2726509ce47cdb9c1a111acfce9'/>
<id>urn:sha1:057381a7ece1b2726509ce47cdb9c1a111acfce9</id>
<content type='text'>
HiSilicon PCIe tune and trace device (PTT) could dynamically tune the
PCIe link's events, and trace the TLP headers).

This patch add support for PTT device in perf tool, so users could use
'perf record' to get TLP headers trace data.

Reviewed-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Signed-off-by: Qi Liu &lt;liuqi115@huawei.com&gt;
Signed-off-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Acked-by: John Garry &lt;john.garry@huawei.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Bjorn Helgaas &lt;helgaas@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Cc: Lorenzo Pieralisi &lt;lorenzo.pieralisi@arm.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mathieu Poirier &lt;mathieu.poirier@linaro.org&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qi Liu &lt;liuqi6124@gmail.com&gt;
Cc: Shameerali Kolothum Thodi &lt;shameerali.kolothum.thodi@huawei.com&gt;
Cc: Shaokun Zhang &lt;zhangshaokun@hisilicon.com&gt;
Cc: Suzuki Poulouse &lt;suzuki.poulose@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Zeng Prime &lt;prime.zeng@huawei.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()</title>
<updated>2022-10-15T13:13:16Z</updated>
<author>
<name>Qi Liu</name>
<email>liuqi115@huawei.com</email>
</author>
<published>2022-09-27T08:13:58Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=45a3975f8e4c56829ada20f7a6a29095ca05e375'/>
<id>urn:sha1:45a3975f8e4c56829ada20f7a6a29095ca05e375</id>
<content type='text'>
Add find_pmu_for_event() and use to simplify logic in
auxtrace_record_init(). find_pmu_for_event() will be reused in
subsequent patches.

Reviewed-by: John Garry &lt;john.garry@huawei.com&gt;
Reviewed-by: Jonathan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Reviewed-by: Leo Yan &lt;leo.yan@linaro.org&gt;
Signed-off-by: Qi Liu &lt;liuqi115@huawei.com&gt;
Signed-off-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Bjorn Helgaas &lt;helgaas@kernel.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Lorenzo Pieralisi &lt;lorenzo.pieralisi@arm.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mathieu Poirier &lt;mathieu.poirier@linaro.org&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Qi Liu &lt;liuqi6124@gmail.com&gt;
Cc: Shameerali Kolothum Thodi &lt;shameerali.kolothum.thodi@huawei.com&gt;
Cc: Shaokun Zhang &lt;zhangshaokun@hisilicon.com&gt;
Cc: Suzuki Poulouse &lt;suzuki.poulose@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Zeng Prime &lt;prime.zeng@huawei.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Cc: linuxarm@huawei.com
Link: https://lore.kernel.org/r/20220927081400.14364-2-yangyicong@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf intel-pt: Fix system_wide dummy event for hybrid</title>
<updated>2022-10-15T13:13:16Z</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2022-10-12T08:22:59Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=6cef7dab3e2e5cb23a13569c3880c0532326748c'/>
<id>urn:sha1:6cef7dab3e2e5cb23a13569c3880c0532326748c</id>
<content type='text'>
User space tasks can migrate between CPUs, so when tracing selected CPUs,
system-wide sideband is still needed, however evlist-&gt;core.has_user_cpus
is not set in the hybrid case, so check the target cpu_list instead.

Fixes: 7d189cadbeebc778 ("perf intel-pt: Track sideband system-wide when needed")
Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221012082259.22394-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf annotate: Add missing condition flags for arm64</title>
<updated>2022-10-14T13:47:35Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2022-10-06T22:22:32Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=531778b129937ea3a6923bc67a45d024712a12f7'/>
<id>urn:sha1:531778b129937ea3a6923bc67a45d024712a12f7</id>
<content type='text'>
According to the document [1], it can also have 'hs', 'lo', 'vc', 'vs' as a
condition code.  Let's add them too.

[1] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/condition-codes-1-condition-flags-and-codes

Reported-by: Kevin Nomura &lt;nomurak@google.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Garry &lt;john.garry@huawei.com&gt;
Cc: Leo Yan &lt;leo.yan@linaro.org&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20221006222232.266416-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux</title>
<updated>2022-10-11T22:02:25Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-11T22:02:25Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=d465bff130bf4ca17b6980abe51164ace1e0cba4'/>
<id>urn:sha1:d465bff130bf4ca17b6980abe51164ace1e0cba4</id>
<content type='text'>
Pull perf tools updates from Arnaldo Carvalho de Melo:

 - Add support for AMD on 'perf mem' and 'perf c2c', the kernel
   enablement patches went via tip.

   Example:

      $ sudo perf mem record -- -c 10000
      ^C[ perf record: Woken up 227 times to write data ]
      [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

      $ sudo perf mem report -F mem,sample,snoop
      Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
      Memory access                  Samples  Snoop
      N/A                             700620  N/A
      L1 hit                          126675  N/A
      L2 hit                             424  N/A
      L3 hit                             664  HitM
      L3 hit                              10  N/A
      Local RAM hit                        2  N/A
      Remote RAM (1 hop) hit            8558  N/A
      Remote Cache (1 hop) hit             3  N/A
      Remote Cache (1 hop) hit             2  HitM
      Remote Cache (2 hops) hit           10  HitM
      Remote Cache (2 hops) hit            6  N/A
      Uncached hit                         4  N/A
      $

 - "perf lock" improvements:

     - Add -E/--entries option to limit the number of entries to
       display, say to ask for just the top 5 contended locks.

     - Add -q/--quiet option to suppress header and debug messages.

     - Add a 'perf test' kernel lock contention entry to test 'perf
       lock'.

 - "perf lock contention" improvements:

     - Ask BPF's bpf_get_stackid() to skip some callchain entries.

       The ones closer to the tooling are bpf related and not that
       interesting, the ones calling the locking function are the ones
       we're interested in, example of a full, unskipped callstack:

     - Allow changing the callstack depth and number of entries to skip.

           1     10.74 us     10.74 us     10.74 us     spinlock   __bpf_trace_contention_begin+0xb
                          0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
                          0xffffffffc03b5c47  bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117
                          0xffffffffbb8b8e75  bpf_trace_run2+0x35
                          0xffffffffbb7eab9b  __bpf_trace_contention_begin+0xb
                          0xffffffffbb7ebe75  queued_spin_lock_slowpath+0x1f5
                          0xffffffffbc1c26ff  _raw_spin_lock+0x1f
                          0xffffffffbb841015  tick_do_update_jiffies64+0x25
                          0xffffffffbb8409ee  tick_irq_enter+0x9e

     - Show full callstack in verbose mode (-v option), sometimes this
       is desirable instead of showing just one callstack entry.

 - Allow multiple time ranges in 'perf record --delay' to help in
   reducing the amount of data collected from hardware tracing (Intel
   PT, etc) when there is a rough idea of periods of time where events
   of interest take time.

 - Add Intel PT to record only decoder debug messages when error
   happens.

 - Improve layout of Intel PT man page.

 - Add new branch types: alignment, data and inst faults and arch
   specific ones, such as fiq, debug_halt, debug_exit, debug_inst and
   debug_data on arm64.

   Kernel enablement went thru the tip tree.

 - Fix 'perf probe' error log check in 'perf test' when no debuginfo is
   available.

 - Fix 'perf stat' aggregation mode logic, it should be looking at the
   CPU not at the core number.

 - Fix flags parsing in 'perf trace' filters.

 - Introduce compact encoding of CPU range encoding on perf.data, to
   avoid having a bitmap with all the CPUs.

 - Improvements to the 'perf stat' metrics, including adding
   "core_wide", and computing "smt" from the CPU topology.

 - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format,
   that allows tooling to ask for the precise number of lost samples for
   a given event.

 - Add 'addr' sort key to see just the address of sampled instructions:

      $ perf record -o- true | perf report -i- -s addr
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 0.000 MB - ]
      # Samples: 12  of event 'cycles:u'
      # Event count (approx.): 252512
      #
      # Overhead  Address
      # ........  ..................
          42.96%  0x7f96f08443d7
          29.55%  0x7f96f0859b50
          14.76%  0x7f96f0852e02
           8.30%  0x7f96f0855028
           4.43%  0xffffffff8de01087

      perf annotate: Toggle full address &lt;-&gt; offset display

 - Add 'f' hotkey to the 'perf annotate' TUI interface when in
   'disassembler output' mode ('o' hotkey) to toggle showing full
   virtual address or just the offset.

 - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for
   pre-existing threads, at the start of a 'perf record' session,
   speeding up that record startup phase.

 - Add a command line option to specify build ids in 'perf inject'.

 - Update JSON event files for the Intel alderlake, broadwell,
   broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake,
   icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids,
   skylake, skylakex, and tigerlake processors.

 - Update vendor JSON event files for the ARM Neoverse V1 and E1
   platforms.

 - Add a 'perf test' entry for 'perf mem' where a struct has false
   sharing and this gets detected in the 'perf mem' output, tested with
   Intel, AMD and ARM64 systems.

 - Add a 'perf test' entry to test the resolution of java symbols, where
   an output like this is expected:

       8.18%  jshell    jitted-50116-29.so    [.] Interpreter
       0.75%  Thread-1  jitted-83602-1670.so  [.] jdk.internal.jimage.BasicImageReader.getString(int)

 - Add tests for the ARM64 CoreSight hardware tracing feature, with
   specially crafted pureloop, memcpy, thread loop and unroll tread that
   then gets traced and the output compared with expected output.

   Documentation explaining it is also included.

 - Add per thread Intel PT 'perf test' entry to check that
   PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a
   mixture of per thread and per CPU events and mmaps, verify that this
   gets all recorded correctly.

 - Introduce pthread mutex wrappers to allow for building with clang's
   -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by"
   "lockable", "exclusive_lock_function", "exclusive_trylock_function",
   "exclusive_locks_required", and "no_thread_safety_analysis" compiler
   function attributes.

 - Fix empty version number when building outside of a git repo.

 - Improve feature detection display when multiple versions of a feature
   are present, such as for binutils libbfd, that has a mix of possible
   ways to detect according to the Linux distribution.

   Previously in some cases we had:

      Auto-detecting system features
      &lt;SNIP&gt;
      ...                                  libbfd: [ on  ]
      ...                          libbfd-liberty: [ on  ]
      ...                        libbfd-liberty-z: [ on  ]
      &lt;SNIP&gt;

   Now for this case we show just the main feature:

      Auto-detecting system features
      &lt;SNIP&gt;
      ...                                  libbfd: [ on  ]
      &lt;SNIP&gt;

 - Remove some unused structs, variables, macros, function prototypes
   and includes from various places.

* tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits)
  perf script: Add missing fields in usage hint
  perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB
  perf mem/c2c: Avoid printing empty lines for unsupported events
  perf mem/c2c: Add load store event mappings for AMD
  perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events
  perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO}
  perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel
  tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
  perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout()
  perf test coresight: Add relevant documentation about ARM64 CoreSight testing
  perf test: Add git ignore for tmp and output files of ARM CoreSight tests
  perf test coresight: Add unroll thread test shell script
  perf test coresight: Add unroll thread test tool
  perf test coresight: Add thread loop test shell scripts
  perf test coresight: Add thread loop test tool
  perf test coresight: Add memcpy thread test shell script
  perf test coresight: Add memcpy thread test tool
  perf test: Add git ignore for perf data generated by the ARM CoreSight tests
  perf test: Add arm64 asm pureloop test shell script
  perf test: Add asm pureloop test tool
  ...
</content>
</entry>
<entry>
<title>perf mem/c2c: Add load store event mappings for AMD</title>
<updated>2022-10-06T19:30:06Z</updated>
<author>
<name>Ravi Bangoria</name>
<email>ravi.bangoria@amd.com</email>
</author>
<published>2022-10-06T15:39:43Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=f7b58cbdb3ff36eba8622e67eee66c10dd1c9995'/>
<id>urn:sha1:f7b58cbdb3ff36eba8622e67eee66c10dd1c9995</id>
<content type='text'>
The 'perf mem' and 'perf c2c' tools are wrappers around 'perf record'
with mem load/ store events. IBS tagged load/store sample provides most
of the information needed for these tools. Wire in the "ibs_op//" event
as mem-ldst event for AMD.

There are some limitations though: Only load/store micro-ops provide
mem/c2c information. Whereas, IBS does not have a way to choose a
particular type of micro-op to tag. This results in many non-LS
micro-ops being tagged which appear as N/A in the perf report. IBS,
being an uncore pmu from kernel point of view[1], does not support per
process monitoring. Thus, perf mem/c2c on AMD are currently supported in
per-cpu mode only.

Example:

  $ sudo perf mem record -- -c 10000
  ^C[ perf record: Woken up 227 times to write data ]
  [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ]

  $ sudo perf mem report -F mem,sample,snoop
  Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762
  Memory access                  Samples  Snoop
  N/A                             700620  N/A
  L1 hit                          126675  N/A
  L2 hit                             424  N/A
  L3 hit                             664  HitM
  L3 hit                              10  N/A
  Local RAM hit                        2  N/A
  Remote RAM (1 hop) hit            8558  N/A
  Remote Cache (1 hop) hit             3  N/A
  Remote Cache (1 hop) hit             2  HitM
  Remote Cache (2 hops) hit           10  HitM
  Remote Cache (2 hops) hit            6  N/A
  Uncached hit                         4  N/A
  $

[1]: https://lore.kernel.org/lkml/20220829113347.295-1-ravi.bangoria@amd.com

Signed-off-by: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Ali Saidi &lt;alisaidi@amazon.com&gt;
Cc: Ananth Narayan &lt;ananth.narayan@amd.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Joe Mario &lt;jmario@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Kim Phillips &lt;kim.phillips@amd.com&gt;
Cc: Leo Yan &lt;leo.yan@linaro.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sandipan Das &lt;sandipan.das@amd.com&gt;
Cc: Santosh Shukla &lt;santosh.shukla@amd.com&gt;
Cc: Stephane Eranian &lt;eranian@google.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: x86@kernel.org
Link: https://lore.kernel.org/r/20221006153946.7816-6-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>perf tools: Add evlist__add_sched_switch()</title>
<updated>2022-10-06T11:03:53Z</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@kernel.org</email>
</author>
<published>2022-10-03T20:46:46Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=182bb594e0678b3ceac99f4ec3daa5d22ea3d0ce'/>
<id>urn:sha1:182bb594e0678b3ceac99f4ec3daa5d22ea3d0ce</id>
<content type='text'>
Add a help to create a system-wide sched_switch event.  One merit is
that it sets the system-wide bit before adding it to evlist so that
the libperf can handle the cpu and thread maps correctly.

Reviewed-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Ian Rogers &lt;irogers@google.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Leo Yan &lt;leo.yan@linaro.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20221003204647.1481128-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
</entry>
<entry>
<title>powerpc: Adopt SYSCALL_DEFINE for arch-specific syscall handlers</title>
<updated>2022-09-28T09:22:08Z</updated>
<author>
<name>Rohan McLure</name>
<email>rmclure@linux.ibm.com</email>
</author>
<published>2022-09-21T06:55:55Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=dec20c50df79cadaff17e964ef7f622491a52134'/>
<id>urn:sha1:dec20c50df79cadaff17e964ef7f622491a52134</id>
<content type='text'>
Arch-specific implementations of syscall handlers are currently used
over generic implementations for the following reasons:

1. Semantics unique to powerpc
2. Compatibility syscalls require 'argument padding' to comply with
   64-bit argument convention in ELF32 abi.
3. Parameter types or order is different in other architectures.

These syscall handlers have been defined prior to this patch series
without invoking the SYSCALL_DEFINE or COMPAT_SYSCALL_DEFINE macros with
custom input and output types. We remove every such direct definition in
favour of the aforementioned macros.

Also update syscalls.tbl in order to refer to the symbol names generated
by each of these macros. Since ppc64_personality can be called by both
64 bit and 32 bit binaries through compatibility, we must generate both
both compat_sys_ and sys_ symbols for this handler.

As an aside:
A number of architectures including arm and powerpc agree on an
alternative argument order and numbering for most of these arch-specific
handlers. A future patch series may allow for asm/unistd.h to signal
through its defines that a generic implementation of these syscall
handlers with the correct calling convention be emitted, through the
__ARCH_WANT_COMPAT_SYS_... convention.

Signed-off-by: Rohan McLure &lt;rmclure@linux.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/r/20220921065605.1051927-16-rmclure@linux.ibm.com

</content>
</entry>
<entry>
<title>powerpc/32: Remove powerpc select specialisation</title>
<updated>2022-09-26T13:00:15Z</updated>
<author>
<name>Rohan McLure</name>
<email>rmclure@linux.ibm.com</email>
</author>
<published>2022-09-21T06:55:51Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=b6b1334c9510e162bd8ca0ae58403cafad9572f1'/>
<id>urn:sha1:b6b1334c9510e162bd8ca0ae58403cafad9572f1</id>
<content type='text'>
Syscall #82 has been implemented for 32-bit platforms in a unique way on
powerpc systems. This hack will in effect guess whether the caller is
expecting new select semantics or old select semantics. It does so via a
guess, based off the first parameter. In new select, this parameter
represents the length of a user-memory array of file descriptors, and in
old select this is a pointer to an arguments structure.

The heuristic simply interprets sufficiently large values of its first
parameter as being a call to old select. The following is a discussion
on how this syscall should be handled.


As discussed in this thread, the existence of such a hack suggests that for
whatever powerpc binaries may predate glibc, it is most likely that they
would have taken use of the old select semantics. x86 and arm64 both
implement this syscall with oldselect semantics.

Remove the powerpc implementation, and update syscall.tbl to refer to emit
a reference to sys_old_select and compat_sys_old_select
for 32-bit binaries, in keeping with how other architectures support
syscall #82.

Signed-off-by: Rohan McLure &lt;rmclure@linux.ibm.com&gt;
Reviewed-by: Nicholas Piggin &lt;npiggin@gmail.com&gt;
Signed-off-by: Michael Ellerman &lt;mpe@ellerman.id.au&gt;
Link: https://lore.kernel.org/lkml/13737de5-0eb7-e881-9af0-163b0d29a1a0@csgroup.eu/
Link: https://lore.kernel.org/r/20220921065605.1051927-12-rmclure@linux.ibm.com

</content>
</entry>
</feed>
