aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/exported-sql-viewer.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2025-05-09perf demangle-rust: Add rustc-demangle C demanglerIan Rogers2-0/+2130
Imported at commit 80e40f57d99f ("add comment about finding latest version of code") from: https://github.com/rust-lang/rustc-demangle/blob/main/crates/native-c/src/demangle.c https://github.com/rust-lang/rustc-demangle/blob/main/crates/native-c/include/demangle.h There is discussion of this issue motivating the import in: https://github.com/rust-lang/rust/issues/60705 https://lore.kernel.org/lkml/20250129193037.573431-1-irogers@google.com/ The SPDX lines reflect the dual license Apache-2 or MIT in: https://github.com/rust-lang/rustc-demangle/blob/main/README.md Following Migual Ojeda's suggestion comments were added on copyright and keeping the code in sync with upstream. The files are renamed as perf supports multiple demanglers and so demangle as a name would be overloaded. The work here was done by Ariel Ben-Yehuda <ariel.byd@gmail.com> and I am merely importing it as discussed in the rust-lang issue. Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Ariel Ben-Yehuda <ariel.byd@gmail.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Bill Wendling <morbo@google.com> Cc: Björn Roy Baron <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Daniel Xu <dxu@dxuuu.xyz> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Justin Stitt <justinstitt@google.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Trevor Gross <tmgross@umich.edu> Link: https://lore.kernel.org/r/20250430004128.474388-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf test amd ibs: Fix spelling mistake "Asssuming" -> "Assuming"Colin Ian King1-1/+1
There is a spelling mistake ina pr_debug message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250507082421.188848-1-colin.i.king@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf pmu: Use available core PMU for raw eventsNamhyung Kim1-7/+14
When it finds a matching PMU for a legacy event, it should look for core PMUs. The raw events also refers to core events so it should be handled similarly. On x86, PERF_TYPE_RAW should match with the existing cpu PMU. But on ARM, there's no PMU with the matching type so it'll pick the first core PMU for it. Suggested-by: Ian Rogers <irogers@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250507215939.54399-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf lock contention: Add -J/--inject-delay optionNamhyung Kim5-0/+164
This is to slow down lock acquistion (on contention locks) deliberately. A possible use case is to estimate impact on application performance by optimization of kernel locking behavior. By delaying the lock it can simulate the worse condition as a control group, and then compare with the current behavior as a optimized condition. The syntax is 'time@function' and the time can have unit suffix like "us" and "ms". For example, I ran a simple test like below. $ sudo perf lock con -abl -L tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 92 1.18 ms 199.54 us 12.79 us ffffffff8a806080 tasklist_lock (rwlock) The contention count was 92 and the average wait time was around 10 us. But if I add 100 usec of delay to the tasklist_lock, $ sudo perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 190 15.67 ms 230.10 us 82.46 us ffffffff8a806080 tasklist_lock (rwlock) The contention count increased and the average wait time was up closed to 100 usec. If I increase the delay even more, $ sudo perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- \ sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1002 2.80 s 3.01 ms 2.80 ms ffffffff8a806080 tasklist_lock (rwlock) Now every sleep process had contention and the wait time was more than 1 msec. This is on my 4 CPU laptop so I guess one CPU has the lock while other 3 are waiting for it mostly. For simplicity, it only supports global locks for now. Committer testing: root@number:~# grep -m1 'model name' /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processor root@number:~# perf lock con -abl -L tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 142 453.85 us 25.39 us 3.20 us ffffffffae808080 tasklist_lock (rwlock) root@number:~# perf lock con -abl -L tasklist_lock -J 100us@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1040 2.39 s 3.11 ms 2.30 ms ffffffffae808080 tasklist_lock (rwlock) root@number:~# perf lock con -abl -L tasklist_lock -J 1ms@tasklist_lock -- sh -c 'for i in $(seq 1000); do sleep 1 & done; wait' contended total wait max wait avg wait address symbol 1025 24.72 s 31.01 ms 24.12 ms ffffffffae808080 tasklist_lock (rwlock) root@number:~# Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250509171950.183591-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-09perf tests: Fix 'perf report' tests installationMichael Petlan1-1/+2
There was a copy-paste mistake in the installation commands. Also, we need to install stderr-whitelist.txt file, which contains allowed messages that are printed on stderr and should not cause test fail. Fixes: 097fe67df1aa9cc7 ("perf testsuite: Install perf-report tests in the 'make install-tests -C tools/perf' target") Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250113182605.130719-6-vmolnaro@redhat.com Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Fix leaks of 'struct thread' in set_filter_loop_pids()Namhyung Kim1-0/+3
I've found some leaks from 'perf trace -a'. It seems there are more leaks but this is what I can find for now. Fixes: 082ab9a18e532864 ("perf trace: Filter out 'sshd' in the tracer ancestry in syswide tracing") Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250403054213.7021-1-namhyung@kernel.org [ split from a larget patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Fix leaks of 'struct thread' in fprintf_sys_enter()Namhyung Kim1-1/+1
I've found some leaks from 'perf trace -a'. It seems there are more leaks but this is what I can find for now. Fixes: 70351029b55677eb ("perf thread: Add support for reading the e_machine type for a thread") Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250403054213.7021-1-namhyung@kernel.org [ split from a larget patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf parse-events: Add debug dump of evlist if reorderedIan Rogers1-3/+13
Add debug verbose output to show how evsels were reordered by parse_events__sort_events_and_fix_groups(). For example: ``` $ perf record -v -e '{instructions,cycles}' true Using CPUID GenuineIntel-6-B7-1 WARNING: events were regrouped to match PMUs evlist after sorting/fixing: '{cpu_atom/instructions/,cpu_atom/cycles/},{cpu_core/instructions/,cpu_core/cycles/}' ``` Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf evlist: Make groups visible in evlist__format_evsels() outputIan Rogers1-1/+12
Make groups visible in output: Before: {cycles,instructions} -> cpu_atom/cycles/,cpu_atom/instructions/,cpu_core/cycles/,cpu_core/instructions/ After: {cycles,instructions} -> {cpu_atom/cycles/,cpu_atom/instructions/},{cpu_core/cycles/,cpu_core/instructions/} Committer testing: Before: root@number:~# perf record -e '{cycles,instructions,cache-misses}' /tmp/bla Failed to collect 'cycles,instructions,cache-misses' for the '/tmp/bla' workload: Permission denied root@number:~# After: root@number:~# perf record -e '{cycles,instructions,cache-misses}' /tmp/bla Failed to collect '{cycles,instructions,cache-misses}' for the '/tmp/bla' workload: Permission denied root@number:~# Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf evlist: Refactor evlist__scnprintf_evsels()Ian Rogers3-13/+20
Switch output to using a strbuf so the storage can be resized. Add a maximum size argument to avoid too much output that may happen for uncore events. Rename as scnprintf is no longer used. Committer testing: With the patch applied: root@number:~# perf probe -x ~/bin/perf evlist__format_evsels Added new event: probe_perf:evlist_format_evsels (on evlist__format_evsels in /home/acme/bin/perf) You can now use it in all perf tools, such as: perf record -e probe_perf:evlist_format_evsels -aR sleep 1 root@number:~# perf probe -l probe_perf:evlist_format_evsels (on evlist__format_evsels@util/evlist.c in /home/acme/bin/perf) root@number:~# perf trace -e probe_perf:*/max-stack=10/ perf record -e cycles,instructions,cache-misses /tmp/bla Failed to collect 'cycles,instructions,cache-misses' for the '/tmp/bla' workload: Permission denied 0.000 perf/3893011 probe_perf:evlist_format_evsels(__probe_ip: 6183397) evlist__format_evsels (/home/acme/bin/perf) __cmd_record (/home/acme/bin/perf) cmd_record (/home/acme/bin/perf) run_builtin (/home/acme/bin/perf) handle_internal_command (/home/acme/bin/perf) run_argv (/home/acme/bin/perf) main (/home/acme/bin/perf) __libc_start_call_main (/usr/lib64/libc.so.6) __libc_start_main@@GLIBC_2.34 (/usr/lib64/libc.so.6) _start (/home/acme/bin/perf) root@number:~# Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf stat: Remove print_mixed_hw_group_errorIan Rogers2-32/+0
print_mixed_hw_group_error will print a warning when a group of events uses different PMUs. This isn't possible to happen as parse_events__sort_events_and_fix_groups() will break groups when this happens, adding the warning at the start of perf of: WARNING: events were regrouped to match PMUs As the previous mixed group warning can never happen, remove the associated code. Committer testing: Before/after: acme@five:~$ perf stat -e '{cpu_core/cycles/,cpu_atom/cycles/}' sleep 1 WARNING: events were regrouped to match PMUs Performance counter stats for 'sleep 1': 424,895 cpu_atom/cycles/u <not counted> cpu_core/cycles/u (0.00%) 1.011862314 seconds time elapsed 0.000000000 seconds user 0.003166000 seconds sys acme@five:~$ Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf stat: Better hybrid support for the NMI watchdog warningIan Rogers1-3/+15
Prior to this patch evlist__has_hybrid would return false if the processor wasn't hybrid or the evlist didn't contain any core events. If the only PMU used by events was cpu_core then it would true even though there are no cpu_atom events. For example: ``` $ perf stat --cputype=cpu_core -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}' true Performance counter stats for 'true': <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) <not counted> cpu_core/cycles/ (0.00%) 0.001981900 seconds time elapsed 0.002311000 seconds user 0.000000000 seconds sys ``` This patch changes evlist__has_hybrid to return true only if the evlist contains events from >1 core PMU. This means the NMI watchdog warning is shown for the case above. Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Levi Yun <yeoreum.yun@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250402201549.4090305-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Add missing thread__put() in thread__e_machine()Ian Rogers1-0/+1
Add missing thread__put() of the found parent thread in thread__e_machine(). Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250401202715.3493567-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-08perf trace: Free the files.max entry in files->tableIan Rogers1-1/+2
The files.max is the maximum valid fd in the files array and so freeing the values needs to be inclusive of the max value. Reviewed-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250401202715.3493567-1-irogers@google.com [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-06dm: fix copying after src array boundariesTudor Ambarus1-1/+1
The blammed commit copied to argv the size of the reallocated argv, instead of the size of the old_argv, thus reading and copying from past the old_argv allocated memory. Following BUG_ON was hit: [ 3.038929][ T1] kernel BUG at lib/string_helpers.c:1040! [ 3.039147][ T1] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP ... [ 3.056489][ T1] Call trace: [ 3.056591][ T1] __fortify_panic+0x10/0x18 (P) [ 3.056773][ T1] dm_split_args+0x20c/0x210 [ 3.056942][ T1] dm_table_add_target+0x13c/0x360 [ 3.057132][ T1] table_load+0x110/0x3ac [ 3.057292][ T1] dm_ctl_ioctl+0x424/0x56c [ 3.057457][ T1] __arm64_sys_ioctl+0xa8/0xec [ 3.057634][ T1] invoke_syscall+0x58/0x10c [ 3.057804][ T1] el0_svc_common+0xa8/0xdc [ 3.057970][ T1] do_el0_svc+0x1c/0x28 [ 3.058123][ T1] el0_svc+0x50/0xac [ 3.058266][ T1] el0t_64_sync_handler+0x60/0xc4 [ 3.058452][ T1] el0t_64_sync+0x1b0/0x1b4 [ 3.058620][ T1] Code: f800865e a9bf7bfd 910003fd 941f48aa (d4210000) [ 3.058897][ T1] ---[ end trace 0000000000000000 ]--- [ 3.059083][ T1] Kernel panic - not syncing: Oops - BUG: Fatal exception Fix it by copying the size of src, and not the size of dst, as it was. Fixes: 5a2a6c428190 ("dm: always update the array size in realloc_argv on success") Cc: stable@vger.kernel.org Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-05-05perf test: Add direct off-cpu testsHoward Chu1-0/+71
Since we added --off-cpu-thresh, add tests for when a sample's off-cpu time is above the threshold, and when it's below the threshold. Note that the basic test performed in test_offcpu_basic() collects a direct sample now, since sleep 1 has duration of 1000ms, higher than the default value of --off-cpu-thresh of 500ms, resulting in a direct sample. An example: $ sudo perf test offcpu 124: perf record offcpu profiling tests : Ok $ Committer testing: root@number:~# perf test offcpu 126: perf record offcpu profiling tests : Ok root@number:~# perf test -v offcpu 126: perf record offcpu profiling tests : Ok root@number:~# perf test -vv offcpu 126: perf record offcpu profiling tests: --- start --- test child forked, pid 1410791 Checking off-cpu privilege Basic off-cpu test Basic off-cpu test [Success] Child task off-cpu test Child task off-cpu test [Success] Threshold test (above threshold) Threshold test (above threshold) [Success] Threshold test (below threshold) Threshold test (below threshold) [Success] ---- end(0) ---- 126: perf record offcpu profiling tests : Ok root@number:~# Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250501022809.449767-11-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Add --off-cpu-thresh optionHoward Chu6-1/+41
Specify the threshold for dumping offcpu samples with --off-cpu-thresh, the unit is milliseconds. Default value is 500ms. Example: perf record --off-cpu --off-cpu-thresh 824 The example above collects direct off-cpu samples where the off-cpu time is longer than 824ms. Committer testing: After commenting out the end off-cpu dump to have just the ones that are added right after the task is scheduled back, and using a threshould of 1000ms, we see some periods (the 5th column, just before "offcpu-time" in the 'perf script' output) that are over 1000.000.000 nanoseconds: root@number:~# perf record --off-cpu --off-cpu-thresh 10000 ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 3.902 MB perf.data (34335 samples) ] root@number:~# perf script <SNIP> Isolated Web Co 59932 [028] 63839.594437: 1000049427 offcpu-time: 7fe63c7976c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6) 7fe63c78c04c __futex_abstimed_wait_common+0x7c (/usr/lib64/libc.so.6) 7fe63c78e928 pthread_cond_timedwait@@GLIBC_2.3.2+0x178 (/usr/lib64/libc.so.6) 5599974a9fe7 mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&)+0xe7 (/usr/lib64/fir> 100000000 [unknown] ([unknown]) swapper 0 [025] 63839.594459: 195724 cycles:P: ffffffffac328270 read_tsc+0x0 ([kernel.kallsyms]) Isolated Web Co 59932 [010] 63839.594466: 1000055278 offcpu-time: 7fe63c7976c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6) 7fe63c78ba24 __syscall_cancel+0x14 (/usr/lib64/libc.so.6) 7fe63c804c4e __poll+0x1e (/usr/lib64/libc.so.6) 7fe633b0d1b8 PollWrapper(_GPollFD*, unsigned int, int) [clone .lto_priv.0]+0xf8 (/usr/lib64/firefox/libxul.so) 10000002c [unknown] ([unknown]) swapper 0 [027] 63839.594475: 134433 cycles:P: ffffffffad4c45d9 irqentry_enter+0x19 ([kernel.kallsyms]) swapper 0 [028] 63839.594499: 215838 cycles:P: ffffffffac39199a switch_mm_irqs_off+0x10a ([kernel.kallsyms]) MediaPD~oder #1 1407676 [027] 63839.594514: 134433 cycles:P: 7f982ef5e69f dct_IV(int*, int, int*)+0x24f (/usr/lib64/libfdk-aac.so.2.0.0) swapper 0 [024] 63839.594524: 267411 cycles:P: ffffffffad4c6ee6 poll_idle+0x56 ([kernel.kallsyms]) MediaSu~sor #75 1093827 [026] 63839.594555: 332652 cycles:P: 55be753ad030 moz_xmalloc+0x200 (/usr/lib64/firefox/firefox) swapper 0 [027] 63839.594616: 160548 cycles:P: ffffffffad144840 menu_select+0x570 ([kernel.kallsyms]) Isolated Web Co 14019 [027] 63839.595120: 1000050178 offcpu-time: 7fc9537cc6c2 __syscall_cancel_arch_end+0x0 (/usr/lib64/libc.so.6) 7fc9537c104c __futex_abstimed_wait_common+0x7c (/usr/lib64/libc.so.6) 7fc9537c3928 pthread_cond_timedwait@@GLIBC_2.3.2+0x178 (/usr/lib64/libc.so.6) 7fc95372a3c8 pt_TimedWait+0xb8 (/usr/lib64/libnspr4.so) 7fc95372a8d8 PR_WaitCondVar+0x68 (/usr/lib64/libnspr4.so) 7fc94afb1f7c WatchdogMain(void*)+0xac (/usr/lib64/firefox/libxul.so) 7fc947498660 [unknown] ([unknown]) 7fc9535fce88 [unknown] ([unknown]) 7fc94b620e60 WatchdogManager::~WatchdogManager()+0x0 (/usr/lib64/firefox/libxul.so) fff8548387f8b48 [unknown] ([unknown]) swapper 0 [003] 63839.595712: 212948 cycles:P: ffffffffacd5b865 acpi_os_read_port+0x55 ([kernel.kallsyms]) <SNIP> Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Suggested-by: Ian Rogers <irogers@google.com> Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-2-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-10-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Dump the remaining PERF_SAMPLE_ in sample_type from BPF's stack trace mapHoward Chu1-24/+35
Dump the remaining PERF_SAMPLE_ data, as if it is dumping a direct sample. Put the stack trace, tid, off-cpu time and cgroup id into the raw_data section, just like a direct off-cpu sample coming from BPF's bpf_perf_event_output(). This ensures that evsel__parse_sample() correctly parses both direct samples and accumulated samples. Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-10-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-9-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf script: Display off-cpu samples correctlyHoward Chu1-2/+2
No PERF_SAMPLE_CALLCHAIN in sample_type, but 'perf script' needs to display a callchain, have to specify manually. Also, prefer displaying a callchain: gvfs-afc-volume 2267 [001] 3829232.955656: 1001115340 offcpu-time: 77f05292603f __pselect+0xbf (/usr/lib/x86_64-linux-gnu/libc.so.6) 77f052a1801c [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0) 77f052a18d45 [unknown] (/usr/lib/x86_64-linux-gnu/libusbmuxd-2.0.so.6.0.0) 77f05289ca94 start_thread+0x384 (/usr/lib/x86_64-linux-gnu/libc.so.6) 77f052929c3c clone3+0x2c (/usr/lib/x86_64-linux-gnu/libc.so.6) to a raw binary BPF output: BPF output: 0000: dd 08 00 00 db 08 00 00 <DD>...<DB>... 0008: cc ce ab 3b 00 00 00 00 <CC>Ϋ;.... 0010: 06 00 00 00 00 00 00 00 ........ 0018: 00 fe ff ff ff ff ff ff .<FE><FF><FF><FF><FF><FF><FF> 0020: 3f 60 92 52 f0 77 00 00 ?`.R<F0>w.. 0028: 1c 80 a1 52 f0 77 00 00 ..<A1>R<F0>w.. 0030: 45 8d a1 52 f0 77 00 00 E.<A1>R<F0>w.. 0038: 94 ca 89 52 f0 77 00 00 .<CA>.R<F0>w.. 0040: 3c 9c 92 52 f0 77 00 00 <..R<F0>w.. 0048: 00 00 00 00 00 00 00 00 ........ 0050: 00 00 00 00 .... Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-9-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-8-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Disable perf_event's callchain collectionHoward Chu1-1/+1
There is a check in evsel.c that does this: if (evsel__is_offcpu_event(evsel)) evsel->core.attr.sample_type &= OFFCPU_SAMPLE_TYPES; This along with: #define OFFCPU_SAMPLE_TYPES (PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP | \ PERF_SAMPLE_TID | PERF_SAMPLE_TIME | \ PERF_SAMPLE_ID | PERF_SAMPLE_CPU | \ PERF_SAMPLE_PERIOD | PERF_SAMPLE_CALLCHAIN | \ PERF_SAMPLE_CGROUP) will tell perf_event to collect callchain. We don't need the callchain from perf_event when collecting off-cpu samples, because it's prev's callchain, not next's callchain. (perf_event) (task_storage) (needed) prev next | | ---sched_switch----> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-8-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-7-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf evsel: Assemble off-cpu samplesHoward Chu1-1/+34
Use the data in bpf-output samples, to assemble off-cpu samples. In evsel__is_offcpu_event(), check if sample_type is PERF_SAMPLE_RAW to support off-cpu sample data created by an older version of perf. Testing compatibility on off-cpu samples collected by perf before this patch series: See below, the sample_type still uses PERF_SAMPLE_CALLCHAIN $ perf script --header -i ./perf.data.ptn | grep "event : name = offcpu-time" # event : name = offcpu-time, , id = { 237917, 237918, 237919, 237920 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1 The output is correct. $ perf script -i ./perf.data.ptn | grep offcpu-time gmain 2173 [000] 18446744069.414584: 100102015 offcpu-time: NetworkManager 901 [000] 18446744069.414584: 5603579 offcpu-time: Web Content 1183550 [000] 18446744069.414584: 46278 offcpu-time: gnome-control-c 2200559 [000] 18446744069.414584: 11998247014 offcpu-time: <SNIP> $ And after this patch series: $ perf script --header -i ./perf.data.off-cpu-v9 | grep "event : name = offcpu-time" # event : name = offcpu-time, , id = { 237959, 237960, 237961, 237962 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1 $ ./perf script -i ./perf.data.off-cpu-v9 | grep offcpu-time gnome-shell 1875 [001] 4789616.361225: 100097057 offcpu-time: gnome-shell 1875 [001] 4789616.461419: 100107463 offcpu-time: firefox 2206821 [002] 4789616.475690: 255257245 offcpu-time: $ Committer testing: The command to record those samples: root@number:~# perf record --off-cpu -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.092 MB perf.data (1552 samples) ] root@number:~# Then, before this patch series, the sample_type for the "offcpu-time" event is: root@number:~# perf evlist -v | grep offcpu-time offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, freq: 1, sample_id_all: 1 root@number:~# And after it, after recording it again: root@number:~# perf record --off-cpu -a sleep 1 ; perf evlist -v | grep offcpu-time [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.151 MB perf.data (2843 samples) ] offcpu-time: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, sample_id_all: 1 root@number:~# Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-7-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-6-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Dump off-cpu samples in BPFHoward Chu1-5/+84
Collect tid, period, callchain, and cgroup id and dump them when off-cpu time threshold is reached. We don't collect the off-cpu time twice (the delta), it's either in direct samples, or accumulated samples that are dumped at the end of perf.data. Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-6-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-5-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Preparation of off-cpu BPF programHoward Chu2-0/+33
Set the perf_event map in BPF for dumping off-cpu samples, and set the offcpu_thresh to specify the threshold. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-5-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-4-howardchu95@gmail.com [ Added some missing iteration variables to off_cpu_config() and fixed up a manually edited patch hunk line boundary line ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf record --off-cpu: Parse off-cpu eventHoward Chu3-23/+21
Parse the off-cpu event using parse_event(), as bpf-output. Call evlist__enable_evsel() on off-cpu event. This fixes the inability to collect direct off-cpu samples on a workload, as reported by Arnaldo Carvalho de Melo <acme@redhat.com>. The reason being, workload sets enable_on_exec instead of calling evlist__enable(), but off-cpu event does not attach to an executable and execve won't be called, so the fds from perf_event_open() are not enabled. no-inherit should be set to 1, here's the reason: We update the BPF perf_event map for direct off-cpu sample dumping (in following patches), it executes as follows: bpf_map_update_value() bpf_fd_array_map_update_elem() perf_event_fd_array_get_ptr() perf_event_read_local() In perf_event_read_local(), there is: int perf_event_read_local(struct perf_event *event, u64 *value, u64 *enabled, u64 *running) { ... /* * It must not be an event with inherit set, we cannot read * all child counters from atomic context. */ if (event->attr.inherit) { ret = -EOPNOTSUPP; goto out; } Which means no-inherit has to be true for updating the BPF perf_event map. Moreover, for bpf-output events, we primarily want a system-wide event instead of a per-task event. The reason is that in BPF's bpf_perf_event_output(), BPF uses the CPU index to retrieve the perf_event file descriptor it outputs to. Making a bpf-output event system-wide naturally satisfies this requirement by mapping CPU appropriately. Suggested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-4-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-3-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05perf evsel: Expose evsel__is_offcpu_event() for future useHoward Chu2-1/+3
Expose evsel__is_offcpu_event() so it can be used in off_cpu_config(), evsel__parse_sample() and 'perf script'. Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Tested-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241108204137.2444151-3-howardchu95@gmail.com Link: https://lore.kernel.org/r/20250501022809.449767-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-05um: fix _nofault accessesJohannes Berg4-15/+17
Nathan reported [1] that when built with clang, the um kernel crashes pretty much immediately. This turned out to be an issue with the inline assembly I had added, when clang used %rax/%eax for both operands. Reorder it so current->thread.segv_continue is written first, and then the lifetime of _faulted won't have overlap with the lifetime of segv_continue. In the email thread Benjamin also pointed out that current->mm is only NULL for true kernel tasks, but we could do this for a userspace task, so the current->thread.segv_continue logic must be lifted out of the mm==NULL check. Finally, while looking at this, put a barrier() so the NULL assignment to thread.segv_continue cannot be reorder before the possibly faulting operation. Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://lore.kernel.org/r/20250402221254.GA384@ax162 [1] Fixes: d1d7f01f7cd3 ("um: mark rodata read-only and implement _nofault accesses") Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-05-04Linux 6.15-rc5Linus Torvalds1-1/+1
2025-05-04parisc: Fix double SIGFPE crashHelge Deller1-3/+13
Camm noticed that on parisc a SIGFPE exception will crash an application with a second SIGFPE in the signal handler. Dave analyzed it, and it happens because glibc uses a double-word floating-point store to atomically update function descriptors. As a result of lazy binding, we hit a floating-point store in fpe_func almost immediately. When the T bit is set, an assist exception trap occurs when when the co-processor encounters *any* floating-point instruction except for a double store of register %fr0. The latter cancels all pending traps. Let's fix this by clearing the Trap (T) bit in the FP status register before returning to the signal handler in userspace. The issue can be reproduced with this test program: root@parisc:~# cat fpe.c static void fpe_func(int sig, siginfo_t *i, void *v) { sigset_t set; sigemptyset(&set); sigaddset(&set, SIGFPE); sigprocmask(SIG_UNBLOCK, &set, NULL); printf("GOT signal %d with si_code %ld\n", sig, i->si_code); } int main() { struct sigaction action = { .sa_sigaction = fpe_func, .sa_flags = SA_RESTART|SA_SIGINFO }; sigaction(SIGFPE, &action, 0); feenableexcept(FE_OVERFLOW); return printf("%lf\n",1.7976931348623158E308*1.7976931348623158E308); } root@parisc:~# gcc fpe.c -lm root@parisc:~# ./a.out Floating point exception root@parisc:~# strace -f ./a.out execve("./a.out", ["./a.out"], 0xf9ac7034 /* 20 vars */) = 0 getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0 ... rt_sigaction(SIGFPE, {sa_handler=0x1110a, sa_mask=[], sa_flags=SA_RESTART|SA_SIGINFO}, NULL, 8) = 0 --- SIGFPE {si_signo=SIGFPE, si_code=FPE_FLTOVF, si_addr=0x1078f} --- --- SIGFPE {si_signo=SIGFPE, si_code=FPE_FLTOVF, si_addr=0xf8f21237} --- +++ killed by SIGFPE +++ Floating point exception Signed-off-by: Helge Deller <deller@gmx.de> Suggested-by: John David Anglin <dave.anglin@bell.net> Reported-by: Camm Maguire <camm@maguirefamily.org> Cc: stable@vger.kernel.org
2025-05-04x86/boot/sev: Support memory acceptance in the EFI stub under SVSMArd Biesheuvel3-4/+43
Commit: d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") provided a fix for SEV-SNP memory acceptance from the EFI stub when running at VMPL #0. However, that fix was insufficient for SVSM SEV-SNP guests running at VMPL >0, as those rely on a SVSM calling area, which is a shared buffer whose address is programmed into a SEV-SNP MSR, and the SEV init code that sets up this calling area executes much later during the boot. Given that booting via the EFI stub at VMPL >0 implies that the firmware has configured this calling area already, reuse it for performing memory acceptance in the EFI stub. Fixes: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250428174322.2780170-2-ardb+git@google.com
2025-05-02irqchip/qcom-mpm: Prevent crash when trying to handle non-wake GPIOsStephan Gerhold1-0/+3
On Qualcomm chipsets not all GPIOs are wakeup capable. Those GPIOs do not have a corresponding MPM pin and should not be handled inside the MPM driver. The IRQ domain hierarchy is always applied, so it's required to explicitly disconnect the hierarchy for those. The pinctrl-msm driver marks these with GPIO_NO_WAKE_IRQ. qcom-pdc has a check for this, but irq-qcom-mpm is currently missing the check. This is causing crashes when setting up interrupts for non-wake GPIOs: root@rb1:~# gpiomon -c gpiochip1 10 irq: IRQ159: trimming hierarchy from :soc@0:interrupt-controller@f200000-1 Unable to handle kernel paging request at virtual address ffff8000a1dc3820 Hardware name: Qualcomm Technologies, Inc. Robotics RB1 (DT) pc : mpm_set_type+0x80/0xcc lr : mpm_set_type+0x5c/0xcc Call trace: mpm_set_type+0x80/0xcc (P) qcom_mpm_set_type+0x64/0x158 irq_chip_set_type_parent+0x20/0x38 msm_gpio_irq_set_type+0x50/0x530 __irq_set_trigger+0x60/0x184 __setup_irq+0x304/0x6bc request_threaded_irq+0xc8/0x19c edge_detector_setup+0x260/0x364 linereq_create+0x420/0x5a8 gpio_ioctl+0x2d4/0x6c0 Fix this by copying the check for GPIO_NO_WAKE_IRQ from qcom-pdc.c, so that MPM is removed entirely from the hierarchy for non-wake GPIOs. Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Signed-off-by: Stephan Gerhold <stephan.gerhold@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Alexey Klimov <alexey.klimov@linaro.org> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20250502-irq-qcom-mpm-fix-no-wake-v1-1-8a1eafcd28d4@linaro.org
2025-05-02perf symbol-minimal: Fix double free in filename__read_build_idIan Rogers1-16/+18
Running the "perf script task-analyzer tests" with address sanitizer showed a double free: ``` FAIL: "test_csv_extended_times" Error message: "Failed to find required string:'Out-Out;'." ================================================================= ==19190==ERROR: AddressSanitizer: attempting double-free on 0x50b000017b10 in thread T0: #0 0x55da9601c78a in free (perf+0x26078a) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da96640c63 in filename__read_build_id tools/perf/util/symbol-minimal.c:221:2 0x50b000017b10 is located 0 bytes inside of 112-byte region [0x50b000017b10,0x50b000017b80) freed by thread T0 here: #0 0x55da9601ce40 in realloc (perf+0x260e40) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da96640ad6 in filename__read_build_id tools/perf/util/symbol-minimal.c:204:10 previously allocated by thread T0 here: #0 0x55da9601ca23 in malloc (perf+0x260a23) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) #1 0x55da966407e7 in filename__read_build_id tools/perf/util/symbol-minimal.c:181:9 SUMMARY: AddressSanitizer: double-free (perf+0x26078a) (BuildId: e7ef50e08970f017a96fde6101c5e2491acc674a) in free ==19190==ABORTING FAIL: "invocation of perf script report task-analyzer --csv-summary csvsummary --summary-extended command failed" Error message: "" FAIL: "test_csvsummary_extended" Error message: "Failed to find required string:'Out-Out;'." ---- end(-1) ---- 132: perf script task-analyzer tests : FAILED! ``` The buf_size if always set to phdr->p_filesz, but that may be 0 causing a free and realloc to return NULL. This is treated in filename__read_build_id like a failure and the buffer is freed again. To avoid this problem only grow buf, meaning the buf_size will never be 0. This also reduces the number of memory (re)allocations. Fixes: b691f64360ecec49 ("perf symbols: Implement poor man's ELF parser") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250501070003.22251-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'dtlb' output fieldNamhyung Kim6-0/+46
This is a breakdown of perf_mem_data_src.mem_dtlb values. It assumes PMU drivers would set PERF_MEM_TLB_HIT bit with an appropriate level. And having PERF_MEM_TLB_MISS means that it failed to find one in any levels of TLB. For now, it doesn't use PERF_MEM_TLB_{WK,OS} bits. Also it seems Intel machines don't distinguish L1 or L2 precisely. So I added ANY_HIT (printed as "L?-Hit") to handle the case. $ perf mem report -F overhead,dtlb,dso --stdio ... # --- D-TLB ---- # Overhead L?-Hit Miss Shared Object # ........ .............. ................. # 67.03% 99.5% 0.5% [unknown] 31.23% 99.2% 0.8% [kernel.kallsyms] 1.08% 97.8% 2.2% [i915] 0.36% 100.0% 0.0% [JIT] tid 6853 0.12% 100.0% 0.0% [drm] 0.05% 100.0% 0.0% [drm_kms_helper] 0.05% 100.0% 0.0% [ext4] 0.02% 100.0% 0.0% [aesni_intel] 0.02% 100.0% 0.0% [crc32c_intel] 0.02% 100.0% 0.0% [dm_crypt] ... Committer testing: # perf report --header | grep cpudesc # cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor # perf mem report -F overhead,dtlb,dso --stdio | head -20 # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 2K of event 'cycles:P' # Total weight : 2637 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc # # ---------- D-TLB ----------- # Overhead L1-Hit L2-Hit Miss Other Shared Object # ........ ............................ ................................. # 77.47% 18.4% 0.1% 0.6% 80.9% [kernel.kallsyms] 5.61% 36.5% 0.7% 1.4% 61.5% libxul.so 2.77% 39.7% 0.0% 12.3% 47.9% libc.so.6 2.01% 34.0% 1.9% 1.9% 62.3% libglib-2.0.so.0.8400.1 1.93% 31.4% 2.0% 2.0% 64.7% [amdgpu] 1.63% 48.8% 0.0% 0.0% 51.2% [JIT] tid 60168 1.14% 3.3% 0.0% 0.0% 96.7% [vdso] # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-12-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'snoop' output fieldNamhyung Kim6-0/+41
This is a breakdown of perf_mem_data_src.mem_snoop values. For now, it doesn't use mem_snoopx values like FWD and PEER. $ perf mem report -F overhead,snoop,comm --stdio ... # ---------- Snoop ----------- # Overhead Hit HitM Miss Other Command # ........ ............................ ............... # 34.24% 0.6% 0.0% 0.0% 99.4% gnome-shell 12.02% 1.0% 0.0% 0.0% 99.0% chrome 9.32% 1.0% 0.0% 0.3% 98.7% Isolated Web Co 6.85% 1.0% 0.3% 0.0% 98.6% swapper 6.30% 0.8% 0.8% 0.0% 98.5% Xorg 3.02% 2.4% 0.0% 0.0% 97.6% VizCompositorTh 2.35% 0.0% 0.0% 0.0% 100.0% firefox-esr 2.04% 0.0% 0.0% 0.0% 100.0% JS Helper 1.51% 3.2% 0.0% 0.0% 96.8% threaded-ml 1.44% 0.0% 0.0% 0.0% 100.0% AudioIP~allback ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-11-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'cache' and 'memory' output fieldsNamhyung Kim6-2/+113
This is a breakdown of perf_mem_data_src.mem_lvl_num. But it's also divided into two parts because the combination is bigger than 8. Since there are many entries for different cache levels, 'cache' field focuses on them. I generalized buffers like LFB, MAB and MHB to L1-buf and L2-buf. The rest goes to 'memory' field which can be RAM, CXL, PMEM, IO, etc. $ perf mem report -F cache,mem,dso --stdio ... # # -------------- Cache -------------- --- Memory --- # L1 L2 L3 L1-buf Other RAM Other Shared Object # ................................... .............. .................................... # 53.9% 3.6% 16.2% 21.6% 4.8% 4.8% 95.2% [kernel.kallsyms] 64.7% 1.7% 3.5% 17.4% 12.8% 12.8% 87.2% chrome (deleted) 78.3% 2.8% 0.0% 1.0% 17.9% 17.9% 82.1% libc.so.6 39.6% 1.5% 0.0% 5.7% 53.2% 53.2% 46.8% libxul.so 26.2% 0.0% 0.0% 0.0% 73.8% 73.8% 26.2% [unknown] 85.5% 0.0% 0.0% 14.5% 0.0% 0.0% 100.0% libspa-audioconvert.so 66.3% 4.4% 0.0% 29.4% 0.0% 0.0% 100.0% libglib-2.0.so.0.8200.1 (deleted) 1.9% 0.0% 0.0% 0.0% 98.1% 98.1% 1.9% libmutter-cogl-15.so.0.0.0 (deleted) 10.6% 0.0% 0.0% 89.4% 0.0% 0.0% 100.0% libpulsecommon-16.1.so 0.0% 0.0% 0.0% 100.0% 0.0% 0.0% 100.0% libfreeblpriv3.so (deleted) ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-10-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Hide unused mem stat columnsNamhyung Kim3-2/+36
Some mem_stat types don't use all 8 columns. And there are cases only samples in certain kinds of mem_stat types are available only. For that case hide columns which has no samples. The new output for the previous data would be: $ perf mem report -F overhead,op,comm --stdio ... # ------ Mem Op ------- # Overhead Load Store Other Command # ........ ..................... ............... # 44.85% 21.1% 30.7% 48.3% swapper 26.82% 98.8% 0.3% 0.9% netsli-prober 7.19% 51.7% 13.7% 34.6% perf 5.81% 89.7% 2.2% 8.1% qemu-system-ppc 4.77% 100.0% 0.0% 0.0% notifications_c 1.77% 95.9% 1.2% 3.0% MemoryReleaser 0.77% 71.6% 4.1% 24.3% DefaultEventMan 0.19% 66.7% 22.2% 11.1% gnome-shell ... On Intel machines, the event is only for loads or stores so it'll have only one column: # Mem Op # Overhead Load Command # ........ ....... ............... # 20.55% 100.0% swapper 17.13% 100.0% chrome 9.02% 100.0% data-loop.0 6.26% 100.0% pipewire-pulse 5.63% 100.0% threaded-ml 5.47% 100.0% GraphRunner 5.37% 100.0% AudioIP~allback 5.30% 100.0% Chrome_ChildIOT 3.17% 100.0% Isolated Web Co ... Committer testing: # grep "model name" -m1 /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processo # perf mem report -F overhead,op,comm --stdio # Total Lost Samples: 0 # # Samples: 2K of event 'cycles:P' # Total weight : 2637 # Sort order : local_weight,mem,sym,dso,symbol_daddr,dso_daddr,snoop,tlb,locked,blocked,local_ins_lat,local_p_stage_cyc # # ------ Mem Op ------- # Overhead Load Store Other Command # ........ ..................... ............... # 61.02% 14.4% 25.5% 60.1% swapper 5.61% 26.4% 13.5% 60.1% Isolated Web Co 5.50% 21.4% 29.7% 49.0% perf 4.74% 27.2% 15.2% 57.6% gnome-shell 4.63% 33.6% 11.5% 54.9% mdns_service 4.29% 28.3% 12.4% 59.3% ptyxis 2.16% 24.6% 19.3% 56.1% DOM Worker 0.99% 23.1% 34.6% 42.3% firefox 0.72% 26.3% 15.8% 57.9% IPC I/O Parent 0.61% 12.5% 12.5% 75.0% kworker/u130:20 0.61% 37.5% 18.8% 43.8% podman 0.57% 33.3% 6.7% 60.0% Timer 0.53% 14.3% 7.1% 78.6% KMS thread 0.49% 30.8% 7.7% 61.5% kworker/u130:3- 0.46% 41.7% 33.3% 25.0% IPDL Background Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-9-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf mem: Add 'op' output fieldNamhyung Kim6-17/+61
This is an actual example of the he_mem_stat based sample breakdown. It uses 'mem_op' field of union perf_mem_data_src which means memory operations. It'd have basically 'load' or 'store' which can be useful if PMU doesn't have separate events for them like IBS or SPE. In addition, there's an entry in case load and store happen at the same time. Also adds entries for prefetching and execution. $ perf mem report -F +op -s comm --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 4K of event 'ibs_op//' # Total weight : 9559 # Sort order : comm # # --------------------- Mem Op ---------------------- # Overhead Samples Load Store Ld+St Pfetch Exec Other N/A N/A Command # ........ ....... ................................................... ............... # 44.85% 4077 21.1% 30.7% 0.0% 0.0% 0.0% 48.3% 0.0% 0.0% swapper 26.82% 45 98.8% 0.3% 0.0% 0.0% 0.0% 0.9% 0.0% 0.0% netsli-prober 7.19% 442 51.7% 13.7% 0.0% 0.0% 0.0% 34.6% 0.0% 0.0% perf 5.81% 75 89.7% 2.2% 0.0% 0.0% 0.0% 8.1% 0.0% 0.0% qemu-system-ppc 4.77% 1 100.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% notifications_c 1.77% 10 95.9% 1.2% 0.0% 0.0% 0.0% 3.0% 0.0% 0.0% MemoryReleaser 0.77% 32 71.6% 4.1% 0.0% 0.0% 0.0% 24.3% 0.0% 0.0% DefaultEventMan 0.19% 10 66.7% 22.2% 0.0% 0.0% 0.0% 11.1% 0.0% 0.0% gnome-shell Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-8-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Implement output fields for mem statsNamhyung Kim6-12/+202
This is a preparation for later changes to support mem_stat output. The new fields will need two lines for the header - the first line will show type of mem stat and the second line will show the name of each item which is returned by mem_stat_name(). Each element in the mem_stat array will be printed in percentage for the hist_entry and their sum would be 100%. Add new output field dimension only for SORT_MODE__MEM using mem_stat. To handle possible name conflict with existing sort keys, move the order of checking output field dimensions after the sort dimensions when it looks for sort keys. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Basic support for mem_stat accountingNamhyung Kim6-2/+75
Add a logic to account he->mem_stat based on mem_stat_type in hists. Each mem_stat entry will have different meaning based on the type so the index in the array is calculated at runtime using the corresponding value in the sample.data_src. Still hists has no mem_stat_types yet so this code won't work for now. Later hists->mem_stat_types will be allocated based on what users want in the output actually. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Add struct he_mem_statNamhyung Kim2-0/+83
The 'struct he_mem_stat' is to save detailed information about memory instruction. It'll be used to show breakdown of various data from PERF_SAMPLE_DATA_SRC. Note that this structure is generic and the contents will be different depending on actual data it'll use later. The information about the actual data will be saved in 'struct hists' and its length is in nr_mem_stats. This commit just adds ground works and does nothing since hists->nr_mem_stats is 0 for now. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Support multi-line headerNamhyung Kim4-34/+64
This is a preparation to support multi-line headers in 'perf mem report'. Normal sort keys and output fields that don't have contents for multi- line will print the header string at the last line only. As we don't use multi-line headers normally, it should not have any changes in the output. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf record: Add --sample-mem-info optionNamhyung Kim4-2/+14
There's no way to enable PERF_SAMPLE_DATA_SRC without PERF_SAMPLE_ADDR which brings a lot of overhead due to the number of MMAP[2] records. Let's add a new option to enable this information separately. Committer testing: # perf record -a --sample-mem-info ^C[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 1.815 MB perf.data (2637 samples) ] # # perf evlist -v cycles:P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0 (PERF_COUNT_HW_CPU_CYCLES), { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER|DATA_SRC, read_format: ID|LOST, disabled: 1, freq: 1, precise_ip: 2, sample_id_all: 1 dummy:u: type: 1 (PERF_TYPE_SOFTWARE), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|IDENTIFIER|DATA_SRC, read_format: ID|LOST, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 # # perf report -D |& grep -w PERF_RECORD_SAMPLE -A3 -m1 0 44675164447282 0x1a7590 [0x40]: PERF_RECORD_SAMPLE(IP, 0x4001): 107299/107299: 0xffffffffac4a5e11 period: 144 addr: 0 . data_src: 0x229080142 ... thread: perf:107299 ...... dso: /lib/modules/6.15.0-rc4+/build/vmlinux # Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Link: https://lore.kernel.org/r/20250430205548.789750-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-02perf hist: Remove output field from sort-list properlyNamhyung Kim1-10/+5
When it removes an output format for cancelled children or latency, it should delete itself from the sort list as well. Otherwise assertion in fmt_free() will fire. $ perf report -H --stdio perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed. Aborted (core dumped) Also convert to perf_hpp__column_unregister() for the same open codes. Committer notes: Before this patch: # perf test hierarchy 83: perf report --hierarchy : FAILED! # perf test -v hierarchy --- start --- test child forked, pid 102242 perf report --hierarchy Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.025 MB /tmp/perf-test-report.HX0N85TlPq/perf-report-hierarchy-perf.data (6 samples) ] perf: ui/hist.c:603: fmt_free: Assertion `!(!list_empty(&fmt->sort_list))' failed. /home/acme/libexec/perf-core/tests/shell/perf-report-hierarchy.sh: line 34: 102250 Aborted (core dumped) perf report --hierarchy > /dev/null --- Cleaning up --- ---- end(-1) ---- 83: perf report --hierarchy : FAILED! # After: # perf test hierarchy 83: perf report --hierarchy : Ok # Fixes: dbd11b6bdab12f60 ("perf hist: Remove formats in hierarchy when cancel children") Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250430180321.736939-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>