Age | Commit message (Collapse) | Author | Files | Lines |
|
install(1), by default, installs with rwxr-xr-x permissions. Modify
perf's Makefile to pass '-m 644' when installing:
* Documentation/tips.txt
* examples/bpf/*
* perf-completion.sh
* perf_dlfilter.h header
* scripts/perl/Perf-Trace-Util/lib/Perf/Trace/*
* scripts/perl/*.pl
* tests/attr/*
* tests/attr.py
* tests/shell/lib/*.sh
* trace/strace/groups/*
All those are supposed to be non-executable. Either they are not scripts
at all, or they don't have shebang.
Signed-off-by: <jslaby@suse.cz>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908060426.9619-1-jslaby@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit b91e5492f9d7ca89 ("perf record: Add a dummy event on hybrid
systems to collect metadata records") adds a dummy event on hybrid
systems to fix the symbol "unknown" issue when the workload is created
in a P-core but runs on an E-core. The added dummy event will cause
"perf script -F iregs" to fail. Dummy events do not have "iregs"
attribute set, so when we do evsel__check_attr, the "iregs" attribute
check will fail, so the issue happened.
The following commit [1] has fixed a similar issue by skipping the attr
check for the dummy event because it does not have any samples anyway. It
works okay for the normal mode, but the issue still happened when running
the test in the pipe mode. In the pipe mode, it calls process_attr() which
still checks the attr for the dummy event. This commit fixed the issue by
skipping the attr check for the dummy event in the API evsel__check_attr,
Otherwise, we have to patch everywhere when evsel__check_attr() is called.
Before:
#./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
Samples for 'dummy:HG' event do not have IREGS attribute set. Cannot print 'iregs' field.
0x120 [0x90]: failed to process type: 64
#
After:
# ./perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call:p -c 1000 --per-thread true 2>/dev/null|./perf script -F iregs |head -5
ABI:2 CX:0x55b8efa87000 DX:0x55b8efa7e000 DI:0xffffba5e625efbb0 R8:0xffff90e51f8ae100
ABI:2 CX:0x7f1dae1e4000 DX:0xd0 DI:0xffff90e18c675ac0 R8:0x71
ABI:2 CX:0xcc0 DX:0x1 DI:0xffff90e199880240 R8:0x0
ABI:2 CX:0xffff90e180dd7500 DX:0xffff90e180dd7500 DI:0xffff90e180043500 R8:0x1
ABI:2 CX:0x50 DX:0xffff90e18c583bd0 DI:0xffff90e1998803c0 R8:0x58
#
[1]https://lore.kernel.org/lkml/20220831124041.219925-1-jolsa@kernel.org/
Fixes: b91e5492f9d7ca89 ("perf record: Add a dummy event on hybrid systems to collect metadata records")
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908070030.3455164-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Before:
# perf lock -h
Usage: perf lock [<options>] {record|report|script|info|contention|contention}
-D, --dump-raw-trace dump raw trace in ASCII
-f, --force don't complain, do it
-i, --input <file> input file name
-v, --verbose be more verbose (show symbol address, etc)
--kallsyms <file>
kallsyms pathname
--vmlinux <file> vmlinux pathname
After:
# perf lock -h
Usage: perf lock [<options>] {record|report|script|info|contention}
-D, --dump-raw-trace dump raw trace in ASCII
-f, --force don't complain, do it
-i, --input <file> input file name
-v, --verbose be more verbose (show symbol address, etc)
--kallsyms <file>
kallsyms pathname
--vmlinux <file> vmlinux pathname
Fixes: 528b9cab3b813a3b ("perf lock: Add 'contention' subcommand")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220908014854.151203-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Avoid compiler warning about format %llu that expects long long unsigned
int but argument has type __u64.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fixes: c3afd6e50fce824f ("perf dlfilter: Add dlfilter-show-cycles")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220905074735.4513-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The offending commit removed mmap_per_thread(), which did not consider
the different set-output rules for per-thread mmaps i.e. in the per-thread
case set-output is used for file descriptors of the same thread not the
same cpu.
This was not immediately noticed because it only happens with
multi-threaded targets and we do not have a test for that yet.
Reinstate mmap_per_thread() expanding it to cover also system-wide per-cpu
events i.e. to continue to allow the mixing of per-thread and per-cpu
mmaps.
Debug messages (with -vv) show the file descriptors that are opened with
sys_perf_event_open. New debug messages are added (needs -vvv) that show
also which file descriptors are mmapped and which are redirected with
set-output.
In the per-cpu case (cpu != -1) file descriptors for the same CPU are
set-output to the first file descriptor for that CPU.
In the per-thread case (cpu == -1) file descriptors for the same thread are
set-output to the first file descriptor for that thread.
Example (process 17489 has 2 threads):
Before (but with new debug prints):
$ perf record --no-bpf-event -vvv --per-thread -p 17489
<SNIP>
sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
<SNIP>
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 6 -> 5
failed to mmap with 22 (Invalid argument)
After:
$ perf record --no-bpf-event -vvv --per-thread -p 17489
<SNIP>
sys_perf_event_open: pid 17489 cpu -1 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 17490 cpu -1 group_fd -1 flags 0x8 = 6
<SNIP>
libperf: mmap_per_thread: nr cpu values (may include -1) 1 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 1: mmapping fd 6
<SNIP>
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (15 samples) ]
Per-cpu example (process 20341 has 2 threads, same as above):
$ perf record --no-bpf-event -vvv -p 20341
<SNIP>
sys_perf_event_open: pid 20341 cpu 0 group_fd -1 flags 0x8 = 5
sys_perf_event_open: pid 20342 cpu 0 group_fd -1 flags 0x8 = 6
sys_perf_event_open: pid 20341 cpu 1 group_fd -1 flags 0x8 = 7
sys_perf_event_open: pid 20342 cpu 1 group_fd -1 flags 0x8 = 8
sys_perf_event_open: pid 20341 cpu 2 group_fd -1 flags 0x8 = 9
sys_perf_event_open: pid 20342 cpu 2 group_fd -1 flags 0x8 = 10
sys_perf_event_open: pid 20341 cpu 3 group_fd -1 flags 0x8 = 11
sys_perf_event_open: pid 20342 cpu 3 group_fd -1 flags 0x8 = 12
sys_perf_event_open: pid 20341 cpu 4 group_fd -1 flags 0x8 = 13
sys_perf_event_open: pid 20342 cpu 4 group_fd -1 flags 0x8 = 14
sys_perf_event_open: pid 20341 cpu 5 group_fd -1 flags 0x8 = 15
sys_perf_event_open: pid 20342 cpu 5 group_fd -1 flags 0x8 = 16
sys_perf_event_open: pid 20341 cpu 6 group_fd -1 flags 0x8 = 17
sys_perf_event_open: pid 20342 cpu 6 group_fd -1 flags 0x8 = 18
sys_perf_event_open: pid 20341 cpu 7 group_fd -1 flags 0x8 = 19
sys_perf_event_open: pid 20342 cpu 7 group_fd -1 flags 0x8 = 20
<SNIP>
libperf: mmap_per_cpu: nr cpu values 8 nr threads 2
libperf: idx 0: mmapping fd 5
libperf: idx 0: set output fd 6 -> 5
libperf: idx 1: mmapping fd 7
libperf: idx 1: set output fd 8 -> 7
libperf: idx 2: mmapping fd 9
libperf: idx 2: set output fd 10 -> 9
libperf: idx 3: mmapping fd 11
libperf: idx 3: set output fd 12 -> 11
libperf: idx 4: mmapping fd 13
libperf: idx 4: set output fd 14 -> 13
libperf: idx 5: mmapping fd 15
libperf: idx 5: set output fd 16 -> 15
libperf: idx 6: mmapping fd 17
libperf: idx 6: set output fd 18 -> 17
libperf: idx 7: mmapping fd 19
libperf: idx 7: set output fd 20 -> 19
<SNIP>
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (17 samples) ]
Fixes: ae4f8ae16a078964 ("libperf evlist: Allow mixing per-thread and per-cpu mmaps")
Reported-by: Tomáš Trnka <trnka@scm.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216441
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220905114209.8389-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Free allocated resources when zalloc() fails for members in c2c_he, to
prevent potential memory leak in c2c_he_zalloc().
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220906032906.21395-4-shangxiaojing@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Switch to the flavored EVP API like in test-libcrypto.c, and remove the
bad gcc #pragma.
Inspired-by: 5b245985a6de5ac1 ("tools build: Switch to new openssl API for test-libcrypto")
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/CABwm_eTnARC1GwMD-JF176k8WXU1Z0+H190mvXn61yr369qt6g@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The cpu mask init code in "record__mmap_cpu_mask_init" function access
"bits" array part of "struct mmap_cpu_mask". The size of this array is
the value from cpu__max_cpu().cpu. This array is used to contain the
cpumask value for each cpu. While setting bit for each cpu, it calls
"set_bit" function which access index in "bits" array.
If we provide a command line option to -C which is greater than the
number of CPU's present in the system, the set_bit could access an array
member which is out-of the array size. This is because currently, there
is no boundary check for the CPU. This will result in seg fault:
<<>>
./perf record -C 12341234 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Segmentation fault (core dumped)
<<>>
Debugging with gdb, points to function flow as below:
<<>>
set_bit
record__mmap_cpu_mask_init
record__init_thread_default_masks
record__init_thread_masks
cmd_record
<<>>
Fix this by adding boundary check for the array.
After the patch:
<<>>
./perf record -C 12341234 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Failed to initialize parallel data streaming masks
<<>>
With this fix, if -C is given a non-exsiting CPU, perf
record will fail with:
<<>>
./perf record -C 50 ls
Failed to initialize parallel data streaming masks
<<>>
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220905141929.7171-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The affinity code in "affinity_set" function access array named
"sched_cpus". The size for this array is allocated in affinity_setup
function which is nothing but value from get_cpu_set_size. This is used
to contain the cpumask value for each cpu.
While setting bit for each cpu, it calls "set_bit" function which access
index in sched_cpus array. If we provide a command-line option to -C
which is more than the number of CPU's present in the system, the
set_bit could access an array member which is out-of the array size.
This is because currently, there is no boundary check for the CPU. This
will result in seg fault:
<<>>
./perf stat -C 12323431 ls
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Segmentation fault (core dumped)
<<>>
Fix this by adding boundary check for the array.
After the fix from powerpc system:
<<>>
./perf stat -C 12323431 ls 1>out
Perf can support 2048 CPUs. Consider raising MAX_NR_CPUS
Performance counter stats for 'CPU(s) 12323431':
<not supported> msec cpu-clock
<not supported> context-switches
<not supported> cpu-migrations
<not supported> page-faults
<not supported> cycles
<not supported> instructions
<not supported> branches
<not supported> branch-misses
0.001192373 seconds time elapsed
<<>>
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220905141929.7171-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In perf/Documentation/perf-stat.txt, for "--td-level" the default "0" means
the max level that the current hardware support.
So we need initialize the stat_config.topdown_level to TOPDOWN_MAX_LEVEL
when “--td-level=0” or no “--td-level” option. Otherwise, for the
hardware with a max level is 2, the 2nd level metrics disappear for raw
events in this case.
The issue cannot be observed for the perf stat default or "--topdown"
options. This commit fixes the raw events issue and removes the
duplicated code for the perf stat default.
Before:
# ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1
Performance counter stats for 'sleep 1':
1.03 msec cpu-clock # 0.001 CPUs utilized
1 context-switches # 966.216 /sec
0 cpu-migrations # 0.000 /sec
60 page-faults # 57.973 K/sec
1,132,112 instructions # 1.41 insn per cycle
803,872 cycles # 0.777 GHz
1,909,120 ref-cycles # 1.845 G/sec
236,634 branches # 228.640 M/sec
6,367 branch-misses # 2.69% of all branches
4,823,232 slots # 4.660 G/sec
1,210,536 topdown-retiring # 25.1% Retiring
699,841 topdown-bad-spec # 14.5% Bad Speculation
1,777,975 topdown-fe-bound # 36.9% Frontend Bound
1,134,878 topdown-be-bound # 23.5% Backend Bound
189,146 topdown-heavy-ops # 182.756 M/sec
662,012 topdown-br-mispredict # 639.647 M/sec
1,097,048 topdown-fetch-lat # 1.060 G/sec
416,121 topdown-mem-bound # 402.063 M/sec
1.002423690 seconds time elapsed
0.002494000 seconds user
0.000000000 seconds sys
After:
# ./perf stat -e "cpu-clock,context-switches,cpu-migrations,page-faults,instructions,cycles,ref-cycles,branches,branch-misses,{slots,topdown-retiring,topdown-bad-spec,topdown-fe-bound,topdown-be-bound,topdown-heavy-ops,topdown-br-mispredict,topdown-fetch-lat,topdown-mem-bound}" sleep 1
Performance counter stats for 'sleep 1':
1.13 msec cpu-clock # 0.001 CPUs utilized
1 context-switches # 882.128 /sec
0 cpu-migrations # 0.000 /sec
61 page-faults # 53.810 K/sec
1,137,612 instructions # 1.29 insn per cycle
881,477 cycles # 0.778 GHz
2,093,496 ref-cycles # 1.847 G/sec
236,356 branches # 208.496 M/sec
7,090 branch-misses # 3.00% of all branches
5,288,862 slots # 4.665 G/sec
1,223,697 topdown-retiring # 23.1% Retiring
767,403 topdown-bad-spec # 14.5% Bad Speculation
2,053,322 topdown-fe-bound # 38.8% Frontend Bound
1,244,438 topdown-be-bound # 23.5% Backend Bound
186,665 topdown-heavy-ops # 3.5% Heavy Operations # 19.6% Light Operations
725,922 topdown-br-mispredict # 13.7% Branch Mispredict # 0.8% Machine Clears
1,327,400 topdown-fetch-lat # 25.1% Fetch Latency # 13.7% Fetch Bandwidth
497,775 topdown-mem-bound # 9.4% Memory Bound # 14.1% Core Bound
1.002701530 seconds time elapsed
0.002744000 seconds user
0.000000000 seconds sys
Fixes: 63e39aa6ae103451 ("perf stat: Support L2 Topdown events")
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220826140057.3289401-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Hongtao Yu reported problem when displaying uregs in perf script
for system wide perf.data:
# perf script -F uregs | head -10
Samples for 'dummy:HG' event do not have UREGS attribute set. Cannot print 'uregs' field.
The problem is the extra dummy event added for system wide,
which does not have proper sample_type setup.
Skipping attr check completely for dummy event as suggested
by Namhyung, because it does not have any samples anyway.
Reported-by: Hongtao Yu <hoy@fb.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220831124041.219925-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Previous behavior is to segfault if there is no CPU PMU table and a
metric is sought. To reproduce compile with NO_JEVENTS=1 then request a
metric, for example, "perf stat -M IPC true".
Committer testing:
Before:
$ make -k NO_JEVENTS=1 BUILD_BPF_SKEL=1 O=/tmp/build/perf-urgent -C tools/perf install-bin
$ perf stat -M IPC true
Segmentation fault (core dumped)
$
After:
$ perf stat -M IPC true
Usage: perf stat [<options>] [<command>]
-M, --metrics <metric/metric group list>
monitor specified metrics or metric groups (separated by ,)
$
Fixes: 00facc760903be66 ("perf jevents: Switch build to use jevents.py")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <rogers.email@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kshipra Bopardikar <kshipra.bopardikar@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220830164846.401143-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
There are two definitions of the is_signed_type() macro: one in
<linux/overflow.h> and a second definition in <linux/trace_events.h>.
As suggested by Linus, move the definition of the is_signed_type() macro
into the <linux/compiler.h> header file. Change the definition of the
is_signed_type() macro to make sure that it does not trigger any sparse
warnings with future versions of sparse for bitwise types.
Link: https://lore.kernel.org/all/CAHk-=whjH6p+qzwUdx5SOVVHjS3WvzJQr6mDUwhEyTf6pJWzaQ@mail.gmail.com/
Link: https://lore.kernel.org/all/CAHk-=wjQGnVfb4jehFR0XyZikdQvCZouE96xR_nnf5kqaM5qqQ@mail.gmail.com/
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
My Bootlin address is preferred from now on.
Link: https://lkml.kernel.org/r/20220826130515.3011951-1-luca.ceresoli@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Atish Patra <atishp@atishpatra.org>
Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:
kernel BUG at include/linux/swapops.h:117!
CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S O L 6.0.0-dbg-DEV #2
RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0
Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6
c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b
48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48
RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282
RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000
RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b
RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000
R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738
R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a
FS: 00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
change_pte_range+0x36e/0x880
change_p4d_range+0x2e8/0x670
change_protection_range+0x14e/0x2c0
mprotect_fixup+0x1ee/0x330
do_mprotect_pkey+0x34c/0x440
__x64_sys_mprotect+0x1d/0x30
It triggers because pfn_swap_entry_to_page() could be called upon e.g. a
genuine swap entry.
Fix it by only calling it when it's a write migration entry where the page*
is used.
[1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VTS+k1w@mail.gmail.com/
Link: https://lkml.kernel.org/r/20220823221138.45602-1-peterx@redhat.com
Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Yu Zhao <yuzhao@google.com>
Tested-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The decompressors may be called while in an atomic section. So move the
kmalloc() out of this path, and into the "page actor" init function.
This fixes a regression introduced by commit
f268eedddf35 ("squashfs: extend "page actor" to handle missing pages")
Link: https://lkml.kernel.org/r/20220822215430.15933-1-phillip@squashfs.org.uk
Fixes: f268eedddf35 ("squashfs: extend "page actor" to handle missing pages")
Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When user tries to create a DAMON context via the DAMON debugfs interface
with a name of an already existing context, the context directory creation
fails but a new context is created and added in the internal data
structure, due to absence of the directory creation success check. As a
result, memory could leak and DAMON cannot be turned on. An example test
case is as below:
# cd /sys/kernel/debug/damon/
# echo "off" > monitor_on
# echo paddr > target_ids
# echo "abc" > mk_context
# echo "abc" > mk_context
# echo $$ > abc/target_ids
# echo "on" > monitor_on <<< fails
Return value of 'debugfs_create_dir()' is expected to be ignored in
general, but this is an exceptional case as DAMON feature is depending
on the debugfs functionality and it has the potential duplicate name
issue. This commit therefore fixes the issue by checking the directory
creation failure and immediately return the error in the case.
Link: https://lkml.kernel.org/r/20220821180853.2400-1-sj@kernel.org
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: Badari Pulavarty <badari.pulavarty@intel.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: <stable@vger.kernel.org> [ 5.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Colin King is working on kernel janitorial fixes in his spare time and
using his Intel email is confusing. Use his gmail account as the default
email address.
Link: https://lkml.kernel.org/r/20220817212753.101109-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are two problems with the current code of memory_intersects:
First, it doesn't check whether the region (begin, end) falls inside the
region (virt, vend), that is (virt < begin && vend > end).
The second problem is if vend is equal to begin, it will return true but
this is wrong since vend (virt + size) is not the last address of the
memory region but (virt + size -1) is. The wrong determination will
trigger the misreporting when the function check_for_illegal_area calls
memory_intersects to check if the dma region intersects with stext region.
The misreporting is as below (stext is at 0x80100000):
WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
Modules linked in:
CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
Hardware name: Xilinx Zynq Platform
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from __warn+0xb0/0x198
__warn from warn_slowpath_fmt+0x80/0xb4
warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
check_for_illegal_area from debug_dma_map_sg+0x94/0x368
debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
__dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
usb_stor_control_thread from kthread+0xf8/0x104
kthread from ret_from_fork+0x14/0x2c
Refactor memory_intersects to fix the two problems above.
Before the 1d7db834a027e ("dma-debug: use memory_intersects()
directly"), memory_intersects is called only by printk_late_init:
printk_late_init -> init_section_intersects ->memory_intersects.
There were few places where memory_intersects was called.
When commit 1d7db834a027e ("dma-debug: use memory_intersects()
directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA
subsystem uses it to check for an illegal area and the calltrace above
is triggered.
[akpm@linux-foundation.org: fix nearby comment typo]
Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com
Fixes: 979559362516 ("asm/sections: add helpers to check for section data")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thierry Reding <treding@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The vmemmap pages is marked by kmemleak when allocated from memblock.
Remove it from kmemleak when freeing the page. Otherwise, when we reuse
the page, kmemleak may report such an error and then stop working.
kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing)
kmemleak: Kernel memory leak detector disabled
kmemleak: Object 0xffff98fb6be00000 (size 335544320):
kmemleak: comm "swapper", pid 0, jiffies 4294892296
kmemleak: min_count = 0
kmemleak: count = 0
kmemleak: flags = 0x1
kmemleak: checksum = 0
kmemleak: backtrace:
Link: https://lkml.kernel.org/r/20220819094005.2928241-1-liushixin2@huawei.com
Fixes: f41f2ed43ca5 (mm: hugetlb: free the vmemmap pages associated with each HugeTLB page)
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After commit 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job
before return error"), any procedure after ocfs2_dlm_init() fails will
trigger crash when calling ocfs2_dlm_shutdown().
ie: On local mount mode, no dlm resource is initialized. If
ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling will call
ocfs2_dlm_shutdown(), then does dlm resource cleanup job, which will
trigger kernel crash.
This solution should bypass uninitialized resources in
ocfs2_dlm_shutdown().
Link: https://lkml.kernel.org/r/20220815085754.20417-1-heming.zhao@suse.com
Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f.
Recently we started running the kernel with rstat infrastructure on
production traffic and begin to see negative memcg stats values.
Particularly the 'sock' stat is the one which we observed having negative
value.
$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 18446744073708724224
Re-run after couple of seconds
$ grep "sock " /mnt/memory/job/memory.stat
sock 253952
total_sock 53248
For now we are only seeing this issue on large machines (256 CPUs) and
only with 'sock' stat. I think the networking stack increase the stat on
one cpu and decrease it on another cpu much more often. So, this negative
sock is due to rstat flusher flushing the stats on the CPU that has seen
the decrement of sock but missed the CPU that has increments. A typical
race condition.
For easy stable backport, revert is the most simple solution. For long
term solution, I am thinking of two directions. First is just reduce the
race window by optimizing the rstat flusher. Second is if the reader sees
a negative stat value, force flush and restart the stat collection.
Basically retry but limited.
Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
Fixes: 96e51ccf1af33e8 ("memcg: cleanup racy sum avoidance code")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org> [5.15]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
zsmalloc() now returns ERR_PTR values as handles, which zram accidentally
can pass to zs_free(). Another bad scenario is when zcomp_compress()
fails - handle has default -ENOMEM value, and zs_free() will try to free
that "pointer value".
Add the missing check and make sure that zs_free() bails out when
ERR_PTR() is passed to it.
Link: https://lkml.kernel.org/r/20220816050906.2583956-1-senozhatsky@chromium.org
Fixes: c7e6f17b52e9 ("zsmalloc: zs_malloc: return ERR_PTR on failure")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Take the mmap_read_lock() when using the VMA in binder_alloc_print_pages()
and when checking for a VMA in binder_alloc_new_buf_locked().
It is worth noting binder_alloc_new_buf_locked() drops the VMA read lock
after it verifies a VMA exists, but may be taken again deeper in the call
stack, if necessary.
Link: https://lkml.kernel.org/r/20220810160209.1630707-1-Liam.Howlett@oracle.com
Fixes: a43cfc87caaf (android: binder: stop saving a pointer to the VMA)
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Reported-by: <syzbot+a7b60a176ec13cafb793@syzkaller.appspotmail.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Tested-by: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Martijn Coenen <maco@android.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@android.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: "Arve Hjønnevåg" <arve@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The below referenced commit makes the same error as 1c563432588d ("mm: fix
is_pinnable_page against a cma page"), re-interpreting the logic to
exclude pinning of the zero page, which breaks device assignment with
vfio.
To avoid further subtle mistakes, split the logic into discrete tests.
[akpm@linux-foundation.org: simplify comment, per John]
Link: https://lkml.kernel.org/r/166015037385.760108.16881097713975517242.stgit@omen
Link: https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen
Fixes: f25cbb7a95a2 ("mm: add zone device coherent type memory support")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Tested-by: Slawomir Laba <slawomirx.laba@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Alex Sierra <alex.sierra@amd.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The rest of the kallsyms symbols are useless without knowing the number of
symbols in the table. In an earlier patch, I somehow dropped the
kallsyms_num_syms symbol, so add it back in.
Link: https://lkml.kernel.org/r/20220808205410.18590-1-stephen.s.brennan@oracle.com
Fixes: 5fd8fea935a1 ("vmcoreinfo: include kallsyms symbols")
Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Both @canonical and @ibm email addresses are invalid now; use my personal
address instead.
Link: https://lkml.kernel.org/r/20220804202207.439427-1-gpiccoli@igalia.com
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When a disk is removed, bdi_unregister gets called to stop further
writeback and wait for associated delayed work to complete. However,
wb_inode_writeback_end() may schedule bandwidth estimation dwork after
this has completed, which can result in the timer attempting to access the
just freed bdi_writeback.
Fix this by checking if the bdi_writeback is alive, similar to when
scheduling writeback work.
Since this requires wb->work_lock, and wb_inode_writeback_end() may get
called from interrupt, switch wb->work_lock to an irqsafe lock.
Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com
Fixes: 45a2966fd641 ("writeback: fix bandwidth estimate for spiky workload")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Michael Stapelberg <stapelberg+linux@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If we allocate a new page, we need to make sure that our folio matches
that new page.
If we do end up in this code path, we store the wrong page in the shmem
inode's page cache, and I would rather imagine that data corruption
ensues.
This will be solved by changing shmem_replace_page() to
shmem_replace_folio(), but this is the minimal fix.
Link: https://lkml.kernel.org/r/20220730042518.1264767-1-willy@infradead.org
Fixes: da08e9b79323 ("mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In MCOPY_ATOMIC_CONTINUE case with a non-shared VMA, pages in the page
cache are installed in the ptes. But hugepage_add_new_anon_rmap is called
for them mistakenly because they're not vm_shared. This will corrupt the
page->mapping used by page cache code.
Link: https://lkml.kernel.org/r/20220712130542.18836-1-linmiaohe@huawei.com
Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Some architectures define their own arch_test_bit and they also need
arch_test_bit_acquire, otherwise they won't compile. We also clean up
the code by using the generic test_bit if that is equivalent to the
arch-specific version.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 8238b4579866 ("wait_on_bit: add an acquire memory barrier")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Capitalize topdown metrics' names to follow the intel SDM.
Before:
# ./perf stat -a sleep 1
Performance counter stats for 'system wide':
228,094.05 msec cpu-clock # 225.026 CPUs utilized
842 context-switches # 3.691 /sec
224 cpu-migrations # 0.982 /sec
70 page-faults # 0.307 /sec
23,164,105 cycles # 0.000 GHz
29,403,446 instructions # 1.27 insn per cycle
5,268,185 branches # 23.097 K/sec
33,239 branch-misses # 0.63% of all branches
136,248,990 slots # 597.337 K/sec
32,976,450 topdown-retiring # 24.2% retiring
4,651,918 topdown-bad-spec # 3.4% bad speculation
26,148,695 topdown-fe-bound # 19.2% frontend bound
72,515,776 topdown-be-bound # 53.2% backend bound
6,008,540 topdown-heavy-ops # 4.4% heavy operations # 19.8% light operations
3,934,049 topdown-br-mispredict # 2.9% branch mispredict # 0.5% machine clears
16,655,439 topdown-fetch-lat # 12.2% fetch latency # 7.0% fetch bandwidth
41,635,972 topdown-mem-bound # 30.5% memory bound # 22.7% Core bound
1.013634593 seconds time elapsed
After:
# ./perf stat -a sleep 1
Performance counter stats for 'system wide':
228,081.94 msec cpu-clock # 225.003 CPUs utilized
824 context-switches # 3.613 /sec
224 cpu-migrations # 0.982 /sec
67 page-faults # 0.294 /sec
22,647,423 cycles # 0.000 GHz
28,870,551 instructions # 1.27 insn per cycle
5,167,099 branches # 22.655 K/sec
32,383 branch-misses # 0.63% of all branches
133,411,074 slots # 584.926 K/sec
32,352,607 topdown-retiring # 24.3% Retiring
4,456,977 topdown-bad-spec # 3.3% Bad Speculation
25,626,487 topdown-fe-bound # 19.2% Frontend Bound
70,955,316 topdown-be-bound # 53.2% Backend Bound
5,834,844 topdown-heavy-ops # 4.4% Heavy Operations # 19.9% Light Operations
3,738,781 topdown-br-mispredict # 2.8% Branch Mispredict # 0.5% Machine Clears
16,286,803 topdown-fetch-lat # 12.2% Fetch Latency # 7.0% Fetch Bandwidth
40,802,069 topdown-mem-bound # 30.6% Memory Bound # 22.6% Core Bound
1.013683125 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220825015458.3252239-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Update the documentation to reflect the kernel changes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220816125612.2042397-2-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
An array of strings is passed to cmd_record but not freed. As
cmd_record modifies the array, add another array as a copy that can be
mutated allowing the original array contents to all be freed.
Detected with -fsanitize=address.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220824145733.409005-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The Intel hybrid description is written in a different style than the
rest of the perf record man page. There were some new command line
options added after it which resulted in very strange section ordering.
Move the hybrid include last.
Also the sub sections in the hybrid document don't fit the record
manpage well (especially since it talks about all kinds of unrelated
commands). I left this for now, but would be better to separate this
properly in the different man pages.
It would be better to use sub sections for the other sections, but these
don't seem to be supported in AsciiDoc?
Some of the examples are still misrendered in the manpage with an
indented troff command, but I don't know how to fix that.
In any case it's now better than before.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220818100127.249401-1-ak@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Breaking a weak group requires multiple passes of an evlist, with
multiple runs this can introduce bugs ultimately leading to
segfaults. Add a test to cover this.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220822213352.75721-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If a weak group is broken then the reset_group flag remains set for
the next run. Having reset_group set means the counter isn't created
and ultimately a segfault.
A simple reproduction of this is:
# perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W
which will be added as a test in the next patch.
Fixes: 4804e0111662d7d8 ("perf stat: Use affinity for opening events")
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220822213352.75721-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes from:
ae3b1da95413614f ("KVM: arm64: Fix compile error due to sign extension")
That doesn't result in any changes in tooling (when built on x86), only
addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/all/YwOMCCc4E79FuvDe@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The previous change to Python autodetection had a small mistake where
the auto value was used to determine the Python binary, rather than the
user supplied value. The Python binary is only used for one part of the
build process, rather than the final linking, so it was producing
correct builds in most scenarios, especially when the auto detected
value matched what the user wanted, or the system only had a valid set
of Pythons.
Change it so that the Python binary path is derived from either the
PYTHON_CONFIG value or PYTHON value, depending on what is specified by
the user. This was the original intention.
This error was spotted in a build failure an odd cross compilation
environment after commit 4c41cb46a732fe82 ("perf python: Prefer
python3") was merged.
Fixes: 630af16eee495f58 ("perf tools: Use Python devtools for version autodetection rather than runtime")
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220728093946.1337642-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Stephen Rothwell reported htmldocs warning when merging net-next:
Documentation/networking/kapi:26: net/core/skbuff.c:780: WARNING: Error in declarator or parameters
Invalid C declaration: Expecting "(" in parameters. [error at 19]
void __fix_address kfree_skb_reason (struct sk_buff *skb, enum skb_drop_reason reason)
-------------------^
Add __fix_address keyword to c_id_attributes array in conf.py to fix
the warning.
Link: https://lore.kernel.org/linux-next/20220825154105.534d78ab@canb.auug.org.au/
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
Existing code was generating bogus counts for the SNB IMC bandwidth counters:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000327813 1,024.03 MiB uncore_imc/data_reads/
1.000327813 20.73 MiB uncore_imc/data_writes/
2.000580153 261,120.00 MiB uncore_imc/data_reads/
2.000580153 23.28 MiB uncore_imc/data_writes/
The problem was introduced by commit:
07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")
Where the read_counter callback was replace to point to the generic
uncore_mmio_read_counter() function.
The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in
MMIO. But uncore_mmio_read_counter() is using a readq() call to read from
MMIO therefore reading 64-bit from MMIO. Although this is okay for the
uncore_perf_event_update() function because it is shifting the value based
on the actual counter width to compute a delta, it is not okay for the
uncore_pmu_event_start() which is simply reading the counter and therefore
priming the event->prev_count with a bogus value which is responsible for
causing bogus deltas in the perf stat command above.
The fix is to reintroduce the custom callback for read_counter for the SNB
IMC PMU and use readl() instead of readq(). With the change the output of
perf stat is back to normal:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000120987 296.94 MiB uncore_imc/data_reads/
1.000120987 138.42 MiB uncore_imc/data_writes/
2.000403144 175.91 MiB uncore_imc/data_reads/
2.000403144 68.50 MiB uncore_imc/data_writes/
Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC")
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com
|
|
fbcon_do_set_font() calls vc_resize() when font size is changed.
However, if if vc_resize() failed, current implementation doesn't
revert changes for font size, and this causes inconsistent state.
syzbot reported unable to handle page fault due to this issue [1].
syzbot's repro uses fault injection which cause failure for memory
allocation, so vc_resize() failed.
This patch fixes this issue by properly revert changes for font
related date when vc_resize() failed.
Link: https://syzkaller.appspot.com/bug?id=3443d3a1fa6d964dd7310a0cb1696d165a3e07c4 [1]
Reported-by: syzbot+a168dbeaaa7778273c1b@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Signed-off-by: Helge Deller <deller@gmx.de>
CC: stable@vger.kernel.org # 5.15+
|
|
There are several places in the kernel where wait_on_bit is not followed
by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read).
On architectures with weak memory ordering, it may happen that memory
accesses that follow wait_on_bit are reordered before wait_on_bit and
they may return invalid data.
Fix this class of bugs by introducing a new function "test_bit_acquire"
that works like test_bit, but has acquire memory ordering semantics.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The success and return_code are needed by the filters. Move
audit_return_fixup() before the filters. This was causing syscall
auditing events to be missed.
Link: https://github.com/linux-audit/audit-kernel/issues/138
Cc: stable@vger.kernel.org
Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls")
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: manual merge required]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
In a situation where memory allocation fails, an invalid buffer address
is stored. When this descriptor is used again, the system panics in the
build_skb() function when accessing memory.
Fixes: 7ea6cd16f159 ("lantiq: net: fix duplicated skb in rx descriptor ring")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the xrx200_hw_receive() function returns -ENOMEM, the NAPI poll
function immediately returns an error.
This is incorrect for two reasons:
* the function terminates without enabling interrupts or scheduling NAPI,
* the error code (-ENOMEM) is returned instead of the number of received
packets.
After the first memory allocation failure occurs, packet reception is
locked due to disabled interrupts from DMA..
Fixes: fe1a56420cf2 ("net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
xrx200_hw_receive() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.
Add a check in case build_skb() failed to allocate and return NULL.
Fixes: e015593573b3 ("net: lantiq_xrx200: convert to build_skb")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This is a follow-up to the discussion in [0]. It seems to me that
at least the IP version used on Amlogic SoC's sometimes has a problem
if register MAC_CTRL_REG is written whilst the chip is still processing
a previous write. But that's just a guess.
Adding a delay between two writes to this register helps, but we can
also simply omit the offending second write. This patch uses the second
approach and is based on a suggestion from Qi Duan.
Benefit of this approach is that we can save few register writes, also
on not affected chip versions.
[0] https://www.spinics.net/lists/netdev/msg831526.html
Fixes: bfab27a146ed ("stmmac: add the experimental PCI support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Suggested-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/e99857ce-bd90-5093-ca8c-8cd480b5a0a2@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Assign a random mac address to the VF interface station
address if it boots with a zero mac address in order to match
similar behavior seen in other VF drivers. Handle the errors
where the older firmware does not allow the VF to set its own
station address.
Newer firmware will allow the VF to set the station mac address
if it hasn't already been set administratively through the PF.
Setting it will also be allowed if the VF has trust.
Fixes: fbb39807e9ae ("ionic: support sr-iov operations")
Signed-off-by: R Mohamed Shah <mohamed@pensando.io>
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|