linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-08-28	perf evsel: Kernel profiling is disallowed only when perf_event_paranoid > 1	Igor Lubashev	1	-1/+1
	Perf was too restrictive about sysctl kernel.perf_event_paranoid. The kernel only disallows profiling when perf_event_paranoid > 1. Make perf do the same. Committer testing: For a non-root user: $ id uid=1000(acme) gid=1000(acme) groups=1000(acme),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 $ Before: We were restricting it to just userspace (:u suffix) even for a workload started by the user: $ perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (8 samples) ] $ perf evlist cycles:u $ perf evlist -v cycles:u: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|PERIOD, read_format: ID, disabled: 1, inherit: 1, exclude_kernel: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 $ perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 8 of event 'cycles:u' # Event count (approx.): 1040396 # # Overhead Command Shared Object Symbol # ........ ....... ................ ...................... # 68.36% sleep libc-2.29.so [.] _dl_addr 27.33% sleep ld-2.29.so [.] dl_main 3.80% sleep ld-2.29.so [.] _dl_setup_hash # # (Tip: Order by the overhead of source file name and line number: perf report -s srcline) # $ $ After: When the kernel allows profiling the kernel in that scenario: $ perf record sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.023 MB perf.data (11 samples) ] $ perf evlist cycles $ perf evlist -v cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP\|TID\|TIME\|PERIOD, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1 $ $ perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # Total Lost Samples: 0 # # Samples: 11 of event 'cycles' # Event count (approx.): 1601964 # # Overhead Command Shared Object Symbol # ........ ....... ................ .......................... # 28.14% sleep [kernel.vmlinux] [k] __rb_erase_color 27.21% sleep [kernel.vmlinux] [k] unmap_page_range 27.20% sleep ld-2.29.so [.] __tunable_get_val 15.24% sleep [kernel.vmlinux] [k] thp_get_unmapped_area 1.96% perf [kernel.vmlinux] [k] perf_event_exec 0.22% perf [kernel.vmlinux] [k] native_sched_clock 0.02% perf [kernel.vmlinux] [k] intel_bts_enable_local 0.00% perf [kernel.vmlinux] [k] native_write_msr # # (Tip: Boolean options have negative forms, e.g.: perf report --no-children) # $ Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Igor Lubashev <ilubashe@akamai.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: James Morris <jmorris@namei.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1566869956-7154-4-git-send-email-ilubashe@akamai.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-28	perf tools: Use CAP_SYS_ADMIN with perf_event_paranoid checks	Igor Lubashev	5	-5/+8
	The kernel is using CAP_SYS_ADMIN instead of euid==0 to override perf_event_paranoid check. Make perf do the same. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> # coresight part Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: James Morris <jmorris@namei.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1566869956-7154-3-git-send-email-ilubashe@akamai.com Signed-off-by: Igor Lubashev <ilubashe@akamai.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-28	perf event: Check ref_reloc_sym before using it	Igor Lubashev	1	-3/+4
	Check for ref_reloc_sym before using it instead of checking symbol_conf.kptr_restrict and relying solely on that check. Reported-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Igor Lubashev <ilubashe@akamai.com> Tested-by: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: James Morris <jmorris@namei.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1566869956-7154-2-git-send-email-ilubashe@akamai.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-28	perf arch powerpc: Sync powerpc syscall.tbl	Naveen N. Rao	1	-27/+119
	Copy over powerpc syscall.tbl to grab changes from the below commits: commit cee3536d24a1 ("powerpc: Wire up clone3 syscall") commit 1a271a68e030 ("arch: mark syscall number 435 reserved for clone3") commit 7615d9e1780e ("arch: wire-up pidfd_open()") commit d8076bdb56af ("uapi: Wire up the mount API syscalls on non-x86 arches [ver #2]") commit 39036cd27273 ("arch: add pidfd and io_uring syscalls everywhere") commit 48166e6ea47d ("y2038: add 64-bit time_t syscalls to all 32-bit architectures") commit d33c577cccd0 ("y2038: rename old time and utime syscalls") commit 00bf25d693e7 ("y2038: use time32 syscall names on 32-bit") commit 8dabe7245bbc ("y2038: syscalls: rename y2038 compat syscalls") commit 0d6040d46817 ("arch: add split IPC system calls where needed") Reported-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20190827071458.19897-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-28	perf/x86/intel: Support PEBS output to PT	Alexander Shishkin	7	-1/+130
	If PEBS declares ability to output its data to Intel PT stream, use the aux_output attribute bit to enable PEBS data output to PT. This requires a PT event to be present and scheduled in the same context. Unlike the DS area, the kernel does not extract PEBS records from the PT stream to generate corresponding records in the perf stream, because that would require real time in-kernel PT decoding, which is not feasible. The PMI, however, can still be used. The output setting is per-CPU, so all PEBS events must be either writing to PT or to the DS area, therefore, in case of conflict, the conflicting event will fail to schedule, allowing the rotation logic to alternate between the PEBS->PT and PEBS->DS events. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kan.liang@linux.intel.com Link: https://lkml.kernel.org/r/20190806084606.4021-3-alexander.shishkin@linux.intel.com
2019-08-28	perf: Allow normal events to output AUX data	Alexander Shishkin	3	-1/+109
	In some cases, ordinary (non-AUX) events can generate data for AUX events. For example, PEBS events can come out as records in the Intel PT stream instead of their usual DS records, if configured to do so. One requirement for such events is to consistently schedule together, to ensure that the data from the "AUX output" events isn't lost while their corresponding AUX event is not scheduled. We use grouping to provide this guarantee: an "AUX output" event can be added to a group where an AUX event is a group leader, and provided that the former supports writing to the latter. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kan.liang@linux.intel.com Link: https://lkml.kernel.org/r/20190806084606.4021-2-alexander.shishkin@linux.intel.com
2019-08-26	perf evsel: Rename perf_missing_features::bpf_event to ::bpf	Arnaldo Carvalho de Melo	2	-6/+5
	No need for that _event suffix, do just like all the other meta events and do away with that. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/n/tip-bvc83f380dva83wlg52yd10t@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf tool: Rename perf_tool::bpf_event to bpf	Arnaldo Carvalho de Melo	8	-30/+29
	No need for that _event suffix, do just like all the other meta event handlers and suppress that suffix. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/n/tip-03spzxtqafbabbbmnm7y4xfx@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf tools: Rename perf_event::bpf_event to perf_event::bpf	Arnaldo Carvalho de Melo	3	-13/+10
	Just like all the other meta events, that extra _event suffix is just redundant, ditch it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/n/tip-505qwpaizq1k0t6pk13v1ibd@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf tools: Rename perf_event::ksymbol_event to perf_event::ksymbol	Arnaldo Carvalho de Melo	4	-13/+13
	Just like all the other meta events, that extra _event suffix is just redundant, ditch it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lkml.kernel.org/n/tip-0q8b2xnfs17q0g523oej75s0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Rename the PERF_RECORD_ structs to have a "perf" suffix	Arnaldo Carvalho de Melo	13	-69/+69
	Even more, to have a "perf_record_" prefix, so that they match the PERF_RECORD_ enum they map to. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-qbabmcz2a0pkzt72liyuz3p8@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_SAMPLE 'struct sample_event' to perf/event.h	Jiri Olsa	4	-10/+10
	Move the PERF_RECORD_SAMPLE event definition to libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u*' versions. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-13-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_BPF_EVENT 'struct bpf_event' to perf/event.h	Jiri Olsa	2	-10/+11
	Move the PERF_RECORD_BPF_EVENT event definition to libperf's event.h. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u*' versions. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-12-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_KSYMBOL 'struct ksymbol_event' to perf/event.h	Jiri Olsa	3	-14/+14
	Move the PERF_RECORD_KSYMBOL event definition into libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u' versions. Perf added 'u' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; / Add and use new PRI_lu64 and PRI_lx64 macros for that. Use extra '_' to ease up the reading and differentiate them from standard PRI64 macros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-11-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_THROTTLE 'struct throttle_event' to perf/event.h	Jiri Olsa	3	-9/+9
	Move the PERF_RECORD_THROTTLE event definition into libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u' versions. Perf added 'u' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; / Add and use new PRI_lu64 and PRI_lx64 macros for that. Use extra '_' to ease up the reading and differentiate them from standard PRI64 macros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_READ 'struct read_event' to perf/event.h	Jiri Olsa	3	-16/+16
	Move the PERF_RECORD_READ event definition to libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u' versions. Perf added 'u' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; / Add and use new PRI_lu64 and PRI_lx64 macros for that. Use extra '_' to ease up the reading and differentiate them from standard PRI64 macros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_LOST_SAMPLES 'struct lost_samples_event' to perf/event.h	Jiri Olsa	3	-6/+6
	Move the PERF_RECORD_LOST_SAMPLES event definition into libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u' versions. Perf added 'u' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; / Add and use new PRI_lu64 and PRI_lx64 macros for that. Use extra '_' to ease up the reading and differentiate them from standard PRI64 macros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_LOST 'struct lost_event' to perf/event.h	Jiri Olsa	6	-11/+11
	Move the lost_event event definition to libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u' versions. Perf added 'u' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; / Add and use new PRI_lu64 and PRI_lx64 macros for that. Use extra '_' to ease up the reading and differentiate them from standard PRI64 macros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_FORK 'struct fork_event' to perf/event.h	Jiri Olsa	3	-8/+8
	Move the fork_event event definition into libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u' versions. Perf added 'u' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; / Add and use new PRI_lu64 and PRI_lx64 macros for that. Using extra '_' to ease up the reading and differentiate them from standard PRI64 macros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_NAMESPACES 'struct namespaces_event' to perf/event.h	Jiri Olsa	2	-7/+7
	Move the namespaces_event event definition into libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u*' versions. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_COMM 'struct comm_event' to perf/event.h	Jiri Olsa	2	-6/+6
	Moving comm_event event definition into libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u*' versions. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_MMAP2 'struct mmap2_event' to perf/event.h	Jiri Olsa	3	-18/+18
	Moving mmap2_event event definition into libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u' versions. Perf added 'u' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; / Adding and using new PRI_lu64 and PRI_lx64 macros to be used for that. Using extra '_' to ease up the reading and differentiate them from standard PRI64 macros. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/n/tip-ufs9ityr5w2xqwtd5w3p6dm4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	libperf: Add PERF_RECORD_MMAP 'struct mmap_event' to perf/event.h	Jiri Olsa	4	-11/+35
	Move the mmap_event event definition to libperf's event.h header include. In order to keep libperf simple, we switch 'u64/u32/u16/u8' types used events to their generic '__u' versions. Perf added 'u' types mainly to ease up printing __u64 values as stated in the linux/types.h comment: /* * We define u64 as uint64_t for every architecture * so that we can print it with "%"PRIx64 without getting warnings. * * typedef __u64 u64; * typedef __s64 s64; / Add and use new PRI_lu64 and PRI_lx64 macros for that. Use extra '_' to ease up reading and differentiate them from standard PRI64 macros. Committer notes: Fixup the PRI_l[ux]64 macros on 32-bit arches, conditionally defining it with that extra 'l' modifier only on arches where __u64 is long long, leaving it aside on 32-bit arches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20190825181752.722-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf script: Fix memory leaks in list_scripts()	Gustavo A. R. Silva	1	-2/+4
	In case memory resources for buf and paths were allocated, jump to out and release them before return. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Gustavo A. R. Silva <gustavo@embeddedor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Addresses-Coverity-ID: 1444328 ("Resource leak") Fixes: 6f3da20e151f ("perf report: Support builtin perf script in scripts menu") Link: http://lkml.kernel.org/r/20190408162748.GA21008@embeddedor Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf report: Fix --ns time sort key output	Andi Kleen	1	-1/+4
	If the user specified --ns, the column to print the sort time stamp wasn't wide enough to actually print the full nanoseconds. Widen the time key column width when --ns is specified. Before: % perf record -a sleep 1 % perf report --sort time,overhead,symbol --stdio --ns ... 2.39% 187851.10000 [k] smp_call_function_single - - 1.53% 187851.10000 [k] intel_idle - - 0.59% 187851.10000 [.] __wcscmp_ifunc - - 0.33% 187851.10000 [.] 0000000000000000 - - 0.28% 187851.10000 [k] cpuidle_enter_state - - After: % perf report --sort time,overhead,symbol --stdio --ns ... 2.39% 187851.100000000 [k] smp_call_function_single - - 1.53% 187851.100000000 [k] intel_idle - - 0.59% 187851.100000000 [.] __wcscmp_ifunc - - 0.33% 187851.100000000 [.] 0000000000000000 - - 0.28% 187851.100000000 [k] cpuidle_enter_state - - Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190823210338.12360-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf report: Use timestamp__scnprintf_nsec() for time sort key	Andi Kleen	1	-8/+2
	Use timestamp__scnprintf_nsec() to print nanoseconds for the time sort key, instead of open coding. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20190823210338.12360-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf tools: Remove duplicate headers	Souptick Joarder	3	-3/+0
	Removed headers which are included twice. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1566663319-4283-1-git-send-email-jrdr.linux@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf augmented_raw_syscalls: Reduce perf_event_output() boilerplate	Arnaldo Carvalho de Melo	1	-12/+12
	Add a augmented__output() helper to reduce the boilerplate of sending the augmented tracepoint to the PERF_EVENT_ARRAY BPF map associated with the bpf-output event used to communicate with the userspace perf trace tool. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-ln99gt0j4fv0kw0778h6vphm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf augmented_raw_syscalls: Introduce helper to get the scratch space	Arnaldo Carvalho de Melo	1	-16/+16
	We need more than the BPF stack can give us to format the raw_syscalls:sys_enter augmented tracepoint, so we use a PERCPU_ARRAY map for that, use a helper to shorten the sequence to access that area. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf augmented_raw_syscalls: Postpone tmp map lookup to after pid_filter	Arnaldo Carvalho de Melo	1	-3/+3
	No sense in doing that lookup before figuring out if it will be used, i.e. if the pid is being filtered that tmp space lookup will be useless. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-o74yggieorucfg4j74tb6rta@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf augmented_raw_syscalls: Rename augmented_filename to augmented_arg	Arnaldo Carvalho de Melo	1	-22/+20
	Because it is not used only for strings, we already use it for sockaddr structs and will use it for all other types. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-w9nkt3tvmyn5i4qnwng3ap1k@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf trace beauty ioctl: Fix off-by-one error in cmd->string table	Benjamin Peterson	1	-1/+1
	While tracing a program that calls isatty(3), I noticed that strace reported TCGETS for the request argument of the underlying ioctl(2) syscall while perf trace reported TCSETS. strace is corrrect. The bug in perf was due to the tty ioctl beauty table starting at 0x5400 rather than 0x5401. Committer testing: Using augmented_raw_syscalls.o and settings to make 'perf trace' use strace formatting, i.e. with this in ~/.perfconfig # cat ~/.perfconfig [trace] add_events = /home/acme/git/linux/tools/perf/examples/bpf/augmented_raw_syscalls.c show_zeros = yes show_duration = no no_inherit = yes show_timestamp = no show_arg_names = no args_alignment = 40 show_prefix = yes # strace -e ioctl stty > /dev/null ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0 ioctl(1, TIOCGWINSZ, 0x7fff8a9b0860) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCGETS, 0x7fff8a9b0540) = -1 ENOTTY (Inappropriate ioctl for device) +++ exited with 0 +++ # Before: # perf trace -e ioctl stty > /dev/null ioctl(0, TCSETS, 0x7fff2cf79f20) = 0 ioctl(1, TIOCSWINSZ, 0x7fff2cf79f40) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCSETS, 0x7fff2cf79c20) = -1 ENOTTY (Inappropriate ioctl for device) # After: # perf trace -e ioctl stty > /dev/null ioctl(0, TCGETS, 0x7ffed0763920) = 0 ioctl(1, TIOCGWINSZ, 0x7ffed0763940) = -1 ENOTTY (Inappropriate ioctl for device) ioctl(1, TCGETS, 0x7ffed0763620) = -1 ENOTTY (Inappropriate ioctl for device) # Signed-off-by: Benjamin Peterson <benjamin@python.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 1cc47f2d46206d67285aea0ca7e8450af571da13 ("perf trace beauty ioctl: Improve 'cmd' beautifier") Link: http://lkml.kernel.org/r/20190823033625.18814-1-benjamin@python.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf tests: Fixes hang in zstd compression test by changing the source of random data	James Clark	1	-1/+1
	Running 'perf test' with zstd compression linked will hang at the test 'Zstd perf.data compression/decompression' because /dev/random blocks reads until there is enough entropy. This means that the test will appear to never complete unless the mouse is continually moved while running it. Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/3d8cc701-df4e-f949-1715-5118b530e990@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf sort: Remove needless headers from sort.h, provide fwd struct decls	Arnaldo Carvalho de Melo	5	-13/+8
	Reducing the includes hell a bit more, speeding up the build and avoiding needless rebuilds when just one of those files gets updated. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-u63el2vqsovsmnhebx1rcixo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf srcline: Add missing srcline.h header to files needing its defs	Arnaldo Carvalho de Melo	6	-0/+9
	When srcline was introduced it wrongly added the include to util/sort.h, even with that header not needing the definitions it provides, fix it by adding it to the places that need it as a pre patch to remove srcline.h from sort.h. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-shuebppedtye8hrgxk15qe3x@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf cacheline: Move cacheline related routines to separate files	Arnaldo Carvalho de Melo	8	-33/+50
	To disentangle util/sort.h a bit more. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-6kbf2cauas06rbqp15pyter5@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf record: Move record_opts and other record decls out of perf.h	Arnaldo Carvalho de Melo	30	-73/+107
	And into a separate util/record.h, to better isolate things and make sure that those who use record_opts and the other moved declarations are explicitly including the necessary header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-31q8mei1qkh74qvkl9nwidfq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf stat: Remove needless headers from stat.h	Arnaldo Carvalho de Melo	1	-3/+2
	Just a forward declaration for 'struct timespec' is needed, ditch the rest. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-6shdqw801oqe7ax6r307k27r@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf cpumap: No need to include perf.h, ditch it	Arnaldo Carvalho de Melo	1	-2/+0
	From a quick look this was never needed and just polluted the build, needlessly making things including cpumap.h to be rebuild if perf.h or anything it includes gets changed. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-x10p8slllqkn3fc3bntjx3n0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26	perf/x86/intel/pt: Get rid of reverse lookup table for ToPA	Alexander Shishkin	2	-73/+131
	In order to quickly find a ToPA entry by its page offset in the buffer, we're using a reverse lookup table. The problem with it is that it's a large array of mostly similar pointers, especially so now that we're using high order allocations from the page allocator. Because its size is limited to whatever is the maximum for kmalloc(), it places a limit on the number of ToPA entries per buffer, and therefore, on the total buffer size, which otherwise doesn't have to be there. Replace the reverse lookup table with a simple runtime lookup. With the high order AUX allocations in place, the runtime penalty of such a lookup is much smaller and in cases where all entries in a ToPA table are of the same size, the complexity is O(1). Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-7-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26	perf/x86/intel/pt: Free up space in a ToPA descriptor	Alexander Shishkin	1	-8/+10
	Currently, we're storing physical address of a ToPA table in its descriptor, which is completely unnecessary. Since the descriptor and the table itself share the same page, reducing the descriptor size leaves more space for the table. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-6-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26	perf/x86/intel/pt: Split ToPA metadata and page layout	Alexander Shishkin	1	-33/+60
	PT uses page sized ToPA tables, where the ToPA table resides at the bottom and its driver-specific metadata taking up a few words at the top of the page. The split is currently calculated manually and needs to be redone every time a field is added to or removed from the metadata structure. Also, the 32-bit version can be made smaller. By splitting the table and metadata into separate structures, we are making the compiler figure out the division of the page. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-5-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26	perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry calculation	Alexander Shishkin	1	-2/+1
	Currently, pt_buffer_reset_offsets() calculates the current ToPA entry by casting pointers to addresses and performing ungainly subtractions and divisions instead of a simpler pointer arithmetic, which would be perfectly applicable in that case. Fix that. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-4-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26	perf/x86/intel/pt: Use helpers to obtain ToPA entry size	Alexander Shishkin	1	-6/+6
	There are a few places in the PT driver that need to obtain the size of a ToPA entry, some of them for the current ToPA entry in the buffer. Use helpers for those, to make the lines shorter and more readable. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-26	perf/x86/intel/pt: Clean up ToPA allocation path	Alexander Shishkin	2	-10/+7
	Some of the allocation parameters are passed as function arguments, while the CPU number for per-cpu allocation is passed via the buffer object. There's no reason for this. Pass the CPU as a function argument instead. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20190821124727.73310-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-08-25	Linux 5.3-rc6	Linus Torvalds	1	-1/+1

2019-08-24	mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y	Andrey Ryabinin	1	-2/+8
	The code like this: ptr = kmalloc(size, GFP_KERNEL); page = virt_to_page(ptr); offset = offset_in_page(ptr); kfree(page_address(page) + offset); may produce false-positive invalid-free reports on the kernel with CONFIG_KASAN_SW_TAGS=y. In the example above we lose the original tag assigned to 'ptr', so kfree() gets the pointer with 0xFF tag. In kfree() we check that 0xFF tag is different from the tag in shadow hence print false report. Instead of just comparing tags, do the following: 1) Check that shadow doesn't contain KASAN_TAG_INVALID. Otherwise it's double-free and it doesn't matter what tag the pointer have. 2) If pointer tag is different from 0xFF, make sure that tag in the shadow is the same as in the pointer. Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: Walter Wu <walter-zh.wu@mediatek.com> Reported-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24	mm/zsmalloc.c: fix race condition in zs_destroy_pool	Henry Burns	1	-2/+59
	In zs_destroy_pool() we call flush_work(&pool->free_work). However, we have no guarantee that migration isn't happening in the background at that time. Since migration can't directly free pages, it relies on free_work being scheduled to free the pages. But there's nothing preventing an in-progress migrate from queuing the work after zs_unregister_migration() has called flush_work(). Which would mean pages still pointing at the inode when we free it. Since we know at destroy time all objects should be free, no new migrations can come in (since zs_page_isolate() fails for fully-free zspages). This means it is sufficient to track a "# isolated zspages" count by class, and have the destroy logic ensure all such pages have drained before proceeding. Keeping that state under the class spinlock keeps the logic straightforward. In this case a memory leak could lead to an eventual crash if compaction hits the leaked page. This crash would only occur if people are changing their zswap backend at runtime (which eventually starts destruction). Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24	mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely	Henry Burns	1	-4/+15
	In zs_page_migrate() we call putback_zspage() after we have finished migrating all pages in this zspage. However, the return value is ignored. If a zs_free() races in between zs_page_isolate() and zs_page_migrate(), freeing the last object in the zspage, putback_zspage() will leave the page in ZS_EMPTY for potentially an unbounded amount of time. To fix this, we need to do the same thing as zs_page_putback() does: schedule free_work to occur. To avoid duplicated code, move the sequence to a new putback_zspage_deferred() function which both zs_page_migrate() and zs_page_putback() call. Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Henry Burns <henryburns@google.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Henry Burns <henrywolfeburns@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Jonathan Adams <jwadams@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-08-24	mm, page_owner: handle THP splits correctly	Vlastimil Babka	1	-0/+4
	THP splitting path is missing the split_page_owner() call that split_page() has. As a result, split THP pages are wrongly reported in the page_owner file as order-9 pages. Furthermore when the former head page is freed, the remaining former tail pages are not listed in the page_owner file at all. This patch fixes that by adding the split_page_owner() call into __split_huge_page(). Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz Fixes: a9627bc5e34e ("mm/page_owner: introduce split_page_owner and replace manual handling") Reported-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>