linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-12-17	bpf: fix regression on verifier pruning wrt map lookups	Daniel Borkmann	1	-0/+28
	Commit 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") introduced a regression where existing programs stopped loading due to reaching the verifier's maximum complexity limit, whereas prior to this commit they were loading just fine; the affected program has roughly 2k instructions. What was found is that state pruning couldn't be performed effectively anymore due to mismatches of the verifier's register state, in particular in the id tracking. It doesn't mean that 57a09bf0a416 is incorrect per se, but rather that verifier needs to perform a lot more work for the same program with regards to involved map lookups. Since commit 57a09bf0a416 is only about tracking registers with type PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers until they are promoted through pattern matching with a NULL check to either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the id becomes irrelevant for the transitioned types. For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(), but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even transferred further into other types that don't make use of it. Among others, one example is where UNKNOWN_VALUE is set on function call return with RET_INTEGER return type. states_equal() will then fall through the memcmp() on register state; note that the second memcmp() uses offsetofend(), so the id is part of that since d2a4dd37f6b4 ("bpf: fix state equivalence"). But the bisect pointed already to 57a09bf0a416, where we really reach beyond complexity limit. What I found was that states_equal() often failed in this case due to id mismatches in spilled regs with registers in type PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform a memcmp() on their reg state and don't have any other optimizations in place, therefore also id was relevant in this case for making a pruning decision. We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE. For the affected program, it resulted in a ~17 fold reduction of complexity and let the program load fine again. Selftest suite also runs fine. The only other place where env->id_gen is used currently is through direct packet access, but for these cases id is long living, thus a different scenario. Also, the current logic in mark_map_regs() is not fully correct when marking NULL branch with UNKNOWN_VALUE. We need to cache the destination reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE, it's id is reset and any subsequent registers that hold the original id and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE anymore, since mark_map_reg() reuses the uncached regs[regno].id that was just overridden. Note, we don't need to cache it outside of mark_map_regs(), since it's called once on this_branch and the other time on other_branch, which are both two independent verifier states. A test case for this is added here, too. Fixes: 57a09bf0a416 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-16	Merge tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux	Linus Torvalds	53	-337/+3307
	Pull powerpc updates from Michael Ellerman: "Highlights include: - Support for the kexec_file_load() syscall, which is a prereq for secure and trusted boot. - Prevent kernel execution of userspace on P9 Radix (similar to SMEP/PXN). - Sort the exception tables at build time, to save time at boot, and store them as relative offsets to save space in the kernel image & memory. - Allow building the kernel with thin archives, which should allow us to build an allyesconfig once some other fixes land. - Build fixes to allow us to correctly rebuild when changing the kernel endian from big to little or vice versa. - Plumbing so that we can avoid doing a full mm TLB flush on P9 Radix. - Initial stack protector support (-fstack-protector). - Support for dumping the radix (aka. Linux) and hash page tables via debugfs. - Fix an oops in cxl coredump generation when cxl_get_fd() is used. - Freescale updates from Scott: "Highlights include 8xx hugepage support, qbman fixes/cleanup, device tree updates, and some misc cleanup." - Many and varied fixes and minor enhancements as always. Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Bartlomiej Zolnierkiewicz, Christophe Jaillet, Christophe Leroy, Denis Kirjanov, Elimar Riesebieter, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Jack Miller, Johan Hovold, Lars-Peter Clausen, Libin, Madhavan Srinivasan, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Pan Xinhui, Peter Senna Tschudin, Rashmica Gupta, Rui Teng, Russell Currey, Scott Wood, Simon Guo, Suraj Jitindar Singh, Thiago Jung Bauermann, Tobias Klauser, Vaibhav Jain" [ And thanks to Michael, who took time off from a new baby to get this pull request done. - Linus ] * tag 'powerpc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (174 commits) powerpc/fsl/dts: add FMan node for t1042d4rdb powerpc/fsl/dts: add sg_2500_aqr105_phy4 alias on t1024rdb powerpc/fsl/dts: add QMan and BMan nodes on t1024 powerpc/fsl/dts: add QMan and BMan nodes on t1023 soc/fsl/qman: test: use DEFINE_SPINLOCK() powerpc/fsl-lbc: use DEFINE_SPINLOCK() powerpc/8xx: Implement support of hugepages powerpc: get hugetlbpage handling more generic powerpc: port 64 bits pgtable_cache to 32 bits powerpc/boot: Request no dynamic linker for boot wrapper soc/fsl/bman: Use resource_size instead of computation soc/fsl/qe: use builtin_platform_driver powerpc/fsl_pmc: use builtin_platform_driver powerpc/83xx/suspend: use builtin_platform_driver powerpc/ftrace: Fix the comments for ftrace_modify_code powerpc/perf: macros for power9 format encoding powerpc/perf: power9 raw event format encoding powerpc/perf: update attribute_group data structure powerpc/perf: factor out the event format field powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown ...
2016-12-15	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds	3	-9/+6
	Pull virtio updates from Michael Tsirkin: "virtio, vhost: new device, fixes, speedups This includes the new virtio crypto device, and fixes all over the place. In particular enabling endian-ness checks for sparse builds found some bugs which this fixes. And it appears that everyone is in agreement that disabling endian-ness sparse checks shouldn't be necessary any longer. So this enables them for everyone, and drops the __CHECK_ENDIAN__ and __bitwise__ APIs. IRQ handling in virtio has been refactored somewhat, the larger switch to IRQ_SHARED will have to wait as it proved too aggressive" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) Makefile: drop -D__CHECK_ENDIAN__ from cflags fs/logfs: drop __CHECK_ENDIAN__ Documentation/sparse: drop __CHECK_ENDIAN__ linux: drop __bitwise__ everywhere checkpatch: replace __bitwise__ with __bitwise Documentation/sparse: drop __bitwise__ tools: enable endian checks for all sparse builds linux/types.h: enable endian checks for all sparse builds virtio_mmio: Set dev.release() to avoid warning vhost: remove unused feature bit virtio_ring: fix description of virtqueue_get_buf vhost/scsi: Remove unused but set variable tools/virtio: use {READ,WRITE}_ONCE() in uaccess.h vringh: kill off ACCESS_ONCE() tools/virtio: fix READ_ONCE() crypto: add virtio-crypto driver vhost: cache used event for better performance vsock: lookup and setup guest_cid inside vhost_vsock_lock virtio_pci: split vp_try_to_find_vqs into INTx and MSI-X variants virtio_pci: merge vp_free_vectors into vp_del_vqs ...
2016-12-15	Merge tag 'linux-kselftest-4.10-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest	Linus Torvalds	24	-0/+1948
	Pull kselftest updates from Shuah Khan: "This update consists of: - new tests to exercise the Sync Kernel Infrastructure. These tests are part of a battery of Android libsync tests and are re-written to test the new sync user-space interfaces from Emilio López, and Gustavo Padovan. - test to run hw-independent mock tests for i915.ko from Chris Wilson - a new gpio test case from Bamvor Jian Zhang - missing gitignore additions" * tag 'linux-kselftest-4.10-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftest/gpio: add gpio test case selftest: sync: improve assert() failure message kselftests: Exercise hw-independent mock tests for i915.ko selftests: add missing gitignore files/dirs selftests: add missing set-tz to timers .gitignore selftest: sync: stress test for merges selftest: sync: stress consumer/producer test selftest: sync: stress test for parallelism selftest: sync: wait tests for sw_sync framework selftest: sync: merge tests for sw_sync framework selftest: sync: fence tests for sw_sync framework selftest: sync: basic tests for sw_sync framework
2016-12-16	tools: enable endian checks for all sparse builds	Michael S. Tsirkin	1	-4/+0
	We dropped need for __CHECK_ENDIAN__ for linux, this mirrors this for tools. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-16	tools/virtio: use {READ,WRITE}_ONCE() in uaccess.h	Mark Rutland	1	-4/+5
	As a step towards killing off ACCESS_ONCE, use {READ,WRITE}_ONCE() for the virtio tools uaccess primitives, pulling these in from <linux/compiler.h>. With this done, we can kill off the now-unused ACCESS_ONCE() definition. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2016-12-16	tools/virtio: fix READ_ONCE()	Mark Rutland	1	-1/+1
	The virtio tools implementation of READ_ONCE() has a single parameter called 'var', but erroneously refers to 'val' for its cast, and thus won't work unless there's a variable of the correct type that happens to be called 'var'. Fix this with s/var/val/, making READ_ONCE() work as expected regardless. Fixes: a7c490333df3cff5 ("tools/virtio: use virt_xxx barriers") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Jason Wang <jasowang@redhat.com>
2016-12-15	Merge tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace	Linus Torvalds	7	-7/+131
	Pull tracing updates from Steven Rostedt: "This release has a few updates: - STM can hook into the function tracer - Function filtering now supports more advance glob matching - Ftrace selftests updates and added tests - Softirq tag in traces now show only softirqs - ARM nop added to non traced locations at compile time - New trace_marker_raw file that allows for binary input - Optimizations to the ring buffer - Removal of kmap in trace_marker - Wakeup and irqsoff tracers now adhere to the set_graph_notrace file - Other various fixes and clean ups" * tag 'trace-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (42 commits) selftests: ftrace: Shift down default message verbosity kprobes/trace: Fix kprobe selftest for newer gcc tracing/kprobes: Add a helper method to return number of probe hits tracing/rb: Init the CPU mask on allocation tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results tracing/fgraph: Have wakeup and irqsoff tracers ignore graph functions too fgraph: Handle a case where a tracer ignores set_graph_notrace tracing: Replace kmap with copy_from_user() in trace_marker writing ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it tracing: Allow benchmark to be enabled at early_initcall() tracing: Have system enable return error if one of the events fail tracing: Do not start benchmark on boot up tracing: Have the reg function allow to fail ring-buffer: Force rb_end_commit() and rb_set_commit_to_write() inline ring-buffer: Froce rb_update_write_stamp() to be inlined ring-buffer: Force inline of hotpath helper functions tracing: Make __buffer_unlock_commit() always_inline tracing: Make tracepoint_printk a static_key ring-buffer: Always inline rb_event_data() ring-buffer: Make rb_reserve_next_event() always inlined ...
2016-12-15	tools lib bpf: Add flags to bpf_create_map()	Joe Stringer	3	-3/+5
	Commit 6c905981743 ("bpf: pre-allocate hash map elements") introduces map_flags to bpf_attr for BPF_MAP_CREATE command. Expose this new parameter in libbpf. By exposing it, users can access flags such as whether or not to preallocate the map. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: http://lkml.kernel.org/r/20161209024620.31660-4-joe@ovn.org [ Added clarifying comment made by Wang Nan ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	tools lib bpf: use __u32 from linux/types.h	Joe Stringer	2	-4/+4
	Fixes the following issue when building without access to 'u32' type: ./tools/lib/bpf/bpf.h:27:23: error: unknown type name ‘u32’ Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: http://lkml.kernel.org/r/20161209024620.31660-3-joe@ovn.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h	Joe Stringer	1	-229/+364
	The tools version of this header is out of date; update it to the latest version from the kernel headers. Signed-off-by: Joe Stringer <joe@ovn.org> Acked-by: Wang Nan <wangnan0@huawei.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Link: http://lkml.kernel.org/r/20161209024620.31660-2-joe@ovn.org [ Sync it harder, after merging with what was in net-next via perf/urgent via torvalds/master to get BPG_PROG_(AT\|DE)TACH, etc ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf annotate: Fix jump target outside of function address range	Ravi Bangoria	3	-9/+15
	If jump target is outside of function range, perf is not handling it correctly. Especially when target address is lesser than function start address, target offset will be negative. But, target address declared to be unsigned, converts negative number into 2's complement. See below example. Here target of 'jumpq' instruction at 34cf8 is 34ac0 which is lesser than function start address(34cf0). 34ac0 - 34cf0 = -0x230 = 0xfffffffffffffdd0 Objdump output: 0000000000034cf0 <__sigaction>: __GI___sigaction(): 34cf0: lea -0x20(%rdi),%eax 34cf3: cmp -bashx1,%eax 34cf6: jbe 34d00 <__sigaction+0x10> 34cf8: jmpq 34ac0 <__GI___libc_sigaction> 34cfd: nopl (%rax) 34d00: mov 0x386161(%rip),%rax # 3bae68 <_DYNAMIC+0x2e8> 34d07: movl -bashx16,%fs:(%rax) 34d0e: mov -bashxffffffff,%eax 34d13: retq perf annotate before applying patch: __GI___sigaction /usr/lib64/libc-2.22.so lea -0x20(%rdi),%eax cmp -bashx1,%eax v jbe 10 v jmpq fffffffffffffdd0 nop 10: mov _DYNAMIC+0x2e8,%rax movl -bashx16,%fs:(%rax) mov -bashxffffffff,%eax retq perf annotate after applying patch: __GI___sigaction /usr/lib64/libc-2.22.so lea -0x20(%rdi),%eax cmp -bashx1,%eax v jbe 10 ^ jmpq 34ac0 <__GI___libc_sigaction> nop 10: mov _DYNAMIC+0x2e8,%rax movl -bashx16,%fs:(%rax) mov -bashxffffffff,%eax retq Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1480953407-7605-3-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf annotate: Support jump instruction with target as second operand	Ravi Bangoria	1	-1/+5
	Architectures like PowerPC have jump instructions that includes a target address as a second operand. For example, 'bne cr7,0xc0000000000f6154'. Add support for such instruction in perf annotate. objdump o/p: c0000000000f6140: ld r9,1032(r31) c0000000000f6144: cmpdi cr7,r9,0 c0000000000f6148: bne cr7,0xc0000000000f6154 c0000000000f614c: ld r9,2312(r30) c0000000000f6150: std r9,1032(r31) c0000000000f6154: ld r9,88(r31) Corresponding perf annotate o/p: Before patch: ld r9,1032(r31) cmpdi cr7,r9,0 v bne 3ffffffffff09f2c ld r9,2312(r30) std r9,1032(r31) 74: ld r9,88(r31) After patch: ld r9,1032(r31) cmpdi cr7,r9,0 v bne 74 ld r9,2312(r30) std r9,1032(r31) 74: ld r9,88(r31) Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chris Riyder <chris.ryder@arm.com> Cc: Kim Phillips <kim.phillips@arm.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1480953407-7605-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf record: Force ignore_missing_thread for uid option	Jiri Olsa	1	-0/+3
	Enable perf_evsel::ignore_missing_thread for -u option to ignore complete failure if any of the user's processes die between its enumeration and time we open the event. Committer notes: While doing a 'make -j4 allmodconfig' we sometimes get into the race: Before: # perf record -u acme Error: The sys_perf_event_open() syscall returned with 3 (No such process) for event (cycles:ppp). /bin/dmesg may provide additional information. No CONFIG_PERF_EVENTS=y kernel support configured? # After: [root@jouet ~]# perf record -u acme WARNING: Ignored open failure for pid 9888 WARNING: Ignored open failure for pid 18059 [root@jouet ~]# Which is an improvement, with the races not preventing the remaining threads for the specified user from being monitored, but the message probably needs further clarification. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1481538943-21874-6-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf evsel: Allow to ignore missing pid	Jiri Olsa	3	-0/+46
	Adding perf_evsel::ignore_missing_cpu_thread bool. When set true, it allows perf to ignore error of missing pid of perf event syscall. We remove missing thread id from the thread_map, so the rest of the processing like ioctl and mmap won't get disturbed with -1 fd. The reason for supporting this is to ease up monitoring group of pids, that 'disappear' before perf opens their event. This currently leads perf to report error and exit and makes perf record's -u option unusable under certain setup. With this change we will allow this race and ignore such failure with following warning: WARNING: Ignored open failure for pid 8605 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161213074622.GA3084@krava Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf thread_map: Add thread_map__remove function	Jiri Olsa	5	-0/+72
	Add thread_map__remove function to remove thread from thread map. Add automated test also. Committer notes: Testing it: # perf test "Remove thread map" 39: Remove thread map : Ok # perf test -v "Remove thread map" 39: Remove thread map : --- start --- test child forked, pid 4483 2 threads: 4482, 4483 1 thread: 4483 0 thread: test child finished with 0 ---- end ---- Remove thread map: Ok # Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1481538943-21874-4-git-send-email-jolsa@kernel.org [ Added stdlib.h, to get the free() declaration ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf evsel: Use variable instead of repeating lengthy FD macro	Jiri Olsa	1	-8/+9
	It's more readable and will ease up following patches. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1481538943-21874-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf mem: Fix --all-user/--all-kernel options	Jiri Olsa	1	-2/+2
	Removing extra '--' prefix. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: ad16511b0e40 ("perf mem: Add -U/-K (--all-user/--all-kernel) options") Link: http://lkml.kernel.org/r/1481538943-21874-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf tools: Remove some needless __maybe_unused	Arnaldo Carvalho de Melo	3	-11/+10
	I.e. those parameters/functions _are_ used, so ditch that misleading attribute. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-13cqtjh0yojg5gzvpq1zzpl0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf sched timehist: Show callchains for idle stat	Namhyung Kim	1	-0/+86
	When --idle-hist option is used with --summary, it now shows idle stats with callchains like below: Idle stats by callchain: CPU 0: 902.195 msec Idle time (msec) Count Callchains ---------------- ------- -------------------------------------------------- 370.589 69 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath 178.799 17 worker_thread <- kthread <- ret_from_fork 128.352 17 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork 125.111 19 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select 71.599 50 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll 23.146 1 rcu_gp_kthread <- kthread <- ret_from_fork 4.510 1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64 0.085 1 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- do_restart_poll ... Committer notes: Extra testing: # uname -a Linux jouet 4.8.8-300.fc25.x86_64 #1 SMP Tue Nov 15 18:10:06 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux 1) Run 'perf sched record -g' 2) Run 'perf sched timehist --idle --summary' <SNIP> Idle stats by callchain: CPU 0: 13456.840 msec Idle time (msec) Count Callchains ---------------- ----- -------------------------------------------------- 5386.637 3283 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_poll 2750.238 2299 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- do_syscall_64 1275.672 1287 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- entry_SYSCALL_64_fastpath 936.322 452 worker_thread <- kthread <- ret_from_fork 741.311 385 rcu_nocb_kthread <- kthread <- ret_from_fork 729.385 248 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_sys_poll <- sys_ppoll 365.386 229 irq_thread <- kthread <- ret_from_fork 338.934 265 futex_wait_queue_me <- futex_wait <- do_futex <- sys_futex <- entry_SYSCALL_64_fastpath 219.488 201 schedule_timeout <- rcu_gp_kthread <- kthread <- ret_from_fork 186.839 410 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- ep_poll <- sys_epoll_wait <- do_syscall_64 142.541 59 kvm_vcpu_block <- kvm_arch_vcpu_ioctl_run <- kvm_vcpu_ioctl <- do_vfs_ioctl <- sys_ioctl 83.887 92 smpboot_thread_fn <- kthread <- ret_from_fork 62.722 96 do_exit <- do_group_exit <- 0x2a5594 <- entry_SYSCALL_64_fastpath 47.894 83 pipe_wait <- pipe_read <- __vfs_read <- vfs_read <- sys_read 46.554 61 rcu_gp_kthread <- kthread <- ret_from_fork 34.337 21 schedule_timeout <- intel_fbc_work_fn <- process_one_work <- worker_thread <- kthread 29.521 14 schedule_hrtimeout_range_clock <- schedule_hrtimeout_range <- poll_schedule_timeout <- do_select <- core_sys_select 20.274 10 schedule_timeout <- io_schedule_timeout <- bit_wait_io <- __wait_on_bit <- out_of_line_wait_on_bit 15.085 55 schedule_timeout <- unix_stream_read_generic <- unix_stream_recvmsg <- sock_recvmsg <- SYSC_recvfrom <SNIP> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf sched timehist: Add -I/--idle-hist option	Namhyung Kim	2	-5/+45
	The --idle-hist option is to analyze system idle state so which process makes cpu to go idle. If this option is specified, non-idle events will be skipped and processes switching to/from idle will be shown. This option is mostly useful when used with --summary(-only) option. In the idle-time summary view, idle time is accounted to previous thread which is run before idle task. The example output looks like following: Idle-time summary comm parent sched-out idle-time min-idle avg-idle max-idle stddev migrations (count) (msec) (msec) (msec) (msec) % -------------------------------------------------------------------------------------------- rcu_preempt[7] 2 95 550.872 0.011 5.798 23.146 7.63 0 migration/1[16] 2 1 15.558 15.558 15.558 15.558 0.00 0 khugepaged[39] 2 1 3.062 3.062 3.062 3.062 0.00 0 kworker/0:1H[124] 2 2 4.728 0.611 2.364 4.116 74.12 0 systemd-journal[167] 1 1 4.510 4.510 4.510 4.510 0.00 0 kworker/u16:3[558] 2 13 74.737 0.080 5.749 12.960 21.96 0 irq/34-iwlwifi[628] 2 21 118.403 0.032 5.638 23.990 24.00 0 kworker/u17:0[673] 2 1 3.523 3.523 3.523 3.523 0.00 0 dbus-daemon[722] 1 1 6.743 6.743 6.743 6.743 0.00 0 ifplugd[741] 1 1 58.826 58.826 58.826 58.826 0.00 0 wpa_supplicant[1490] 1 1 13.302 13.302 13.302 13.302 0.00 0 wpa_actiond[1492] 1 2 4.064 0.168 2.032 3.896 91.72 0 dockerd[1500] 1 1 0.055 0.055 0.055 0.055 0.00 0 ... Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-6-namhyung@kernel.org Link: http://lkml.kernel.org/r/20161213080632.19099-2-namhyung@kernel.org [ Merged fix sent by Namhyumg, as posted in the second Link: tag ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf sched timehist: Skip non-idle events when necessary	Namhyung Kim	1	-7/+18
	Sometimes it only focuses on idle-related events like upcoming idle-hist feature. In this case we don't want to see other event to reduce noise. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf sched timehist: Save callchain when entering idle	Namhyung Kim	1	-0/+30
	In order to investigate the idleness reason, it is necessary to keep the callchains when entering idle. This can be identified by the sched:sched_switch event having the next_pid field as 0. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-4-namhyung@kernel.org Link: http://lkml.kernel.org/r/20161213080632.19099-1-namhyung@kernel.org [ Merged fix from Namhyung, see second Link: tag ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf sched timehist: Introduce struct idle_time_data	Namhyung Kim	1	-4/+33
	The struct idle_time_data is to keep idle stats with callchains entering to the idle task. The normal thread_runtime calculation is done transparently since it extends the struct thread_runtime. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-3-namhyung@kernel.org [ Align struct field names ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf sched timehist: Split is_idle_sample()	Namhyung Kim	1	-19/+20
	The is_idle_sample() function actually does more than determining whether sample come from idle task. Split the callchain part into save_task_callchain() to make it clearer. Also checking prev_pid from trace data looks preferred than just checking sample->pid since it's possible, although rare, to have invalid 0 pid/tid on scheduling an exiting task. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20161208144755.16673-2-namhyung@kernel.org [ Remove some needless () in some return statements ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	perf tools: Move headers check into bash script	Jiri Olsa	2	-93/+60
	To make it nicer and easily maintainable. Also moving the check into fixdep sub make, so its output is not scattered around the build output. Removing extra $$ from mman*.h checks. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1481030331-31944-5-git-send-email-jolsa@kernel.org [ Use /bin/sh, and 'function check() {' -> 'check () {' to make it work with busybox, in Alpine Linux, for instance ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15	redo: radix tree test suite: fix compilation	Matthew Wilcox	2	-29/+1
	[ This resurrects commit 53855d10f456, which was reverted in 2b41226b39b6. It depended on commit d544abd5ff7d ("lib/radix-tree: Convert to hotplug state machine") so now it is correct to apply ] Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test suite as it adds a call to cpuhp_setup_state_nocalls() which is not currently emulated in the test suite. Add it, and delete the emulation of the old CPU hotplug mechanism. Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	24	-302/+842
	Merge more updates from Andrew Morton: - a few misc things - kexec updates - DMA-mapping updates to better support networking DMA operations - IPC updates - various MM changes to improve DAX fault handling - lots of radix-tree changes, mainly to the test suite. All leading up to reimplementing the IDA/IDR code to be a wrapper layer over the radix-tree. However the final trigger-pulling patch is held off for 4.11. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) radix tree test suite: delete unused rcupdate.c radix tree test suite: add new tag check radix-tree: ensure counts are initialised radix tree test suite: cache recently freed objects radix tree test suite: add some more functionality idr: reduce the number of bits per level from 8 to 6 rxrpc: abstract away knowledge of IDR internals tpm: use idr_find(), not idr_find_slowpath() idr: add ida_is_empty radix tree test suite: check multiorder iteration radix-tree: fix replacement for multiorder entries radix-tree: add radix_tree_split_preload() radix-tree: add radix_tree_split radix-tree: add radix_tree_join radix-tree: delete radix_tree_range_tag_if_tagged() radix-tree: delete radix_tree_locate_item() radix-tree: improve multiorder iterators btrfs: fix race in btrfs_free_dummy_fs_info() radix-tree: improve dump output radix-tree: make radix_tree_find_next_bit more useful ...
2016-12-14	radix tree test suite: delete unused rcupdate.c	Matthew Wilcox	1	-86/+0
	This file was used to implement call_rcu() before liburcu implemented that function. It hasn't even been compiled since before the test suite was added to the kernel. Remove it to reduce confusion. Link: http://lkml.kernel.org/r/1481667692-14500-5-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: add new tag check	Matthew Wilcox	1	-0/+3
	We have a check that setting a tag on a single entry at root succeeds, but we were missing a check that clearing a tag on that same entry also succeeds. Link: http://lkml.kernel.org/r/1481667692-14500-4-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix-tree: ensure counts are initialised	Matthew Wilcox	1	-4/+41
	radix_tree_join() was freeing nodes with a non-zero ->exceptional count, and radix_tree_split() wasn't zeroing ->exceptional when it allocated the new node. Fix this by making all callers of radix_tree_node_alloc() pass in the new counts (and some other always-initialised fields), which will prevent the problem recurring if in future we decide to do something similar. Link: http://lkml.kernel.org/r/1481667692-14500-3-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: cache recently freed objects	Matthew Wilcox	2	-12/+41
	The kmem_cache_alloc implementation simply allocates new memory from malloc() and calls the ctor, which zeroes out the entire object. This means it cannot spot bugs where the object isn't properly reinitialised before being freed. Add a small (11 objects) cache before freeing objects back to malloc. This is enough to let us write a test to catch it, although the memory allocator is now aware of the structure of the radix tree node, since it chains free objects through ->private_data (like the percpu cache does). Link: http://lkml.kernel.org/r/1481667692-14500-2-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: add some more functionality	Matthew Wilcox	3	-0/+21
	IDR needs more functionality from the kernel: kmalloc()/kfree(), and xchg(). Link: http://lkml.kernel.org/r/1480369871-5271-67-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: check multiorder iteration	Matthew Wilcox	4	-35/+73
	The random iteration test only inserts order-0 entries currently. Update it to insert entries of order between 7 and 0. Also make the maximum index configurable, make some variables static, make the test duration variable, remove some useless spinning, and add a fifth thread which calls tag_tagged_items(). Link: http://lkml.kernel.org/r/1480369871-5271-62-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix-tree: fix replacement for multiorder entries	Matthew Wilcox	1	-12/+75
	When replacing an entry with NULL, we need to delete any sibling entries. Also account deleting exceptional entries properly. Also fix a bug with radix_tree_iter_replace() where we would fail to remove entirely freed nodes. Also fix accounting bug when switching between normal and exceptional entries with replace_slot. Also add testcases for all these bugs. Link: http://lkml.kernel.org/r/1480369871-5271-61-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix-tree: add radix_tree_split_preload()	Matthew Wilcox	2	-2/+45
	Calculate how many nodes we need to allocate to split an old_order entry into multiple entries, each of size new_order. The test suite checks that we allocated exactly the right number of nodes; neither too many (checked by rtp->nr == 0), nor too few (checked by comparing nr_allocated before and after the call to radix_tree_split()). Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix-tree: add radix_tree_split	Matthew Wilcox	1	-0/+64
	This new function splits a larger multiorder entry into smaller entries (potentially multi-order entries). These entries are initialised to RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't confused. The caller should then call radix_tree_for_each_slot() and radix_tree_replace_slot() in order to turn these retry entries into the intended new entries. Tags are replicated from the original multiorder entry into each new entry. Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix-tree: add radix_tree_join	Matthew Wilcox	1	-0/+58
	This new function allows for the replacement of many smaller entries in the radix tree with one larger multiorder entry. From the point of view of an RCU walker, they may see a mixture of the smaller entries and the large entry during the same walk, but they will never see NULL for an index which was populated before the join. Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix-tree: delete radix_tree_range_tag_if_tagged()	Matthew Wilcox	6	-19/+50
	This is an exceptionally complicated function with just one caller (tag_pages_for_writeback). We devote a large portion of the runtime of the test suite to testing this one function which has one caller. By introducing the new function radix_tree_iter_tag_set(), we can eliminate all of the complexity while keeping the performance. The caller can now use a fairly standard radix_tree_for_each() loop, and it doesn't need to worry about tricksy things like 'start' wrapping. The test suite continues to spend a large amount of time investigating this function, but now it's testing the underlying primitives such as radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator which are also used by other parts of the kernel. Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix-tree: delete radix_tree_locate_item()	Matthew Wilcox	3	-4/+28
	This rather complicated function can be better implemented as an iterator. It has only one caller, so move the functionality to the only place that needs it. Update the test suite to follow the same pattern. Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix-tree: improve multiorder iterators	Matthew Wilcox	4	-19/+30
	This fixes several interlinked problems with the iterators in the presence of multiorder entries. 1. radix_tree_iter_next() would only advance by one slot, which would result in the iterators returning the same entry more than once if there were sibling entries. 2. radix_tree_next_slot() could return an internal pointer instead of a user pointer if a tagged multiorder entry was immediately followed by an entry of lower order. 3. radix_tree_next_slot() expanded to a lot more code than it used to when multiorder support was compiled in. And I wasn't comfortable with entry_to_node() being in a header file. Fixing radix_tree_iter_next() for the presence of sibling entries necessarily involves examining the contents of the radix tree, so we now need to pass 'slot' to radix_tree_iter_next(), and we need to change the calling convention so it is called before dropping the lock which protects the tree. Also rename it to radix_tree_iter_resume(), as some people thought it was necessary to call radix_tree_iter_next() each time around the loop. radix_tree_next_slot() becomes closer to how it looked before multiorder support was introduced. It only checks to see if the next entry in the chunk is a sibling entry or a pointer to a node; this should be rare enough that handling this case out of line is not a performance impact (and such impact is amortised by the fact that the entry we just processed was a multiorder entry). Also, radix_tree_next_slot() used to force a new chunk lookup for untagged entries, which is more expensive than the out of line sibling entry skipping. Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: use common find-bit code	Matthew Wilcox	5	-80/+48
	Remove the old find_next_bit code in favour of linking in the find_bit code from tools/lib. Link: http://lkml.kernel.org/r/1480369871-5271-48-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	tools: add more bitmap functions	Matthew Wilcox	1	-0/+26
	I need the following functions for the radix tree: bitmap_fill bitmap_empty bitmap_full Copy the implementations from include/linux/bitmap.h Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: record order in each item	Matthew Wilcox	3	-14/+23
	This probably doubles the size of each item allocated by the test suite but it lets us check a few more things, and may be needed for upcoming API changes that require the caller pass in the order of the entry. Link: http://lkml.kernel.org/r/1480369871-5271-46-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: handle exceptional entries	Matthew Wilcox	1	-0/+7
	item_kill_tree() assumes that everything in the tree is a pointer to a struct item, which is annoying when testing the behaviour of exceptional entries. Fix it to delete exceptional entries on the assumption they don't need to be freed. Link: http://lkml.kernel.org/r/1480369871-5271-45-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: use rcu_barrier	Matthew Wilcox	2	-2/+15
	Calling rcu_barrier() allows all of the rcu-freed memory to be actually returned to the pool, and allows nr_allocated to return to 0. As well as allowing diffs between runs to be more useful, it also lets us pinpoint leaks more effectively. Link: http://lkml.kernel.org/r/1480369871-5271-44-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: benchmark for iterator	Konstantin Khlebnikov	5	-1/+110
	This adds simple benchmark for iterator similar to one I've used for commit 78c1d78488a3 ("radix-tree: introduce bit-optimized iterator") Building with make BENCHMARK=1 set radix tree order to 6, this allows to get performance comparable to in kernel performance. Link: http://lkml.kernel.org/r/1480369871-5271-43-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: iteration test misuses RCU	Matthew Wilcox	1	-2/+26
	Each thread needs to register itself with RCU, otherwise the reading thread's read lock has no effect and the freeing thread will free the memory in the tree without waiting for the read lock to be dropped. Link: http://lkml.kernel.org/r/1480369871-5271-42-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: make runs more reproducible	Matthew Wilcox	2	-6/+14
	Instead of reseeding the random number generator every time around the loop in big_gang_check(), seed it at the beginning of execution. Use rand_r() and an independent base seed for each thread in iteration_test() so they don't stomp all over each others state. Since this particular test depends on the kernel scheduler, the iteration test can't be reproduced based purely on the random seed, but at least it won't pollute the other tests. Print the seed, and allow the seed to be specified so that a run which hits a problem can be reproduced. Link: http://lkml.kernel.org/r/1480369871-5271-41-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14	radix tree test suite: free preallocated nodes	Matthew Wilcox	2	-0/+4
	It can be a source of mild concern when the test suite shows that we're leaking nodes. While poring over the source code looking for leaks can lead to some fascinating bugs being discovered, sometimes the leak is simply that these nodes were preallocated and are sitting on the per-CPU list. Free them by calling the CPU dead callback. Link: http://lkml.kernel.org/r/1480369871-5271-40-git-send-email-mawilcox@linuxonhyperv.com Signed-off-by: Matthew Wilcox <willy@infradead.org> Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>