Commit log (path: root/tools)
2018-11-15  selftests: Adjust spectrum-2 ctcam_two_atcam_masks_test  (Jiri Pirko; 1 file changed, -1/+1)
In order for this to behave as required with delta bits, change the mask for rule with handle 103. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15  selftests: Adjust spectrum-2 two_mask_test  (Jiri Pirko; 1 file changed, -1/+1)
In order for this to behave as required with delta bits, change the mask for rule with handle 103. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-15  selftests/powerpc: Adjust wild_bctr to build with old binutils  (Gustavo Romero; 1 file changed, -2/+3)
Currently the selftest wild_bctr can fail to build when an old gcc is used, notably on gcc using a binutils version <= 2.27, because the assembler does not support the integer suffix UL. This patch adjusts the wild_bctr test so the REG_POISON value is still treated as an unsigned long for the shifts on compilation but the UL suffix is absent on the stringification, so the inline asm code generated has no UL suffixes. Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> [mpe: Wrap long line] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
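For illustration, a minimal sketch of the idea on a 64-bit build; the macro names below (POISON, STR) are made up for this example, not the selftest's actual identifiers:

#define POISON		0xcafebabe	/* note: no UL suffix on the literal */
#define STR_(x)		#x
#define STR(x)		STR_(x)

/* The shift still happens on an unsigned long thanks to the cast... */
static unsigned long poison64 = ((unsigned long)POISON << 32) | POISON;

/* ...while the stringified form that would end up inside inline asm
 * carries no "UL" for an old assembler to reject: */
static const char *asm_operand = STR(POISON);	/* "0xcafebabe" */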
2018-11-12  perf tools: Fix crash on synthesizing the unit  (Jiri Olsa; 2 files changed, -2/+2)
Adam reported a 'perf record' command crash for a simple session like:

  $ perf record -e cpu-clock ls

with the following backtrace:

  Program received signal SIGSEGV, Segmentation fault.
  3543  ev = event_update_event__new(size + 1, PERF_EVENT_UPDATE__UNIT, evsel->id[0]);
  (gdb) bt
  #0  perf_event__synthesize_event_update_unit
  #1  0x000000000051e469 in perf_event__synthesize_extra_attr
  #2  0x00000000004445cb in record__synthesize
  #3  0x0000000000444bc5 in __cmd_record
  ...

We synthesize an update event that needs to touch the evsel id array, which is not defined at that time. Fix this by forcing the id allocation for events that have their unit defined, and reflect the possible read_format ID bit in the attr tests. Reported-by: Yongxin Liu <yongxin.liu@outlook.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Adam Lee <leeadamrobert@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201477 Fixes: bfd8f72c2778 ("perf record: Synthesize unit/scale/... in event update") Link: http://lkml.kernel.org/r/20181112130012.5424-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-12  selftests: add script to stress-test nft packet path vs. control plane  (Florian Westphal; 4 files changed, -0/+87)
Start flood ping for each cpu while loading/flushing rulesets to make sure we do not access already-free'd rules from nf_tables evaluation loop. Also add this to TARGETS so 'make run_tests' in selftest dir runs it automatically. This would have caught the bug fixed in previous change ("netfilter: nf_tables: do not skip inactive chains during generation update") sooner. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-11-12  selftests/powerpc: Fix wild_bctr test to work on ppc64  (Michael Ellerman; 1 file changed, -1/+15)
The selftest I recently added to test branching to an out-of-bounds NIP doesn't work on 64-bit big endian. It does fail, but not in the right way: it SEGVs trying to load from the opd at BAD_NIP, but it never gets as far as branching to BAD_NIP. To fix it we need to create an opd which is reachable but which holds the bad address. Fixes: b7683fc66eba ("selftests/powerpc: Add a test of wild bctr") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-11-11  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller; 19 files changed, -121/+820)
2018-11-10  selftests/bpf: Test narrow loads with off > 0 for bpf_sock_addr  (Andrey Ignatov; 1 file changed, -4/+24)
Add more test cases for context bpf_sock_addr to test narrow loads with offset > 0 for ctx->user_ip4 field (__u32):

  * off=1, size=1;
  * off=2, size=1;
  * off=3, size=1;
  * off=2, size=2.

Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
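For illustration, a rough BPF C sketch of such an access (the section name, header path and program body are assumptions for this example, not the selftest's actual code):

#include <linux/bpf.h>
#include "bpf_helpers.h"	/* SEC(); header location depends on the tree */

SEC("cgroup/connect4")
int narrow_load_sketch(struct bpf_sock_addr *ctx)
{
	/* 1-byte load at offset 2 inside the 4-byte user_ip4 field:
	 * a narrow context access with off > 0 that the verifier
	 * must rewrite correctly. */
	__u8 b = ((__u8 *)&ctx->user_ip4)[2];

	/* The real tests compare the loaded byte against the expected
	 * part of the address; this sketch only demonstrates the access. */
	return b ? 1 : 1;
}

char _license[] SEC("license") = "GPL";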
2018-11-10  selftests/bpf: Test narrow loads with off > 0 in test_verifier  (Andrey Ignatov; 1 file changed, -10/+38)
Test the following narrow loads in test_verifier for context __sk_buff:

  * off=1, size=1 - ok;
  * off=2, size=1 - ok;
  * off=3, size=1 - ok;
  * off=0, size=2 - ok;
  * off=1, size=2 - fail;
  * off=0, size=2 - ok;
  * off=3, size=2 - fail.

Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10  bpftool: support loading flow dissector  (Stanislav Fomichev; 3 files changed, -51/+74)
This commit adds support for loading/attaching/detaching a flow dissector program. When `bpftool loadall` is called with a flow_dissector prog (i.e. when the 'type flow_dissector' argument is passed), we load and pin all programs. The user is responsible for constructing the jump table for the tail calls. The last argument of `bpftool attach` is made optional for this use case. Example:

  bpftool prog load tools/testing/selftests/bpf/bpf_flow.o \
          /sys/fs/bpf/flow type flow_dissector \
          pinmaps /sys/fs/bpf/flow
  bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
          key 0 0 0 0 \
          value pinned /sys/fs/bpf/flow/IP
  bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
          key 1 0 0 0 \
          value pinned /sys/fs/bpf/flow/IPV6
  bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
          key 2 0 0 0 \
          value pinned /sys/fs/bpf/flow/IPV6OP
  bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
          key 3 0 0 0 \
          value pinned /sys/fs/bpf/flow/IPV6FR
  bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
          key 4 0 0 0 \
          value pinned /sys/fs/bpf/flow/MPLS
  bpftool map update pinned /sys/fs/bpf/flow/jmp_table \
          key 5 0 0 0 \
          value pinned /sys/fs/bpf/flow/VLAN
  bpftool prog attach pinned /sys/fs/bpf/flow/flow_dissector flow_dissector

Tested by using the above lines to load the prog in the test_flow_dissector.sh selftest. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10  bpftool: add pinmaps argument to the load/loadall  (Stanislav Fomichev; 3 files changed, -3/+28)
This new additional argument lets users pin all maps from the object at specified path. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10  bpftool: add loadall command  (Stanislav Fomichev; 5 files changed, -43/+81)
This patch adds a new *loadall* command which slightly differs from the existing *load*. The *load* command loads all programs from the object file but pins only the first program, while *loadall* pins all programs from the object file under the specified directory. The intended use case is flow_dissector, where we want to load a bunch of progs, pin them all, and after that construct a jump table. Signed-off-by: Stanislav Fomichev <sdf@google.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10  libbpf: add internal pin_name  (Stanislav Fomichev; 1 file changed, -3/+26)
pin_name is the same as section_name where '/' is replaced by '_'. bpf_object__pin_programs is converted to use pin_name to avoid the situation where section_name would require creating another subdirectory for a pin (as, for example, when calling bpf_object__pin_programs for programs in sections like "cgroup/connect6"). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
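A minimal sketch of that transformation (an illustrative helper, not libbpf's actual implementation):

#include <stdlib.h>
#include <string.h>

/* "cgroup/connect6" becomes "cgroup_connect6", so the pin path needs
 * no extra subdirectory. */
static char *section_to_pin_name(const char *section_name)
{
	char *name = strdup(section_name);
	char *p;

	if (!name)
		return NULL;
	for (p = name; *p; p++)
		if (*p == '/')
			*p = '_';
	return name;
}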
2018-11-10  libbpf: bpf_program__pin: add special case for instances.nr == 1  (Stanislav Fomichev; 1 file changed, -0/+10)
When bpf_program has only one instance, don't create a subdirectory with per-instance pin files (<prog>/0). Instead, just create a single pin file for that single instance. This simplifies object pinning by not creating unnecessary subdirectories. This can potentially break existing users that depend on the case where '/0' is always created. However, I couldn't find any serious usage of bpf_program__pin inside the kernel tree and I suppose there should be none outside. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-10  libbpf: cleanup after partial failure in bpf_object__pin  (Stanislav Fomichev; 2 files changed, -23/+319)
bpftool will use bpf_object__pin in the next commits to pin all programs and maps from the file; in case of a partial failure, we need to get back to the clean state (undo previous program/map pins). As part of a cleanup, I've added and exported separate routines to pin all maps (bpf_object__pin_maps) and progs (bpf_object__pin_programs) of an object. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
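The general shape of such an unwind-on-error path, as a sketch with hypothetical helpers (pin_one()/unpin_one() and struct item stand in for the real libbpf routines and objects):

struct item;						/* opaque placeholder  */
int pin_one(struct item *it, const char *path);		/* hypothetical helper */
void unpin_one(struct item *it, const char *path);	/* hypothetical helper */

/* Pin items one by one; on failure, unpin everything pinned so far
 * so the caller gets back a clean state. */
static int pin_all(struct item **items, int n, const char *path)
{
	int i, err;

	for (i = 0; i < n; i++) {
		err = pin_one(items[i], path);
		if (err)
			goto err_unpin;
	}
	return 0;

err_unpin:
	while (--i >= 0)
		unpin_one(items[i], path);
	return err;
}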
2018-11-10  selftests/bpf: rename flow dissector section to flow_dissector  (Stanislav Fomichev; 2 files changed, -2/+2)
Makes it compatible with the logic that derives program type from section name in libbpf_prog_type_by_name. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-11-09  kselftests/bpf: use ping6 as the default ipv6 ping binary when it exists  (Li Zhijian; 1 file changed, -1/+4)
Commit deee2cae27d1 ("kselftests/bpf: use ping6 as the default ipv6 ping binary if it exists") fixed a similar issue for the shell script, but missed the same issue in the C code. Fixes: 371e4fcc9d96 ("selftests/bpf: cgroup local storage-based network counters") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> CC: Philip Li <philip.li@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
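A sketch of the idea from the C side (the binary path and ping arguments are illustrative, not the selftest's exact command lines):

#include <stdlib.h>
#include <unistd.h>

/* Prefer a dedicated ping6 binary when it exists, otherwise fall back
 * to "ping -6". */
static int ping6_localhost(void)
{
	if (!access("/usr/bin/ping6", X_OK))
		return system("ping6 -c 1 ::1 > /dev/null");
	return system("ping -6 -c 1 ::1 > /dev/null");
}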
2018-11-09  bpftool: Improve handling of ENOENT on map dumps  (David Ahern; 1 file changed, -4/+14)
bpftool output is not user friendly when dumping a map with only a few populated entries:

  $ bpftool map
  1: devmap  name tx_devmap  flags 0x0
          key 4B  value 4B  max_entries 64  memlock 4096B
  2: array  name tx_idxmap  flags 0x0
          key 4B  value 4B  max_entries 64  memlock 4096B

  $ bpftool map dump id 1
  key: 00 00 00 00 value: No such file or directory
  key: 01 00 00 00 value: No such file or directory
  key: 02 00 00 00 value: No such file or directory
  key: 03 00 00 00 value: 03 00 00 00

Handle ENOENT by keeping the line format sane and dumping "<no entry>" for the value:

  $ bpftool map dump id 1
  key: 00 00 00 00 value: <no entry>
  key: 01 00 00 00 value: <no entry>
  key: 02 00 00 00 value: <no entry>
  key: 03 00 00 00 value: 03 00 00 00
  ...

Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
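A sketch of the dump behaviour described above (simplified; not bpftool's exact code, and the libbpf header path may differ on older trees):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <bpf/bpf.h>	/* bpf_map_lookup_elem() */

static void dump_value(int map_fd, const void *key, void *value,
		       unsigned int value_size)
{
	unsigned int i;

	if (!bpf_map_lookup_elem(map_fd, key, value)) {
		printf("value:");
		for (i = 0; i < value_size; i++)
			printf(" %02x", ((unsigned char *)value)[i]);
		printf("\n");
	} else if (errno == ENOENT) {
		/* keep the line format sane for unpopulated entries */
		printf("value: <no entry>\n");
	} else {
		printf("value: %s\n", strerror(errno));
	}
}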
2018-11-09  selftests/bpf: add a test case for sock_ops perf-event notification  (Sowmini Varadhan; 4 files changed, -1/+303)
This patch provides a tcp_bpf based eBPF sample. The test:

  - uses ncat(1) as the TCP client program to connect() to a port with the intention of triggering SYN retransmissions: we first install an iptables DROP rule to make sure ncat SYNs are resent (instead of aborting instantly after a TCP RST);
  - has a bpf kernel module that sends a perf-event notification for each TCP retransmit, and also tracks the number of such notifications sent in the global_map.

The test passes when the number of event notifications intercepted in user-space matches the value in the global_map. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09  tools: bpftool: update references to other man pages in documentation  (Quentin Monnet; 6 files changed, -7/+42)
Update references to other bpftool man pages at the bottom of each manual page. Also reference the "bpf(2)" and "bpf-helpers(7)" man pages. References are sorted by number of man section, then by "prog-and-map-go-first", the other pages in alphabetical order. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09  tools: bpftool: pass an argument to silence open_obj_pinned()  (Quentin Monnet; 2 files changed, -8/+9)
Function open_obj_pinned() prints error messages when it fails to open a link in the BPF virtual file system. However, on some occasions it is not desirable to print an error, for example when we parse all links under the bpffs root and the error is due to some paths actually being symbolic links. Example output:

  # ls -l /sys/fs/bpf/
  lrwxrwxrwx 1 root root 0 Oct 18 19:00 ip -> /sys/fs/bpf/tc/
  drwx------ 3 root root 0 Oct 18 19:00 tc
  lrwxrwxrwx 1 root root 0 Oct 18 19:00 xdp -> /sys/fs/bpf/tc/

  # bpftool --bpffs prog show
  Error: bpf obj get (/sys/fs/bpf): Permission denied
  Error: bpf obj get (/sys/fs/bpf): Permission denied

  # strace -e bpf bpftool --bpffs prog show
  bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/ip", bpf_fd=0}, 72) = -1 EACCES (Permission denied)
  Error: bpf obj get (/sys/fs/bpf): Permission denied
  bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/xdp", bpf_fd=0}, 72) = -1 EACCES (Permission denied)
  Error: bpf obj get (/sys/fs/bpf): Permission denied
  ...

To fix it, pass a bool as a second argument to the function, and prevent it from printing an error when the argument is set to true. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
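A sketch of the resulting helper shape (simplified; bpftool's real function differs in details and uses its own error printer):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <bpf/bpf.h>	/* bpf_obj_get() */

static int open_obj_pinned(const char *path, bool quiet)
{
	int fd = bpf_obj_get(path);

	/* only report failures when the caller asks for it */
	if (fd < 0 && !quiet)
		fprintf(stderr, "Error: bpf obj get (%s): %s\n",
			path, strerror(errno));
	return fd;
}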
2018-11-09  tools: bpftool: fix plain output and doc for --bpffs option  (Quentin Monnet; 2 files changed, -3/+3)
Edit the documentation of the -f|--bpffs option to make it explicit that it dumps paths of pinned programs when bpftool is used to list the programs only, so that users do not believe they will see the name of the newly pinned program with "bpftool prog pin" or "bpftool prog load". Also fix the plain output: do not add a blank line after each program block, in order to remain consistent with what bpftool does when the option is not passed. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-09  tools: bpftool: prevent infinite loop in get_fdinfo()  (Quentin Monnet; 1 file changed, -1/+1)
Function getline() returns -1 on failure to read a line, thus creating an infinite loop in get_fdinfo() if the key is not found. Fix it by calling the function only as long as we get a strictly positive return value. Found by copying the code for a key which is not always present... Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
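A sketch of the fixed loop shape (the key handling and surrounding code are illustrative, not bpftool's actual get_fdinfo()):

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* getline() returns -1 on EOF or error, so loop only while the return
 * value is strictly positive; otherwise a missing key spins forever. */
static char *find_fdinfo_key(FILE *fdinfo, const char *key)
{
	char *line = NULL, *val = NULL;
	size_t len = 0;

	while (getline(&line, &len, fdinfo) > 0) {
		if (!strncmp(line, key, strlen(key))) {
			val = strdup(line + strlen(key));
			break;
		}
	}
	free(line);
	return val;	/* NULL if the key was never found */
}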
2018-11-09  tools/bpftool: copy a few net uapi headers to tools directory  (Yonghong Song; 2 files changed, -0/+649)
Commit f6f3bac08ff9 ("tools/bpf: bpftool: add net support") added certain networking support to bpftool. The implementation relies on a relatively recent uapi header file linux/tc_act/tc_bpf.h on the host, which contains the macro definition of TCA_ACT_BPF_ID. Unfortunately, this is not the case for all distributions. See the email message below where rhel-7.2 does not have an up-to-date linux/tc_act/tc_bpf.h: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1799211.html Further investigation found that linux/pkt_cls.h is also needed for the macro TCA_BPF_TAG. This patch fixes the issue by copying linux/tc_act/tc_bpf.h and linux/pkt_cls.h from the kernel include/uapi directory to the tools/include/uapi directory, so building bpftool does not depend on the host system for these files. Fixes: f6f3bac08ff9 ("tools/bpf: bpftool: add net support") Reported-by: kernel test robot <rong.a.chen@intel.com> Cc: Li Zhijian <zhijianx.li@intel.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-08  selftests: pmtu: Introduce FoU and GUE PMTU exceptions tests  (Stefano Brivio; 1 file changed, -0/+179)
Introduce eight tests, for FoU and GUE, with IPv4 and IPv6 payload, on IPv4 and IPv6 transport, that check that PMTU exceptions are created with the right value when exceeding the MTU on a link of the path. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08  selftests: pmtu: Introduce tests for IPv4/IPv6 over GENEVE over IPv4/IPv6  (Stefano Brivio; 1 file changed, -31/+86)
Use a router between endpoints, implemented via namespaces, set a low MTU between router and destination endpoint, exceed it and check PMTU value in route exceptions. v2: - Introduce IPv4 tests right away, if iproute2 doesn't support the 'df' link option they will be skipped (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-08  selftests: pmtu: Introduce tests for IPv4/IPv6 over VXLAN over IPv4/IPv6  (Stefano Brivio; 1 file changed, -18/+125)
Use a router between endpoints, implemented via namespaces, set a low MTU between router and destination endpoint, exceed it and check PMTU value in route exceptions. v2: - Change all occurrences of VxLAN to VXLAN (Jiri Benc) - Introduce IPv4 tests right away, if iproute2 doesn't support the 'df' link option they will be skipped (David Ahern) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07  selftests: add functionals test for UDP GRO  (Paolo Abeni; 6 files changed, -23/+282)
Extends the existing udp programs to allow checking for proper GRO aggregation/GSO size, and run the tests via a shell script, using a veth pair with an XDP program attached to trigger the GRO code path.

  rfc v3 -> v1:
   - use ip route to attach the xdp helper to the veth

  rfc v2 -> rfc v3:
   - add missing test program options documentation
   - fix sporadic test failures (receiver faster than sender)

Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07  selftests: add some benchmark for UDP GRO  (Paolo Abeni; 2 files changed, -0/+94)
Run on top of veth pair, using a dummy XDP program to enable the GRO. rfc v3 -> v1: - use ip route to attach the xdp helper to the veth Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07  selftests: add dummy xdp test helper  (Paolo Abeni; 2 files changed, -1/+15)
This trivial XDP program does nothing, but will be used by the next patch to test the GRO path in a net namespace, leveraging the veth XDP implementation. It's added here, despite its 'net' usage, to avoid the duplication of the llc-related makefile boilerplate. rfc v3 -> v1: - move the helper implementation into the bpf directory, don't touch udpgso_bench_rx rfc v2 -> rfc v3: - move 'x' option handling here Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07  selftests: add GRO support to udp bench rx program  (Paolo Abeni; 1 file changed, -7/+30)
And fix a couple of buglets (port option processing, clean termination on SIGINT). This is preparatory work for GRO tests. rfc v2 -> rfc v3: - use ETH_MAX_MTU Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-07  tools: bpftool: adjust rlimit RLIMIT_MEMLOCK when loading programs, maps  (Quentin Monnet; 4 files changed, -0/+14)
The limit for memory locked in the kernel by a process is usually set to 64 kbytes by default. This can be an issue when creating large BPF maps and/or loading many programs. A workaround is to raise this limit for the current process before trying to create a new BPF map. Changing the hard limit requires the CAP_SYS_RESOURCE and can usually only be done by root user (for non-root users, a call to setrlimit fails (and sets errno) and the program simply goes on with its rlimit unchanged). There is no API to get the current amount of memory locked for a user, therefore we cannot raise the limit only when required. One solution, used by bcc, is to try to create the map, and on getting a EPERM error, raising the limit to infinity before giving another try. Another approach, used in iproute2, is to raise the limit in all cases, before trying to create the map. Here we do the same as in iproute2: the rlimit is raised to infinity before trying to load programs or to create maps with bpftool. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
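A minimal sketch of the iproute2-style approach described above (the function name is illustrative):

#include <sys/resource.h>

/* Raise RLIMIT_MEMLOCK to infinity up front. Raising the hard limit
 * needs CAP_SYS_RESOURCE; for unprivileged users setrlimit() simply
 * fails and the tool carries on with its rlimit unchanged. */
static void bump_memlock_rlimit(void)
{
	struct rlimit rlim = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};

	setrlimit(RLIMIT_MEMLOCK, &rlim);	/* best effort, error ignored */
}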
2018-11-07  selftests/bpf: enable (uncomment) all tests in test_libbpf.sh  (Quentin Monnet; 2 files changed, -10/+14)
libbpf is now able to load successfully test_l4lb_noinline.o and samples/bpf/tracex3_kern.o.

For test_l4lb_noinline, uncomment the related tests from test_libbpf.c and remove the associated "TODO".

For tracex3_kern.o, instead of loading a program from samples/bpf/ that might not have been compiled at this stage, try loading a program from BPF selftests. Since this test case is about loading a program compiled without the "-target bpf" flag, change the Makefile to compile one program accordingly (instead of passing the flag for compiling all programs).

Regarding test_xdp_noinline.o: in its current shape the program fails to load because it provides no version section, but the loader needs one. The test was added to make sure that libbpf could load XDP programs even if they do not provide a version number in a dedicated section. But libbpf is already capable of doing that: in our case loading fails because the loader does not know that this is an XDP program (it does not need to, since it does not attach the program). So trying to load test_xdp_noinline.o does not bring much here: just delete this subtest.

For the record, the error message obtained with tracex3_kern.o was fixed by commit e3d91b0ca523 ("tools/libbpf: handle issues with bpf ELF objects containing .eh_frames"). I have not been able to reproduce the "libbpf: incorrect bpf_call opcode" error for test_l4lb_noinline.o, even with the version of libbpf present at the time when test_libbpf.sh and test_libbpf_open.c were created.

  RFC -> v1:
   - Compile test_xdp without the "-target bpf" flag, and try to load it instead of ../../samples/bpf/tracex3_kern.o.
   - Delete test_xdp_noinline.o subtest.

Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-11-06  Merge tag 'perf-urgent-for-mingo-4.20-20181106' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent  (Ingo Molnar; 19 files changed, -121/+820)
Pull perf/urgent improvements and fixes from Arnaldo Carvalho de Melo:

Intel PT SQL viewer (Adrian Hunter):
 - Fall back to /usr/local/lib/libxed.so
 - Add Selected branches report
 - Add help window
 - Fix table find when table re-ordered

Intel PT debug log (Adrian Hunter):
 - Add more event information
 - Add MTC and CYC timestamps

perf record (Andi Kleen):
 - Support weak groups, just like with 'perf stat'

perf trace (Arnaldo Carvalho de Melo):
 - Start augmenting raw_syscalls:{sys_enter,sys_exit}: the goal is to have a generic, arch independent eBPF kernel component that is programmed with syscall table details (what to copy, how many bytes, pid, arg filters) from userspace via eBPF maps by the 'perf trace' tool, which continues to use all its argument beautifiers, just taking advantage of the extra pointer contents.

JVMTI (Gustavo Romero):
 - Fix undefined symbol scnprintf in libperf-jvmti.so

perf top (Jin Yao):
 - Display the LBR stats in callchain entries

perf stat (Thomas Richter):
 - Handle different PMU names with common prefix

arm64 (Will Deacon):
 - Fix arm64 tools build failure wrt smp_load_{acquire,release}.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-06  tools cpupower: Override CFLAGS assignments  (Jiri Olsa; 1 file changed, -6/+6)
So the user can specify CFLAGS values from outside the build. Cc: Thomas Renninger <trenn@suse.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-11-06  tools cpupower debug: Allow to use outside build flags  (Jiri Olsa; 1 file changed, -2/+2)
Adding CFLAGS and LDFLAGS to be used during the build. Cc: Thomas Renninger <trenn@suse.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-11-06  tools/power/cpupower: fix compilation with STATIC=true  (Konstantin Khlebnikov; 5 files changed, -6/+6)
Rename duplicate sysfs_read_file into cpupower_read_sysfs and fix linking. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Thomas Renninger <trenn@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
2018-11-06  perf tools: Do not zero sample_id_all for group members  (Jiri Olsa; 2 files changed, -2/+0)
Andi reported the following malfunction:

  # perf record -e '{ref-cycles,cycles}:S' -a sleep 1
  # perf script
  non matching sample_id_all

That's because we disable the sample_id_all bit for non-sampling group members. We can't do that, because it needs to be the same over the whole event list. This patch keeps it untouched again. Reported-by: Andi Kleen <andi@firstfloor.org> Tested-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180923150420.27327-1-jolsa@kernel.org Fixes: e9add8bac6c6 ("perf evsel: Disable write_backward for leader sampling group events") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  tools/testing/nvdimm: Fix the array size for dimm devices.  (Masayoshi Mizuma; 1 file changed, -4/+4)
KASAN reports the following global out-of-bounds access while nfit_test is being loaded. The out-of-bounds access happens at the following reference to dimm_fail_cmd_flags[dimm], where 'dimm' can exceed the maximum index, NUM_DCR (== 5):

  static int override_return_code(int dimm, unsigned int func, int rc)
  {
          if ((1 << func) & dimm_fail_cmd_flags[dimm]) {

dimm_fail_cmd_flags[] definition:

  static unsigned long dimm_fail_cmd_flags[NUM_DCR];

'dimm' is the return value of get_dimm(), and get_dimm() returns the index of the handle[] array. The handle[] array has 7 entries. Let's use ARRAY_SIZE(handle) as the array size.

KASAN report:

  ==================================================================
  BUG: KASAN: global-out-of-bounds in nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
  Read of size 8 at addr ffffffffc10cbbe8 by task kworker/u41:0/8
  ...
  Call Trace:
   dump_stack+0xea/0x1b0
   ? dump_stack_print_info.cold.0+0x1b/0x1b
   ? kmsg_dump_rewind_nolock+0xd9/0xd9
   print_address_description+0x65/0x22e
   ? nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
   kasan_report.cold.6+0x92/0x1a6
   nfit_test_ctl+0x47bb/0x55b0 [nfit_test]
  ...
  The buggy address belongs to the variable:
   dimm_fail_cmd_flags+0x28/0xffffffffffffa440 [nfit_test]
  ==================================================================

Fixes: 39611e83a28c ("tools/testing/nvdimm: Make DSM failure code injection...") Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-11-05  perf tools: Fix undefined symbol scnprintf in libperf-jvmti.so  (Gustavo Romero; 1 file changed, -11/+38)
Currently jvmti agent can not be used because function scnprintf is not present in the agent libperf-jvmti.so. As a result the JVM when using such agent to record JITed code profiling information will fail on looking up scnprintf: java: symbol lookup error: lib/libperf-jvmti.so: undefined symbol: scnprintf This commit fixes that by reverting to the use of snprintf, that can be looked up, instead of scnprintf, adding a proper check for the returned value in order to print a better error message when the jitdump file pathname is too long. Checking the returned value also helps to comply with some recent gcc versions, like gcc8, which will fail due to truncated writing checks related to the -Werror=format-truncation= flag. Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> LPU-Reference: 1541117601-18937-2-git-send-email-gromero@linux.vnet.ibm.com Link: https://lkml.kernel.org/n/tip-mvpxxxy7wnzaj74cq75muw3f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
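A sketch of the snprintf-with-check pattern the commit describes (the path layout and function name are assumptions, not the jvmti agent's actual code):

#include <stdio.h>

static int make_jitdump_path(char *buf, size_t size,
			     const char *dir, int pid)
{
	int ret = snprintf(buf, size, "%s/jit-%d.dump", dir, pid);

	/* snprintf() reports the length it wanted to write; treat a
	 * truncated (or failed) result as an error instead of silently
	 * using a mangled path. */
	if (ret < 0 || (size_t)ret >= size) {
		fprintf(stderr, "jitdump directory name too long: %s\n", dir);
		return -1;
	}
	return 0;
}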
2018-11-05  perf beauty: Use SRCARCH, ARCH=x86_64 must map to "x86" to find the headers  (Arnaldo Carvalho de Melo; 1 file changed, -1/+1)
Guenter reported that using ARCH=x86_64 to build perf has regressed:

  $ make -C tools/perf O=/tmp/build/perf ARCH=x86_64
  make: Entering directory '/home/acme/git/perf/tools/perf'
    BUILD:   Doing 'make -j4' parallel build
    HOSTCC   /tmp/build/perf/fixdep.o
    HOSTLD   /tmp/build/perf/fixdep-in.o
    LINK     /tmp/build/perf/fixdep

  Auto-detecting system features:
  ...                         dwarf: [ on ]
  <SNIP>
  ...                           bpf: [ on ]

    GEN      /tmp/build/perf/common-cmds.h
  make[2]: *** No rule to make target '/home/acme/git/perf/tools/arch/x86_64/include/uapi/asm//mman.h', needed by '/tmp/build/perf/trace/beauty/generated/mmap_flags_array.c'. Stop.
  make[2]: *** Waiting for unfinished jobs....
    PERF_VERSION = 4.19.gf6c23e3
  make[1]: *** [Makefile.perf:207: sub-make] Error 2
  make: *** [Makefile:70: all] Error 2
  make: Leaving directory '/home/acme/git/perf/tools/perf'
  $

This is because we must use $(SRCARCH) where we were using $(ARCH), so that, just like the top level Makefile, we get this done:

  # Additional ARCH settings for x86
  ifeq ($(ARCH),i386)
          SRCARCH := x86
  endif
  ifeq ($(ARCH),x86_64)
          SRCARCH := x86
  endif

Which is done in tools/scripts/Makefile.arch, so switch to use $(SRCARCH). Reported-by: Guenter Roeck <linux@roeck-us.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Clark Williams <williams@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: fbd7458db757 ("perf beauty: Wire up the mmap flags table generator to the Makefile") Link: https://lkml.kernel.org/r/20181105184612.GD7077@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  perf intel-pt: Add MTC and CYC timestamps to debug log  (Adrian Hunter; 1 file changed, -0/+4)
One cause of decoding errors is un-synchronized side-band data. Timestamps are needed to debug such cases. TSC packet timestamps are logged. Log also MTC and CYC timestamps. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/20181105073505.8129-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  perf intel-pt: Add more event information to debug log  (Adrian Hunter; 3 files changed, -3/+19)
More event information is useful for debugging, especially MMAP events. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/20181105073505.8129-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  perf scripts python: exported-sql-viewer.py: Fix table find when table re-ordered  (Adrian Hunter; 1 file changed, -1/+3)
Table rows can be re-ordered by selecting a column to sort by. After re-ordering, the "find" operation was highlighting the wrong row, fix it. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20181104151238.15947-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  perf scripts python: exported-sql-viewer.py: Add help window  (Adrian Hunter; 1 file changed, -1/+154)
Add a window to display help. It is also possible to display the help only, by using the option "--help-only" instead of a database name. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20181104151238.15947-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  perf scripts python: exported-sql-viewer.py: Add Selected branches report  (Adrian Hunter; 1 file changed, -0/+327)
Fetching data from the database can be slow. Add a report that provides the ability to select a subset of branches. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20181104151238.15947-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  perf scripts python: exported-sql-viewer.py: Fall back to /usr/local/lib/libxed.so  (Adrian Hunter; 1 file changed, -1/+6)
Fall back to /usr/local/lib/libxed.so to cater for distributions that do not have /usr/local/lib in the library path by default. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20181104151238.15947-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  perf top: Display the LBR stats in callchain entry  (Jin Yao; 1 file changed, -0/+3)
'perf report' has supported the displaying of LBR stats (such as cycles, predicted%) in callchain entries. For example:

  $ perf report --branch-history --stdio
  --1.01%--intel_idle mwait.h:29
            intel_idle cpufeature.h:164 (cycles:5)
            intel_idle cpufeature.h:164 (predicted:76.4%)
            intel_idle mwait.h:102 (cycles:41)
            intel_idle current.h:15

While 'perf top' doesn't support that. For example:

  $ perf top -a -b --call-graph branch
  - 13.86% 0.23% [kernel] [k] __x86_indirect_thunk_rax
     - 13.65% __x86_indirect_thunk_rax
        + 1.69% do_syscall_64
        + 1.68% do_select
        + 1.41% ktime_get
        + 0.70% __schedule
        + 0.62% do_sys_poll
          0.58% __x86_indirect_thunk_rax

Actually it's very easy to enable this feature in 'perf top'. With this patch, the result is:

  $ perf top -a -b --call-graph branch
  - 13.58% 0.00% [kernel] [k] __x86_indirect_thunk_rax
     - 13.57% __x86_indirect_thunk_rax (predicted:93.9%)
        + 1.78% do_select (cycles:2)
        + 1.68% perf_pmu_disable.part.99 (cycles:1)
        + 1.45% ___sys_recvmsg (cycles:25)
        + 0.81% unix_stream_sendmsg (cycles:18)
        + 0.80% ktime_get (cycles:400)
          0.58% pick_next_task_fair (cycles:47)
        + 0.56% i915_request_retire (cycles:2)
        + 0.52% do_sys_poll (cycles:4)

Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1540983995-20462-1-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-11-05  perf stat: Handle different PMU names with common prefix  (Thomas Richter; 1 file changed, -1/+1)
On s390 the CPU Measurement Facility for counters now supports 2 PMUs, named cpum_cf (CPU Measurement Facility for counters) and cpum_cf_diag (CPU Measurement Facility for diagnostic counters), for one and the same CPU.

Running the command

  [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
          -- ~/mytests/cf-tx-events 1
  Measuring transactions
  TX_C_TABORT_NO_SPECIAL: 0 expected:0
  TX_C_TABORT_SPECIAL: 0 expected:0
  TX_C_TEND: 1 expected:1
  TX_NC_TABORT: 11 expected:11
  TX_NC_TEND: 1 expected:1

  Performance counter stats for '/root/mytests/cf-tx-events 1':

                 2      tx_c_tend

       0.002120091 seconds time elapsed
       0.000121000 seconds user
       0.002127000 seconds sys

  [root@s35lp76 perf]#

displays output which is unexpected (and wrong):

  2      tx_c_tend

The test program definitely triggers only one transaction, as shown in line 'TX_C_TEND: 1 expected:1'.

This is caused by the following call sequence:

  pmu_lookup() scans and installs a PMU.
  +--> pmu_aliases() parses all aliases in directory
       .../<pmu-name>/events/*, which are file names.
       +--> pmu_aliases_parse()
            Read each file in the directory and create a new alias entry.
            This is done with
            +--> perf_pmu__new_alias() and
                 +--> __perf_pmu__new_alias(),
                      which also check for identical alias names.

After pmu_aliases() returns, a complete list of event names for this PMU has been created. Now function pmu_add_cpu_aliases() is called to add the events listed in the json files to the alias list of the CPU.

  +--> perf_pmu__find_map() returns a pointer to the json events.

Now function pmu_add_cpu_aliases() scans through all events listed in the JSON files for this CPU. Each json event PMU name is compared with the current PMU being built up, and if they mismatch, the json event is added to the current PMU's alias list. To avoid duplicate entries the following comparison is done:

  if (!is_arm_pmu_core(name)) {
          pname = pe->pmu ? pe->pmu : "cpu";
          if (strncmp(pname, name, strlen(pname)))
                  continue;
  }

The culprit is the strncmp() function. Using current s390 PMU naming, the first PMU is 'cpum_cf' and a long list of events is added, among them 'tx_c_tend'. When the second PMU named 'cpum_cf_diag' is added, only one event named 'CF_DIAG' is added by the pmu_aliases() function. Now function pmu_add_cpu_aliases() is invoked for PMU 'cpum_cf_diag'. Since the CPUID string is the same for both PMUs, json file events for PMU 'cpum_cf' are added to PMU 'cpum_cf_diag' as well. This happens because the strncmp() actually compares:

  strncmp("cpum_cf", "cpum_cf_diag", 7);

The first parameter is the PMU name taken from the event in the json file. The second parameter is the PMU name of the PMU currently being built. They are different, but the length-limited compare only tests the common prefix, so it returns 0 (a match) when it should not. Now all events for PMU cpum_cf are added to the alias list for PMU cpum_cf_diag. Later on, in function parse_events_add_pmu(), the event 'tx_c_tend' is searched in all available PMUs and found twice, adding it two times to the evsel_list global variable, which is the root of all events. This results in a counter value of 2 instead of 1.
Output with this patch:

  [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
          -- ~/mytests/cf-tx-events 1
  Measuring transactions
  TX_C_TABORT_NO_SPECIAL: 0 expected:0
  TX_C_TABORT_SPECIAL: 0 expected:0
  TX_C_TEND: 1 expected:1
  TX_NC_TABORT: 11 expected:11
  TX_NC_TEND: 1 expected:1

  Performance counter stats for '/root/mytests/cf-tx-events 1':

                 1      tx_c_tend

       0.001815365 seconds time elapsed
       0.000123000 seconds user
       0.001756000 seconds sys

  [root@s35lp76 perf]#

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Sebastien Boisvert <sboisvert@gydle.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: stable@vger.kernel.org Fixes: 292c34c10249 ("perf pmu: Fix core PMU alias list for X86 platform") Link: http://lkml.kernel.org/r/20181023151616.78193-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
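A stand-alone illustration of the culprit comparison (this only demonstrates why the prefix match is wrong; the kernel patch itself is not reproduced here):

#include <stdio.h>
#include <string.h>

int main(void)
{
	const char *pname = "cpum_cf";		/* PMU name from the json event */
	const char *name  = "cpum_cf_diag";	/* PMU currently being set up   */

	/* Prefix-length compare: prints 0 (a "match") although the PMUs differ. */
	printf("prefix compare: %d\n", strncmp(pname, name, strlen(pname)));

	/* Full-string compare: prints a non-zero value, telling them apart. */
	printf("full compare:   %d\n", strcmp(pname, name));

	return 0;
}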
2018-11-05  perf record: Support weak groups  (Andi Kleen; 2 files changed, -2/+6)
Implement a weak group fallback for 'perf record', similar to the existing 'perf stat' support. This allows to use groups that might be longer than the available counters without failing.

Before:

  $ perf record -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}' -a sleep 1
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles).
  /bin/dmesg | grep -i perf may provide additional information.

After:

  $ ./perf record -e '{cycles,cache-misses,cache-references,cpu_clk_unhalted.thread,cycles,cycles,cycles}:W' -a sleep 1
  WARNING: No sample_id_all support, falling back to unordered processing
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 8.136 MB perf.data (134069 samples) ]

Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20181001195927.14211-2-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>