Merge branch 'bpf-tracepoints'

Alexei Starovoitov says: ==================== allow bpf attach to tracepoints Hi Steven, Peter, v1->v2: addressed Peter's comments: - fixed wording in patch 1, added ack - refactored 2nd patch into 3: 2/10 remove unused __perf_addr macro which frees up an argument in perf_trace_buf_submit 3/10 split perf_trace_buf_prepare into alloc and update parts, so that bpf programs don't have to pay performance penalty for update of struct trace_entry which is not going to be accessed by bpf 4/10 actual addition of bpf filter to perf tracepoint handler is now trivial and bpf prog can be used as proper filter of tracepoints v1 cover: last time we discussed bpf+tracepoints it was a year ago [1] and the reason we didn't proceed with that approach was that bpf would make arguments arg1, arg2 to trace_xx(arg1, arg2) call to be exposed to bpf program and that was considered unnecessary extension of abi. Back then I wanted to avoid the cost of buffer alloc and field assign part in all of the tracepoints, but looks like when optimized the cost is acceptable. So this new apporach doesn't expose any new abi to bpf program. The program is looking at tracepoint fields after they were copied by perf_trace_xx() and described in /sys/kernel/debug/tracing/events/xxx/format We made a tool [2] that takes arguments from /sys/.../format and works as: $ tplist.py -v random:urandom_read int got_bits; int pool_left; int input_left; Then these fields can be copy-pasted into bpf program like: struct urandom_read { __u64 hidden_pad; int got_bits; int pool_left; int input_left; }; and the program can use it: SEC("tracepoint/random/urandom_read") int bpf_prog(struct urandom_read *ctx) { return ctx->pool_left > 0 ? 1 : 0; } This way the program can access tracepoint fields faster than equivalent bpf+kprobe program, which is the main goal of these patches. Patch 1-4 are simple changes in perf core side, please review. I'd like to take the whole set via net-next tree, since the rest of the patches might conflict with other bpf work going on in net-next and we want to avoid cross-tree merge conflicts. Alternatively we can put patches 1-4 into both tip and net-next. Patch 9 is an example of access to tracepoint fields from bpf prog. Patch 10 is a micro benchmark for bpf+kprobe vs bpf+tracepoint. Note that for actual tracing tools the user doesn't need to run tplist.py and copy-paste fields manually. The tools do it automatically. Like argdist tool [3] can be used as: $ argdist -H 't:block:block_rq_complete():u32:nr_sector' where 'nr_sector' is name of tracepoint field taken from /sys/kernel/debug/tracing/events/block/block_rq_complete/format and appropriate bpf program is generated on the fly. [1] http://thread.gmane.org/gmane.linux.kernel.api/8127/focus=8165 [2] https://github.com/iovisor/bcc/blob/master/tools/tplist.py [3] https://github.com/iovisor/bcc/blob/master/tools/argdist.py ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
author: David S. Miller <davem@davemloft.net> 2016-04-07 21:04:27 -0400
committer: David S. Miller <davem@davemloft.net> 2016-04-07 21:04:27 -0400
commit: f8711655f862eabc0cb03e2bccd871069399c53e (patch)
tree: c7516296b2081373659aa0dfe39e61fe7598b56e /include/linux/trace_events.h
parent: net: Fix build failure due to lockdep_sock_is_held(). (diff)
parent: samples/bpf: add tracepoint vs kprobe performance tests (diff)
1 files changed, 5 insertions, 4 deletions
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 0810f81b6db2..fe6441203b59 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -569,6 +569,7 @@ extern int trace_define_field(struct trace_event_call *call, const char *type,
 			      int is_signed, int filter_type);
 extern int trace_add_event_call(struct trace_event_call *call);
 extern int trace_remove_event_call(struct trace_event_call *call);
+extern int trace_event_get_offsets(struct trace_event_call *call);
 
 #define is_signed_type(type)	(((type)(-1)) < (type)1)
 
@@ -605,15 +606,15 @@ extern void perf_trace_del(struct perf_event *event, int flags);
 extern int  ftrace_profile_set_filter(struct perf_event *event, int event_id,
 				     char *filter_str);
 extern void ftrace_profile_free_filter(struct perf_event *event);
-extern void *perf_trace_buf_prepare(int size, unsigned short type,
-				    struct pt_regs **regs, int *rctxp);
+void perf_trace_buf_update(void *record, u16 type);
+void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);
 
 static inline void
-perf_trace_buf_submit(void *raw_data, int size, int rctx, u64 addr,
+perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
 		       u64 count, struct pt_regs *regs, void *head,
 		       struct task_struct *task)
 {
-	perf_tp_event(addr, count, raw_data, size, regs, head, rctx, task);
+	perf_tp_event(type, count, raw_data, size, regs, head, rctx, task);
 }
 #endif
author	David S. Miller <davem@davemloft.net>	2016-04-07 21:04:27 -0400
committer	David S. Miller <davem@davemloft.net>	2016-04-07 21:04:27 -0400
commit	f8711655f862eabc0cb03e2bccd871069399c53e (patch)
tree	c7516296b2081373659aa0dfe39e61fe7598b56e /include/linux/trace_events.h
parent	net: Fix build failure due to lockdep_sock_is_held(). (diff)
parent	samples/bpf: add tracepoint vs kprobe performance tests (diff)