Merge tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt: - Added new "bootconfig". This looks for a file appended to initrd to add boot config options, and has been discussed thoroughly at Linux Plumbers. Very useful for adding kprobes at bootup. Only enabled if "bootconfig" is on the real kernel command line. - Created dynamic event creation. Merges common code between creating synthetic events and kprobe events. - Rename perf "ring_buffer" structure to "perf_buffer" - Rename ftrace "ring_buffer" structure to "trace_buffer" Had to rename existing "trace_buffer" to "array_buffer" - Allow trace_printk() to work withing (some) tracing code. - Sort of tracing configs to be a little better organized - Fixed bug where ftrace_graph hash was not being protected properly - Various other small fixes and clean ups * tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits) bootconfig: Show the number of nodes on boot message tools/bootconfig: Show the number of bootconfig nodes bootconfig: Add more parse error messages bootconfig: Use bootconfig instead of boot config ftrace: Protect ftrace_graph_hash with ftrace_sync ftrace: Add comment to why rcu_dereference_sched() is open coded tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu tracing: Annotate ftrace_graph_hash pointer with __rcu bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline tracing: Use seq_buf for building dynevent_cmd string tracing: Remove useless code in dynevent_arg_pair_add() tracing: Remove check_arg() callbacks from dynevent args tracing: Consolidate some synth_event_trace code tracing: Fix now invalid var_ref_vals assumption in trace action tracing: Change trace_boot to use synth_event interface tracing: Move tracing selftests to bottom of menu tracing: Move mmio tracer config up with the other tracers tracing: Move tracing test module configs together tracing: Move all function tracing configs together tracing: Documentation for in-kernel synthetic event API ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2020-02-06 07:12:11 +0000
committer: Linus Torvalds <torvalds@linux-foundation.org> 2020-02-06 07:12:11 +0000
commit: e310396bb8d7db977a0e10ef7b5040e98b89c34c (patch)
tree: 461357ada35c637292884f0af1dc9fc41c0203ff /Documentation
parent: Merge tag 'io_uring-5.6-2020-02-05' of git://git.kernel.dk/linux-block (diff)
parent: bootconfig: Show the number of nodes on boot message (diff)
download: linux-dev-e310396bb8d7db977a0e10ef7b5040e98b89c34c.tar.xz
linux-dev-e310396bb8d7db977a0e10ef7b5040e98b89c34c.zip
7 files changed, 898 insertions, 0 deletions
diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst
new file mode 100644
index 000000000000..b342a6796392
--- /dev/null
+++ b/Documentation/admin-guide/bootconfig.rst
@@ -0,0 +1,190 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _bootconfig:
+
+==================
+Boot Configuration
+==================
+
+:Author: Masami Hiramatsu <mhiramat@kernel.org>
+
+Overview
+========
+
+The boot configuration expands the current kernel command line to support
+additional key-value data when booting the kernel in an efficient way.
+This allows administrators to pass a structured-Key config file.
+
+Config File Syntax
+==================
+
+The boot config syntax is a simple structured key-value. Each key consists
+of dot-connected-words, and key and value are connected by ``=``. The value
+has to be terminated by semi-colon (``;``) or newline (``\n``).
+For array value, array entries are separated by comma (``,``). ::
+
+KEY[.WORD[...]] = VALUE[, VALUE2[...]][;]
+
+Unlike the kernel command line syntax, spaces are OK around the comma and ``=``.
+
+Each key word must contain only alphabets, numbers, dash (``-``) or underscore
+(``_``). And each value only contains printable characters or spaces except
+for delimiters such as semi-colon (``;``), new-line (``\n``), comma (``,``),
+hash (``#``) and closing brace (``}``).
+
+If you want to use those delimiters in a value, you can use either double-
+quotes (``"VALUE"``) or single-quotes (``'VALUE'``) to quote it. Note that
+you can not escape these quotes.
+
+There can be a key which doesn't have value or has an empty value. Those keys
+are used for checking if the key exists or not (like a boolean).
+
+Key-Value Syntax
+----------------
+
+The boot config file syntax allows user to merge partially same word keys
+by brace. For example::
+
+ foo.bar.baz = value1
+ foo.bar.qux.quux = value2
+
+These can be written also in::
+
+ foo.bar {
+    baz = value1
+    qux.quux = value2
+ }
+
+Or more shorter, written as following::
+
+ foo.bar { baz = value1; qux.quux = value2 }
+
+In both styles, same key words are automatically merged when parsing it
+at boot time. So you can append similar trees or key-values.
+
+Comments
+--------
+
+The config syntax accepts shell-script style comments. The comments starting
+with hash ("#") until newline ("\n") will be ignored.
+
+::
+
+ # comment line
+ foo = value # value is set to foo.
+ bar = 1, # 1st element
+       2, # 2nd element
+       3  # 3rd element
+
+This is parsed as below::
+
+ foo = value
+ bar = 1, 2, 3
+
+Note that you can not put a comment between value and delimiter(``,`` or
+``;``). This means following config has a syntax error ::
+
+ key = 1 # comment
+       ,2
+
+
+/proc/bootconfig
+================
+
+/proc/bootconfig is a user-space interface of the boot config.
+Unlike /proc/cmdline, this file shows the key-value style list.
+Each key-value pair is shown in each line with following style::
+
+ KEY[.WORDS...] = "[VALUE]"[,"VALUE2"...]
+
+
+Boot Kernel With a Boot Config
+==============================
+
+Since the boot configuration file is loaded with initrd, it will be added
+to the end of the initrd (initramfs) image file. The Linux kernel decodes
+the last part of the initrd image in memory to get the boot configuration
+data.
+Because of this "piggyback" method, there is no need to change or
+update the boot loader and the kernel image itself.
+
+To do this operation, Linux kernel provides "bootconfig" command under
+tools/bootconfig, which allows admin to apply or delete the config file
+to/from initrd image. You can build it by the following command::
+
+ # make -C tools/bootconfig
+
+To add your boot config file to initrd image, run bootconfig as below
+(Old data is removed automatically if exists)::
+
+ # tools/bootconfig/bootconfig -a your-config /boot/initrd.img-X.Y.Z
+
+To remove the config from the image, you can use -d option as below::
+
+ # tools/bootconfig/bootconfig -d /boot/initrd.img-X.Y.Z
+
+Then add "bootconfig" on the normal kernel command line to tell the
+kernel to look for the bootconfig at the end of the initrd file.
+
+Config File Limitation
+======================
+
+Currently the maximum config size size is 32KB and the total key-words (not
+key-value entries) must be under 1024 nodes.
+Note: this is not the number of entries but nodes, an entry must consume
+more than 2 nodes (a key-word and a value). So theoretically, it will be
+up to 512 key-value pairs. If keys contains 3 words in average, it can
+contain 256 key-value pairs. In most cases, the number of config items
+will be under 100 entries and smaller than 8KB, so it would be enough.
+If the node number exceeds 1024, parser returns an error even if the file
+size is smaller than 32KB.
+Anyway, since bootconfig command verifies it when appending a boot config
+to initrd image, user can notice it before boot.
+
+
+Bootconfig APIs
+===============
+
+User can query or loop on key-value pairs, also it is possible to find
+a root (prefix) key node and find key-values under that node.
+
+If you have a key string, you can query the value directly with the key
+using xbc_find_value(). If you want to know what keys exist in the boot
+config, you can use xbc_for_each_key_value() to iterate key-value pairs.
+Note that you need to use xbc_array_for_each_value() for accessing
+each array's value, e.g.::
+
+ vnode = NULL;
+ xbc_find_value("key.word", &vnode);
+ if (vnode && xbc_node_is_array(vnode))
+    xbc_array_for_each_value(vnode, value) {
+      printk("%s ", value);
+    }
+
+If you want to focus on keys which have a prefix string, you can use
+xbc_find_node() to find a node by the prefix string, and iterate
+keys under the prefix node with xbc_node_for_each_key_value().
+
+But the most typical usage is to get the named value under prefix
+or get the named array under prefix as below::
+
+ root = xbc_find_node("key.prefix");
+ value = xbc_node_find_value(root, "option", &vnode);
+ ...
+ xbc_node_for_each_array_value(root, "array-option", value, anode) {
+    ...
+ }
+
+This accesses a value of "key.prefix.option" and an array of
+"key.prefix.array-option".
+
+Locking is not needed, since after initialization, the config becomes
+read-only. All data and keys must be copied if you need to modify it.
+
+
+Functions and structures
+========================
+
+.. kernel-doc:: include/linux/bootconfig.h
+.. kernel-doc:: lib/bootconfig.c
+
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 4433f3929481..f1d0ccffbe72 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -64,6 +64,7 @@ configure specific aspects of kernel behavior to your liking.
    binderfs
    binfmt-misc
    blockdev/index
+   bootconfig
    braille-console
    btmrvl
    cgroup-v1/index
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ddc5ccdd4cd1..dbc22d684627 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -437,6 +437,12 @@
 			no delay (0).
 			Format: integer
 
+	bootconfig	[KNL]
+			Extended command line options can be added to an initrd
+			and this will cause the kernel to look for it.
+
+			See Documentation/admin-guide/bootconfig.rst
+
 	bert_disable	[ACPI]
 			Disable BERT OS support on buggy BIOSes.
 
diff --git a/Documentation/trace/boottime-trace.rst b/Documentation/trace/boottime-trace.rst
new file mode 100644
index 000000000000..dcb390075ca1
--- /dev/null
+++ b/Documentation/trace/boottime-trace.rst
@@ -0,0 +1,184 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Boot-time tracing
+=================
+
+:Author: Masami Hiramatsu <mhiramat@kernel.org>
+
+Overview
+========
+
+Boot-time tracing allows users to trace boot-time process including
+device initialization with full features of ftrace including per-event
+filter and actions, histograms, kprobe-events and synthetic-events,
+and trace instances.
+Since kernel command line is not enough to control these complex features,
+this uses bootconfig file to describe tracing feature programming.
+
+Options in the Boot Config
+==========================
+
+Here is the list of available options list for boot time tracing in
+boot config file [1]_. All options are under "ftrace." or "kernel."
+prefix. See kernel parameters for the options which starts
+with "kernel." prefix [2]_.
+
+.. [1] See :ref:`Documentation/admin-guide/bootconfig.rst <bootconfig>`
+.. [2] See :ref:`Documentation/admin-guide/kernel-parameters.rst <kernelparameters>`
+
+Ftrace Global Options
+---------------------
+
+Ftrace global options have "kernel." prefix in boot config, which means
+these options are passed as a part of kernel legacy command line.
+
+kernel.tp_printk
+   Output trace-event data on printk buffer too.
+
+kernel.dump_on_oops [= MODE]
+   Dump ftrace on Oops. If MODE = 1 or omitted, dump trace buffer
+   on all CPUs. If MODE = 2, dump a buffer on a CPU which kicks Oops.
+
+kernel.traceoff_on_warning
+   Stop tracing if WARN_ON() occurs.
+
+kernel.fgraph_max_depth = MAX_DEPTH
+   Set MAX_DEPTH to maximum depth of fgraph tracer.
+
+kernel.fgraph_filters = FILTER[, FILTER2...]
+   Add fgraph tracing function filters.
+
+kernel.fgraph_notraces = FILTER[, FILTER2...]
+   Add fgraph non-tracing function filters.
+
+
+Ftrace Per-instance Options
+---------------------------
+
+These options can be used for each instance including global ftrace node.
+
+ftrace.[instance.INSTANCE.]options = OPT1[, OPT2[...]]
+   Enable given ftrace options.
+
+ftrace.[instance.INSTANCE.]trace_clock = CLOCK
+   Set given CLOCK to ftrace's trace_clock.
+
+ftrace.[instance.INSTANCE.]buffer_size = SIZE
+   Configure ftrace buffer size to SIZE. You can use "KB" or "MB"
+   for that SIZE.
+
+ftrace.[instance.INSTANCE.]alloc_snapshot
+   Allocate snapshot buffer.
+
+ftrace.[instance.INSTANCE.]cpumask = CPUMASK
+   Set CPUMASK as trace cpu-mask.
+
+ftrace.[instance.INSTANCE.]events = EVENT[, EVENT2[...]]
+   Enable given events on boot. You can use a wild card in EVENT.
+
+ftrace.[instance.INSTANCE.]tracer = TRACER
+   Set TRACER to current tracer on boot. (e.g. function)
+
+ftrace.[instance.INSTANCE.]ftrace.filters
+   This will take an array of tracing function filter rules.
+
+ftrace.[instance.INSTANCE.]ftrace.notraces
+   This will take an array of NON-tracing function filter rules.
+
+
+Ftrace Per-Event Options
+------------------------
+
+These options are setting per-event options.
+
+ftrace.[instance.INSTANCE.]event.GROUP.EVENT.enable
+   Enable GROUP:EVENT tracing.
+
+ftrace.[instance.INSTANCE.]event.GROUP.EVENT.filter = FILTER
+   Set FILTER rule to the GROUP:EVENT.
+
+ftrace.[instance.INSTANCE.]event.GROUP.EVENT.actions = ACTION[, ACTION2[...]]
+   Set ACTIONs to the GROUP:EVENT.
+
+ftrace.[instance.INSTANCE.]event.kprobes.EVENT.probes = PROBE[, PROBE2[...]]
+   Defines new kprobe event based on PROBEs. It is able to define
+   multiple probes on one event, but those must have same type of
+   arguments. This option is available only for the event which
+   group name is "kprobes".
+
+ftrace.[instance.INSTANCE.]event.synthetic.EVENT.fields = FIELD[, FIELD2[...]]
+   Defines new synthetic event with FIELDs. Each field should be
+   "type varname".
+
+Note that kprobe and synthetic event definitions can be written under
+instance node, but those are also visible from other instances. So please
+take care for event name conflict.
+
+
+Examples
+========
+
+For example, to add filter and actions for each event, define kprobe
+events, and synthetic events with histogram, write a boot config like
+below::
+
+  ftrace.event {
+        task.task_newtask {
+                filter = "pid < 128"
+                enable
+        }
+        kprobes.vfs_read {
+                probes = "vfs_read $arg1 $arg2"
+                filter = "common_pid < 200"
+                enable
+        }
+        synthetic.initcall_latency {
+                fields = "unsigned long func", "u64 lat"
+                actions = "hist:keys=func.sym,lat:vals=lat:sort=lat"
+        }
+        initcall.initcall_start {
+                actions = "hist:keys=func:ts0=common_timestamp.usecs"
+        }
+        initcall.initcall_finish {
+                actions = "hist:keys=func:lat=common_timestamp.usecs-$ts0:onmatch(initcall.initcall_start).initcall_latency(func,$lat)"
+        }
+  }
+
+Also, boot-time tracing supports "instance" node, which allows us to run
+several tracers for different purpose at once. For example, one tracer
+is for tracing functions starting with "user\_", and others tracing
+"kernel\_" functions, you can write boot config as below::
+
+  ftrace.instance {
+        foo {
+                tracer = "function"
+                ftrace.filters = "user_*"
+        }
+        bar {
+                tracer = "function"
+                ftrace.filters = "kernel_*"
+        }
+  }
+
+The instance node also accepts event nodes so that each instance
+can customize its event tracing.
+
+This boot-time tracing also supports ftrace kernel parameters via boot
+config.
+For example, following kernel parameters::
+
+ trace_options=sym-addr trace_event=initcall:* tp_printk trace_buf_size=1M ftrace=function ftrace_filter="vfs*"
+
+This can be written in boot config like below::
+
+  kernel {
+        trace_options = sym-addr
+        trace_event = "initcall:*"
+        tp_printk
+        trace_buf_size = 1M
+        ftrace = function
+        ftrace_filter = "vfs*"
+  }
+
+Note that parameters start with "kernel" prefix instead of "ftrace".
diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index f7e1fcc0953c..ed79b220bd07 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -525,3 +525,518 @@ The following commands are supported:
   event counts (hitcount).
 
   See Documentation/trace/histogram.rst for details and examples.
+
+6.3 In-kernel trace event API
+-----------------------------
+
+In most cases, the command-line interface to trace events is more than
+sufficient.  Sometimes, however, applications might find the need for
+more complex relationships than can be expressed through a simple
+series of linked command-line expressions, or putting together sets of
+commands may be simply too cumbersome.  An example might be an
+application that needs to 'listen' to the trace stream in order to
+maintain an in-kernel state machine detecting, for instance, when an
+illegal kernel state occurs in the scheduler.
+
+The trace event subsystem provides an in-kernel API allowing modules
+or other kernel code to generate user-defined 'synthetic' events at
+will, which can be used to either augment the existing trace stream
+and/or signal that a particular important state has occurred.
+
+A similar in-kernel API is also available for creating kprobe and
+kretprobe events.
+
+Both the synthetic event and k/ret/probe event APIs are built on top
+of a lower-level "dynevent_cmd" event command API, which is also
+available for more specialized applications, or as the basis of other
+higher-level trace event APIs.
+
+The API provided for these purposes is describe below and allows the
+following:
+
+  - dynamically creating synthetic event definitions
+  - dynamically creating kprobe and kretprobe event definitions
+  - tracing synthetic events from in-kernel code
+  - the low-level "dynevent_cmd" API
+
+6.3.1 Dyamically creating synthetic event definitions
+-----------------------------------------------------
+
+There are a couple ways to create a new synthetic event from a kernel
+module or other kernel code.
+
+The first creates the event in one step, using synth_event_create().
+In this method, the name of the event to create and an array defining
+the fields is supplied to synth_event_create().  If successful, a
+synthetic event with that name and fields will exist following that
+call.  For example, to create a new "schedtest" synthetic event:
+
+  ret = synth_event_create("schedtest", sched_fields,
+                           ARRAY_SIZE(sched_fields), THIS_MODULE);
+
+The sched_fields param in this example points to an array of struct
+synth_field_desc, each of which describes an event field by type and
+name:
+
+  static struct synth_field_desc sched_fields[] = {
+        { .type = "pid_t",              .name = "next_pid_field" },
+        { .type = "char[16]",           .name = "next_comm_field" },
+        { .type = "u64",                .name = "ts_ns" },
+        { .type = "u64",                .name = "ts_ms" },
+        { .type = "unsigned int",       .name = "cpu" },
+        { .type = "char[64]",           .name = "my_string_field" },
+        { .type = "int",                .name = "my_int_field" },
+  };
+
+See synth_field_size() for available types. If field_name contains [n]
+the field is considered to be an array.
+
+If the event is created from within a module, a pointer to the module
+must be passed to synth_event_create().  This will ensure that the
+trace buffer won't contain unreadable events when the module is
+removed.
+
+At this point, the event object is ready to be used for generating new
+events.
+
+In the second method, the event is created in several steps.  This
+allows events to be created dynamically and without the need to create
+and populate an array of fields beforehand.
+
+To use this method, an empty or partially empty synthetic event should
+first be created using synth_event_gen_cmd_start() or
+synth_event_gen_cmd_array_start().  For synth_event_gen_cmd_start(),
+the name of the event along with one or more pairs of args each pair
+representing a 'type field_name;' field specification should be
+supplied.  For synth_event_gen_cmd_array_start(), the name of the
+event along with an array of struct synth_field_desc should be
+supplied. Before calling synth_event_gen_cmd_start() or
+synth_event_gen_cmd_array_start(), the user should create and
+initialize a dynevent_cmd object using synth_event_cmd_init().
+
+For example, to create a new "schedtest" synthetic event with two
+fields:
+
+  struct dynevent_cmd cmd;
+  char *buf;
+
+  /* Create a buffer to hold the generated command */
+  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
+
+  /* Before generating the command, initialize the cmd object */
+  synth_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
+
+  ret = synth_event_gen_cmd_start(&cmd, "schedtest", THIS_MODULE,
+                                  "pid_t", "next_pid_field",
+                                  "u64", "ts_ns");
+
+Alternatively, using an array of struct synth_field_desc fields
+containing the same information:
+
+  ret = synth_event_gen_cmd_array_start(&cmd, "schedtest", THIS_MODULE,
+                                        fields, n_fields);
+
+Once the synthetic event object has been created, it can then be
+populated with more fields.  Fields are added one by one using
+synth_event_add_field(), supplying the dynevent_cmd object, a field
+type, and a field name.  For example, to add a new int field named
+"intfield", the following call should be made:
+
+  ret = synth_event_add_field(&cmd, "int", "intfield");
+
+See synth_field_size() for available types. If field_name contains [n]
+the field is considered to be an array.
+
+A group of fields can also be added all at once using an array of
+synth_field_desc with add_synth_fields().  For example, this would add
+just the first four sched_fields:
+
+  ret = synth_event_add_fields(&cmd, sched_fields, 4);
+
+If you already have a string of the form 'type field_name',
+synth_event_add_field_str() can be used to add it as-is; it will
+also automatically append a ';' to the string.
+
+Once all the fields have been added, the event should be finalized and
+registered by calling the synth_event_gen_cmd_end() function:
+
+  ret = synth_event_gen_cmd_end(&cmd);
+
+At this point, the event object is ready to be used for tracing new
+events.
+
+6.3.3 Tracing synthetic events from in-kernel code
+--------------------------------------------------
+
+To trace a synthetic event, there are several options.  The first
+option is to trace the event in one call, using synth_event_trace()
+with a variable number of values, or synth_event_trace_array() with an
+array of values to be set.  A second option can be used to avoid the
+need for a pre-formed array of values or list of arguments, via
+synth_event_trace_start() and synth_event_trace_end() along with
+synth_event_add_next_val() or synth_event_add_val() to add the values
+piecewise.
+
+6.3.3.1 Tracing a synthetic event all at once
+---------------------------------------------
+
+To trace a synthetic event all at once, the synth_event_trace() or
+synth_event_trace_array() functions can be used.
+
+The synth_event_trace() function is passed the trace_event_file
+representing the synthetic event (which can be retrieved using
+trace_get_event_file() using the synthetic event name, "synthetic" as
+the system name, and the trace instance name (NULL if using the global
+trace array)), along with an variable number of u64 args, one for each
+synthetic event field, and the number of values being passed.
+
+So, to trace an event corresponding to the synthetic event definition
+above, code like the following could be used:
+
+  ret = synth_event_trace(create_synth_test, 7, /* number of values */
+                          444,             /* next_pid_field */
+                          (u64)"clackers", /* next_comm_field */
+                          1000000,         /* ts_ns */
+                          1000,            /* ts_ms */
+                          smp_processor_id(),/* cpu */
+                          (u64)"Thneed",   /* my_string_field */
+                          999);            /* my_int_field */
+
+All vals should be cast to u64, and string vals are just pointers to
+strings, cast to u64.  Strings will be copied into space reserved in
+the event for the string, using these pointers.
+
+Alternatively, the synth_event_trace_array() function can be used to
+accomplish the same thing.  It is passed the trace_event_file
+representing the synthetic event (which can be retrieved using
+trace_get_event_file() using the synthetic event name, "synthetic" as
+the system name, and the trace instance name (NULL if using the global
+trace array)), along with an array of u64, one for each synthetic
+event field.
+
+To trace an event corresponding to the synthetic event definition
+above, code like the following could be used:
+
+  u64 vals[7];
+
+  vals[0] = 777;                  /* next_pid_field */
+  vals[1] = (u64)"tiddlywinks";   /* next_comm_field */
+  vals[2] = 1000000;              /* ts_ns */
+  vals[3] = 1000;                 /* ts_ms */
+  vals[4] = smp_processor_id();   /* cpu */
+  vals[5] = (u64)"thneed";        /* my_string_field */
+  vals[6] = 398;                  /* my_int_field */
+
+The 'vals' array is just an array of u64, the number of which must
+match the number of field in the synthetic event, and which must be in
+the same order as the synthetic event fields.
+
+All vals should be cast to u64, and string vals are just pointers to
+strings, cast to u64.  Strings will be copied into space reserved in
+the event for the string, using these pointers.
+
+In order to trace a synthetic event, a pointer to the trace event file
+is needed.  The trace_get_event_file() function can be used to get
+it - it will find the file in the given trace instance (in this case
+NULL since the top trace array is being used) while at the same time
+preventing the instance containing it from going away:
+
+       schedtest_event_file = trace_get_event_file(NULL, "synthetic",
+                                                   "schedtest");
+
+Before tracing the event, it should be enabled in some way, otherwise
+the synthetic event won't actually show up in the trace buffer.
+
+To enable a synthetic event from the kernel, trace_array_set_clr_event()
+can be used (which is not specific to synthetic events, so does need
+the "synthetic" system name to be specified explicitly).
+
+To enable the event, pass 'true' to it:
+
+       trace_array_set_clr_event(schedtest_event_file->tr,
+                                 "synthetic", "schedtest", true);
+
+To disable it pass false:
+
+       trace_array_set_clr_event(schedtest_event_file->tr,
+                                 "synthetic", "schedtest", false);
+
+Finally, synth_event_trace_array() can be used to actually trace the
+event, which should be visible in the trace buffer afterwards:
+
+       ret = synth_event_trace_array(schedtest_event_file, vals,
+                                     ARRAY_SIZE(vals));
+
+To remove the synthetic event, the event should be disabled, and the
+trace instance should be 'put' back using trace_put_event_file():
+
+       trace_array_set_clr_event(schedtest_event_file->tr,
+                                 "synthetic", "schedtest", false);
+       trace_put_event_file(schedtest_event_file);
+
+If those have been successful, synth_event_delete() can be called to
+remove the event:
+
+       ret = synth_event_delete("schedtest");
+
+6.3.3.1 Tracing a synthetic event piecewise
+-------------------------------------------
+
+To trace a synthetic using the piecewise method described above, the
+synth_event_trace_start() function is used to 'open' the synthetic
+event trace:
+
+       struct synth_trace_state trace_state;
+
+       ret = synth_event_trace_start(schedtest_event_file, &trace_state);
+
+It's passed the trace_event_file representing the synthetic event
+using the same methods as described above, along with a pointer to a
+struct synth_trace_state object, which will be zeroed before use and
+used to maintain state between this and following calls.
+
+Once the event has been opened, which means space for it has been
+reserved in the trace buffer, the individual fields can be set.  There
+are two ways to do that, either one after another for each field in
+the event, which requires no lookups, or by name, which does.  The
+tradeoff is flexibility in doing the assignments vs the cost of a
+lookup per field.
+
+To assign the values one after the other without lookups,
+synth_event_add_next_val() should be used.  Each call is passed the
+same synth_trace_state object used in the synth_event_trace_start(),
+along with the value to set the next field in the event.  After each
+field is set, the 'cursor' points to the next field, which will be set
+by the subsequent call, continuing until all the fields have been set
+in order.  The same sequence of calls as in the above examples using
+this method would be (without error-handling code):
+
+       /* next_pid_field */
+       ret = synth_event_add_next_val(777, &trace_state);
+
+       /* next_comm_field */
+       ret = synth_event_add_next_val((u64)"slinky", &trace_state);
+
+       /* ts_ns */
+       ret = synth_event_add_next_val(1000000, &trace_state);
+
+       /* ts_ms */
+       ret = synth_event_add_next_val(1000, &trace_state);
+
+       /* cpu */
+       ret = synth_event_add_next_val(smp_processor_id(), &trace_state);
+
+       /* my_string_field */
+       ret = synth_event_add_next_val((u64)"thneed_2.01", &trace_state);
+
+       /* my_int_field */
+       ret = synth_event_add_next_val(395, &trace_state);
+
+To assign the values in any order, synth_event_add_val() should be
+used.  Each call is passed the same synth_trace_state object used in
+the synth_event_trace_start(), along with the field name of the field
+to set and the value to set it to.  The same sequence of calls as in
+the above examples using this method would be (without error-handling
+code):
+
+       ret = synth_event_add_val("next_pid_field", 777, &trace_state);
+       ret = synth_event_add_val("next_comm_field", (u64)"silly putty",
+                                 &trace_state);
+       ret = synth_event_add_val("ts_ns", 1000000, &trace_state);
+       ret = synth_event_add_val("ts_ms", 1000, &trace_state);
+       ret = synth_event_add_val("cpu", smp_processor_id(), &trace_state);
+       ret = synth_event_add_val("my_string_field", (u64)"thneed_9",
+                                 &trace_state);
+       ret = synth_event_add_val("my_int_field", 3999, &trace_state);
+
+Note that synth_event_add_next_val() and synth_event_add_val() are
+incompatible if used within the same trace of an event - either one
+can be used but not both at the same time.
+
+Finally, the event won't be actually traced until it's 'closed',
+which is done using synth_event_trace_end(), which takes only the
+struct synth_trace_state object used in the previous calls:
+
+       ret = synth_event_trace_end(&trace_state);
+
+Note that synth_event_trace_end() must be called at the end regardless
+of whether any of the add calls failed (say due to a bad field name
+being passed in).
+
+6.3.4 Dyamically creating kprobe and kretprobe event definitions
+----------------------------------------------------------------
+
+To create a kprobe or kretprobe trace event from kernel code, the
+kprobe_event_gen_cmd_start() or kretprobe_event_gen_cmd_start()
+functions can be used.
+
+To create a kprobe event, an empty or partially empty kprobe event
+should first be created using kprobe_event_gen_cmd_start().  The name
+of the event and the probe location should be specfied along with one
+or args each representing a probe field should be supplied to this
+function.  Before calling kprobe_event_gen_cmd_start(), the user
+should create and initialize a dynevent_cmd object using
+kprobe_event_cmd_init().
+
+For example, to create a new "schedtest" kprobe event with two fields:
+
+  struct dynevent_cmd cmd;
+  char *buf;
+
+  /* Create a buffer to hold the generated command */
+  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
+
+  /* Before generating the command, initialize the cmd object */
+  kprobe_event_cmd_init(&cmd, buf, MAX_DYNEVENT_CMD_LEN);
+
+  /*
+   * Define the gen_kprobe_test event with the first 2 kprobe
+   * fields.
+   */
+  ret = kprobe_event_gen_cmd_start(&cmd, "gen_kprobe_test", "do_sys_open",
+                                   "dfd=%ax", "filename=%dx");
+
+Once the kprobe event object has been created, it can then be
+populated with more fields.  Fields can be added using
+kprobe_event_add_fields(), supplying the dynevent_cmd object along
+with a variable arg list of probe fields.  For example, to add a
+couple additional fields, the following call could be made:
+
+  ret = kprobe_event_add_fields(&cmd, "flags=%cx", "mode=+4($stack)");
+
+Once all the fields have been added, the event should be finalized and
+registered by calling the kprobe_event_gen_cmd_end() or
+kretprobe_event_gen_cmd_end() functions, depending on whether a kprobe
+or kretprobe command was started:
+
+  ret = kprobe_event_gen_cmd_end(&cmd);
+
+or
+
+  ret = kretprobe_event_gen_cmd_end(&cmd);
+
+At this point, the event object is ready to be used for tracing new
+events.
+
+Similarly, a kretprobe event can be created using
+kretprobe_event_gen_cmd_start() with a probe name and location and
+additional params such as $retval:
+
+  ret = kretprobe_event_gen_cmd_start(&cmd, "gen_kretprobe_test",
+                                      "do_sys_open", "$retval");
+
+Similar to the synthetic event case, code like the following can be
+used to enable the newly created kprobe event:
+
+  gen_kprobe_test = trace_get_event_file(NULL, "kprobes", "gen_kprobe_test");
+
+  ret = trace_array_set_clr_event(gen_kprobe_test->tr,
+                                  "kprobes", "gen_kprobe_test", true);
+
+Finally, also similar to synthetic events, the following code can be
+used to give the kprobe event file back and delete the event:
+
+  trace_put_event_file(gen_kprobe_test);
+
+  ret = kprobe_event_delete("gen_kprobe_test");
+
+6.3.4 The "dynevent_cmd" low-level API
+--------------------------------------
+
+Both the in-kernel synthetic event and kprobe interfaces are built on
+top of a lower-level "dynevent_cmd" interface.  This interface is
+meant to provide the basis for higher-level interfaces such as the
+synthetic and kprobe interfaces, which can be used as examples.
+
+The basic idea is simple and amounts to providing a general-purpose
+layer that can be used to generate trace event commands.  The
+generated command strings can then be passed to the command-parsing
+and event creation code that already exists in the trace event
+subystem for creating the corresponding trace events.
+
+In a nutshell, the way it works is that the higher-level interface
+code creates a struct dynevent_cmd object, then uses a couple
+functions, dynevent_arg_add() and dynevent_arg_pair_add() to build up
+a command string, which finally causes the command to be executed
+using the dynevent_create() function.  The details of the interface
+are described below.
+
+The first step in building a new command string is to create and
+initialize an instance of a dynevent_cmd.  Here, for instance, we
+create a dynevent_cmd on the stack and initialize it:
+
+  struct dynevent_cmd cmd;
+  char *buf;
+  int ret;
+
+  buf = kzalloc(MAX_DYNEVENT_CMD_LEN, GFP_KERNEL);
+
+  dynevent_cmd_init(cmd, buf, maxlen, DYNEVENT_TYPE_FOO,
+                    foo_event_run_command);
+
+The dynevent_cmd initialization needs to be given a user-specified
+buffer and the length of the buffer (MAX_DYNEVENT_CMD_LEN can be used
+for this purpose - at 2k it's generally too big to be comfortably put
+on the stack, so is dynamically allocated), a dynevent type id, which
+is meant to be used to check that further API calls are for the
+correct command type, and a pointer to an event-specific run_command()
+callback that will be called to actually execute the event-specific
+command function.
+
+Once that's done, the command string can by built up by successive
+calls to argument-adding functions.
+
+To add a single argument, define and initialize a struct dynevent_arg
+or struct dynevent_arg_pair object.  Here's an example of the simplest
+possible arg addition, which is simply to append the given string as
+a whitespace-separated argument to the command:
+
+  struct dynevent_arg arg;
+
+  dynevent_arg_init(&arg, NULL, 0);
+
+  arg.str = name;
+
+  ret = dynevent_arg_add(cmd, &arg);
+
+The arg object is first initialized using dynevent_arg_init() and in
+this case the parameters are NULL or 0, which means there's no
+optional sanity-checking function or separator appended to the end of
+the arg.
+
+Here's another more complicated example using an 'arg pair', which is
+used to create an argument that consists of a couple components added
+together as a unit, for example, a 'type field_name;' arg or a simple
+expression arg e.g. 'flags=%cx':
+
+  struct dynevent_arg_pair arg_pair;
+
+  dynevent_arg_pair_init(&arg_pair, dynevent_foo_check_arg_fn, 0, ';');
+
+  arg_pair.lhs = type;
+  arg_pair.rhs = name;
+
+  ret = dynevent_arg_pair_add(cmd, &arg_pair);
+
+Again, the arg_pair is first initialized, in this case with a callback
+function used to check the sanity of the args (for example, that
+neither part of the pair is NULL), along with a character to be used
+to add an operator between the pair (here none) and a separator to be
+appended onto the end of the arg pair (here ';').
+
+There's also a dynevent_str_add() function that can be used to simply
+add a string as-is, with no spaces, delimeters, or arg check.
+
+Any number of dynevent_*_add() calls can be made to build up the string
+(until its length surpasses cmd->maxlen).  When all the arguments have
+been added and the command string is complete, the only thing left to
+do is run the command, which happens by simply calling
+dynevent_create():
+
+  ret = dynevent_create(&cmd);
+
+At that point, if the return value is 0, the dynamic event has been
+created and is ready to use.
+
+See the dynevent_cmd function definitions themselves for the details
+of the API.
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 04acd277c5f6..fa9e1c730f6a 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -19,6 +19,7 @@ Linux Tracing Technologies
    events-msr
    mmiotrace
    histogram
+   boottime-trace
    hwlat_detector
    intel_th
    stm
diff --git a/Documentation/trace/kprobetrace.rst b/Documentation/trace/kprobetrace.rst
index 55993055902c..705d73087099 100644
--- a/Documentation/trace/kprobetrace.rst
+++ b/Documentation/trace/kprobetrace.rst
@@ -97,6 +97,7 @@ which shows given pointer in "symbol+offset" style.
 For $comm, the default type is "string"; any other type is invalid.
 
 .. _user_mem_access:
+
 User Memory Access
 ------------------
 Kprobe events supports user-space memory access. For that purpose, you can use
author	Linus Torvalds <torvalds@linux-foundation.org>	2020-02-06 07:12:11 +0000
committer	Linus Torvalds <torvalds@linux-foundation.org>	2020-02-06 07:12:11 +0000
commit	e310396bb8d7db977a0e10ef7b5040e98b89c34c (patch)
tree	461357ada35c637292884f0af1dc9fc41c0203ff /Documentation
parent	Merge tag 'io_uring-5.6-2020-02-05' of git://git.kernel.dk/linux-block (diff)
parent	bootconfig: Show the number of nodes on boot message (diff)
download	linux-dev-e310396bb8d7db977a0e10ef7b5040e98b89c34c.tar.xz linux-dev-e310396bb8d7db977a0e10ef7b5040e98b89c34c.zip