path: root/kernel/trace
Age | Commit message | Author | Files | Lines
2025-10-15 | tracing: Stop fortify-string from warning in tracing_mark_raw_write() | Steven Rostedt | 1 | -2/+6
commit 54b91e54b113d4f15ab023a44f508251db6e22e7 upstream.

The way tracing_mark_raw_write() records its data is that it has the following structure:

  struct {
	struct trace_entry;
	int id;
	char buf[];
  };

But memcpy(&entry->id, buf, size) triggers the following warning when the size is greater than the id:

 ------------[ cut here ]------------
 memcpy: detected field-spanning write (size 6) of single field "&entry->id" at kernel/trace/trace.c:7458 (size 4)
 WARNING: CPU: 7 PID: 995 at kernel/trace/trace.c:7458 write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
 Modules linked in:
 CPU: 7 UID: 0 PID: 995 Comm: bash Not tainted 6.17.0-test-00007-g60b82183e78a-dirty #211 PREEMPT(voluntary)
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
 RIP: 0010:write_raw_marker_to_buffer.isra.0+0x1f9/0x2e0
 Code: 04 00 75 a7 b9 04 00 00 00 48 89 de 48 89 04 24 48 c7 c2 e0 b1 d1 b2 48 c7 c7 40 b2 d1 b2 c6 05 2d 88 6a 04 01 e8 f7 e8 bd ff <0f> 0b 48 8b 04 24 e9 76 ff ff ff 49 8d 7c 24 04 49 8d 5c 24 08 48
 RSP: 0018:ffff888104c3fc78 EFLAGS: 00010292
 RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 1ffffffff6b363b4 RDI: 0000000000000001
 RBP: ffff888100058a00 R08: ffffffffb041d459 R09: ffffed1020987f40
 R10: 0000000000000007 R11: 0000000000000001 R12: ffff888100bb9010
 R13: 0000000000000000 R14: 00000000000003e3 R15: ffff888134800000
 FS: 00007fa61d286740(0000) GS:ffff888286cad000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000560d28d509f1 CR3: 00000001047a4006 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  tracing_mark_raw_write+0x1fe/0x290
  ? __pfx_tracing_mark_raw_write+0x10/0x10
  ? security_file_permission+0x50/0xf0
  ? rw_verify_area+0x6f/0x4b0
  vfs_write+0x1d8/0xdd0
  ? __pfx_vfs_write+0x10/0x10
  ? __pfx_css_rstat_updated+0x10/0x10
  ? count_memcg_events+0xd9/0x410
  ? fdget_pos+0x53/0x5e0
  ksys_write+0x182/0x200
  ? __pfx_ksys_write+0x10/0x10
  ? do_user_addr_fault+0x4af/0xa30
  do_syscall_64+0x63/0x350
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7fa61d318687
 Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
 RSP: 002b:00007ffd87fe0120 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 00007fa61d286740 RCX: 00007fa61d318687
 RDX: 0000000000000006 RSI: 0000560d28d509f0 RDI: 0000000000000001
 RBP: 0000560d28d509f0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000006
 R13: 00007fa61d4715c0 R14: 00007fa61d46ee80 R15: 0000000000000000
  </TASK>
 ---[ end trace 0000000000000000 ]---

This is because fortify string sees that the size of entry->id is only 4 bytes, but it is writing more than that. But this is OK as the dynamic_array is allocated to handle that copy.

The size allocated on the ring buffer was actually a bit too big:

  size = sizeof(*entry) + cnt;

But cnt includes the 'id' and the buffer data, so adding cnt to the size of *entry actually allocates too much on the ring buffer.

Change the allocation to:

  size = struct_size(entry, buf, cnt - sizeof(entry->id));

and the memcpy() to unsafe_memcpy() with an added justification.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011112032.77be18e4@gandalf.local.home
Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
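The sizing arithmetic in this commit can be checked with a small user-space sketch. The struct and function names below (demo_trace_entry, demo_raw_entry, old_alloc_size, new_alloc_size) are hypothetical stand-ins, not the kernel's definitions, and new_alloc_size() only mimics what struct_size() computes, without its overflow checking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct trace_entry. */
struct demo_trace_entry {
	unsigned short type;
	unsigned char flags;
	unsigned char preempt_count;
	int pid;
};

/* Same shape as the raw marker record described in the commit message. */
struct demo_raw_entry {
	struct demo_trace_entry ent;
	int id;
	char buf[];
};

/* Old sizing: cnt already covers the id, so the id is counted twice. */
static size_t old_alloc_size(size_t cnt)
{
	return sizeof(struct demo_raw_entry) + cnt;
}

/*
 * New sizing, mimicking struct_size(entry, buf, cnt - sizeof(entry->id)).
 * Assumption: the caller has already rejected writes shorter than the id.
 */
static size_t new_alloc_size(size_t cnt)
{
	assert(cnt >= sizeof(int));
	return sizeof(struct demo_raw_entry) + (cnt - sizeof(int));
}
```

With a 6-byte write, as in the warning above, the old formula reserves sizeof(int) more bytes than the copy starting at the id field can ever touch.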
2025-10-15 | tracing: Fix tracing_mark_raw_write() to use buf and not ubuf | Steven Rostedt | 1 | -2/+2
commit bda745ee8fbb63330d8f2f2ea4157229a5df959e upstream.

The fix to use a per CPU buffer to read user space tested only the writes to trace_marker. But it appears that the selftests are missing tests for the trace_marker_raw file. The trace_marker_raw file is used by applications that write data structures and not strings into the file, and the tools read the raw ring buffer to process the structures they write.

The fix that reads the per CPU buffers passes the new per CPU buffer to the trace_marker file writes. The update to the trace_marker_raw write also read the data from user space into the per CPU buffer, but then still passed the user space address to the function that records the data.

Pass in the per CPU buffer and not the user space address.

TODO: Add a test to better test trace_marker_raw.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20251011035243.386098147@kernel.org
Fixes: 64cf7d058a00 ("tracing: Have trace_marker use per-cpu data to read user space")
Reported-by: syzbot+9a2ede1643175f350105@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68e973f5.050a0220.1186a4.0010.GAE@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-15 | tracing: Have trace_marker use per-cpu data to read user space | Steven Rostedt | 1 | -48/+220
commit 64cf7d058a005c5c31eb8a0b741f35dc12915d18 upstream.

It was reported that using __copy_from_user_inatomic() can actually schedule, which is bad when preemption is disabled. There is logic to check whether in_atomic() is set, but this is a nop when the kernel is configured with PREEMPT_NONE. The scheduling is due to page faulting, so the code could schedule with preemption disabled.

Link: https://lore.kernel.org/all/20250819105152.2766363-1-luogengkun@huaweicloud.com/

The solution was to change the __copy_from_user_inatomic() to copy_from_user_nofault(). But then it was reported that this caused a regression in Android. There are several applications writing into trace_marker in Android, but now instead of showing the expected data, it is showing:

  tracing_mark_write: <faulted>

After reverting the conversion to copy_from_user_nofault(), Android was able to get the data again.

Writes to the trace_marker are a way to efficiently and quickly enter data into the Linux tracing buffer. The write path takes no locks and was designed to be as non-intrusive as possible. This means it cannot allocate memory, and must use pre-allocated data.

A method that is actively being worked on to have faultable system call tracepoints read user space data is to allocate per CPU buffers, and use them in the callback. The method uses a technique similar to seqcount. That is something like this:

  preempt_disable();
  cpu = smp_processor_id();
  buffer = this_cpu_ptr(&pre_allocated_cpu_buffers, cpu);
  do {
	cnt = nr_context_switches_cpu(cpu);
	migrate_disable();
	preempt_enable();
	ret = copy_from_user(buffer, ptr, size);
	preempt_disable();
	migrate_enable();
  } while (!ret && cnt != nr_context_switches_cpu(cpu));

  if (!ret)
	ring_buffer_write(buffer);
  preempt_enable();

It's a little more involved than that, but the above is the basic logic. The idea is to acquire the current CPU buffer, disable migration, and then enable preemption. At this moment, it can safely use copy_from_user(). After reading the data from user space, it disables preemption again. It then checks to see if there was any new scheduling on this CPU. If there was, it must assume that the buffer was corrupted by another task. If there wasn't, then the buffer is still valid as only tasks in preemptable context can write to this buffer and only those that are running on the CPU.

By using this method, where trace_marker open allocates the per CPU buffers, trace_marker writes can access user space and even fault it in, without having to allocate or take any locks of its own.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Luo Gengkun <luogengkun@huaweicloud.com>
Cc: Wattson CI <wattson-external@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/20251008124510.6dba541a@gandalf.local.home
Fixes: 3d62ab32df065 ("tracing: Fix tracing_marker may trigger page fault during preempt_disable")
Reported-by: Runping Lai <runpinglai@google.com>
Tested-by: Runping Lai <runpinglai@google.com>
Closes: https://lore.kernel.org/linux-trace-kernel/20251007003417.3470979-2-runpinglai@google.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
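The retry idea behind the per CPU buffer can be illustrated outside the kernel. The sketch below is a single-threaded user-space simulation, assuming a hypothetical context-switch counter (demo_nr_switches) and a copy routine (demo_copy) that pretends another task scribbled on the buffer. It does not model preemption or migration control, only the "copy, then check the counter, retry if it moved" logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for one CPU's scratch buffer and switch counter. */
static char demo_cpu_buffer[64];
static unsigned long demo_nr_switches;
static int demo_preemptions_left;	/* how many copies get "preempted" */

/* Simulated copy that may be disturbed by another task using the buffer. */
static void demo_copy(char *dst, const char *src, size_t size)
{
	memcpy(dst, src, size);
	if (demo_preemptions_left > 0) {
		demo_preemptions_left--;
		demo_nr_switches++;	/* a context switch happened */
		memset(dst, 'X', size);	/* the other task overwrote the buffer */
	}
}

/* Copy until no context switch was observed; returns the attempt count. */
static int demo_read_stable(const char *src, size_t size)
{
	unsigned long cnt;
	int attempts = 0;

	assert(size <= sizeof(demo_cpu_buffer));
	do {
		cnt = demo_nr_switches;
		demo_copy(demo_cpu_buffer, src, size);
		attempts++;
	} while (cnt != demo_nr_switches);

	return attempts;
}
```

The buffer contents are only trusted once a whole copy completes with the counter unchanged, which is why no lock is needed around the buffer itself.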
2025-10-15 | tracing: Fix irqoff tracers on failure of acquiring calltime | Steven Rostedt | 1 | -13/+10
commit c834a97962c708ff5bb8582ca76b0e1225feb675 upstream. The functions irqsoff_graph_entry() and irqsoff_graph_return() both call func_prolog_dec() that will test if the data->disable is already set and if not, increment it and return. If it was set, it returns false and the caller exits. The caller of this function must decrement the disable counter, but misses doing so if the calltime fails to be acquired. Instead of exiting out when calltime is NULL, change the logic to do the work if it is not NULL and still do the clean up at the end of the function if it is NULL. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008114943.6f60f30f@gandalf.local.home Fixes: a485ea9e3ef3 ("tracing: Fix irqsoff and wakeup latency tracers when using function graph") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-2-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-15 | tracing: Fix wakeup tracers on failure of acquiring calltime | Steven Rostedt | 1 | -10/+6
commit 4f7bf54b07e5acf79edd58dafede4096854776cd upstream. The functions wakeup_graph_entry() and wakeup_graph_return() both call func_prolog_preempt_disable() that will test if the data->disable is already set and if not, increment it and disable preemption. If it was set, it returns false and the caller exits. The caller of this function must decrement the disable counter, but misses doing so if the calltime fails to be acquired. Instead of exiting out when calltime is NULL, change the logic to do the work if it is not NULL and still do the clean up at the end of the function if it is NULL. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008114835.027b878a@gandalf.local.home Fixes: a485ea9e3ef3 ("tracing: Fix irqsoff and wakeup latency tracers when using function graph") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-1-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
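The control-flow change described in this commit and in the irqsoff one before it can be sketched as follows. This is a simplified user-space model with hypothetical names (demo_disabled standing in for the disable counter, demo_recorded for the traced work), not the tracer code itself.

```c
#include <assert.h>
#include <stddef.h>

static int demo_disabled;	/* stand-in for the disable counter */
static int demo_recorded;	/* number of events "recorded" */

/* Buggy shape: the early return skips the decrement. */
static void demo_return_buggy(const unsigned long long *calltime)
{
	demo_disabled++;		/* done by the prolog helper */
	if (!calltime)
		return;			/* counter stays elevated */
	demo_recorded++;
	demo_disabled--;
}

/* Fixed shape: do the work only if calltime exists, always clean up. */
static void demo_return_fixed(const unsigned long long *calltime)
{
	demo_disabled++;
	if (calltime)
		demo_recorded++;
	demo_disabled--;
}
```

In the buggy shape a single NULL calltime leaves the counter at 1, so every later call would see the tracer as disabled.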
2025-10-15 | tracing: Fix race condition in kprobe initialization causing NULL pointer dereference | Yuan Chen | 4 | -14/+28
commit 9cf9aa7b0acfde7545c1a1d912576e9bab28dc6f upstream. There is a critical race condition in kprobe initialization that can lead to NULL pointer dereference and kernel crash. [1135630.084782] Unable to handle kernel paging request at virtual address 0000710a04630000 ... [1135630.260314] pstate: 404003c9 (nZcv DAIF +PAN -UAO) [1135630.269239] pc : kprobe_perf_func+0x30/0x260 [1135630.277643] lr : kprobe_dispatcher+0x44/0x60 [1135630.286041] sp : ffffaeff4977fa40 [1135630.293441] x29: ffffaeff4977fa40 x28: ffffaf015340e400 [1135630.302837] x27: 0000000000000000 x26: 0000000000000000 [1135630.312257] x25: ffffaf029ed108a8 x24: ffffaf015340e528 [1135630.321705] x23: ffffaeff4977fc50 x22: ffffaeff4977fc50 [1135630.331154] x21: 0000000000000000 x20: ffffaeff4977fc50 [1135630.340586] x19: ffffaf015340e400 x18: 0000000000000000 [1135630.349985] x17: 0000000000000000 x16: 0000000000000000 [1135630.359285] x15: 0000000000000000 x14: 0000000000000000 [1135630.368445] x13: 0000000000000000 x12: 0000000000000000 [1135630.377473] x11: 0000000000000000 x10: 0000000000000000 [1135630.386411] x9 : 0000000000000000 x8 : 0000000000000000 [1135630.395252] x7 : 0000000000000000 x6 : 0000000000000000 [1135630.403963] x5 : 0000000000000000 x4 : 0000000000000000 [1135630.412545] x3 : 0000710a04630000 x2 : 0000000000000006 [1135630.421021] x1 : ffffaeff4977fc50 x0 : 0000710a04630000 [1135630.429410] Call trace: [1135630.434828] kprobe_perf_func+0x30/0x260 [1135630.441661] kprobe_dispatcher+0x44/0x60 [1135630.448396] aggr_pre_handler+0x70/0xc8 [1135630.454959] kprobe_breakpoint_handler+0x140/0x1e0 [1135630.462435] brk_handler+0xbc/0xd8 [1135630.468437] do_debug_exception+0x84/0x138 [1135630.475074] el1_dbg+0x18/0x8c [1135630.480582] security_file_permission+0x0/0xd0 [1135630.487426] vfs_write+0x70/0x1c0 [1135630.493059] ksys_write+0x5c/0xc8 [1135630.498638] __arm64_sys_write+0x24/0x30 [1135630.504821] el0_svc_common+0x78/0x130 [1135630.510838] el0_svc_handler+0x38/0x78 [1135630.516834] 
el0_svc+0x8/0x1b0

kernel/trace/trace_kprobe.c: 1308
0xffff3df8995039ec <kprobe_perf_func+0x2c>: ldr x21, [x24,#120]
include/linux/compiler.h: 294
0xffff3df8995039f0 <kprobe_perf_func+0x30>: ldr x1, [x21,x0]

kernel/trace/trace_kprobe.c
1308: head = this_cpu_ptr(call->perf_events);
1309: if (hlist_empty(head))
1310:	return 0;

crash> struct trace_event_call -o
struct trace_event_call {
  ...
  [120] struct hlist_head *perf_events; //(call->perf_event)
  ...
}

crash> struct trace_event_call ffffaf015340e528
struct trace_event_call {
  ...
  perf_events = 0xffff0ad5fa89f088, //this value is correct, but x21 = 0
  ...
}

Race Condition Analysis:

The race occurs between kprobe activation and perf_events initialization:

  CPU0                                    CPU1
  ====                                    ====
  perf_kprobe_init
    perf_trace_event_init
      tp_event->perf_events = list;(1)
      tp_event->class->reg (2)← KPROBE ACTIVE
                                          Debug exception triggers
                                          ...
                                          kprobe_dispatcher
                                            kprobe_perf_func (tk->tp.flags & TP_FLAG_PROFILE)
                                              head = this_cpu_ptr(call->perf_events)(3)
                                              (perf_events is still NULL)

Problem:
1. CPU0 executes (1) assigning tp_event->perf_events = list
2. CPU0 executes (2) enabling kprobe functionality via class->reg()
3. CPU1 triggers and reaches kprobe_dispatcher
4. CPU1 checks TP_FLAG_PROFILE - condition passes (step 2 completed)
5. CPU1 calls kprobe_perf_func() and crashes at (3) because call->perf_events is still NULL

CPU1 sees that kprobe functionality is enabled but does not see that perf_events has been assigned.

Add pairing read and write memory barriers to guarantee that if CPU1 sees that kprobe functionality is enabled, it must also see that perf_events has been assigned.

Link: https://lore.kernel.org/all/20251001022025.44626-1-chenyuan_fl@163.com/
Fixes: 50d780560785 ("tracing/kprobes: Add probe handler dispatcher to support perf and ftrace concurrent use")
Cc: stable@vger.kernel.org
Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
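The ordering guarantee the commit asks for, "if the flag is visible, the pointer must be visible too", has a user-space analogue in C11 release/acquire atomics. The sketch below uses hypothetical names (demo_event, demo_enable, demo_handler) and C11 atomics rather than the kernel's own barrier primitives, so it illustrates the pairing only and is not the patch itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_FLAG_PROFILE 0x1u

struct demo_event {
	int *perf_events;	/* payload, written first */
	atomic_uint flags;	/* published last */
};

/* Writer side: store the payload, then publish the flag with release. */
static void demo_enable(struct demo_event *ev, int *list)
{
	assert(list != NULL);
	ev->perf_events = list;
	atomic_fetch_or_explicit(&ev->flags, DEMO_FLAG_PROFILE,
				 memory_order_release);
}

/* Reader side: acquire the flag; only then is the payload safe to read. */
static int *demo_handler(struct demo_event *ev)
{
	unsigned int flags = atomic_load_explicit(&ev->flags,
						  memory_order_acquire);

	if (!(flags & DEMO_FLAG_PROFILE))
		return NULL;
	return ev->perf_events;
}
```

The release on the writer and the acquire on the reader must come as a pair; dropping either side reintroduces the window shown in the CPU0/CPU1 diagram.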
2025-10-15 | tracing: Fix lock imbalance in s_start() memory allocation failure path | Sasha Levin | 1 | -2/+1
commit 61e19cd2e5c5235326a13a68df1a2f8ec4eeed7b upstream. When s_start() fails to allocate memory for set_event_iter, it returns NULL before acquiring event_mutex. However, the corresponding s_stop() function always tries to unlock the mutex, causing a lock imbalance warning: WARNING: bad unlock balance detected! 6.17.0-rc7-00175-g2b2e0c04f78c #7 Not tainted ------------------------------------- syz.0.85611/376514 is trying to release lock (event_mutex) at: [<ffffffff8dafc7a4>] traverse.part.0.constprop.0+0x2c4/0x650 fs/seq_file.c:131 but there are no more locks to release! The issue was introduced by commit b355247df104 ("tracing: Cache ':mod:' events for modules not loaded yet") which added the kzalloc() allocation before the mutex lock, creating a path where s_start() could return without locking the mutex while s_stop() would still try to unlock it. Fix this by unconditionally acquiring the mutex immediately after allocation, regardless of whether the allocation succeeded. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20250929113238.3722055-1-sashal@kernel.org Fixes: b355247df104 ("tracing: Cache ":mod:" events for modules not loaded yet") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
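The start/stop pairing that this fix restores can be modeled with a lock depth counter. Everything here is a hypothetical user-space stand-in (demo_start, demo_stop, demo_lock_depth); the real functions are seq_file callbacks and take event_mutex.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_lock_depth;	/* models how many times the mutex is held */

static void demo_lock(void)   { demo_lock_depth++; }
static void demo_unlock(void) { demo_lock_depth--; }

/*
 * Fixed shape: allocate first, then take the lock unconditionally,
 * so the stop callback can always unlock, even when this returns NULL.
 */
static void *demo_start(size_t size, int fail_alloc)
{
	void *iter = fail_alloc ? NULL : calloc(1, size);

	demo_lock();
	return iter;
}

/* The stop callback always unlocks, matching the start callback. */
static void demo_stop(void *iter)
{
	free(iter);		/* free(NULL) is a no-op */
	demo_unlock();
}
```

Taking the lock before checking the allocation result keeps the two callbacks symmetric, which is simpler than teaching the stop side to detect the failure case.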
2025-10-15 | bpf: Remove migrate_disable in kprobe_multi_link_prog_run | Tao Chen | 1 | -2/+7
[ Upstream commit abdaf49be5424db74e19d167c10d7dad79a0efc2 ]

The graph tracer framework ensures that we won't migrate: kprobe_multi_link_prog_run() is called all the way from the graph tracer, which disables preemption in function_graph_enter_regs(). As Jiri and Yonghong suggested, there is no need to use migrate_disable(). As a result, some overhead may be reduced. Also add a cant_sleep() check for __this_cpu_inc_return().

Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250814121430.2347454-1-chen.dylane@linux.dev
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-10-12 | ring buffer: Propagate __rb_map_vma return value to caller | Ankit Khushwaha | 1 | -1/+1
commit de4cbd704731778a2dc833ce5a24b38e5d672c05 upstream. The return value from `__rb_map_vma()`, which rejects writable or executable mappings (VM_WRITE, VM_EXEC, or !VM_MAYSHARE), was being ignored. As a result the caller of `__rb_map_vma` always returned 0 even when the mapping had actually failed, allowing it to proceed with an invalid VMA. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20251008172516.20697-1-ankitkhushwaha.linux@gmail.com Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions") Reported-by: syzbot+ddc001b92c083dbf2b97@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?id=194151be8eaebd826005329b2e123aecae714bdb Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-28 | Merge tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace | Linus Torvalds | 2 | -1/+14
Pull tracing fixes from Steven Rostedt:

 - Fix buffer overflow in osnoise_cpus_write()

   The buffer allocated to read from user space did not get a nul terminating byte after the string was copied from user space. The string is then parsed, and if user space did not add a nul byte, the read continues beyond the string. Add a nul terminating byte after reading the string.

 - Fix missing check for lockdown on tracing

   There's a path from kprobe events or uprobe events that can update the tracing system even if lockdown on tracing is active. Add a check in the dynamic event path.

 - Add a recursion check for the function graph return path

   Now that fprobes can hook to the function graph tracer and call different code between the entry and the exit, the exit code may now call functions that are not called in entry. This means that the exit handler can possibly trigger recursion that is not caught and cause the system to crash. Add the same recursion checks in the function exit handler as exist in the entry handler path.

* tag 'trace-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: fgraph: Protect return handler from recursion loop
  tracing: dynevent: Add a missing lockdown check on dynevent
  tracing/osnoise: Fix slab-out-of-bounds in _parse_integer_limit()
2025-09-27 | tracing: fgraph: Protect return handler from recursion loop | Masami Hiramatsu (Google) | 1 | -0/+12
function_graph_enter_regs() prevents itself from recursion by ftrace_test_recursion_trylock(), but __ftrace_return_to_handler(), which is called at the exit, does not prevent such recursion. Therefore, while it can prevent recursive calls from fgraph_ops::entryfunc(), it is not able to prevent recursive calls to fgraph from fgraph_ops::retfunc(), resulting in a recursive loop. This can lead to an unexpected recursion bug, as reported by Menglong:

  is_endbr() is called in __ftrace_return_to_handler -> fprobe_return -> kprobe_multi_link_exit_handler -> is_endbr.

To fix this issue, acquire ftrace_test_recursion_trylock() in __ftrace_return_to_handler() after unwinding the shadow stack, so that recursive calls into fgraph from inside a user-defined fgraph_ops::retfunc() are prevented.

This is essentially a fix to commit 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer"), because before that fgraph was only used from the function graph tracer. After that commit, fprobe allowed users to run any callbacks from fgraph.

Reported-by: Menglong Dong <menglong8.dong@gmail.com>
Closes: https://lore.kernel.org/all/20250918120939.1706585-1-dongml2@chinatelecom.cn/
Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer")
Cc: stable@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/175852292275.307379.9040117316112640553.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Menglong Dong <menglong8.dong@gmail.com>
Acked-by: Menglong Dong <menglong8.dong@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
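A trylock-style recursion guard of the kind this commit adds to the return path can be sketched in a few lines. The names (demo_recursion_trylock, demo_retfunc, demo_return_handler) are hypothetical, and the single boolean stands in for the per-context recursion bits that the real ftrace helper manages.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_in_handler;	/* stand-in for a recursion bit */
static int demo_calls;		/* callbacks that actually ran */
static int demo_skipped;	/* re-entries rejected by the guard */

static bool demo_recursion_trylock(void)
{
	if (demo_in_handler)
		return false;
	demo_in_handler = true;
	return true;
}

static void demo_recursion_unlock(void)
{
	demo_in_handler = false;
}

static void demo_return_handler(void);

/* A user callback that itself ends up in a traced function. */
static void demo_retfunc(void)
{
	demo_calls++;
	demo_return_handler();	/* would loop forever without the guard */
}

static void demo_return_handler(void)
{
	if (!demo_recursion_trylock()) {
		demo_skipped++;
		return;
	}
	demo_retfunc();
	demo_recursion_unlock();
}
```

One outer call runs the callback once and rejects the nested entry once, instead of recursing until the stack overflows.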
2025-09-24 | Merge tag 'probes-fixes-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace | Linus Torvalds | 2 | -3/+8
Pull probes fixes from Masami Hiramatsu:

 - fprobe: Even if there is a memory allocation failure, try to remove the addresses recorded until then from the filter. Previously we just skipped it.

 - tracing: dynevent: Add a missing lockdown check on dynevent. This dynevent is the interface for all probe events. Thus if there is no check, any probe events can be added after tracefs has been locked down.

* tag 'probes-fixes-v6.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: dynevent: Add a missing lockdown check on dynevent
  tracing: fprobe: Fix to remove recorded module addresses from filter
2025-09-25 | tracing: dynevent: Add a missing lockdown check on dynevent | Masami Hiramatsu (Google) | 1 | -0/+4
Since dynamic_events interface on tracefs is compatible with kprobe_events and uprobe_events, it should also check the lockdown status and reject if it is set. Link: https://lore.kernel.org/all/175824455687.45175.3734166065458520748.stgit@devnote2/ Fixes: 17911ff38aa5 ("tracing: Add locked_down checks to the open calls of files created for tracefs") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org
2025-09-24 | tracing: fprobe: Fix to remove recorded module addresses from filter | Masami Hiramatsu (Google) | 1 | -3/+4
Even if there is a memory allocation failure in fprobe_addr_list_add(), there is a partial list of module addresses. So remove the recorded addresses from the filter if any exist. This also removes the redundant ret local variable.

Fixes: a3dc2983ca7b ("tracing: fprobe: Cleanup fprobe hash when module unloading")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Menglong Dong <menglong8.dong@gmail.com>
2025-09-23 | tracing: dynevent: Add a missing lockdown check on dynevent | Masami Hiramatsu (Google) | 1 | -0/+4
Since dynamic_events interface on tracefs is compatible with kprobe_events and uprobe_events, it should also check the lockdown status and reject if it is set. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/175824455687.45175.3734166065458520748.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-23 | tracing/osnoise: Fix slab-out-of-bounds in _parse_integer_limit() | Wang Liang | 1 | -1/+2
When configuring osnoise cpus via the write() syscall, the following KASAN splat may be observed:

BUG: KASAN: slab-out-of-bounds in _parse_integer_limit+0x103/0x130
Read of size 1 at addr ffff88810121e3a1 by task test/447
CPU: 1 UID: 0 PID: 447 Comm: test Not tainted 6.17.0-rc6-dirty #288 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x55/0x70
 print_report+0xcb/0x610
 kasan_report+0xb8/0xf0
 _parse_integer_limit+0x103/0x130
 bitmap_parselist+0x16d/0x6f0
 osnoise_cpus_write+0x116/0x2d0
 vfs_write+0x21e/0xcc0
 ksys_write+0xee/0x1c0
 do_syscall_64+0xa8/0x2a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 </TASK>

This issue can be reproduced by the code below:

  const char *cpulist = "1";
  int fd = open("/sys/kernel/debug/tracing/osnoise/cpus", O_WRONLY);
  write(fd, cpulist, strlen(cpulist));

The function bitmap_parselist() is called to parse cpulist, and it requires that the parameter 'buf' be terminated with a '\0' or '\n'. Fix this issue by adding a '\0' to 'buf' in osnoise_cpus_write().

Cc: <mhiramat@kernel.org>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <tglozar@redhat.com>
Link: https://lore.kernel.org/20250916063948.3154627-1-wangliang74@huawei.com
Fixes: 17f89102fe23 ("tracing/osnoise: Allow arbitrarily long CPU string")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
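The general pattern behind this fix, copying an unterminated byte range and appending the terminator before handing it to a string parser, looks roughly like this in user space. demo_copy_terminated() is a hypothetical helper using malloc() and memcpy() in place of the kernel's allocation and user-copy routines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy exactly count bytes from a buffer that is not guaranteed to be
 * terminated, and append '\0' so string parsers stop at the right place.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *demo_copy_terminated(const char *ubuf, size_t count)
{
	char *buf = malloc(count + 1);	/* one extra byte for the '\0' */

	if (!buf)
		return NULL;
	memcpy(buf, ubuf, count);
	buf[count] = '\0';
	return buf;
}
```

Without the extra byte, a parser such as strtol() would keep reading past the copied data until it happened to hit a terminator, which is the out-of-bounds read KASAN reported.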
2025-09-18 | Merge tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace | Linus Torvalds | 2 | -2/+6
Pull runtime verifier fixes from Steven Rostedt:

 - Fix build in some RISC-V flavours

   Some system calls are only available on 64-bit RISC-V machines. #ifdef out the cases of clock_nanosleep and futex in the sleep monitor if they are not supported by the architecture.

 - Fix wrong cast, obsolete after refactoring

   Use container_of() to get to the rv_monitor structure from the enabled_monitors_next() 'p' pointer. The assignment worked only because the list field used happened to be the first field of the structure.

 - Remove redundant include files

   Some include files were listed twice. Remove the extra ones and sort the includes.

 - Fix missing unlock on failure

   There was an error path that exited the rv_register_monitor() function without releasing a lock. Change that to goto the lock release.

 - Add Gabriele Monaco to be Runtime Verifier maintainer

   Gabriele is doing most of the work on RV as well as collecting patches. Add him to the maintainers file for Runtime Verification.

* tag 'trace-rv-v6.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Add Gabriele Monaco as maintainer for Runtime Verification
  rv: Fix missing mutex unlock in rv_register_monitor()
  include/linux/rv.h: remove redundant include file
  rv: Fix wrong type cast in enabled_monitors_next()
  rv: Support systems with time64-only syscalls
2025-09-18 | tracing: kprobe-event: Fix null-ptr-deref in trace_kprobe_create_internal() | Wang Liang | 1 | -0/+2
A crash was observed with the following output:

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 2899 Comm: syz.2.399 Not tainted 6.17.0-rc5+ #5 PREEMPT(none)
RIP: 0010:trace_kprobe_create_internal+0x3fc/0x1440 kernel/trace/trace_kprobe.c:911
Call Trace:
 <TASK>
 trace_kprobe_create_cb+0xa2/0xf0 kernel/trace/trace_kprobe.c:1089
 trace_probe_create+0xf1/0x110 kernel/trace/trace_probe.c:2246
 dyn_event_create+0x45/0x70 kernel/trace/trace_dynevent.c:128
 create_or_delete_trace_kprobe+0x5e/0xc0 kernel/trace/trace_kprobe.c:1107
 trace_parse_run_command+0x1a5/0x330 kernel/trace/trace.c:10785
 vfs_write+0x2b6/0xd00 fs/read_write.c:684
 ksys_write+0x129/0x240 fs/read_write.c:738
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x5d/0x2d0 arch/x86/entry/syscall_64.c:94
 </TASK>

The function kmemdup() may return NULL in trace_kprobe_create_internal(); add a check for its return value.

Link: https://lore.kernel.org/all/20250916075816.3181175-1-wangliang74@huawei.com/
Fixes: 33b4e38baa03 ("tracing: kprobe-event: Allocate string buffers from heap")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-09-15 | rv: Fix missing mutex unlock in rv_register_monitor() | Zhen Ni | 1 | -1/+1
If create_monitor_dir() fails, the function returns directly without releasing rv_interface_lock. This leaves the mutex locked and causes subsequent monitor registration attempts to deadlock. Fix it by making the error path jump to out_unlock, ensuring that the mutex is always released before returning. Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor") Signed-off-by: Zhen Ni <zhen.ni@easystack.cn> Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20250903065112.1878330-1-zhen.ni@easystack.cn Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
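The single-exit error handling that the fix adopts can be sketched as below. demo_register() and demo_mutex_held are hypothetical; the integer flag only models whether the mutex is held when the function returns.

```c
#include <assert.h>

static int demo_mutex_held;	/* 1 while the modeled mutex is held */

/* Every path after the lock is taken leaves through out_unlock. */
static int demo_register(int dir_fails)
{
	int retval = 0;

	demo_mutex_held = 1;		/* models taking the mutex */

	if (dir_fails) {
		retval = -1;
		goto out_unlock;	/* not a bare "return retval;" */
	}

	/* ... the rest of the registration would go here ... */

out_unlock:
	demo_mutex_held = 0;		/* models releasing the mutex */
	return retval;
}
```

A direct return inside the locked region is what left the lock held, so the next caller would block forever.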
2025-09-15 | rv: Fix wrong type cast in enabled_monitors_next() | Nam Cao | 1 | -1/+1
Argument 'p' of enabled_monitors_next() is not a pointer to struct rv_monitor, it is actually a pointer to the list_head inside struct rv_monitor. Therefore it is wrong to cast 'p' to struct rv_monitor *. This wrong type cast has been there since the beginning. But it still worked because the list_head was the first field in struct rv_monitor_def. This is no longer true since commit 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor") moved the list_head, and this wrong type cast became a functional problem. Properly use container_of() instead. Fixes: 24cbfe18d55a ("rv: Merge struct rv_monitor_def into struct rv_monitor") Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20250806120911.989365-1-namcao@linutronix.de Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
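Why the plain cast stopped working once the list_head was no longer the first field can be seen in a small user-space sketch. The types and the demo_container_of() macro are hypothetical simplifications; the kernel's container_of() adds type checking that is omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct demo_list_head {
	struct demo_list_head *next, *prev;
};

struct demo_monitor {
	const char *name;		/* the list is not the first field */
	struct demo_list_head list;
};

/* Simplified container_of(): step back from the member to its owner. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Correct way to recover the monitor from a pointer to its list node. */
static struct demo_monitor *demo_from_list(struct demo_list_head *p)
{
	assert(p != NULL);
	return demo_container_of(p, struct demo_monitor, list);
}
```

A plain `(struct demo_monitor *)p` would point offsetof(struct demo_monitor, list) bytes past the real start of the structure, which only happens to be zero when the list is the first member.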
2025-09-15 | rv: Support systems with time64-only syscalls | Palmer Dabbelt | 1 | -0/+4
Some systems (like 32-bit RISC-V) only have the 64-bit time_t versions of syscalls. So handle the 32-bit time_t version of those being undefined. Fixes: f74f8bb246cf ("rv: Add rtapp_sleep monitor") Closes: https://lore.kernel.org/oe-kbuild-all/202508160204.SsFyNfo6-lkp@intel.com Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com> Acked-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20250804194518.97620-2-palmer@dabbelt.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-09-08 | tracing: Silence warning when chunk allocation fails in trace_pid_write | Pu Lehui | 1 | -1/+5
Syzkaller trigger a fault injection warning: WARNING: CPU: 1 PID: 12326 at tracepoint_add_func+0xbfc/0xeb0 Modules linked in: CPU: 1 UID: 0 PID: 12326 Comm: syz.6.10325 Tainted: G U 6.14.0-rc5-syzkaller #0 Tainted: [U]=USER Hardware name: Google Compute Engine/Google Compute Engine RIP: 0010:tracepoint_add_func+0xbfc/0xeb0 kernel/tracepoint.c:294 Code: 09 fe ff 90 0f 0b 90 0f b6 74 24 43 31 ff 41 bc ea ff ff ff RSP: 0018:ffffc9000414fb48 EFLAGS: 00010283 RAX: 00000000000012a1 RBX: ffffffff8e240ae0 RCX: ffffc90014b78000 RDX: 0000000000080000 RSI: ffffffff81bbd78b RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffffffffef R13: 0000000000000000 R14: dffffc0000000000 R15: ffffffff81c264f0 FS: 00007f27217f66c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2e80dff8 CR3: 00000000268f8000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> tracepoint_probe_register_prio+0xc0/0x110 kernel/tracepoint.c:464 register_trace_prio_sched_switch include/trace/events/sched.h:222 [inline] register_pid_events kernel/trace/trace_events.c:2354 [inline] event_pid_write.isra.0+0x439/0x7a0 kernel/trace/trace_events.c:2425 vfs_write+0x24c/0x1150 fs/read_write.c:677 ksys_write+0x12b/0x250 fs/read_write.c:731 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f We can reproduce the warning by following the steps below: 1. echo 8 >> set_event_notrace_pid. Let tr->filtered_pids owns one pid and register sched_switch tracepoint. 2. echo ' ' >> set_event_pid, and perform fault injection during chunk allocation of trace_pid_list_alloc. Let pid_list with no pid and assign to tr->filtered_pids. 3. echo ' ' >> set_event_pid. 
This leaves pid_list NULL, and it is assigned to tr->filtered_pids.
4. echo 9 >> set_event_pid, which triggers the warning about double registration of the sched_switch tracepoint.

The reason is that syzkaller injects a fault into the chunk allocation in trace_pid_list_alloc, causing a failure in trace_pid_list_set, which may trigger a double registration of the same tracepoint. This only occurs when the system is about to crash, but to suppress this warning, add failure handling logic to trace_pid_list_set.

Link: https://lore.kernel.org/20250908024658.2390398-1-pulehui@huaweicloud.com
Fixes: 8d6e90983ade ("tracing: Create a sparse bitmask for pid filtering")
Reported-by: syzbot+161412ccaeff20ce4dde@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67cb890e.050a0220.d8275.022e.GAE@google.com
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-06  tracing/osnoise: Fix null-ptr-deref in bitmap_parselist()  Wang Liang  1  -0/+3
A crash was observed with the following output:

BUG: kernel NULL pointer dereference, address: 0000000000000010
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 92 Comm: osnoise_cpus Not tainted 6.17.0-rc4-00201-gd69eb204c255 #138 PREEMPT(voluntary)
RIP: 0010:bitmap_parselist+0x53/0x3e0
Call Trace:
 <TASK>
 osnoise_cpus_write+0x7a/0x190
 vfs_write+0xf8/0x410
 ? do_sys_openat2+0x88/0xd0
 ksys_write+0x60/0xd0
 do_syscall_64+0xa4/0x260
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 </TASK>

This issue can be reproduced with the code below:

fd=open("/sys/kernel/debug/tracing/osnoise/cpus", O_WRONLY);
write(fd, "0-2", 0);

When the user passes 'count=0' to osnoise_cpus_write(), kmalloc() returns ZERO_SIZE_PTR (16) and cpulist_parse() treats it as a normal value, which triggers the null pointer dereference. Add a check for the parameter 'count'.

Cc: <mhiramat@kernel.org>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <tglozar@redhat.com>
Link: https://lore.kernel.org/20250906035610.3880282-1-wangliang74@huawei.com
Fixes: 17f89102fe23 ("tracing/osnoise: Allow arbitrarily long CPU string")
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
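The count check described in the osnoise fix above can be sketched in userspace C. This is only an illustration under stated assumptions: write_cpu_list() is a hypothetical stand-in for osnoise_cpus_write(), malloc() stands in for kmalloc(), and returning 0 for an empty write is an assumption of this sketch, since the message does not say what the real patch returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a write handler such as
 * osnoise_cpus_write(). On success it stores a NUL-terminated copy of
 * the input in *out (owned by the caller) and returns the number of
 * bytes consumed. It returns -1 if the allocation fails.
 */
long write_cpu_list(const char *ubuf, size_t count, char **out)
{
    char *buf;

    assert(out != NULL);

    /* An empty write carries no data: bail out before allocating. */
    if (count == 0)
        return 0;

    buf = malloc(count + 1);
    if (buf == NULL)
        return -1;

    memcpy(buf, ubuf, count);
    buf[count] = '\0';
    *out = buf;
    return (long)count;
}
```

With the early return in place, the parsing step never sees a zero-sized allocation, which is the condition the commit message identifies as the trigger.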
2025-09-06  trace/fgraph: Fix error handling  Guenter Roeck  1  -1/+2
Commit edede7a6dcd7 ("trace/fgraph: Fix the warning caused by missing unregister notifier") added a call to unregister the PM notifier if register_ftrace_graph() failed. It does so unconditionally. However, the PM notifier is only registered with the first call to register_ftrace_graph(). If the first registration was successful and a subsequent registration failed, the notifier is now unregistered even if ftrace graphs are still registered. Fix the problem by only unregistering the PM notifier during error handling if there are no active fgraph registrations. Fixes: edede7a6dcd7 ("trace/fgraph: Fix the warning caused by missing unregister notifier") Closes: https://lore.kernel.org/all/63b0ba5a-a928-438e-84f9-93028dd72e54@roeck-us.net/ Cc: Ye Weihua <yeweihua4@huawei.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250906050618.2634078-1-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
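The rule in the fgraph fix above (undo the notifier registration on error only when no other registration is active) can be sketched as follows. All names here are hypothetical stand-ins invented for this sketch; the real code lives in register_ftrace_graph() and is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fgraph state described above. */
static int active_users;          /* number of active registrations */
static bool notifier_registered;  /* models the PM notifier */

/*
 * Register one user. 'fail' simulates a failure after the notifier
 * step, which is the error path the commit message talks about.
 */
int register_user(bool fail)
{
    /* Only the first registration registers the notifier. */
    if (active_users == 0)
        notifier_registered = true;

    active_users++;

    if (fail) {
        active_users--;
        assert(active_users >= 0);
        /* Unregister only if nobody else still depends on it. */
        if (active_users == 0)
            notifier_registered = false;
        return -1;
    }
    return 0;
}

bool notifier_is_registered(void)
{
    return notifier_registered;
}
```

An unconditional unregister in the error path would clear the notifier even when an earlier, successful registration still needs it, which is the regression the message describes.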
2025-09-02  tracing: Fix tracing_marker may trigger page fault during preempt_disable  Luo Gengkun  1  -2/+2
Both tracing_mark_write and tracing_mark_raw_write call __copy_from_user_inatomic with preemption disabled. But in some cases, __copy_from_user_inatomic may trigger a page fault, which ends up calling schedule(). If the task is then migrated to another CPU, the following warning is triggered:

if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))

An example can illustrate this issue:

process flow                                    CPU
---------------------------------------------------------------------
tracing_mark_raw_write():                       cpu:0
   ...
   ring_buffer_lock_reserve():                  cpu:0
      ...
      cpu = raw_smp_processor_id()              cpu:0
      cpu_buffer = buffer->buffers[cpu]         cpu:0
      ...
   ...
   __copy_from_user_inatomic():                 cpu:0
      ...
      # page fault
      do_mem_abort():                           cpu:0
         ...
         # Call schedule
         schedule()                             cpu:0
         ...
   # the task schedule to cpu1
   __buffer_unlock_commit():                    cpu:1
      ...
      ring_buffer_unlock_commit():              cpu:1
         ...
         cpu = raw_smp_processor_id()           cpu:1
         cpu_buffer = buffer->buffers[cpu]      cpu:1

As shown above, the process reads the CPU id twice and the two values are not the same.

To fix this problem, use copy_from_user_nofault instead of __copy_from_user_inatomic, as the former performs 'access_ok' before copying.

Link: https://lore.kernel.org/20250819105152.2766363-1-luogengkun@huaweicloud.com
Fixes: 656c7f0d2d2b ("tracing: Replace kmap with copy_from_user() in trace_marker writing")
Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-02  trace: Remove redundant __GFP_NOWARN  Qianfeng Rong  1  -1/+1
Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT") made GFP_NOWAIT implicitly include __GFP_NOWARN. Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT (e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean up these redundant flags across subsystems. No functional changes. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250805023630.335719-1-rongqianfeng@vivo.com Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-23  Merge tag 'trace-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace  Linus Torvalds  6  -25/+49
Pull tracing fixes from Steven Rostedt:

- Fix rtla and latency tooling pkg-config errors

  If libtraceevent and libtracefs are installed, but their corresponding '.pc' files are not installed, it reports that the libraries are missing and confuses the developer. Instead, report that the pkg-config files are missing and should be installed.

- Fix overflow bug of the parser in trace_get_user()

  trace_get_user() uses the parsing functions to parse the user space strings. If the parser fails due to incorrect processing, it doesn't terminate the buffer with a nul byte. Add a "failed" flag to the parser that gets set when parsing fails and is used to know if the buffer is fine to use or not.

- Remove a semicolon that was at an end of a comment line

- Fix register_ftrace_graph() to unregister the pm notifier on error

  The register_ftrace_graph() registers a pm notifier but there's an error path that can exit the function without unregistering it. Since the function returns an error, it will never be unregistered.

- Allocate and copy ftrace hash for reader of ftrace filter files

  When the set_ftrace_filter or set_ftrace_notrace files are open for read, an iterator is created and sets its hash pointer to the associated hash that represents filtering or notrace filtering to it. The issue is that the hash it points to can change while the iteration is happening. All the locking used to access the tracer's hashes are released which means those hashes can change or even be freed. Using the hash pointed to by the iterator can cause UAF bugs or similar.

  Have the read of these files allocate and copy the corresponding hashes and use that as that will keep them the same while the iterator is open. This also simplifies the code as opening it for write already does an allocate and copy, and now that the read is doing the same, there's no need to check which way it was opened on the release of the file, and the iterator hash can always be freed.

- Fix function graph to copy args into temp storage

  The output of the function graph tracer shows both the entry and the exit of a function. When the exit is right after the entry, it combines the two events into one with the output of "function();", instead of showing:

    function() {
    }

  In order to do this, the iterator descriptor that reads the events includes storage that saves the entry event while it peeks at the next event in the ring buffer. The peek can free the entry event so the iterator must store the information to use it after the peek.

  With the addition of function graph tracer recording the args, where the args are a dynamic array in the entry event, the temp storage does not save them. This causes the args to be corrupted or even cause a read of unsafe memory.

  Add space to save the args in the temp storage of the iterator.

- Fix race between ftrace_dump and reading trace_pipe

  ftrace_dump() is used when a crash occurs where the ftrace buffer will be printed to the console. But it can also be triggered by sysrq-z. If a sysrq-z is triggered while a task is reading trace_pipe it can cause a race in the ftrace_dump() where it checks if the buffer has content, then it checks if the next event is available, and then prints the output (regardless of whether the next event was available or not). Reading trace_pipe at the same time can cause it to not be available, and this triggers a WARN_ON in the print. Move the printing into the check of whether the next event exists.

* tag 'trace-v6.17-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ftrace: Also allocate and copy hash for reading of filter files
  ftrace: Fix potential warning in trace_printk_seq during ftrace_dump
  fgraph: Copy args in intermediate storage with entry
  trace/fgraph: Fix the warning caused by missing unregister notifier
  ring-buffer: Remove redundant semicolons
  tracing: Limit access to parser->buffer when trace_get_user failed
  rtla: Check pkg-config install
  tools/latency-collector: Check pkg-config install
2025-08-22  ftrace: Also allocate and copy hash for reading of filter files  Steven Rostedt  1  -9/+10
Currently the reader of set_ftrace_filter and set_ftrace_notrace just adds the pointer to the global tracer hash to its iterator. Unlike the writer that allocates a copy of the hash, the reader keeps the pointer to the filter hashes. This is problematic because this pointer is static across function calls that release the locks that can update the global tracer hashes. This can cause UAF and similar bugs. Allocate and copy the hash for reading the filter files like it is done for the writers. This not only fixes UAF bugs, but also makes the code a bit simpler as it doesn't have to differentiate when to free the iterator's hash between writers and readers. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20250822183606.12962cc3@batman.local.home Fixes: c20489dad156 ("ftrace: Assign iter->hash to filter or notrace hashes on seq read") Closes: https://lore.kernel.org/all/20250813023044.2121943-1-wutengda@huaweicloud.com/ Closes: https://lore.kernel.org/all/20250822192437.GA458494@ax162/ Reported-by: Tengda Wu <wutengda@huaweicloud.com> Tested-by: Tengda Wu <wutengda@huaweicloud.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-22  ftrace: Fix potential warning in trace_printk_seq during ftrace_dump  Tengda Wu  1  -2/+2
When calling ftrace_dump_one() concurrently with reading trace_pipe, a WARN_ON_ONCE() in trace_printk_seq() can be triggered due to a race condition. The issue occurs because:

CPU0 (ftrace_dump)                              CPU1 (reader)
echo z > /proc/sysrq-trigger

!trace_empty(&iter)
trace_iterator_reset(&iter) <- len = size = 0
                                                cat /sys/kernel/tracing/trace_pipe
trace_find_next_entry_inc(&iter)
  __find_next_entry
    ring_buffer_empty_cpu <- all empty
  return NULL

trace_printk_seq(&iter.seq)
  WARN_ON_ONCE(s->seq.len >= s->seq.size)

In the context between trace_empty() and trace_find_next_entry_inc() during ftrace_dump, the ring buffer data was consumed by other readers. This caused trace_find_next_entry_inc to return NULL, failing to populate `iter.seq`. At this point, due to the prior trace_iterator_reset, both `iter.seq.len` and `iter.seq.size` were set to 0. Since they are equal, the WARN_ON_ONCE condition is triggered.

Move the trace_printk_seq() into the if block that checks to make sure the return value of trace_find_next_entry_inc() is non-NULL in ftrace_dump_one(), ensuring the 'iter.seq' is properly populated before subsequent operations.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@elte.hu>
Link: https://lore.kernel.org/20250822033343.3000289-1-wutengda@huaweicloud.com
Fixes: d769041f8653 ("ring_buffer: implement new locking")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
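The shape of the ftrace_dump fix above (print only inside the branch that saw a non-NULL entry) can be modelled in a few lines of userspace C. Everything here is a hypothetical model: struct seq, dump_step() and the warning counter are invented for this sketch, and a NULL 'entry' argument simulates another reader having drained the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a sequence buffer; size stays 0 until filled. */
struct seq {
    char buf[64];
    size_t len;   /* bytes used */
    size_t size;  /* capacity, 0 after a reset */
};

static int warn_count;  /* counts hits of the modelled WARN_ON_ONCE() */

static void seq_reset(struct seq *s)
{
    s->len = 0;
    s->size = 0;
}

static void seq_fill(struct seq *s, const char *text)
{
    s->size = sizeof(s->buf);
    snprintf(s->buf, s->size, "%s", text);
    s->len = strlen(s->buf);
}

static void seq_print(const struct seq *s)
{
    /* Models WARN_ON_ONCE(s->seq.len >= s->seq.size). */
    if (s->len >= s->size) {
        warn_count++;
        return;
    }
    printf("%.*s\n", (int)s->len, s->buf);
}

/* One dump iteration; returns 1 if something was printed. */
int dump_step(struct seq *s, const char *entry)
{
    assert(s != NULL);
    seq_reset(s);

    if (entry != NULL) {
        seq_fill(s, entry);
        seq_print(s);   /* printing lives inside the non-NULL branch */
        return 1;
    }
    return 0;           /* nothing populated, so nothing is printed */
}

int warnings(void)
{
    return warn_count;
}
```

If seq_print() were called after the if block instead, the NULL case would reach it with len == size == 0 and hit the modelled warning.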
2025-08-22  fgraph: Copy args in intermediate storage with entry  Steven Rostedt  1  -6/+16
The output of the function graph tracer has two ways to display its entries. One way for leaf functions with no events recorded within them, and the other is for functions with events recorded inside it. As function graph has an entry and exit event, to simplify the output of leaf functions it combines the two, whereas non leaf functions are separate:

 2)               |  invoke_rcu_core() {
 2)               |    raise_softirq() {
 2)   0.391 us    |      __raise_softirq_irqoff();
 2)   1.191 us    |    }
 2)   2.086 us    |  }

The __raise_softirq_irqoff() function above is really two events that were merged into one. Otherwise it would have looked like:

 2)               |  invoke_rcu_core() {
 2)               |    raise_softirq() {
 2)               |      __raise_softirq_irqoff() {
 2)   0.391 us    |      }
 2)   1.191 us    |    }
 2)   2.086 us    |  }

In order to do this merge, the reading of the trace output file needs to look at the next event before printing. But since the pointer to the event is on the ring buffer, it needs to save the entry event before it looks at the next event as the next event goes out of focus as soon as a new event is read from the ring buffer. After it reads the next event, it will print the entry event with either the '{' (non leaf) or ';' and timestamps (leaf).

The iterator used to read the trace file has storage for this event. The problem happens when the function graph tracer has arguments attached to the entry event as the entry now has a variable length "args" field. This field only gets set when funcargs option is used. But the args are not recorded in this temp data and garbage could be printed. The entry field is copied via:

  data->ent = *curr;

Where "curr" is the entry field. But this method only saves the non variable length fields from the structure.

Add a helper structure to the iterator data that adds the max args size to the data storage in the iterator. Then simply copy the entire entry into this storage (with size protection).
Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/20250820195522.51d4a268@gandalf.local.home Reported-by: Sasha Levin <sashal@kernel.org> Tested-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/all/aJaxRVKverIjF4a6@lappy/ Fixes: ff5c9c576e75 ("ftrace: Add support for function argument to graph tracer") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
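The copy problem described in the fgraph commit above comes from a C rule: assigning a structure that ends in a flexible array member copies only the fixed fields. A minimal userspace sketch of the "helper storage plus size-protected copy" idea follows. The types and names (struct entry, union entry_store, save_entry(), MAX_ARGS) are hypothetical and do not mirror the kernel's definitions; MAX_ARGS is set to 6 only because the related pull message mentions recording up to 6 argument registers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 6  /* assumption for this sketch */

/* Hypothetical entry event with a variable-length tail. */
struct entry {
    unsigned long func;
    int depth;
    unsigned long args[];  /* not copied by plain struct assignment */
};

/* Storage big enough for the fixed fields plus MAX_ARGS arguments. */
union entry_store {
    struct entry ent;
    unsigned char raw[sizeof(struct entry) +
                      MAX_ARGS * sizeof(unsigned long)];
};

/*
 * Copy an entry of 'src_size' bytes into the storage, clamping the
 * size so an oversized event cannot overflow it. Returns the number
 * of bytes actually copied.
 */
size_t save_entry(union entry_store *dst, const struct entry *src,
                  size_t src_size)
{
    size_t n = src_size;

    assert(dst != NULL && src != NULL);

    if (n > sizeof(dst->raw))
        n = sizeof(dst->raw);

    memcpy(dst->raw, src, n);
    return n;
}
```

After save_entry(), the saved arguments can be read back through dst->ent.args[], which a plain `dst->ent = *src` assignment would have left uninitialized.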
2025-08-20  tracing: fprobe-event: Sanitize wildcard for fprobe event name  Masami Hiramatsu (Google)  1  -1/+1
Fprobe event accepts wildcards for the target functions, but unless the user specifies an event name, it creates an event whose name contains the wildcards.

  /sys/kernel/tracing # echo 'f mutex*' >> dynamic_events
  /sys/kernel/tracing # cat dynamic_events
  f:fprobes/mutex*__entry mutex*
  /sys/kernel/tracing # ls events/fprobes/
  enable  filter  mutex*__entry

To fix this, replace the wildcard ('*') with an underscore.

Link: https://lore.kernel.org/all/175535345114.282990.12294108192847938710.stgit@devnote2/
Fixes: 334e5519c375 ("tracing/probes: Add fprobe events for tracing function entry and exit.")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
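The sanitizing step named in the fprobe fix above amounts to a character substitution. A minimal sketch, assuming a hypothetical helper sanitize_event_name() that is not part of the kernel source:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Replace every '*' in a NUL-terminated event name with '_', in
 * place. Returns the number of characters replaced. Hypothetical
 * helper for illustration only.
 */
size_t sanitize_event_name(char *name)
{
    size_t replaced = 0;

    assert(name != NULL);

    for (char *p = name; *p != '\0'; p++) {
        if (*p == '*') {
            *p = '_';
            replaced++;
        }
    }
    return replaced;
}
```

Applied to a buffer holding "mutex*__entry", the helper rewrites the single '*' so the resulting name no longer contains a shell wildcard.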
2025-08-20  trace/fgraph: Fix the warning caused by missing unregister notifier  Ye Weihua  1  -0/+1
This warning was triggered during testing on v6.16: notifier callback ftrace_suspend_notifier_call already registered WARNING: CPU: 2 PID: 86 at kernel/notifier.c:23 notifier_chain_register+0x44/0xb0 ... Call Trace: <TASK> blocking_notifier_chain_register+0x34/0x60 register_ftrace_graph+0x330/0x410 ftrace_profile_write+0x1e9/0x340 vfs_write+0xf8/0x420 ? filp_flush+0x8a/0xa0 ? filp_close+0x1f/0x30 ? do_dup2+0xaf/0x160 ksys_write+0x65/0xe0 do_syscall_64+0xa4/0x260 entry_SYSCALL_64_after_hwframe+0x77/0x7f When writing to the function_profile_enabled interface, the notifier was not unregistered after start_graph_tracing failed, causing a warning the next time function_profile_enabled was written. Fixed by adding unregister_pm_notifier in the exception path. Link: https://lore.kernel.org/20250818073332.3890629-1-yeweihua4@huawei.com Fixes: 4a2b8dda3f870 ("tracing/function-graph-tracer: fix a regression while suspend to disk") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ye Weihua <yeweihua4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-20  ring-buffer: Remove redundant semicolons  Liao Yuanhong  1  -1/+1
Remove unnecessary semicolons. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250813095114.559530-1-liaoyuanhong@vivo.com Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-20  tracing: Limit access to parser->buffer when trace_get_user failed  Pu Lehui  2  -7/+19
When the length of the string written to set_ftrace_filter exceeds FTRACE_BUFF_MAX, the following KASAN alarm will be triggered:

BUG: KASAN: slab-out-of-bounds in strsep+0x18c/0x1b0
Read of size 1 at addr ffff0000d00bd5ba by task ash/165

CPU: 1 UID: 0 PID: 165 Comm: ash Not tainted 6.16.0-g6bcdbd62bd56-dirty
Hardware name: linux,dummy-virt (DT)
Call trace:
 show_stack+0x34/0x50 (C)
 dump_stack_lvl+0xa0/0x158
 print_address_description.constprop.0+0x88/0x398
 print_report+0xb0/0x280
 kasan_report+0xa4/0xf0
 __asan_report_load1_noabort+0x20/0x30
 strsep+0x18c/0x1b0
 ftrace_process_regex.isra.0+0x100/0x2d8
 ftrace_regex_release+0x484/0x618
 __fput+0x364/0xa58
 ____fput+0x28/0x40
 task_work_run+0x154/0x278
 do_notify_resume+0x1f0/0x220
 el0_svc+0xec/0xf0
 el0t_64_sync_handler+0xa0/0xe8
 el0t_64_sync+0x1ac/0x1b0

The reason is that trace_get_user will fail when processing a string longer than FTRACE_BUFF_MAX, but does not set the end of parser->buffer to 0. Then an OOB access will be triggered in ftrace_regex_release->ftrace_process_regex->strsep->strpbrk. We can solve this problem by limiting access to parser->buffer when trace_get_user failed.

Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250813040232.1344527-1-pulehui@huaweicloud.com
Fixes: 8c9af478c06b ("ftrace: Handle commands when closing set_ftrace_filter file")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
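The idea of limiting access to the parser buffer after a failed parse can be sketched with a flag that consumers must check first. The structure and function names below (struct parser, parser_feed(), parser_usable()) and the 16-byte buffer are assumptions made for this sketch and do not mirror the kernel's trace_parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state, loosely modelled on the description. */
struct parser {
    char buffer[16];
    size_t idx;   /* number of valid bytes in buffer */
    bool fail;    /* set when the last parse failed */
};

/*
 * Store one token of 'len' bytes. A token that does not fit marks
 * the parser as failed instead of leaving unterminated data behind.
 */
int parser_feed(struct parser *p, const char *src, size_t len)
{
    assert(p != NULL && src != NULL);

    if (len >= sizeof(p->buffer)) {
        p->fail = true;
        return -1;
    }

    memcpy(p->buffer, src, len);
    p->buffer[len] = '\0';
    p->idx = len;
    p->fail = false;
    return 0;
}

/* Consumers check this before reading p->buffer as a string. */
bool parser_usable(const struct parser *p)
{
    assert(p != NULL);
    return !p->fail && p->idx > 0;
}
```

A release handler written against this model would call parser_usable() first and skip string functions such as strsep() entirely when it returns false.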
2025-08-03  Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm  Linus Torvalds  1  -20/+2
Pull non-MM updates from Andrew Morton: "Significant patch series in this pull request: - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets us closer to being able to remove page->mapping - "relayfs: misc changes" (Jason Xing) does some maintenance and minor feature addition work in relayfs - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches us from static preallocation of the kdump crashkernel's working memory over to dynamic allocation. So the difficulty of a-priori estimation of the second kernel's needs is removed and the first kernel obtains extra memory - "generalize panic_print's dump function to be used by other kernel parts" (Feng Tang) implements some consolidation and rationalization of the various ways in which a failing kernel splats information at the operator * tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits) tools/getdelays: add backward compatibility for taskstats version kho: add test for kexec handover delaytop: enhance error logging and add PSI feature description samples: Kconfig: fix spelling mistake "instancess" -> "instances" fat: fix too many log in fat_chain_add() scripts/spelling.txt: add notifer||notifier to spelling.txt xen/xenbus: fix typo "notifer" net: mvneta: fix typo "notifer" drm/xe: fix typo "notifer" cxl: mce: fix typo "notifer" KVM: x86: fix typo "notifer" MAINTAINERS: add maintainers for delaytop ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below() ucount: fix atomic_long_inc_below() argument type kexec: enable CMA based contiguous allocation stackdepot: make max number of pools boot-time configurable lib/xxhash: remove unused functions init/Kconfig: restore CONFIG_BROKEN help text lib/raid6: update recov_rvv.c zero page usage docs: update docs after introducing delaytop ...
2025-08-03  Merge tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace  Linus Torvalds  4  -200/+117
Pull more tracing updates from Steven Rostedt:

- Remove unneeded goto out statements

  Over time, the logic was restructured but left a "goto out" where the out label simply did a "return ret;". Instead of jumping to this out label, simply return immediately and remove the out label.

- Add guard(ring_buffer_nest)

  Some calls to the tracing ring buffer can happen when the ring buffer is already being written to at the same context (for example, a trace_printk() in between a ring_buffer_lock_reserve() and a ring_buffer_unlock_commit()).

  In order to not trigger the recursion detection, these functions use ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard() for these functions so that their use cases can be simplified and not need to use goto for the release.

- Clean up the tracing code with guard() and __free() logic

  There were several locations that were prime candidates for using guard() and __free() helpers. Switch them over to use them.

- Fix output of function argument traces for unsigned int values

  The function tracer with "func-args" option set will record up to 6 argument registers and then use BTF to format them for human consumption when the trace file is read. There are several arguments that are "unsigned long" and even "unsigned int" that are either an address or a mask. It is easier to understand if they were printed using hexadecimal instead of decimal. The old method just printed all non-pointer values as signed integers, which made it even worse for unsigned integers.

  For instance, instead of:

    __local_bh_disable_ip(ip=-2127311112, cnt=256) <-handle_softirqs

  show:

    __local_bh_disable_ip(ip=0xffffffff8133cef8, cnt=0x100) <-handle_softirqs

* tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Have unsigned int function args displayed as hexadecimal
  ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace)
  tracing: Use __free(kfree) in trace.c to remove gotos
  tracing: Add guard() around locks and mutexes in trace.c
  tracing: Add guard(ring_buffer_nest)
  tracing: Remove unneeded goto out logic
2025-08-03  Merge tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux  Linus Torvalds  1  -1/+1
Pull module updates from Daniel Gomez: "This is a small set of changes for modules, primarily to extend module users to use the module data structures in combination with the already no-op stub module functions, even when support for modules is disabled in the kernel configuration. This change follows the kernel's coding style for conditional compilation and allows kunit code to drop all CONFIG_MODULES ifdefs, which is also part of the changes. This should allow others part of the kernel to do the same cleanup. The remaining changes include a fix for module name length handling which could potentially lead to the removal of an incorrect module, and various cleanups" * tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: module: Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LEN tracing: Replace MAX_PARAM_PREFIX_LEN with MODULE_NAME_LEN module: Restore the moduleparam prefix length check module: Remove unnecessary +1 from last_unloaded_module::name size module: Prevent silent truncation of module name in delete_module(2) kunit: test: Drop CONFIG_MODULE ifdeffery module: make structure definitions always visible module: move 'struct module_use' to internal.h
2025-08-01  tracing: Have unsigned int function args displayed as hexadecimal  Steven Rostedt  1  -1/+7
Most function arguments that are passed in as unsigned int or unsigned long are better displayed as hexadecimal than normal integer. For example, the functions: static void __create_object(unsigned long ptr, size_t size, int min_count, gfp_t gfp, unsigned int objflags); static bool stack_access_ok(struct unwind_state *state, unsigned long _addr, size_t len); void __local_bh_disable_ip(unsigned long ip, unsigned int cnt); Show up in the trace as: __create_object(ptr=-131387050520576, size=4096, min_count=1, gfp=3264, objflags=0) <-kmem_cache_alloc_noprof stack_access_ok(state=0xffffc9000233fc98, _addr=-60473102566256, len=8) <-unwind_next_frame __local_bh_disable_ip(ip=-2127311112, cnt=256) <-handle_softirqs Instead, by displaying unsigned as hexadecimal, they look more like this: __create_object(ptr=0xffff8881028d2080, size=0x280, min_count=1, gfp=0x82820, objflags=0x0) <-kmem_cache_alloc_node_noprof stack_access_ok(state=0xffffc90000003938, _addr=0xffffc90000003930, len=0x8) <-unwind_next_frame __local_bh_disable_ip(ip=0xffffffff8133cef8, cnt=0x100) <-handle_softirqs Which is much easier to understand as most unsigned longs are usually just pointers. Even the "unsigned int cnt" in __local_bh_disable_ip() looks better as hexadecimal as a lot of flags are passed as unsigned. Changes since v2: https://lore.kernel.org/20250801111453.01502861@gandalf.local.home - Use btf_int_encoding() instead of open coding it (Martin KaFai Lau) Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Link: https://lore.kernel.org/20250801165601.7770d65c@gandalf.local.home Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
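The display rule described in the commit above (signed values in decimal, unsigned values in hexadecimal) can be sketched as a small formatter. The function format_arg() and its 'is_signed' parameter are hypothetical: in the kernel the signedness comes from BTF type information, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Format one "name=value" pair into 'out'. Signed values print as
 * decimal and unsigned values as hexadecimal. Returns what snprintf()
 * returns: the length the full string needs, excluding the NUL.
 */
int format_arg(char *out, size_t out_size, const char *name,
               unsigned long val, bool is_signed)
{
    assert(out != NULL && name != NULL);

    if (is_signed)
        return snprintf(out, out_size, "%s=%ld", name, (long)val);

    return snprintf(out, out_size, "%s=0x%lx", name, val);
}
```

For example, formatting "cnt" with the value 256 as unsigned yields "cnt=0x100", matching the style of the __local_bh_disable_ip() line quoted in the commit message.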
2025-08-01  ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace)  Steven Rostedt  1  -10/+6
The function ring_buffer_write() has a goto out to only do a preempt_enable_notrace(). This can be replaced by a guard. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203858.205479143@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-01  tracing: Use __free(kfree) in trace.c to remove gotos  Steven Rostedt  1  -24/+12
There's a couple of locations that have goto out in trace.c for the only purpose of freeing a variable that was allocated. These can be replaced with __free(kfree). Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203858.040892777@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
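The pattern behind __free(kfree) can be approximated in userspace with the cleanup variable attribute, a GCC/Clang extension. The sketch below is an analogue rather than the kernel macro: auto_free, free_charp() and count_tokens() are names invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Cleanup handler: receives a pointer to the annotated variable. */
static void free_charp(char **p)
{
    free(*p);  /* free(NULL) is a no-op, so early returns are safe */
}

/* Userspace analogue of __free(kfree); needs GCC or Clang. */
#define auto_free __attribute__((cleanup(free_charp)))

/*
 * Count comma-separated tokens in 'input'. A private copy is needed
 * because strtok() modifies its argument. Returns -1 if the copy
 * cannot be allocated.
 */
int count_tokens(const char *input)
{
    assert(input != NULL);

    size_t len = strlen(input);
    auto_free char *copy = malloc(len + 1);
    int n = 0;

    if (copy == NULL)
        return -1;           /* no "goto out" needed */

    memcpy(copy, input, len + 1);

    for (char *tok = strtok(copy, ","); tok != NULL;
         tok = strtok(NULL, ","))
        n++;

    return n;                /* 'copy' is freed automatically here */
}
```

Every return path releases the allocation without a shared exit label, which is the simplification the commit message describes.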
2025-08-01  tracing: Add guard() around locks and mutexes in trace.c  Steven Rostedt  1  -98/+46
There's several locations in trace.c that can be simplified by using guards around raw_spin_lock_irqsave, mutexes and preempt disabling. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203857.879085376@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-01  tracing: Add guard(ring_buffer_nest)  Steven Rostedt  2  -44/+31
Some calls to the tracing ring buffer can happen when the ring buffer is already being written to by the same context (for example, a trace_printk() in between a ring_buffer_lock_reserve() and a ring_buffer_unlock_commit()). In order to not trigger the recursion detection, these functions use ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard() for these functions so that their use cases can be simplified and not need to use goto for the release. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250801203857.710501021@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-01  tracing: Remove unneeded goto out logic  Steven Rostedt  1  -23/+15
In several places in the trace.c file there's a goto out where the out label is simply a return. There's no reason to jump to the out label if it's not doing any more logic but simply returning from the function. Replace the goto outs with a return and remove the out labels.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250801203857.538726745@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-01  Merge tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace  Linus Torvalds  9  -80/+208
Pull tracing updates from Steven Rostedt: - Deprecate auto-mounting tracefs to /sys/kernel/debug/tracing When tracefs was first introduced back in 2014, the directory /sys/kernel/tracing was added and is the designated location to mount tracefs. To keep backward compatibility, tracefs was auto-mounted in /sys/kernel/debug/tracing as well. All distros now mount tracefs on /sys/kernel/tracing. Having it seen in two different locations has lead to various issues and inconsistencies. The VFS folks have to also maintain debugfs_create_automount() for this single user. It's been over 10 years. Tooling and scripts should start replacing the debugfs location with the tracefs one. The reason tracefs was created in the first place was to allow access to the tracing facilities without the need to configure debugfs into the kernel. Using tracefs should now be more robust. A new config is created: CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED which is default y, so that the kernel is still built with the automount. This config allows those that want to remove the automount from debugfs to do so. When tracefs is accessed from /sys/kernel/debug/tracing, the following printk is triggerd: pr_warn("NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030\n"); This gives users another 5 years to fix their scripts. - Use queue_rcu_work() instead of call_rcu() for freeing event filters The number of filters to be free can be many depending on the number of events within an event system. Freeing them from softirq context can potentially cause undesired latency. Use the RCU workqueue to free them instead. - Remove pointless memory barriers in latency code Memory barriers were added to some of the latency code a long time ago with the idea of "making them visible", but that's not what memory barriers are for. They are to synchronize access between different variables. There was no synchronization here making them pointless. 
- Remove "__attribute__()" from the type field of event format When LLVM is used to compile the kernel with CONFIG_DEBUG_INFO_BTF=y and PAHOLE_HAS_BTF_TAG=y, some of the format fields get expanded with the following: field:const char * filename; offset:24; size:8; signed:0; Turns into: field:const char __attribute__((btf_type_tag("user"))) * filename; offset:24; size:8; signed:0; This confuses parsers. Add code to strip these tags from the strings. - Add eprobe config option CONFIG_EPROBE_EVENTS Eprobes were added back in 5.15 but were only enabled when another probe was enabled (kprobe, fprobe, uprobe, etc). The eprobes had no config option of their own. Add one as they should be a separate entity. It's default y to keep with the old kernels but still has dependencies on TRACING and HAVE_REGS_AND_STACK_ACCESS_API. - Add eprobe documentation When eprobes were added back in 5.15 no documentation was added to describe them. This needs to be rectified. - Replace open coded cpumask_next_wrap() in move_to_next_cpu() - Have preemptirq_delay_run() use off-stack CPU mask - Remove obsolete comment about pelt_cfs event DECLARE_TRACE() appends "_tp" to trace events now, but the comment above pelt_cfs still mentioned appending it manually. - Remove EVENT_FILE_FL_SOFT_MODE flag The SOFT_MODE flag was required when the soft enabling and disabling of trace events was first introduced. But there was a bug with this approach as it only worked for a single instance. When multiple users required soft disabling and disabling the code was changed to have a ref count. The SOFT_MODE flag is now set iff the ref count is non zero. This is redundant and just reading the ref count is good enough. 
 - Fix typo in comment

* tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Add documentation about eprobes
  tracing: Have eprobes have their own config option
  tracing: Remove "__attribute__()" from the type field of event format
  tracing: Deprecate auto-mounting tracefs in debugfs
  tracing: Fix comment in trace_module_remove_events()
  tracing: Remove EVENT_FILE_FL_SOFT_MODE flag
  tracing: Remove pointless memory barriers
  tracing/sched: Remove obsolete comment on suffixes
  kernel: trace: preemptirq_delay_test: use offstack cpu mask
  tracing: Use queue_rcu_work() to free filters
  tracing: Replace opencoded cpumask_next_wrap() in move_to_next_cpu()
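The "__attribute__()" item above describes stripping attribute tags from the type string of an event format field. A minimal user-space sketch of that idea follows. It is not the kernel's implementation: the function name strip_attributes() and its return convention are hypothetical, and it assumes each tag has the form "__attribute__((...))" with balanced parentheses.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel's code): copy `src` into `dst`,
 * dropping every "__attribute__((...))" group. Returns 0 on success and
 * -1 if the parentheses are unbalanced or `dst` is too small.
 */
int strip_attributes(const char *src, char *dst, size_t dst_len)
{
	static const char tag[] = "__attribute__(";
	const size_t tag_len = sizeof(tag) - 1;
	size_t out = 0;

	if (dst_len == 0)
		return -1;

	while (*src) {
		if (strncmp(src, tag, tag_len) == 0) {
			const char *p = src + tag_len;	/* just past the first '(' */
			int depth = 1;

			/* Skip to the ')' that closes the first '('. */
			while (*p && depth > 0) {
				if (*p == '(')
					depth++;
				else if (*p == ')')
					depth--;
				p++;
			}
			if (depth != 0)
				return -1;	/* unbalanced parentheses */

			/* Avoid leaving a double space where the tag was. */
			if (out > 0 && dst[out - 1] == ' ' && *p == ' ')
				p++;
			src = p;
			continue;
		}
		if (out + 1 >= dst_len)
			return -1;	/* output buffer too small */
		dst[out++] = *src++;
	}
	dst[out] = '\0';
	return 0;
}
```

Given the expanded type string from the example above, this sketch yields the original "const char * filename" form that parsers expect.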
2025-07-31 tracing: Replace MAX_PARAM_PREFIX_LEN with MODULE_NAME_LEN Petr Pavlu 1 -1/+1
Use the MODULE_NAME_LEN definition in module_exists() to obtain the maximum size of a module name, instead of using MAX_PARAM_PREFIX_LEN. The values are the same but MODULE_NAME_LEN is more appropriate in this context. MAX_PARAM_PREFIX_LEN was added in commit 730b69d22525 ("module: check kernel param length at compile time, not runtime") only to break a circular dependency between module.h and moduleparam.h, and should mostly be limited to use in moduleparam.h.

Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20250630143535.267745-5-petr.pavlu@suse.com
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
2025-07-30 Merge tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Linus Torvalds 3 -1/+10
Pull tracepoint cleanup from Steven Rostedt:
 "Remove or hide unused tracepoints

  Tracepoints take up memory (around 5K per tracepoint) even when they are unused. Changes are being made to detect at build time when a tracepoint is defined but unused, and to show a warning. But those changes are not yet ready for inclusion.

   - Fix some of the unused tracepoints that this detection found

     Some tracepoints were removed and others were hidden by config settings to match the config settings of where they are instantiated. Some tracepoints were moved into architecture specific code as only one architecture used them.

   - Call the ftrace_test_filter tracepoint in an unreachable if statement

     The ftrace_test_filter tracepoint is defined when ftrace selftests are configured and is used to test the filter logic, but the tracepoint is never actually called. The call is placed in an unreachable if statement so that the tracepoint is not compiled out and does not trigger the unused-tracepoint warning"

* tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING
  powerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64
  tracing: Call trace_ftrace_test_filter() for the event
  tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit
  binder: Remove unused binder lock events
  PM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUS
  PM: tracing: Hide device_pm_callback events under PM_SLEEP
  PM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLE
  PM: cpufreq: powernv/tracing: Move powernv_throttle trace event
  alarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configured
  tracing, AER: Hide PCIe AER event when PCIEAER is not configured
2025-07-30 Merge tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Linus Torvalds 51 -634/+2182
Pull runtime verification updates from Steven Rostedt:

 - Add linear temporal logic (LTL) monitors for RT applications

   Real-time applications may have design flaws causing them to have unexpected latency. For example, the applications may raise page faults, or may be blocked trying to take a mutex without priority inheritance.

   However, while attempting to implement DA monitors for these real-time rules, a deterministic automaton was found to be inappropriate as the specification language. The automaton is complicated, hard to understand, and error-prone.

   For these cases, linear temporal logic is found to be more suitable. The LTL is more concise and intuitive.

 - Make printk_deferred() public

   The new monitors needed access to printk_deferred(). Make it visible to the entire kernel.

 - Add a vpanic() to allow for va_list to be passed to panic.

 - Add rtapp container monitor.

   A collection of monitors that check for common problems with real-time applications that cause unexpected latency.

 - Add page fault tracepoints to risc-v

   These tracepoints are necessary for the RV monitor to run on risc-v.

 - Fix the behaviour of the rv tool with -s and idle tasks.

 - Allow the rv tool to gracefully terminate with SIGTERM

 - Adjust dot2c not to create lines over 100 columns

 - Properly order nested monitors in the RV Kconfig file

 - Return the registration error in all DA monitors instead of 0

 - Update and add new sched collection monitors

   Replace tss and sncid monitors with more complete sts: Not only prove that switches occur in scheduling context and that scheduling needs interrupts disabled, but also that each call to the scheduler disables interrupts to (optionally) switch.
   New monitor: nrp
     Preemption requires need resched which is cleared by any switch (includes a non optimal workaround for /nested/ preemptions)

   New monitor: sssw
     Suspension requires setting the task to sleepable and, after the switch occurs, the task requires a wakeup to come back to runnable

   New monitor: opid
     Waking and need-resched operations occur with interrupts and preemption disabled or in IRQ without explicitly disabling preemption

* tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (48 commits)
  rv: Add opid per-cpu monitor
  rv: Add nrp and sssw per-task monitors
  rv: Replace tss and sncid monitors with more complete sts
  sched: Adapt sched tracepoints for RV task model
  rv: Retry when da monitor detects race conditions
  rv: Adjust monitor dependencies
  rv: Use strings in da monitors tracepoints
  rv: Remove trailing whitespace from tracepoint string
  rv: Add da_handle_start_run_event_ to per-task monitors
  rv: Fix wrong type cast in reactors_show() and monitor_reactor_show()
  rv: Fix wrong type cast in monitors_show()
  rv: Remove struct rv_monitor::reacting
  rv: Remove rv_reactor's reference counter
  rv: Merge struct rv_reactor_def into struct rv_reactor
  rv: Merge struct rv_monitor_def into struct rv_monitor
  rv: Remove unused field in struct rv_monitor_def
  rv: Return init error when registering monitors
  verification/rvgen: Organise Kconfig entries for nested monitors
  tools/dot2c: Fix generated files going over 100 column limit
  tools/rv: Stop gracefully also on SIGTERM
  ...
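The sssw rule above (a task must be set sleepable before it suspends, and needs a wakeup to become runnable again) can be pictured as a small deterministic automaton. The sketch below is only an illustration of that style of monitor: the state and event names are hypothetical, the transitions are a simplified reading of the one-line description, and it is neither the kernel's sssw model nor generated by the RV tooling.

```c
/* Hypothetical states for a simplified sssw-style automaton. */
enum sssw_state {
	SSSW_RUNNABLE,
	SSSW_SLEEPABLE,
	SSSW_SUSPENDED,
	SSSW_VIOLATION,
};

/* Hypothetical events that the monitor would observe for one task. */
enum sssw_event {
	SSSW_EV_SET_SLEEPABLE,	/* task state set to sleepable */
	SSSW_EV_SUSPEND,	/* task switched out while suspending */
	SSSW_EV_WAKEUP,		/* task woken up */
};

/* Advance the automaton by one event; SSSW_VIOLATION is absorbing. */
enum sssw_state sssw_step(enum sssw_state state, enum sssw_event event)
{
	switch (state) {
	case SSSW_RUNNABLE:
		if (event == SSSW_EV_SET_SLEEPABLE)
			return SSSW_SLEEPABLE;
		if (event == SSSW_EV_WAKEUP)
			return SSSW_RUNNABLE;
		return SSSW_VIOLATION;	/* suspended without being sleepable */
	case SSSW_SLEEPABLE:
		if (event == SSSW_EV_SUSPEND)
			return SSSW_SUSPENDED;
		if (event == SSSW_EV_WAKEUP)
			return SSSW_RUNNABLE;	/* woken before switching out */
		return SSSW_SLEEPABLE;
	case SSSW_SUSPENDED:
		if (event == SSSW_EV_WAKEUP)
			return SSSW_RUNNABLE;
		return SSSW_VIOLATION;	/* only a wakeup may follow suspension */
	default:
		return SSSW_VIOLATION;
	}
}
```

Feeding the sequence set-sleepable, suspend, wakeup returns the automaton to the runnable state, while a suspend event on a runnable task is reported as a violation.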
2025-07-30 Merge tag 'trace-ringbuffer-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Linus Torvalds 3 -72/+118
Pull ring-buffer updates from Steven Rostedt:

 - Rewind persistent ring buffer on boot

   When the persistent ring buffer is being used for live kernel tracing and the system crashes, the tool that is reading the trace may not have recorded the data when the system crashed. Although the persistent ring buffer still has that data, when reading it after a reboot, it will start where it left off. That is, what was read will not be accessible.

   Instead, on reboot, have the persistent ring buffer restart where the data starts and this will allow the tooling to recover what was lost when the crash occurred.

 - Remove the ring_buffer_read_prepare_sync() logic

   Reading the trace file required stopping writing to the ring buffer as the trace file is only an iterator and does not consume what it read. It was originally not safe to read the ring buffer in this mode and required disabling writing. The ring_buffer_read_prepare_sync() logic was used to stop each per_cpu ring buffer, call synchronize_rcu() and then start the iterator. This was used instead of calling synchronize_rcu() for each per_cpu buffer.

   Today, the iterator has been updated so that it is safe to read the trace file while writing to the ring buffer is still occurring. There is no more need to do this synchronization and it is causing large delays on machines with many CPUs. Remove this unneeded synchronization.

 - Make static string array a constant in show_irq_str()

   Making the string array into a constant has been shown to decrease code text/data size.

* tag 'trace-ringbuffer-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Make the const read-only 'type' static
  ring-buffer: Remove ring_buffer_read_prepare_sync()
  tracing: ring_buffer: Rewind persistent ring buffer on reboot
2025-07-30 Merge tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Linus Torvalds 4 -23/+19
Pull ftrace updates from Steven Rostedt:

 - Keep track of when fgraph_ops are registered or not

   Keep accounting of when fgraph_ops are registered, because registering an fgraph_ops twice can corrupt the accounting so that it will not work as expected later. Trigger a warning if something registers it twice, to catch bugs before they show up as things just not working as expected.

 - Make DYNAMIC_FTRACE always enabled for architectures that support it

   As static ftrace (where all functions are always traced) is very expensive and only exists to help architectures support ftrace, do not make it an option. As soon as an architecture supports DYNAMIC_FTRACE make it use it. This simplifies the code.

 - Remove redundant config HAVE_FTRACE_MCOUNT_RECORD

   CONFIG_HAVE_FTRACE_MCOUNT_RECORD was added to help simplify the DYNAMIC_FTRACE work, but now every architecture that implements DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT_RECORD set too, making it redundant with HAVE_DYNAMIC_FTRACE.

 - Make pid_str string size match the comment

   In print_graph_proc() the pid_str string is of size 11, but the comment says /* sign + log10(MAX_INT) + '\0' */ which is actually 12.

* tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD
  ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it
  fgraph: Keep track of when fgraph_ops are registered or not
  fgraph: Make pid_str size match the comment
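The size in the pid_str item can be checked with a short user-space sketch. It assumes a 32-bit int, and the helper name int_str_size() is hypothetical: the longest "%d" output is INT_MIN, "-2147483648", which is 1 sign character plus 10 digits, so the buffer needs 12 bytes once the trailing '\0' is counted.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper: bytes needed to print any int with "%d",
 * including the trailing '\0'.
 */
int int_str_size(void)
{
	/*
	 * With a NULL buffer and size 0, snprintf() writes nothing and
	 * returns the number of characters the full output would need.
	 */
	return snprintf(NULL, 0, "%d", INT_MIN) + 1;
}
```

On a platform with a 32-bit int this returns 12, so an 11-byte buffer would be one byte short for the most negative value.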
2025-07-30 Merge tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Linus Torvalds 7 -328/+630
Pull probes updates from Masami Hiramatsu:
 "Stack usage reduction for probe events:
   - Allocate string buffers from the heap for uprobe, eprobe, kprobe, and fprobe events to avoid stack overflow
   - Allocate traceprobe_parse_context from the heap to prevent potential stack overflow
   - Fix a typo in the above commit

  New features for eprobe and tprobe events:
   - Add support for arrays in eprobes
   - Support multiple tprobes on the same tracepoint

  Improve efficiency:
   - Register fprobe-events only when it is enabled to reduce overhead
   - Register tracepoints for tprobe events only when enabled to resolve a lock dependency

  Code Cleanup:
   - Add kerneldoc for traceprobe_parse_event_name() and __get_insn_slot()
   - Sort #include alphabetically in the probes code
   - Remove the unused 'mod' field from the tprobe-event
   - Clean up the entry-arg storing code in probe-events

  Selftest update:
   - Enable fprobe events before checking enable_functions in selftests"

* tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: trace_fprobe: Fix typo of the semicolon
  tracing: Have eprobes handle arrays
  tracing: probes: Add a kerneldoc for traceprobe_parse_event_name()
  tracing: uprobe-event: Allocate string buffers from heap
  tracing: eprobe-event: Allocate string buffers from heap
  tracing: kprobe-event: Allocate string buffers from heap
  tracing: fprobe-event: Allocate string buffers from heap
  tracing: probe: Allocate traceprobe_parse_context from heap
  tracing: probes: Sort #include alphabetically
  kprobes: Add missing kerneldoc for __get_insn_slot
  tracing: tprobe-events: Register tracepoint when enable tprobe event
  selftests: tracing: Enable fprobe events before checking enable_functions
  tracing: fprobe-events: Register fprobe-events only when it is enabled
  tracing: tprobe-events: Support multiple tprobes on the same tracepoint
  tracing: tprobe-events: Remove mod field from tprobe-event
  tracing: probe-events: Cleanup entry-arg storing code