aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2022-11-23tracing: Optimize event type allocation with IDAZheng Yejian2-51/+16
After commit 060fa5c83e67 ("tracing/events: reuse trace event ids after overflow"), trace events with dynamic type are linked up in list 'ftrace_event_list' through field 'trace_event.list'. Then when max event type number used up, it's possible to reuse type number of some freed one by traversing 'ftrace_event_list'. As instead, using IDA to manage available type numbers can make codes simpler and then the field 'trace_event.list' can be dropped. Since 'struct trace_event' is used in static tracepoints, drop 'trace_event.list' can make vmlinux smaller. Local test with about 2000 tracepoints, vmlinux reduced about 64KB: before:-rwxrwxr-x 1 root root 76669448 Nov 8 17:14 vmlinux after: -rwxrwxr-x 1 root root 76604176 Nov 8 17:15 vmlinux Link: https://lkml.kernel.org/r/20221110020319.1259291-1-zhengyejian1@huawei.com Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing: Make tracepoint_print_iter staticXiu Jianfeng2-3/+1
After change in commit 4239174570da ("tracing: Make tracepoint_printk a static_key"), this symbol is not used outside of the file, so mark it static. Link: https://lkml.kernel.org/r/20221122091456.72055-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing/perf: Use strndup_user instead of kzalloc/strncpy_from_userChuang Wang1-10/+6
This patch uses strndup_user instead of kzalloc + strncpy_from_user, which makes the code more concise. Link: https://lkml.kernel.org/r/20221121080831.707409-1-nashuiliang@gmail.com Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23Documentation/osnoise: Add osnoise/options documentationDaniel Bristot de Oliveira1-0/+12
Add the documentation about the osnoise/options file, along with an explanation about the OSNOISE_WORKLOAD option. Link: https://lkml.kernel.org/r/777af8f3d87beedd304805f98eff6c8291d64226.1668692096.git.bristot@kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing/osnoise: Add OSNOISE_WORKLOAD optionDaniel Bristot de Oliveira1-6/+22
The osnoise tracer is not only a tracer, and a set of tracepoints, but also a workload dispatcher. In preparation for having other workloads, e.g., in user-space, add an option to avoid dispatching the workload. By not dispatching the workload, the osnoise: tracepoints become generic events to measure the execution time of *any* task on Linux. For example: # cd /sys/kernel/tracing/ # cat osnoise/options DEFAULTS OSNOISE_WORKLOAD # echo NO_OSNOISE_WORKLOAD > osnoise/options # cat osnoise/options NO_DEFAULTS NO_OSNOISE_WORKLOAD # echo osnoise > set_event # echo osnoise > current_tracer # tail -8 trace make-94722 [002] d..3. 1371.794507: thread_noise: make:94722 start 1371.794302286 duration 200897 ns sh-121042 [020] d..3. 1371.794534: thread_noise: sh:121042 start 1371.781610976 duration 8943683 ns make-121097 [005] d..3. 1371.794542: thread_noise: make:121097 start 1371.794481522 duration 60444 ns <...>-40 [005] d..3. 1371.794550: thread_noise: migration/5:40 start 1371.794542256 duration 7154 ns <idle>-0 [018] dNh2. 1371.794554: irq_noise: reschedule:253 start 1371.794553547 duration 40 ns <idle>-0 [018] dNh2. 1371.794561: irq_noise: local_timer:236 start 1371.794556222 duration 4890 ns <idle>-0 [018] .Ns2. 1371.794563: softirq_noise: SCHED:7 start 1371.794561803 duration 992 ns <idle>-0 [018] d..3. 1371.794566: thread_noise: swapper/18:0 start 1371.781368110 duration 13191798 ns In preparation for the rtla exec_time tracer/tool and rtla osnoise --user option. Link: https://lkml.kernel.org/r/f5cfbd37aefd419eefe9243b4d2fc38ed5753fe4.1668692096.git.bristot@kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing/osnoise: Add osnoise/options fileDaniel Bristot de Oliveira1-0/+170
Add the tracing/osnoise/options file to control osnoise/timerlat tracer features. It is a single file to contain multiple features, similar to the sched/features file. Reading the file displays a list of options. Writing the OPTION_NAME enables it, writing NO_OPTION_NAME disables it. The DEAFULTS is a particular option that resets the options to the default ones. It uses a bitmask to keep track of the status of the option. When needed, we can add a list of static keys, but for now it does not justify the memory increase. Link: https://lkml.kernel.org/r/f8d34aefdb225d2603fcb4c02a120832a0cd3339.1668692096.git.bristot@kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23ring_buffer: Remove unused "event" parameterSong Chen4-11/+8
After commit a389d86f7fd0 ("ring-buffer: Have nested events still record running time stamp"), the "event" parameter is no longer used in either ring_buffer_unlock_commit() or rb_commit(). Best to remove it. Link: https://lkml.kernel.org/r/1666274811-24138-1-git-send-email-chensong_2000@189.cn Signed-off-by: Song Chen <chensong_2000@189.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing: Add trace_trigger kernel command line optionSteven Rostedt (Google)2-2/+89
Allow triggers to be enabled at kernel boot up. For example: trace_trigger="sched_switch.stacktrace if prev_state == 2" The above will enable the stacktrace trigger on top of the sched_switch event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then at boot up, a stacktrace will trigger and be recorded in the tracing ring buffer every time the sched_switch happens where the previous state is TASK_INTERRUPTIBLE. Another useful trigger would be "traceoff" which can stop tracing on an event if a field of the event matches a certain value defined by the filter ("if" statement). Link: https://lore.kernel.org/linux-trace-kernel/20221020210056.0d8d0a5b@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing: Add __cpumask to denote a trace event field that is a cpumask_tSteven Rostedt (Google)11-9/+91
The trace events have a __bitmask field that can be used for anything that requires bitmasks. Although currently it is only used for CPU masks, it could be used in the future for any type of bitmasks. There is some user space tooling that wants to know if a field is a CPU mask and not just some random unsigned long bitmask. Introduce "__cpumask()" helper functions that work the same as the current __bitmask() helpers but displays in the format file: field:__data_loc cpumask_t *[] mask; offset:36; size:4; signed:0; Instead of: field:__data_loc unsigned long[] mask; offset:32; size:4; signed:0; The main difference is the type. Instead of "unsigned long" it is "cpumask_t *". Note, this type field needs to be a real type in the __dynamic_array() logic that both __cpumask and__bitmask use, but the comparison field requires it to be a scalar type whereas cpumask_t is a structure (non-scalar). But everything works when making it a pointer. Valentin added changes to remove the need of passing in "nr_bits" and the __cpumask will always use nr_cpumask_bits as its size. Link: https://lkml.kernel.org/r/20221014080456.1d32b989@rorschach.local.home Requested-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23ftrace: Clean comments related to FTRACE_OPS_FL_PER_CPUZheng Yejian1-7/+3
Commit b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU") didn't completely remove the comments related to FTRACE_OPS_FL_PER_CPU. Link: https://lkml.kernel.org/r/20221025153923.1995973-1-zhengyejian1@huawei.com Fixes: b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing: Free buffers when a used dynamic event is removedSteven Rostedt (Google)2-1/+12
After 65536 dynamic events have been added and removed, the "type" field of the event then uses the first type number that is available (not currently used by other events). A type number is the identifier of the binary blobs in the tracing ring buffer (known as events) to map them to logic that can parse the binary blob. The issue is that if a dynamic event (like a kprobe event) is traced and is in the ring buffer, and then that event is removed (because it is dynamic, which means it can be created and destroyed), if another dynamic event is created that has the same number that new event's logic on parsing the binary blob will be used. To show how this can be an issue, the following can crash the kernel: # cd /sys/kernel/tracing # for i in `seq 65536`; do echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events # done For every iteration of the above, the writing to the kprobe_events will remove the old event and create a new one (with the same format) and increase the type number to the next available on until the type number reaches over 65535 which is the max number for the 16 bit type. After it reaches that number, the logic to allocate a new number simply looks for the next available number. When an dynamic event is removed, that number is then available to be reused by the next dynamic event created. That is, once the above reaches the max number, the number assigned to the event in that loop will remain the same. Now that means deleting one dynamic event and created another will reuse the previous events type number. This is where bad things can happen. After the above loop finishes, the kprobes/foo event which reads the do_sys_openat2 function call's first parameter as an integer. # echo 1 > kprobes/foo/enable # cat /etc/passwd > /dev/null # cat trace cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 # echo 0 > kprobes/foo/enable Now if we delete the kprobe and create a new one that reads a string: # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events And now we can the trace: # cat trace sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U And dmesg has: ================================================================== BUG: KASAN: use-after-free in string+0xd4/0x1c0 Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049 CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: <TASK> dump_stack_lvl+0x5b/0x77 print_report+0x17f/0x47b kasan_report+0xad/0x130 string+0xd4/0x1c0 vsnprintf+0x500/0x840 seq_buf_vprintf+0x62/0xc0 trace_seq_printf+0x10e/0x1e0 print_type_string+0x90/0xa0 print_kprobe_event+0x16b/0x290 print_trace_line+0x451/0x8e0 s_show+0x72/0x1f0 seq_read_iter+0x58e/0x750 seq_read+0x115/0x160 vfs_read+0x11d/0x460 ksys_read+0xa9/0x130 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fc2e972ade2 Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2 RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003 RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000 R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 </TASK> The buggy address belongs to the physical page: page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== This was found when Zheng Yejian sent a patch to convert the event type number assignment to use IDA, which gives the next available number, and this bug showed up in the fuzz testing by Yujie Liu and the kernel test robot. But after further analysis, I found that this behavior is the same as when the event type numbers go past the 16bit max (and the above shows that). As modules have a similar issue, but is dealt with by setting a "WAS_ENABLED" flag when a module event is enabled, and when the module is freed, if any of its events were enabled, the ring buffer that holds that event is also cleared, to prevent reading stale events. The same can be done for dynamic events. If any dynamic event that is being removed was enabled, then make sure the buffers they were enabled in are now cleared. Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/ Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Depends-on: e18eb8783ec49 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function") Depends-on: 5448d44c38557 ("tracing: Add unified dynamic event framework") Depends-on: 6212dd29683ee ("tracing/kprobes: Use dyn_event framework for kprobe events") Depends-on: 065e63f951432 ("tracing: Only have rmmod clear buffers that its events were active in") Depends-on: 575380da8b469 ("tracing: Only clear trace buffer on module unload if event was traced") Fixes: 77b44d1b7c283 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event") Reported-by: Zheng Yejian <zhengyejian1@huawei.com> Reported-by: Yujie Liu <yujie.liu@intel.com> Reported-by: kernel test robot <yujie.liu@intel.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing: Add tracing_reset_all_online_cpus_unlocked() functionSteven Rostedt (Google)4-4/+12
Currently the tracing_reset_all_online_cpus() requires the trace_types_lock held. But only one caller of this function actually has that lock held before calling it, and the other just takes the lock so that it can call it. More users of this function is needed where the lock is not held. Add a tracing_reset_all_online_cpus_unlocked() function for the one use case that calls it without being held, and also add a lockdep_assert to make sure it is held when called. Then have tracing_reset_all_online_cpus() take the lock internally, such that callers do not need to worry about taking it. Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing: Fix race where histograms can be called before the eventSteven Rostedt (Google)1-0/+3
commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before the event") fixed an issue where if an event is soft disabled, and the trigger is being added, there's a small window where the event sees that there's a trigger but does not see that it requires reading the event yet, and then calls the trigger with the record == NULL. This could be solved with adding memory barriers in the hot path, or to make sure that all the triggers requiring a record check for NULL. The latter was chosen. Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but the same needs to be done with histograms. Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22tracing/osnoise: Fix duration typeDaniel Bristot de Oliveira1-3/+3
The duration type is a 64 long value, not an int. This was causing some long noise to report wrong values. Change the duration to a 64 bits value. Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22tracing/user_events: Fix memory leak in user_event_create()Xiu Jianfeng1-1/+3
Before current_user_event_group(), it has allocated memory and save it in @name, this should freed before return error. Link: https://lkml.kernel.org/r/20221115014445.158419-1-xiujianfeng@huawei.com Fixes: e5d271812e7a ("tracing/user_events: Move pages/locks into groups to prepare for namespaces") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22tracing/hist: add in missing * in comment blocksColin Ian King1-2/+2
There are a couple of missing * in comment blocks. Fix these. Cleans up two clang warnings: kernel/trace/trace_events_hist.c:986: warning: bad line: kernel/trace/trace_events_hist.c:3229: warning: bad line: Link: https://lkml.kernel.org/r/20221020133019.1547587-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-20Linux 6.1-rc6Linus Torvalds1-1/+1
2022-11-20tracing: Fix race where eprobes can be called before the eventSteven Rostedt (Google)1-0/+3
The flag that tells the event to call its triggers after reading the event is set for eprobes after the eprobe is enabled. This leads to a race where the eprobe may be triggered at the beginning of the event where the record information is NULL. The eprobe then dereferences the NULL record causing a NULL kernel pointer bug. Test for a NULL record to keep this from happening. Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-19iommu/vt-d: Set SRE bit only when hardware has SRS capTina Zhang1-2/+3
SRS cap is the hardware cap telling if the hardware IOMMU can support requests seeking supervisor privilege or not. SRE bit in scalable-mode PASID table entry is treated as Reserved(0) for implementation not supporting SRS cap. Checking SRS cap before setting SRE bit can avoid the non-recoverable fault of "Non-zero reserved field set in PASID Table Entry" caused by setting SRE bit while there is no SRS cap support. The fault messages look like below: DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000 [fault reason 0x5a] SM: Non-zero reserved field set in PASID Table Entry Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface") Cc: stable@vger.kernel.org Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-19iommu/vt-d: Preset Access bit for IOVA in FL non-leaf paging entriesTina Zhang1-5/+3
The A/D bits are preseted for IOVA over first level(FL) usage for both kernel DMA (i.e, domain typs is IOMMU_DOMAIN_DMA) and user space DMA usage (i.e., domain type is IOMMU_DOMAIN_UNMANAGED). Presetting A bit in FL requires to preset the bit in every related paging entries, including the non-leaf ones. Otherwise, hardware may treat this as an error. For example, in a case of ECAP_REG.SMPWC==0, DMA faults might occur with below DMAR fault messages (wrapped for line length) dumped. DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read NO_PASID] Request device [aa:00.0] fault addr 0x10c3a6000 [fault reason 0x90] SM: A/D bit update needed in first-level entry when set up in no snoop Fixes: 289b3b005cb9 ("iommu/vt-d: Preset A/D bits for user space DMA usage") Cc: stable@vger.kernel.org Signed-off-by: Tina Zhang <tina.zhang@intel.com> Link: https://lore.kernel.org/r/20221113010324.1094483-1-tina.zhang@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Link: https://lore.kernel.org/r/20221116051544.26540-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-11-18Input: i8042 - fix leaking of platform device on module removalChen Jun1-4/+0
Avoid resetting the module-wide i8042_platform_device pointer in i8042_probe() or i8042_remove(), so that the device can be properly destroyed by i8042_exit() on module unload. Fixes: 9222ba68c3f4 ("Input: i8042 - add deferred probe support") Signed-off-by: Chen Jun <chenjun102@huawei.com> Link: https://lore.kernel.org/r/20221109034148.23821-1-chenjun102@huawei.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2022-11-18arm64/mm: fix incorrect file_map_count for non-leaf pmd/pudLiu Shixin1-2/+2
The page table check trigger BUG_ON() unexpectedly when collapse hugepage: ------------[ cut here ]------------ kernel BUG at mm/page_table_check.c:82! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 6 PID: 68 Comm: khugepaged Not tainted 6.1.0-rc3+ #750 Hardware name: linux,dummy-virt (DT) pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : page_table_check_clear.isra.0+0x258/0x3f0 lr : page_table_check_clear.isra.0+0x240/0x3f0 [...] Call trace: page_table_check_clear.isra.0+0x258/0x3f0 __page_table_check_pmd_clear+0xbc/0x108 pmdp_collapse_flush+0xb0/0x160 collapse_huge_page+0xa08/0x1080 hpage_collapse_scan_pmd+0xf30/0x1590 khugepaged_scan_mm_slot.constprop.0+0x52c/0xac8 khugepaged+0x338/0x518 kthread+0x278/0x2f8 ret_from_fork+0x10/0x20 [...] Since pmd_user_accessible_page() doesn't check if a pmd is leaf, it decrease file_map_count for a non-leaf pmd comes from collapse_huge_page(). and so trigger BUG_ON() unexpectedly. Fix this problem by using pmd_leaf() insteal of pmd_present() in pmd_user_accessible_page(). Moreover, use pud_leaf() for pud_user_accessible_page() too. Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK") Reported-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Liu Shixin <liushixin2@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20221117075602.2904324-2-liushixin2@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-11-18io_uring: disallow self-propelled ring pollingPavel Begunkov1-0/+2
When we post a CQE we wake all ring pollers as it normally should be. However, if a CQE was generated by a multishot poll request targeting its own ring, it'll wake that request up, which will make it to post a new CQE, which will wake the request and so on until it exhausts all CQ entries. Don't allow multishot polling io_uring files but downgrade them to oneshots, which was always stated as a correct behaviour that the userspace should check for. Cc: stable@vger.kernel.org Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3124038c0e7474d427538c2d915335ec28c92d21.1668785722.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-18dm integrity: clear the journal on suspendMikulas Patocka1-0/+13
There was a problem that a user burned a dm-integrity image on CDROM and could not activate it because it had a non-empty journal. Fix this problem by flushing the journal (done by the previous commit) and clearing the journal (done by this commit). Once the journal is cleared, dm-integrity won't attempt to replay it on the next activation. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-11-18dm integrity: flush the journal on suspendMikulas Patocka1-6/+1
This commit flushes the journal on suspend. It is prerequisite for the next commit that enables activating dm integrity devices in read-only mode. Note that we deliberately didn't flush the journal on suspend, so that the journal replay code would be tested. However, the dm-integrity code is 5 years old now, so that journal replay is well-tested, and we can make this change now. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-11-18dm bufio: Fix missing decrement of no_sleep_enabled if dm_bufio_client_create failedZhihao Cheng1-0/+2
The 'no_sleep_enabled' should be decreased in error handling path in dm_bufio_client_create() when the DM_BUFIO_CLIENT_NO_SLEEP flag is set, otherwise static_branch_unlikely() will always return true even if no dm_bufio_client instances have DM_BUFIO_CLIENT_NO_SLEEP flag set. Cc: stable@vger.kernel.org Fixes: 3c1c875d0586 ("dm bufio: conditionally enable branching for DM_BUFIO_CLIENT_NO_SLEEP") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-11-18dm ioctl: fix misbehavior if list_versions races with module loadingMikulas Patocka1-2/+2
__list_versions will first estimate the required space using the "dm_target_iterate(list_version_get_needed, &needed)" call and then will fill the space using the "dm_target_iterate(list_version_get_info, &iter_info)" call. Each of these calls locks the targets using the "down_read(&_lock)" and "up_read(&_lock)" calls, however between the first and second "dm_target_iterate" there is no lock held and the target modules can be loaded at this point, so the second "dm_target_iterate" call may need more space than what was the first "dm_target_iterate" returned. The code tries to handle this overflow (see the beginning of list_version_get_info), however this handling is incorrect. The code sets "param->data_size = param->data_start + needed" and "iter_info.end = (char *)vers+len" - "needed" is the size returned by the first dm_target_iterate call; "len" is the size of the buffer allocated by userspace. "len" may be greater than "needed"; in this case, the code will write up to "len" bytes into the buffer, however param->data_size is set to "needed", so it may write data past the param->data_size value. The ioctl interface copies only up to param->data_size into userspace, thus part of the result will be truncated. Fix this bug by setting "iter_info.end = (char *)vers + needed;" - this guarantees that the second "dm_target_iterate" call will write only up to the "needed" buffer and it will exit with "DM_BUFFER_FULL_FLAG" if it overflows the "needed" space - in this case, userspace will allocate a larger buffer and retry. Note that there is also a bug in list_version_get_needed - we need to add "strlen(tt->name) + 1" to the needed size, not "strlen(tt->name)". Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-11-18gpu: host1x: Avoid trying to use GART on Tegra20Robin Murphy2-0/+8
Since commit c7e3ca515e78 ("iommu/tegra: gart: Do not register with bus") quite some time ago, the GART driver has effectively disabled itself to avoid issues with the GPU driver expecting it to work in ways that it doesn't. As of commit 57365a04c921 ("iommu: Move bus setup to IOMMU device registration") that bodge no longer works, but really the GPU driver should be responsible for its own behaviour anyway. Make the workaround explicit. Reported-by: Jon Hunter <jonathanh@nvidia.com> Suggested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2022-11-17tracing: Fix potential null-pointer-access of entry in list 'tr->err_log'Zheng Yejian1-4/+5
Entries in list 'tr->err_log' will be reused after entry number exceed TRACING_LOG_ERRS_MAX. The cmd string of the to be reused entry will be freed first then allocated a new one. If the allocation failed, then the entry will still be in list 'tr->err_log' but its 'cmd' field is set to be NULL, later access of 'cmd' is risky. Currently above problem can cause the loss of 'cmd' information of first entry in 'tr->err_log'. When execute `cat /sys/kernel/tracing/error_log`, reproduce logs like: [ 37.495100] trace_kprobe: error: Maxactive is not for kprobe(null) ^ [ 38.412517] trace_kprobe: error: Maxactive is not for kprobe Command: p4:myprobe2 do_sys_openat2 ^ Link: https://lore.kernel.org/linux-trace-kernel/20221114104632.3547266-1-zhengyejian1@huawei.com Fixes: 1581a884b7ca ("tracing: Remove size restriction on tracing_log_err cmd strings") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17tracing: Remove unused __bad_type_size() methodQiujun Huang1-2/+0
__bad_type_size() is unused after commit 04ae87a52074("ftrace: Rework event_create_dir()"). So, remove it. Link: https://lkml.kernel.org/r/D062EC2E-7DB7-4402-A67E-33C3577F551E@gmail.com Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-18tracing/eprobe: Fix eprobe filter to make a filter correctlyMasami Hiramatsu (Google)1-1/+1
Since the eprobe filter was defined based on the eprobe's trace event itself, it doesn't work correctly. Use the original trace event of the eprobe when making the filter so that the filter works correctly. Without this fix: # echo 'e syscalls/sys_enter_openat \ flags_rename=$flags:u32 if flags < 1000' >> dynamic_events # echo 1 > events/eprobes/sys_enter_openat/enable [ 114.551550] event trace: Could not enable event sys_enter_openat -bash: echo: write error: Invalid argument With this fix: # echo 'e syscalls/sys_enter_openat \ flags_rename=$flags:u32 if flags < 1000' >> dynamic_events # echo 1 > events/eprobes/sys_enter_openat/enable # tail trace cat-241 [000] ...1. 266.498449: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0 cat-242 [000] ...1. 266.977640: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0 Link: https://lore.kernel.org/all/166823166395.1385292.8931770640212414483.stgit@devnote3/ Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support") Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com> Tested-by: Rafael Mendonca <rafaelmendsr@gmail.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18tracing/eprobe: Fix warning in filter creationRafael Mendonca1-1/+1
The filter pointer (filterp) passed to create_filter() function must be a pointer that references a NULL pointer, otherwise, we get a warning when adding a filter option to the event probe: root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \ runtime=$runtime:u32 if cpu < 4' >> dynamic_events [ 5034.340439] ------------[ cut here ]------------ [ 5034.341258] WARNING: CPU: 0 PID: 223 at kernel/trace/trace_events_filter.c:1939 create_filter+0x1db/0x250 [...] stripped [ 5034.345518] RIP: 0010:create_filter+0x1db/0x250 [...] stripped [ 5034.351604] Call Trace: [ 5034.351803] <TASK> [ 5034.351959] ? process_preds+0x1b40/0x1b40 [ 5034.352241] ? rcu_read_lock_bh_held+0xd0/0xd0 [ 5034.352604] ? kasan_set_track+0x29/0x40 [ 5034.352904] ? kasan_save_alloc_info+0x1f/0x30 [ 5034.353264] create_event_filter+0x38/0x50 [ 5034.353573] __trace_eprobe_create+0x16f4/0x1d20 [ 5034.353964] ? eprobe_dyn_event_release+0x360/0x360 [ 5034.354363] ? mark_held_locks+0xa6/0xf0 [ 5034.354684] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 5034.355105] ? trace_hardirqs_on+0x41/0x120 [ 5034.355417] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 5034.355751] ? __create_object+0x5b7/0xcf0 [ 5034.356027] ? lock_is_held_type+0xaf/0x120 [ 5034.356362] ? rcu_read_lock_bh_held+0xb0/0xd0 [ 5034.356716] ? rcu_read_lock_bh_held+0xd0/0xd0 [ 5034.357084] ? kasan_set_track+0x29/0x40 [ 5034.357411] ? kasan_save_alloc_info+0x1f/0x30 [ 5034.357715] ? __kasan_kmalloc+0xb8/0xc0 [ 5034.357985] ? write_comp_data+0x2f/0x90 [ 5034.358302] ? __sanitizer_cov_trace_pc+0x25/0x60 [ 5034.358691] ? argv_split+0x381/0x460 [ 5034.358949] ? write_comp_data+0x2f/0x90 [ 5034.359240] ? eprobe_dyn_event_release+0x360/0x360 [ 5034.359620] trace_probe_create+0xf6/0x110 [ 5034.359940] ? trace_probe_match_command_args+0x240/0x240 [ 5034.360376] eprobe_dyn_event_create+0x21/0x30 [ 5034.360709] create_dyn_event+0xf3/0x1a0 [ 5034.360983] trace_parse_run_command+0x1a9/0x2e0 [ 5034.361297] ? dyn_event_release+0x500/0x500 [ 5034.361591] dyn_event_write+0x39/0x50 [ 5034.361851] vfs_write+0x311/0xe50 [ 5034.362091] ? dyn_event_seq_next+0x40/0x40 [ 5034.362376] ? kernel_write+0x5b0/0x5b0 [ 5034.362637] ? write_comp_data+0x2f/0x90 [ 5034.362937] ? __sanitizer_cov_trace_pc+0x25/0x60 [ 5034.363258] ? ftrace_syscall_enter+0x544/0x840 [ 5034.363563] ? write_comp_data+0x2f/0x90 [ 5034.363837] ? __sanitizer_cov_trace_pc+0x25/0x60 [ 5034.364156] ? write_comp_data+0x2f/0x90 [ 5034.364468] ? write_comp_data+0x2f/0x90 [ 5034.364770] ksys_write+0x158/0x2a0 [ 5034.365022] ? __ia32_sys_read+0xc0/0xc0 [ 5034.365344] __x64_sys_write+0x7c/0xc0 [ 5034.365669] ? syscall_enter_from_user_mode+0x53/0x70 [ 5034.366084] do_syscall_64+0x60/0x90 [ 5034.366356] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 5034.366767] RIP: 0033:0x7ff0b43938f3 [...] stripped [ 5034.371892] </TASK> [ 5034.374720] ---[ end trace 0000000000000000 ]--- Link: https://lore.kernel.org/all/20221108202148.1020111-1-rafaelmendsr@gmail.com/ Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace caseLi Huafei1-1/+7
In __unregister_kprobe_top(), if the currently unregistered probe has post_handler but other child probes of the aggrprobe do not have post_handler, the post_handler of the aggrprobe is cleared. If this is a ftrace-based probe, there is a problem. In later calls to disarm_kprobe(), we will use kprobe_ftrace_ops because post_handler is NULL. But we're armed with kprobe_ipmodify_ops. This triggers a WARN in __disarm_kprobe_ftrace() and may even cause use-after-free: Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2) WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0 Modules linked in: testKprobe_007(-) CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18 [...] Call Trace: <TASK> __disable_kprobe+0xcd/0xe0 __unregister_kprobe_top+0x12/0x150 ? mutex_lock+0xe/0x30 unregister_kprobes.part.23+0x31/0xa0 unregister_kprobe+0x32/0x40 __x64_sys_delete_module+0x15e/0x260 ? do_user_addr_fault+0x2cd/0x6b0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] For the kprobe-on-ftrace case, we keep the post_handler setting to identify this aggrprobe armed with kprobe_ipmodify_ops. This way we can disarm it correctly. Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/ Fixes: 0bc11ed5ab60 ("kprobes: Allow kprobes coexist with livepatch") Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18rethook: fix a potential memleak in rethook_alloc()Yi Yang1-1/+3
In rethook_alloc(), the variable rh is not freed or passed out if handler is NULL, which could lead to a memleak, fix it. Link: https://lore.kernel.org/all/20221110104438.88099-1-yiyang13@huawei.com/ [Masami: Add "rethook:" tag to the title.] Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook") Cc: stable@vger.kernel.org Signed-off-by: Yi Yang <yiyang13@huawei.com> Acke-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18tracing/eprobe: Fix memory leak of filter stringRafael Mendonca1-0/+1
The filter string doesn't get freed when a dynamic event is deleted. If a filter is set, then memory is leaked: root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \ sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events root@localhost:/sys/kernel/tracing# echo "-:egroup/stat_runtime_4core" >> dynamic_events root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak [ 224.416373] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak) root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff88810156f1b8 (size 8): comm "bash", pid 224, jiffies 4294935612 (age 55.800s) hex dump (first 8 bytes): 63 70 75 20 3c 20 34 00 cpu < 4. backtrace: [<000000009f880725>] __kmem_cache_alloc_node+0x18e/0x720 [<0000000042492946>] __kmalloc+0x57/0x240 [<0000000034ea7995>] __trace_eprobe_create+0x1214/0x1d30 [<00000000d70ef730>] trace_probe_create+0xf6/0x110 [<00000000915c7b16>] eprobe_dyn_event_create+0x21/0x30 [<000000000d894386>] create_dyn_event+0xf3/0x1a0 [<00000000e9af57d5>] trace_parse_run_command+0x1a9/0x2e0 [<0000000080777f18>] dyn_event_write+0x39/0x50 [<0000000089f0ec73>] vfs_write+0x311/0xe50 [<000000003da1bdda>] ksys_write+0x158/0x2a0 [<00000000bb1e616e>] __x64_sys_write+0x7c/0xc0 [<00000000e8aef1f7>] do_syscall_64+0x60/0x90 [<00000000fe7fe8ba>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Additionally, in __trace_eprobe_create() function, if an error occurs after the call to trace_eprobe_parse_filter(), which allocates the filter string, then memory is also leaked. That can be reproduced by creating the same event probe twice: root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \ sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \ sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events -bash: echo: write error: File exists root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak [ 207.871584] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak) root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8881020d17a8 (size 8): comm "bash", pid 223, jiffies 4294938308 (age 31.000s) hex dump (first 8 bytes): 63 70 75 20 3c 20 34 00 cpu < 4. backtrace: [<000000000e4f5f31>] __kmem_cache_alloc_node+0x18e/0x720 [<0000000024f0534b>] __kmalloc+0x57/0x240 [<000000002930a28e>] __trace_eprobe_create+0x1214/0x1d30 [<0000000028387903>] trace_probe_create+0xf6/0x110 [<00000000a80d6a9f>] eprobe_dyn_event_create+0x21/0x30 [<000000007168698c>] create_dyn_event+0xf3/0x1a0 [<00000000f036bf6a>] trace_parse_run_command+0x1a9/0x2e0 [<00000000014bde8b>] dyn_event_write+0x39/0x50 [<0000000078a097f7>] vfs_write+0x311/0xe50 [<00000000996cb208>] ksys_write+0x158/0x2a0 [<00000000a3c2acb0>] __x64_sys_write+0x7c/0xc0 [<0000000006b5d698>] do_syscall_64+0x60/0x90 [<00000000780e8ecf>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fix both issues by releasing the filter string in trace_event_probe_cleanup(). Link: https://lore.kernel.org/all/20221108235738.1021467-1-rafaelmendsr@gmail.com/ Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit()Shang XiaoJing1-0/+4
When test_gen_kprobe_cmd() failed after kprobe_event_gen_cmd_end(), it will goto delete, which will call kprobe_event_delete() and release the corresponding resource. However, the trace_array in gen_kretprobe_test will point to the invalid resource. Set gen_kretprobe_test to NULL after called kprobe_event_delete() to prevent null-ptr-deref. BUG: kernel NULL pointer dereference, address: 0000000000000070 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 246 Comm: modprobe Tainted: G W 6.1.0-rc1-00174-g9522dc5c87da-dirty #248 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:__ftrace_set_clr_event_nolock+0x53/0x1b0 Code: e8 82 26 fc ff 49 8b 1e c7 44 24 0c ea ff ff ff 49 39 de 0f 84 3c 01 00 00 c7 44 24 18 00 00 00 00 e8 61 26 fc ff 48 8b 6b 10 <44> 8b 65 70 4c 8b 6d 18 41 f7 c4 00 02 00 00 75 2f RSP: 0018:ffffc9000159fe00 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88810971d268 RCX: 0000000000000000 RDX: ffff8881080be600 RSI: ffffffff811b48ff RDI: ffff88810971d058 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffffc9000159fe58 R11: 0000000000000001 R12: ffffffffa0001064 R13: ffffffffa000106c R14: ffff88810971d238 R15: 0000000000000000 FS: 00007f89eeff6540(0000) GS:ffff88813b600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000070 CR3: 000000010599e004 CR4: 0000000000330ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __ftrace_set_clr_event+0x3e/0x60 trace_array_set_clr_event+0x35/0x50 ? 0xffffffffa0000000 kprobe_event_gen_test_exit+0xcd/0x10b [kprobe_event_gen_test] __x64_sys_delete_module+0x206/0x380 ? lockdep_hardirqs_on_prepare+0xd8/0x190 ? syscall_enter_from_user_mode+0x1c/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f89eeb061b7 Link: https://lore.kernel.org/all/20221108015130.28326-3-shangxiaojing@huawei.com/ Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()Shang XiaoJing1-16/+28
When trace_get_event_file() failed, gen_kretprobe_test will be assigned as the error code. If module kprobe_event_gen_test is removed now, the null pointer dereference will happen in kprobe_event_gen_test_exit(). Check if gen_kprobe_test or gen_kretprobe_test is error code or NULL before dereference them. BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 2210 Comm: modprobe Not tainted 6.1.0-rc1-00171-g2159299a3b74-dirty #217 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:kprobe_event_gen_test_exit+0x1c/0xb5 [kprobe_event_gen_test] Code: Unable to access opcode bytes at 0xffffffff9ffffff2. RSP: 0018:ffffc900015bfeb8 EFLAGS: 00010246 RAX: ffffffffffffffea RBX: ffffffffa0002080 RCX: 0000000000000000 RDX: ffffffffa0001054 RSI: ffffffffa0001064 RDI: ffffffffdfc6349c RBP: ffffffffa0000000 R08: 0000000000000004 R09: 00000000001e95c0 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000800 R13: ffffffffa0002420 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f56b75be540(0000) GS:ffff88813bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffff9ffffff2 CR3: 000000010874a006 CR4: 0000000000330ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __x64_sys_delete_module+0x206/0x380 ? lockdep_hardirqs_on_prepare+0xd8/0x190 ? syscall_enter_from_user_mode+0x1c/0x50 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lore.kernel.org/all/20221108015130.28326-2-shangxiaojing@huawei.com/ Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-17tracing: Fix wild-memory-access in register_synth_event()Shang XiaoJing1-3/+2
In register_synth_event(), if set_synth_event_print_fmt() failed, then both trace_remove_event_call() and unregister_trace_event() will be called, which means the trace_event_call will call __unregister_trace_event() twice. As the result, the second unregister will causes the wild-memory-access. register_synth_event set_synth_event_print_fmt failed trace_remove_event_call event_remove if call->event.funcs then __unregister_trace_event (first call) unregister_trace_event __unregister_trace_event (second call) Fix the bug by avoiding to call the second __unregister_trace_event() by checking if the first one is called. general protection fault, probably for non-canonical address 0xfbd59c0000000024: 0000 [#1] SMP KASAN PTI KASAN: maybe wild-memory-access in range [0xdead000000000120-0xdead000000000127] CPU: 0 PID: 3807 Comm: modprobe Not tainted 6.1.0-rc1-00186-g76f33a7eedb4 #299 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:unregister_trace_event+0x6e/0x280 Code: 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 0e 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b 63 08 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 e2 01 00 00 49 89 2c 24 48 85 ed 74 28 e8 7a 9b RSP: 0018:ffff88810413f370 EFLAGS: 00010a06 RAX: dffffc0000000000 RBX: ffff888105d050b0 RCX: 0000000000000000 RDX: 1bd5a00000000024 RSI: ffff888119e276e0 RDI: ffffffff835a8b20 RBP: dead000000000100 R08: 0000000000000000 R09: fffffbfff0913481 R10: ffffffff8489a407 R11: fffffbfff0913480 R12: dead000000000122 R13: ffff888105d050b8 R14: 0000000000000000 R15: ffff888105d05028 FS: 00007f7823e8d540(0000) GS:ffff888119e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7823e7ebec CR3: 000000010a058002 CR4: 0000000000330ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __create_synth_event+0x1e37/0x1eb0 create_or_delete_synth_event+0x110/0x250 synth_event_run_command+0x2f/0x110 test_gen_synth_cmd+0x170/0x2eb [synth_event_gen_test] synth_event_gen_test_init+0x76/0x9bc [synth_event_gen_test] do_one_initcall+0xdb/0x480 do_init_module+0x1cf/0x680 load_module+0x6a50/0x70a0 __do_sys_finit_module+0x12f/0x1c0 do_syscall_64+0x3f/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lkml.kernel.org/r/20221117012346.22647-3-shangxiaojing@huawei.com Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Cc: stable@vger.kernel.org Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Cc: <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event()Shang XiaoJing1-10/+6
test_gen_synth_cmd() only free buf in fail path, hence buf will leak when there is no failure. Add kfree(buf) to prevent the memleak. The same reason and solution in test_empty_synth_event(). unreferenced object 0xffff8881127de000 (size 2048): comm "modprobe", pid 247, jiffies 4294972316 (age 78.756s) hex dump (first 32 bytes): 20 67 65 6e 5f 73 79 6e 74 68 5f 74 65 73 74 20 gen_synth_test 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 64 5f pid_t next_pid_ backtrace: [<000000004254801a>] kmalloc_trace+0x26/0x100 [<0000000039eb1cf5>] 0xffffffffa00083cd [<000000000e8c3bc8>] 0xffffffffa00086ba [<00000000c293d1ea>] do_one_initcall+0xdb/0x480 [<00000000aa189e6d>] do_init_module+0x1cf/0x680 [<00000000d513222b>] load_module+0x6a50/0x70a0 [<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0 [<00000000b36c4c0f>] do_syscall_64+0x3f/0x90 [<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd unreferenced object 0xffff8881127df000 (size 2048): comm "modprobe", pid 247, jiffies 4294972324 (age 78.728s) hex dump (first 32 bytes): 20 65 6d 70 74 79 5f 73 79 6e 74 68 5f 74 65 73 empty_synth_tes 74 20 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 t pid_t next_pi backtrace: [<000000004254801a>] kmalloc_trace+0x26/0x100 [<00000000d4db9a3d>] 0xffffffffa0008071 [<00000000c31354a5>] 0xffffffffa00086ce [<00000000c293d1ea>] do_one_initcall+0xdb/0x480 [<00000000aa189e6d>] do_init_module+0x1cf/0x680 [<00000000d513222b>] load_module+0x6a50/0x70a0 [<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0 [<00000000b36c4c0f>] do_syscall_64+0x3f/0x90 [<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://lkml.kernel.org/r/20221117012346.22647-2-shangxiaojing@huawei.com Cc: <mhiramat@kernel.org> Cc: <zanussi@kernel.org> Cc: <fengguang.wu@intel.com> Cc: stable@vger.kernel.org Fixes: 9fe41efaca08 ("tracing: Add synth event generation test module") Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17ftrace: Fix null pointer dereference in ftrace_add_mod()Xiu Jianfeng1-0/+1
The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next} of @ftrace_mode->list are NULL, it's not a valid state to call list_del(). If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del() will write prev->next and next->prev, where null pointer dereference happens. BUG: kernel NULL pointer dereference, address: 0000000000000008 Oops: 0002 [#1] PREEMPT SMP NOPTI Call Trace: <TASK> ftrace_mod_callback+0x20d/0x220 ? do_filp_open+0xd9/0x140 ftrace_process_regex.isra.51+0xbf/0x130 ftrace_regex_write.isra.52.part.53+0x6e/0x90 vfs_write+0xee/0x3a0 ? __audit_filter_op+0xb1/0x100 ? auditd_test_task+0x38/0x50 ksys_write+0xa5/0xe0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Kernel panic - not syncing: Fatal exception So call INIT_LIST_HEAD() to initialize the list member to fix this issue. Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com Cc: stable@vger.kernel.org Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17ring_buffer: Do not deactivate non-existant pagesDaniil Tatianin1-2/+2
rb_head_page_deactivate() expects cpu_buffer to contain a valid list of ->pages, so verify that the list is actually present before calling it. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru Cc: stable@vger.kernel.org Fixes: 77ae365eca895 ("ring-buffer: make lockless") Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17ftrace: Optimize the allocation for mcount entriesWang Wensheng1-1/+1
If we can't allocate this size, try something smaller with half of the size. Its order should be decreased by one instead of divided by two. Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups") Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17ftrace: Fix the possible incorrect kernel messageWang Wensheng1-1/+1
If the number of mcount entries is an integer multiple of ENTRIES_PER_PAGE, the page count showing on the console would be wrong. Link: https://lkml.kernel.org/r/20221109094434.84046-2-wangwensheng4@huawei.com Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: 5821e1b74f0d0 ("function tracing: fix wrong pos computing when read buffer has been fulfilled") Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17io_uring: fix multishot recv request leaksPavel Begunkov1-9/+7
Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 1300ebb20286b ("io_uring: multishot recv") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17io_uring: fix multishot accept request leaksPavel Begunkov4-8/+8
Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17io_uring: fix tw losing poll eventsPavel Begunkov1-0/+7
We may never try to process a poll wake and its mask if there was multiple wake ups racing for queueing up a tw. Force io_poll_check_events() to update the mask by vfs_poll(). Cc: stable@vger.kernel.org Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17io_uring: update res mask in io_poll_check_eventsPavel Begunkov1-0/+3
When io_poll_check_events() collides with someone attempting to queue a task work, it'll spin for one more time. However, it'll continue to use the mask from the first iteration instead of updating it. For example, if the first wake up was a EPOLLIN and the second EPOLLOUT, the userspace will not get EPOLLOUT in time. Clear the mask for all subsequent iterations to force vfs_poll(). Cc: stable@vger.kernel.org Fixes: aa43477b04025 ("io_uring: poll rework") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-11-17scsi: iscsi: Fix possible memory leak when device_register() failedZhou Guanghui1-15/+16
If device_register() returns error, the name allocated by the dev_set_name() need be freed. As described in the comment of device_register(), we should use put_device() to give up the reference in the error path. Fix this by calling put_device(), the name will be freed in the kobject_cleanup(), and this patch modified resources will be released by calling the corresponding callback function in the device_release(). Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com> Link: https://lore.kernel.org/r/20221110033729.1555-1-zhouguanghui1@huawei.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-11-17scsi: zfcp: Fix double free of FSF request when qdio send failsBenjamin Block1-1/+1
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache the FSF request ID when sending a new FSF request. This is used in case the sending fails and we need to remove the request from our internal hash table again (so we don't keep an invalid reference and use it when we free the request again). In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32 bit wide), but the rest of the zfcp code (and the firmware specification) handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x ELF ABI]). For one this has the obvious problem that when the ID grows past 32 bit (this can happen reasonably fast) it is truncated to 32 bit when storing it in the cache variable and so doesn't match the original ID anymore. The second less obvious problem is that even when the original ID has not yet grown past 32 bit, as soon as the 32nd bit is set in the original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we cast it back to 'unsigned long'. As the cached variable is of a signed type, the compiler will choose a sign-extending instruction to load the 32 bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the request again all the leading zeros will be flipped to ones to extend the sign and won't match the original ID anymore (this has been observed in practice). If we can't successfully remove the request from the hash table again after 'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify the adapter about new work because the adapter is already gone during e.g. a ChpID toggle) we will end up with a double free. We unconditionally free the request in the calling function when 'zfcp_fsf_req_send()' fails, but because the request is still in the hash table we end up with a stale memory reference, and once the zfcp adapter is either reset during recovery or shutdown we end up freeing the same memory twice. The resulting stack traces vary depending on the kernel and have no direct correlation to the place where the bug occurs. Here are three examples that have been seen in practice: list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:62! monitor event: 0040 ilc:2 [#1] PREEMPT SMP Modules linked in: ... CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded Hardware name: ... Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6 0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8 00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800 00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70 Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336 00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8 #00000003cbeea1f4: af000000 mc 0,0 >00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78 00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48 00000003cbeea204: b9040043 lgr %r4,%r3 00000003cbeea208: b9040051 lgr %r5,%r1 00000003cbeea20c: b9040032 lgr %r3,%r2 Call Trace: [<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140 ([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140) [<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp] [<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp] [<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp] [<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp] [<00000003cb5eece8>] kthread+0x138/0x150 [<00000003cb557f3c>] __ret_from_fork+0x3c/0x60 [<00000003cc4172ea>] ret_from_fork+0xa/0x40 INFO: lockdep is turned off. Last Breaking-Event-Address: [<00000003cc3e9d04>] _printk+0x4c/0x58 Kernel panic - not syncing: Fatal exception: panic_on_oops or: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803 Fault in home space mode while using kernel ASCE. AS:0000000063b10007 R3:0000000000000024 Oops: 0038 ilc:3 [#1] SMP Modules linked in: ... CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded Hardware name: ... Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp]) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8 0000000000000000 0000000000000055 0000000000000000 00000000a8515800 0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c 000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48 Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8 000003ff7febaf82: e32020000004 lg %r2,0(%r2) #000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8 >000003ff7febaf8e: e3b020100020 cg %r11,16(%r2) 000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82 000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8 000003ff7febaf9e: e31020080004 lg %r1,8(%r2) 000003ff7febafa4: e33020000004 lg %r3,0(%r2) Call Trace: [<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp] [<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp] [<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp] [<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128 [<000000006292f300>] __do_softirq+0x130/0x3c0 [<0000000061d906c6>] irq_exit_rcu+0xfe/0x118 [<000000006291e818>] do_io_irq+0xc8/0x168 [<000000006292d516>] io_int_handler+0xd6/0x110 [<000000006292d596>] psw_idle_exit+0x0/0xa ([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0) [<000000006292ceea>] default_idle_call+0x52/0xf8 [<0000000061de4fa4>] do_idle+0xd4/0x168 [<0000000061de51fe>] cpu_startup_entry+0x36/0x40 [<0000000061d4faac>] smp_start_secondary+0x12c/0x138 [<000000006292d88e>] restart_int_handler+0x6e/0x90 Last Breaking-Event-Address: [<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp] Kernel panic - not syncing: Fatal exception in interrupt or: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803 Fault in home space mode while using kernel ASCE. AS:0000000077c40007 R3:0000000000000024 Oops: 0038 ilc:3 [#1] SMP Modules linked in: ... CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded Hardware name: ... Workqueue: kblockd blk_mq_run_work_fn Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398) R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20 0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20 00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6 00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0 Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa 0000000076fc0308: b904001b lgr %r1,%r11 #0000000076fc030c: e3106020001a algf %r1,32(%r6) >0000000076fc0312: e31010000082 xg %r1,0(%r1) 0000000076fc0318: b9040001 lgr %r0,%r1 0000000076fc031c: e30061700082 xg %r0,368(%r6) 0000000076fc0322: ec59000100d9 aghik %r5,%r9,1 0000000076fc0328: e34003b80004 lg %r4,952 Call Trace: [<0000000076fc0312>] __kmalloc+0xd2/0x398 [<0000000076f318f2>] mempool_alloc+0x72/0x1f8 [<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp] [<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp] [<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp] [<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod] [<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208 [<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8 [<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod] [<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818 [<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0 [<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218 [<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90 [<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0 [<0000000076da6d74>] process_one_work+0x274/0x4d0 [<0000000076da7018>] worker_thread+0x48/0x560 [<0000000076daef18>] kthread+0x140/0x160 [<000000007751d144>] ret_from_fork+0x28/0x30 Last Breaking-Event-Address: [<0000000076fc0474>] __kmalloc+0x234/0x398 Kernel panic - not syncing: Fatal exception: panic_on_oops To fix this, simply change the type of the cache variable to 'unsigned long', like the rest of zfcp and also the argument for 'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension and so can successfully remove the request from the hash table. Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe") Cc: <stable@vger.kernel.org> #v2.6.34+ Signed-off-by: Benjamin Block <bblock@linux.ibm.com> Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.1668595505.git.bblock@linux.ibm.com Reviewed-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-11-17scsi: scsi_debug: Fix possible UAF in sdebug_add_host_helper()Yuan Can1-1/+5
If device_register() fails in sdebug_add_host_helper(), it will goto clean and sdbg_host will be freed, but sdbg_host->host_list will not be removed from sdebug_host_list, then list traversal may cause UAF. Fix it. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yuan Can <yuancan@huawei.com> Link: https://lore.kernel.org/r/20221117084421.58918-1-yuancan@huawei.com Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>