|
After commit 060fa5c83e67 ("tracing/events: reuse trace event ids after
overflow"), trace events with a dynamic type are linked in the list
'ftrace_event_list' through the field 'trace_event.list'. Then, when the
maximum event type number is used up, the type number of a freed event
can be reused by traversing 'ftrace_event_list'.
Instead, using IDA to manage the available type numbers makes the code
simpler, and the field 'trace_event.list' can then be dropped.
Since 'struct trace_event' is used in static tracepoints, dropping
'trace_event.list' also makes vmlinux smaller. In a local test with about
2000 tracepoints, vmlinux shrank by about 64KB:
before:-rwxrwxr-x 1 root root 76669448 Nov 8 17:14 vmlinux
after: -rwxrwxr-x 1 root root 76604176 Nov 8 17:15 vmlinux
Link: https://lkml.kernel.org/r/20221110020319.1259291-1-zhengyejian1@huawei.com
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
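The allocation behaviour described above (hand out fresh numbers until the range is exhausted, and always prefer the lowest freed number) can be sketched in user space. This is a simplified Python model, not the kernel's IDA implementation; the class name `IdAllocator` and its `first`/`last` parameters are hypothetical names chosen for illustration.

```python
import heapq


class IdAllocator:
    """Toy model of a lowest-free-ID allocator (hypothetical, user space only)."""

    def __init__(self, first, last):
        self.next_unused = first   # lowest ID that has never been handed out
        self.last = last           # highest ID that may be handed out
        self.freed = []            # min-heap of IDs that were returned

    def alloc(self):
        # Freed IDs are always lower than next_unused, so the smallest
        # freed ID is also the lowest free ID overall.
        if self.freed:
            return heapq.heappop(self.freed)
        if self.next_unused > self.last:
            return None            # range exhausted
        n = self.next_unused
        self.next_unused += 1
        return n

    def free(self, n):
        # No double-free detection in this sketch.
        heapq.heappush(self.freed, n)
```

No per-event list linkage is needed to find a reusable number in this model, which is the simplification the commit message points at.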
|
|
After change in commit 4239174570da ("tracing: Make tracepoint_printk a
static_key"), this symbol is not used outside of the file, so mark it
static.
Link: https://lkml.kernel.org/r/20221122091456.72055-1-xiujianfeng@huawei.com
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
This patch uses strndup_user instead of kzalloc + strncpy_from_user,
which makes the code more concise.
Link: https://lkml.kernel.org/r/20221121080831.707409-1-nashuiliang@gmail.com
Signed-off-by: Chuang Wang <nashuiliang@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add the documentation about the osnoise/options file, along
with an explanation about the OSNOISE_WORKLOAD option.
Link: https://lkml.kernel.org/r/777af8f3d87beedd304805f98eff6c8291d64226.1668692096.git.bristot@kernel.org
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The osnoise tracer is not only a tracer, and a set of tracepoints,
but also a workload dispatcher.
In preparation for having other workloads, e.g., in user-space,
add an option to avoid dispatching the workload.
By not dispatching the workload, the osnoise: tracepoints become
generic events to measure the execution time of *any* task on Linux.
For example:
# cd /sys/kernel/tracing/
# cat osnoise/options
DEFAULTS OSNOISE_WORKLOAD
# echo NO_OSNOISE_WORKLOAD > osnoise/options
# cat osnoise/options
NO_DEFAULTS NO_OSNOISE_WORKLOAD
# echo osnoise > set_event
# echo osnoise > current_tracer
# tail -8 trace
make-94722 [002] d..3. 1371.794507: thread_noise: make:94722 start 1371.794302286 duration 200897 ns
sh-121042 [020] d..3. 1371.794534: thread_noise: sh:121042 start 1371.781610976 duration 8943683 ns
make-121097 [005] d..3. 1371.794542: thread_noise: make:121097 start 1371.794481522 duration 60444 ns
<...>-40 [005] d..3. 1371.794550: thread_noise: migration/5:40 start 1371.794542256 duration 7154 ns
<idle>-0 [018] dNh2. 1371.794554: irq_noise: reschedule:253 start 1371.794553547 duration 40 ns
<idle>-0 [018] dNh2. 1371.794561: irq_noise: local_timer:236 start 1371.794556222 duration 4890 ns
<idle>-0 [018] .Ns2. 1371.794563: softirq_noise: SCHED:7 start 1371.794561803 duration 992 ns
<idle>-0 [018] d..3. 1371.794566: thread_noise: swapper/18:0 start 1371.781368110 duration 13191798 ns
In preparation for the rtla exec_time tracer/tool and
rtla osnoise --user option.
Link: https://lkml.kernel.org/r/f5cfbd37aefd419eefe9243b4d2fc38ed5753fe4.1668692096.git.bristot@kernel.org
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add the tracing/osnoise/options file to control
osnoise/timerlat tracer features. It is a single
file to contain multiple features, similar to
the sched/features file.
Reading the file displays a list of options. Writing
the OPTION_NAME enables it, writing NO_OPTION_NAME disables
it.
DEFAULTS is a special option that resets the options
to their default values.
It uses a bitmask to keep track of the status of the option. When
needed, we can add a list of static keys, but for now
it does not justify the memory increase.
Link: https://lkml.kernel.org/r/f8d34aefdb225d2603fcb4c02a120832a0cd3339.1668692096.git.bristot@kernel.org
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
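The OPTION_NAME / NO_OPTION_NAME / DEFAULTS semantics described above can be modelled with a small bitmask. This is a user-space Python sketch, not the kernel code; the read format follows the `cat osnoise/options` output quoted elsewhere in this log, while the class name `Options` and the choice to reject unknown names (including NO_DEFAULTS) with an error are assumptions.

```python
OPTION_NAMES = ["OSNOISE_WORKLOAD"]   # bit i corresponds to OPTION_NAMES[i]
DEFAULT_MASK = 0b1                    # OSNOISE_WORKLOAD enabled by default


class Options:
    """Toy model of a features file backed by a bitmask (hypothetical)."""

    def __init__(self):
        self.mask = DEFAULT_MASK

    def write(self, token):
        token = token.strip()
        if token == "DEFAULTS":
            self.mask = DEFAULT_MASK          # reset everything
            return
        enable = not token.startswith("NO_")
        name = token if enable else token[len("NO_"):]
        if name not in OPTION_NAMES:
            # Assumption: unknown names are rejected.
            raise ValueError("unknown option: " + token)
        bit = 1 << OPTION_NAMES.index(name)
        if enable:
            self.mask |= bit
        else:
            self.mask &= ~bit

    def read(self):
        words = ["DEFAULTS" if self.mask == DEFAULT_MASK else "NO_DEFAULTS"]
        for i, name in enumerate(OPTION_NAMES):
            words.append(name if self.mask & (1 << i) else "NO_" + name)
        return " ".join(words)
```

A single integer keeps the state of every option, which matches the stated reason for not using a list of static keys yet.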
|
|
After commit a389d86f7fd0 ("ring-buffer: Have nested events still record
running time stamp"), the "event" parameter is no longer used in either
ring_buffer_unlock_commit() or rb_commit(). Best to remove it.
Link: https://lkml.kernel.org/r/1666274811-24138-1-git-send-email-chensong_2000@189.cn
Signed-off-by: Song Chen <chensong_2000@189.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Allow triggers to be enabled at kernel boot up. For example:
trace_trigger="sched_switch.stacktrace if prev_state == 2"
The above will enable the stacktrace trigger on top of the sched_switch
event and only trigger if its prev_state is 2 (TASK_UNINTERRUPTIBLE). Then
at boot up, a stacktrace will trigger and be recorded in the tracing ring
buffer every time the sched_switch happens where the previous state is
TASK_UNINTERRUPTIBLE.
Another useful trigger would be "traceoff" which can stop tracing on an
event if a field of the event matches a certain value defined by the
filter ("if" statement).
Link: https://lore.kernel.org/linux-trace-kernel/20221020210056.0d8d0a5b@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The trace events have a __bitmask field that can be used for anything
that requires bitmasks. Although currently it is only used for CPU
masks, it could be used in the future for any type of bitmasks.
There is some user space tooling that wants to know if a field is a CPU
mask and not just some random unsigned long bitmask. Introduce
"__cpumask()" helper functions that work the same as the current
__bitmask() helpers but display in the format file:
field:__data_loc cpumask_t *[] mask; offset:36; size:4; signed:0;
Instead of:
field:__data_loc unsigned long[] mask; offset:32; size:4; signed:0;
The main difference is the type. Instead of "unsigned long" it is
"cpumask_t *". Note, this type field needs to be a real type in the
__dynamic_array() logic that both __cpumask and __bitmask use, but the
comparison field requires it to be a scalar type whereas cpumask_t is a
structure (non-scalar). But everything works when making it a pointer.
Valentin added changes to remove the need of passing in "nr_bits" and the
__cpumask will always use nr_cpumask_bits as its size.
Link: https://lkml.kernel.org/r/20221014080456.1d32b989@rorschach.local.home
Requested-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Commit b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU") didn't
completely remove the comments related to FTRACE_OPS_FL_PER_CPU.
Link: https://lkml.kernel.org/r/20221025153923.1995973-1-zhengyejian1@huawei.com
Fixes: b3a88803ac5b ("ftrace: Kill FTRACE_OPS_FL_PER_CPU")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
After 65536 dynamic events have been added and removed, the "type" field
of the event then uses the first type number that is available (not
currently used by other events). A type number is the identifier of the
binary blobs in the tracing ring buffer (known as events) to map them to
logic that can parse the binary blob.
The issue is that if a dynamic event (like a kprobe event) is traced and
is in the ring buffer, and then that event is removed (because it is
dynamic, which means it can be created and destroyed), if another dynamic
event is created that has the same number that new event's logic on
parsing the binary blob will be used.
To show how this can be an issue, the following can crash the kernel:
# cd /sys/kernel/tracing
# for i in `seq 65536`; do
echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events
# done
For every iteration of the above, the write to kprobe_events will
remove the old event and create a new one (with the same format) and
increase the type number to the next available one until the type number
reaches over 65535 which is the max number for the 16 bit type. After it
reaches that number, the logic to allocate a new number simply looks for
the next available number. When a dynamic event is removed, that number
is then available to be reused by the next dynamic event created. That is,
once the above reaches the max number, the number assigned to the event in
that loop will remain the same.
Now that means deleting one dynamic event and creating another will reuse
the previous event's type number. This is where bad things can happen.
After the above loop finishes, the kprobes/foo event reads the
do_sys_openat2 function call's first parameter as an integer:
# echo 1 > kprobes/foo/enable
# cat /etc/passwd > /dev/null
# cat trace
cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196
# echo 0 > kprobes/foo/enable
Now if we delete the kprobe and create a new one that reads a string:
# echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events
And now we can read the trace:
# cat trace
sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������"
bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U
And dmesg has:
==================================================================
BUG: KASAN: use-after-free in string+0xd4/0x1c0
Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049
CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
Call Trace:
<TASK>
dump_stack_lvl+0x5b/0x77
print_report+0x17f/0x47b
kasan_report+0xad/0x130
string+0xd4/0x1c0
vsnprintf+0x500/0x840
seq_buf_vprintf+0x62/0xc0
trace_seq_printf+0x10e/0x1e0
print_type_string+0x90/0xa0
print_kprobe_event+0x16b/0x290
print_trace_line+0x451/0x8e0
s_show+0x72/0x1f0
seq_read_iter+0x58e/0x750
seq_read+0x115/0x160
vfs_read+0x11d/0x460
ksys_read+0xa9/0x130
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fc2e972ade2
Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24
RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2
RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003
RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00
R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
</TASK>
The buggy address belongs to the physical page:
page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
^
ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
This was found when Zheng Yejian sent a patch to convert the event type
number assignment to use IDA, which gives the next available number, and
this bug showed up in the fuzz testing by Yujie Liu and the kernel test
robot. But after further analysis, I found that this behavior is the same
as when the event type numbers go past the 16bit max (and the above shows
that).
Modules have a similar issue, but it is dealt with by setting a
"WAS_ENABLED" flag when a module event is enabled, and when the module is
freed, if any of its events were enabled, the ring buffer that holds that
event is also cleared, to prevent reading stale events. The same can be
done for dynamic events.
If any dynamic event that is being removed was enabled, then make sure the
buffers they were enabled in are now cleared.
Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home
Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Depends-on: e18eb8783ec49 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function")
Depends-on: 5448d44c38557 ("tracing: Add unified dynamic event framework")
Depends-on: 6212dd29683ee ("tracing/kprobes: Use dyn_event framework for kprobe events")
Depends-on: 065e63f951432 ("tracing: Only have rmmod clear buffers that its events were active in")
Depends-on: 575380da8b469 ("tracing: Only clear trace buffer on module unload if event was traced")
Fixes: 77b44d1b7c283 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event")
Reported-by: Zheng Yejian <zhengyejian1@huawei.com>
Reported-by: Yujie Liu <yujie.liu@intel.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
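The failure mode and the fix can be illustrated with a toy model: a stale record left in the buffer is decoded by whichever parser now owns its type number, unless the buffer is cleared when a previously enabled event is removed. This is a user-space Python sketch under simplifying assumptions (type numbers are reused immediately rather than after 65536 allocations, and a single buffer is used); `Tracer`, `parse_u32`, `parse_string` and `demo` are hypothetical names.

```python
def parse_u32(payload):
    return ("u32", int.from_bytes(payload, "little"))


def parse_string(payload):
    # Stand-in for "treat the payload as a string"; the real code would
    # follow a pointer/offset, which is what makes stale data dangerous.
    return ("string", payload.decode("latin-1"))


class Tracer:
    def __init__(self, clear_on_remove):
        self.parsers = {}          # type number -> parser
        self.was_enabled = set()   # type numbers that recorded something
        self.ring_buffer = []      # list of (type number, raw payload)
        self.clear_on_remove = clear_on_remove

    def add_event(self, parser):
        t = 1
        while t in self.parsers:   # lowest free type number
            t += 1
        self.parsers[t] = parser
        return t

    def record(self, t, payload):
        self.was_enabled.add(t)
        self.ring_buffer.append((t, payload))

    def remove_event(self, t):
        del self.parsers[t]
        if self.clear_on_remove and t in self.was_enabled:
            self.ring_buffer.clear()   # the fix: drop stale records
        self.was_enabled.discard(t)

    def read(self):
        return [self.parsers[t](p) for t, p in self.ring_buffer
                if t in self.parsers]


def demo(clear_on_remove):
    tr = Tracer(clear_on_remove)
    old = tr.add_event(parse_u32)
    tr.record(old, (4294967196).to_bytes(4, "little"))
    tr.remove_event(old)
    tr.add_event(parse_string)     # reuses the same type number
    return tr.read()
```

With `demo(False)` the integer payload is decoded as a string; with `demo(True)` nothing stale is left to decode.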
|
|
Currently tracing_reset_all_online_cpus() requires that
trace_types_lock be held. But only one caller of this function actually has
that lock held before calling it, and the other just takes the lock so
that it can call it. More users of this function are needed where the lock
is not held.
Add a tracing_reset_all_online_cpus_unlocked() function for the one use
case that calls it with the lock already held, and also add a
lockdep_assert to make sure it is held when called.
Then have tracing_reset_all_online_cpus() take the lock internally, such
that callers do not need to worry about taking it.
Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
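The locked/unlocked split described above is a common pattern: an inner function that does the work and checks that the lock is held, and an outer wrapper that takes the lock itself. A minimal user-space Python sketch follows; the function names mirror the commit message but the bodies are illustrative only, and `threading.Lock.locked()` only reports that someone holds the lock, so it is a weaker check than lockdep.

```python
import threading

trace_types_lock = threading.Lock()
_reset_count = 0   # stand-in for "the buffers were reset"


def reset_all_online_cpus_unlocked():
    """Do the work; the caller must already hold trace_types_lock."""
    global _reset_count
    if not trace_types_lock.locked():
        # Rough analogue of a lockdep assertion.
        raise RuntimeError("trace_types_lock must be held")
    _reset_count += 1


def reset_all_online_cpus():
    """Convenience wrapper for callers that do not hold the lock."""
    with trace_types_lock:
        reset_all_online_cpus_unlocked()


def reset_count():
    return _reset_count
```

Callers that already hold the lock use the `_unlocked` variant directly; everyone else calls the wrapper and never touches the lock.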
|
|
Commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before
the event") fixed an issue where if an event is soft disabled, and the
trigger is being added, there's a small window where the event sees that
there's a trigger but does not see that it requires reading the event yet,
and then calls the trigger with the record == NULL.
This could be solved by adding memory barriers in the hot path, or by
making sure that all the triggers requiring a record check for NULL. The
latter was chosen.
Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but
the same needs to be done with histograms.
Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/
Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The duration type is a 64-bit long value, not an int. This was
causing some long noise to be reported with wrong values.
Change the duration to a 64-bit value.
Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org
Cc: stable@vger.kernel.org
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Fixes: bce29ac9ce0b ("trace: Add osnoise tracer")
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
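The effect of storing a nanosecond duration in too narrow a field can be shown with `ctypes`, which wraps values silently instead of raising. This sketch assumes the old field behaved like a signed 32-bit int; the helper names are hypothetical.

```python
import ctypes


def store_as_int32(duration_ns):
    """What a signed 32-bit field would hold (ctypes wraps silently)."""
    return ctypes.c_int32(duration_ns).value


def store_as_u64(duration_ns):
    """What an unsigned 64-bit field would hold."""
    return ctypes.c_uint64(duration_ns).value
```

Short durations such as the 13191798 ns sample seen in the osnoise output above survive either way, but anything past about 2.1 seconds (2^31 ns) no longer fits in the 32-bit field.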
|
|
Before the call to current_user_event_group(), memory has been allocated
and saved in @name; it should be freed before returning the error.
Link: https://lkml.kernel.org/r/20221115014445.158419-1-xiujianfeng@huawei.com
Fixes: e5d271812e7a ("tracing/user_events: Move pages/locks into groups to prepare for namespaces")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
There are a couple of missing * in comment blocks. Fix these.
Cleans up two clang warnings:
kernel/trace/trace_events_hist.c:986: warning: bad line:
kernel/trace/trace_events_hist.c:3229: warning: bad line:
Link: https://lkml.kernel.org/r/20221020133019.1547587-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.
Test for a NULL record to keep this from happening.
Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
SRS cap is the hardware cap telling if the hardware IOMMU can support
requests seeking supervisor privilege or not. SRE bit in scalable-mode
PASID table entry is treated as Reserved(0) for implementation not
supporting SRS cap.
Checking SRS cap before setting SRE bit can avoid the non-recoverable
fault of "Non-zero reserved field set in PASID Table Entry" caused by
setting SRE bit while there is no SRS cap support. The fault messages
look like below:
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000
[fault reason 0x5a]
SM: Non-zero reserved field set in PASID Table Entry
Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
Cc: stable@vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The A/D bits are preset for IOVA over first level (FL) usage for both
kernel DMA (i.e., domain type is IOMMU_DOMAIN_DMA) and user space DMA
usage (i.e., domain type is IOMMU_DOMAIN_UNMANAGED).
Presetting the A bit in FL requires presetting the bit in every related
paging entry, including the non-leaf ones. Otherwise, hardware may treat
this as an error. For example, in the case of ECAP_REG.SMPWC==0, DMA faults
might occur with the DMAR fault messages below (wrapped for line length).
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [aa:00.0] fault addr 0x10c3a6000
[fault reason 0x90]
SM: A/D bit update needed in first-level entry when set up in no snoop
Fixes: 289b3b005cb9 ("iommu/vt-d: Preset A/D bits for user space DMA usage")
Cc: stable@vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20221113010324.1094483-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Avoid resetting the module-wide i8042_platform_device pointer in
i8042_probe() or i8042_remove(), so that the device can be properly
destroyed by i8042_exit() on module unload.
Fixes: 9222ba68c3f4 ("Input: i8042 - add deferred probe support")
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Link: https://lore.kernel.org/r/20221109034148.23821-1-chenjun102@huawei.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The page table check triggers BUG_ON() unexpectedly when collapsing a hugepage:
------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:82!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 6 PID: 68 Comm: khugepaged Not tainted 6.1.0-rc3+ #750
Hardware name: linux,dummy-virt (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : page_table_check_clear.isra.0+0x258/0x3f0
lr : page_table_check_clear.isra.0+0x240/0x3f0
[...]
Call trace:
page_table_check_clear.isra.0+0x258/0x3f0
__page_table_check_pmd_clear+0xbc/0x108
pmdp_collapse_flush+0xb0/0x160
collapse_huge_page+0xa08/0x1080
hpage_collapse_scan_pmd+0xf30/0x1590
khugepaged_scan_mm_slot.constprop.0+0x52c/0xac8
khugepaged+0x338/0x518
kthread+0x278/0x2f8
ret_from_fork+0x10/0x20
[...]
Since pmd_user_accessible_page() doesn't check whether a pmd is a leaf, it
decreases file_map_count for a non-leaf pmd that comes from
collapse_huge_page(), and so triggers BUG_ON() unexpectedly.
Fix this problem by using pmd_leaf() instead of pmd_present() in
pmd_user_accessible_page(). Moreover, use pud_leaf() for
pud_user_accessible_page() too.
Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221117075602.2904324-2-liushixin2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When we post a CQE we wake all ring pollers, as we normally should.
However, if a CQE was generated by a multishot poll request targeting
its own ring, it'll wake that request up, which will make it post
a new CQE, which will wake the request and so on until it exhausts all
CQ entries.
Don't allow multishot polling of io_uring files but downgrade them to
oneshots, which was always stated as a correct behaviour that
userspace should check for.
Cc: stable@vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3124038c0e7474d427538c2d915335ec28c92d21.1668785722.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There was a problem that a user burned a dm-integrity image on CDROM
and could not activate it because it had a non-empty journal.
Fix this problem by flushing the journal (done by the previous commit)
and clearing the journal (done by this commit). Once the journal is
cleared, dm-integrity won't attempt to replay it on the next
activation.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
This commit flushes the journal on suspend. It is prerequisite for the
next commit that enables activating dm integrity devices in read-only mode.
Note that we deliberately didn't flush the journal on suspend, so that the
journal replay code would be tested. However, the dm-integrity code is 5
years old now, so that journal replay is well-tested, and we can make this
change now.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
The 'no_sleep_enabled' should be decreased in error handling path
in dm_bufio_client_create() when the DM_BUFIO_CLIENT_NO_SLEEP flag
is set, otherwise static_branch_unlikely() will always return true
even if no dm_bufio_client instances have DM_BUFIO_CLIENT_NO_SLEEP
flag set.
Cc: stable@vger.kernel.org
Fixes: 3c1c875d0586 ("dm bufio: conditionally enable branching for DM_BUFIO_CLIENT_NO_SLEEP")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
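The imbalance described above is a plain reference-counting bug: the counter behind the branch is raised when a client asks for the no-sleep mode, so every failure path after that point has to lower it again. A user-space Python sketch of the balanced version follows; the counter stands in for the static key, and `create_client`, its `simulate_failure` parameter and `no_sleep_branch_enabled` are hypothetical.

```python
_no_sleep_enabled = 0   # stand-in for the static key's enable count


def no_sleep_branch_enabled():
    """Rough analogue of the static-branch test on the counter."""
    return _no_sleep_enabled > 0


def create_client(no_sleep, simulate_failure=False):
    global _no_sleep_enabled
    if no_sleep:
        _no_sleep_enabled += 1
    try:
        if simulate_failure:
            raise MemoryError("simulated allocation failure")
        return {"no_sleep": no_sleep}
    except MemoryError:
        # Error path: undo the increment, otherwise the branch stays
        # enabled even though no client with the flag exists.
        if no_sleep:
            _no_sleep_enabled -= 1
        return None
```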
|
|
__list_versions will first estimate the required space using the
"dm_target_iterate(list_version_get_needed, &needed)" call and then will
fill the space using the "dm_target_iterate(list_version_get_info,
&iter_info)" call. Each of these calls locks the targets using the
"down_read(&_lock)" and "up_read(&_lock)" calls, however between the first
and second "dm_target_iterate" there is no lock held and the target
modules can be loaded at this point, so the second "dm_target_iterate"
call may need more space than the first "dm_target_iterate" call
returned.
The code tries to handle this overflow (see the beginning of
list_version_get_info), however this handling is incorrect.
The code sets "param->data_size = param->data_start + needed" and
"iter_info.end = (char *)vers+len" - "needed" is the size returned by the
first dm_target_iterate call; "len" is the size of the buffer allocated by
userspace.
"len" may be greater than "needed"; in this case, the code will write up
to "len" bytes into the buffer, however param->data_size is set to
"needed", so it may write data past the param->data_size value. The ioctl
interface copies only up to param->data_size into userspace, thus part of
the result will be truncated.
Fix this bug by setting "iter_info.end = (char *)vers + needed;" - this
guarantees that the second "dm_target_iterate" call will write only up to
the "needed" buffer and it will exit with "DM_BUFFER_FULL_FLAG" if it
overflows the "needed" space - in this case, userspace will allocate a
larger buffer and retry.
Note that there is also a bug in list_version_get_needed - we need to add
"strlen(tt->name) + 1" to the needed size, not "strlen(tt->name)".
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
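The two-pass "measure, then fill" logic and the effect of bounding the fill by `needed` instead of `len` can be sketched as follows. This is a user-space Python model, not the dm-ioctl code: `HEADER` is a hypothetical fixed per-entry size, alignment is ignored, and the target names are examples.

```python
HEADER = 16   # hypothetical fixed per-entry header size, in bytes


def needed_size(names):
    # +1 accounts for the terminating NUL of each name.
    return sum(HEADER + len(n) + 1 for n in names)


def fill(names, end):
    """Write entries until `end` bytes are used.

    Returns (names written, bytes used, buffer_full flag).
    """
    used = 0
    written = []
    for n in names:
        size = HEADER + len(n) + 1
        if used + size > end:
            return written, used, True    # analogue of DM_BUFFER_FULL_FLAG
        written.append(n)
        used += size
    return written, used, False


first_pass = ["linear", "striped"]
second_pass = first_pass + ["integrity"]   # a module was loaded in between
needed = needed_size(first_pass)
user_len = 4096                            # userspace buffer is larger

buggy = fill(second_pass, end=user_len)    # bounded by len
fixed = fill(second_pass, end=needed)      # bounded by needed
```

In the buggy case more bytes are written than `data_size` (set from `needed`) covers, so the copy to userspace is silently truncated; in the fixed case the fill stops at `needed` and reports a full buffer so userspace can retry.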
|
|
Since commit c7e3ca515e78 ("iommu/tegra: gart: Do not register with
bus") quite some time ago, the GART driver has effectively disabled
itself to avoid issues with the GPU driver expecting it to work in ways
that it doesn't. As of commit 57365a04c921 ("iommu: Move bus setup to
IOMMU device registration") that bodge no longer works, but really the
GPU driver should be responsible for its own behaviour anyway. Make the
workaround explicit.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Suggested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Entries in the list 'tr->err_log' are reused once the number of entries
exceeds TRACING_LOG_ERRS_MAX.
The cmd string of the entry to be reused is freed first, and then a new
one is allocated. If the allocation fails, the entry is still in the
list 'tr->err_log' but its 'cmd' field is set to NULL, so any
later access of 'cmd' is risky.
Currently the above problem can cause the loss of the 'cmd' information of
the first entry in 'tr->err_log'. Executing
`cat /sys/kernel/tracing/error_log` reproduces logs like:
[ 37.495100] trace_kprobe: error: Maxactive is not for kprobe(null) ^
[ 38.412517] trace_kprobe: error: Maxactive is not for kprobe
Command: p4:myprobe2 do_sys_openat2
^
Link: https://lore.kernel.org/linux-trace-kernel/20221114104632.3547266-1-zhengyejian1@huawei.com
Fixes: 1581a884b7ca ("tracing: Remove size restriction on tracing_log_err cmd strings")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
__bad_type_size() is unused after
commit 04ae87a52074 ("ftrace: Rework event_create_dir()").
So, remove it.
Link: https://lkml.kernel.org/r/D062EC2E-7DB7-4402-A67E-33C3577F551E@gmail.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the eprobe filter was defined based on the eprobe's trace event
itself, it doesn't work correctly. Use the original trace event of
the eprobe when making the filter so that the filter works correctly.
Without this fix:
# echo 'e syscalls/sys_enter_openat \
flags_rename=$flags:u32 if flags < 1000' >> dynamic_events
# echo 1 > events/eprobes/sys_enter_openat/enable
[ 114.551550] event trace: Could not enable event sys_enter_openat
-bash: echo: write error: Invalid argument
With this fix:
# echo 'e syscalls/sys_enter_openat \
flags_rename=$flags:u32 if flags < 1000' >> dynamic_events
# echo 1 > events/eprobes/sys_enter_openat/enable
# tail trace
cat-241 [000] ...1. 266.498449: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0
cat-242 [000] ...1. 266.977640: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0
Link: https://lore.kernel.org/all/166823166395.1385292.8931770640212414483.stgit@devnote3/
Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Tested-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The filter pointer (filterp) passed to create_filter() function must be a
pointer that references a NULL pointer, otherwise, we get a warning when
adding a filter option to the event probe:
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \
runtime=$runtime:u32 if cpu < 4' >> dynamic_events
[ 5034.340439] ------------[ cut here ]------------
[ 5034.341258] WARNING: CPU: 0 PID: 223 at kernel/trace/trace_events_filter.c:1939 create_filter+0x1db/0x250
[...] stripped
[ 5034.345518] RIP: 0010:create_filter+0x1db/0x250
[...] stripped
[ 5034.351604] Call Trace:
[ 5034.351803] <TASK>
[ 5034.351959] ? process_preds+0x1b40/0x1b40
[ 5034.352241] ? rcu_read_lock_bh_held+0xd0/0xd0
[ 5034.352604] ? kasan_set_track+0x29/0x40
[ 5034.352904] ? kasan_save_alloc_info+0x1f/0x30
[ 5034.353264] create_event_filter+0x38/0x50
[ 5034.353573] __trace_eprobe_create+0x16f4/0x1d20
[ 5034.353964] ? eprobe_dyn_event_release+0x360/0x360
[ 5034.354363] ? mark_held_locks+0xa6/0xf0
[ 5034.354684] ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 5034.355105] ? trace_hardirqs_on+0x41/0x120
[ 5034.355417] ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 5034.355751] ? __create_object+0x5b7/0xcf0
[ 5034.356027] ? lock_is_held_type+0xaf/0x120
[ 5034.356362] ? rcu_read_lock_bh_held+0xb0/0xd0
[ 5034.356716] ? rcu_read_lock_bh_held+0xd0/0xd0
[ 5034.357084] ? kasan_set_track+0x29/0x40
[ 5034.357411] ? kasan_save_alloc_info+0x1f/0x30
[ 5034.357715] ? __kasan_kmalloc+0xb8/0xc0
[ 5034.357985] ? write_comp_data+0x2f/0x90
[ 5034.358302] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.358691] ? argv_split+0x381/0x460
[ 5034.358949] ? write_comp_data+0x2f/0x90
[ 5034.359240] ? eprobe_dyn_event_release+0x360/0x360
[ 5034.359620] trace_probe_create+0xf6/0x110
[ 5034.359940] ? trace_probe_match_command_args+0x240/0x240
[ 5034.360376] eprobe_dyn_event_create+0x21/0x30
[ 5034.360709] create_dyn_event+0xf3/0x1a0
[ 5034.360983] trace_parse_run_command+0x1a9/0x2e0
[ 5034.361297] ? dyn_event_release+0x500/0x500
[ 5034.361591] dyn_event_write+0x39/0x50
[ 5034.361851] vfs_write+0x311/0xe50
[ 5034.362091] ? dyn_event_seq_next+0x40/0x40
[ 5034.362376] ? kernel_write+0x5b0/0x5b0
[ 5034.362637] ? write_comp_data+0x2f/0x90
[ 5034.362937] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.363258] ? ftrace_syscall_enter+0x544/0x840
[ 5034.363563] ? write_comp_data+0x2f/0x90
[ 5034.363837] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.364156] ? write_comp_data+0x2f/0x90
[ 5034.364468] ? write_comp_data+0x2f/0x90
[ 5034.364770] ksys_write+0x158/0x2a0
[ 5034.365022] ? __ia32_sys_read+0xc0/0xc0
[ 5034.365344] __x64_sys_write+0x7c/0xc0
[ 5034.365669] ? syscall_enter_from_user_mode+0x53/0x70
[ 5034.366084] do_syscall_64+0x60/0x90
[ 5034.366356] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 5034.366767] RIP: 0033:0x7ff0b43938f3
[...] stripped
[ 5034.371892] </TASK>
[ 5034.374720] ---[ end trace 0000000000000000 ]---
Link: https://lore.kernel.org/all/20221108202148.1020111-1-rafaelmendsr@gmail.com/
Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
In __unregister_kprobe_top(), if the probe being unregistered has a
post_handler but no other child probe of the aggrprobe has one, the
post_handler of the aggrprobe is cleared. For an ftrace-based probe this
is a problem: later calls to disarm_kprobe() select kprobe_ftrace_ops
because post_handler is NULL, although the probe was armed with
kprobe_ipmodify_ops. This triggers a WARN in
__disarm_kprobe_ftrace() and may even cause a use-after-free:
Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2)
WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0
Modules linked in: testKprobe_007(-)
CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18
[...]
Call Trace:
<TASK>
__disable_kprobe+0xcd/0xe0
__unregister_kprobe_top+0x12/0x150
? mutex_lock+0xe/0x30
unregister_kprobes.part.23+0x31/0xa0
unregister_kprobe+0x32/0x40
__x64_sys_delete_module+0x15e/0x260
? do_user_addr_fault+0x2cd/0x6b0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
For the kprobe-on-ftrace case, keep the post_handler set so that the
aggrprobe can still be identified as armed with kprobe_ipmodify_ops and
disarmed correctly.
Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/
Fixes: 0bc11ed5ab60 ("kprobes: Allow kprobes coexist with livepatch")
Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
In rethook_alloc(), the variable rh is neither freed nor returned
when handler is NULL, which leaks memory. Fix it.
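The leak pattern can be sketched in user space as below. This is only an illustration under assumptions: `struct hook`, `hook_alloc_leaky()`, `hook_alloc_fixed()`, and the `live_allocs` counter are hypothetical stand-ins, not the kernel's rethook API, and validating the handler before allocating is one plausible way to avoid the leak rather than a quote of the actual patch.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a return-hook object; not the kernel struct. */
typedef void (*hook_handler_t)(void *data);

struct hook {
	void *data;
	hook_handler_t handler;
};

/* Counts successful allocations so the leak is observable. */
static int live_allocs;

/* Leaky shape: allocate first, then return NULL without freeing. */
static struct hook *hook_alloc_leaky(void *data, hook_handler_t handler)
{
	struct hook *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	live_allocs++;
	if (!handler)
		return NULL;	/* h is neither freed nor returned: leak */
	h->data = data;
	h->handler = handler;
	return h;
}

/* Assumed fix: reject a NULL handler before allocating anything. */
static struct hook *hook_alloc_fixed(void *data, hook_handler_t handler)
{
	struct hook *h;

	if (!handler)
		return NULL;
	h = calloc(1, sizeof(*h));
	if (!h)
		return NULL;
	live_allocs++;
	h->data = data;
	h->handler = handler;
	return h;
}
```

Calling `hook_alloc_leaky(NULL, NULL)` returns NULL yet leaves one allocation behind, while `hook_alloc_fixed(NULL, NULL)` returns NULL without allocating.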
Link: https://lore.kernel.org/all/20221110104438.88099-1-yiyang13@huawei.com/
[Masami: Add "rethook:" tag to the title.]
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Cc: stable@vger.kernel.org
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The filter string doesn't get freed when a dynamic event is deleted. If a
filter is set, then memory is leaked:
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
root@localhost:/sys/kernel/tracing# echo "-:egroup/stat_runtime_4core" >> dynamic_events
root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak
[ 224.416373] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88810156f1b8 (size 8):
comm "bash", pid 224, jiffies 4294935612 (age 55.800s)
hex dump (first 8 bytes):
63 70 75 20 3c 20 34 00 cpu < 4.
backtrace:
[<000000009f880725>] __kmem_cache_alloc_node+0x18e/0x720
[<0000000042492946>] __kmalloc+0x57/0x240
[<0000000034ea7995>] __trace_eprobe_create+0x1214/0x1d30
[<00000000d70ef730>] trace_probe_create+0xf6/0x110
[<00000000915c7b16>] eprobe_dyn_event_create+0x21/0x30
[<000000000d894386>] create_dyn_event+0xf3/0x1a0
[<00000000e9af57d5>] trace_parse_run_command+0x1a9/0x2e0
[<0000000080777f18>] dyn_event_write+0x39/0x50
[<0000000089f0ec73>] vfs_write+0x311/0xe50
[<000000003da1bdda>] ksys_write+0x158/0x2a0
[<00000000bb1e616e>] __x64_sys_write+0x7c/0xc0
[<00000000e8aef1f7>] do_syscall_64+0x60/0x90
[<00000000fe7fe8ba>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Additionally, in __trace_eprobe_create() function, if an error occurs after
the call to trace_eprobe_parse_filter(), which allocates the filter string,
then memory is also leaked. That can be reproduced by creating the same
event probe twice:
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
-bash: echo: write error: File exists
root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak
[ 207.871584] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8881020d17a8 (size 8):
comm "bash", pid 223, jiffies 4294938308 (age 31.000s)
hex dump (first 8 bytes):
63 70 75 20 3c 20 34 00 cpu < 4.
backtrace:
[<000000000e4f5f31>] __kmem_cache_alloc_node+0x18e/0x720
[<0000000024f0534b>] __kmalloc+0x57/0x240
[<000000002930a28e>] __trace_eprobe_create+0x1214/0x1d30
[<0000000028387903>] trace_probe_create+0xf6/0x110
[<00000000a80d6a9f>] eprobe_dyn_event_create+0x21/0x30
[<000000007168698c>] create_dyn_event+0xf3/0x1a0
[<00000000f036bf6a>] trace_parse_run_command+0x1a9/0x2e0
[<00000000014bde8b>] dyn_event_write+0x39/0x50
[<0000000078a097f7>] vfs_write+0x311/0xe50
[<00000000996cb208>] ksys_write+0x158/0x2a0
[<00000000a3c2acb0>] __x64_sys_write+0x7c/0xc0
[<0000000006b5d698>] do_syscall_64+0x60/0x90
[<00000000780e8ecf>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fix both issues by releasing the filter string in
trace_event_probe_cleanup().
Link: https://lore.kernel.org/all/20221108235738.1021467-1-rafaelmendsr@gmail.com/
Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
When test_gen_kprobe_cmd() fails after kprobe_event_gen_cmd_end(), it
jumps to the delete label, which calls kprobe_event_delete() and releases
the corresponding resource. However, the trace_array in gen_kretprobe_test
then points to the released resource. Set gen_kretprobe_test to NULL
after calling kprobe_event_delete() to prevent a null-ptr-deref.
BUG: kernel NULL pointer dereference, address: 0000000000000070
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 246 Comm: modprobe Tainted: G W
6.1.0-rc1-00174-g9522dc5c87da-dirty #248
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:__ftrace_set_clr_event_nolock+0x53/0x1b0
Code: e8 82 26 fc ff 49 8b 1e c7 44 24 0c ea ff ff ff 49 39 de 0f 84 3c
01 00 00 c7 44 24 18 00 00 00 00 e8 61 26 fc ff 48 8b 6b 10 <44> 8b 65
70 4c 8b 6d 18 41 f7 c4 00 02 00 00 75 2f
RSP: 0018:ffffc9000159fe00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88810971d268 RCX: 0000000000000000
RDX: ffff8881080be600 RSI: ffffffff811b48ff RDI: ffff88810971d058
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc9000159fe58 R11: 0000000000000001 R12: ffffffffa0001064
R13: ffffffffa000106c R14: ffff88810971d238 R15: 0000000000000000
FS: 00007f89eeff6540(0000) GS:ffff88813b600000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000070 CR3: 000000010599e004 CR4: 0000000000330ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__ftrace_set_clr_event+0x3e/0x60
trace_array_set_clr_event+0x35/0x50
? 0xffffffffa0000000
kprobe_event_gen_test_exit+0xcd/0x10b [kprobe_event_gen_test]
__x64_sys_delete_module+0x206/0x380
? lockdep_hardirqs_on_prepare+0xd8/0x190
? syscall_enter_from_user_mode+0x1c/0x50
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89eeb061b7
Link: https://lore.kernel.org/all/20221108015130.28326-3-shangxiaojing@huawei.com/
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
When trace_get_event_file() fails, gen_kretprobe_test is assigned
the error code. If the kprobe_event_gen_test module is then removed, a
null pointer dereference happens in kprobe_event_gen_test_exit().
Check whether gen_kprobe_test or gen_kretprobe_test is an error code or
NULL before dereferencing it.
BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 2210 Comm: modprobe Not tainted
6.1.0-rc1-00171-g2159299a3b74-dirty #217
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:kprobe_event_gen_test_exit+0x1c/0xb5 [kprobe_event_gen_test]
Code: Unable to access opcode bytes at 0xffffffff9ffffff2.
RSP: 0018:ffffc900015bfeb8 EFLAGS: 00010246
RAX: ffffffffffffffea RBX: ffffffffa0002080 RCX: 0000000000000000
RDX: ffffffffa0001054 RSI: ffffffffa0001064 RDI: ffffffffdfc6349c
RBP: ffffffffa0000000 R08: 0000000000000004 R09: 00000000001e95c0
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000800
R13: ffffffffa0002420 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f56b75be540(0000) GS:ffff88813bc00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff9ffffff2 CR3: 000000010874a006 CR4: 0000000000330ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__x64_sys_delete_module+0x206/0x380
? lockdep_hardirqs_on_prepare+0xd8/0x190
? syscall_enter_from_user_mode+0x1c/0x50
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
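The guard described in this commit message can be sketched in user space as follows. The helpers mimic the kernel's ERR_PTR()/IS_ERR_OR_NULL() convention, where small negative errno values are encoded in the top 4095 addresses; `struct event_file` and `cleanup_event_file()` are hypothetical names used only for illustration. Note that RAX in the oops above holds ffffffffffffffea, which is -22 (-EINVAL) encoded this way.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (user-space imitation). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True for NULL and for pointers in the top MAX_ERRNO addresses. */
static inline bool is_err_or_null(const void *ptr)
{
	return !ptr || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical object standing in for the event file. */
struct event_file {
	int enabled;
};

/* Returns true if cleanup touched the file, false if it was skipped. */
static bool cleanup_event_file(struct event_file *file)
{
	if (is_err_or_null(file))
		return false;	/* never dereference NULL or an error code */
	file->enabled = 0;
	return true;
}
```

With this guard, `cleanup_event_file(err_ptr(-22))` and `cleanup_event_file(NULL)` are both skipped, and only a valid pointer is dereferenced.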
Link: https://lore.kernel.org/all/20221108015130.28326-2-shangxiaojing@huawei.com/
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
In register_synth_event(), if set_synth_event_print_fmt() fails,
both trace_remove_event_call() and unregister_trace_event() are
called, so __unregister_trace_event() runs twice on the same
trace_event_call. The second unregister then causes a
wild-memory-access.
register_synth_event
set_synth_event_print_fmt failed
trace_remove_event_call
event_remove
if call->event.funcs then
__unregister_trace_event (first call)
unregister_trace_event
__unregister_trace_event (second call)
Fix the bug by skipping the second __unregister_trace_event() call when
the first one has already been made.
general protection fault, probably for non-canonical address
0xfbd59c0000000024: 0000 [#1] SMP KASAN PTI
KASAN: maybe wild-memory-access in range
[0xdead000000000120-0xdead000000000127]
CPU: 0 PID: 3807 Comm: modprobe Not tainted
6.1.0-rc1-00186-g76f33a7eedb4 #299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_trace_event+0x6e/0x280
Code: 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 0e 02 00 00 48
b8 00 00 00 00 00 fc ff df 4c 8b 63 08 4c 89 e2 48 c1 ea 03 <80> 3c 02
00 0f 85 e2 01 00 00 49 89 2c 24 48 85 ed 74 28 e8 7a 9b
RSP: 0018:ffff88810413f370 EFLAGS: 00010a06
RAX: dffffc0000000000 RBX: ffff888105d050b0 RCX: 0000000000000000
RDX: 1bd5a00000000024 RSI: ffff888119e276e0 RDI: ffffffff835a8b20
RBP: dead000000000100 R08: 0000000000000000 R09: fffffbfff0913481
R10: ffffffff8489a407 R11: fffffbfff0913480 R12: dead000000000122
R13: ffff888105d050b8 R14: 0000000000000000 R15: ffff888105d05028
FS: 00007f7823e8d540(0000) GS:ffff888119e00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7823e7ebec CR3: 000000010a058002 CR4: 0000000000330ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__create_synth_event+0x1e37/0x1eb0
create_or_delete_synth_event+0x110/0x250
synth_event_run_command+0x2f/0x110
test_gen_synth_cmd+0x170/0x2eb [synth_event_gen_test]
synth_event_gen_test_init+0x76/0x9bc [synth_event_gen_test]
do_one_initcall+0xdb/0x480
do_init_module+0x1cf/0x680
load_module+0x6a50/0x70a0
__do_sys_finit_module+0x12f/0x1c0
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lkml.kernel.org/r/20221117012346.22647-3-shangxiaojing@huawei.com
Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
test_gen_synth_cmd() only frees buf in the failure path, so buf leaks
when there is no failure. Add kfree(buf) to prevent the memleak. The
same problem and fix apply to test_empty_synth_event().
unreferenced object 0xffff8881127de000 (size 2048):
comm "modprobe", pid 247, jiffies 4294972316 (age 78.756s)
hex dump (first 32 bytes):
20 67 65 6e 5f 73 79 6e 74 68 5f 74 65 73 74 20 gen_synth_test
20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 64 5f pid_t next_pid_
backtrace:
[<000000004254801a>] kmalloc_trace+0x26/0x100
[<0000000039eb1cf5>] 0xffffffffa00083cd
[<000000000e8c3bc8>] 0xffffffffa00086ba
[<00000000c293d1ea>] do_one_initcall+0xdb/0x480
[<00000000aa189e6d>] do_init_module+0x1cf/0x680
[<00000000d513222b>] load_module+0x6a50/0x70a0
[<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0
[<00000000b36c4c0f>] do_syscall_64+0x3f/0x90
[<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
unreferenced object 0xffff8881127df000 (size 2048):
comm "modprobe", pid 247, jiffies 4294972324 (age 78.728s)
hex dump (first 32 bytes):
20 65 6d 70 74 79 5f 73 79 6e 74 68 5f 74 65 73 empty_synth_tes
74 20 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 t pid_t next_pi
backtrace:
[<000000004254801a>] kmalloc_trace+0x26/0x100
[<00000000d4db9a3d>] 0xffffffffa0008071
[<00000000c31354a5>] 0xffffffffa00086ce
[<00000000c293d1ea>] do_one_initcall+0xdb/0x480
[<00000000aa189e6d>] do_init_module+0x1cf/0x680
[<00000000d513222b>] load_module+0x6a50/0x70a0
[<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0
[<00000000b36c4c0f>] do_syscall_64+0x3f/0x90
[<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lkml.kernel.org/r/20221117012346.22647-2-shangxiaojing@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: <fengguang.wu@intel.com>
Cc: stable@vger.kernel.org
Fixes: 9fe41efaca08 ("tracing: Add synth event generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The @ftrace_mod is allocated by kzalloc(), so both members {prev,next}
of @ftrace_mod->list are NULL, which is not a valid state for list_del().
If kstrdup() for @ftrace_mod->{func|module} fails, the code jumps to the
@out_free label and calls free_ftrace_mod() to destroy @ftrace_mod;
list_del() then writes prev->next and next->prev, dereferencing a null
pointer.
BUG: kernel NULL pointer dereference, address: 0000000000000008
Oops: 0002 [#1] PREEMPT SMP NOPTI
Call Trace:
<TASK>
ftrace_mod_callback+0x20d/0x220
? do_filp_open+0xd9/0x140
ftrace_process_regex.isra.51+0xbf/0x130
ftrace_regex_write.isra.52.part.53+0x6e/0x90
vfs_write+0xee/0x3a0
? __audit_filter_op+0xb1/0x100
? auditd_test_task+0x38/0x50
ksys_write+0xa5/0xe0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Kernel panic - not syncing: Fatal exception
So call INIT_LIST_HEAD() to initialize the list member to fix this issue.
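A minimal user-space sketch of why a zero-filled list node cannot be unlinked, and why initializing it first makes the unlink safe. `struct list_head` mirrors the kernel layout, but `init_list_head()` and `list_del_checked()` are illustrative helpers written for this sketch; the real list_del() performs the two pointer writes unconditionally, which is where the oops comes from.

```c
#include <stddef.h>

/* Same shape as the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make the node point at itself, i.e. an empty, valid list. */
static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/*
 * Unlink an entry. Returns -1 instead of crashing when the node was
 * only zero-filled; the unchecked kernel version would write through
 * the NULL next/prev pointers here.
 */
static int list_del_checked(struct list_head *entry)
{
	if (!entry->next || !entry->prev)
		return -1;
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	return 0;
}
```

A node that is zero-filled (as after kzalloc()) fails the check, whereas a node passed through `init_list_head()` unlinks cleanly because both writes land on the node itself.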
Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com
Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
rb_head_page_deactivate() expects cpu_buffer to contain a valid list of
->pages, so verify that the list is actually present before calling it.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru
Cc: stable@vger.kernel.org
Fixes: 77ae365eca895 ("ring-buffer: make lockless")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
If an allocation of this size fails, retry with half the size. To halve
the size, the order must be decreased by one, not divided by two.
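The arithmetic behind this can be sketched as follows. An allocation of order n covers 2^n pages, so halving the size means n - 1, while halving the order shrinks the size much faster. The 4096-byte page size and the helper names are assumptions for illustration, and `next_order_buggy()` models the "divided by two" behavior the message describes rather than quoting the original code.

```c
#include <stddef.h>

/* Assumed page size for the sketch; the real value is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* An order-n allocation spans 2^n pages. */
static size_t order_to_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Buggy retry step: halves the order, not the size. */
static unsigned int next_order_buggy(unsigned int order)
{
	return order >> 1;
}

/* Intended retry step: one order lower is exactly half the size. */
static unsigned int next_order_fixed(unsigned int order)
{
	return order - 1;	/* caller must ensure order > 0 */
}
```

Starting from order 4 (16 pages), the fixed step retries with order 3 (8 pages), whereas the buggy step drops straight to order 2 (4 pages).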
Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
If the number of mcount entries is an integer multiple of
ENTRIES_PER_PAGE, the page count showing on the console would be wrong.
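A small sketch of the off-by-one, under the assumption that the printed page count was computed as `count / ENTRIES_PER_PAGE + 1`; the message itself does not show the expression. `SKETCH_ENTRIES_PER_PAGE` is an arbitrary illustrative value, not the kernel's.

```c
/* Illustrative value only; the kernel derives it from the page size. */
#define SKETCH_ENTRIES_PER_PAGE 256UL

/* Assumed buggy form: always adds a page, even on exact multiples. */
static unsigned long pages_buggy(unsigned long count)
{
	return count / SKETCH_ENTRIES_PER_PAGE + 1;
}

/* Round-up division: adds a page only when there is a remainder. */
static unsigned long pages_round_up(unsigned long count)
{
	return (count + SKETCH_ENTRIES_PER_PAGE - 1) / SKETCH_ENTRIES_PER_PAGE;
}
```

For 512 entries the two forms disagree (3 versus 2 pages); for 513 entries both give 3, which is why the problem only shows up on exact multiples.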
Link: https://lkml.kernel.org/r/20221109094434.84046-2-wangwensheng4@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: 5821e1b74f0d0 ("function tracing: fix wrong pos computing when read buffer has been fulfilled")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's a multishot issue when it's not, it can only ask to
skip completion, thereby leaking the request. Use issue_flags to mark
multipoll issues.
Cc: stable@vger.kernel.org
Fixes: 1300ebb20286b ("io_uring: multishot recv")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's a multishot issue when it's not, it can only ask to
skip completion, thereby leaking the request. Use issue_flags to mark
multipoll issues.
Cc: stable@vger.kernel.org
Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We may never try to process a poll wake and its mask if there were
multiple wake-ups racing to queue up a task work (tw). Force
io_poll_check_events() to update the mask via vfs_poll().
Cc: stable@vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When io_poll_check_events() collides with someone attempting to queue a
task work, it'll spin one more time. However, it'll continue to use
the mask from the first iteration instead of updating it. For example,
if the first wake-up was an EPOLLIN and the second an EPOLLOUT,
userspace will not get EPOLLOUT in time.
Clear the mask for all subsequent iterations to force vfs_poll().
Cc: stable@vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If device_register() returns an error, the name allocated by
dev_set_name() needs to be freed. As described in the comment of
device_register(), we should use put_device() to give up the reference in
the error path.
Fix this by calling put_device(): the name is then freed in
kobject_cleanup(), and the resources touched by this patch are released
by the corresponding callback in device_release().
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Link: https://lore.kernel.org/r/20221110033729.1555-1-zhouguanghui1@huawei.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one, this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
when storing it in the cache variable and so doesn't match the original ID
anymore. The second, less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again all the leading zeros will be flipped to ones to extend the
sign and won't match the original ID anymore (this has been observed in
practice).
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shutdown we end up freeing the same memory twice.
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign
extension, so the request can be successfully removed from the hash table.
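Both failure modes can be reproduced in a few lines of portable C. This is a user-space sketch, not zfcp code: `roundtrip_via_int()` and `roundtrip_via_u64()` are hypothetical helpers, and fixed-width types stand in for 'int' and the 64-bit 'unsigned long' of s390x. Converting an out-of-range value to a signed 32-bit type is implementation-defined in C; the sketch assumes the usual wrap-around behavior of GCC and Clang.

```c
#include <stdint.h>

/*
 * Cache a 64-bit request ID in a signed 32-bit variable and widen it
 * again, as happened with the 'int' cache variable.
 */
static uint64_t roundtrip_via_int(uint64_t req_id)
{
	int32_t cached = (int32_t)req_id;	/* truncates to the low 32 bits */

	return (uint64_t)(int64_t)cached;	/* widening sign-extends bit 31 */
}

/* Cache the ID in a 64-bit unsigned variable: the value is preserved. */
static uint64_t roundtrip_via_u64(uint64_t req_id)
{
	uint64_t cached = req_id;

	return cached;
}
```

An ID of 0x100000001 comes back as 1 (truncation), and an ID of 0x80000000 comes back as 0xffffffff80000000 (sign extension), so neither matches the entry in the hash table; with the 64-bit cache both values survive unchanged.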
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable@vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.1668595505.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If device_register() fails in sdebug_add_host_helper(), the code jumps to
the clean label and frees sdbg_host, but sdbg_host->host_list is not
removed from sdebug_host_list, so a later list traversal may cause a
use-after-free. Fix it.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20221117084421.58918-1-yuancan@huawei.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|