Age | Commit message (Collapse) | Author | Files | Lines |
|
Those two cleanup routines are using the helper ovl_dir_read() with the
merge dir filler, which populates an rb tree, that is never used.
The index dir entry names all have a long (42 bytes) constant prefix, so it
is not surprising that perf top has demostrated high CPU usage by rb tree
population during cleanup of a large index dir:
- 9.53% ovl_fill_merge
- 78.41% ovl_cache_entry_find_link.constprop.27
+ 72.11% strncmp
Use the plain list filler that does not populate the unneeded rb tree.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
ovl_indexdir_cleanup() is called on mount of overayfs with nfs_export
feature to cleanup stale index records for lower and upper files that have
been deleted while overlayfs was offline.
This has the side effect (good or bad) of pre populating inode cache with
all the copied up upper inodes, while verifying the index entries.
For copied up directories, the upper file handles are decoded to conncted
upper dentries. This has the even bigger side effect of reading the
content of all the parent upper directories which may take significantly
more time and IO than just reading the upper inodes.
Do not request connceted upper dentries for verifying upper directory index
entries, because we have no use for the connected dentry.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Fix two typos.
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
A while ago we introduced a dedicated vfs{g,u}id_t type in commit
1e5267cd0895 ("mnt_idmapping: add vfs{g,u}id_t"). We already switched
over a good part of the VFS. Ultimately we will remove all legacy
idmapped mount helpers that operate only on k{g,u}id_t in favor of the
new type safe helpers that operate on vfs{g,u}id_t.
Cc: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
There is a wrong case of link() on overlay:
$ mkdir /lower /fuse /merge
$ mount -t fuse /fuse
$ mkdir /fuse/upper /fuse/work
$ mount -t overlay /merge -o lowerdir=/lower,upperdir=/fuse/upper,\
workdir=work
$ touch /merge/file
$ chown bin.bin /merge/file // the file's caller becomes "bin"
$ ln /merge/file /merge/lnkfile
Then we will get an error(EACCES) because fuse daemon checks the link()'s
caller is "bin", it denied this request.
In the changing history of ovl_link(), there are two key commits:
The first is commit bb0d2b8ad296 ("ovl: fix sgid on directory") which
overrides the cred's fsuid/fsgid using the new inode. The new inode's
owner is initialized by inode_init_owner(), and inode->fsuid is
assigned to the current user. So the override fsuid becomes the
current user. We know link() is actually modifying the directory, so
the caller must have the MAY_WRITE permission on the directory. The
current caller may should have this permission. This is acceptable
to use the caller's fsuid.
The second is commit 51f7e52dc943 ("ovl: share inode for hard link")
which removed the inode creation in ovl_link(). This commit move
inode_init_owner() into ovl_create_object(), so the ovl_link() just
give the old inode to ovl_create_or_link(). Then the override fsuid
becomes the old inode's fsuid, neither the caller nor the overlay's
mounter! So this is incorrect.
Fix this bug by using ovl mounter's fsuid/fsgid to do underlying
fs's link().
Link: https://lore.kernel.org/all/20220817102952.xnvesg3a7rbv576x@wittgenstein/T
Link: https://lore.kernel.org/lkml/20220825130552.29587-1-zhangtianci.1997@bytedance.com/t
Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Fixes: 51f7e52dc943 ("ovl: share inode for hard link")
Cc: <stable@vger.kernel.org> # v4.8
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The "buf" flexible array needs to be the memcpy() destination to avoid
false positive run-time warning from the recent FORTIFY_SOURCE
hardening:
memcpy: detected field-spanning write (size 93) of single field "&fh->fb"
at fs/overlayfs/export.c:799 (size 21)
Reported-by: syzbot+9d14351a171d0d1c7955@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/000000000000763a6c05e95a5985@google.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
ovl_change_flags() is an open-coded variant of fs/fcntl.c:setfl() and it
got missed by commit 164f4064ca81 ("keep iocb_flags() result cached in
struct file"); the same change applies there.
Reported-by: Pierre Labastie <pierre.labastie@neuf.fr>
Fixes: 164f4064ca81 ("keep iocb_flags() result cached in struct file")
Cc: <stable@vger.kernel.org> # v6.0
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216738
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
ovl_dentry_revalidate_common() can be called in rcu-walk mode. As document
said, "in rcu-walk mode, d_parent and d_inode should not be used without
care".
Check inode here to protect access under rcu-walk mode.
Fixes: bccece1ead36 ("ovl: allow remote upper")
Reported-and-tested-by: syzbot+a4055c78774bbf3498bb@syzkaller.appspotmail.com
Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com>
Cc: <stable@vger.kernel.org> # v5.7
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
|
|
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.
Test for a NULL record to keep this from happening.
Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/
Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
SRS cap is the hardware cap telling if the hardware IOMMU can support
requests seeking supervisor privilege or not. SRE bit in scalable-mode
PASID table entry is treated as Reserved(0) for implementation not
supporting SRS cap.
Checking SRS cap before setting SRE bit can avoid the non-recoverable
fault of "Non-zero reserved field set in PASID Table Entry" caused by
setting SRE bit while there is no SRS cap support. The fault messages
look like below:
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000
[fault reason 0x5a]
SM: Non-zero reserved field set in PASID Table Entry
Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
Cc: stable@vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
The A/D bits are preseted for IOVA over first level(FL) usage for both
kernel DMA (i.e, domain typs is IOMMU_DOMAIN_DMA) and user space DMA
usage (i.e., domain type is IOMMU_DOMAIN_UNMANAGED).
Presetting A bit in FL requires to preset the bit in every related paging
entries, including the non-leaf ones. Otherwise, hardware may treat this
as an error. For example, in a case of ECAP_REG.SMPWC==0, DMA faults might
occur with below DMAR fault messages (wrapped for line length) dumped.
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [aa:00.0] fault addr 0x10c3a6000
[fault reason 0x90]
SM: A/D bit update needed in first-level entry when set up in no snoop
Fixes: 289b3b005cb9 ("iommu/vt-d: Preset A/D bits for user space DMA usage")
Cc: stable@vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Link: https://lore.kernel.org/r/20221113010324.1094483-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
Avoid resetting the module-wide i8042_platform_device pointer in
i8042_probe() or i8042_remove(), so that the device can be properly
destroyed by i8042_exit() on module unload.
Fixes: 9222ba68c3f4 ("Input: i8042 - add deferred probe support")
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Link: https://lore.kernel.org/r/20221109034148.23821-1-chenjun102@huawei.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The page table check trigger BUG_ON() unexpectedly when collapse hugepage:
------------[ cut here ]------------
kernel BUG at mm/page_table_check.c:82!
Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 6 PID: 68 Comm: khugepaged Not tainted 6.1.0-rc3+ #750
Hardware name: linux,dummy-virt (DT)
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : page_table_check_clear.isra.0+0x258/0x3f0
lr : page_table_check_clear.isra.0+0x240/0x3f0
[...]
Call trace:
page_table_check_clear.isra.0+0x258/0x3f0
__page_table_check_pmd_clear+0xbc/0x108
pmdp_collapse_flush+0xb0/0x160
collapse_huge_page+0xa08/0x1080
hpage_collapse_scan_pmd+0xf30/0x1590
khugepaged_scan_mm_slot.constprop.0+0x52c/0xac8
khugepaged+0x338/0x518
kthread+0x278/0x2f8
ret_from_fork+0x10/0x20
[...]
Since pmd_user_accessible_page() doesn't check if a pmd is leaf, it
decrease file_map_count for a non-leaf pmd comes from collapse_huge_page().
and so trigger BUG_ON() unexpectedly.
Fix this problem by using pmd_leaf() insteal of pmd_present() in
pmd_user_accessible_page(). Moreover, use pud_leaf() for
pud_user_accessible_page() too.
Fixes: 42b2547137f5 ("arm64/mm: enable ARCH_SUPPORTS_PAGE_TABLE_CHECK")
Reported-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20221117075602.2904324-2-liushixin2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
When we post a CQE we wake all ring pollers as it normally should be.
However, if a CQE was generated by a multishot poll request targeting
its own ring, it'll wake that request up, which will make it to post
a new CQE, which will wake the request and so on until it exhausts all
CQ entries.
Don't allow multishot polling io_uring files but downgrade them to
oneshots, which was always stated as a correct behaviour that the
userspace should check for.
Cc: stable@vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/3124038c0e7474d427538c2d915335ec28c92d21.1668785722.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There was a problem that a user burned a dm-integrity image on CDROM
and could not activate it because it had a non-empty journal.
Fix this problem by flushing the journal (done by the previous commit)
and clearing the journal (done by this commit). Once the journal is
cleared, dm-integrity won't attempt to replay it on the next
activation.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
This commit flushes the journal on suspend. It is prerequisite for the
next commit that enables activating dm integrity devices in read-only mode.
Note that we deliberately didn't flush the journal on suspend, so that the
journal replay code would be tested. However, the dm-integrity code is 5
years old now, so that journal replay is well-tested, and we can make this
change now.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
The 'no_sleep_enabled' should be decreased in error handling path
in dm_bufio_client_create() when the DM_BUFIO_CLIENT_NO_SLEEP flag
is set, otherwise static_branch_unlikely() will always return true
even if no dm_bufio_client instances have DM_BUFIO_CLIENT_NO_SLEEP
flag set.
Cc: stable@vger.kernel.org
Fixes: 3c1c875d0586 ("dm bufio: conditionally enable branching for DM_BUFIO_CLIENT_NO_SLEEP")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
__list_versions will first estimate the required space using the
"dm_target_iterate(list_version_get_needed, &needed)" call and then will
fill the space using the "dm_target_iterate(list_version_get_info,
&iter_info)" call. Each of these calls locks the targets using the
"down_read(&_lock)" and "up_read(&_lock)" calls, however between the first
and second "dm_target_iterate" there is no lock held and the target
modules can be loaded at this point, so the second "dm_target_iterate"
call may need more space than what was the first "dm_target_iterate"
returned.
The code tries to handle this overflow (see the beginning of
list_version_get_info), however this handling is incorrect.
The code sets "param->data_size = param->data_start + needed" and
"iter_info.end = (char *)vers+len" - "needed" is the size returned by the
first dm_target_iterate call; "len" is the size of the buffer allocated by
userspace.
"len" may be greater than "needed"; in this case, the code will write up
to "len" bytes into the buffer, however param->data_size is set to
"needed", so it may write data past the param->data_size value. The ioctl
interface copies only up to param->data_size into userspace, thus part of
the result will be truncated.
Fix this bug by setting "iter_info.end = (char *)vers + needed;" - this
guarantees that the second "dm_target_iterate" call will write only up to
the "needed" buffer and it will exit with "DM_BUFFER_FULL_FLAG" if it
overflows the "needed" space - in this case, userspace will allocate a
larger buffer and retry.
Note that there is also a bug in list_version_get_needed - we need to add
"strlen(tt->name) + 1" to the needed size, not "strlen(tt->name)".
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
Since commit c7e3ca515e78 ("iommu/tegra: gart: Do not register with
bus") quite some time ago, the GART driver has effectively disabled
itself to avoid issues with the GPU driver expecting it to work in ways
that it doesn't. As of commit 57365a04c921 ("iommu: Move bus setup to
IOMMU device registration") that bodge no longer works, but really the
GPU driver should be responsible for its own behaviour anyway. Make the
workaround explicit.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Suggested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Entries in list 'tr->err_log' will be reused after entry number
exceed TRACING_LOG_ERRS_MAX.
The cmd string of the to be reused entry will be freed first then
allocated a new one. If the allocation failed, then the entry will
still be in list 'tr->err_log' but its 'cmd' field is set to be NULL,
later access of 'cmd' is risky.
Currently above problem can cause the loss of 'cmd' information of first
entry in 'tr->err_log'. When execute `cat /sys/kernel/tracing/error_log`,
reproduce logs like:
[ 37.495100] trace_kprobe: error: Maxactive is not for kprobe(null) ^
[ 38.412517] trace_kprobe: error: Maxactive is not for kprobe
Command: p4:myprobe2 do_sys_openat2
^
Link: https://lore.kernel.org/linux-trace-kernel/20221114104632.3547266-1-zhengyejian1@huawei.com
Fixes: 1581a884b7ca ("tracing: Remove size restriction on tracing_log_err cmd strings")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
__bad_type_size() is unused after
commit 04ae87a52074("ftrace: Rework event_create_dir()").
So, remove it.
Link: https://lkml.kernel.org/r/D062EC2E-7DB7-4402-A67E-33C3577F551E@gmail.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the eprobe filter was defined based on the eprobe's trace event
itself, it doesn't work correctly. Use the original trace event of
the eprobe when making the filter so that the filter works correctly.
Without this fix:
# echo 'e syscalls/sys_enter_openat \
flags_rename=$flags:u32 if flags < 1000' >> dynamic_events
# echo 1 > events/eprobes/sys_enter_openat/enable
[ 114.551550] event trace: Could not enable event sys_enter_openat
-bash: echo: write error: Invalid argument
With this fix:
# echo 'e syscalls/sys_enter_openat \
flags_rename=$flags:u32 if flags < 1000' >> dynamic_events
# echo 1 > events/eprobes/sys_enter_openat/enable
# tail trace
cat-241 [000] ...1. 266.498449: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0
cat-242 [000] ...1. 266.977640: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0
Link: https://lore.kernel.org/all/166823166395.1385292.8931770640212414483.stgit@devnote3/
Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Tested-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The filter pointer (filterp) passed to create_filter() function must be a
pointer that references a NULL pointer, otherwise, we get a warning when
adding a filter option to the event probe:
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \
runtime=$runtime:u32 if cpu < 4' >> dynamic_events
[ 5034.340439] ------------[ cut here ]------------
[ 5034.341258] WARNING: CPU: 0 PID: 223 at kernel/trace/trace_events_filter.c:1939 create_filter+0x1db/0x250
[...] stripped
[ 5034.345518] RIP: 0010:create_filter+0x1db/0x250
[...] stripped
[ 5034.351604] Call Trace:
[ 5034.351803] <TASK>
[ 5034.351959] ? process_preds+0x1b40/0x1b40
[ 5034.352241] ? rcu_read_lock_bh_held+0xd0/0xd0
[ 5034.352604] ? kasan_set_track+0x29/0x40
[ 5034.352904] ? kasan_save_alloc_info+0x1f/0x30
[ 5034.353264] create_event_filter+0x38/0x50
[ 5034.353573] __trace_eprobe_create+0x16f4/0x1d20
[ 5034.353964] ? eprobe_dyn_event_release+0x360/0x360
[ 5034.354363] ? mark_held_locks+0xa6/0xf0
[ 5034.354684] ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 5034.355105] ? trace_hardirqs_on+0x41/0x120
[ 5034.355417] ? _raw_spin_unlock_irqrestore+0x35/0x60
[ 5034.355751] ? __create_object+0x5b7/0xcf0
[ 5034.356027] ? lock_is_held_type+0xaf/0x120
[ 5034.356362] ? rcu_read_lock_bh_held+0xb0/0xd0
[ 5034.356716] ? rcu_read_lock_bh_held+0xd0/0xd0
[ 5034.357084] ? kasan_set_track+0x29/0x40
[ 5034.357411] ? kasan_save_alloc_info+0x1f/0x30
[ 5034.357715] ? __kasan_kmalloc+0xb8/0xc0
[ 5034.357985] ? write_comp_data+0x2f/0x90
[ 5034.358302] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.358691] ? argv_split+0x381/0x460
[ 5034.358949] ? write_comp_data+0x2f/0x90
[ 5034.359240] ? eprobe_dyn_event_release+0x360/0x360
[ 5034.359620] trace_probe_create+0xf6/0x110
[ 5034.359940] ? trace_probe_match_command_args+0x240/0x240
[ 5034.360376] eprobe_dyn_event_create+0x21/0x30
[ 5034.360709] create_dyn_event+0xf3/0x1a0
[ 5034.360983] trace_parse_run_command+0x1a9/0x2e0
[ 5034.361297] ? dyn_event_release+0x500/0x500
[ 5034.361591] dyn_event_write+0x39/0x50
[ 5034.361851] vfs_write+0x311/0xe50
[ 5034.362091] ? dyn_event_seq_next+0x40/0x40
[ 5034.362376] ? kernel_write+0x5b0/0x5b0
[ 5034.362637] ? write_comp_data+0x2f/0x90
[ 5034.362937] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.363258] ? ftrace_syscall_enter+0x544/0x840
[ 5034.363563] ? write_comp_data+0x2f/0x90
[ 5034.363837] ? __sanitizer_cov_trace_pc+0x25/0x60
[ 5034.364156] ? write_comp_data+0x2f/0x90
[ 5034.364468] ? write_comp_data+0x2f/0x90
[ 5034.364770] ksys_write+0x158/0x2a0
[ 5034.365022] ? __ia32_sys_read+0xc0/0xc0
[ 5034.365344] __x64_sys_write+0x7c/0xc0
[ 5034.365669] ? syscall_enter_from_user_mode+0x53/0x70
[ 5034.366084] do_syscall_64+0x60/0x90
[ 5034.366356] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 5034.366767] RIP: 0033:0x7ff0b43938f3
[...] stripped
[ 5034.371892] </TASK>
[ 5034.374720] ---[ end trace 0000000000000000 ]---
Link: https://lore.kernel.org/all/20221108202148.1020111-1-rafaelmendsr@gmail.com/
Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
In __unregister_kprobe_top(), if the currently unregistered probe has
post_handler but other child probes of the aggrprobe do not have
post_handler, the post_handler of the aggrprobe is cleared. If this is
a ftrace-based probe, there is a problem. In later calls to
disarm_kprobe(), we will use kprobe_ftrace_ops because post_handler is
NULL. But we're armed with kprobe_ipmodify_ops. This triggers a WARN in
__disarm_kprobe_ftrace() and may even cause use-after-free:
Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2)
WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0
Modules linked in: testKprobe_007(-)
CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18
[...]
Call Trace:
<TASK>
__disable_kprobe+0xcd/0xe0
__unregister_kprobe_top+0x12/0x150
? mutex_lock+0xe/0x30
unregister_kprobes.part.23+0x31/0xa0
unregister_kprobe+0x32/0x40
__x64_sys_delete_module+0x15e/0x260
? do_user_addr_fault+0x2cd/0x6b0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
[...]
For the kprobe-on-ftrace case, we keep the post_handler setting to
identify this aggrprobe armed with kprobe_ipmodify_ops. This way we
can disarm it correctly.
Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/
Fixes: 0bc11ed5ab60 ("kprobes: Allow kprobes coexist with livepatch")
Reported-by: Zhao Gongyi <zhaogongyi@huawei.com>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
In rethook_alloc(), the variable rh is not freed or passed out
if handler is NULL, which could lead to a memleak, fix it.
Link: https://lore.kernel.org/all/20221110104438.88099-1-yiyang13@huawei.com/
[Masami: Add "rethook:" tag to the title.]
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Cc: stable@vger.kernel.org
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Acke-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
The filter string doesn't get freed when a dynamic event is deleted. If a
filter is set, then memory is leaked:
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
root@localhost:/sys/kernel/tracing# echo "-:egroup/stat_runtime_4core" >> dynamic_events
root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak
[ 224.416373] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff88810156f1b8 (size 8):
comm "bash", pid 224, jiffies 4294935612 (age 55.800s)
hex dump (first 8 bytes):
63 70 75 20 3c 20 34 00 cpu < 4.
backtrace:
[<000000009f880725>] __kmem_cache_alloc_node+0x18e/0x720
[<0000000042492946>] __kmalloc+0x57/0x240
[<0000000034ea7995>] __trace_eprobe_create+0x1214/0x1d30
[<00000000d70ef730>] trace_probe_create+0xf6/0x110
[<00000000915c7b16>] eprobe_dyn_event_create+0x21/0x30
[<000000000d894386>] create_dyn_event+0xf3/0x1a0
[<00000000e9af57d5>] trace_parse_run_command+0x1a9/0x2e0
[<0000000080777f18>] dyn_event_write+0x39/0x50
[<0000000089f0ec73>] vfs_write+0x311/0xe50
[<000000003da1bdda>] ksys_write+0x158/0x2a0
[<00000000bb1e616e>] __x64_sys_write+0x7c/0xc0
[<00000000e8aef1f7>] do_syscall_64+0x60/0x90
[<00000000fe7fe8ba>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Additionally, in __trace_eprobe_create() function, if an error occurs after
the call to trace_eprobe_parse_filter(), which allocates the filter string,
then memory is also leaked. That can be reproduced by creating the same
event probe twice:
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \
sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events
-bash: echo: write error: File exists
root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak
[ 207.871584] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8881020d17a8 (size 8):
comm "bash", pid 223, jiffies 4294938308 (age 31.000s)
hex dump (first 8 bytes):
63 70 75 20 3c 20 34 00 cpu < 4.
backtrace:
[<000000000e4f5f31>] __kmem_cache_alloc_node+0x18e/0x720
[<0000000024f0534b>] __kmalloc+0x57/0x240
[<000000002930a28e>] __trace_eprobe_create+0x1214/0x1d30
[<0000000028387903>] trace_probe_create+0xf6/0x110
[<00000000a80d6a9f>] eprobe_dyn_event_create+0x21/0x30
[<000000007168698c>] create_dyn_event+0xf3/0x1a0
[<00000000f036bf6a>] trace_parse_run_command+0x1a9/0x2e0
[<00000000014bde8b>] dyn_event_write+0x39/0x50
[<0000000078a097f7>] vfs_write+0x311/0xe50
[<00000000996cb208>] ksys_write+0x158/0x2a0
[<00000000a3c2acb0>] __x64_sys_write+0x7c/0xc0
[<0000000006b5d698>] do_syscall_64+0x60/0x90
[<00000000780e8ecf>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fix both issues by releasing the filter string in
trace_event_probe_cleanup().
Link: https://lore.kernel.org/all/20221108235738.1021467-1-rafaelmendsr@gmail.com/
Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
When test_gen_kprobe_cmd() failed after kprobe_event_gen_cmd_end(), it
will goto delete, which will call kprobe_event_delete() and release the
corresponding resource. However, the trace_array in gen_kretprobe_test
will point to the invalid resource. Set gen_kretprobe_test to NULL
after called kprobe_event_delete() to prevent null-ptr-deref.
BUG: kernel NULL pointer dereference, address: 0000000000000070
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 246 Comm: modprobe Tainted: G W
6.1.0-rc1-00174-g9522dc5c87da-dirty #248
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:__ftrace_set_clr_event_nolock+0x53/0x1b0
Code: e8 82 26 fc ff 49 8b 1e c7 44 24 0c ea ff ff ff 49 39 de 0f 84 3c
01 00 00 c7 44 24 18 00 00 00 00 e8 61 26 fc ff 48 8b 6b 10 <44> 8b 65
70 4c 8b 6d 18 41 f7 c4 00 02 00 00 75 2f
RSP: 0018:ffffc9000159fe00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88810971d268 RCX: 0000000000000000
RDX: ffff8881080be600 RSI: ffffffff811b48ff RDI: ffff88810971d058
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
R10: ffffc9000159fe58 R11: 0000000000000001 R12: ffffffffa0001064
R13: ffffffffa000106c R14: ffff88810971d238 R15: 0000000000000000
FS: 00007f89eeff6540(0000) GS:ffff88813b600000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000070 CR3: 000000010599e004 CR4: 0000000000330ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__ftrace_set_clr_event+0x3e/0x60
trace_array_set_clr_event+0x35/0x50
? 0xffffffffa0000000
kprobe_event_gen_test_exit+0xcd/0x10b [kprobe_event_gen_test]
__x64_sys_delete_module+0x206/0x380
? lockdep_hardirqs_on_prepare+0xd8/0x190
? syscall_enter_from_user_mode+0x1c/0x50
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89eeb061b7
Link: https://lore.kernel.org/all/20221108015130.28326-3-shangxiaojing@huawei.com/
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
When trace_get_event_file() failed, gen_kretprobe_test will be assigned
as the error code. If module kprobe_event_gen_test is removed now, the
null pointer dereference will happen in kprobe_event_gen_test_exit().
Check if gen_kprobe_test or gen_kretprobe_test is error code or NULL
before dereference them.
BUG: kernel NULL pointer dereference, address: 0000000000000012
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 3 PID: 2210 Comm: modprobe Not tainted
6.1.0-rc1-00171-g2159299a3b74-dirty #217
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:kprobe_event_gen_test_exit+0x1c/0xb5 [kprobe_event_gen_test]
Code: Unable to access opcode bytes at 0xffffffff9ffffff2.
RSP: 0018:ffffc900015bfeb8 EFLAGS: 00010246
RAX: ffffffffffffffea RBX: ffffffffa0002080 RCX: 0000000000000000
RDX: ffffffffa0001054 RSI: ffffffffa0001064 RDI: ffffffffdfc6349c
RBP: ffffffffa0000000 R08: 0000000000000004 R09: 00000000001e95c0
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000800
R13: ffffffffa0002420 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f56b75be540(0000) GS:ffff88813bc00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff9ffffff2 CR3: 000000010874a006 CR4: 0000000000330ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__x64_sys_delete_module+0x206/0x380
? lockdep_hardirqs_on_prepare+0xd8/0x190
? syscall_enter_from_user_mode+0x1c/0x50
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lore.kernel.org/all/20221108015130.28326-2-shangxiaojing@huawei.com/
Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
In register_synth_event(), if set_synth_event_print_fmt() failed, then
both trace_remove_event_call() and unregister_trace_event() will be
called, which means the trace_event_call will call
__unregister_trace_event() twice. As the result, the second unregister
will causes the wild-memory-access.
register_synth_event
set_synth_event_print_fmt failed
trace_remove_event_call
event_remove
if call->event.funcs then
__unregister_trace_event (first call)
unregister_trace_event
__unregister_trace_event (second call)
Fix the bug by avoiding to call the second __unregister_trace_event() by
checking if the first one is called.
general protection fault, probably for non-canonical address
0xfbd59c0000000024: 0000 [#1] SMP KASAN PTI
KASAN: maybe wild-memory-access in range
[0xdead000000000120-0xdead000000000127]
CPU: 0 PID: 3807 Comm: modprobe Not tainted
6.1.0-rc1-00186-g76f33a7eedb4 #299
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_trace_event+0x6e/0x280
Code: 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02 00 0f 85 0e 02 00 00 48
b8 00 00 00 00 00 fc ff df 4c 8b 63 08 4c 89 e2 48 c1 ea 03 <80> 3c 02
00 0f 85 e2 01 00 00 49 89 2c 24 48 85 ed 74 28 e8 7a 9b
RSP: 0018:ffff88810413f370 EFLAGS: 00010a06
RAX: dffffc0000000000 RBX: ffff888105d050b0 RCX: 0000000000000000
RDX: 1bd5a00000000024 RSI: ffff888119e276e0 RDI: ffffffff835a8b20
RBP: dead000000000100 R08: 0000000000000000 R09: fffffbfff0913481
R10: ffffffff8489a407 R11: fffffbfff0913480 R12: dead000000000122
R13: ffff888105d050b8 R14: 0000000000000000 R15: ffff888105d05028
FS: 00007f7823e8d540(0000) GS:ffff888119e00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7823e7ebec CR3: 000000010a058002 CR4: 0000000000330ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__create_synth_event+0x1e37/0x1eb0
create_or_delete_synth_event+0x110/0x250
synth_event_run_command+0x2f/0x110
test_gen_synth_cmd+0x170/0x2eb [synth_event_gen_test]
synth_event_gen_test_init+0x76/0x9bc [synth_event_gen_test]
do_one_initcall+0xdb/0x480
do_init_module+0x1cf/0x680
load_module+0x6a50/0x70a0
__do_sys_finit_module+0x12f/0x1c0
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lkml.kernel.org/r/20221117012346.22647-3-shangxiaojing@huawei.com
Fixes: 4b147936fa50 ("tracing: Add support for 'synthetic' events")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
test_gen_synth_cmd() only free buf in fail path, hence buf will leak
when there is no failure. Add kfree(buf) to prevent the memleak. The
same reason and solution in test_empty_synth_event().
unreferenced object 0xffff8881127de000 (size 2048):
comm "modprobe", pid 247, jiffies 4294972316 (age 78.756s)
hex dump (first 32 bytes):
20 67 65 6e 5f 73 79 6e 74 68 5f 74 65 73 74 20 gen_synth_test
20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 64 5f pid_t next_pid_
backtrace:
[<000000004254801a>] kmalloc_trace+0x26/0x100
[<0000000039eb1cf5>] 0xffffffffa00083cd
[<000000000e8c3bc8>] 0xffffffffa00086ba
[<00000000c293d1ea>] do_one_initcall+0xdb/0x480
[<00000000aa189e6d>] do_init_module+0x1cf/0x680
[<00000000d513222b>] load_module+0x6a50/0x70a0
[<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0
[<00000000b36c4c0f>] do_syscall_64+0x3f/0x90
[<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
unreferenced object 0xffff8881127df000 (size 2048):
comm "modprobe", pid 247, jiffies 4294972324 (age 78.728s)
hex dump (first 32 bytes):
20 65 6d 70 74 79 5f 73 79 6e 74 68 5f 74 65 73 empty_synth_tes
74 20 20 70 69 64 5f 74 20 6e 65 78 74 5f 70 69 t pid_t next_pi
backtrace:
[<000000004254801a>] kmalloc_trace+0x26/0x100
[<00000000d4db9a3d>] 0xffffffffa0008071
[<00000000c31354a5>] 0xffffffffa00086ce
[<00000000c293d1ea>] do_one_initcall+0xdb/0x480
[<00000000aa189e6d>] do_init_module+0x1cf/0x680
[<00000000d513222b>] load_module+0x6a50/0x70a0
[<000000001fd4d529>] __do_sys_finit_module+0x12f/0x1c0
[<00000000b36c4c0f>] do_syscall_64+0x3f/0x90
[<00000000bbf20cf3>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://lkml.kernel.org/r/20221117012346.22647-2-shangxiaojing@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <zanussi@kernel.org>
Cc: <fengguang.wu@intel.com>
Cc: stable@vger.kernel.org
Fixes: 9fe41efaca08 ("tracing: Add synth event generation test module")
Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next}
of @ftrace_mode->list are NULL, it's not a valid state to call list_del().
If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free
tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del()
will write prev->next and next->prev, where null pointer dereference
happens.
BUG: kernel NULL pointer dereference, address: 0000000000000008
Oops: 0002 [#1] PREEMPT SMP NOPTI
Call Trace:
<TASK>
ftrace_mod_callback+0x20d/0x220
? do_filp_open+0xd9/0x140
ftrace_process_regex.isra.51+0xbf/0x130
ftrace_regex_write.isra.52.part.53+0x6e/0x90
vfs_write+0xee/0x3a0
? __audit_filter_op+0xb1/0x100
? auditd_test_task+0x38/0x50
ksys_write+0xa5/0xe0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Kernel panic - not syncing: Fatal exception
So call INIT_LIST_HEAD() to initialize the list member to fix this issue.
Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com
Cc: stable@vger.kernel.org
Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
rb_head_page_deactivate() expects cpu_buffer to contain a valid list of
->pages, so verify that the list is actually present before calling it.
Found by Linux Verification Center (linuxtesting.org) with the SVACE
static analysis tool.
Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru
Cc: stable@vger.kernel.org
Fixes: 77ae365eca895 ("ring-buffer: make lockless")
Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
If we can't allocate this size, try something smaller with half of the
size. Its order should be decreased by one instead of divided by two.
Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
If the number of mcount entries is an integer multiple of
ENTRIES_PER_PAGE, the page count showing on the console would be wrong.
Link: https://lkml.kernel.org/r/20221109094434.84046-2-wangwensheng4@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: stable@vger.kernel.org
Fixes: 5821e1b74f0d0 ("function tracing: fix wrong pos computing when read buffer has been fulfilled")
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.
Cc: stable@vger.kernel.org
Fixes: 1300ebb20286b ("io_uring: multishot recv")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/37762040ba9c52b81b92a2f5ebfd4ee484088951.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Having REQ_F_POLLED set doesn't guarantee that the request is
executed as a multishot from the polling path. Fortunately for us, if
the code thinks it's multishot issue when it's not, it can only ask to
skip completion so leaking the request. Use issue_flags to mark
multipoll issues.
Cc: stable@vger.kernel.org
Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We may never try to process a poll wake and its mask if there was
multiple wake ups racing for queueing up a tw. Force
io_poll_check_events() to update the mask by vfs_poll().
Cc: stable@vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When io_poll_check_events() collides with someone attempting to queue a
task work, it'll spin for one more time. However, it'll continue to use
the mask from the first iteration instead of updating it. For example,
if the first wake up was a EPOLLIN and the second EPOLLOUT, the
userspace will not get EPOLLOUT in time.
Clear the mask for all subsequent iterations to force vfs_poll().
Cc: stable@vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.1668710222.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If device_register() returns error, the name allocated by the
dev_set_name() need be freed. As described in the comment of
device_register(), we should use put_device() to give up the reference in
the error path.
Fix this by calling put_device(), the name will be freed in the
kobject_cleanup(), and this patch modified resources will be released by
calling the corresponding callback function in the device_release().
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Link: https://lore.kernel.org/r/20221110033729.1555-1-zhouguanghui1@huawei.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
when storing it in the cache variable and so doesn't match the original ID
anymore. The second less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again all the leading zeros will be flipped to ones to extend the
sign and won't match the original ID anymore (this has been observed in
practice).
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shutdown we end up freeing the same memory twice.
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
and so can successfully remove the request from the hash table.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable@vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.1668595505.git.bblock@linux.ibm.com
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If device_register() fails in sdebug_add_host_helper(), it will goto clean
and sdbg_host will be freed, but sdbg_host->host_list will not be removed
from sdebug_host_list, then list traversal may cause UAF. Fix it.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20221117084421.58918-1-yuancan@huawei.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If device_register() fails in tcm_loop_setup_hba_bus(), the name allocated
by dev_set_name() need be freed. As comment of device_register() says, it
should use put_device() to give up the reference in the error path. So fix
this by calling put_device(), then the name can be freed in kobject_cleanup().
The 'tl_hba' will be freed in tcm_loop_release_adapter(), so it don't need
goto error label in this case.
Fixes: 3703b2c5d041 ("[SCSI] tcm_loop: Add multi-fabric Linux/SCSI LLD fabric module")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Link: https://lore.kernel.org/r/20221115015042.3652261-1-yangyingliang@huawei.com
Reviewed-by: Mike Christie <michael.chritie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
After it receives command reply, mpi3mr driver checks command result. If
the result is not zero, it prints out command information. This debug
information is confusing since they are printed even when the non-zero
result is expected. "Power-on or device reset occurred" is printed for Test
Unit Ready command at drive detection. Inquiry failure for unsupported VPD
page header is also printed. They are harmless but look like failures.
To avoid the confusion, print the command reply debug information only when
the module parameter logging_level has value MPI3_DEBUG_SCSI_ERROR= 64, in
same manner as mpt3sas driver.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20221111014449.1649968-1-shinichiro.kawasaki@wdc.com
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
kernel test robot reported warnings when build bonding module with
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/net/bonding/:
from ../drivers/net/bonding/bond_main.c:35:
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v4addrs’ at ../include/net/ip.h:566:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3984:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘fortify_memcpy_chk’,
inlined from ‘iph_to_flow_copy_v6addrs’ at ../include/net/ipv6.h:900:2,
inlined from ‘bond_flow_ip’ at ../drivers/net/bonding/bond_main.c:3994:3:
../include/linux/fortify-string.h:413:25: warning: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of f
ield (2nd parameter); maybe use struct_group()? [-Wattribute-warning]
413 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is because we try to copy the whole ip/ip6 address to the flow_key,
while we only point the to ip/ip6 saddr. Note that since these are UAPI
headers, __struct_group() is used to avoid the compiler warnings.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: c3f8324188fa ("net: Add full IPv6 addresses to flow_keys")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20221115142400.1204786-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
An external PHY needs settling time after power up or reset.
In the bind() function an mdio bus is registered. If at this point
the external PHY is still initialising, no valid PHY ID will be
read and on phy_find_first() the bind() function will fail.
If an external PHY is present, wait the maximum time specified
in 802.3 45.2.7.1.1.
Fixes: 05b35e7eb9a1 ("smsc95xx: add phylib support")
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221115114434.9991-2-alexandru.tachici@analog.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add the following Telit LE910C4-WWX composition:
0x103a: rmnet
Signed-off-by: Enrico Sau <enrico.sau@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Link: https://lore.kernel.org/r/20221115105859.14324-1-enrico.sau@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since 2df8220cc511 ("kbuild: build init/built-in.a just once"),
generating Debian packages using 'make bindeb-pkg' results in
packages that are stuck to the same .version, leading to unexpected
behaviours (multiple packages with the same version).
That's because the mkdebian script samples the build version
before building the kernel, and forces the use of that version
number for the actual build.
Restore the previous behaviour by calling init/build-version
instead of reading the .version file. This is likely to result
in too many .version bumps, but this is what was happening before
(although the bump was affecting builds made after the current one).
Fixes: 2df8220cc511 ("kbuild: build init/built-in.a just once")
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Move the declaration of 'struct trace_array' out of #ifdef
CONFIG_TRACING block, to fix the following warning when CONFIG_TRACING
is not set:
>> include/linux/trace.h:63:45: warning: 'struct trace_array' declared
inside parameter list will not be visible outside of this definition or
declaration
Link: https://lkml.kernel.org/r/20221107160556.2139463-1-shraash@google.com
Fixes: 1a77dd1c2bb5 ("scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Arun Easi <aeasi@marvell.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Aashish Sharma <shraash@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
kmemleak reports this issue:
unreferenced object 0xffff888105a18900 (size 128):
comm "test_progs", pid 18933, jiffies 4336275356 (age 22801.766s)
hex dump (first 32 bytes):
25 73 00 90 81 88 ff ff 26 05 00 00 42 01 58 04 %s......&...B.X.
03 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000560143a1>] __kmalloc_node_track_caller+0x4a/0x140
[<000000006af00822>] krealloc+0x8d/0xf0
[<00000000c309be6a>] trace_iter_expand_format+0x99/0x150
[<000000005a53bdb6>] trace_check_vprintf+0x1e0/0x11d0
[<0000000065629d9d>] trace_event_printf+0xb6/0xf0
[<000000009a690dc7>] trace_raw_output_bpf_trace_printk+0x89/0xc0
[<00000000d22db172>] print_trace_line+0x73c/0x1480
[<00000000cdba76ba>] tracing_read_pipe+0x45c/0x9f0
[<0000000015b58459>] vfs_read+0x17b/0x7c0
[<000000004aeee8ed>] ksys_read+0xed/0x1c0
[<0000000063d3d898>] do_syscall_64+0x3b/0x90
[<00000000a06dda7f>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
iter->fmt alloced in
tracing_read_pipe() -> .. ->trace_iter_expand_format(), but not
freed, to fix, add free in tracing_release_pipe()
Link: https://lkml.kernel.org/r/1667819090-4643-1-git-send-email-wangyufen@huawei.com
Cc: stable@vger.kernel.org
Fixes: efbbdaa22bb7 ("tracing: Show real address for trace event arguments")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|