|
We need to lookup the master statfs inode and the local statfs
inodes earlier in the mount process (in init_journal) so journal
recovery can use them when it attempts to recover the statfs info.
We lookup all the local statfs inodes and store them in a linked
list to allow a node to recover statfs info for other nodes in the
cluster.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
And read these in __get_log_header() from the log header.
Also make gfs2_statfs_change_out() non-static so it can be used
outside of super.c
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Once a withdraw has occurred, ignore errors that are the consequence of the
withdraw.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The gfs2_glock structure has a gl_vm member, introduced in commit 7005c3e4ae428
("GFS2: Use range based functions for rgrp sync/invalidation"), which stores
the location of resource groups within their address space. This structure is
in a union with iopen glock specific fields. It was introduced because at
unmount time, the resource group objects were destroyed before flushing out any
pending resource group glock work, and flushing out such work could require
flushing / truncating the address space.
Since commit b3422cacdd7e6 ("gfs2: Rework how rgrp buffer_heads are managed"),
any pending resource group glock work is flushed out before destroying the
resource group objects. So the resource group objects will now always exist in
rgrp_go_sync and rgrp_go_inval, and we now simply compute the gl_vm values
where needed instead of caching them. This also eliminates the union.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Only initialize gl_delete for iopen glocks, but more importantly, only access
it for iopen glocks in flush_delete_work: flush_delete_work is called for
different types of glocks including rgrp glocks, and those use gl_vm which is
in a union with gl_delete. Without this fix, we'll end up clobbering gl_vm,
which results in general memory corruption.
Fixes: a0e3cc65fa29 ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
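The union hazard described above can be illustrated with a minimal userspace sketch. The struct, enum, and function names here are hypothetical stand-ins and do not mirror the real struct gfs2_glock; the point is only that a union member is touched after checking which member is live for that glock type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a glock; not the real gfs2 layout. */
enum toy_glock_type { TOY_IOPEN, TOY_RGRP };

struct toy_glock {
	enum toy_glock_type type;
	union {
		struct { long timer; } gl_delete;            /* live for iopen glocks only */
		struct { unsigned long start, end; } gl_vm;  /* live for rgrp glocks only */
	};
};

/*
 * Touch gl_delete only when the glock type says that member is live.
 * Returns true if delete work was flushed, false if the glock was skipped.
 */
static bool toy_flush_delete_work(struct toy_glock *gl)
{
	if (gl->type != TOY_IOPEN)
		return false;          /* leave gl_vm untouched */
	gl->gl_delete.timer = 0;
	return true;
}
```

Without the type check, the write to gl_delete.timer would overwrite the start of gl_vm on an rgrp glock, which is the kind of clobbering the commit message describes.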
|
|
The comments before function glock_hash_walk had the wrong name and
an extra parameter. This simply fixes the comments.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Before this patch, glock.c maintained a flag, GLF_QUEUED, which indicated
when a glock had a holder queued. It was only checked for inode glocks,
although set and cleared by all glocks, and it was only used to determine
whether the glock should be held for the minimum hold time before releasing.
The problem is that the flag is not accurate at all. If a process holds
the glock, the flag is set. When the process dequeues the glock, the flag is
only cleared if the state actually changed. So if the state doesn't
change, the flag may still be set, even when nothing is queued.
This happens to iopen glocks often: they get held in SH, then the file is
closed, but the glock remains in SH mode.
We don't need a special flag to indicate this: we can simply tell whether
the glock has any items queued to the holders queue. It's a waste of cpu
time to maintain it.
This patch eliminates the flag in favor of simply checking list_empty
on the glock holders.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
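As a rough illustration of why the list itself is a reliable indicator, here is a minimal userspace sketch of a circular doubly linked list modelled on the kernel's list_head. All names (list_node, toy_glock2, toy_glock_has_holders) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node { struct list_node *next, *prev; };

static void list_init(struct list_node *h) { h->next = h->prev = h; }

/* An empty list is one whose head points back to itself. */
static bool list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Hypothetical glock that only carries a holders queue. */
struct toy_glock2 { struct list_node holders; };

/* No separate flag to keep in sync: the queue is the source of truth. */
static bool toy_glock_has_holders(const struct toy_glock2 *gl)
{
	return !list_is_empty(&gl->holders);
}
```

Because the answer is derived from the queue on every call, it cannot go stale the way a separately maintained flag can.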
|
|
When flushing out its ail1 list, gfs2_write_jdata_page calls function
__block_write_full_page passing in function gfs2_get_block_noalloc.
But there was a problem when a process wrote to a jdata file, then
truncated it or punched a hole, leaving references to the blocks within
the new hole in its ail list, which are to be written to the journal log.
In writing them to the journal, after calling gfs2_block_map, function
gfs2_get_block_noalloc determined that the (hole-punched) block was not
mapped, so it returned -EIO to generic_writepages, which passed it back
to gfs2_ail1_start_one. This, in turn, performed a withdraw, assuming
there was a real IO error writing to the journal.
This might be a valid error when writing metadata to the journal, but for
journaled data writes, it does not warrant a withdraw.
This patch adds a check to function gfs2_block_map that makes an exception
for journaled data writes that correspond to jdata holes: If the iomap
get function returns a block type of IOMAP_HOLE, it instead returns
-ENODATA which does not cause the withdraw. Other errors are returned as
before.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
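The error-code distinction described above can be sketched in userspace as follows. This is a simplified model, not the gfs2 code: the enum and both function names are hypothetical, and the real mapping logic is considerably more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the iomap block type. */
enum toy_map_type { TOY_MAPPED, TOY_HOLE };

/*
 * Returns 0 for a mapped block. For an unmapped block, a journaled-data
 * write that lands in a hole reports -ENODATA; anything else keeps -EIO.
 */
static int toy_block_map_noalloc(enum toy_map_type type, bool is_jdata)
{
	if (type == TOY_MAPPED)
		return 0;
	return is_jdata ? -ENODATA : -EIO;
}

/* The caller treats -ENODATA as benign and only withdraws on other errors. */
static bool toy_should_withdraw(int err)
{
	return err != 0 && err != -ENODATA;
}
```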
|
|
Function gfs2_block_map had a lot of redundancy between its create and
no_create paths. This patch simplifies the code to eliminate the redundancy.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
With jdata writes, we frequently got into situations where gfs2 deadlocked
because of this calling sequence:
gfs2_ail1_start
    gfs2_ail1_flush - for every tr on the sd_ail1_list:
        gfs2_ail1_start_one - for every bd on the tr's tr_ail1_list:
            generic_writepages
                write_cache_pages passing __writepage()
                    calls clear_page_dirty_for_io which calls set_page_dirty:
                        which calls jdata_set_page_dirty which sets PageChecked.
                    __writepage() calls
                        mapping->a_ops->writepage AKA gfs2_jdata_writepage
However, gfs2_jdata_writepage checks if PageChecked is set, and if so, it
ignores the write and redirties the page. The problem is that write_cache_pages
calls clear_page_dirty_for_io, which often calls set_page_dirty(). See comments
in page-writeback.c starting with "Yes, Virginia". If it's jdata,
set_page_dirty will call jdata_set_page_dirty which will set PageChecked.
That causes a conflict because it makes it look like the page has been
redirtied by another writer, in which case we need to skip writing it and
redirty the page. That ends up in a deadlock because it isn't a "real" writer
and nothing will ever clear PageChecked.
If we do have a real writer, it will have started a transaction. So this
patch checks if a transaction is in use, and if not, it skips setting
PageChecked. That way, the page will be dirtied, cleaned, and written
appropriately.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
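A toy model of the interaction may help. The page struct and both functions below are hypothetical simplifications of the behaviour described in this message: the set-dirty hook marks the page "checked" only when a transaction is active, and the writepage step skips pages that are marked "checked".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page with just the two bits this message talks about. */
struct toy_page { bool dirty; bool checked; };

/* Only a real writer (one with an active transaction) sets "checked". */
static void toy_jdata_set_page_dirty(struct toy_page *page, bool trans_active)
{
	if (trans_active)
		page->checked = true;
	page->dirty = true;
}

/*
 * Returns true if the page was written. A "checked" page is skipped and
 * left dirty, which models "ignore the write and redirty the page".
 */
static bool toy_jdata_writepage(struct toy_page *page)
{
	if (page->checked)
		return false;
	page->dirty = false;
	return true;
}
```

In this model, a set-dirty call that comes from the writeback path (no transaction) no longer blocks the subsequent write.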
|
|
Patch 380f7c65a7eb3288e4b6812acf3474a1de230707 changed gfs2_releasepage
so that it held the sd_ail_lock spin_lock for most of its processing.
It did this for some mysterious undocumented bug somewhere in the
evict code path. But in the nine years since, evict has been reworked
and fixed many times, and so have the transactions and ail list.
I can't see a reason to hold the sd_ail_lock unless it's protecting
the actual ail lists hung off the transactions. Therefore, this patch
removes the locking to increase speed and efficiency, and to further help
us rework the log flush code to be more concurrent with transactions.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This patch is one baby step toward simplifying the journal management.
It simply changes function gfs2_ail1_empty_one from a void to an int and
makes it return a count of active items. This allows the caller to check
the return code rather than list_empty on the tr_ail1_list. This way
we can, in a later patch, combine transaction ail1 and ail2 lists.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Before this patch, when blocks were freed, it called gfs2_meta_wipe to
take the metadata out of the pending journal blocks. It did this mostly
by calling another function called gfs2_remove_from_journal. This is
shortsighted because it does not do anything with jdata blocks which
may also be in the journal.
This patch expands the function so that it wipes out jdata blocks from
the journal as well, and it wipes it from the ail1 list if it hasn't
been written back yet. Since it now processes jdata blocks as well,
the function has been renamed from gfs2_meta_wipe to gfs2_journal_wipe.
New function gfs2_ail1_wipe wants a static view of the ail list, so it
locks the sd_ail_lock when removing items. To accomplish this, function
gfs2_remove_from_journal no longer locks the sd_ail_lock, and it's now
the caller's responsibility to do so.
I was going to make sd_ail_lock locking conditional, but the practice is
generally frowned upon. For details, see: https://lwn.net/Articles/109066/
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This patch adds some code to enhance the log_blocks trace point. It
reports the number of free log blocks. This makes the trace point much
more useful, especially for debugging performance problems when we can
tell when the journal gets full and needs to wait for flushes, etc.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function gfs2_write_revokes was incrementing and decrementing the number
of log blocks free, but there was never a log_blocks trace point for it.
Thus, the free blocks from a log_blocks trace would jump around
mysteriously.
This patch adds the missing trace points so the trace makes more sense.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Since the function is only used for writing jdata pages, this patch
simply renames function gfs2_write_full_page to a more appropriate
name: gfs2_write_jdata_page. This makes the code easier to understand.
The function was only called in one place, which passed in a pointer to
function gfs2_get_block_noalloc. The function doesn't need to be
passed in. Therefore, this also eliminates the unnecessary parameter
to increase efficiency.
I also took the liberty of cleaning up the function comments.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In gfs2_check_sb(), no validation checks are performed with regard to
the size of the superblock.
syzkaller detected a slab-out-of-bounds bug that was primarily caused
because the block size for a superblock was set to zero.
A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE.
Performing validation checks and ensuring that the size of the superblock
is valid fixes this bug.
Reported-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
Tested-by: syzbot+af90d47a37376844e731@syzkaller.appspotmail.com
Suggested-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
[Minor code reordering.]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
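The validity rule stated above (a power of 2 between 512 and PAGE_SIZE) can be sketched as a small standalone check. The function name and the 4 KiB page size are assumptions made for this example; the real PAGE_SIZE depends on the architecture and kernel configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u   /* assumption for this sketch: 4 KiB pages */

/* Valid sizes are powers of two in the range [512, TOY_PAGE_SIZE]. */
static bool toy_bsize_valid(uint32_t bsize)
{
	/* A power of two has exactly one bit set; zero is rejected explicitly. */
	bool pow2 = bsize != 0 && (bsize & (bsize - 1)) == 0;

	return pow2 && bsize >= 512 && bsize <= TOY_PAGE_SIZE;
}
```

A size of zero, the case syzkaller hit, fails the power-of-two test and is rejected before it can be used in any size computation.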
|
|
syzkaller found the following splat with CONFIG_DEBUG_KOBJECT_RELEASE=y:
Read of size 1 at addr ffff000028e896b8 by task kworker/1:2/228
CPU: 1 PID: 228 Comm: kworker/1:2 Tainted: G S 5.9.0-rc8+ #101
Hardware name: linux,dummy-virt (DT)
Workqueue: events kobject_delayed_cleanup
Call trace:
dump_backtrace+0x0/0x4d8
show_stack+0x34/0x48
dump_stack+0x174/0x1f8
print_address_description.constprop.0+0x5c/0x550
kasan_report+0x13c/0x1c0
__asan_report_load1_noabort+0x34/0x60
memcmp+0xd0/0xd8
gfs2_uevent+0xc4/0x188
kobject_uevent_env+0x54c/0x1240
kobject_uevent+0x2c/0x40
__kobject_del+0x190/0x1d8
kobject_delayed_cleanup+0x2bc/0x3b8
process_one_work+0x96c/0x18c0
worker_thread+0x3f0/0xc30
kthread+0x390/0x498
ret_from_fork+0x10/0x18
Allocated by task 1110:
kasan_save_stack+0x28/0x58
__kasan_kmalloc.isra.0+0xc8/0xe8
kasan_kmalloc+0x10/0x20
kmem_cache_alloc_trace+0x1d8/0x2f0
alloc_super+0x64/0x8c0
sget_fc+0x110/0x620
get_tree_bdev+0x190/0x648
gfs2_get_tree+0x50/0x228
vfs_get_tree+0x84/0x2e8
path_mount+0x1134/0x1da8
do_mount+0x124/0x138
__arm64_sys_mount+0x164/0x238
el0_svc_common.constprop.0+0x15c/0x598
do_el0_svc+0x60/0x150
el0_svc+0x34/0xb0
el0_sync_handler+0xc8/0x5b4
el0_sync+0x15c/0x180
Freed by task 228:
kasan_save_stack+0x28/0x58
kasan_set_track+0x28/0x40
kasan_set_free_info+0x24/0x48
__kasan_slab_free+0x118/0x190
kasan_slab_free+0x14/0x20
slab_free_freelist_hook+0x6c/0x210
kfree+0x13c/0x460
Use the same pattern as f2fs + ext4 where the kobject destruction must
complete before allowing the FS itself to be freed. This means that we
need an explicit free_sbd in the callers.
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
[Also go to fail_free when init_names fails.]
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When an rindex entry is found to be corrupt, compute_bitstructs() calls
gfs2_consist_rgrpd() which calls gfs2_rgrp_dump() like this:
gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);
gfs2_rgrp_dump then dereferences the gl without checking it and we get
BUG: KASAN: null-ptr-deref in gfs2_rgrp_dump+0x28/0x280
because there's no rgrp glock involved while reading the rindex on mount.
Fix this by changing gfs2_rgrp_dump to take an rgrp argument.
Reported-by: syzbot+43fa87986bdd31df9de6@syzkaller.appspotmail.com
Signed-off-by: Andrew Price <anprice@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Switch to using the iomap readpage and writepage helpers for all I/O in
the ordered and writeback modes, and thus eliminate using buffer_heads
for I/O in these cases. The journaled data mode is left untouched.
(Andreas Gruenbacher: In gfs2_unstuffer_page, switch from mark_buffer_dirty
to set_page_dirty instead of accidentally leaving the page / buffer clean.)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Before this patch, we were not calling truncate_inode_pages_final for the
address space for glocks, which left the possibility of a leak. We now
take care of the problem instead of complaining, and we do it during
glock tear-down.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Now that we've factored out the deleted and undeleted dinode cases
in gfs2_evict_inode, we can greatly simplify the logic. Now the
function is easy to read and understand.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Now that we've factored out the delete-dinode case to simplify
gfs2_evict_inode, we take it a step further and factor out the other
case: where we don't delete the inode.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This patch further simplifies function gfs2_evict_inode() by adding a
new function evict_should_delete. The function may also lock the inode
glock.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function gfs2_evict_inode is way too big, complex and unreadable. This
is a baby step toward breaking it apart to be more readable. It factors
out the portion that deletes the online bits for a dinode that is
unlinked and needs to be deleted. A future patch will factor out more.
(If I factor out too much, the patch itself becomes unreadable).
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function gfs2_evict_inode is too big and unreadable. This patch is just
a baby step toward improving that. This first step just renames variable
error to ret. This will help make future patches more readable.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Commit ca399c96e96e changes gfs2_log_flush to not withdraw the
filesystem while holding the log flush lock, but it fails to check if
the filesystem needs to be withdrawn once the log flush lock has been
released. Likewise, commit f05b86db314d depends on gfs2_log_flush to
trigger for delayed withdraws. Add that and clean up the code flow
somewhat.
In gfs2_put_super, add a check for delayed withdraws that have been
missed to prevent these kinds of bugs in the future.
Fixes: ca399c96e96e ("gfs2: flesh out delayed withdraw for gfs2_log_flush")
Fixes: f05b86db314d ("gfs2: Prepare to withdraw as soon as an IO error occurs in log write")
Cc: stable@vger.kernel.org # v5.7+: 462582b99b607: gfs2: add some much needed cleanup for log flushes that fail
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
|
|
When memory is hot-added or hot-removed, min_free_kbytes should be
recalculated based on what khugepaged expects. Currently, after
hotplug, min_free_kbytes is reset to the lower default, and the higher
default that was set when THP was enabled is lost.
This change restores min_free_kbytes as expected for THP consumers.
[vijayb@linux.microsoft.com: v5]
Link: https://lkml.kernel.org/r/1601398153-5517-1-git-send-email-vijayb@linux.microsoft.com
Fixes: f000565adb77 ("thp: set recommended min free kbytes")
Signed-off-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Allen Pais <apais@microsoft.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1600305709-2319-2-git-send-email-vijayb@linux.microsoft.com
Link: https://lkml.kernel.org/r/1600204258-13683-1-git-send-email-vijayb@linux.microsoft.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The swap address_space has no host, so the kernel crashes when a
swap write hits an error. Fix it.
Fixes: 735e4ae5ba28 ("vfs: track per-sb writeback errors and report them to syncfs")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Andres Freund <andres@anarazel.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20201010000650.750063-1-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
syzbot reported the general protection fault below:
general protection fault, probably for non-canonical address
0xe00eeaee0000003b: 0000 [#1] PREEMPT SMP KASAN
KASAN: maybe wild-memory-access in range [0x00777770000001d8-0x00777770000001df]
CPU: 1 PID: 10488 Comm: syz-executor721 Not tainted 5.9.0-rc3-syzkaller #0
RIP: 0010:unlink_file_vma+0x57/0xb0 mm/mmap.c:164
Call Trace:
free_pgtables+0x1b3/0x2f0 mm/memory.c:415
exit_mmap+0x2c0/0x530 mm/mmap.c:3184
__mmput+0x122/0x470 kernel/fork.c:1076
mmput+0x53/0x60 kernel/fork.c:1097
exit_mm kernel/exit.c:483 [inline]
do_exit+0xa8b/0x29f0 kernel/exit.c:793
do_group_exit+0x125/0x310 kernel/exit.c:903
get_signal+0x428/0x1f00 kernel/signal.c:2757
arch_do_signal+0x82/0x2520 arch/x86/kernel/signal.c:811
exit_to_user_mode_loop kernel/entry/common.c:136 [inline]
exit_to_user_mode_prepare+0x1ae/0x200 kernel/entry/common.c:167
syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:242
entry_SYSCALL_64_after_hwframe+0x44/0xa9
This happens because the ->mmap() callback can change vma->vm_file and
fput() the original file. Commit d70cec898324 ("mm: mmap: merge vma after
call_mmap() if possible") failed to handle this case and always calls
fput() on the original file, which results in an extra fput().
[ Thanks Hillf for pointing this extra fput() out. ]
Fixes: d70cec898324 ("mm: mmap: merge vma after call_mmap() if possible")
Reported-by: syzbot+c5d5a51dcbb558ca0cb5@syzkaller.appspotmail.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: Hongxiang Lou <louhongxiang@huawei.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Link: https://lkml.kernel.org/r/20200916090733.31427-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use my kernel.org address instead of my bootlin.com one.
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20201005164533.16811-1-atenart@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As more email from git history gets aimed at the OpenWall
kernel-hardening@ list, there has been a desire to separate "new topics"
from "on-going" work.
To handle this, the superset of hardening email topics are now to be
directed to linux-hardening@vger.kernel.org.
Update the MAINTAINERS file and the .mailmap to accomplish this, so that
linux-hardening@ can be treated like any other regular upstream kernel
development list.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: "Tobin C. Harding" <me@tobin.cc>
Cc: Tycho Andersen <tycho@tycho.pizza>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/linux-hardening/202010051443.279CC265D@keescook/
Link: https://lkml.kernel.org/r/20201006000012.2768958-1-keescook@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
On the setxattr() syscall path, due to an apparent typo, the size of a
dynamically allocated memory chunk for storing a struct smb2_file_full_ea_info
object is computed incorrectly: the first addend is the size of a pointer
instead of the size of the object. Coincidentally, this makes no difference
on 64-bit platforms, but on 32-bit targets the following memcpy() writes
4 bytes of data outside of the dynamically allocated memory.
=============================================================================
BUG kmalloc-16 (Not tainted): Redzone overwritten
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: 0x79e69a6f-0x9e5cdecf @offset=368. First byte 0x73 instead of 0xcc
INFO: Slab 0xd36d2454 objects=85 used=51 fp=0xf7d0fc7a flags=0x35000201
INFO: Object 0x6f171df3 @offset=352 fp=0x00000000
Redzone 5d4ff02d: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
Object 6f171df3: 00 00 00 00 00 05 06 00 73 6e 72 75 62 00 66 69 ........snrub.fi
Redzone 79e69a6f: 73 68 32 0a sh2.
Padding 56254d82: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 0 PID: 8196 Comm: attr Tainted: G B 5.9.0-rc8+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
Call Trace:
dump_stack+0x54/0x6e
print_trailer+0x12c/0x134
check_bytes_and_report.cold+0x3e/0x69
check_object+0x18c/0x250
free_debug_processing+0xfe/0x230
__slab_free+0x1c0/0x300
kfree+0x1d3/0x220
smb2_set_ea+0x27d/0x540
cifs_xattr_set+0x57f/0x620
__vfs_setxattr+0x4e/0x60
__vfs_setxattr_noperm+0x4e/0x100
__vfs_setxattr_locked+0xae/0xd0
vfs_setxattr+0x4e/0xe0
setxattr+0x12c/0x1a0
path_setxattr+0xa4/0xc0
__ia32_sys_lsetxattr+0x1d/0x20
__do_fast_syscall_32+0x40/0x70
do_fast_syscall_32+0x29/0x60
do_SYSENTER_32+0x15/0x20
entry_SYSENTER_32+0x9f/0xf2
Fixes: 5517554e4313 ("cifs: Add support for writing attributes on SMB2+")
Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
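The class of typo described above, sizeof applied to a pointer rather than to the pointed-to object, can be reproduced in a few lines of userspace C. The struct below is a hypothetical header with an 8-byte fixed part followed by variable-length data; it is not the real smb2_file_full_ea_info definition, and the length formula is only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: 8 bytes of fixed fields, then variable-length data. */
struct toy_ea_info {
	uint32_t next_entry_offset;
	uint8_t  flags;
	uint8_t  name_len;
	uint16_t value_len;
	char     data[];   /* flexible array member */
};

/* Buggy: sizeof(ea) is the size of a pointer (4 on 32-bit, 8 on 64-bit). */
static size_t toy_ea_len_buggy(size_t name_len, size_t value_len)
{
	struct toy_ea_info *ea = NULL;

	return sizeof(ea) + name_len + value_len + 1;
}

/* Fixed: sizeof(*ea) is the size of the header itself (8 everywhere). */
static size_t toy_ea_len_fixed(size_t name_len, size_t value_len)
{
	struct toy_ea_info *ea = NULL;

	return sizeof(*ea) + name_len + value_len + 1;
}
```

On a 64-bit build both functions return the same value, which is why such a bug can go unnoticed; on a 32-bit build the buggy version comes up 4 bytes short.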
|
|
There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.
Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.
collapse_file(old start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=old offset
  new_page->mapping = mapping
  xas_store(&xas, new_page)

                          filemap_fault
                            page = find_get_page(mapping, offset)
                            // if offset falls inside hpage then
                            // compound_head(page) == hpage
                            lock_page_maybe_drop_mmap()
                              __lock_page(page)

  // collapse fails
  xas_store(&xas, old page)
  new_page->mapping = NULL
  unlock_page(new_page)

collapse_file(new start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=new offset
  new_page->mapping = mapping // mapping becomes valid again

                            // since compound_head(page) == hpage
                            // page_to_pgoff(page) got changed
                            VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above. But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.
Instead, non-NUMA khugepaged_prealloc_page() releases the old page
if anyone else has a reference to it (1% of cases when I tested).
Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the NACK and BUS error bits are set by the hardware, the driver is
responsible for clearing them by writing "1" into the corresponding
status registers.
Hence perform the necessary operations in owl_i2c_interrupt().
Fixes: d211e62af466 ("i2c: Add Actions Semiconductor Owl family S900 I2C driver")
Reported-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@gmail.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
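The write-1-to-clear convention mentioned above can be modelled in userspace. The bit positions, the register model, and the handler name below are all hypothetical and are not taken from the Owl hardware documentation; the sketch only shows the acknowledge pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; not taken from the Owl datasheet. */
#define TOY_STAT_NACK    (1u << 0)
#define TOY_STAT_BUS_ERR (1u << 1)
#define TOY_STAT_OTHER   (1u << 7)

/* Simulated write-1-to-clear register: every bit written as 1 is cleared. */
static void toy_w1c_write(uint32_t *reg, uint32_t val)
{
	*reg &= ~val;
}

/*
 * Returns the error bits that were pending and acknowledges exactly those
 * bits, leaving unrelated status bits untouched.
 */
static uint32_t toy_irq_ack_errors(uint32_t *stat)
{
	uint32_t err = *stat & (TOY_STAT_NACK | TOY_STAT_BUS_ERR);

	if (err)
		toy_w1c_write(stat, err);
	return err;
}
```

If the handler never writes the bits back, they stay set in the simulated register, which mirrors the stale error state the fix addresses.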
|
|
This reverts commit fa4d30556883f2eaab425b88ba9904865a4d00f3. An updated
version was sent. So, revert this version and give the new version more
time for testing.
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
In mmc_queue_setup_discard() the mmc driver queue's discard_granularity
may be set to 0 (when card->pref_erase > max_discard) while the mmc
device still declares support for discard operations. This is a bug and
triggers the following kernel warning:
WARNING: CPU: 0 PID: 135 at __blkdev_issue_discard+0x200/0x294
CPU: 0 PID: 135 Comm: f2fs_discard-17 Not tainted 5.9.0-rc6 #1
Hardware name: Google Kevin (DT)
pstate: 00000005 (nzcv daif -PAN -UAO BTYPE=--)
pc : __blkdev_issue_discard+0x200/0x294
lr : __blkdev_issue_discard+0x54/0x294
sp : ffff800011dd3b10
x29: ffff800011dd3b10 x28: 0000000000000000 x27: ffff800011dd3cc4
x26: ffff800011dd3e18 x25: 000000000004e69b x24: 0000000000000c40
x23: ffff0000f1deaaf0 x22: ffff0000f2849200 x21: 00000000002734d8
x20: 0000000000000008 x19: 0000000000000000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000394 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 00000000000008b0 x9 : ffff800011dd3cb0
x8 : 000000000004e69b x7 : 0000000000000000 x6 : ffff0000f1926400
x5 : ffff0000f1940800 x4 : 0000000000000000 x3 : 0000000000000c40
x2 : 0000000000000008 x1 : 00000000002734d8 x0 : 0000000000000000
Call trace:
__blkdev_issue_discard+0x200/0x294
__submit_discard_cmd+0x128/0x374
__issue_discard_cmd_orderly+0x188/0x244
__issue_discard_cmd+0x2e8/0x33c
issue_discard_thread+0xe8/0x2f0
kthread+0x11c/0x120
ret_from_fork+0x10/0x1c
---[ end trace e4c8023d33dfe77a ]---
This patch fixes the issue by setting discard_granularity to SECTOR_SIZE
instead of 0 when (card->pref_erase > max_discard) is true. With this change,
__blkdev_issue_discard() no longer complains about an improper discard
granularity value.
This issue was exposed by commit b35fd7422c2f ("block: check queue's
limits.discard_granularity in __blkdev_issue_discard()"). A "Fixes:" tag
is also added for that commit to make sure people won't miss this patch
after applying the change to __blkdev_issue_discard().
Fixes: e056a1b5b67b ("mmc: queue: let host controllers specify maximum discard timeout")
Fixes: b35fd7422c2f ("block: check queue's limits.discard_granularity in __blkdev_issue_discard()")
Reported-and-tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Coly Li <colyli@suse.de>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20201002013852.51968-1-colyli@suse.de
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
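The fallback described above can be sketched as a pure function. The function name is hypothetical, and the sketch assumes pref_erase and max_discard are expressed in 512-byte sectors while the returned granularity is in bytes; those units are an assumption of this example rather than something stated in the message.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE 512u

/*
 * Assumption: pref_erase and max_discard are counts of 512-byte sectors,
 * and the returned granularity is in bytes. When the preferred erase size
 * cannot be honoured, fall back to one sector instead of 0.
 */
static uint32_t toy_discard_granularity(uint32_t pref_erase, uint32_t max_discard)
{
	if (pref_erase > max_discard)
		return TOY_SECTOR_SIZE;
	return pref_erase << 9;   /* sectors to bytes */
}
```

The important property is that the result is never 0 for a device that still advertises discard support.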
|
|
The error handling introduced by commit:
2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
loses any return value from smp_call_function_single() that is not
{0, -EINVAL}. This is a problem because it will return -ENXIO when the
target CPU is offline. Worse, in that case it'll turn into an infinite
loop.
Fixes: 2ed6edd33a21 ("perf: Add cond_resched() to task_function_call()")
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Barret Rhoden <brho@google.com>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: https://lkml.kernel.org/r/20200827064732.20860-1-kjain@linux.ibm.com
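The failure mode above, a retry loop that swallows unexpected error codes, can be illustrated with a userspace sketch. Everything here is hypothetical: the callback type, the helper names, and in particular the choice of -EAGAIN as the only retryable result are assumptions of this example. The max_tries bound exists only so the sketch always terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*toy_remote_fn)(void *ctx);

/*
 * Retry only while the callee reports -EAGAIN (assumed to be the sole
 * retryable result); every other value, including -ENXIO, is returned
 * to the caller unchanged.
 */
static int toy_call_with_retry(toy_remote_fn fn, void *ctx, int max_tries)
{
	int ret = -EAGAIN;

	for (int i = 0; i < max_tries; i++) {
		ret = fn(ctx);
		if (ret != -EAGAIN)
			break;
	}
	return ret;
}

/* Example callee: models a target CPU that is offline. */
static int toy_cpu_offline(void *ctx)
{
	(void)ctx;
	return -ENXIO;
}

/* Example callee: busy for the first two calls, then succeeds. */
static int toy_busy_twice(void *ctx)
{
	int *calls = ctx;

	return ++(*calls) < 3 ? -EAGAIN : 0;
}
```

With this shape, an offline target produces an immediate -ENXIO instead of spinning, while transient failures are still retried.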
|
|
Looks like the I2C tunnel implementation from Chromebook's
embedded controller does not handle PEC correctly. Fix this
by disabling PEC for batteries behind those I2C tunnels as
a workaround.
Note that some Chromebooks have actually been reported to
have working PEC support (with I2C tunnel). Since the problem
has not yet been fully understood, this simply reverts all
Chromebooks to not use PEC for now.
Reported-by: "Milan P. Stanić" <mps@arvanta.net>
Reported-by: Vicente Bergas <vicencb@gmail.com>
CC: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Fixes: 7222bd603dd2 ("power: supply: sbs-battery: add PEC support")
Tested-by: Vicente Bergas <vicencb@gmail.com>
Tested-by: "Milan P. Stanić" <mps@arvanta.net>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
[Why & How]
Change the ABM config init interface to support multiple ABMs.
Signed-off-by: Yongqiang Sun <yongqiang.sun@amd.com>
Reviewed-by: Chris Park <Chris.Park@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove the prompt for selecting MLX5_VDPA by the user and modify
MLX5_VDPA_NET to select MLX5_VDPA. Also modify MLX5_VDPA_NET to depend
on mlx5_core.
This fixes an issue where the configuration sets 'y' for MLX5_VDPA_NET while
MLX5_CORE is compiled as a module, causing link errors.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 device")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20201007064011.GA50074@mtl-vdi-166.wap.labs.mlnx
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
A VM with mlx5 vDPA has below warnings while being reset:
vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
We should allow userspace emulating the virtio device to be
able to get to vq's avail_index, regardless of vDPA device
status. Save the index that was last seen when virtq was
stopped, so that userspace doesn't complain.
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Link: https://lore.kernel.org/r/1601583511-15138-1-git-send-email-si-wei.liu@oracle.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eli Cohen <elic@nvidia.com>
|
|
Add usb ids of the Cellient MPL200 card.
Signed-off-by: Wilken Gottwalt <wilken.gottwalt@mailbox.org>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
De-referencing skb after call to gro_cells_receive() is not allowed.
We need to fetch skb->len earlier.
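A minimal userspace sketch of the pattern, under stated assumptions. struct packet and consume_packet() are hypothetical stand-ins for struct sk_buff and gro_cells_receive(), and the consumer is assumed to free the packet. The length is copied before ownership is handed over, so nothing dereferences the packet afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet type; stands in for struct sk_buff. */
struct packet {
    size_t len;
};

/* Hypothetical consumer that takes ownership and frees the packet,
 * as a receive path may do with an skb. Returns 0 on success. */
static int consume_packet(struct packet *pkt)
{
    free(pkt);
    return 0;
}

/* Deliver a packet and add its length to *rx_bytes on success. */
static int deliver_and_count(struct packet *pkt, size_t *rx_bytes)
{
    size_t len;
    int ret;

    assert(pkt != NULL && rx_bytes != NULL);
    len = pkt->len;             /* fetch before the hand-off */
    ret = consume_packet(pkt);  /* pkt must not be touched after this */
    if (ret == 0)
        *rx_bytes += len;       /* uses the saved copy, not pkt->len */
    return ret;
}
```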
Fixes: 5491e7c6b1a9 ("macsec: enable GRO and RPS on macsec devices")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Some firmware files trigger a PHY soft reset and don't wait for it to
be finished. As a result, PHY register writes directly after applying
the firmware may fail or provide unexpected results. Fix this by waiting
for bit BMCR_RESET to be cleared after applying firmware.
There's nothing wrong with the referenced change, it's just that the
fix will apply cleanly only after this change.
Fixes: 89fbd26cca7e ("r8169: fix firmware not resetting tp->ocp_base")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With multiple DNAT rules it's possible that after destination
translation the resulting tuples collide.
For example, two openvswitch flows:
nw_dst=10.0.0.10,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
nw_dst=10.0.0.20,tp_dst=10, actions=ct(commit,table=2,nat(dst=20.0.0.1:20))
Assuming two TCP clients initiating the following connections:
10.0.0.10:5000->10.0.0.10:10
10.0.0.10:5000->10.0.0.20:10
Both tuples would translate to 10.0.0.10:5000->20.0.0.1:20 causing
nf_conntrack_confirm() to fail because of tuple collision.
Netfilter handles this case by allocating a null binding for SNAT at
egress by default. Perform the same operation in openvswitch for DNAT
if no explicit SNAT is requested by the user and allocate a null binding
for SNAT for packets in the "original" direction.
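A minimal sketch that reproduces only the collision from the example above. struct tuple and the helper functions are hypothetical and much simpler than conntrack's real tuple type. The null binding itself is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified connection tuple; addresses are IPv4 in host byte order. */
struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

static uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Apply destination NAT: only the destination address and port change. */
static struct tuple apply_dnat(struct tuple t, uint32_t new_ip, uint16_t new_port)
{
    t.dst_ip = new_ip;
    t.dst_port = new_port;
    return t;
}

static bool tuples_equal(const struct tuple *a, const struct tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Build both example connections, apply the shared DNAT target and
 * report whether the translated tuples collide. */
static bool example_flows_collide(void)
{
    struct tuple a = { ip4(10, 0, 0, 10), ip4(10, 0, 0, 10), 5000, 10 };
    struct tuple b = { ip4(10, 0, 0, 10), ip4(10, 0, 0, 20), 5000, 10 };

    assert(!tuples_equal(&a, &b));  /* distinct before translation */
    a = apply_dnat(a, ip4(20, 0, 0, 1), 20);
    b = apply_dnat(b, ip4(20, 0, 0, 1), 20);
    return tuples_equal(&a, &b);
}
```

Because DNAT leaves the source address and port untouched, the two connections differ only in fields that the translation overwrites, which is why they end up identical.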
Reported-at: https://bugzilla.redhat.com/1877128
Suggested-by: Florian Westphal <fw@strlen.de>
Fixes: 05752523e565 ("openvswitch: Interface with NAT.")
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
After freeing ep->auth_hmacs we have to clear the pointer
or risk use-after-free as reported by syzbot:
BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
BUG: KASAN: use-after-free in sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
BUG: KASAN: use-after-free in sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
Read of size 8 at addr ffff8880a8ff52c0 by task syz-executor941/6874
CPU: 0 PID: 6874 Comm: syz-executor941 Not tainted 5.9.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x198/0x1fd lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
sctp_auth_destroy_hmacs net/sctp/auth.c:509 [inline]
sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
sctp_auth_free+0x17e/0x1d0 net/sctp/auth.c:1070
sctp_endpoint_destroy+0x95/0x240 net/sctp/endpointola.c:203
sctp_endpoint_put net/sctp/endpointola.c:236 [inline]
sctp_endpoint_free+0xd6/0x110 net/sctp/endpointola.c:183
sctp_destroy_sock+0x9c/0x3c0 net/sctp/socket.c:4981
sctp_v6_destroy_sock+0x11/0x20 net/sctp/socket.c:9415
sk_common_release+0x64/0x390 net/core/sock.c:3254
sctp_close+0x4ce/0x8b0 net/sctp/socket.c:1533
inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:475
__sock_release+0xcd/0x280 net/socket.c:596
sock_close+0x18/0x20 net/socket.c:1277
__fput+0x285/0x920 fs/file_table.c:281
task_work_run+0xdd/0x190 kernel/task_work.c:141
exit_task_work include/linux/task_work.h:25 [inline]
do_exit+0xb7d/0x29f0 kernel/exit.c:806
do_group_exit+0x125/0x310 kernel/exit.c:903
__do_sys_exit_group kernel/exit.c:914 [inline]
__se_sys_exit_group kernel/exit.c:912 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:912
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43f278
Code: Bad RIP value.
RSP: 002b:00007fffe0995c38 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043f278
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bf068 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 0000000020000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
Allocated by task 6874:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track mm/kasan/common.c:56 [inline]
__kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:461
kmem_cache_alloc_trace+0x174/0x300 mm/slab.c:3554
kmalloc include/linux/slab.h:554 [inline]
kmalloc_array include/linux/slab.h:593 [inline]
kcalloc include/linux/slab.h:605 [inline]
sctp_auth_init_hmacs+0xdb/0x3b0 net/sctp/auth.c:464
sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
__sys_setsockopt+0x2db/0x610 net/socket.c:2132
__do_sys_setsockopt net/socket.c:2143 [inline]
__se_sys_setsockopt net/socket.c:2140 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 6874:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
__kasan_slab_free+0xd8/0x120 mm/kasan/common.c:422
__cache_free mm/slab.c:3422 [inline]
kfree+0x10e/0x2b0 mm/slab.c:3760
sctp_auth_destroy_hmacs net/sctp/auth.c:511 [inline]
sctp_auth_destroy_hmacs net/sctp/auth.c:501 [inline]
sctp_auth_init_hmacs net/sctp/auth.c:496 [inline]
sctp_auth_init_hmacs+0x2b7/0x3b0 net/sctp/auth.c:454
sctp_auth_init+0x8a/0x4a0 net/sctp/auth.c:1049
sctp_setsockopt_auth_supported net/sctp/socket.c:4354 [inline]
sctp_setsockopt+0x477e/0x97f0 net/sctp/socket.c:4631
__sys_setsockopt+0x2db/0x610 net/socket.c:2132
__do_sys_setsockopt net/socket.c:2143 [inline]
__se_sys_setsockopt net/socket.c:2140 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2140
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
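A minimal userspace sketch of the free-then-clear pattern, not the SCTP code. struct endpoint and destroy_hmacs() are hypothetical, with the hmacs field standing in for ep->auth_hmacs. Clearing the pointer makes a second destroy call harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical endpoint; hmacs stands in for ep->auth_hmacs. */
struct endpoint {
    int *hmacs;
};

/* Free the table and clear the pointer so a later destroy is a no-op. */
static void destroy_hmacs(struct endpoint *ep)
{
    assert(ep != NULL);
    free(ep->hmacs);   /* free(NULL) is defined to do nothing */
    ep->hmacs = NULL;  /* without this, a second call would touch freed memory */
}
```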
Fixes: 1f485649f529 ("[SCTP]: Implement SCTP-AUTH internals")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|