Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
With idle polling, IPIs are not sent when a CPU idle, but queued
and run later from do_idle(). The default kgdb_call_nmi_hook()
implementation gets the pointer to struct pt_regs from get_irq_reqs(),
which doesn't work in that case because it was not called from the
IPI interrupt handler. Fix it by defining our own kgdb_roundup()
function which sents an IPI_ENTER_KGDB. When that IPI is received
on the target CPU kgdb_nmicallback() is called.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
This implements the CONFIG_THREAD_INFO_IN_TASK option.
With this change:
- before thread_info was part of the stack and located at the beginning of the stack
- now the thread_info struct is moved and located inside the task_struct structure
- the stack is allocated and handled like the major other platforms
- drop the cpu field of thread_info and use instead the one in task_struct
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sven Schnelle <svens@stackframe.org>
|
|
Almost all PA-RISC machines have either a button that
is labeled with 'TOC' or a BMC function to trigger a TOC.
TOC is a non-maskable interrupt that is sent to the processor.
This can be used for diagnostic purposes like obtaining a
stack trace/register dump or to enter KDB/KGDB.
As an example, on my c8000, TOC can be used with:
CONFIG_KGDB=y
CONFIG_KGDB_KDB=y
and the 'kgdboc=ttyS0,115200' appended to the command line.
Press ^[( on serial console, which will enter the BMC command line,
and enter 'TOC s':
root@(none):/# (
cli>TOC s
Sending TOC/INIT.
<Cpu3> 2800035d03e00000 0000000040c21ac8 CC_ERR_CHECK_TOC
<Cpu0> 2800035d00e00000 0000000040c21ad0 CC_ERR_CHECK_TOC
<Cpu2> 2800035d02e00000 0000000040c21ac8 CC_ERR_CHECK_TOC
<Cpu1> 2800035d01e00000 0000000040c21ad0 CC_ERR_CHECK_TOC
<Cpu3> 37000f7303e00000 2000000000000000 CC_ERR_CPU_CHECK_SUMMARY
<Cpu0> 37000f7300e00000 2000000000000000 CC_ERR_CPU_CHECK_SUMMARY
<Cpu2> 37000f7302e00000 2000000000000000 CC_ERR_CPU_CHECK_SUMMARY
<Cpu1> 37000f7301e00000 2000000000000000 CC_ERR_CPU_CHECK_SUMMARY
<Cpu3> 4300100803e00000 c0000000001d26cc CC_MC_BR_TO_OS_TOC
<Cpu0> 4300100800e00000 c0000000001d26cc CC_MC_BR_TO_OS_TOC
<Cpu2> 4300100802e00000 c0000000001d26cc CC_MC_BR_TO_OS_TOC
<Cpu1> 4300100801e00000 c0000000001d26cc CC_MC_BR_TO_OS_TOC
Entering kdb (current=0x00000000411cef80, pid 0) on processor 0 due to NonMaskable Interrupt @ 0x40c21ad0
[0]kdb>
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Add functions to retrieve TOC data from firmware both
for 1.1 and 2.0 PDC.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
These data structures describe the TOC data we get from firmware
when issuing a PDC_PIM_TOC request.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
This macro will also be used by the TOC code, so move it
into asm/assembly.h to avoid duplication.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
With 64 bit kernels unwind_special() is not working because
it compares the pc to the address of the function descriptor.
Add a helper function that compares pc with the dereferenced
address. This fixes all of the backtraces on my c8000. Without
this changes, a lot of backtraces are missing in kdb or the
show-all-tasks command from /proc/sysrq-trigger.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
The TIF_XXX flags are stored in the flags field in the thread_info
struct (TI_FLAGS), not in the flags field of the task_struct structure
(TASK_FLAGS).
It seems this bug didn't generate any important side-effects, otherwise it
wouldn't have went unnoticed for 12 years (since v2.6.32).
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: ecd3d4bc06e48 ("parisc: stop using task->ptrace for {single,block}step flags")
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: stable@vger.kernel.org
|
|
The function end_of_stack() returns a pointer to the last entry of a
stack. For architectures like parisc where the stack grows upwards
return the pointer to the highest address in the stack.
Without this change I faced a crash on parisc, because the stackleak
functionality wrote STACKLEAK_POISON to the lowest address and thus
overwrote the first 4 bytes of the task_struct which included the
TIF_FLAGS.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
We will put the stack directly behind the task struct, so
make sure that we allocate it with an alignment of 64 bytes.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
I've got the following splat after enabling preemption:
[ 3.724721] BUG: using __this_cpu_add() in preemptible [00000000] code: swapper/0/1
[ 3.734630] caller is __this_cpu_preempt_check+0x38/0x50
[ 3.740635] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-rc4-64bit+ #324
[ 3.744605] Hardware name: 9000/785/C8000
[ 3.744605] Backtrace:
[ 3.744605] [<00000000401d9d58>] show_stack+0x74/0xb0
[ 3.744605] [<0000000040c27bd4>] dump_stack_lvl+0x10c/0x188
[ 3.744605] [<0000000040c27c84>] dump_stack+0x34/0x48
[ 3.744605] [<0000000040c33438>] check_preemption_disabled+0x178/0x1b0
[ 3.744605] [<0000000040c334f8>] __this_cpu_preempt_check+0x38/0x50
[ 3.744605] [<00000000401d632c>] flush_tlb_all+0x58/0x2e0
[ 3.744605] [<00000000401075c0>] 0x401075c0
[ 3.744605] [<000000004010b8fc>] 0x4010b8fc
[ 3.744605] [<00000000401080fc>] 0x401080fc
[ 3.744605] [<00000000401d5224>] do_one_initcall+0x128/0x378
[ 3.744605] [<0000000040102de8>] 0x40102de8
[ 3.744605] [<0000000040c33864>] kernel_init+0x60/0x3a8
[ 3.744605] [<00000000401d1020>] ret_from_kernel_thread+0x20/0x28
[ 3.744605]
Fix this by moving the __inc_irq_stat() into the locked section.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Otherwise we might not stop all other CPUs.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
preempt_count in struct thread_info is unsigned int,
but the entry.S code used LDREG, which generates a 64 bit
load when compiled for 64 bit. Fix this to use an ldw and
also change the condition in the compare one line below
to only compares 32 bits, although ldw zero extends, and
that should work with a 64 bit compare.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Parts of both functions are the same, so deduplicate them. No functional
change.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
flush_cache_mm() and flush_cache_range() fetch %sr3 via mfsp().
If it matches mm->context, they flush caches and the TLB. However,
the TLB is cpu-local, so if the code gets preempted shortly after
the mfsp(), and later resumed on another CPU, the wrong TLB is flushed.
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
It's shorter and kfence currently depends on this stack unwinding
implementation.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
When adding kfence support, we need to tell kfence_handle_page_fault()
if the interrupted assembler statement is a read or write operation.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
I have no idea why get_user() is used there, but we're unwinding the
kernel stack, so we should use copy_from_kernel_nofault().
Signed-off-by: Sven Schnelle <svens@stackframe.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Nathan reported that because KASAN_SHADOW_OFFSET was not defined in
Kconfig, it prevents asan-stack from getting disabled with clang even
when CONFIG_KASAN_STACK is disabled: fix this by defining the
corresponding config.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
When calling this function, all the shadow memory is already populated
with kasan_early_shadow_pte which has PAGE_KERNEL protection.
kasan_populate_early_shadow write-protects the mapping of the range
of addresses passed in argument in zero_pte_populate, which actually
write-protects all the shadow memory mapping since kasan_early_shadow_pte
is used for all the shadow memory at this point. And then when using
memblock API to populate the shadow memory, the first write access to the
kernel stack triggers a trap. This becomes visible with the next commit
that contains a fix for asan-stack.
We already manually populate all the shadow memory in kasan_early_init
and we write-protect kasan_early_shadow_pte at the end of kasan_init
which makes the calls to kasan_populate_early_shadow superfluous so
we can remove them.
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Fixes: e178d670f251 ("riscv/kasan: add KASAN_VMALLOC support")
Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
My snake instinct was on and I wrote "misssing" instead of "missing".
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Some functions had kernel-doc that used a comma instead of a hash to
separate the function name from the one line description.
Also, the "ftrace_is_dead()" had an incomplete description.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
This reverts commit 8c945d32e60427cbc0859cf7045bbe6196bb03d8.
The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.
The revert does not apply cleanly due to changes in a6e66e6f8c1b
("btrfs: rework lzo_decompress_bio() to make it subpage compatible")
that reworked the page iteration so the revert is done to be equivalent
to the original code.
Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Tested-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This reverts commit 696ab562e6df9fbafd6052d8ce4aafcb2ed16069.
The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.
Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This reverts commit bbaf9715f3f5b5ff0de71da91fcc34ee9c198ed8.
The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.
Example stacktrace with ZSTD on a 32bit ARM machine:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c4159ed3
[00000000] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 210 Comm: kworker/u2:3 Not tainted 5.14.0-rc79+ #12
Hardware name: Allwinner sun4i/sun5i Families
Workqueue: btrfs-delalloc btrfs_work_helper
PC is at mmiocpy+0x48/0x330
LR is at ZSTD_compressStream_generic+0x15c/0x28c
(mmiocpy) from [<c0629648>] (ZSTD_compressStream_generic+0x15c/0x28c)
(ZSTD_compressStream_generic) from [<c06297dc>] (ZSTD_compressStream+0x64/0xa0)
(ZSTD_compressStream) from [<c049444c>] (zstd_compress_pages+0x170/0x488)
(zstd_compress_pages) from [<c0496798>] (btrfs_compress_pages+0x124/0x12c)
(btrfs_compress_pages) from [<c043c068>] (compress_file_range+0x3c0/0x834)
(compress_file_range) from [<c043c4ec>] (async_cow_start+0x10/0x28)
(async_cow_start) from [<c0475c3c>] (btrfs_work_helper+0x100/0x230)
(btrfs_work_helper) from [<c014ef68>] (process_one_work+0x1b4/0x418)
(process_one_work) from [<c014f210>] (worker_thread+0x44/0x524)
(worker_thread) from [<c0156aa4>] (kthread+0x180/0x1b0)
(kthread) from [<c0100150>]
Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The coccinelle check report:
./tools/testing/selftests/vm/split_huge_page_test.c:344:36-42:
ERROR: application of sizeof to pointer
Use "strlen" to fix it.
Link: https://lkml.kernel.org/r/20211012030116.184027-1-davidcomponentone@gmail.com
Signed-off-by: David Yang <davidcomponentone@gmail.com>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Kunit test cases for 'damon_split_regions_of()' expects the number of
regions after calling the function will be same to their request
('nr_sub'). However, the requested number is just an upper-limit,
because the function randomly decides the size of each sub-region.
This fixes the wrong expectation.
Link: https://lkml.kernel.org/r/20211028090628.14948-1-sj@kernel.org
Fixes: 17ccae8bb5c9 ("mm/damon: add kunit tests")
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The read-only THP for filesystems will collapse THP for files opened
readonly and mapped with VM_EXEC. The intended usecase is to avoid TLB
misses for large text segments. But it doesn't restrict the file types
so a THP could be collapsed for a non-regular file, for example, block
device, if it is opened readonly and mapped with EXEC permission. This
may cause bugs, like [1] and [2].
This is definitely not the intended usecase, so just collapse THP for
regular files in order to close the attack surface.
[shy828301@gmail.com: fix vm_file check [3]]
Link: https://lore.kernel.org/lkml/CACkBjsYwLYLRmX8GpsDpMthagWOjWWrNxqY6ZLNQVr6yx+f5vA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-mm/000000000000c6a82505ce284e4c@google.com/ [2]
Link: https://lkml.kernel.org/r/CAHbLzkqTW9U3VvTu1Ki5v_cLRC9gHW+znBukg_ycergE0JWj-A@mail.gmail.com [3]
Link: https://lkml.kernel.org/r/20211027195221.3825-1-shy828301@gmail.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reported-by: syzbot+aae069be1de40fb11825@syzkaller.appspotmail.com
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Andrea Righi <andrea.righi@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently collapse_file does not explicitly check PG_writeback, instead,
page_has_private and try_to_release_page are used to filter writeback
pages. This does not work for xfs with blocksize equal to or larger
than pagesize, because in such case xfs has no page->private.
This makes collapse_file bail out early for writeback page. Otherwise,
xfs end_page_writeback will panic as follows.
page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:ffff0003f88c86a8 index:0x0 pfn:0x84ef32
aops:xfs_address_space_operations [xfs] ino:30000b7 dentry name:"libtest.so"
flags: 0x57fffe0000008027(locked|referenced|uptodate|active|writeback)
raw: 57fffe0000008027 ffff80001b48bc28 ffff80001b48bc28 ffff0003f88c86a8
raw: 0000000000000000 0000000000000000 00000000ffffffff ffff0000c3e9a000
page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u))
page->mem_cgroup:ffff0000c3e9a000
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1212!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in:
BUG: Bad page state in process khugepaged pfn:84ef32
xfs(E)
page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:0 index:0x0 pfn:0x84ef32
libcrc32c(E) rfkill(E) aes_ce_blk(E) crypto_simd(E) ...
CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Tainted: ...
pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
Call trace:
end_page_writeback+0x1c0/0x214
iomap_finish_page_writeback+0x13c/0x204
iomap_finish_ioend+0xe8/0x19c
iomap_writepage_end_bio+0x38/0x50
bio_endio+0x168/0x1ec
blk_update_request+0x278/0x3f0
blk_mq_end_request+0x34/0x15c
virtblk_request_done+0x38/0x74 [virtio_blk]
blk_done_softirq+0xc4/0x110
__do_softirq+0x128/0x38c
__irq_exit_rcu+0x118/0x150
irq_exit+0x1c/0x30
__handle_domain_irq+0x8c/0xf0
gic_handle_irq+0x84/0x108
el1_irq+0xcc/0x180
arch_cpu_idle+0x18/0x40
default_idle_call+0x4c/0x1a0
cpuidle_idle_call+0x168/0x1e0
do_idle+0xb4/0x104
cpu_startup_entry+0x30/0x9c
secondary_start_kernel+0x104/0x180
Code: d4210000 b0006161 910c8021 94013f4d (d4210000)
---[ end trace 4a88c6a074082f8c ]---
Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt
Link: https://lkml.kernel.org/r/20211022023052.33114-1-rongwei.wang@linux.alibaba.com
Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS")
Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Suggested-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Song Liu <song@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Eric Dumazet reported a strange numa spreading info in [1], and found
commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings") introduced
this issue [2].
Dig into the difference before and after this patch, page allocation has
some difference:
before:
alloc_large_system_hash
__vmalloc
__vmalloc_node(..., NUMA_NO_NODE, ...)
__vmalloc_node_range
__vmalloc_area_node
alloc_page /* because NUMA_NO_NODE, so choose alloc_page branch */
alloc_pages_current
alloc_page_interleave /* can be proved by print policy mode */
after:
alloc_large_system_hash
__vmalloc
__vmalloc_node(..., NUMA_NO_NODE, ...)
__vmalloc_node_range
__vmalloc_area_node
alloc_pages_node /* choose nid by nuam_mem_id() */
__alloc_pages_node(nid, ....)
So after commit 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings"),
it will allocate memory in current node instead of interleaving allocate
memory.
Link: https://lore.kernel.org/linux-mm/CANn89iL6AAyWhfxdHO+jaT075iOa3XcYn9k6JJc7JR2XYn6k_Q@mail.gmail.com/ [1]
Link: https://lore.kernel.org/linux-mm/CANn89iLofTR=AK-QOZY87RdUZENCZUT4O6a0hvhu3_EwRMerOg@mail.gmail.com/ [2]
Link: https://lkml.kernel.org/r/20211021080744.874701-2-chenwandun@huawei.com
Fixes: 121e6f3258fe ("mm/vmalloc: hugepage vmalloc mappings")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Quoting Dmitry:
"refcount_inc() needs to be done before fd_install(). After
fd_install() finishes, the fd can be used by userspace and
we can have secret data in memory before the refcount_inc().
A straightforward misuse where a user will predict the returned
fd in another thread before the syscall returns and will use it
to store secret data is somewhat dubious because such a user just
shoots themself in the foot.
But a more interesting misuse would be to close the predicted fd
and decrement the refcount before the corresponding refcount_inc,
this way one can briefly drop the refcount to zero while there are
other users of secretmem."
Move fd_install() after refcount_inc().
Link: https://lkml.kernel.org/r/20211021154046.880251-1-keescook@chromium.org
Link: https://lore.kernel.org/lkml/CACT4Y+b1sW6-Hkn8HQYw_SsT7X3tp-CJNh2ci0wG3ZnQz9jjig@mail.gmail.com
Fixes: 9a436f8ff631 ("PM: hibernate: disable when there are active secretmem users")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jordy Zomer <jordy@pwning.systems>
Cc: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Encountered a race between ocfs2_test_bg_bit_allocatable() and
jbd2_journal_put_journal_head() resulting in the below vmcore.
PID: 106879 TASK: ffff880244ba9c00 CPU: 2 COMMAND: "loop3"
Call trace:
panic
oops_end
no_context
__bad_area_nosemaphore
bad_area_nosemaphore
__do_page_fault
do_page_fault
page_fault
[exception RIP: ocfs2_block_group_find_clear_bits+316]
ocfs2_block_group_find_clear_bits [ocfs2]
ocfs2_cluster_group_search [ocfs2]
ocfs2_search_chain [ocfs2]
ocfs2_claim_suballoc_bits [ocfs2]
__ocfs2_claim_clusters [ocfs2]
ocfs2_claim_clusters [ocfs2]
ocfs2_local_alloc_slide_window [ocfs2]
ocfs2_reserve_local_alloc_bits [ocfs2]
ocfs2_reserve_clusters_with_limit [ocfs2]
ocfs2_reserve_clusters [ocfs2]
ocfs2_lock_refcount_allocators [ocfs2]
ocfs2_make_clusters_writable [ocfs2]
ocfs2_replace_cow [ocfs2]
ocfs2_refcount_cow [ocfs2]
ocfs2_file_write_iter [ocfs2]
lo_rw_aio
loop_queue_work
kthread_worker_fn
kthread
ret_from_fork
When ocfs2_test_bg_bit_allocatable() called bh2jh(bg_bh), the
bg_bh->b_private NULL as jbd2_journal_put_journal_head() raced and
released the jounal head from the buffer head. Needed to take bit lock
for the bit 'BH_JournalHead' to fix this race.
Link: https://lkml.kernel.org/r/1634820718-6043-1-git-send-email-gautham.ananthakrishna@oracle.com
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: <rajesh.sivaramasubramaniom@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Race between process_mrelease and exit_mmap, where free_pgtables is
called while __oom_reap_task_mm is in progress, leads to kernel crash
during pte_offset_map_lock call. oom-reaper avoids this race by setting
MMF_OOM_VICTIM flag and causing exit_mmap to take and release
mmap_write_lock, blocking it until oom-reaper releases mmap_read_lock.
Reusing MMF_OOM_VICTIM for process_mrelease would be the simplest way to
fix this race, however that would be considered a hack. Fix this race
by elevating mm->mm_users and preventing exit_mmap from executing until
process_mrelease is finished. Patch slightly refactors the code to
adapt for a possible mmget_not_zero failure.
This fix has considerable negative impact on process_mrelease
performance and will likely need later optimization.
Link: https://lkml.kernel.org/r/20211022014658.263508-1-surenb@google.com
Fixes: 884a7e5964e0 ("mm: introduce process_mrelease system call")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christian Brauner <christian@brauner.io>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When handling shmem page fault the THP with corrupted subpage could be
PMD mapped if certain conditions are satisfied. But kernel is supposed
to send SIGBUS when trying to map hwpoisoned page.
There are two paths which may do PMD map: fault around and regular
fault.
Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
codepaths") the thing was even worse in fault around path. The THP
could be PMD mapped as long as the VMA fits regardless what subpage is
accessed and corrupted. After this commit as long as head page is not
corrupted the THP could be PMD mapped.
In the regular fault path the THP could be PMD mapped as long as the
corrupted page is not accessed and the VMA fits.
This loophole could be fixed by iterating every subpage to check if any
of them is hwpoisoned or not, but it is somewhat costly in page fault
path.
So introduce a new page flag called HasHWPoisoned on the first tail
page. It indicates the THP has hwpoisoned subpage(s). It is set if any
subpage of THP is found hwpoisoned by memory failure and after the
refcount is bumped successfully, then cleared when the THP is freed or
split.
The soft offline path doesn't need this since soft offline handler just
marks a subpage hwpoisoned when the subpage is migrated successfully.
But shmem THP didn't get split then migrated at all.
Link: https://lkml.kernel.org/r/20211020210755.23964-3-shy828301@gmail.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When handling THP hwpoison checked if the THP is in allocation or free
stage since hwpoison may mistreat it as hugetlb page. After commit
415c64c1453a ("mm/memory-failure: split thp earlier in memory error
handling") the problem has been fixed, so this check is no longer
needed. Remove it. The side effect of the removal is hwpoison may
report unsplit THP instead of unknown error for shmem THP. It seems not
like a big deal.
The following patch "mm: filemap: check if THP has hwpoisoned subpage
for PMD page fault" depends on this, which fixes shmem THP with
hwpoisoned subpage(s) are mapped PMD wrongly. So this patch needs to be
backported to -stable as well.
Link: https://lkml.kernel.org/r/20211020210755.23964-2-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
__vmalloc_area_node()") switched to bulk page allocator for order 0
allocation backing vmalloc. However bulk page allocator does not
support __GFP_ACCOUNT allocations and there are several users of
kvmalloc(__GFP_ACCOUNT).
For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
future if there is workload that can be significantly improved with the
bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
decision.
Link: https://lkml.kernel.org/r/20211014151607.2171970-1-shakeelb@google.com
Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The reset callback may clear the internal card detect interrupts, so
make sure to reenable them if needed.
Fixes: b4d86f37eacb ("mmc: renesas_sdhi: do hard reset if possible")
Reported-by: Biju Das <biju.das.jz@bp.renesas.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211028195149.8003-1-wsa+renesas@sang-engineering.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Somehow we only have a list of subdirectories, which apparently made
it harder for folks to find the gpu maintainers. Fix that.
References: https://lore.kernel.org/dri-devel/YXrAAZlxxStNFG%2FK@phenom.ffwll.local/
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211028170857.4029606-1-daniel.vetter@ffwll.ch
|
|
This reverts commit 3bda2e5df476417b6d08967e2d84234a59d57b1c.
According to discussion with Andrew as follow:
https://lore.kernel.org/netdev/09eda9fe-196b-006b-6f01-f54e75715961@huawei.com/
HNS3 driver needs to separate pause autoneg from general autoneg, so revert
this incorrect patch.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Link: https://lore.kernel.org/r/20211028140624.53149-1-huangguangbin2@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
using packetdrill it's possible to observe that the receiver key contains
random values when clients transmit MP_CAPABLE with data and checksum (as
specified in RFC8684 §3.1). Fix the layout of mptcp_out_options, to avoid
using the skb extension copy when writing the MP_CAPABLE sub-option.
Fixes: d7b269083786 ("mptcp: shrink mptcp_out_options struct")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/233
Reported-by: Poorva Sonparote <psonparo@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20211027203855.264600-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The bpf_jit_binary_free() function requires a non-NULL argument. When
the RISC-V BPF JIT fails to converge in NR_JIT_ITERATIONS steps,
jit_data->header will be NULL, which triggers a NULL
dereference. Avoid this by checking the argument, prior calling the
function.
Fixes: ca6cb5447cec ("riscv, bpf: Factor common RISC-V JIT code")
Signed-off-by: Björn Töpel <bjorn@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20211028125115.514587-1-bjorn@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch fixes possible null pointer dereference in files
"rvu_debugfs.c" and "rvu_nix.c"
Fixes: 8756828a8148 ("octeontx2-af: Add NPA aura and pool contexts to debugfs")
Fixes: 9a946def264d ("octeontx2-af: Modify nix_vtag_cfg mailbox to support TX VTAG entries")
Signed-off-by: Rakesh Babu Saladi <rsaladi2@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, we are using a fixed buffer size of length 2048 to display
rsrc_alloc output. As a result a maximum of 2048 characters of
rsrc_alloc output is displayed, which may lead sometimes to display only
partial output. This patch fixes this dependency on max limit of buffer
size and displays all PF VF entries.
Each column of the debugfs entry "rsrc_alloc" uses a fixed width of 12
characters to print the list of LFs of each block for a PF/VF. If the
length of list of LFs of a block exceeds this fixed width then the list
gets truncated and displays only a part of the list. This patch fixes
this by using the maximum possible length of list of LFs among all
blocks of all PFs and VFs entries as the width size.
Fixes: f7884097141b ("octeontx2-af: Formatting debugfs entry rsrc_alloc.")
Fixes: 23205e6d06d4 ("octeontx2-af: Dump current resource provisioning status")
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <Sunil.Goutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While displaying ingress policers information in
debugfs check whether ingress policers exist in
the hardware or not because some platforms(CN9XXX)
do not have this feature.
Fixes: e7d8971763f3 ("octeontx2-af: cn10k: Debugfs support for bandwidth")
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The driver allocates skb during ndo_open with GFP_ATOMIC which has high chance of failure when there are multiple instances.
GFP_KERNEL is enough while open and use GFP_ATOMIC only from interrupt context.
Fixes: 23f0703c125b ("lan743x: Add main source files for new lan743x driver")
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|