Age | Commit message (Collapse) | Author | Files | Lines |
|
If we open a file without read access and then pass the fd to a syscall
whose implementation calls kernel_read_file_from_fd(), we get a warning
from __kernel_read():
if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
This currently affects both finit_module() and kexec_file_load(), but it
could affect other syscalls in the future.
Link: https://lkml.kernel.org/r/20211007220110.600005-1-willy@infradead.org
Fixes: b844f0ecbc56 ("vfs: define kernel_copy_file_from_fd()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mimi Zohar <zohar@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 6e7b64b9dd6d ("elfcore: fix building with clang") introduces
special handling for two architectures, ia64 and User Mode Linux.
However, the wrong name, i.e., CONFIG_UM, for the intended Kconfig
symbol for User-Mode Linux was used.
Although the directory for User Mode Linux is ./arch/um; the Kconfig
symbol for this architecture is called CONFIG_UML.
Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs:
UM
Referencing files: include/linux/elfcore.h
Similar symbols: UML, NUMA
Correct the name of the config to the intended one.
[akpm@linux-foundation.org: fix um/x86_64, per Catalin]
Link: https://lkml.kernel.org/r/20211006181119.2851441-1-catalin.marinas@arm.com
Link: https://lkml.kernel.org/r/YV6pejGzLy5ppEpt@arm.com
Link: https://lkml.kernel.org/r/20211006082209.417-1-lukas.bulwahn@gmail.com
Fixes: 6e7b64b9dd6d ("elfcore: fix building with clang")
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Barret Rhoden <brho@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
kmem_cache_free_bulk() will call memcg_slab_free_hook() for all objects
when doing bulk free. So we shouldn't call memcg_slab_free_hook() again
for bulk free to avoid incorrect memcg slab count.
Link: https://lkml.kernel.org/r/20210916123920.48704-6-linmiaohe@huawei.com
Fixes: d1b2cf6cb84a ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Faiyaz Mohammed <faiyazm@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When sysfs_slab_add failed, we shouldn't call debugfs_slab_add() for s
because s will be freed soon. And slab_debugfs_fops will use s later
leading to a use-after-free.
Link: https://lkml.kernel.org/r/20210916123920.48704-5-linmiaohe@huawei.com
Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Faiyaz Mohammed <faiyazm@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In error path, the random_seq of slub cache might be leaked. Fix this
by using __kmem_cache_release() to release all the relevant resources.
Link: https://lkml.kernel.org/r/20210916123920.48704-4-linmiaohe@huawei.com
Fixes: 210e7a43fa90 ("mm: SLUB freelist randomization")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Faiyaz Mohammed <faiyazm@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If object's reuse is delayed, it will be excluded from the reconstructed
freelist. But we forgot to adjust the cnt accordingly. So there will
be a mismatch between reconstructed freelist depth and cnt. This will
lead to free_debug_processing() complaining about freelist count or a
incorrect slub inuse count.
Link: https://lkml.kernel.org/r/20210916123920.48704-3-linmiaohe@huawei.com
Fixes: c3895391df38 ("kasan, slub: fix handling of kasan_slab_free hook")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Faiyaz Mohammed <faiyazm@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "Fixups for slub".
This series contains various bug fixes for slub. We fix memoryleak,
use-afer-free, NULL pointer dereferencing and so on in slub. More
details can be found in the respective changelogs.
This patch (of 5):
It's possible that __seq_open_private() will return NULL. So we should
check it before using lest dereferencing NULL pointer. And in error
paths, we forgot to release private buffer via seq_release_private().
Memory will leak in these paths.
Link: https://lkml.kernel.org/r/20210916123920.48704-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20210916123920.48704-2-linmiaohe@huawei.com
Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Faiyaz Mohammed <faiyazm@codeaurora.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
syzbot reported access to unitialized memory in mbind() [1]
Issue came with commit bda420b98505 ("numa balancing: migrate on fault
among multiple bound nodes")
This commit added a new bit in MPOL_MODE_FLAGS, but only checked valid
combination (MPOL_F_NUMA_BALANCING can only be used with MPOL_BIND) in
do_set_mempolicy()
This patch moves the check in sanitize_mpol_flags() so that it is also
used by mbind()
[1]
BUG: KMSAN: uninit-value in __mpol_equal+0x567/0x590 mm/mempolicy.c:2260
__mpol_equal+0x567/0x590 mm/mempolicy.c:2260
mpol_equal include/linux/mempolicy.h:105 [inline]
vma_merge+0x4a1/0x1e60 mm/mmap.c:1190
mbind_range+0xcc8/0x1e80 mm/mempolicy.c:811
do_mbind+0xf42/0x15f0 mm/mempolicy.c:1333
kernel_mbind mm/mempolicy.c:1483 [inline]
__do_sys_mbind mm/mempolicy.c:1490 [inline]
__se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486
__x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x44/0xae
Uninit was created at:
slab_alloc_node mm/slub.c:3221 [inline]
slab_alloc mm/slub.c:3230 [inline]
kmem_cache_alloc+0x751/0xff0 mm/slub.c:3235
mpol_new mm/mempolicy.c:293 [inline]
do_mbind+0x912/0x15f0 mm/mempolicy.c:1289
kernel_mbind mm/mempolicy.c:1483 [inline]
__do_sys_mbind mm/mempolicy.c:1490 [inline]
__se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486
__x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x44/0xae
=====================================================
Kernel panic - not syncing: panic_on_kmsan set ...
CPU: 0 PID: 15049 Comm: syz-executor.0 Tainted: G B 5.15.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1ff/0x28e lib/dump_stack.c:106
dump_stack+0x25/0x28 lib/dump_stack.c:113
panic+0x44f/0xdeb kernel/panic.c:232
kmsan_report+0x2ee/0x300 mm/kmsan/report.c:186
__msan_warning+0xd7/0x150 mm/kmsan/instrumentation.c:208
__mpol_equal+0x567/0x590 mm/mempolicy.c:2260
mpol_equal include/linux/mempolicy.h:105 [inline]
vma_merge+0x4a1/0x1e60 mm/mmap.c:1190
mbind_range+0xcc8/0x1e80 mm/mempolicy.c:811
do_mbind+0xf42/0x15f0 mm/mempolicy.c:1333
kernel_mbind mm/mempolicy.c:1483 [inline]
__do_sys_mbind mm/mempolicy.c:1490 [inline]
__se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486
__x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x44/0xae
Link: https://lkml.kernel.org/r/20211001215630.810592-1-eric.dumazet@gmail.com
Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
mem=[X][G|M] is broken on ARM64 platform, there are cases that even
type.cnt is 1, but total_size is not 0 because regions are merged into
1. So only check 'cnt' is not enough, total_size should be used,
othersize bootargs 'mem=[X][G|B]' not work anymore.
Link: https://lkml.kernel.org/r/20210930024437.32598-1-peng.fan@oss.nxp.com
Fixes: e888fa7bb882 ("memblock: Check memory add/cap ordering")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Starting with kernel 5.11 built with CONFIG_FORTIFY_SOURCE mouting an
ocfs2 filesystem with either o2cb or pcmk cluster stack fails with the
trace below. Problem seems to be that strings for cluster stack and
cluster name are not guaranteed to be null terminated in the disk
representation, while strlcpy assumes that the source string is always
null terminated. This causes a read outside of the source string
triggering the buffer overflow detection.
detected buffer overflow in strlen
------------[ cut here ]------------
kernel BUG at lib/string.c:1149!
invalid opcode: 0000 [#1] SMP PTI
CPU: 1 PID: 910 Comm: mount.ocfs2 Not tainted 5.14.0-1-amd64 #1
Debian 5.14.6-2
RIP: 0010:fortify_panic+0xf/0x11
...
Call Trace:
ocfs2_initialize_super.isra.0.cold+0xc/0x18 [ocfs2]
ocfs2_fill_super+0x359/0x19b0 [ocfs2]
mount_bdev+0x185/0x1b0
legacy_get_tree+0x27/0x40
vfs_get_tree+0x25/0xb0
path_mount+0x454/0xa20
__x64_sys_mount+0x103/0x140
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
Link: https://lkml.kernel.org/r/20210929180654.32460-1-vvidic@valentin-vidic.from.hr
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 6dbf7bb55598 ("fs: Don't invalidate page buffers in
block_write_full_page()") uncovered a latent bug in ocfs2 conversion
from inline inode format to a normal inode format.
The code in ocfs2_convert_inline_data_to_extents() attempts to zero out
the whole cluster allocated for file data by grabbing, zeroing, and
dirtying all pages covering this cluster. However these pages are
beyond i_size, thus writeback code generally ignores these dirty pages
and no blocks were ever actually zeroed on the disk.
This oversight was fixed by commit 693c241a5f6a ("ocfs2: No need to zero
pages past i_size.") for standard ocfs2 write path, inline conversion
path was apparently forgotten; the commit log also has a reasoning why
the zeroing actually is not needed.
After commit 6dbf7bb55598, things became worse as writeback code stopped
invalidating buffers on pages beyond i_size and thus these pages end up
with clean PageDirty bit but with buffers attached to these pages being
still dirty. So when a file is converted from inline format, then
writeback triggers, and then the file is grown so that these pages
become valid, the invalid dirtiness state is preserved,
mark_buffer_dirty() does nothing on these pages (buffers are already
dirty) but page is never written back because it is clean. So data
written to these pages is lost once pages are reclaimed.
Simple reproducer for the problem is:
xfs_io -f -c "pwrite 0 2000" -c "pwrite 2000 2000" -c "fsync" \
-c "pwrite 4000 2000" ocfs2_file
After unmounting and mounting the fs again, you can observe that end of
'ocfs2_file' has lost its contents.
Fix the problem by not doing the pointless zeroing during conversion
from inline format similarly as in the standard write path.
[akpm@linux-foundation.org: fix whitespace, per Joseph]
Link: https://lkml.kernel.org/r/20210930095405.21433-1-jack@suse.cz
Fixes: 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Cc: "Markov, Andrey" <Markov.Andrey@Dell.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The node demotion order needs to be updated during CPU hotplug. Because
whether a NUMA node has CPU may influence the demotion order. The
update function should be called during CPU online/offline after the
node_states[N_CPU] has been updated. That is done in
CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD during
CPU offline. But in commit 884a6e5d1f93 ("mm/migrate: update node
demotion order on hotplug events"), the function to update node demotion
order is called in CPUHP_AP_ONLINE_DYN during CPU online/offline. This
doesn't satisfy the order requirement.
For example, there are 4 CPUs (P0, P1, P2, P3) in 2 sockets (P0, P1 in S0
and P2, P3 in S1), the demotion order is
- S0 -> NUMA_NO_NODE
- S1 -> NUMA_NO_NODE
After P2 and P3 is offlined, because S1 has no CPU now, the demotion
order should have been changed to
- S0 -> S1
- S1 -> NO_NODE
but it isn't changed, because the order updating callback for CPU
hotplug doesn't see the new nodemask. After that, if P1 is offlined,
the demotion order is changed to the expected order as above.
So in this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and
CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and
CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the
update function on them.
Link: https://lkml.kernel.org/r/20210929060351.7293-1-ying.huang@intel.com
Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Keith Busch <kbusch@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Once upon a time, the node demotion updates were driven solely by memory
hotplug events. But now, there are handlers for both CPU and memory
hotplug.
However, the #ifdef around the code checks only memory hotplug. A
system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU
hotplug events.
Update the #ifdef around the common code. Add memory and CPU-specific
#ifdefs for their handlers. These memory/CPU #ifdefs avoid unused
function warnings when their Kconfig option is off.
[arnd@arndb.de: rework hotplug_memory_notifier() stub]
Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org
Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com
Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2.
This contains two fixes for the "automatic demotion" code which was
merged into 5.15:
* Fix memory hotplug performance regression by watching
suppressing any real action on irrelevant hotplug events.
* Ensure CPU hotplug handler is registered when memory hotplug
is disabled.
This patch (of 2):
== tl;dr ==
Automatic demotion opted for a simple, lazy approach to handling hotplug
events. This noticeably slows down memory hotplug[1]. Optimize away
updates to the demotion order when memory hotplug events should have no
effect.
This has no effect on CPU hotplug. There is no known problem on the CPU
side and any work there will be in a separate series.
== Background ==
Automatic demotion is a memory migration strategy to ensure that new
allocations have room in faster memory tiers on tiered memory systems.
The kernel maintains an array (node_demotion[]) to drive these
migrations.
The node_demotion[] path is calculated by starting at nodes with CPUs
and then "walking" to nodes with memory. Only hotplug events which
online or offline a node with memory (N_ONLINE) or CPUs (N_CPU) will
actually affect the migration order.
== Problem ==
However, the current code is lazy. It completely regenerates the
migration order on *any* CPU or memory hotplug event. The logic was
that these events are extremely rare and that the overhead from
indiscriminate order regeneration is minimal.
Part of the update logic involves a synchronize_rcu(), which is a pretty
big hammer. Its overhead was large enough to be detected by some 0day
tests that watch memory hotplug performance[1].
== Solution ==
Add a new helper (node_demotion_topo_changed()) which can differentiate
between superfluous and impactful hotplug events. Skip the expensive
update operation for superfluous events.
== Aside: Locking ==
It took me a few moments to declare the locking to be safe enough for
node_demotion_topo_changed() to work. It all hinges on the memory
hotplug lock:
During memory hotplug events, 'mem_hotplug_lock' is held for write.
This ensures that two memory hotplug events can not be called
simultaneously.
CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides
mutual exclusion between CPU hotplug events. In addition, the demotion
code acquire and hold the mem_hotplug_lock for read during its CPU
hotplug handlers. This provides mutual exclusion between the demotion
memory hotplug callbacks and the CPU hotplug callbacks.
This effectively allows treating the migration target generation code to
act as if it is single-threaded.
1. https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20210924161251.093CCD06@davehans-spike.ostc.intel.com
Link: https://lkml.kernel.org/r/20210924161253.D7673E31@davehans-spike.ostc.intel.com
Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A race is possible when a process exits, its VMAs are removed by
exit_mmap() and at the same time userfaultfd_writeprotect() is called.
The race was detected by KASAN on a development kernel, but it appears
to be possible on vanilla kernels as well.
Use mmget_not_zero() to prevent the race as done in other userfaultfd
operations.
Link: https://lkml.kernel.org/r/20210921200247.25749-1-namit@vmware.com
Fixes: 63b2d4174c4ad ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl")
Signed-off-by: Nadav Amit <namit@vmware.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In RHEL's gating selftests we've encountered memory corruption in the
uffd event test even with upstream kernel:
# ./userfaultfd anon 128 4
nr_pages: 32768, nr_pages_per_cpu: 32768
bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729)
bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877)
bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699)
bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196)
testing uffd-wp with pagemap (pgsize=4096): done
testing uffd-wp with pagemap (pgsize=2097152): done
testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963)
ERROR: faulting process failed (errno=0, line=1117)
It can be easily reproduced when global thp enabled, which is the
default for RHEL.
It's also known as a side effect of commit 0db282ba2c12 ("selftest: use
mmap instead of posix_memalign to allocate memory", 2021-07-23), which
is imho right itself on using mmap() to make sure the addresses will be
untagged even on arm.
The problem is, for each test we allocate buffers using two
allocate_area() calls. We assumed these two buffers won't affect each
other, however they could, because mmap() could have found that the two
buffers are near each other and having the same VMA flags, so they got
merged into one VMA.
It won't be a big problem if thp is not enabled, but when thp is
agressively enabled it means when initializing the src buffer it could
accidentally setup part of the dest buffer too when there's a shared THP
that overlaps the two regions. Then some of the dest buffer won't be
able to be trapped by userfaultfd missing mode, then it'll cause memory
corruption as described.
To fix it, do release_pages() after initializing the src buffer.
Since the previous two release_pages() calls are after
uffd_test_ctx_clear() which will unmap all the buffers anyway (which is
stronger than release pages; as unmap() also tear town pgtables), drop
them as they shouldn't really be anything useful.
We can mark the Fixes tag upon 0db282ba2c12 as it's reported to only
happen there, however the real "Fixes" IMHO should be 8ba6e8640844, as
before that commit we'll always do explicit release_pages() before
registration of uffd, and 8ba6e8640844 changed that logic by adding
extra unmap/map and we didn't release the pages at the right place.
Meanwhile I don't have a solid glue anyway on whether posix_memalign()
could always avoid triggering this bug, hence it's safer to attach this
fix to commit 8ba6e8640844.
Link: https://lkml.kernel.org/r/20210923232512.210092-1-peterx@redhat.com
Fixes: 8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each test")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1994931
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Li Wang <liwan@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created
queues"), BFQ maintains a per-group pointer to the last bfq_queue
created. If such a queue, say bfqq, happens to move to a different
group, then bfqq is no more a valid last bfq_queue created for its
previous group. That pointer must then be cleared. Not resetting such
a pointer may also cause UAF, if bfqq happens to also be freed after
being moved to a different group. This commit performs this missing
reset. As such it fixes commit 430a67f9d616 ("block, bfq: merge bursts
of newly-created queues").
Such a missing reset is most likely the cause of the crash reported in [1].
With some analysis, we found that this crash was due to the
above UAF. And such UAF did go away with this commit applied [1].
Anyway, before this commit, that crash happened to be triggered in
conjunction with commit 2d52c58b9c9b ("block, bfq: honor already-setup
queue merges"). The latter was then reverted by commit ebc69e897e17
("Revert "block, bfq: honor already-setup queue merges""). Yet commit
2d52c58b9c9b ("block, bfq: honor already-setup queue merges") contains
no error related with the above UAF, and can then be restored.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214503
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Tested-by: Grzegorz Kowal <custos.mentis@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20211015144336.45894-2-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Warn when the last reference on a live disk is put without calling
del_gendisk first. There are some BDI related bug reports that look
like a case of this, so make sure we have the proper instrumentation
to catch it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014130231.1468538-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As with commit 8b52d8be86d72308 ("loop: reorder loop_exit"),
unregister_blkdev() needs to be called first in order to avoid calling
brd_alloc() from brd_probe() after brd_del_one() from brd_exit(). Then,
we can avoid holding global mutex during add_disk()/del_gendisk() as with
commit 1c500ad706383f1a ("loop: reduce the loop_ctl_mutex scope").
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/e205f13d-18ff-a49c-0988-7de6ea5ff823@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is a fix for the fix (yeah, /facepalm).
The correct mask to use is not the negation of the MXCSR_MASK but the
actual mask which contains the supported bits in the MXCSR register.
Reported and debugged by Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: d298b03506d3 ("x86/fpu: Restore the masking out of reserved MXCSR bits")
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ser Olmy <ser.olmy@protonmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YWgYIYXLriayyezv@intel.com
|
|
A new warning in clang points out a few places in this driver where a
bitwise OR is being used with boolean types:
drivers/input/touchscreen.c:81:17: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
data_present = touchscreen_get_prop_u32(dev, "touchscreen-min-x",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This use of a bitwise OR is intentional, as bitwise operations do not
short circuit, which allows all the calls to touchscreen_get_prop_u32()
to happen so that the last parameter is initialized while coalescing the
results of the calls to make a decision after they are all evaluated.
To make this clearer to the compiler, use the '|=' operator to assign
the result of each touchscreen_get_prop_u32() call to data_present,
which keeps the meaning of the code the same but makes it obvious that
every one of these calls is expected to happen.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20211014205757.3474635-1-nathan@kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The Nacon GX100XF is already mapped, but it seems there is a Nacon
GC-100 (identified as NC5136Wht PCGC-100WHITE though I believe other
colours exist) with a different USB ID when in XInput mode.
Signed-off-by: Michael Cullen <michael@michaelcullen.name>
Link: https://lore.kernel.org/r/20211015192051.5196-1-michael@michaelcullen.name
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
For proper pressure calculation we need at least x and z1 to be non
zero. Even worse, in case z1 we may run in to division by zero
error.
Fixes: 60b7db914ddd ("Input: resistive-adc-touch - rework mapping of channels")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20211007095727.29579-1-o.rempel@pengutronix.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
On i.MX7S and i.MX8M* (but not i.MX6*) the pwrkey device has an
associated clock. Accessing the registers requires that this clock is
enabled. Binding the driver on at least i.MX7S and i.MX8MP while not
having the clock enabled results in a complete hang of the machine.
(This usually only happens if snvs_pwrkey is built as a module and the
rtc-snvs driver isn't already bound because at bootup the required clk
is on and only gets disabled when the clk framework disables unused clks
late during boot.)
This completes the fix in commit 135be16d3505 ("ARM: dts: imx7s: add
snvs clock to pwrkey").
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20211013062848.2667192-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
q->disk becomes invalid after the gendisk is removed. Work around this
by caching the dev_t for the tracepoints. The real fix would be to
properly tear down the I/O schedulers with the gendisk, but that is
a much more invasive change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012093301.GA27795@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't switch back to percpu mode to avoid the double RCU grace period
when tearing down SCSI devices. After removing the disk only passthrough
commands can be send anyway.
Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-6-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of delaying draining of file system I/O related items like the
blk-qos queues, the integrity read workqueue and timeouts only when the
request_queue is removed, do that when del_gendisk is called. This is
important for SCSI where the upper level drivers that control the gendisk
are separate entities, and the disk can be freed much earlier than the
request_queue, or can even be unbound without tearing down the queue.
Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-5-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
To prepare for fixing a gendisk shutdown race, open code the
blk_queue_enter logic in bio_queue_enter. This also removes the
pointless flags translation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-4-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Factor out the code to try to get q_usage_counter without blocking into
a separate helper. Both to improve code readability and to prepare for
splitting bio_queue_enter from blk_queue_enter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-3-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Ensure all bios check the current values of the queue under freeze
protection, i.e. to make sure the zero capacity set by del_gendisk
is actually seen before dispatching to the driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210929071241.934472-2-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
I received a build failure for a new patch I'm working on the nds32
architecture, and when I went to test it, I couldn't get to my build error,
because it failed to build with a bunch of:
Error: invalid operands (*UND* and *UND* sections) for `^'
issues with various files. Those files were temporary asm files that looked
like: kernel/.tmp_mc_fork.s
I decided to look deeper, and found that the "mc" portion of that name
stood for "mcount", and was created by the recordmcount.pl script. One that
I wrote over a decade ago. Once I knew the source of the problem, I was
able to investigate it further.
The way the recordmcount.pl script works (BTW, there's a C version that
simply modifies the ELF object) is by doing an "objdump" on the object
file. Looks for all the calls to "mcount", and creates an offset of those
locations from some global variable it can use (usually a global function
name, found with <.*>:). Creates a asm file that is a table of references
to these locations, using the found variable/function. Compiles it and
links it back into the original object file. This asm file is called
".tmp_mc_<object_base_name>.s".
The problem here is that the objdump produced by the nds32 object file,
contains things that look like:
0000159a <.L3^B1>:
159a: c6 00 beqz38 $r6, 159a <.L3^B1>
159a: R_NDS32_9_PCREL_RELA .text+0x159e
159c: 84 d2 movi55 $r6, #-14
159e: 80 06 mov55 $r0, $r6
15a0: ec 3c addi10.sp #0x3c
Where ".L3^B1 is somehow selected as the "global" variable to index off of.
Then the assembly file that holds the mcount locations looks like this:
.section __mcount_loc,"a",@progbits
.align 2
.long .L3^B1 + -5522
.long .L3^B1 + -5384
.long .L3^B1 + -5270
.long .L3^B1 + -5098
.long .L3^B1 + -4970
.long .L3^B1 + -4758
.long .L3^B1 + -4122
[...]
And when it is compiled back to an object to link to the original object,
the compile fails on the "^" symbol.
Simple solution for now, is to have the perl script ignore using function
symbols that have an "^" in the name.
Link: https://lkml.kernel.org/r/20211014143507.4ad2c0f7@gandalf.local.home
Cc: stable@vger.kernel.org
Acked-by: Greentime Hu <green.hu@gmail.com>
Fixes: fbf58a52ac088 ("nds32/ftrace: Add RECORD_MCOUNT support")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
In the big pgtable header split, I inadvertently introduced a couple of
duplicate symbols.
Fixes: fe6cb7b043b69cd9 ("ARC: mm: disintegrate pgtable.h into levels and flags")
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
|
|
Building csky:allmodconfig results in the following build errors.
arch/csky/mm/tcm.c:9:2: error:
#error "You should define ITCM_RAM_BASE"
9 | #error "You should define ITCM_RAM_BASE"
| ^~~~~
arch/csky/mm/tcm.c:14:2: error:
#error "You should define DTCM_RAM_BASE"
14 | #error "You should define DTCM_RAM_BASE"
| ^~~~~
arch/csky/mm/tcm.c:18:2: error:
#error "You should define correct DTCM_RAM_BASE"
18 | #error "You should define correct DTCM_RAM_BASE"
This is seen with compile tests since those enable HAVE_TCM,
but do not provide useful default values for ITCM_RAM_BASE or
DTCM_RAM_BASE. Disable HAVE_TCM for commpile tests to avoid
the error.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Building csky:allmodconfig results in the following build error.
In file included from ./include/linux/bitops.h:33,
from ./include/linux/log2.h:12,
from kernel/bounds.c:13:
./arch/csky/include/asm/bitops.h:77: error: "__clear_bit" redefined
Since commit 9248e52fec95 ("locking/atomic: simplify non-atomic wrappers"),
__clear_bit is defined in include/asm-generic/bitops/non-atomic.h,
and the define in the csky include file is no longer necessary or useful.
Remove it.
Fixes: 9248e52fec95 ("locking/atomic: simplify non-atomic wrappers")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Compiling csky:allmodconfig with an upstream C compiler results
in the following error.
csky-linux-gcc: error:
unrecognized command-line option '-mbacktrace';
did you mean '-fbacktrace'?
Select ARCH_WANT_FRAME_POINTERS only if gcc supports it to
avoid the error.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
gpr_get() return the entire pt_regs (include sr) to userspace, if we
don't restore the C bit in gpr_set, it may break the ALU result in
that context. So the C flag bit is part of gpr context, that's why
riscv totally remove the C bit in the ISA. That makes sr reg clear
from userspace to supervisor privilege.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
|
|
csky restore_sigcontext() blindly overwrites regs->sr with the value
it finds in sigcontext. Attacker can store whatever they want in there,
which includes things like S-bit. Userland shouldn't be able to set
that, or anything other than C flag (bit 0).
Do the same thing other architectures with protected bits in flags
register do - preserve everything that shouldn't be settable in
user mode, picking the rest from the value saved is sigcontext.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
|
|
We call idle_kvm_start_guest() from power7_offline() if the thread has
been requested to enter KVM. We pass it the SRR1 value that was returned
from power7_idle_insn() which tells us what sort of wakeup we're
processing.
Depending on the SRR1 value we pass in, the KVM code might enter the
guest, or it might return to us to do some host action if the wakeup
requires it.
If idle_kvm_start_guest() is able to handle the wakeup, and enter the
guest it is supposed to indicate that by returning a zero SRR1 value to
us.
That was the behaviour prior to commit 10d91611f426 ("powerpc/64s:
Reimplement book3s idle code in C"), however in that commit the
handling of SRR1 was reworked, and the zeroing behaviour was lost.
Returning from idle_kvm_start_guest() without zeroing the SRR1 value can
confuse the host offline code, causing the guest to crash and other
weirdness.
Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015133929.832061-2-mpe@ellerman.id.au
|
|
In commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in
C") kvm_start_guest() became idle_kvm_start_guest(). The old code
allocated a stack frame on the emergency stack, but didn't use the
frame to store anything, and also didn't store anything in its caller's
frame.
idle_kvm_start_guest() on the other hand is written more like a normal C
function, it creates a frame on entry, and also stores CR/LR into its
callers frame (per the ABI). The problem is that there is no caller
frame on the emergency stack.
The emergency stack for a given CPU is allocated with:
paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;
So emergency_sp actually points to the first address above the emergency
stack allocation for a given CPU, we must not store above it without
first decrementing it to create a frame. This is different to the
regular kernel stack, paca->kstack, which is initialised to point at an
initial frame that is ready to use.
idle_kvm_start_guest() stores the backchain, CR and LR all of which
write outside the allocation for the emergency stack. It then creates a
stack frame and saves the non-volatile registers. Unfortunately the
frame it creates is not large enough to fit the non-volatiles, and so
the saving of the non-volatile registers also writes outside the
emergency stack allocation.
The end result is that we corrupt whatever is at 0-24 bytes, and 112-248
bytes above the emergency stack allocation.
In practice this has gone unnoticed because the memory immediately above
the emergency stack happens to be used for other stack allocations,
either another CPUs mc_emergency_sp or an IRQ stack. See the order of
calls to irqstack_early_init() and emergency_stack_init().
The low addresses of another stack are the top of that stack, and so are
only used if that stack is under extreme pressue, which essentially
never happens in practice - and if it did there's a high likelyhood we'd
crash due to that stack overflowing.
Still, we shouldn't be corrupting someone else's stack, and it is purely
luck that we aren't corrupting something else.
To fix it we save CR/LR into the caller's frame using the existing r1 on
entry, we then create a SWITCH_FRAME_SIZE frame (which has space for
pt_regs) on the emergency stack with the backchain pointing to the
existing stack, and then finally we switch to the new frame on the
emergency stack.
Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015133929.832061-1-mpe@ellerman.id.au
|
|
SMI_COUNT MSR is supported on Sapphire Rapids CPU.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1633551137-192083-1-git-send-email-kan.liang@linux.intel.com
|
|
The newly added SPI device ID table does not work because the
entry is incorrectly copied from the OF device table.
During build testing, this shows as a compile failure when building
it as a loadable module:
drivers/misc/eeprom/eeprom_93xx46.c:424:1: error: redefinition of '__mod_of__eeprom_93xx46_of_table_device_table'
MODULE_DEVICE_TABLE(of, eeprom_93xx46_of_table);
Change the entry to refer to the correct symbol.
Fixes: 137879f7ff23 ("eeprom: 93xx46: Add SPI device ID table")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211014153730.3821376-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/gpu/drm/panel/panel-olimex-lcd-olinuxino.o: in function `lcd_olinuxino_probe':
panel-olimex-lcd-olinuxino.c:(.text+0x303): undefined reference to `crc32_le'
Fixes: 17fd7a9d324fd ("drm/panel: Add support for Olimex LCD-OLinuXino panel")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211012115242.10325-1-vegard.nossum@oracle.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Fix a build error on CONFIG_UML, which does not support (provide)
wbinvd(). UML can use the generic mb() instead.
../drivers/gpu/drm/r128/ati_pcigart.c: In function ‘drm_ati_pcigart_init’:
../drivers/gpu/drm/r128/ati_pcigart.c:218:2: error: implicit declaration of function ‘wbinvd’ [-Werror=implicit-function-declaration]
wbinvd();
^~~~~~
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: linux-um@lists.infradead.org
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211011080006.31081-1-rdunlap@infradead.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Commit 64f7c698bea9 ("drm/nouveau/fifo: add engine_id hook") replaced
fifo/chang84.c g84_fifo_chan_engine() call with an indirect call of
fifo/g84.c g84_fifo_engine_id(). The G84_FIFO_ENGN_* values returned
from the later g84_fifo_engine_id() are incremented by 1 compared to
the previous g84_fifo_chan_engine() return values.
This is fine either way for most of the code, except this one line
where an engine bit programmed into the hardware is derived from the
return value. Decrement the return value accordingly, otherwise the
wrong engine bit is programmed into the hardware and that leads to
the following failure:
nouveau 0000:01:00.0: gr: 00000030 [ILLEGAL_MTHD ILLEGAL_CLASS] ch 1 [003fbce000 DRM] subc 3 class 0000 mthd 085c data 00000420
On the following hardware:
lspci -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GT216GLM [Quadro FX 880M] (rev a2)
lspci -ns 01:00.0
01:00.0 0300: 10de:0a3c (rev a2)
Fixes: 64f7c698bea9 ("drm/nouveau/fifo: add engine_id hook")
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: <stable@vger.kernel.org> # 5.12+
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211007214117.231472-1-marex@denx.de
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Hyper-V supports a hardware cursor feature. It is not used by Linux VM,
but the Hyper-V host still draws a point as an extra mouse pointer,
which is unwanted, especially when Xorg is running.
The hyperv_fb driver uses synthvid_send_ptr() to hide the unwanted pointer.
When the hyperv_drm driver was developed, the function synthvid_send_ptr()
was not copied from the hyperv_fb driver. Fix the issue by adding the
function into hyperv_drm.
Fixes: 76c56a5affeb ("drm/hyperv: Add DRM driver for hyperv synthetic video device")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Deepak Rawat <drawat.floss@gmail.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20210916193644.45650-1-decui@microsoft.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Clamp the fbdev surface size of the available maximumi height to avoid
failing to init console emulation. An example error is shown below.
bad framebuffer height 2304, should be >= 768 && <= 768
[drm] Initialized simpledrm 1.0.0 20200625 for simple-framebuffer.0 on minor 0
simple-framebuffer simple-framebuffer.0: [drm] *ERROR* fbdev: Failed to setup generic emulation (ret=-22)
This is especially a problem with drivers that have very small screen
sizes and cannot over-allocate at all.
v2:
* reduce warning level (Ville)
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 11e8f5fd223b ("drm: Add simpledrm driver")
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reported-by: Amanoel Dawod <kernel@amanoeldawod.com>
Reported-by: Zoltán Kővágó <dirty.ice.hu@gmail.com>
Reported-by: Michael Stapelberg <michael+lkml@stapelberg.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maxime Ripard <maxime@cerno.tech>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v5.14+
Link: https://patchwork.freedesktop.org/patch/msgid/20211005070355.7680-1-tzimmermann@suse.de
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
In commit e11f5bd8228f ("drm: Add support for DP 1.4 Compliance edid
corruption test") the function connector_bad_edid() started assuming
that the memory for the EDID passed to it was big enough to hold
`edid[0x7e] + 1` blocks of data (1 extra for the base block). It
completely ignored the fact that the function was passed `num_blocks`
which indicated how much memory had been allocated for the EDID.
Let's fix this by adding a bounds check.
This is important for handling the case where there's an error in the
first block of the EDID. In that case we will call
connector_bad_edid() without having re-allocated memory based on
`edid[0x7e]`.
Fixes: e11f5bd8228f ("drm: Add support for DP 1.4 Compliance edid corruption test")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211005192905.v2.1.Ib059f9c23c2611cb5a9d760e7d0a700c1295928d@changeid
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Starting with commit 6b2117ad65f1 ("of: property: fw_devlink: Add
support for "resets" and "pwms""), the imx-drm driver fails to load
due to forever dormant devlinks to the reset-controller node. This
node was never associated with a struct device.
Add a platform device to allow fw_devnode to activate the devlinks.
Fixes: 6b2117ad65f1 ("of: property: fw_devlink: Add support for "resets" and "pwms"")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Tested-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
The `cpu` argument of perf_evsel__read() must specify the cpu index.
perf_cpu_map__for_each_cpu() is for iterating the cpu number (not index)
and is thus not appropriate for use with perf_evsel__read().
So, if there is an offline CPU, the cpu number specified in the argument
may point out of range because the cpu number and the cpu index are
different.
Fix test_stat_cpu().
Testing it:
# make tests -C tools/lib/perf/
make: Entering directory '/home/nakamura/kernel_src/linux-5.15-rc4_fix/tools/lib/perf'
running static:
- running tests/test-cpumap.c...OK
- running tests/test-threadmap.c...OK
- running tests/test-evlist.c...OK
- running tests/test-evsel.c...OK
running dynamic:
- running tests/test-cpumap.c...OK
- running tests/test-threadmap.c...OK
- running tests/test-evlist.c...OK
- running tests/test-evsel.c...OK
make: Leaving directory '/home/nakamura/kernel_src/linux-5.15-rc4_fix/tools/lib/perf'
Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20211011083704.4108720-1-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|