Age | Commit message (Collapse) | Author | Files | Lines |
|
If writeback_cache is enabled, then the size, mtime and ctime attributes of
regular files are always valid in the kernel's cache. They are retrieved
from userspace only when the inode is freshly looked up.
Add a more generic "cache_mask", that indicates which attributes are
currently valid in cache.
This patch doesn't change behavior.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In case of writeback_cache fuse_fillattr() would revert the queried
attributes to the cached version.
Move this to fuse_change_attributes() in order to manage the writeback
logic in a central helper. This will be necessary for patches that follow.
Only fuse_do_getattr() -> fuse_fillattr() uses the attributes after calling
fuse_change_attributes(), so this should not change behavior.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
There are two instances of "bool is_wb = fc->writeback_cache" where the
actual use mostly involves checking "is_wb && S_ISREG(inode->i_mode)".
Clean up these cases by storing the second condition in the local variable.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
It's safe to call file_update_time() if writeback cache is not enabled,
since S_NOCMTIME is set in this case. This part is purely a cleanup.
__fuse_copy_file_range() also calls fuse_write_update_attr() only in the
writeback cache case. This is inconsistent with other callers, where it's
called unconditionally.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
A READ request returning a short count is taken as indication of EOF, and
the cached file size is modified accordingly.
Fix the attribute version checking to allow for changes to fc->attr_version
on other inodes.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Extend the fuse_write_update_attr() helper to invalidate cached attributes
after a write.
This has already been done in all cases except in fuse_notify_store(), so
this is mostly a cleanup.
fuse_direct_write_iter() calls fuse_direct_IO() which already calls
fuse_write_update_attr(), so don't repeat that again in the former.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
This function already updates the attr_version in fuse_inode, regardless of
whether the size was changed or not.
Rename the helper to fuse_write_update_attr() to reflect the more generic
nature.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The attribute version in fuse_inode should be updated whenever the
attributes might have changed on the server. In case of cached writes this
is not the case, so updating the attr_version is unnecessary and could
possibly affect performance.
Open code the remaining part of fuse_write_update_size().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Only invalidate attributes that the operation might have changed.
Introduce two constants for common combinations of changed attributes:
FUSE_STATX_MODIFY: file contents are modified but not size
FUSE_STATX_MODSIZE: size and/or file contents modified
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The fuse_iget() call in create_new_entry() already updated the inode with
all the new attributes and incremented the attribute version.
Incrementing the nlink will result in the wrong count. This wasn't noticed
because the attributes were invalidated right after this.
Updating ctime is still needed for the writeback case when the ctime is not
refreshed.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Rename didn't decrement/clear nlink on overwritten target inode.
Create a common helper fuse_entry_unlinked() that handles this for unlink,
rmdir and rename.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Use list_first_entry_or_null() instead of list_empty() + list_entry().
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Logically it belongs there since attributes are invalidated due to the
updated ctime. This is a cleanup and should not change behavior.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
'ia->io=io' has been set in fuse_io_alloc.
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Due to the introduction of kmap_local_*, the storage of slots used for
short-term mapping has changed from per-CPU to per-thread. kmap_atomic()
disable preemption, while kmap_local_*() only disable migration.
There is no need to disable preemption in several kamp_atomic places used
in fuse.
Link: https://lwn.net/Articles/836144/
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Add missing inode lock annotatation; found by syzbot.
Reported-and-tested-by: syzbot+9f747458f5990eaa8d43@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Fuse ->release() is otherwise asynchronous for the reason that it can
happen in contexts unrelated to close/munmap.
Inode is already written back from fuse_flush(). Add it to
fuse_vma_close() as well to make sure inode dirtying from mmaps also get
written out before the file is released.
Also add error handling.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
In writeback cache mode mtime/ctime updates are cached, and flushed to the
server using the ->write_inode() callback.
Closing the file will result in a dirty inode being immediately written,
but in other cases the inode can remain dirty after all references are
dropped. This result in the inode being written back from reclaim, which
can deadlock on a regular allocation while the request is being served.
The usual mechanisms (GFP_NOFS/PF_MEMALLOC*) don't work for FUSE, because
serving a request involves unrelated userspace process(es).
Instead do the same as for dirty pages: make sure the inode is written
before the last reference is gone.
- fallocate(2)/copy_file_range(2): these call file_update_time() or
file_modified(), so flush the inode before returning from the call
- unlink(2), link(2) and rename(2): these call fuse_update_ctime(), so
flush the ctime directly from this helper
Reported-by: chenguanyou <chenguanyou@xiaomi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Instead of "goto err", return error directly, since there's no error
cleanup to do now.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Syzkaller reports a null pointer dereference in fuse_test_super() that is
caused by sb->s_fs_info being NULL.
This is due to the fact that fuse_fill_super() is initializing s_fs_info,
which is too late, it's already on the fs_supers list. The initialization
needs to be done in sget_fc() with the sb_lock held.
Move allocation of fuse_mount and fuse_conn from fuse_fill_super() into
fuse_get_tree().
After this ->kill_sb() will always be called with non-NULL ->s_fs_info,
hence fuse_mount_destroy() can drop the test for non-NULL "fm".
Reported-by: syzbot+74a15f02ccb51f398601@syzkaller.appspotmail.com
Fixes: 5d5b74aa9c76 ("fuse: allow sharing existing sb")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
1. call fuse_mount_destroy() for open coded variants
2. before deactivate_locked_super() don't need fuse_mount destruction since
that will now be done (if ->s_fs_info is not cleared)
3. rearrange fuse_mount setup in fuse_get_tree_submount() so that the
regular pattern can be used
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The ->put_super callback is called from generic_shutdown_super() in case of
a fully initialized sb. This is called from kill_***_super(), which is
called from ->kill_sb instances.
Fuse uses ->put_super to destroy the fs specific fuse_mount and drop the
reference to the fuse_conn, while it does the same on each error case
during sb setup.
This patch moves the destruction from fuse_put_super() to
fuse_mount_destroy(), called at the end of all ->kill_sb instances. A
follup patch will clean up the error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Checking "fm" works because currently sb->s_fs_info is cleared on error
paths; however, sb->s_root is what generic_shutdown_super() checks to
determine whether the sb was fully initialized or not.
This change will allow cleanup of sb setup error paths.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
|
|
Since commit 430a67f9d616 ("block, bfq: merge bursts of newly-created
queues"), BFQ maintains a per-group pointer to the last bfq_queue
created. If such a queue, say bfqq, happens to move to a different
group, then bfqq is no more a valid last bfq_queue created for its
previous group. That pointer must then be cleared. Not resetting such
a pointer may also cause UAF, if bfqq happens to also be freed after
being moved to a different group. This commit performs this missing
reset. As such it fixes commit 430a67f9d616 ("block, bfq: merge bursts
of newly-created queues").
Such a missing reset is most likely the cause of the crash reported in [1].
With some analysis, we found that this crash was due to the
above UAF. And such UAF did go away with this commit applied [1].
Anyway, before this commit, that crash happened to be triggered in
conjunction with commit 2d52c58b9c9b ("block, bfq: honor already-setup
queue merges"). The latter was then reverted by commit ebc69e897e17
("Revert "block, bfq: honor already-setup queue merges""). Yet commit
2d52c58b9c9b ("block, bfq: honor already-setup queue merges") contains
no error related with the above UAF, and can then be restored.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=214503
Fixes: 430a67f9d616 ("block, bfq: merge bursts of newly-created queues")
Tested-by: Grzegorz Kowal <custos.mentis@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20211015144336.45894-2-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Warn when the last reference on a live disk is put without calling
del_gendisk first. There are some BDI related bug reports that look
like a case of this, so make sure we have the proper instrumentation
to catch it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211014130231.1468538-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As with commit 8b52d8be86d72308 ("loop: reorder loop_exit"),
unregister_blkdev() needs to be called first in order to avoid calling
brd_alloc() from brd_probe() after brd_del_one() from brd_exit(). Then,
we can avoid holding global mutex during add_disk()/del_gendisk() as with
commit 1c500ad706383f1a ("loop: reduce the loop_ctl_mutex scope").
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/e205f13d-18ff-a49c-0988-7de6ea5ff823@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is a fix for the fix (yeah, /facepalm).
The correct mask to use is not the negation of the MXCSR_MASK but the
actual mask which contains the supported bits in the MXCSR register.
Reported and debugged by Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: d298b03506d3 ("x86/fpu: Restore the masking out of reserved MXCSR bits")
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ser Olmy <ser.olmy@protonmail.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/YWgYIYXLriayyezv@intel.com
|
|
A new warning in clang points out a few places in this driver where a
bitwise OR is being used with boolean types:
drivers/input/touchscreen.c:81:17: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
data_present = touchscreen_get_prop_u32(dev, "touchscreen-min-x",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This use of a bitwise OR is intentional, as bitwise operations do not
short circuit, which allows all the calls to touchscreen_get_prop_u32()
to happen so that the last parameter is initialized while coalescing the
results of the calls to make a decision after they are all evaluated.
To make this clearer to the compiler, use the '|=' operator to assign
the result of each touchscreen_get_prop_u32() call to data_present,
which keeps the meaning of the code the same but makes it obvious that
every one of these calls is expected to happen.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20211014205757.3474635-1-nathan@kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The Nacon GX100XF is already mapped, but it seems there is a Nacon
GC-100 (identified as NC5136Wht PCGC-100WHITE though I believe other
colours exist) with a different USB ID when in XInput mode.
Signed-off-by: Michael Cullen <michael@michaelcullen.name>
Link: https://lore.kernel.org/r/20211015192051.5196-1-michael@michaelcullen.name
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
For proper pressure calculation we need at least x and z1 to be non
zero. Even worse, in case z1 we may run in to division by zero
error.
Fixes: 60b7db914ddd ("Input: resistive-adc-touch - rework mapping of channels")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://lore.kernel.org/r/20211007095727.29579-1-o.rempel@pengutronix.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
On i.MX7S and i.MX8M* (but not i.MX6*) the pwrkey device has an
associated clock. Accessing the registers requires that this clock is
enabled. Binding the driver on at least i.MX7S and i.MX8MP while not
having the clock enabled results in a complete hang of the machine.
(This usually only happens if snvs_pwrkey is built as a module and the
rtc-snvs driver isn't already bound because at bootup the required clk
is on and only gets disabled when the clk framework disables unused clks
late during boot.)
This completes the fix in commit 135be16d3505 ("ARM: dts: imx7s: add
snvs clock to pwrkey").
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20211013062848.2667192-1-u.kleine-koenig@pengutronix.de
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
q->disk becomes invalid after the gendisk is removed. Work around this
by caching the dev_t for the tracepoints. The real fix would be to
properly tear down the I/O schedulers with the gendisk, but that is
a much more invasive change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211012093301.GA27795@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Don't switch back to percpu mode to avoid the double RCU grace period
when tearing down SCSI devices. After removing the disk only passthrough
commands can be send anyway.
Suggested-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-6-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Instead of delaying draining of file system I/O related items like the
blk-qos queues, the integrity read workqueue and timeouts only when the
request_queue is removed, do that when del_gendisk is called. This is
important for SCSI where the upper level drivers that control the gendisk
are separate entities, and the disk can be freed much earlier than the
request_queue, or can even be unbound without tearing down the queue.
Fixes: edb0872f44ec ("block: move the bdi from the request_queue to the gendisk")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-5-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
To prepare for fixing a gendisk shutdown race, open code the
blk_queue_enter logic in bio_queue_enter. This also removes the
pointless flags translation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-4-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Factor out the code to try to get q_usage_counter without blocking into
a separate helper. Both to improve code readability and to prepare for
splitting bio_queue_enter from blk_queue_enter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20210929071241.934472-3-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Ensure all bios check the current values of the queue under freeze
protection, i.e. to make sure the zero capacity set by del_gendisk
is actually seen before dispatching to the driver.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210929071241.934472-2-hch@lst.de
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
I received a build failure for a new patch I'm working on the nds32
architecture, and when I went to test it, I couldn't get to my build error,
because it failed to build with a bunch of:
Error: invalid operands (*UND* and *UND* sections) for `^'
issues with various files. Those files were temporary asm files that looked
like: kernel/.tmp_mc_fork.s
I decided to look deeper, and found that the "mc" portion of that name
stood for "mcount", and was created by the recordmcount.pl script. One that
I wrote over a decade ago. Once I knew the source of the problem, I was
able to investigate it further.
The way the recordmcount.pl script works (BTW, there's a C version that
simply modifies the ELF object) is by doing an "objdump" on the object
file. Looks for all the calls to "mcount", and creates an offset of those
locations from some global variable it can use (usually a global function
name, found with <.*>:). Creates a asm file that is a table of references
to these locations, using the found variable/function. Compiles it and
links it back into the original object file. This asm file is called
".tmp_mc_<object_base_name>.s".
The problem here is that the objdump produced by the nds32 object file,
contains things that look like:
0000159a <.L3^B1>:
159a: c6 00 beqz38 $r6, 159a <.L3^B1>
159a: R_NDS32_9_PCREL_RELA .text+0x159e
159c: 84 d2 movi55 $r6, #-14
159e: 80 06 mov55 $r0, $r6
15a0: ec 3c addi10.sp #0x3c
Where ".L3^B1 is somehow selected as the "global" variable to index off of.
Then the assembly file that holds the mcount locations looks like this:
.section __mcount_loc,"a",@progbits
.align 2
.long .L3^B1 + -5522
.long .L3^B1 + -5384
.long .L3^B1 + -5270
.long .L3^B1 + -5098
.long .L3^B1 + -4970
.long .L3^B1 + -4758
.long .L3^B1 + -4122
[...]
And when it is compiled back to an object to link to the original object,
the compile fails on the "^" symbol.
Simple solution for now, is to have the perl script ignore using function
symbols that have an "^" in the name.
Link: https://lkml.kernel.org/r/20211014143507.4ad2c0f7@gandalf.local.home
Cc: stable@vger.kernel.org
Acked-by: Greentime Hu <green.hu@gmail.com>
Fixes: fbf58a52ac088 ("nds32/ftrace: Add RECORD_MCOUNT support")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
In the big pgtable header split, I inadvertently introduced a couple of
duplicate symbols.
Fixes: fe6cb7b043b69cd9 ("ARC: mm: disintegrate pgtable.h into levels and flags")
Signed-off-by: Vineet Gupta <vgupta@kernel.org>
|
|
Building csky:allmodconfig results in the following build errors.
arch/csky/mm/tcm.c:9:2: error:
#error "You should define ITCM_RAM_BASE"
9 | #error "You should define ITCM_RAM_BASE"
| ^~~~~
arch/csky/mm/tcm.c:14:2: error:
#error "You should define DTCM_RAM_BASE"
14 | #error "You should define DTCM_RAM_BASE"
| ^~~~~
arch/csky/mm/tcm.c:18:2: error:
#error "You should define correct DTCM_RAM_BASE"
18 | #error "You should define correct DTCM_RAM_BASE"
This is seen with compile tests since those enable HAVE_TCM,
but do not provide useful default values for ITCM_RAM_BASE or
DTCM_RAM_BASE. Disable HAVE_TCM for commpile tests to avoid
the error.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Building csky:allmodconfig results in the following build error.
In file included from ./include/linux/bitops.h:33,
from ./include/linux/log2.h:12,
from kernel/bounds.c:13:
./arch/csky/include/asm/bitops.h:77: error: "__clear_bit" redefined
Since commit 9248e52fec95 ("locking/atomic: simplify non-atomic wrappers"),
__clear_bit is defined in include/asm-generic/bitops/non-atomic.h,
and the define in the csky include file is no longer necessary or useful.
Remove it.
Fixes: 9248e52fec95 ("locking/atomic: simplify non-atomic wrappers")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Compiling csky:allmodconfig with an upstream C compiler results
in the following error.
csky-linux-gcc: error:
unrecognized command-line option '-mbacktrace';
did you mean '-fbacktrace'?
Select ARCH_WANT_FRAME_POINTERS only if gcc supports it to
avoid the error.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
gpr_get() return the entire pt_regs (include sr) to userspace, if we
don't restore the C bit in gpr_set, it may break the ALU result in
that context. So the C flag bit is part of gpr context, that's why
riscv totally remove the C bit in the ISA. That makes sr reg clear
from userspace to supervisor privilege.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org
|
|
csky restore_sigcontext() blindly overwrites regs->sr with the value
it finds in sigcontext. Attacker can store whatever they want in there,
which includes things like S-bit. Userland shouldn't be able to set
that, or anything other than C flag (bit 0).
Do the same thing other architectures with protected bits in flags
register do - preserve everything that shouldn't be settable in
user mode, picking the rest from the value saved is sigcontext.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: stable@vger.kernel.org
|
|
We call idle_kvm_start_guest() from power7_offline() if the thread has
been requested to enter KVM. We pass it the SRR1 value that was returned
from power7_idle_insn() which tells us what sort of wakeup we're
processing.
Depending on the SRR1 value we pass in, the KVM code might enter the
guest, or it might return to us to do some host action if the wakeup
requires it.
If idle_kvm_start_guest() is able to handle the wakeup, and enter the
guest it is supposed to indicate that by returning a zero SRR1 value to
us.
That was the behaviour prior to commit 10d91611f426 ("powerpc/64s:
Reimplement book3s idle code in C"), however in that commit the
handling of SRR1 was reworked, and the zeroing behaviour was lost.
Returning from idle_kvm_start_guest() without zeroing the SRR1 value can
confuse the host offline code, causing the guest to crash and other
weirdness.
Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015133929.832061-2-mpe@ellerman.id.au
|
|
In commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in
C") kvm_start_guest() became idle_kvm_start_guest(). The old code
allocated a stack frame on the emergency stack, but didn't use the
frame to store anything, and also didn't store anything in its caller's
frame.
idle_kvm_start_guest() on the other hand is written more like a normal C
function, it creates a frame on entry, and also stores CR/LR into its
callers frame (per the ABI). The problem is that there is no caller
frame on the emergency stack.
The emergency stack for a given CPU is allocated with:
paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;
So emergency_sp actually points to the first address above the emergency
stack allocation for a given CPU, we must not store above it without
first decrementing it to create a frame. This is different to the
regular kernel stack, paca->kstack, which is initialised to point at an
initial frame that is ready to use.
idle_kvm_start_guest() stores the backchain, CR and LR all of which
write outside the allocation for the emergency stack. It then creates a
stack frame and saves the non-volatile registers. Unfortunately the
frame it creates is not large enough to fit the non-volatiles, and so
the saving of the non-volatile registers also writes outside the
emergency stack allocation.
The end result is that we corrupt whatever is at 0-24 bytes, and 112-248
bytes above the emergency stack allocation.
In practice this has gone unnoticed because the memory immediately above
the emergency stack happens to be used for other stack allocations,
either another CPUs mc_emergency_sp or an IRQ stack. See the order of
calls to irqstack_early_init() and emergency_stack_init().
The low addresses of another stack are the top of that stack, and so are
only used if that stack is under extreme pressue, which essentially
never happens in practice - and if it did there's a high likelyhood we'd
crash due to that stack overflowing.
Still, we shouldn't be corrupting someone else's stack, and it is purely
luck that we aren't corrupting something else.
To fix it we save CR/LR into the caller's frame using the existing r1 on
entry, we then create a SWITCH_FRAME_SIZE frame (which has space for
pt_regs) on the emergency stack with the backchain pointing to the
existing stack, and then finally we switch to the new frame on the
emergency stack.
Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015133929.832061-1-mpe@ellerman.id.au
|
|
SMI_COUNT MSR is supported on Sapphire Rapids CPU.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1633551137-192083-1-git-send-email-kan.liang@linux.intel.com
|
|
The newly added SPI device ID table does not work because the
entry is incorrectly copied from the OF device table.
During build testing, this shows as a compile failure when building
it as a loadable module:
drivers/misc/eeprom/eeprom_93xx46.c:424:1: error: redefinition of '__mod_of__eeprom_93xx46_of_table_device_table'
MODULE_DEVICE_TABLE(of, eeprom_93xx46_of_table);
Change the entry to refer to the correct symbol.
Fixes: 137879f7ff23 ("eeprom: 93xx46: Add SPI device ID table")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211014153730.3821376-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/gpu/drm/panel/panel-olimex-lcd-olinuxino.o: in function `lcd_olinuxino_probe':
panel-olimex-lcd-olinuxino.c:(.text+0x303): undefined reference to `crc32_le'
Fixes: 17fd7a9d324fd ("drm/panel: Add support for Olimex LCD-OLinuXino panel")
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20211012115242.10325-1-vegard.nossum@oracle.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|