Age | Commit message (Collapse) | Author | Files | Lines |
|
For static initialization of spinlock_t variable, use DEFINE_SPINLOCK
instead of explicitly calling spin_lock_init().
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Due to historical reasons mark_kernel_pXd() functions
misuse the notion of physical vs virtual addresses
difference.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This patch fixes a lockdep splat introduced by commit f21916ec4826
("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated").
The lockdep splat only occurs when starting a Secure Execution guest.
Crypto virtualization (vfio_ap) is not yet supported for SE guests;
however, in order to avoid this problem when support becomes available,
this fix is being provided.
The circular locking dependency was introduced when the setting of the
masks in the guest's APCB was executed while holding the matrix_dev->lock.
While the lock is definitely needed to protect the setting/unsetting of the
matrix_mdev->kvm pointer, it is not necessarily critical for setting the
masks; so, the matrix_dev->lock will be released while the masks are being
set or cleared.
Keep in mind, however, that another process that takes the matrix_dev->lock
can get control while the masks in the guest's APCB are being set or
cleared as a result of the driver being notified that the KVM pointer
has been set or unset. This could result in invalid access to the
matrix_mdev->kvm pointer by the intervening process. To avoid this
scenario, two new fields are being added to the ap_matrix_mdev struct:
struct ap_matrix_mdev {
...
bool kvm_busy;
wait_queue_head_t wait_for_kvm;
...
};
The functions that handle notification that the KVM pointer value has
been set or cleared will set the kvm_busy flag to true until they are done
processing at which time they will set it to false and wake up the tasks on
the matrix_mdev->wait_for_kvm wait queue. Functions that require
access to matrix_mdev->kvm will sleep on the wait queue until they are
awakened at which time they can safely access the matrix_mdev->kvm
field.
Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
struct ccw1 is declared twice. One has been declared
at 21st line. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
wait_queue_head_t can be initialized automatically with
DECLARE_WAIT_QUEUE_HEAD() rather than explicitly calling
init_waitqueue_head().
Signed-off-by: Shixin Liu <liushixin2@huawei.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
static spinlock can be initialized automatically with DEFINE_SPINLOCK()
rather than explicitly calling spin_lock_init().
Signed-off-by: Shixin Liu <liushixin2@huawei.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
On s390 each PCI device has a user-defined ID (UID) exposed under
/sys/bus/pci/devices/<dev>/uid. This ID was designed to serve as the PCI
device's primary index and to match the device within Linux to the
device configured in the hypervisor. To serve as a primary identifier
the UID must be unique within the Linux instance, this is guaranteed by
the platform if and only if the UID Uniqueness Checking flag is set
within the CLP List PCI Functions response.
While the UID has been exposed to userspace since commit ac4995b9d570
("s390/pci: add some new arch specific pci attributes") whether or not
the platform guarantees its uniqueness for the lifetime of the Linux
instance while defined is not visible from userspace. Remedy this by
exposing this as a per device attribute at
/sys/bus/pci/devices/<dev>/uid_is_unique
Keeping this a per device attribute allows for maximum flexibility if we
ever end up with some devices not having a UID or not enjoying the
guaranteed uniqueness.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
In commit dee60c0dbc83 ("s390/pci: add zpci_event_hard_deconfigured()")
we added a zdev_enabled() check to what was previously an uncoditional
call to zpci_disable_device(). There are two problems with that. Firstly
zpci_had_deconfigured() is only called on event 0x0304 for which the
device is always already disabled by the platform so it is always false.
Secondly zpci_disable_device() not only disables the device but also
calls zpci_dma_exit_device() which is thus not called and we leak the
DMA tables.
Fix this by calling zpci_disable_device() unconditionally to perform
Linux side cleanup including the freeing of DMA tables.
Fixes: dee60c0dbc83 ("s390/pci: add zpci_event_hard_deconfigured()")
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
No need to add an align attribute for an integer.
The alignment is correct anyway.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
s/defintions/definitions/
s/intermedate/intermediate/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210322130533.3805976-1-unixbhaskar@gmail.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
prot_virt_host is only available if CONFIG_KVM is enabled. So lets use
a variable initialized to zero and overwrite it when that config
option is set with prot_virt_host.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: 37564ed834ac ("s390/uv: add prot virt guest/host indication files")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
s/struture/structure/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Link: https://lore.kernel.org/r/20210322062500.3109603-1-unixbhaskar@gmail.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
We are spending way too much effort on qdio-internal bookkeeping for
QAOB management & caching, and it's still not robust. Once qdio's
TX path has detached the QAOB from a PENDING buffer, we lost all
track of it until it shows up in a CQ notification again. So if the
device is torn down before that notification arrives, we leak the QAOB.
Just have the driver take care of it, and simply pass down a QAOB if
they want a TX with async-completion capability. For a buffer in PENDING
state that requires the QAOB for final completion, qeth can now also try
to recycle the buffer's QAOB rather than unconditionally freeing it.
This also eliminates the qdio_outbuf_state array, which was only needed
to transfer the aob->user1 tag from the driver to the qdio layer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The zpci_remove_device() function removes the device from the PCI common
code core which is an operation dealing primarily with the zbus and PCI
bus code. With that and to match an upcoming refactoring of the
symmetric scanning part move it to the bus code.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
A zPCI event with PEC 0x0301 for an existing zPCI device goes through
the same actions as enable_slot(). Similarly a zPCI event with PEC
0x0303 does the same steps as disable_slot().
We can thus unify both actions as zpci_configure_device() respectively
zpci_deconfigure_device().
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This patch introduces the mechanism to inject artificial events to the
CIO layer.
One of the main-event type which triggers the CommonIO operations are
Channel Report events. When a malfunction or other condition affecting
channel-subsystem operation is recognized, a Channel Report Word
(consisting of one or more CRWs) describing the condition is made
pending for retrieval and analysis by the program. The CRW contains
information concerning the identity and state of a facility following
the detection of the malfunction or other condition.
The patch introduces two debugfs interfaces which can be used to inject
'artificial' events from the userspace. It is intended to provide an easy
means to increase the test coverage for CIO code. And this functionality
can be enabled via a new configuration option CONFIG_CIO_INJECT.
The newly introduces debugfs interfaces can be used as mentioned below
to generate different fake-events. To use the crw_inject, first we should
enable it by using enable_inject interface.
i.e
echo 1 > /sys/kernel/debug/s390/cio/enable_inject
After the first step, user can simulate CRW as follows:
echo <solicited> <overflow> <chaining> <rsc> <ancillary> <erc> <rsid> \
> /sys/kernel/debug/s390/cio/crw_inject
Example:
A permanent error ERC on CHPID 0x60 would look like this:
echo 0 0 0 4 0 6 0x60 > /sys/kernel/debug/s390/cio/crw_inject
and an initialized ERC on the same CHPID:
echo 0 0 0 4 0 2 0x60 > /sys/kernel/debug/s390/cio/crw_inject
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
This patch introduces an s390 Common I/O layer debugfs directory that is
intended to provide access to debugging-related CIO functions and data.
The directory is created as /sys/kernel/debug/s390/cio
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Extract the handling of PEC 0x0304 into a function and make sure we only
attempt to disable the function if it is enabled. Also check for errors
returned by zpci_disable_device() and leave the function alone if there
are any.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
When zpci_release_device() is called on a zPCI function that is still
configured it would not be deconfigured. Until now this hasn't caused
any problems because zpci_zdev_put() is only ever called for devices
in Standby or Reserved. Fix it by adding sclp_pci_deconfigure() to the
switch when in Configured state.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The current zdev->state mixes the configuration states supported by CLP
with an additional Online state which is used inconsistently to include
enabled zPCI functions which are not yet visible to the common PCI
subsytem. In preparation for a clean separation between architected
configuration states and fine grained function states remove the Online
function state.
Where we previously checked for Online it is more accurate to check if
the function is enabled to avoid an edge case where a disabled device
was still treated as Online. This also simplifies checks whether
a function is configured as this is now directly reflected by its
function state.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Kernel and console/TTY messages written to the SCLP line mode console
are wrapped at 80 characters per line by the associated SCLP driver.
This makes long lines of output difficult to read, and requires editing
of wrapped lines copied from the output device.
Neither the firmware interface used to access the SCLP console, nor the
HMC "Operating System Messages" web interface that displays these
messages require such a length limit. Also other operating systems such
as z/VM do not impose similar limits on messages they emit to the same
console device.
This patch therefore increases the limit to 320 characters per line to
make SCLP line mode console output more readable. As a result 99% of
lines written during a typical boot will not be wrapped, compared to
about 50% wrapped lines at 80 characters per line. Another positive
side-effect of this change is that the HMC console interface is able to
keep more messages in its history buffer due to fewer line-breaks being
generated.
In a worst case scenario this means that a 4k console buffer is emitted
with the last ~400 bytes empty (320 text + 78 headers). This is more
than offset by the fact that each line that is not truncated saves 78
header bytes in the buffer. As a result the actual number of emitted
buffers should be about the same as with the 80 character limit.
This patch also removes the differentiation between line lengths of
SCLP line mode output on z/VM and non-z/VM systems. While the z/VM
hypervisor adds a prefix in front of each line ('xx: ' where xx is the
number of the CPU issuing the message), adjusting Linux line lengths did
not significantly increase readability of console output, and makes even
less of a difference with longer lines.
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Let's export the prot_virt_guest and prot_virt_host variables into the
UV sysfs firmware interface to make them easily consumable by
administrators.
prot_virt_host being 1 indicates that we did the UV
initialization (opt-in)
prot_virt_guest being 1 indicates that the UV indicates the share and
unshare ultravisor calls which is an indication that we are running as
a protected guest.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
Without that it's not safe to use them in a linked combination with
others.
Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.
We already handle short reads and writes for the following opcodes:
- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE
Now we have it for these as well:
- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV
For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().
There might be applications arround depending on the behavior
that even short send[msg]()/recv[msg]() retuns continue an
IOSQE_IO_LINK chain.
It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.
It's expected that the low level sock_sendmsg() call just ignores
MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
SO_ZEROCOPY.
We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.
cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Mark the current task as running if we need to run task_work from the
io-wq threads as part of work handling. If that is the case, then return
as such so that the caller can appropriately loop back and reset if it
was part of a going-to-sleep flush.
Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Just like we don't allow normal signals to IO threads, don't deliver a
STOP to a task that has PF_IO_WORKER set. The IO threads don't take
signals in general, and have no means of flushing out a stop either.
Longer term, we may want to look into allowing stop of these threads,
as it relates to eg process freezing. For now, this prevents a spin
issue if a SIGSTOP is delivered to the parent task.
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
They don't take signals individually, and even if they share signals with
the parent task, don't allow them to be delivered through the worker
thread. Linux does allow this kind of behavior for regular threads, but
it's really a compatability thing that we need not care about for the IO
threads.
Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The inode update should be stopped before returing the error code.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210117085732.93788-1-bianpan2016@163.com
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
This patch adds rename whiteout support in fast commits. Note that the
whiteout object that gets created is actually char device. Which
imples, the function ext4_inode_journal_mode(struct inode *inode)
would return "JOURNAL_DATA" for this inode. This has a consequence in
fast commit code that it will make creation of the whiteout object a
fast-commit ineligible behavior and thus will fall back to full
commits. With this patch, this can be observed by running fast commits
with rename whiteout and seeing the stats generated by ext4_fc_stats
tracepoint as follows:
ext4_fc_stats: dev 254:32 fc ineligible reasons:
XATTR:0, CROSS_RENAME:0, JOURNAL_FLAG_CHANGE:0, NO_MEM:0, SWAP_BOOT:0,
RESIZE:0, RENAME_DIR:0, FALLOC_RANGE:0, INODE_JOURNAL_DATA:16;
num_commits:6, ineligible: 6, numblks: 3
So in short, this patch guarantees that in case of rename whiteout, we
fall back to full commits.
Amir mentioned that instead of creating a new whiteout object for
every rename, we can create a static whiteout object with irrelevant
nlink. That will make fast commits to not fall back to full
commit. But until this happens, this patch will ensure correctness by
falling back to full commits.
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210316221921.1124955-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When filesystem mount fails because of corrupted filesystem we first
cancel the s_err_report timer reminding fs errors every day and only
then we flush s_error_work. However s_error_work may report another fs
error and re-arm timer thus resulting in timer use-after-free. Fix the
problem by first flushing the work and only after that canceling the
s_err_report timer.
Reported-by: syzbot+628472a2aac693ab0fcd@syzkaller.appspotmail.com
Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210315165906.2175-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(),
the error code will be overridden, go to out_brelse to avoid this
situation.
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Syzbot report a warning that ext4 may create an empty ea_inode if set
an empty extent attribute to a file on the file system which is no free
blocks left.
WARNING: CPU: 6 PID: 10667 at fs/ext4/xattr.c:1640 ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640
...
Call trace:
ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640
ext4_xattr_block_set+0x1d0/0x1b1c fs/ext4/xattr.c:1942
ext4_xattr_set_handle+0x8a0/0xf1c fs/ext4/xattr.c:2390
ext4_xattr_set+0x120/0x1f0 fs/ext4/xattr.c:2491
ext4_xattr_trusted_set+0x48/0x5c fs/ext4/xattr_trusted.c:37
__vfs_setxattr+0x208/0x23c fs/xattr.c:177
...
Now, ext4 try to store extent attribute into an external inode if
ext4_xattr_block_set() return -ENOSPC, but for the case of store an
empty extent attribute, store the extent entry into the extent
attribute block is enough. A simple reproduce below.
fallocate test.img -l 1M
mkfs.ext4 -F -b 2048 -O ea_inode test.img
mount test.img /mnt
dd if=/dev/zero of=/mnt/foo bs=2048 count=500
setfattr -n "user.test" /mnt/foo
Reported-by: syzbot+98b881fdd8ebf45ab4ae@syzkaller.appspotmail.com
Fixes: 9c6e7853c531 ("ext4: reserve space for xattr entries/names")
Cc: stable@kernel.org
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210305120508.298465-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into
directory, it ends up dropping new created whiteout inode under the
running transaction. After commit <9b88f9fb0d2> ("ext4: Do not iput inode
under running transaction"), we follow the assumptions that evict() does
not get called from a transaction context but in ext4_rename() it breaks
this suggestion. Although it's not a real problem, better to obey it, so
this patch add inode to orphan list and stop transaction before final
iput().
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If we failed to add new entry on rename whiteout, we cannot reset the
old->de entry directly, because the old->de could have moved from under
us during make indexed dir. So find the old entry again before reset is
needed, otherwise it may corrupt the filesystem as below.
/dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED.
/dev/sda: Unattached inode 75
/dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
Fixes: 6b4b8e6b4ad ("ext4: fix bug for rename with RENAME_WHITEOUT")
Cc: stable@vger.kernel.org
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
With interrupt force threading all device interrupt handlers are invoked
from kernel threads. Contrary to hard interrupt context the invocation only
disables bottom halfs, but not interrupts. This was an oversight back then
because any code like this will have an issue:
thread(irq_A)
irq_handler(A)
spin_lock(&foo->lock);
interrupt(irq_B)
irq_handler(B)
spin_lock(&foo->lock);
This has been triggered with networking (NAPI vs. hrtimers) and console
drivers where printk() happens from an interrupt which interrupted the
force threaded handler.
Now people noticed and started to change the spin_lock() in the handler to
spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the
interrupt request which in turn breaks RT.
Fix the root cause and not the symptom and disable interrupts before
invoking the force threaded handler which preserves the regular semantics
and the usefulness of the interrupt force threading as a general debugging
tool.
For not RT this is not changing much, except that during the execution of
the threaded handler interrupts are delayed until the handler
returns. Vs. scheduling and softirq processing there is no difference.
For RT kernels there is no issue.
Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading")
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Johan Hovold <johan@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
|
|
Architectures that describe the CPU topology in devicetree and do not have
an identity mapping between physical and logical CPU ids must override the
default implementation of arch_match_cpu_phys_id().
Failing to do so breaks CPU devicetree-node lookups using of_get_cpu_node()
and of_cpu_device_node_get() which several drivers rely on. It also causes
the CPU struct devices exported through sysfs to point to the wrong
devicetree nodes.
On x86, CPUs are described in devicetree using their APIC ids and those
do not generally coincide with the logical ids, even if CPU0 typically
uses APIC id 0.
Add the missing implementation of arch_match_cpu_phys_id() so that CPU-node
lookups work also with SMP.
Apart from fixing the broken sysfs devicetree-node links this likely does
not affect current users of mainline kernels on x86.
Fixes: 4e07db9c8db8 ("x86/devicetree: Use CPU description from Device Tree")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210312092033.26317-1-johan@kernel.org
|
|
Applications that create and extend and write to a file do not
expect to see 0 allocation size. When file is extended,
set its allocation size to a plausible value until we have a
chance to query the server for it. When the file is cached
this will prevent showing an impossible number of allocated
blocks (like 0). This fixes e.g. xfstests 614 which does
1) create a file and set its size to 64K
2) mmap write 64K to the file
3) stat -c %b for the file (to query the number of allocated blocks)
It was failing because we returned 0 blocks. Even though we would
return the correct cached file size, we returned an impossible
allocation size.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
|
|
Revert commit 44cc89f76464 ("PM: runtime: Update device status
before letting suppliers suspend") that introduced a race condition
into __rpm_callback() which allowed a concurrent rpm_resume() to
run and resume the device prematurely after its status had been
changed to RPM_SUSPENDED by __rpm_callback().
Fixes: 44cc89f76464 ("PM: runtime: Update device status before letting suppliers suspend")
Link: https://lore.kernel.org/linux-pm/24dfb6fc-5d54-6ee2-9195-26428b7ecf8a@intel.com/
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: 4.10+ <stable@vger.kernel.org> # 4.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Sites that match init_section_contains() get marked as INIT. For
built-in code init_sections contains both __init and __exit text. OTOH
kernel_text_address() only explicitly includes __init text (and there
are no __exit text markers).
Match what jump_label already does and ignore the warning for INIT
sites. Also see the excellent changelog for commit: 8f35eaa5f2de
("jump_label: Don't warn on __exit jump entries")
Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.739542434@infradead.org
|
|
The intent is to avoid writing init code after init (because the text
might have been freed). The code is needlessly different between
jump_label and static_call and not obviously correct.
The existing code relies on the fact that the module loader clears the
init layout, such that within_module_init() always fails, while
jump_label relies on the module state which is more obvious and
matches the kernel logic.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.636651340@infradead.org
|
|
It turns out that static_call_set_init() does not preserve the other
flags; IOW. it clears TAIL if it was set.
Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure")
Reported-by: Sumit Garg <sumit.garg@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
Tested-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20210318113610.519406371@infradead.org
|
|
Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where
the matrix allocator claimed to be out of vectors. He analyzed it down to
the point that IRQ2, the PIC cascade interrupt, which is supposed to be not
ever routed to the IO/APIC ended up having an interrupt vector assigned
which got moved during unplug of CPU0.
The underlying issue is that IRQ2 for various reasons (see commit
af174783b925 ("x86: I/O APIC: Never configure IRQ2" for details) is treated
as a reserved system vector by the vector core code and is not accounted as
a regular vector. The Amazon BIOS has an routing entry of pin2 to IRQ2
which causes the IO/APIC setup to claim that interrupt which is granted by
the vector domain because there is no sanity check. As a consequence the
allocation counter of CPU0 underflows which causes a subsequent unplug to
fail with:
[ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU
There is another sanity check missing in the matrix allocator, but the
underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic
during the conversion to irqdomains.
For almost 6 years nobody complained about this wreckage, which might
indicate that this requirement could be lifted, but for any system which
actually has a PIC IRQ2 is unusable by design so any routing entry has no
effect and the interrupt cannot be connected to a device anyway.
Due to that and due to history biased paranoia reasons restore the IRQ2
ignore logic and treat it as non existent despite a routing entry claiming
otherwise.
Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de
|
|
The ioctl KVM_SET_BOOT_CPU_ID fails when called after vcpu creation.
Add this explanation in the documentation.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20210319091650.11967-1-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 494c704f9af0 ("efi: Use 32-bit alignment for efi_guid_t") updated
the type definition of efi_guid_t to ensure that it always appears
sufficiently aligned (the UEFI spec is ambiguous about this, but given
the fact that its EFI_GUID type is defined in terms of a struct carrying
a uint32_t, the natural alignment is definitely >= 32 bits).
However, we missed the EFI_GUID() macro which is used to instantiate
efi_guid_t literals: that macro is still based on the guid_t type,
which does not have a minimum alignment at all. This results in warnings
such as
In file included from drivers/firmware/efi/mokvar-table.c:35:
include/linux/efi.h:1093:34: warning: passing 1-byte aligned argument to
4-byte aligned parameter 2 of 'get_var' may result in an unaligned pointer
access [-Walign-mismatch]
status = get_var(L"SecureBoot", &EFI_GLOBAL_VARIABLE_GUID, NULL, &size,
^
include/linux/efi.h:1101:24: warning: passing 1-byte aligned argument to
4-byte aligned parameter 2 of 'get_var' may result in an unaligned pointer
access [-Walign-mismatch]
get_var(L"SetupMode", &EFI_GLOBAL_VARIABLE_GUID, NULL, &size, &setupmode);
The distinction only matters on CPUs that do not support misaligned loads
fully, but 32-bit ARM's load-multiple instructions fall into that category,
and these are likely to be emitted by the compiler that built the firmware
for loading word-aligned 128-bit GUIDs from memory
So re-implement the initializer in terms of our own efi_guid_t type, so that
the alignment becomes a property of the literal's type.
Fixes: 494c704f9af0 ("efi: Use 32-bit alignment for efi_guid_t")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1327
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In the for loop in efi_mem_reserve_persistent(), prsv = rsv->next
use the unmapped rsv. Use the unmapped pages will cause segment
fault.
Fixes: 18df7577adae6 ("efi/memreserve: deal with memreserve entries in unmapped memory")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
If CONFIG_CIFS_ROOT is not set, rootfs mount option is invalid
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
CC: <stable@vger.kernel.org> # v5.11
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
A typo is found out by codespell tool in 251th lines of cifs_swn.c:
$ codespell ./fs/cifs/
./cifs_swn.c:251: funciton ==> function
Fix a typo found by codespell.
Signed-off-by: Liu xuzhi <liu.xuzhi@zte.com.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Starting with commit f295c8cfec833c2707ff1512da10d65386dde7af
("drm/nouveau: fix dma syncing warning with debugging on.")
the following oops occures:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 1013 Comm: Xorg.bin Tainted: G E 5.11.0-desktop-rc0+ #2
Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018
RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]
Call Trace:
nouveau_bo_validate+0x5d/0x80 [nouveau]
nouveau_gem_ioctl_pushbuf+0x662/0x1120 [nouveau]
? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau]
drm_ioctl_kernel+0xa6/0xf0 [drm]
drm_ioctl+0x1f4/0x3a0 [drm]
? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau]
nouveau_drm_ioctl+0x50/0xa0 [nouveau]
__x64_sys_ioctl+0x7e/0xb0
do_syscall_64+0x33/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
---[ end trace ccfb1e7f4064374f ]---
RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]
The underlying problem is not introduced by the commit, yet it uncovered the
underlying issue. The cited commit relies on valid pages. This is not given for
due to some bugs. For now, just warn and work around the issue by just ignoring
the bad ttm objects.
Below is some debug info gathered while debugging this issue:
nouveau 0000:01:00.0: DRM: ttm_dma->num_pages: 2048
nouveau 0000:01:00.0: DRM: ttm_dma->pages is NULL
nouveau 0000:01:00.0: DRM: ttm_dma: 00000000e96058e7
nouveau 0000:01:00.0: DRM: ttm_dma->page_flags:
nouveau 0000:01:00.0: DRM: ttm_dma: Populated: 1
nouveau 0000:01:00.0: DRM: ttm_dma: No Retry: 0
nouveau 0000:01:00.0: DRM: ttm_dma: SG: 256
nouveau 0000:01:00.0: DRM: ttm_dma: Zero Alloc: 0
nouveau 0000:01:00.0: DRM: ttm_dma: Swapped: 0
Signed-off-by: Tobias Klausmann <tobias.klausmann@freenet.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210313222159.3346-1-tobias.klausmann@freenet.de
|
|
After commit 997acaf6b4b59c (lockdep: report broken irq restoration), the guest
splatting below during boot:
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 1 PID: 169 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x26/0x30
Modules linked in: hid_generic usbhid hid
CPU: 1 PID: 169 Comm: systemd-udevd Not tainted 5.11.0+ #25
RIP: 0010:warn_bogus_irq_restore+0x26/0x30
Call Trace:
kvm_wait+0x76/0x90
__pv_queued_spin_lock_slowpath+0x285/0x2e0
do_raw_spin_lock+0xc9/0xd0
_raw_spin_lock+0x59/0x70
lockref_get_not_dead+0xf/0x50
__legitimize_path+0x31/0x60
legitimize_root+0x37/0x50
try_to_unlazy_next+0x7f/0x1d0
lookup_fast+0xb0/0x170
path_openat+0x165/0x9b0
do_filp_open+0x99/0x110
do_sys_openat2+0x1f1/0x2e0
do_sys_open+0x5c/0x80
__x64_sys_open+0x21/0x30
do_syscall_64+0x32/0x50
entry_SYSCALL_64_after_hwframe+0x44/0xae
The new consistency checking, expects local_irq_save() and
local_irq_restore() to be paired and sanely nested, and therefore expects
local_irq_restore() to be called with irqs disabled.
The irqflags handling in kvm_wait() which ends up doing:
local_irq_save(flags);
safe_halt();
local_irq_restore(flags);
instead triggers it. This patch fixes it by using
local_irq_disable()/enable() directly.
Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1615791328-2735-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|