Age | Commit message (Collapse) | Author | Files | Lines |
|
Some fields such as gt_logd_secs of the struct gfs2_tune are accessed
without holding the lock gt_spin in gfs2_show_options():
val = sdp->sd_tune.gt_logd_secs;
if (val != 30)
seq_printf(s, ",commit=%d", val);
And thus can cause data races when gfs2_show_options() and other functions
such as gfs2_reconfigure() are concurrently executed:
spin_lock(>->gt_spin);
gt->gt_logd_secs = newargs->ar_commit;
To fix these possible data races, the lock sdp->sd_tune.gt_spin is
acquired before accessing the fields of gfs2_tune and released after these
accesses.
Further changes by Andreas:
- Don't hold the spin lock over the seq_printf operations.
Reported-by: BassCheck <bass@buaa.edu.cn>
Signed-off-by: Tuo Li <islituo@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In gfs2_file_buffered_write(), we currently jump from the second call of
function should_fault_in_pages() to above the first call, so
should_fault_in_pages() is getting called twice in a row, causing it to
accidentally fall back to single-page writes rather than trying the more
efficient multi-page writes first.
Fix that by moving the retry label to the correct place, behind the
first call to should_fault_in_pages().
Fixes: e1fa9ea85ce8 ("gfs2: Stop using glock holder auto-demotion for now")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag"), file
systems can just set the FMODE_CAN_ODIRECT flag at open time instead of
wiring up a dummy direct_IO method to indicate support for direct I/O.
Remove .direct_IO from gfs2_aops and set FMODE_CAN_ODIRECT in
gfs2_open_common for regular files that do not use data journalling.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This patch changes function evict_unlinked_inode so it does not call
gfs2_inode_remember_delete until it gets a good return code from
gfs2_dinode_dealloc.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function gfs2_free_di was changing the rgrp lvb count of unlinked
dinodes after the lock was released. This patch moves it inside the
lock.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This patch fixes a case in which function gfs2_quota_hold encounters an
assert error and exits. The lack of gfs2_qa_put causes further problems
when the inode is evicted and the get/put count is non-zero.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Before this patch, function gfs2_dinode_dealloc would abort if it got a
bad return code from gfs2_rindex_update(). The problem is that it left the
dinode in the unlinked (not free) state, which meant subsequent fsck
would clean it up and flag an error. That meant some of our QE tests
would fail.
The sole purpose of gfs2_rindex_update(), in this code path, is to read in
any newer rgrps added by gfs2_grow. But since this is a delete operation
it won't actually use any of those new rgrps. It can really only twiddle
the bits from "Unlinked" to "Free" in an existing rgrp. Therefore the
error should not prevent the transition from unlinked to free.
This patch makes gfs2_dinode_dealloc ignore the bad return code and
proceed with freeing the dinode so the QE tests will not be tripped up.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This patch introduces a new out_free label and consolidates the three
places function gdlm_put_lock freed the glock. No change in
functionality.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
It seems we forgot the normal case to terminate the retry loop,
making us asking 3 times each command, which is probably a little bit
too much.
And remove the ugly "goto exit" that can be replaced by a simpler "break"
Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
Suggested-by: Mark Lord <mlord@pobox.com>
Tested-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
|
|
|
The introduction of high resolution PWM support changed the order of the
operations in the calculation of min and max period. The result in both
divisions is in most cases a truncation to 0, which limits the period to
the range of [0, 0].
Both numerators (and denominators) are within 64 bits, so the whole
expression can be put directly into the div64_u64, instead of doing it
partially.
Fixes: b00d2ed37617 ("leds: rgb: leds-qcom-lpg: Add support for high resolution PWM")
Reviewed-by: Caleb Connolly <caleb.connolly@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Acked-by: Lee Jones <lee@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-QRD
Link: https://lore.kernel.org/r/20230515162604.649203-1-quic_bjorande@quicinc.com
Signed-off-by: Johan Hovold <johan@kernel.org>
|
|
Since the event-filter-function.tc expects the 'exit_mmap()' directly
calls 'kmem_cache_free()', this is vulnerable to code modifications.
Choose the target function for the filter test from the sample
event data so that it can keep test running correctly even if the caller
function name will be changed.
Link: https://lore.kernel.org/linux-trace-kernel/167919441260.1922645.18355804179347364057.stgit@mhiramat.roam.corp.google.com/
Link: https://lore.kernel.org/all/CA+G9fYtF-XEKi9YNGgR=Kf==7iRb2FrmEC7qtwAeQbfyah-UhA@mail.gmail.com/
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Fixes: 7f09d639b8c4 ("tracing/selftests: Add test for event filtering on function name")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during
APIC map construction to hunt for TOCTOU bugs in KVM. KVM's optimized map
recalc makes multiple passes over the list of vCPUs, and the calculations
ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where
toggling LAPIC_MODE_DISABLED is quite interesting.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230602233250.1014316-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Bail from kvm_recalculate_phys_map() and disable the optimized map if the
target vCPU's x2APIC ID is out-of-bounds, i.e. if the vCPU was added
and/or enabled its local APIC after the map was allocated. This fixes an
out-of-bounds access bug in the !x2apic_format path where KVM would write
beyond the end of phys_map.
Check the x2APIC ID regardless of whether or not x2APIC is enabled,
as KVM's hardcodes x2APIC ID to be the vCPU ID, i.e. it can't change, and
the map allocation in kvm_recalculate_apic_map() doesn't check for x2APIC
being enabled, i.e. the check won't get false postivies.
Note, this also affects the x2apic_format path, which previously just
ignored the "x2apic_id > new->max_apic_id" case. That too is arguably a
bug fix, as ignoring the vCPU meant that KVM would not send interrupts to
the vCPU until the next map recalculation. In practice, that "bug" is
likely benign as a newly present vCPU/APIC would immediately trigger a
recalc. But, there's no functional downside to disabling the map, and
a future patch will gracefully handle the -E2BIG case by retrying instead
of simply disabling the optimized map.
Opportunistically add a sanity check on the xAPIC ID size, along with a
comment explaining why the xAPIC ID is guaranteed to be "good".
Reported-by: Michal Luczaj <mhal@rbox.co>
Fixes: 5b84b0291702 ("KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602233250.1014316-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Increment vcpu->stat.exits when handling a fastpath VM-Exit without
going through any part of the "slow" path. Not bumping the exits stat
can result in wildly misleading exit counts, e.g. if the primary reason
the guest is exiting is to program the TSC deadline timer.
Fixes: 404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602011920.787844-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
I noticed that with vCPU count large enough (> 16) they sometimes froze at
boot.
With vCPU count of 64 they never booted successfully - suggesting some kind
of a race condition.
Since adding "vnmi=0" module parameter made these guests boot successfully
it was clear that the problem is most likely (v)NMI-related.
Running kvm-unit-tests quickly showed failing NMI-related tests cases, like
"multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests
and the NMI parts of eventinj test.
The issue was that once one NMI was being serviced no other NMI was allowed
to be set pending (NMI limit = 0), which was traced to
svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
than for the "NMI pending" flag.
Fix this by testing for the right flag in svm_is_vnmi_pending().
Once this is done, the NMI-related kvm-unit-tests pass successfully and
the Windows guest no longer freezes at boot.
Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Factor in the address space (non-SMM vs. SMM) of the target shadow page
when recovering potential NX huge pages, otherwise KVM will retrieve the
wrong memslot when zapping shadow pages that were created for SMM. The
bug most visibly manifests as a WARN on the memslot being non-NULL, but
the worst case scenario is that KVM could unaccount the shadow page
without ensuring KVM won't install a huge page, i.e. if the non-SMM slot
is being dirty logged, but the SMM slot is not.
------------[ cut here ]------------
WARNING: CPU: 1 PID: 3911 at arch/x86/kvm/mmu/mmu.c:7015
kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
CPU: 1 PID: 3911 Comm: kvm-nx-lpage-re
RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm]
RSP: 0018:ffff99b284f0be68 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff99b284edd000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff9271397024e0 R08: 0000000000000000 R09: ffff927139702450
R10: 0000000000000000 R11: 0000000000000001 R12: ffff99b284f0be98
R13: 0000000000000000 R14: ffff9270991fcd80 R15: 0000000000000003
FS: 0000000000000000(0000) GS:ffff927f9f640000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0aacad3ae0 CR3: 000000088fc2c005 CR4: 00000000003726e0
Call Trace:
<TASK>
__pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm]
kvm_vm_worker_thread+0x106/0x1c0 [kvm]
kthread+0xd9/0x100
ret_from_fork+0x2c/0x50
</TASK>
---[ end trace 0000000000000000 ]---
This bug was exposed by commit edbdb43fc96b ("KVM: x86: Preserve TDP MMU
roots until they are explicitly invalidated"), which allowed KVM to retain
SMM TDP MMU roots effectively indefinitely. Before commit edbdb43fc96b,
KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages
once all vCPUs exited SMM, which made the window where this bug (recovering
an SMM NX huge page) could be encountered quite tiny. To hit the bug, the
NX recovery thread would have to run while at least one vCPU was in SMM.
Most VMs typically only use SMM during boot, and so the problematic shadow
pages were gone by the time the NX recovery thread ran.
Now that KVM preserves TDP MMU roots until they are explicitly invalidated
(e.g. by a memslot deletion), the window to trigger the bug is effectively
never closed because most VMMs don't delete memslots after boot (except
for a handful of special scenarios).
Fixes: eb298605705a ("KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages")
Reported-by: Fabio Coatti <fabio.coatti@gmail.com>
Closes: https://lore.kernel.org/all/CADpTngX9LESCdHVu_2mQkNGena_Ng2CphWNwsRGSMxzDsTjU2A@mail.gmail.com
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230602010137.784664-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
With commit 858e8b792d06 ("tpm, tpm_tis: Avoid cache incoherency in test
for interrupts") bit accessor functions are used to access flags in
tpm_tis_data->flags.
However these functions expect bit numbers, while the flags are defined
as bit masks in enum tpm_tis_flag.
Fix this inconsistency by using numbers instead of masks also for the
flags in the enum.
Reported-by: Pavel Machek <pavel@denx.de>
Fixes: 858e8b792d06 ("tpm, tpm_tis: Avoid cache incoherency in test for interrupts")
Signed-off-by: Lino Sanfilippo <l.sanfilippo@kunbus.com>
Cc: stable@vger.kernel.org
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.
The reverted commit was intended to remove a dead check however it was observed
that this check was actually being used to exit early instead of looping
sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
the goal extent. Due to this, a my performance tests (fsmark, parallel file
writes in a highly fragmented FS) were seeing a 2x-3x regression.
Example, the default value of the following variables is:
sbi->s_mb_max_to_scan = 200
sbi->s_mb_min_to_scan = 10
In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
early and try again. This loop will go on until we have processed
sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
just use whatever we have even if it is smaller than goal extent.
Now, the regression comes when we find an extent bigger than goal. Earlier, in
this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
the bigger extent. However with commit 32c08693 that check was removed and hence
we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
free extent to satisfy the request. The only time we would exit early would be
when the free extent is *exactly* the size of our goal, which is pretty uncommon
occurrence and so we would almost always end up looping 200 times.
Hence, revert the commit by adding the check back to fix the regression. Also
add a comment to outline this policy.
Fixes: 32c086937019 ("ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits")
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/ddcae9658e46880dfec2fb0aa61d01fb3353d202.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When the uvcvideo driver encounters a format descriptor with an unknown
format GUID, it creates a corresponding struct uvc_format instance with
the fcc field set to 0. Since commit 50459f103edf ("media: uvcvideo:
Remove format descriptions"), the driver relies on the V4L2 core to
provide the format description string, which the V4L2 core can't do
without a valid 4CC. This triggers a WARN_ON.
As a format with a zero 4CC can't be selected, it is unusable for
applications. Ignore the format completely without creating a uvc_format
instance, which fixes the warning.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217252
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2180107
Fixes: 50459f103edf ("media: uvcvideo: Remove format descriptions")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Reviewed-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
Add missing kernel doc for the new 'client_caps' field in struct
v4l2_subdev_fh.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Fixes: f57fa2959244 ("media: v4l2-subdev: Add new ioctl for client capabilities")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
Initialize hs_settle to 0 to avoid this compiler warning:
imx8mq-mipi-csi2.c: In function 'imx8mq_mipi_csi_start_stream.part.0':
imx8mq-mipi-csi2.c:91:55: warning: 'hs_settle' may be used uninitialized [-Wmaybe-uninitialized]
91 | #define GPR_CSI2_1_S_PRG_RXHS_SETTLE(x) (((x) & 0x3f) << 2)
| ^~
imx8mq-mipi-csi2.c:357:13: note: 'hs_settle' was declared here
357 | u32 hs_settle;
| ^~~~~~~~~
It's a false positive, but it is too complicated for the compiler to detect that.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Martin Kepplinger <martink@posteo.de>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
While updating v4l2_create_fwnode_links_to_pad() to accept non-subdev
sinks, the check is_media_entity_v4l2_subdev() was not removed which
prevented the function from being used with non-subdev sinks, Drop the
unnecessary check.
Fixes: bd5a03bc5be8 ("media: Accept non-subdev sinks in v4l2_create_fwnode_links_to_pad()")
Signed-off-by: Vaishnav Achath <vaishnav.a@ti.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
|
|
Commit ac4e97abce9b8 ("scatterlist: sg_set_buf() argument must be in linear
mapping") checks that both the signature and the digest reside in the
linear mapping area.
However, more recently commit ba14a194a434c ("fork: Add generic vmalloced
stack support") made it possible to move the stack in the vmalloc area,
which is not contiguous, and thus not suitable for sg_set_buf() which needs
adjacent pages.
Always make a copy of the signature and digest in the same buffer used to
store the key and its parameters, and pass them to sg_init_one(). Prefer it
to conditionally doing the copy if necessary, to keep the code simple. The
buffer allocated with kmalloc() is in the linear mapping area.
Cc: stable@vger.kernel.org # 4.9.x
Fixes: ba14a194a434 ("fork: Add generic vmalloced stack support")
Link: https://lore.kernel.org/linux-integrity/Y4pIpxbjBdajymBJ@sol.localdomain/
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
|
|
huge_ptep_get must be reimplemented in order to go through all the PTEs
of a NAPOT region: this is needed because the HW can update the A/D bits
of any of the PTE that constitutes the NAPOT region.
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230428120120.21620-2-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
We need to avoid inconsistencies across the PTEs that form a NAPOT
region, so when we write protect such a region, we should clear and flush
all the PTEs to make sure that any of those PTEs is not cached which would
result in such inconsistencies (arm64 does the same).
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230428120120.21620-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
While implementing support for in-kernel decompression in kmod,
finit_module() was returning a very suspicious value:
finit_module(3, "", MODULE_INIT_COMPRESSED_FILE) = 18446744072717407296
It turns out the check for module_get_next_page() failing is wrong,
and hence the decompression was not really taking place. Invert
the condition to fix it.
Fixes: 169a58ad824d ("module/decompress: Support zstd in-kernel decompression")
Cc: stable@kernel.org
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
When switching from kthreads to vhost_tasks two bugs were added:
1. The vhost worker tasks's now show up as processes so scripts doing
ps or ps a would not incorrectly detect the vhost task as another
process. 2. kthreads disabled freeze by setting PF_NOFREEZE, but
vhost tasks's didn't disable or add support for them.
To fix both bugs, this switches the vhost task to be thread in the
process that does the VHOST_SET_OWNER ioctl, and has vhost_worker call
get_signal to support SIGKILL/SIGSTOP and freeze signals. Note that
SIGKILL/STOP support is required because CLONE_THREAD requires
CLONE_SIGHAND which requires those 2 signals to be supported.
This is a modified version of the patch written by Mike Christie
<michael.christie@oracle.com> which was a modified version of patch
originally written by Linus.
Much of what depended upon PF_IO_WORKER now depends on PF_USER_WORKER.
Including ignoring signals, setting up the register state, and having
get_signal return instead of calling do_group_exit.
Tidied up the vhost_task abstraction so that the definition of
vhost_task only needs to be visible inside of vhost_task.c. Making
it easier to review the code and tell what needs to be done where.
As part of this the main loop has been moved from vhost_worker into
vhost_task_fn. vhost_worker now returns true if work was done.
The main loop has been updated to call get_signal which handles
SIGSTOP, freezing, and collects the message that tells the thread to
exit as part of process exit. This collection clears
__fatal_signal_pending. This collection is not guaranteed to
clear signal_pending() so clear that explicitly so the schedule()
sleeps.
For now the vhost thread continues to exist and run work until the
last file descriptor is closed and the release function is called as
part of freeing struct file. To avoid hangs in the coredump
rendezvous and when killing threads in a multi-threaded exec. The
coredump code and de_thread have been modified to ignore vhost threads.
Remvoing the special case for exec appears to require teaching
vhost_dev_flush how to directly complete transactions in case
the vhost thread is no longer running.
Removing the special case for coredump rendezvous requires either the
above fix needed for exec or moving the coredump rendezvous into
get_signal.
Fixes: 6e890c5d5021 ("vhost: use vhost_tasks for worker threads")
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Co-developed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add rs485-rts-active-high property, this was removed by mistake.
In general we just use rs485-rts-active-low property, however the OMAP
UART for legacy reason uses the -high one.
Fixes: 767d3467eb60 ("dt-bindings: serial: 8250_omap: drop rs485 properties")
Closes: https://lore.kernel.org/all/ZGefR4mTHHo1iQ7H@francesco-nb.int.toradex.com/
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230531111038.6302-1-francesco@dolcini.it
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The Linux Kernel currently only requires make v3.82 while the grouped
target functionality requires make v4.3. Removed the grouped target
introduced in 4ce1f694eb5d ("selinux: ensure av_permissions.h is
built when needed") as well as the multiple header file targets in
the make rule. This effectively reverts the problem commit.
We will revisit this change when make >= 4.3 is required by the rest
of the kernel.
Cc: stable@vger.kernel.org
Fixes: 4ce1f694eb5d ("selinux: ensure av_permissions.h is built when needed")
Reported-by: Erwan Velu <e.velu@criteo.com>
Reported-by: Luiz Capitulino <luizcap@amazon.com>
Tested-by: Luiz Capitulino <luizcap@amazon.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
For RISC-V, when tracing with tracepoint events, the IP and status are
set to 0, preventing the perf code parsing the callchain and resolving
the symbols correctly.
./ply 'tracepoint:kmem/kmem_cache_alloc { @[stack]=count(); }'
@:
{ <STACKID4294967282> }: 1
The fix is to implement perf_arch_fetch_caller_regs for riscv, which
fills several necessary registers used for callchain unwinding,
including epc, sp, s0 and status. It's similar to commit b3eac0265bf6
("arm: perf: Fix callchain parse error with kernel tracepoint events")
and commit 5b09a094f2fb ("arm64: perf: Fix callchain parse error with
kernel tracepoint events").
With this patch, callchain can be parsed correctly as:
./ply 'tracepoint:kmem/kmem_cache_alloc { @[stack]=count(); }'
@:
{
__traceiter_kmem_cache_alloc+68
__traceiter_kmem_cache_alloc+68
kmem_cache_alloc+354
__sigqueue_alloc+94
__send_signal_locked+646
send_signal_locked+154
do_send_sig_info+84
__kill_pgrp_info+130
kill_pgrp+60
isig+150
n_tty_receive_signal_char+36
n_tty_receive_buf_standard+2214
n_tty_receive_buf_common+280
n_tty_receive_buf2+26
tty_ldisc_receive_buf+34
tty_port_default_receive_buf+62
flush_to_ldisc+158
process_one_work+458
worker_thread+138
kthread+178
riscv_cpufeature_patch_func+832
}: 1
Signed-off-by: Ism Hong <ism.hong@gmail.com>
Link: https://lore.kernel.org/r/20230601095355.1168910-1-ism.hong@gmail.com
Fixes: 178e9fc47aae ("perf: riscv: preliminary RISC-V support")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Active subflow are inserted into the connection list at creation time.
When the MPJ handshake completes successfully, a new subflow creation
netlink event is generated correctly, but the current code wrongly
avoid initializing a couple of subflow data.
The above will cause misbehavior on a few exceptional events: unneeded
mptcp-level retransmission on msk-level sequence wrap-around and infinite
mapping fallback even when a MPJ socket is present.
Address the issue factoring out the needed initialization in a new helper
and invoking the latter from __mptcp_finish_join() time for passive
subflow and from mptcp_finish_join() for active ones.
Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Christoph reported the mptcp variant of a recently addressed plain
TCP issue. Similar to commit e14cadfd80d7 ("tcp: add annotations around
sk->sk_shutdown accesses") add READ/WRITE ONCE annotations to silence
KCSAN reports around lockless sk_shutdown access.
Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/401
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The first subflow socket is accessed outside the msk socket lock
by mptcp_subflow_fail(), we need to annotate each write access
with WRITE_ONCE, but a few spots still lacks it.
Fixes: 76a13b315709 ("mptcp: invoke MP_FAIL response when needed")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When the msk socket is cloned at MPC handshake time, a few
fields are initialized in a racy way outside mptcp_sk_clone()
and the msk socket lock.
The above is due historical reasons: before commit a88d0092b24b
("mptcp: simplify subflow_syn_recv_sock()") as the first subflow socket
carrying all the needed date was not available yet at msk creation
time
We can now refactor the code moving the missing initialization bit
under the socket lock, removing the init race and avoiding some
code duplication.
This will also simplify the next patch, as all msk->first write
access are now under the msk socket lock.
Fixes: 0397c6d85f9c ("mptcp: keep unaccepted MPC subflow into join list")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The MPTCP can access the first subflow socket in a few spots
outside the socket lock scope. That is actually safe, as MPTCP
will delete the socket itself only after the msk sock close().
Still the such accesses causes a few KCSAN splats, as reported
by Christoph. Silence the harmless warning adding a few annotation
around the relevant accesses.
Fixes: 71ba088ce0aa ("mptcp: cleanup accept and poll")
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/402
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Ondrej reported a functional issue WRT timeout handling on connect
with a nice reproducer.
The problem is that the current mptcp connect waits for both the
MPTCP socket level timeout, and the first subflow socket timeout.
The latter is not influenced/touched by the exposed setsockopt().
Overall the above makes the SO_SNDTIMEO a no-op on connect.
Since mptcp_connect is invoked via inet_stream_connect and the
latter properly handle the MPTCP level timeout, we can address the
issue making the nested subflow level connect always unblocking.
This also allow simplifying a bit the code, dropping an ugly hack
to handle the fastopen and custom proto_ops connect.
The issues predates the blamed commit below, but the current resolution
requires the infrastructure introduced there.
Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This fixes the issue that dev gro_max_size and gso_ipv4_max_size
can be set to a huge value:
# ip link add dummy1 type dummy
# ip link set dummy1 gro_max_size 4294967295
# ip -d link show dummy1
dummy addrgenmode eui64 ... gro_max_size 4294967295
Fixes: 0fe79f28bfaf ("net: allow gro_max_size to exceed 65536")
Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These IFLA_GSO_* tb check should also be done for the new created link,
otherwise, they can be set to a huge value when creating links:
# ip link add dummy1 gso_max_size 4294967295 type dummy
# ip -d link show dummy1
dummy addrgenmode eui64 ... gso_max_size 4294967295
Fixes: 46e6b992c250 ("rtnetlink: allow GSO maximums to be set on device creation")
Fixes: 9eefedd58ae1 ("net: add gso_ipv4_max_size and gro_ipv4_max_size per device")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
validate_linkmsg() was introduced by commit 1840bb13c22f5b ("[RTNL]:
Validate hardware and broadcast address attribute for RTM_NEWLINK")
to validate tb[IFLA_ADDRESS/BROADCAST] for existing links. The same
check should also be done for newly created links.
This patch adds validate_linkmsg() call in rtnl_create_link(), to
avoid the invalid address set when creating some devices like:
# ip link add dummy0 type dummy
# ip link add link dummy0 name mac0 address 01:02 type macsec
Fixes: 0e06877c6fdb ("[RTNETLINK]: rtnl_link: allow specifying initial device address")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The ice driver caches next_to_clean value at the beginning of
ice_clean_rx_irq() in order to remember the first buffer that has to be
freed/recycled after main Rx processing loop. The end boundary is
indicated by first descriptor of frame that Rx processing loop has ended
its duties. Note that if mentioned loop ended in the middle of gathering
multi-buffer frame, next_to_clean would be pointing to the descriptor in
the middle of the frame BUT freeing/recycling stage will stop at the
first descriptor. This means that next iteration of ice_clean_rx_irq()
will miss the (first_desc, next_to_clean - 1) entries.
When running various 9K MTU workloads, such splats were observed:
[ 540.780716] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 540.787787] #PF: supervisor read access in kernel mode
[ 540.793002] #PF: error_code(0x0000) - not-present page
[ 540.798218] PGD 0 P4D 0
[ 540.800801] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 540.805231] CPU: 18 PID: 3984 Comm: xskxceiver Tainted: G W 6.3.0-rc7+ #96
[ 540.813619] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[ 540.824209] RIP: 0010:ice_clean_rx_irq+0x2b6/0xf00 [ice]
[ 540.829678] Code: 74 24 10 e9 aa 00 00 00 8b 55 78 41 31 57 10 41 09 c4 4d 85 ff 0f 84 83 00 00 00 49 8b 57 08 41 8b 4f 1c 65 8b 35 1a fa 4b 3f <48> 8b 02 48 c1 e8 3a 39 c6 0f 85 a2 00 00 00 f6 42 08 02 0f 85 98
[ 540.848717] RSP: 0018:ffffc9000f42fc50 EFLAGS: 00010282
[ 540.854029] RAX: 0000000000000004 RBX: 0000000000000002 RCX: 000000000000fffe
[ 540.861272] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[ 540.868519] RBP: ffff88984a05ac00 R08: 0000000000000000 R09: dead000000000100
[ 540.875760] R10: ffff88983fffcd00 R11: 000000000010f2b8 R12: 0000000000000004
[ 540.883008] R13: 0000000000000003 R14: 0000000000000800 R15: ffff889847a10040
[ 540.890253] FS: 00007f6ddf7fe640(0000) GS:ffff88afdf800000(0000) knlGS:0000000000000000
[ 540.898465] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 540.904299] CR2: 0000000000000000 CR3: 000000010d3da001 CR4: 00000000007706e0
[ 540.911542] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 540.918789] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 540.926032] PKRU: 55555554
[ 540.928790] Call Trace:
[ 540.931276] <TASK>
[ 540.933418] ice_napi_poll+0x4ca/0x6d0 [ice]
[ 540.937804] ? __pfx_ice_napi_poll+0x10/0x10 [ice]
[ 540.942716] napi_busy_loop+0xd7/0x320
[ 540.946537] xsk_recvmsg+0x143/0x170
[ 540.950178] sock_recvmsg+0x99/0xa0
[ 540.953729] __sys_recvfrom+0xa8/0x120
[ 540.957543] ? do_futex+0xbd/0x1d0
[ 540.961008] ? __x64_sys_futex+0x73/0x1d0
[ 540.965083] __x64_sys_recvfrom+0x20/0x30
[ 540.969155] do_syscall_64+0x38/0x90
[ 540.972796] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 540.977934] RIP: 0033:0x7f6de5f27934
To fix this, set cached_ntc to first_desc so that at the end, when
freeing/recycling buffers, descriptors from first to ntc are not missed.
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230531154457.3216621-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The interrupt fix in commit 97a89ed101bb should be applied on all variants
of GPY2xx PHY and GPY115C.
Fixes: 97a89ed101bb ("net: phy: mxl-gpy: disable interrupts on GPY215 by default")
Signed-off-by: Xu Liang <lxu@maxlinear.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230531074822.39136-1-lxu@maxlinear.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix return value in the error path of rswitch_start_xmit(). If TX
queues are full, this function should return NETDEV_TX_BUSY.
Fixes: 3590918b5d07 ("net: ethernet: renesas: Add support for "Ethernet Switch"")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Link: https://lore.kernel.org/r/20230529073817.1145208-1-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
marvell_nfc_setup_interface() uses the frequency retrieved from the
clock associated with the nand interface to determine the timings that
will be used. By changing the NAND frequency select without reflecting
this in the clock configuration this means that the timings calculated
don't correctly meet the requirements of the NAND chip. This hasn't been
an issue up to now because of a different bug that was stopping the
timings being updated after they were initially set.
Fixes: b25251414f6e ("mtd: rawnand: marvell: Stop implementing ->select_chip()")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230525003154.2303012-2-chris.packham@alliedtelesis.co.nz
|
|
When new timing values are calculated in marvell_nfc_setup_interface()
ensure that they will be applied in marvell_nfc_select_target() by
clearing the selected_chip pointer.
Fixes: b25251414f6e ("mtd: rawnand: marvell: Stop implementing ->select_chip()")
Suggested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230525003154.2303012-1-chris.packham@alliedtelesis.co.nz
|
|
The addition of the mtdchar_read_ioctl() function caused the stack usage
of mtdchar_ioctl() to grow beyond the warning limit on 32-bit architectures
with gcc-13:
drivers/mtd/mtdchar.c: In function 'mtdchar_ioctl':
drivers/mtd/mtdchar.c:1229:1: error: the frame size of 1488 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Mark both the read and write portions as noinline_for_stack to ensure
they don't get inlined and use separate stack slots to reduce the
maximum usage, both in the mtdchar_ioctl() and combined with any
of its callees.
Fixes: 095bb6e44eb1 ("mtdchar: add MEMREAD ioctl")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20230417205654.1982368-1-arnd@kernel.org
|
|
Naga no longer works for AMD/Xilinx and there is no activity from him to
continue to maintain Xilinx related drivers. Add myself instead to be kept
in loop if there is any need for testing.
Signed-off-by: Michal Simek <michal.simek@amd.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
[<miquel.raynal@bootlin.com>: Manually apply on top of the latest -rc which
where the MAINTAINERS file got sorted]
Link: https://lore.kernel.org/linux-mtd/06df49c300c53a27423260e99acc217b06d4e588.1684827820.git.michal.simek@amd.com
|
|
A switch held in reset by default needs to wait longer until we can
reliably detect it.
An issue was observed when testing on the Marvell 88E6393X (Link Street).
The driver failed to detect the switch on some upstarts. Increasing the
wait time after reset deactivation solves this issue.
The updated wait time is now also the same as the wait time in the
mv88e6xxx_hardware_reset function.
Fixes: 7b75e49de424 ("net: dsa: mv88e6xxx: wait after reset deactivation")
Signed-off-by: Andreas Svensson <andreas.svensson@axis.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230530145223.1223993-1-andreas.svensson@axis.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Zero-length and one-element arrays are deprecated, and we are moving
towards adopting C99 flexible-array members, instead.
Address the following warnings found with GCC-13 and
-fstrict-flex-arrays=3 enabled:
sound/firewire/amdtp-stream.c: In function ‘build_it_pkt_header’:
sound/firewire/amdtp-stream.c:694:17: warning: ‘generate_cip_header’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
694 | generate_cip_header(s, cip_header, data_block_counter, syt);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sound/firewire/amdtp-stream.c:694:17: note: referencing argument 2 of type ‘__be32[2]’ {aka ‘unsigned int[2]’}
sound/firewire/amdtp-stream.c:667:13: note: in a call to function ‘generate_cip_header’
667 | static void generate_cip_header(struct amdtp_stream *s, __be32 cip_header[2],
| ^~~~~~~~~~~~~~~~~~~
This helps with the ongoing efforts to tighten the FORTIFY_SOURCE
routines on memcpy() and help us make progress towards globally
enabling -fstrict-flex-arrays=3 [1].
Link: https://github.com/KSPP/linux/issues/21
Link: https://github.com/KSPP/linux/issues/303
Link: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/602902.html [1]
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/ZHT0V3SpvHyxCv5W@work
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
|