Age | Commit message (Collapse) | Author | Files | Lines |
|
The rmap array in the guest memslot is an array of size number of guest
pages, allocated at memslot creation time. Each rmap entry in this array
is used to store information about the guest page to which it
corresponds. For example for a hpt guest it is used to store a lock bit,
rc bits, a present bit and the index of a hpt entry in the guest hpt
which maps this page. For a radix guest which is running nested guests
it is used to store a pointer to a linked list of nested rmap entries
which store the nested guest physical address which maps this guest
address and for which there is a pte in the shadow page table.
As there are currently two uses for the rmap array, and the potential
for this to expand to more in the future, define a type field (being the
top 8 bits of the rmap entry) to be used to define the type of the rmap
entry which is currently present and define two values for this field
for the two current uses of the rmap array.
Since the nested case uses the rmap entry to store a pointer, define
this type as having the two high bits set as is expected for a pointer.
Define the hpt entry type as having bit 56 set (bit 7 IBM bit ordering).
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Fix the error below triggered by `-Wimplicit-fallthrough`, by tagging
it as an expected fall-through.
arch/powerpc/kvm/book3s_32_mmu.c: In function ‘kvmppc_mmu_book3s_32_xlate_pte’:
arch/powerpc/kvm/book3s_32_mmu.c:241:21: error: this statement may fall through [-Werror=implicit-fallthrough=]
pte->may_write = true;
~~~~~~~~~~~~~~~^~~~~~
arch/powerpc/kvm/book3s_32_mmu.c:242:5: note: here
case 3:
^~~~
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
|
Testing has revealed the existence of a race condition where a XIVE
interrupt being shut down can be in one of the XIVE interrupt queues
(of which there are up to 8 per CPU, one for each priority) at the
point where free_irq() is called. If this happens, can return an
interrupt number which has been shut down. This can lead to various
symptoms:
- irq_to_desc(irq) can be NULL. In this case, no end-of-interrupt
function gets called, resulting in the CPU's elevated interrupt
priority (numerically lowered CPPR) never gets reset. That then
means that the CPU stops processing interrupts, causing device
timeouts and other errors in various device drivers.
- The irq descriptor or related data structures can be in the process
of being freed as the interrupt code is using them. This typically
leads to crashes due to bad pointer dereferences.
This race is basically what commit 62e0468650c3 ("genirq: Add optional
hardware synchronization for shutdown", 2019-06-28) is intended to
fix, given a get_irqchip_state() method for the interrupt controller
being used. It works by polling the interrupt controller when an
interrupt is being freed until the controller says it is not pending.
With XIVE, the PQ bits of the interrupt source indicate the state of
the interrupt source, and in particular the P bit goes from 0 to 1 at
the point where the hardware writes an entry into the interrupt queue
that this interrupt is directed towards. Normally, the code will then
process the interrupt and do an end-of-interrupt (EOI) operation which
will reset PQ to 00 (assuming another interrupt hasn't been generated
in the meantime). However, there are situations where the code resets
P even though a queue entry exists (for example, by setting PQ to 01,
which disables the interrupt source), and also situations where the
code leaves P at 1 after removing the queue entry (for example, this
is done for escalation interrupts so they cannot fire again until
they are explicitly re-enabled).
The code already has a 'saved_p' flag for the interrupt source which
indicates that a queue entry exists, although it isn't maintained
consistently. This patch adds a 'stale_p' flag to indicate that
P has been left at 1 after processing a queue entry, and adds code
to set and clear saved_p and stale_p as necessary to maintain a
consistent indication of whether a queue entry may or may not exist.
With this, we can implement xive_get_irqchip_state() by looking at
stale_p, saved_p and the ESB PQ bits for the interrupt.
There is some additional code to handle escalation interrupts
properly; because they are enabled and disabled in KVM assembly code,
which does not have access to the xive_irq_data struct for the
escalation interrupt. Hence, stale_p may be incorrect when the
escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu().
Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on,
with some careful attention to barriers in order to ensure the correct
result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu().
Finally, this adds code to make noise on the console (pr_crit and
WARN_ON(1)) if we find an interrupt queue entry for an interrupt
which does not have a descriptor. While this won't catch the race
reliably, if it does get triggered it will be an indication that
the race is occurring and needs to be debugged.
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry
|
|
At present, when running a guest on POWER9 using HV KVM but not using
an in-kernel interrupt controller (XICS or XIVE), for example if QEMU
is run with the kernel_irqchip=off option, the guest entry code goes
ahead and tries to load the guest context into the XIVE hardware, even
though no context has been set up.
To fix this, we check that the "CAM word" is non-zero before pushing
it to the hardware. The CAM word is initialized to a non-zero value
in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(),
and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu.
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190813100100.GC9567@blackberry
|
|
Escalation interrupts are interrupts sent to the host by the XIVE
hardware when it has an interrupt to deliver to a guest VCPU but that
VCPU is not running anywhere in the system. Hence we disable the
escalation interrupt for the VCPU being run when we enter the guest
and re-enable it when the guest does an H_CEDE hypercall indicating
it is idle.
It is possible that an escalation interrupt gets generated just as we
are entering the guest. In that case the escalation interrupt may be
using a queue entry in one of the interrupt queues, and that queue
entry may not have been processed when the guest exits with an H_CEDE.
The existing entry code detects this situation and does not clear the
vcpu->arch.xive_esc_on flag as an indication that there is a pending
queue entry (if the queue entry gets processed, xive_esc_irq() will
clear the flag). There is a comment in the code saying that if the
flag is still set on H_CEDE, we have to abort the cede rather than
re-enabling the escalation interrupt, lest we end up with two
occurrences of the escalation interrupt in the interrupt queue.
However, the exit code doesn't do that; it aborts the cede in the sense
that vcpu->arch.ceded gets cleared, but it still enables the escalation
interrupt by setting the source's PQ bits to 00. Instead we need to
set the PQ bits to 10, indicating that an interrupt has been triggered.
We also need to avoid setting vcpu->arch.xive_esc_on in this case
(i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because
xive_esc_irq() will run at some point and clear it, and if we race with
that we may end up with an incorrect result (i.e. xive_esc_on set when
the escalation interrupt has just been handled).
It is extremely unlikely that having two queue entries would cause
observable problems; theoretically it could cause queue overflow, but
the CPU would have to have thousands of interrupts targetted to it for
that to be possible. However, this fix will also make it possible to
determine accurately whether there is an unhandled escalation
interrupt in the queue, which will be needed by the following patch.
Fixes: 9b9b13a6d153 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberry
|
|
When a vCPU is brought done, the XIVE VP (Virtual Processor) is first
disabled and then the event notification queues are freed. When freeing
the queues, we check for possible escalation interrupts and free them
also.
But when a XIVE VP is disabled, the underlying XIVE ENDs also are
disabled in OPAL. When an END (Event Notification Descriptor) is
disabled, its ESB pages (ESn and ESe) are disabled and loads return all
1s. Which means that any access on the ESB page of the escalation
interrupt will return invalid values.
When an interrupt is freed, the shutdown handler computes a 'saved_p'
field from the value returned by a load in xive_do_source_set_mask().
This value is incorrect for escalation interrupts for the reason
described above.
This has no impact on Linux/KVM today because we don't make use of it
but we will introduce in future changes a xive_get_irqchip_state()
handler. This handler will use the 'saved_p' field to return the state
of an interrupt and 'saved_p' being incorrect, softlockup will occur.
Fix the vCPU cleanup sequence by first freeing the escalation interrupts
if any, then disable the XIVE VP and last free the queues.
Fixes: 90c73795afa2 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode")
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org
|
|
|
|
This is unused since commit 9f69a496f100 ("kbuild: split out *.mod out
of {single,multi}-used-m rules").
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Running gen_compile_commands.py after building the kernel with
allnoconfig gave this:
$ ./scripts/gen_compile_commands.py
WARNING: Found 449 entries. Have you compiled the kernel?
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
This file is used by clangd to use language server protocol.
It can be generated at each compile using scripts/gen_compile_commands.py.
Therefore it is different depending on the environment and should be
ignored.
Signed-off-by: Toru Komatsu <k0ma@utam0k.jp>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
Commit 415008af3219 ("docs-rst: convert lsm from DocBook to ReST")
removed the last users of this macro.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The path in the schema '$id' values are wrong. Fix them.
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Now that the examples are validated, the examples in the SID binding
generates an error since the node names aren't one of the valid ones.
Let's switch for one that is ok.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
The nvmem providers and consumers have a bunch of generic properties that
are needed in a device tree. Add a YAML schemas for those.
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
[Srini: Changed licence to (GPL-2.0 OR BSD-2-Clause)]
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
"Findfrom" is not a word. Replace the function synopsis by something
that makes sense.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
We should only set the max segment size to unlimited if we actually
have a virt boundary. Otherwise we accidentally clear that limit
when called from the SCSI midlayer, which always calls
blk_queue_virt_boundary, even if that mask is 0.
Fixes: 7ad388d8e4c7 ("scsi: core: add a host / host template field for the virt boundary")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix duplicate words.
Signed-off-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
We need to drop everything we remove from the tree, whether
mnt_has_parent() is true or not. Usually the bug manifests as a slow
memory leak (leaked struct mount for initramfs); it becomes much more
visible in mount_subtree() users, such as btrfs. There we leak
a struct mount for btrfs superblock being mounted, which prevents
fs shutdown on subsequent umount.
Fixes: 56cbb429d911 ("switch the remnants of releasing the mountpoint away from fs_pin")
Reported-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
I will not be able to continue with my maintainership responsibilities
going forward, so remove myself as the maintainer.
Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
Since commit eec4844fae7c ("proc/sysctl: add shared variables for range
check") special shared variables are available for sysctl range check.
Reuse them for /proc/sys/vm/allocate_pgste proc handler.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
The access to airq_areas was racy ever since the adapter interrupts got
introduced to virtio-ccw, but since commit 39c7dcb15892 ("virtio/s390:
make airq summary indicators DMA") this became an issue in practice as
well. Namely before that commit the airq_info that got overwritten was
still functional. After that commit however the two infos share a
summary_indicator, which aggravates the situation. Which means
auto-online mechanism occasionally hangs the boot with virtio_blk.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 96b14536d935 ("virtio-ccw: virtio-ccw adapter interrupt support.")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
On s390 ZONE_DMA is up to 2G, i.e. ARCH_ZONE_DMA_BITS should be 31 bits.
The current value is 24 and makes __dma_direct_alloc_pages() take a
wrong turn first (but __dma_direct_alloc_pages() recovers then).
Let's correct ARCH_ZONE_DMA_BITS value and avoid wrong turns.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Petr Tesarik <ptesarik@suse.cz>
Fixes: c61e9637340e ("dma-direct: add support for allocation from ZONE_DMA and ZONE_DMA32")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
btrfs_lock_and_flush_ordered_range() loads given "*cached_state" into
cachedp, which, in general, is NULL. Then, lock_extent_bits() updates
"cachedp", but it never goes backs to the caller. Thus the caller still
see its "cached_state" to be NULL and never free the state allocated
under btrfs_lock_and_flush_ordered_range(). As a result, we will
see massive state leak with e.g. fstests btrfs/005. Fix this bug by
properly handling the pointers.
Fixes: bd80d94efb83 ("btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range")
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Now that all the fall-through warnings have been addressed in the
kernel, enable the fall-through warning globally.
Also, update the deprecated.rst file to include implicit fall-through
as 'deprecated' so people can be pointed to a single location for
justification.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-kbuild@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/gpu/drm/i915/gem/i915_gem_mman.c: In function ‘i915_gem_fault’:
drivers/gpu/drm/i915/gem/i915_gem_mman.c:342:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (!i915_terminally_wedged(i915))
^
drivers/gpu/drm/i915/gem/i915_gem_mman.c:345:2: note: here
case -EAGAIN:
^~~~
drivers/gpu/drm/i915/gem/i915_gem_pages.c: In function ‘i915_gem_object_map’:
./include/linux/compiler.h:78:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
# define unlikely(x) __builtin_expect(!!(x), 0)
^~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/bug.h:136:2: note: in expansion of macro ‘unlikely’
unlikely(__ret_warn_on); \
^~~~~~~~
drivers/gpu/drm/i915/i915_utils.h:49:25: note: in expansion of macro ‘WARN’
#define MISSING_CASE(x) WARN(1, "Missing case (%s == %ld)\n", \
^~~~
drivers/gpu/drm/i915/gem/i915_gem_pages.c:270:3: note: in expansion of macro ‘MISSING_CASE’
MISSING_CASE(type);
^~~~~~~~~~~~
drivers/gpu/drm/i915/gem/i915_gem_pages.c:272:2: note: here
case I915_MAP_WB:
^~~~
drivers/gpu/drm/i915/i915_gpu_error.c: In function ‘error_record_engine_registers’:
./include/linux/compiler.h:78:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
# define unlikely(x) __builtin_expect(!!(x), 0)
^~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/bug.h:136:2: note: in expansion of macro ‘unlikely’
unlikely(__ret_warn_on); \
^~~~~~~~
drivers/gpu/drm/i915/i915_utils.h:49:25: note: in expansion of macro ‘WARN’
#define MISSING_CASE(x) WARN(1, "Missing case (%s == %ld)\n", \
^~~~
drivers/gpu/drm/i915/i915_gpu_error.c:1196:5: note: in expansion of macro ‘MISSING_CASE’
MISSING_CASE(engine->id);
^~~~~~~~~~~~
drivers/gpu/drm/i915/i915_gpu_error.c:1197:4: note: here
case RCS0:
^~~~
drivers/gpu/drm/i915/display/intel_dp.c: In function ‘intel_dp_get_fia_supported_lane_count’:
./include/linux/compiler.h:78:22: warning: this statement may fall through [-Wimplicit-fallthrough=]
# define unlikely(x) __builtin_expect(!!(x), 0)
^~~~~~~~~~~~~~~~~~~~~~~~~~
./include/asm-generic/bug.h:136:2: note: in expansion of macro ‘unlikely’
unlikely(__ret_warn_on); \
^~~~~~~~
drivers/gpu/drm/i915/i915_utils.h:49:25: note: in expansion of macro ‘WARN’
#define MISSING_CASE(x) WARN(1, "Missing case (%s == %ld)\n", \
^~~~
drivers/gpu/drm/i915/display/intel_dp.c:233:3: note: in expansion of macro ‘MISSING_CASE’
MISSING_CASE(lane_info);
^~~~~~~~~~~~
drivers/gpu/drm/i915/display/intel_dp.c:234:2: note: here
case 1:
^~~~
drivers/gpu/drm/i915/display/intel_display.c: In function ‘check_digital_port_conflicts’:
CC [M] drivers/gpu/drm/nouveau/nvkm/engine/disp/cursgv100.o
drivers/gpu/drm/i915/display/intel_display.c:12043:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (WARN_ON(!HAS_DDI(to_i915(dev))))
^
drivers/gpu/drm/i915/display/intel_display.c:12046:3: note: here
case INTEL_OUTPUT_DP:
^~~~
Also, notice that the Makefile is modified to stop ignoring
fall-through warnings. The -Wimplicit-fallthrough option
will be enabled globally in v5.3.
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, this patch silences
the following warning:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c: In function ‘mqd_manager_init_v10’:
./include/linux/dynamic_debug.h:122:52: warning: this statement may fall through [-Wimplicit-fallthrough=]
#define __dynamic_func_call(id, fmt, func, ...) do { \
^
./include/linux/dynamic_debug.h:143:2: note: in expansion of macro ‘__dynamic_func_call’
__dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~
./include/linux/dynamic_debug.h:153:2: note: in expansion of macro ‘_dynamic_func_call’
_dynamic_func_call(fmt, __dynamic_pr_debug, \
^~~~~~~~~~~~~~~~~~
./include/linux/printk.h:336:2: note: in expansion of macro ‘dynamic_pr_debug’
dynamic_pr_debug(fmt, ##__VA_ARGS__)
^~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c:432:3: note: in expansion of macro ‘pr_debug’
pr_debug("%s@%i\n", __func__, __LINE__);
^~~~~~~~
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_mqd_manager_v10.c:433:2: note: here
case KFD_MQD_TYPE_COMPUTE:
^~~~
by removing the call to pr_debug() in KFD_MQD_TYPE_CP:
"The mqd init for CP and COMPUTE will have the same routine." [1]
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
[1] https://lore.kernel.org/lkml/c735a1cc-a545-50fb-44e7-c0ad93ee8ee7@amd.com/
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
Add missing break statement in order to prevent the code from falling
through to case AMDGPU_IRQ_STATE_ENABLE.
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
Fixes: a644d85a5cd4 ("drm/amdgpu: add gfx v10 implementation (v10)")
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
Add missing break statement in order to prevent the code from falling
through to case CHIP_NAVI10.
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
Fixes: 14328aa58ce5 ("drm/amdkfd: Add navi10 support to amdkfd. (v3)")
Cc: stable@vger.kernel.org
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
arch/x86/events/intel/core.c: In function ‘intel_pmu_init’:
arch/x86/events/intel/core.c:4959:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
pmem = true;
~~~~~^~~~~~
arch/x86/events/intel/core.c:4960:2: note: here
case INTEL_FAM6_SKYLAKE_MOBILE:
^~~~
arch/x86/events/intel/core.c:5008:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
pmem = true;
~~~~~^~~~~~
arch/x86/events/intel/core.c:5009:2: note: here
case INTEL_FAM6_ICELAKE_MOBILE:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This patch fixes the following warning:
drivers/mtd/nand/onenand/onenand_base.c: In function ‘onenand_check_features’:
drivers/mtd/nand/onenand/onenand_base.c:3264:17: warning: this statement may fall through [-Wimplicit-fallthrough=]
this->options |= ONENAND_HAS_NOP_1;
drivers/mtd/nand/onenand/onenand_base.c:3265:2: note: here
case ONENAND_DEVICE_DENSITY_4Gb:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Cc: Jonathan Bakker <xc-racer2@live.ca>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
Warning level 3 was used: -Wimplicit-fallthrough=3
fs/afs/fsclient.c: In function ‘afs_deliver_fs_fetch_acl’:
fs/afs/fsclient.c:2199:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2202:2: note: here
case 1:
^~~~
fs/afs/fsclient.c:2216:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2219:2: note: here
case 2:
^~~~
fs/afs/fsclient.c:2225:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/fsclient.c:2228:2: note: here
case 3:
^~~~
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
fs/afs/yfsclient.c: In function ‘yfs_deliver_fs_fetch_opaque_acl’:
fs/afs/yfsclient.c:1984:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:1987:2: note: here
case 1:
^~~~
fs/afs/yfsclient.c:2005:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2008:2: note: here
case 2:
^~~~
fs/afs/yfsclient.c:2014:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2017:2: note: here
case 3:
^~~~
fs/afs/yfsclient.c:2035:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2038:2: note: here
case 4:
^~~~
fs/afs/yfsclient.c:2047:19: warning: this statement may fall through [-Wimplicit-fallthrough=]
call->unmarshall++;
~~~~~~~~~~~~~~~~^~
fs/afs/yfsclient.c:2050:2: note: here
case 5:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Also, fix some commenting style issues.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
This patch fixes the following warnings:
drivers/net/can/peak_canfd/peak_pciefd_main.c:668:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/spi/mcp251x.c:875:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/usb/peak_usb/pcan_usb.c:422:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/at91_can.c:895:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/at91_can.c:953:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
drivers/net/can/usb/peak_usb/pcan_usb.c: In function ‘pcan_usb_decode_error’:
drivers/net/can/usb/peak_usb/pcan_usb.c:422:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (n & PCAN_USB_ERROR_BUS_LIGHT) {
^
drivers/net/can/usb/peak_usb/pcan_usb.c:428:2: note: here
case CAN_STATE_ERROR_WARNING:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.
Notice that in some cases spelling mistakes were fixed.
In other cases, the /* fall through */ comment is placed
at the bottom of the case statement, which is what GCC
is expecting to find.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
drivers/firewire/core-device.c: In function ‘set_broadcast_channel’:
drivers/firewire/core-device.c:969:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (data & cpu_to_be32(1 << 31)) {
^
drivers/firewire/core-device.c:974:3: note: here
case RCODE_ADDRESS_ERROR:
^~~~
drivers/firewire/core-iso.c: In function ‘manage_channel’:
drivers/firewire/core-iso.c:308:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
if ((data[0] & bit) == (data[1] & bit))
^
drivers/firewire/core-iso.c:312:3: note: here
default:
^~~~~~~
drivers/firewire/core-topology.c: In function ‘count_ports’:
drivers/firewire/core-topology.c:69:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
(*child_port_count)++;
~~~~~~~~~~~~~~~~~~~^~
drivers/firewire/core-topology.c:70:3: note: here
case SELFID_PORT_PARENT:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Notice that in some cases, the code comment is modified in
accordance with what GCC is expecting to find.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Cc: Kees Cook <keescook@chromium.org>
Cc: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> (reworded a comment)
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
|
|
The combination of KASAN_STACK and GCC_PLUGIN_STRUCTLEAK_BYREF
leads to much larger kernel stack usage, as seen from the warnings
about functions that now exceed the 2048 byte limit:
drivers/media/i2c/tvp5150.c:253:1: error: the frame size of 3936 bytes is larger than 2048 bytes
drivers/media/tuners/r820t.c:1327:1: error: the frame size of 2816 bytes is larger than 2048 bytes
drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:16552:1: error: the frame size of 3144 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
fs/ocfs2/aops.c:1892:1: error: the frame size of 2088 bytes is larger than 2048 bytes
fs/ocfs2/dlm/dlmrecovery.c:737:1: error: the frame size of 2088 bytes is larger than 2048 bytes
fs/ocfs2/namei.c:1677:1: error: the frame size of 2584 bytes is larger than 2048 bytes
fs/ocfs2/super.c:1186:1: error: the frame size of 2640 bytes is larger than 2048 bytes
fs/ocfs2/xattr.c:3678:1: error: the frame size of 2176 bytes is larger than 2048 bytes
net/bluetooth/l2cap_core.c:7056:1: error: the frame size of 2144 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
net/bluetooth/l2cap_core.c: In function 'l2cap_recv_frame':
net/bridge/br_netlink.c:1505:1: error: the frame size of 2448 bytes is larger than 2048 bytes
net/ieee802154/nl802154.c:548:1: error: the frame size of 2232 bytes is larger than 2048 bytes
net/wireless/nl80211.c:1726:1: error: the frame size of 2224 bytes is larger than 2048 bytes
net/wireless/nl80211.c:2357:1: error: the frame size of 4584 bytes is larger than 2048 bytes
net/wireless/nl80211.c:5108:1: error: the frame size of 2760 bytes is larger than 2048 bytes
net/wireless/nl80211.c:6472:1: error: the frame size of 2112 bytes is larger than 2048 bytes
The structleak plugin was previously disabled for CONFIG_COMPILE_TEST,
but meant we missed some bugs, so this time we should address them.
The frame size warnings are distracting, and risking a kernel stack
overflow is generally not beneficial to performance, so it may be best
to disallow that particular combination. This can be done by turning
off either one. I picked the dependency in GCC_PLUGIN_STRUCTLEAK_BYREF
and GCC_PLUGIN_STRUCTLEAK_BYREF_ALL, as this option is designed to
make uninitialized stack usage less harmful when enabled on its own,
but it also prevents KASAN from detecting those cases in which it was
in fact needed.
KASAN_STACK is currently implied by KASAN on gcc, but could be made a
user selectable option if we want to allow combining (non-stack) KASAN
with GCC_PLUGIN_STRUCTLEAK_BYREF.
Note that it would be possible to specifically address the files that
print the warning, but presumably the overall stack usage is still
significantly higher than in other configurations, so this would not
address the full problem.
I could not test this with CONFIG_INIT_STACK_ALL, which may or may not
suffer from a similar problem.
Fixes: 81a56f6dcd20 ("gcc-plugins: structleak: Generalize to all variable types")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20190722114134.3123901-1-arnd@arndb.de
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Daniel reports that when testing an http server that uses io_uring
to poll for incoming connections, sometimes it hard crashes. This is
due to an uninitialized list member for the io_uring request. Normally
this doesn't trigger and none of the test cases caught it.
Reported-by: Daniel Kozak <kozzi11@gmail.com>
Tested-by: Daniel Kozak <kozzi11@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit 06297d8cefca ("btrfs: switch extent_buffer blocking_writers from
atomic to int") changed the type of blocking_writers but forgot to
adjust relevant code in btrfs_tree_unlock by converting the
smp_mb__after_atomic to smp_mb. This opened up the possibility of a
deadlock due to re-ordering of setting blocking_writers and
checking/waking up the waiter. This particular lockup is explained in a
comment above waitqueue_active() function.
Fix it by converting the memory barrier to a full smp_mb, accounting
for the fact that blocking_writers is a simple integer.
Fixes: 06297d8cefca ("btrfs: switch extent_buffer blocking_writers from atomic to int")
Tested-by: Johannes Thumshirn <jthumshirn@suse.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warnings:
arch/x86/events/intel/core.c: In function ‘intel_pmu_init’:
arch/x86/events/intel/core.c:4959:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
arch/x86/events/intel/core.c:5008:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
Warning level 3 was used: -Wimplicit-fallthrough=3
This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190624161913.GA32270@embeddedor
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Some hardware PMU drivers will override perf_event.cpu inside their
event_init callback. This causes a lockdep splat when initialized through
the kernel API:
WARNING: CPU: 0 PID: 250 at kernel/events/core.c:2917 ctx_sched_out+0x78/0x208
pc : ctx_sched_out+0x78/0x208
Call trace:
ctx_sched_out+0x78/0x208
__perf_install_in_context+0x160/0x248
remote_function+0x58/0x68
generic_exec_single+0x100/0x180
smp_call_function_single+0x174/0x1b8
perf_install_in_context+0x178/0x188
perf_event_create_kernel_counter+0x118/0x160
Fix this by calling perf_install_in_context with event->cpu, just like
perf_event_open
Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Frank Li <Frank.li@nxp.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/c4ebe0503623066896d7046def4d6b1e06e0eb2e.1563972056.git.leonard.crestez@nxp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
check_msr is used to fix a bug report in guest where KVM doesn't support
LBR MSR and cause #GP.
The msr check is bypassed on real HW to workaround a false failure,
see commit d0e1a507bdc7 ("perf/x86/intel: Disable check_msr for real HW")
When running a guest with CONFIG_HYPERVISOR_GUEST not set or "nopv"
enabled, current check isn't enough and #GP could trigger.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1564022366-18293-1-git-send-email-zhenzhong.duan@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The Intel SDM states that bit 13 of Icelake's MSR_OFFCORE_RSP_x
register is valid, and used for counting hardware generated prefetches
of L3 cache. Update the bitmask to allow bit 13.
Before:
$ perf stat -e cpu/event=0xb7,umask=0x1,config1=0x1bfff/u sleep 3
Performance counter stats for 'sleep 3':
<not supported> cpu/event=0xb7,umask=0x1,config1=0x1bfff/u
After:
$ perf stat -e cpu/event=0xb7,umask=0x1,config1=0x1bfff/u sleep 3
Performance counter stats for 'sleep 3':
9,293 cpu/event=0xb7,umask=0x1,config1=0x1bfff/u
Signed-off-by: Yunying Sun <yunying.sun@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: jolsa@redhat.com
Cc: namhyung@kernel.org
Link: https://lkml.kernel.org/r/20190724082932.12833-1-yunying.sun@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Sampling SLOTS event and ref-cycles event in a group on Icelake gives
EINVAL.
SLOTS event is the event stands for the fixed counter 3, not fixed
counter 2. Wrong mask was set to SLOTS event in
intel_icl_pebs_event_constraints[].
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6017608936c1 ("perf/x86/intel: Add Icelake support")
Link: https://lkml.kernel.org/r/20190723200429.8180-1-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
An uninitialized/ zeroed mutex will go unnoticed because there is no
check for it. There is a magic check in the unlock's slowpath path which
might go unnoticed if the unlock happens in the fastpath.
Add a ->magic check early in the mutex_lock() and mutex_trylock() path.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190703092125.lsdf4gpsh2plhavb@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
As Will Deacon points out, CONFIG_PROVE_LOCKING implies TRACE_IRQFLAGS,
so the conditions I added in the previous patch, and some others in the
same file can be simplified by only checking for the former.
No functional change.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Yuyang Du <duyuyang@gmail.com>
Fixes: 886532aee3cd ("locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING")
Link: https://lkml.kernel.org/r/20190628102919.2345242-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The usage is now hidden in an #ifdef, so we need to move
the variable itself in there as well to avoid this warning:
kernel/locking/lockdep_proc.c:203:21: error: unused variable 'class' [-Werror,-Wunused-variable]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yuyang Du <duyuyang@gmail.com>
Cc: frederic@kernel.org
Fixes: 68d41d8c94a3 ("locking/lockdep: Fix lock used or unused stats error")
Link: https://lkml.kernel.org/r/20190715092809.736834-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Since we just reviewed read_slowpath for ACQUIRE correctness, add a
few coments to retain our findings.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
While reviewing rwsem down_slowpath, Will noticed ldsem had a copy of
a bug we just found for rwsem.
X = 0;
CPU0 CPU1
rwsem_down_read()
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
X = 1;
rwsem_up_write();
rwsem_mark_wake()
atomic_long_add(adjustment, &sem->count);
smp_store_release(&waiter->task, NULL);
if (!waiter.task)
break;
...
}
r = X;
Allows 'r == 0'.
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 4898e640caf0 ("tty: Add timed, writer-prioritized rw semaphore")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
While reviewing another read_slowpath patch, both Will and I noticed
another missing ACQUIRE, namely:
X = 0;
CPU0 CPU1
rwsem_down_read()
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
X = 1;
rwsem_up_write();
rwsem_mark_wake()
atomic_long_add(adjustment, &sem->count);
smp_store_release(&waiter->task, NULL);
if (!waiter.task)
break;
...
}
r = X;
Allows 'r == 0'.
Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
LTP mtest06 has been observed to occasionally hit "still mapped when
deleted" and following BUG_ON on arm64.
The extra mapcount originated from pagefault handler, which handled
pagefault for vma that has already been detached. vma is detached
under mmap_sem write lock by detach_vmas_to_be_unmapped(), which
also invalidates vmacache.
When the pagefault handler (under mmap_sem read lock) calls
find_vma(), vmacache_valid() wrongly reports vmacache as valid.
After rwsem down_read() returns via 'queue empty' path (as of v5.2),
it does so without an ACQUIRE on sem->count:
down_read()
__down_read()
rwsem_down_read_failed()
__rwsem_down_read_failed_common()
raw_spin_lock_irq(&sem->wait_lock);
if (list_empty(&sem->wait_list)) {
if (atomic_long_read(&sem->count) >= 0) {
raw_spin_unlock_irq(&sem->wait_lock);
return sem;
The problem can be reproduced by running LTP mtest06 in a loop and
building the kernel (-j $NCPUS) in parallel. It does reproduces since
v4.20 on arm64 HPE Apollo 70 (224 CPUs, 256GB RAM, 2 nodes). It
triggers reliably in about an hour.
The patched kernel ran fine for 10+ hours.
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dbueso@suse.de
Fixes: 4b486b535c33 ("locking/rwsem: Exit read lock slowpath if queue empty & no writer")
Link: https://lkml.kernel.org/r/50b8914e20d1d62bb2dee42d342836c2c16ebee7.1563438048.git.jstancek@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|