Age | Commit message (Collapse) | Author | Files | Lines |
|
Defer enabling pointer authentication on boot core until
after its required to be enabled by cpufeature framework.
This will help in controlling the feature dynamically
with a boot parameter.
Signed-off-by: Ajay Patil <pajay@qti.qualcomm.com>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1610152163-16554-2-git-send-email-sramana@codeaurora.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-22-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to be able to disable BTI at runtime, whether it is
for testing purposes, or to work around HW issues, let's add
support for overriding the ID_AA64PFR1_EL1.BTI field.
This is further mapped on the arm64.nobti command-line alias.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Tested-by: Srinivas Ramana <sramana@codeaurora.org>
Link: https://lore.kernel.org/r/20210208095732.3267263-21-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Given that the early cpufeature infrastructure has borrowed quite
a lot of code from the kaslr implementation, let's reimplement
the matching of the "nokaslr" option with it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-20-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For completeness, let's document the HVC_VHE_RESTART stub.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-19-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Admitedly, passing id_aa64mmfr1.vh=0 on the command-line isn't
that easy to understand, and it is likely that users would much
prefer write "kvm-arm.mode=nvhe", or "...=protected".
So here you go. This has the added advantage that we can now
always honor the "kvm-arm.mode=protected" option, even when
booting on a VHE system.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-18-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to map the override of idregs to options that a user
can easily understand, let's introduce yet another option
array, which maps an option to the corresponding idreg options.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-17-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Finally we can check whether VHE is disabled on the command line,
and not enable it if that's the user's wish.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-16-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As we want to be able to disable VHE at runtime, let's match
"id_aa64mmfr1.vh=" from the command line as an override.
This doesn't have much effect yet as our boot code doesn't look
at the cpufeature, but only at the HW registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-15-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In order to be able to override CPU features at boot time,
let's add a command line parser that matches options of the
form "cpureg.feature=value", and store the corresponding
value into the override val/mask pair.
No features are currently defined, so no expected change in
functionality.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-14-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As we want to parse more options very early in the kernel lifetime,
let's always map the FDT early. This is achieved by moving that
code out of kaslr_early_init().
No functional change expected.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-13-maz@kernel.org
[will: Ensue KASAN is enabled before running C code]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
__read_sysreg_by_encoding() is used by a bunch of cpufeature helpers,
which should take the feature override into account. Let's do that.
For a good measure (and because we are likely to need to further
down the line), make this helper available to the rest of the
non-modular kernel.
Code that needs to know the *real* features of a CPU can still
use read_sysreg_s(), and find the bare, ugly truth.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-12-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a facility to globally override a feature, no matter what
the HW says. Yes, this sounds dangerous, but we do respect the
"safe" value for a given feature. This doesn't mean the user
doesn't need to know what they are doing.
Nothing uses this yet, so we are pretty safe. For now.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-11-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
We can now move the initial SCTLR_EL1 setup to be used for both
EL1 and EL2 setup.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-10-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As init_el2_state is now nVHE only, let's simplify it and drop
the VHE setup.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-9-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
There isn't much that a VHE kernel needs on top of whatever has
been done for nVHE, so let's move the little we need to the
VHE stub (the SPE setup), and drop the init_el2_state macro.
No expected functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-8-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When running VHE, we set MDSCR_EL2.TPMS very early on to force
the trapping of EL1 SPE accesses to EL2.
However:
- we are running with HCR_EL2.{E2H,TGE}={1,1}, meaning that there
is no EL1 to trap from
- before entering a guest, we call kvm_arm_setup_debug(), which
sets MDCR_EL2_TPMS in the per-vcpu shadow mdscr_el2, which gets
applied on entry by __activate_traps_common().
The early setting of MDSCR_EL2.TPMS is therefore useless and can
be dropped.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210208095732.3267263-7-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As we are aiming to be able to control whether we enable VHE or
not, let's always drop down to EL1 first, and only then upgrade
to VHE if at all possible.
This means that if the kernel is booted at EL2, we always start
with a nVHE init, drop to EL1 to initialise the the kernel, and
only then upgrade the kernel EL to EL2 if possible (the process
is obviously shortened for secondary CPUs).
The resume path is handled similarly to a secondary CPU boot.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-6-maz@kernel.org
[will: Avoid calling switch_to_vhe twice on kaslr path]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The workaround for Cortex-A76 erratum 1463225 is split across the
syscall and debug handlers in separate files. This structure currently
forces us to do some redundant work for debug exceptions from EL0, is a
little difficult to follow, and gets in the way of some future rework of
the exception entry code as it requires exceptions to be unmasked late
in the syscall handling path.
To simplify things, and as a preparatory step for future rework of
exception entry, this patch moves all the workaround logic into
entry-common.c. As the debug handler only needs to run for EL1 debug
exceptions, we no longer call it for EL0 debug exceptions, and no longer
need to check user_mode(regs) as this is always false. For clarity
cortex_a76_erratum_1463225_debug_handler() is changed to return bool.
In the SVC path, the workaround is applied earlier, but this should have
no functional impact as exceptions are still masked. In the debug path
we run the fixup before explicitly disabling preemption, but we will not
attempt to preempt before returning from the exception.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210202120341.28858-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
As we are about to change the way a VHE system boots, let's
provide the core helper, in the form of a stub hypercall that
enables VHE and replicates the full EL1 context at EL2, thanks
to EL1 and VHE-EL2 being extremely similar.
On exception return, the kernel carries on at EL2. Fancy!
Nothing calls this new hypercall yet, so no functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-5-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Turning the MMU on is a popular sport in the arm64 kernel, and
we do it more than once, or even twice. As we are about to add
even more, let's turn it into a macro.
No expected functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-4-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The arm64 kernel has long be able to use more than 39bit VAs.
Since day one, actually. Let's rewrite the offending comment.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
If someone happens to write the following code:
b 1f
init_el2_state vhe
1:
[...]
they will be in for a long debugging session, as the label "1f"
will be resolved *inside* the init_el2_state macro instead of
after it. Not really what one expects.
Instead, rewite the EL2 setup macros to use unambiguous labels,
thanks to the usual macro counter trick.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The erratum 1024718 affects Cortex-A55 r0p0 to r2p0. However
we apply the work around for r0p0 - r1p0. Unfortunately this
won't be fixed for the future revisions for the CPU. Thus
extend the work around for all versions of A55, to cover
for r2p0 and any future revisions.
Cc: stable@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20210203230057.3961239-1-suzuki.poulose@arm.com
[will: Update Kconfig help text]
Signed-off-by: Will Deacon <will@kernel.org>
|
|
commit d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem
rwsem call sites") has convertd down_read_trylock() to mmap_read_trylock().
But it forgot to update the relevant comment.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/20210205090919.63382-1-linmiaohe@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In a few places we don't have whitespace between macro parameters,
which makes them hard to read. This patch adds whitespace to clearly
separate the parameters.
In a few places we have unnecessary whitespace around unary operators,
which is confusing, This patch removes the unnecessary whitespace.
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Link: https://lore.kernel.org/r/1612403029-5011-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a macro cond_yield that branches to a specified label when called if
the TIF_NEED_RESCHED flag is set and decreasing the preempt count would
make the task preemptible again, resulting in a schedule to occur. This
can be used by kernel mode SIMD code that keeps a lot of state in SIMD
registers, which would make chunking the input in order to perform the
cond_resched() check from C code disproportionately costly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210203113626.220151-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
|
|
So technically there is nothing wrong with adding a pinned page to the
swap cache, but the pinning obviously means that the page can't actually
be free'd right now anyway, so it's a bit pointless.
However, the real problem is not with it being a bit pointless: the real
issue is that after we've added it to the swap cache, we'll try to unmap
the page. That will succeed, because the code in mm/rmap.c doesn't know
or care about pinned pages.
Even the unmapping isn't fatal per se, since the page will stay around
in memory due to the pinning, and we do hold the connection to it using
the swap cache. But when we then touch it next and take a page fault,
the logic in do_swap_page() will map it back into the process as a
possibly read-only page, and we'll then break the page association on
the next COW fault.
Honestly, this issue could have been fixed in any of those other places:
(a) we could refuse to unmap a pinned page (which makes conceptual
sense), or (b) we could make sure to re-map a pinned page writably in
do_swap_page(), or (c) we could just make do_wp_page() not COW the
pinned page (which was what we historically did before that "mm:
do_wp_page() simplification" commit).
But while all of them are equally valid models for breaking this chain,
not putting pinned pages into the swap cache in the first place is the
simplest one by far.
It's also the safest one: the reason why do_wp_page() was changed in the
first place was that getting the "can I re-use this page" wrong is so
fraught with errors. If you do it wrong, you end up with an incorrectly
shared page.
As a result, using "page_maybe_dma_pinned()" in either do_wp_page() or
do_swap_page() would be a serious bug since it is only a (very good)
heuristic. Re-using the page requires a hard black-and-white rule with
no room for ambiguity.
In contrast, saying "this page is very likely dma pinned, so let's not
add it to the swap cache and try to unmap it" is an obviously safe thing
to do, and if the heuristic might very rarely be a false positive, no
harm is done.
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Reported-and-tested-by: Martin Raiber <martin@urbackup.org>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
We are not guaranteed the locking environment that would prevent
dentry getting renamed right under us. And it's possible for
old long name to be freed after rename, leading to UAF here.
Cc: stable@kernel.org # v2.6.2+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Turning a pinned page read-only breaks the pinning after COW. Don't do it.
The whole "track page soft dirty" state doesn't work with pinned pages
anyway, since the page might be dirtied by the pinning entity without
ever being noticed in the page tables.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Turning page table entries read-only requires the mmap_sem held for
writing.
So stop doing the odd games with turning things from read locks to write
locks and back. Just get the write lock.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Linux kernel can only map 1GB of address space for RV32 as the page offset
is set to 0xC0000000. The current description in the Kconfig is confusing
as it indicates that RV32 can support 2GB of physical memory. That is
simply not true for current kernel. In future, a 2GB split support can be
added to allow 2GB physical address space.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Currently, linux kernel can not use last 4k bytes of addressable space
because IS_ERR_VALUE macro treats those as an error. This will be an issue
for RV32 as any memblock allocator potentially allocate chunk of memory
from the end of DRAM (2GB) leading bad address error even though the
address was technically valid.
Fix this issue by limiting the memblock if available memory spans the
entire address space.
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Currently, resource tree allocates memory blocks while iterating on the
list. It leads to following kernel warning because memblock allocation
also invokes memory block reservation API.
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/resource.c:795
__insert_resource+0x8e/0xd0
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted
5.10.0-00022-ge20097fb37e2-dirty #549
[ 0.000000] epc: c00125c2 ra : c001262c sp : c1c01f50
[ 0.000000] gp : c1d456e0 tp : c1c0a980 t0 : ffffcf20
[ 0.000000] t1 : 00000000 t2 : 00000000 s0 : c1c01f60
[ 0.000000] s1 : ffffcf00 a0 : ffffff00 a1 : c1c0c0c4
[ 0.000000] a2 : 80c12b15 a3 : 80402000 a4 : 80402000
[ 0.000000] a5 : c1c0c0c4 a6 : 80c12b15 a7 : f5faf600
[ 0.000000] s2 : c1c0c0c4 s3 : c1c0e000 s4 : c1009a80
[ 0.000000] s5 : c1c0c000 s6 : c1d48000 s7 : c1613b4c
[ 0.000000] s8 : 00000fff s9 : 80000200 s10: c1613b40
[ 0.000000] s11: 00000000 t3 : c1d4a000 t4 : ffffffff
This is also unnecessary as we can pre-compute the total memblocks required
for each memory region and allocate it before the loop. It save precious
boot time not going through memblock allocation code every time.
Fixes: 00ab027a3b82 ("RISC-V: Add kernel image sections to the resource tree")
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
sizeof needs to be called on the compat pointer, not the native one.
Fixes: 89cd35c58bc2 ("iov_iter: transparently handle compat iovecs in import_iovec")
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
If we enter with requests pending and performm cancelations, we'll have
a different inflight count before and after calling prepare_to_wait().
This causes the loop to restart. If we actually ended up canceling
everything, or everything completed in-between, then we'll break out
of the loop without calling finish_wait() on the waitqueue. This can
trigger a warning on exit_signals(), as we leave the task state in
TASK_UNINTERRUPTIBLE.
Put a finish_wait() after the loop to catch that case.
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When 'perf inject' reads a perf.data file from an older version of perf,
it writes event attributes into the output with the original size field,
but lays them out as if they had the size currently used. Readers see a
corrupt file. Update the size field to match the layout.
Signed-off-by: Al Grant <al.grant@foss.arm.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20201124195818.30603-1-al.grant@arm.com
Signed-off-by: Denis Nikitin <denik@chromium.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In some cases, the number of cpus (nr_cpus_online) is confused with the
maximum cpu number (nr_cpus_avail), which results in the error in the
example below:
Example on system with 8 cpus:
Before:
# echo 0 > /sys/devices/system/cpu/cpu2/online
# ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.147 MB perf.data ]
# ./perf script --itrace=e
Requested CPU 7 too large. Consider raising MAX_NR_CPUS
0x25908 [0x8]: failed to process type: 68 [Invalid argument]
After:
# ./perf script --itrace=e
#
Fixes: 8c7274691f0d ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Fixes: 7df4e36a4785 ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lore.kernel.org/lkml/20210107174159.24897-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
As of now it doesn't consider cgroups when collecting shadow stats and
metrics so counter values from different cgroups will be saved in a same
slot. This resulted in incorrect numbers when those cgroups have
different workloads.
For example, let's look at the scenario below: cgroups A and C runs same
workload which burns a cpu while cgroup B runs a light workload.
$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1
Performance counter stats for 'system wide':
3,958,116,522 cycles A
6,722,650,929 instructions A # 2.53 insn per cycle
1,132,741 cycles B
571,743 instructions B # 0.00 insn per cycle
4,007,799,935 cycles C
6,793,181,523 instructions C # 2.56 insn per cycle
1.001050869 seconds time elapsed
When I run 'perf stat' with single workload, it usually shows IPC around
1.7. We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A.
But in this case, since cgroups are ignored, cycles are averaged so it
used the lower value for IPC calculation and resulted in around 2.5.
avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066
IPC (A) : 6722650929 / 2655683066 = 2.531
IPC (B) : 571743 / 2655683066 = 0.0002
IPC (C) : 6793181523 / 2655683066 = 2.557
We can simply compare cgroup pointers in the evsel and it'll be NULL
when cgroups are not specified. With this patch, I can see correct
numbers like below:
$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1
Performance counter stats for 'system wide':
4,171,051,687 cycles A
7,219,793,922 instructions A # 1.73 insn per cycle
1,051,189 cycles B
583,102 instructions B # 0.55 insn per cycle
4,171,124,710 cycles C
7,192,944,580 instructions C # 1.72 insn per cycle
1.007909814 seconds time elapsed
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pass more info to the saved_value in the runtime_stat, add a new
struct runtime_stat_data. Currently it only has 'ctx' field but later
patch will add more.
Note that we intentionally pass 0 as ctx to clock-related events for
compatibility. It was already there in a few places. So move the code
into the saved_value_lookup() explicitly and add a comment.
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210115071139.257042-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Permissions are necessary to get a tracepoint id. Fail the test when the
read fails.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114180250.3853825-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
If a test fails return -1 rather than 0. This is consistent with the
return value in test-cpumap.c
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114180250.3853825-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
The variable 'bf' is read (for a write call) without being initialized
triggering a memory sanitizer warning. Use 'bf' in the read and switch
the write to reading from a string.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114212304.4018119-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
In the fast commit, it adds REQ_FUA and REQ_PREFLUSH on each fast
commit block when barrier is enabled. However, in recovery phase,
ext4 compares CRC value in the tail. So it is sufficient to add
REQ_FUA and REQ_PREFLUSH on the block that has tail.
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210106013242epcms2p5b6b4ed8ca86f29456fdf56aa580e74b4@epcms2p5
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We got a "deleted inode referenced" warning cross our fsstress test. The
bug can be reproduced easily with following steps:
cd /dev/shm
mkdir test/
fallocate -l 128M img
mkfs.ext4 -b 1024 img
mount img test/
dd if=/dev/zero of=test/foo bs=1M count=128
mkdir test/dir/ && cd test/dir/
for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
/dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
ext4_rename will return ENOSPC!!
cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
We will get the output:
"ls: cannot access 'test/dir/file1': Structure needs cleaning"
and the dmesg show:
"EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
deleted inode referenced: 139"
ext4_rename will create a special inode for whiteout and use this 'ino'
to replace the source file's dir entry 'ino'. Once error happens
latter(the error above was the ENOSPC return from ext4_add_entry in
ext4_rename since all space has been consumed), the cleanup do drop the
nlink for whiteout, but forget to restore 'ino' with source file. This
will trigger the bug describle as above.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Fixes: cd808deced43 ("ext4: support RENAME_WHITEOUT")
Link: https://lore.kernel.org/r/20210105062857.3566-1-yangerkun@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
After full/fast commit, entries in staging queue are promoted to main
queue. In ext4_fs_cleanup function, it splice to staging queue to
staging queue.
Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path")
Signed-off-by: Daejun Park <daejun7.park@samsung.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201230094851epcms2p6eeead8cc984379b37b2efd21af90fd1a@epcms2p6
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
1: ext4_iget/ext4_find_extent never returns NULL, use IS_ERR
instead of IS_ERR_OR_NULL to fix this.
2: ext4_fc_replay_inode should set the inode to NULL when IS_ERR.
and go to call iput properly.
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Signed-off-by: Yi Li <yili@winhong.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20201230033827.3996064-1-yili@winhong.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
It was using some bash-specific features and failed to parse when
running with a different shell like below:
root@kbl-ppc:~/kbl-ws/perf-dev/lck-9077/acme.tmp/tools/perf# ./perf test 83 -vv
83: perf stat metrics (shadow stat) test :
--- start ---
test child forked, pid 3922
./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
(standard_in) 2: syntax error
./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found
(standard_in) 2: syntax error
./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found
./tests/shell/stat+shadow_stat.sh: 45: ./tests/shell/stat+shadow_stat.sh: declare: not found
test child finished with -1
---- end ----
perf stat metrics (shadow stat) test: FAILED!
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Laight <david.laight@aculab.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210114050609.1258820-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up the changes in:
3a176b94609a18f5 ("Revert "kbuild: avoid static_assert for genksyms"")
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h'
diff -u tools/include/linux/build_bug.h include/linux/build_bug.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes in:
647daca25d24fb6e ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
That don't cause any tooling change, just silences this perf build
warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|