Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a generic device tree for Kendryte K210 SoC based boards. This is
for now a very simple device tree describing the core elements of the
SoC. This is suitable (and tested) for the Kendryte KD233 development
board, the Sipeed MAIX M1 Dan Dock board and the Sipeed MAIXDUINO board.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
This patch selects drivers required for the Kendryte K210 SOC.
Since K210 SoC based boards do not provide a device tree, this patch
also enables the BUILTIN_DTB option.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Add support for the Kendryte K210 RISC-V SoC. For now, this support
only provides a simple sysctl driver allowing to setup the CPU and
uart clock. This support is enabled through the new Kconfig option
SOC_KENDRYTE and defines the config option CONFIG_K210_SYSCTL
to enable the K210 SoC sysctl driver compilation.
The sysctl driver also registers an early SoC initialization function
allowing enabling the general purpose use of the 2MB of SRAM normally
reserved for the SoC AI engine. This initialization function is
automatically called before the dt early initialization using the flat
dt root node compatible property matching the value "kendryte,k210".
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
[Palmer: Add missing endmenu in Kconfig.socs]
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Add a mechanism for early SoC initialization for platforms that need
additional hardware initialization not possible through the regular
device tree and drivers mechanism. With this, a SoC specific
initialization function can be called very early, before DTB parsing
is done by parse_dtb() in Linux RISC-V kernel setup code.
This can be very useful for early hardware initialization for No-MMU
kernels booted directly in M-mode because it is quite likely that no
other booting stage exist prior to the No-MMU kernel.
Example use of a SoC early initialization is as follows:
static void vendor_abc_early_init(const void *fdt)
{
/*
* some early init code here that can use simple matches
* against the flat device tree file.
*/
}
SOC_EARLY_INIT_DECLARE("vendor,abc", abc_early_init);
This early initialization function is executed only if the flat device
tree for the board has a 'compatible = "vendor,abc"' entry;
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Add handlers for unaligned load and store traps that may be generated
by applications. Code heavily inspired from the OpenSBI project.
Handling of the unaligned access traps is suitable for applications
compiled with or without compressed instructions and is independent of
the kernel CONFIG_RISCV_ISA_C option value.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
This patch enable support for cpu hotplug in RISC-V. It uses SBI HSM
extension to online/offline any hart. As a result, the harts are
returned to firmware once they are offline. If the harts are brought
online afterwards, they re-enter Linux kernel as if a secondary hart
booted for the first time. All booting requirements are honored during
this process.
Tested both on QEMU and HighFive Unleashed board with. Test result follows.
---------------------------------------------------
Offline cpu 2
---------------------------------------------------
$ echo 0 > /sys/devices/system/cpu/cpu2/online
[ 32.828684] CPU2: off
$ cat /proc/cpuinfo
processor : 0
hart : 0
isa : rv64imafdcsu
mmu : sv48
processor : 1
hart : 1
isa : rv64imafdcsu
mmu : sv48
processor : 3
hart : 3
isa : rv64imafdcsu
mmu : sv48
processor : 4
hart : 4
isa : rv64imafdcsu
mmu : sv48
processor : 5
hart : 5
isa : rv64imafdcsu
mmu : sv48
processor : 6
hart : 6
isa : rv64imafdcsu
mmu : sv48
processor : 7
hart : 7
isa : rv64imafdcsu
mmu : sv48
---------------------------------------------------
online cpu 2
---------------------------------------------------
$ echo 1 > /sys/devices/system/cpu/cpu2/online
$ cat /proc/cpuinfo
processor : 0
hart : 0
isa : rv64imafdcsu
mmu : sv48
processor : 1
hart : 1
isa : rv64imafdcsu
mmu : sv48
processor : 2
hart : 2
isa : rv64imafdcsu
mmu : sv48
processor : 3
hart : 3
isa : rv64imafdcsu
mmu : sv48
processor : 4
hart : 4
isa : rv64imafdcsu
mmu : sv48
processor : 5
hart : 5
isa : rv64imafdcsu
mmu : sv48
processor : 6
hart : 6
isa : rv64imafdcsu
mmu : sv48
processor : 7
hart : 7
isa : rv64imafdcsu
mmu : sv48
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
|
|
Currently, all harts have to jump Linux in RISC-V. This complicates the
multi-stage boot process as every transient stage also has to ensure all
harts enter to that stage and jump to Linux afterwards. It also obstructs
a clean Kexec implementation.
SBI HSM extension provides alternate solutions where only a single hart
need to boot and enter Linux. The booting hart can bring up secondary
harts one by one afterwards.
Add SBI HSM based cpu_ops that implements an ordered booting method in
RISC-V. This change is also backward compatible with older firmware not
implementing HSM extension. If a latest kernel is used with older
firmware, it will continue to use the default spinning booting method.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
SBI specification defines HSM extension that allows to start/stop a hart
by a supervisor anytime. The specification is available at
https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc
Add those definitions here.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
All SBI related extensions will not be implemented in sbi.c to avoid
bloating. Thus, sbi_err_map_linux_errno() will be used in other files
implementing that specific extension.
Export the function so that it can be used later.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Currently, all non-booting harts start booting after the booting hart
updates the per-hart stack pointer. This is done in a way that, it's
difficult to implement any other booting method without breaking the
backward compatibility.
Define a cpu_ops method that allows to introduce other booting methods
in future. Modify the current booting method to be compatible with
cpu_ops.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The secondary hart booting and relocation code are under .init section.
As a result, it will be freed once kernel booting is done. However,
ordered booting protocol and CPU hotplug always requires these functions
to be present to bringup harts after initial kernel boot.
Move the required functions to a different section and make sure that
they are in memory within first 2MB offset as trampoline page directory
only maps first 2MB.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Few v0.1 SBI calls are being replaced by new SBI calls that follows v0.2
calling convention.
Implement the replacement extensions and few additional new SBI function calls
that makes way for a better SBI interface in future.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
We now have SBI v0.2 which is more scalable and extendable to handle
future needs for RISC-V supervisor interfaces.
Introduce a new config and move all SBI v0.1 code under that config.
This allows to implement the new replacement SBI extensions cleanly
and remove v0.1 extensions easily in future. Currently, the config
is enabled by default. Once all M-mode software, with v0.1, is no
longer in use, this config option and all relevant code can be easily
removed.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Few v0.1 SBI calls are being replaced by new SBI calls that follows
v0.2 calling convention.
This patch just defines these new extensions.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The SBI v0.2 introduces a base extension which is backward compatible
with v0.1. Implement all helper functions and minimum required SBI
calls from v0.2 for now. All other base extension function will be
added later as per need.
As v0.2 calling convention is backward compatible with v0.1, remove
the v0.1 helper functions and just use v0.2 calling convention.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
As per the new SBI specification, current SBI implementation version
is defined as 0.1 and will be removed/replaced in future. Each of the
function call in 0.1 is defined as a separate extension which makes
easier to replace them one at a time.
Rename existing implementation to reflect that. This patch is just
a preparatory patch for SBI v0.2 and doesn't introduce any functional
changes.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The KERN_VIRT_START defines the start virtual address of kernel space.
Use this macro instead of magic number.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
In a similar manner to arm64, x86, powerpc, etc., it can traverse all
page tables, and dump the page table layout with the memory types and
permissions.
Add a debugfs file at /sys/kernel/debug/kernel_page_tables to export
the page table layout to userspace.
Signed-off-by: Zong Li <zong.li@sifive.com>
Tested-by: Alexandre Ghiti <alex@ghiti.fr>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
On strict kernel memory permission, the ftrace have to change the
permission of text for dynamic patching the intructions. Use
riscv_patch_text_nosync() to patch code instead of probe_kernel_write.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
On strict kernel memory permission, we couldn't patch code without
writable permission. Preserve two holes in fixmap area, so we can map
the kernel code temporarily to fixmap area, then patch the instructions.
We need two pages here because we support the compressed instruction, so
the instruction might be align to 2 bytes. When patching the 32-bit
length instruction which is 2 bytes alignment, it will across two pages.
Introduce two interfaces to patch kernel code:
riscv_patch_text_nosync:
- patch code without synchronization, it's caller's responsibility to
synchronize all CPUs if needed.
riscv_patch_text:
- patch code and always synchronize with stop_machine()
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Extract the calculation of instruction length for common use.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The commit contains that make text section as non-writable, rodata
section as read-only, and data section as non-executable.
The init section should be changed to non-executable.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The kernel mapping will tried to optimize its mapping by using bigger
size. In rv64, it tries to use PMD_SIZE, and tryies to use PGDIR_SIZE in
rv32. To ensure that the start address of these sections could fit the
mapping entry size, make them align to the biggest alignment.
Define a macro SECTION_ALIGN because the HPAGE_SIZE or PMD_SIZE, etc.,
are invisible in linker script.
This patch is prepared for STRICT_KERNEL_RWX support.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Move EXCEPTION_TABLE immediately after RO_DATA. Make it easy to set the
attribution of the sections which should be read-only at a time.
Add _data to specify the start of data section with write permission.
This patch is prepared for STRICT_KERNEL_RWX support.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
pages for debugging purposes. Implement the __kernel_map_pages
functions to fill the poison pattern.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Add set_direct_map_*() functions for setting the direct map alias for
the page to its default permissions and to an invalid state that cannot
be cached in a TLB. (See d253ca0c ("x86/mm/cpa: Add set_direct_map_*()
functions")) Add a similar implementation for RISC-V.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
Add set_memory_ro/rw/x/nx architecture hooks to change the page
attribution.
Use own set_memory.h rather than generic set_memory.h
(i.e. include/asm-generic/set_memory.h), because we want to add other
function prototypes here.
Signed-off-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
If both CONFIG_KASAN and CONFIG_SPARSEMEM_VMEMMAP are set, we get the
following compilation error.
---------------------------------------------------------------
./arch/riscv/include/asm/pgtable-64.h: In function ‘pud_page’:
./include/asm-generic/memory_model.h:54:29: error: ‘vmemmap’ undeclared
(first use in this function); did you mean ‘mem_map’?
#define __pfn_to_page(pfn) (vmemmap + (pfn))
^~~~~~~
./include/asm-generic/memory_model.h:82:21: note: in expansion of
macro ‘__pfn_to_page’
#define pfn_to_page __pfn_to_page
^~~~~~~~~~~~~
./arch/riscv/include/asm/pgtable-64.h:70:9: note: in expansion of macro
‘pfn_to_page’
return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
---------------------------------------------------------------
Fix the compliation errors by moving all the address space definition
macros before including pgtable-64.h.
Fixes: 8ad8b72721d0 (riscv: Add KASAN support)
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The newly introduced p*d_leaf macros allow to check if an entry of the
page table map to a physical page instead of the next level. To avoid
duplication of code, use those macros to determine if a page table entry
points to a hugepage.
Suggested-by: Paul Walmsley <paul.walmsley@sifive.com>
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
LLVM's integrated assembler doesn't support the LOCAL directive, which we're
using when generating our uaccess fixup tables. Luckily the table fragment is
small enough that there's only one internal symbol, so using a relative symbol
reference doesn't really complicate anything.
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
These are only used once, and when reading the code I've always found them to
be more of a headache than a benefit. While they were never worth removing
before, LLVM's integrated assembler doesn't support LOCAL so rather that trying
to figure out how to refactor the macros it seems saner to just inline them.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
GCC allows users to hint to the register allocation that a variable should be
placed in a register by using a syntax along the lines of
function(...) {
register long in_REG __asm__("REG");
}
We've abused this a bit throughout the RISC-V port to access fixed registers
directly as C variables. In practice it's never going to blow up because GCC
isn't going to allocate these registers, but it's not a well defined syntax so
we really shouldn't be relying upon this. Luckily there is a very similar but
well defined syntax that allows us to still access these registers directly as
C variables, which is to simply declare the register variables globally. For
fixed variables this doesn't change the ABI.
LLVM disallows this ambiguous syntax, so this isn't just strictly a formatting
change.
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
I don't know why we were doing this, as it's been there since the beginning.
After d841f729e655 ("riscv: force hart_lottery to put in .sdata section") my
guess would be that it made the kernel boot and we forgot to fix it more
cleanly.
The default .bss segment already contains the .sbss section, so we don't need
to do anything additional to ensure the symbols in .sbss continue to work.
Tested-by: Zong Li <zong.li@sifive.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
In PIC code model, the zero initialized data always be put in .bss
section, so when building kernel as PIE, the hart_lottery won't present
in small data section, and it causes more than one harts to get the
lottery, because the main hart clears the content of .bss section
immediately after it getting the lottery.
Signed-off-by: Zong Li <zong.li@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
[Palmer: added a comment]
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
According to init/Kconfig:
"sys_sysfs is an obsolete system call no longer supported in libc.
Note that disabling this option is more secure but might break
compatibility with some systems."
This syscall is not required for new architectures. Since the config
defaults to 'y'. Set this to 'n' exlicitly.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
The only call path is:
__access_remote_vm -> copy_to_user_page -> flush_icache_user_range
Seems it's ok to use flush_icache_mm instead of flush_icache_all and
it could reduce flush_icache_all called on other harts.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
[Palmer: git-am wouldn't apply the patch, I did so manually]
Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable")
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
|
|
|
|
KVM emulates UMIP on hardware that doesn't support it by setting the
'descriptor table exiting' VM-execution control and performing
instruction emulation. When running nested, this emulation is broken as
KVM refuses to emulate L2 instructions by default.
Correct this regression by allowing the emulation of descriptor table
instructions if L1 hasn't requested 'descriptor table exiting'.
Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode")
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If sbi->s_flex_groups_allocated is zero and the first allocation fails
then this code will crash. The problem is that "i--" will set "i" to
-1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1
is type promoted to unsigned and becomes UINT_MAX. Since UINT_MAX
is more than zero, the condition is true so we call kvfree(new_groups[-1]).
The loop will carry on freeing invalid memory until it crashes.
Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access")
Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Removing attach_adapter from this driver caused a regression for at
least some machines. Those machines had the sensors described in their
DT, too, so they didn't need manual creation of the sensor devices. The
old code worked, though, because manual creation came first. Creation of
DT devices then failed later and caused error logs, but the sensors
worked nonetheless because of the manually created devices.
When removing attach_adaper, manual creation now comes later and loses
the race. The sensor devices were already registered via DT, yet with
another binding, so the driver could not be bound to it.
This fix refactors the code to remove the race and only manually creates
devices if there are no DT nodes present. Also, the DT binding is updated
to match both, the DT and manually created devices. Because we don't
know which device creation will be used at runtime, the code to start
the kthread is moved to do_probe() which will be called by both methods.
Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Tested-by: Erhard Furtner <erhard_f@mailbox.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org # v4.19+
|
|
journal_head::b_transaction and journal_head::b_next_transaction could
be accessed concurrently as noticed by KCSAN,
LTP: starting fsync04
/dev/zero: Can't open blockdev
EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
==================================================================
BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]
write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
__jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
__jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
(inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
kjournald2+0x13b/0x450 [jbd2]
kthread+0x1cd/0x1f0
ret_from_fork+0x27/0x50
read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
jbd2_write_access_granted+0x1b2/0x250 [jbd2]
jbd2_write_access_granted at fs/jbd2/transaction.c:1155
jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
__ext4_journal_get_write_access+0x50/0x90 [ext4]
ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
ext4_mb_new_blocks+0x54f/0xca0 [ext4]
ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
ext4_map_blocks+0x3b4/0x950 [ext4]
_ext4_get_block+0xfc/0x270 [ext4]
ext4_get_block+0x3b/0x50 [ext4]
__block_write_begin_int+0x22e/0xae0
__block_write_begin+0x39/0x50
ext4_write_begin+0x388/0xb50 [ext4]
generic_perform_write+0x15d/0x290
ext4_buffered_write_iter+0x11f/0x210 [ext4]
ext4_file_write_iter+0xce/0x9e0 [ext4]
new_sync_write+0x29c/0x3b0
__vfs_write+0x92/0xa0
vfs_write+0x103/0x260
ksys_write+0x9d/0x130
__x64_sys_write+0x4c/0x60
do_syscall_64+0x91/0xb05
entry_SYSCALL_64_after_hwframe+0x49/0xbe
5 locks held by fsync04/25724:
#0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
#1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
#2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
#3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
#4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
irq event stamp: 1407125
hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
Reported by Kernel Concurrency Sanitizer on:
CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
The plain reads are outside of jh->b_state_lock critical section which result
in data races. Fix them by adding pairs of READ|WRITE_ONCE().
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
In older version of systemd(219), at boot time, udevadm is called with :
/usr/bin/udevadm trigger --type=devices --action=add"
This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent,
leading to the "kvm: disabled by bios" message in case of your Bios disabled
the virtualization extensions.
On a modern system running up to 256 CPU threads, this pollutes the Kernel logs.
This patch offers to ratelimit this message to avoid any userspace program triggering
this uevent printing this message too often.
This patch is only a workaround but greatly reduce the pollution without
breaking the current behavior of printing a message if some try to instantiate
KVM on a system that doesn't support it.
Note that recent versions of systemd (>239) do not have trigger this behavior.
This patch will be useful at least for some using older systemd with recent Kernels.
Signed-off-by: Erwan Velu <e.velu@criteo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
struct cpufreq_policy is quite big and it is not a good idea
to allocate one on the stack. Just use cpufreq_cpu_get and
cpufreq_cpu_put which is even simpler.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Restrict -Werror to well-tested configurations and allow disabling it
via Kconfig.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Compile error with CONFIG_KVM_INTEL=y and W=1:
CC arch/x86/kvm/vmx/vmx.o
arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=]
68 | static const struct x86_cpu_id vmx_cpu_id[] = {
| ^~~~~~~~~~
cc1: all warnings being treated as errors
When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a
reference to the structure (or any code at all). This makes W=1 compiles
unhappy.
Wrap both in a #ifdef to avoid the issue.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
[Do the same for CONFIG_KVM_AMD. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Nick Desaulniers Reported:
When building with:
$ make CC=clang arch/x86/ CFLAGS=-Wframe-larger-than=1000
The following warning is observed:
arch/x86/kernel/kvm.c:494:13: warning: stack frame size of 1064 bytes in
function 'kvm_send_ipi_mask_allbutself' [-Wframe-larger-than=]
static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int
vector)
^
Debugging with:
https://github.com/ClangBuiltLinux/frame-larger-than
via:
$ python3 frame_larger_than.py arch/x86/kernel/kvm.o \
kvm_send_ipi_mask_allbutself
points to the stack allocated `struct cpumask newmask` in
`kvm_send_ipi_mask_allbutself`. The size of a `struct cpumask` is
potentially large, as it's CONFIG_NR_CPUS divided by BITS_PER_LONG for
the target architecture. CONFIG_NR_CPUS for X86_64 can be as high as
8192, making a single instance of a `struct cpumask` 1024 B.
This patch fixes it by pre-allocate 1 cpumask variable per cpu and use it for
both pv tlb and pv ipis..
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Introduce some pv check helpers for consistency.
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Sparse notices that declaration and implementation do not match:
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: warning: incorrect type in return expression (different address spaces)
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: expected struct kvm_vcpu [noderef] <asn:3> **
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: got struct kvm_vcpu *[noderef] <asn:3> *
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Even if APICv is disabled at startup, the backing page and ir_list need
to be initialized in case they are needed later. The only case in
which this can be skipped is for userspace irqchip, and that must be
done because avic_init_backing_page dereferences vcpu->arch.apic
(which is NULL for userspace irqchip).
Tested-by: rmuncrief@humanavance.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
de80f95ccb9c ("PCI: cadence: Move all files to per-device cadence
directory") moved files of the PCI cadence drivers, but did not update the
MAINTAINERS entry.
Since then, ./scripts/get_maintainer.pl --self-test complains:
warning: no file matches F: drivers/pci/controller/pcie-cadence*
Repair the MAINTAINERS entry.
Link: https://lore.kernel.org/r/20200221185402.4703-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|