linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2020-04-03	riscv: Add Kendryte K210 device tree	Damien Le Moal	5	-0/+169
	Add a generic device tree for Kendryte K210 SoC based boards. This is for now a very simple device tree describing the core elements of the SoC. This is suitable (and tested) for the Kendryte KD233 development board, the Sipeed MAIX M1 Dan Dock board and the Sipeed MAIXDUINO board. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Sean Anderson <seanga2@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-04-03	riscv: Select required drivers for Kendryte SOC	Damien Le Moal	1	-0/+4
	This patch selects drivers required for the Kendryte K210 SOC. Since K210 SoC based boards do not provide a device tree, this patch also enables the BUILTIN_DTB option. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-04-03	riscv: Add Kendryte K210 SoC support	Christoph Hellwig	6	-0/+297
	Add support for the Kendryte K210 RISC-V SoC. For now, this support only provides a simple sysctl driver allowing to setup the CPU and uart clock. This support is enabled through the new Kconfig option SOC_KENDRYTE and defines the config option CONFIG_K210_SYSCTL to enable the K210 SoC sysctl driver compilation. The sysctl driver also registers an early SoC initialization function allowing enabling the general purpose use of the 2MB of SRAM normally reserved for the SoC AI engine. This initialization function is automatically called before the dt early initialization using the flat dt root node compatible property matching the value "kendryte,k210". Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> [Palmer: Add missing endmenu in Kconfig.socs] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-04-03	riscv: Add SOC early init support	Damien Le Moal	5	-0/+59
	Add a mechanism for early SoC initialization for platforms that need additional hardware initialization not possible through the regular device tree and drivers mechanism. With this, a SoC specific initialization function can be called very early, before DTB parsing is done by parse_dtb() in Linux RISC-V kernel setup code. This can be very useful for early hardware initialization for No-MMU kernels booted directly in M-mode because it is quite likely that no other booting stage exist prior to the No-MMU kernel. Example use of a SoC early initialization is as follows: static void vendor_abc_early_init(const void fdt) { / * some early init code here that can use simple matches * against the flat device tree file. */ } SOC_EARLY_INIT_DECLARE("vendor,abc", abc_early_init); This early initialization function is executed only if the flat device tree for the board has a 'compatible = "vendor,abc"' entry; Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-04-03	riscv: Unaligned load/store handling for M_MODE	Damien Le Moal	3	-4/+395
	Add handlers for unaligned load and store traps that may be generated by applications. Code heavily inspired from the OpenSBI project. Handling of the unaligned access traps is suitable for applications compiled with or without compressed instructions and is independent of the kernel CONFIG_RISCV_ISA_C option value. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Support cpu hotplug	Atish Patra	7	-2/+180
	This patch enable support for cpu hotplug in RISC-V. It uses SBI HSM extension to online/offline any hart. As a result, the harts are returned to firmware once they are offline. If the harts are brought online afterwards, they re-enter Linux kernel as if a secondary hart booted for the first time. All booting requirements are honored during this process. Tested both on QEMU and HighFive Unleashed board with. Test result follows. --------------------------------------------------- Offline cpu 2 --------------------------------------------------- $ echo 0 > /sys/devices/system/cpu/cpu2/online [ 32.828684] CPU2: off $ cat /proc/cpuinfo processor : 0 hart : 0 isa : rv64imafdcsu mmu : sv48 processor : 1 hart : 1 isa : rv64imafdcsu mmu : sv48 processor : 3 hart : 3 isa : rv64imafdcsu mmu : sv48 processor : 4 hart : 4 isa : rv64imafdcsu mmu : sv48 processor : 5 hart : 5 isa : rv64imafdcsu mmu : sv48 processor : 6 hart : 6 isa : rv64imafdcsu mmu : sv48 processor : 7 hart : 7 isa : rv64imafdcsu mmu : sv48 --------------------------------------------------- online cpu 2 --------------------------------------------------- $ echo 1 > /sys/devices/system/cpu/cpu2/online $ cat /proc/cpuinfo processor : 0 hart : 0 isa : rv64imafdcsu mmu : sv48 processor : 1 hart : 1 isa : rv64imafdcsu mmu : sv48 processor : 2 hart : 2 isa : rv64imafdcsu mmu : sv48 processor : 3 hart : 3 isa : rv64imafdcsu mmu : sv48 processor : 4 hart : 4 isa : rv64imafdcsu mmu : sv48 processor : 5 hart : 5 isa : rv64imafdcsu mmu : sv48 processor : 6 hart : 6 isa : rv64imafdcsu mmu : sv48 processor : 7 hart : 7 isa : rv64imafdcsu mmu : sv48 Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org>
2020-03-31	RISC-V: Add supported for ordered booting method using HSM	Atish Patra	6	-3/+121
	Currently, all harts have to jump Linux in RISC-V. This complicates the multi-stage boot process as every transient stage also has to ensure all harts enter to that stage and jump to Linux afterwards. It also obstructs a clean Kexec implementation. SBI HSM extension provides alternate solutions where only a single hart need to boot and enter Linux. The booting hart can bring up secondary harts one by one afterwards. Add SBI HSM based cpu_ops that implements an ordered booting method in RISC-V. This change is also backward compatible with older firmware not implementing HSM extension. If a latest kernel is used with older firmware, it will continue to use the default spinning booting method. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Add SBI HSM extension definitions	Atish Patra	1	-0/+14
	SBI specification defines HSM extension that allows to start/stop a hart by a supervisor anytime. The specification is available at https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc Add those definitions here. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Export SBI error to linux error mapping function	Atish Patra	2	-1/+4
	All SBI related extensions will not be implemented in sbi.c to avoid bloating. Thus, sbi_err_map_linux_errno() will be used in other files implementing that specific extension. Export the function so that it can be used later. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Add cpu_ops and modify default booting method	Atish Patra	5	-21/+147
	Currently, all non-booting harts start booting after the booting hart updates the per-hart stack pointer. This is done in a way that, it's difficult to implement any other booting method without breaking the backward compatibility. Define a cpu_ops method that allows to introduce other booting methods in future. Modify the current booting method to be compatible with cpu_ops. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Move relocate and few other functions out of __init	Atish Patra	2	-72/+86
	The secondary hart booting and relocation code are under .init section. As a result, it will be freed once kernel booting is done. However, ordered booting protocol and CPU hotplug always requires these functions to be present to bringup harts after initial kernel boot. Move the required functions to a different section and make sure that they are in memory within first 2MB offset as trampoline page directory only maps first 2MB. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Implement new SBI v0.2 extensions	Atish Patra	3	-4/+270
	Few v0.1 SBI calls are being replaced by new SBI calls that follows v0.2 calling convention. Implement the replacement extensions and few additional new SBI function calls that makes way for a better SBI interface in future. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Introduce a new config for SBI v0.1	Atish Patra	3	-23/+118
	We now have SBI v0.2 which is more scalable and extendable to handle future needs for RISC-V supervisor interfaces. Introduce a new config and move all SBI v0.1 code under that config. This allows to implement the new replacement SBI extensions cleanly and remove v0.1 extensions easily in future. Currently, the config is enabled by default. Once all M-mode software, with v0.1, is no longer in use, this config option and all relevant code can be easily removed. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Add SBI v0.2 extension definitions	Atish Patra	1	-0/+21
	Few v0.1 SBI calls are being replaced by new SBI calls that follows v0.2 calling convention. This patch just defines these new extensions. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Add basic support for SBI v0.2	Atish Patra	3	-73/+314
	The SBI v0.2 introduces a base extension which is backward compatible with v0.1. Implement all helper functions and minimum required SBI calls from v0.2 for now. All other base extension function will be added later as per need. As v0.2 calling convention is backward compatible with v0.1, remove the v0.1 helper functions and just use v0.2 calling convention. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-31	RISC-V: Mark existing SBI as 0.1 SBI.	Atish Patra	1	-19/+22
	As per the new SBI specification, current SBI implementation version is defined as 0.1 and will be removed/replaced in future. Each of the function call in 0.1 is defined as a separate extension which makes easier to replace them one at a time. Rename existing implementation to reflect that. This patch is just a preparatory patch for SBI v0.2 and doesn't introduce any functional changes. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: Use macro definition instead of magic number	Zong Li	1	-1/+1
	The KERN_VIRT_START defines the start virtual address of kernel space. Use this macro instead of magic number. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: Add support to dump the kernel page tables	Zong Li	5	-0/+340
	In a similar manner to arm64, x86, powerpc, etc., it can traverse all page tables, and dump the page table layout with the memory types and permissions. Add a debugfs file at /sys/kernel/debug/kernel_page_tables to export the page table layout to userspace. Signed-off-by: Zong Li <zong.li@sifive.com> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: patch code by fixmap mapping	Zong Li	1	-9/+4
	On strict kernel memory permission, the ftrace have to change the permission of text for dynamic patching the intructions. Use riscv_patch_text_nosync() to patch code instead of probe_kernel_write. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: introduce interfaces to patch kernel code	Zong Li	4	-1/+137
	On strict kernel memory permission, we couldn't patch code without writable permission. Preserve two holes in fixmap area, so we can map the kernel code temporarily to fixmap area, then patch the instructions. We need two pages here because we support the compressed instruction, so the instruction might be align to 2 bytes. When patching the 32-bit length instruction which is 2 bytes alignment, it will across two pages. Introduce two interfaces to patch kernel code: riscv_patch_text_nosync: - patch code without synchronization, it's caller's responsibility to synchronize all CPUs if needed. riscv_patch_text: - patch code and always synchronize with stop_machine() Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: add macro to get instruction length	Zong Li	2	-1/+10
	Extract the calculation of instruction length for common use. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: add STRICT_KERNEL_RWX support	Zong Li	3	-0/+53
	The commit contains that make text section as non-writable, rodata section as read-only, and data section as non-executable. The init section should be changed to non-executable. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: add alignment for text, rodata and data sections	Zong Li	2	-1/+17
	The kernel mapping will tried to optimize its mapping by using bigger size. In rv64, it tries to use PMD_SIZE, and tryies to use PGDIR_SIZE in rv32. To ensure that the start address of these sections could fit the mapping entry size, make them align to the biggest alignment. Define a macro SECTION_ALIGN because the HPAGE_SIZE or PMD_SIZE, etc., are invisible in linker script. This patch is prepared for STRICT_KERNEL_RWX support. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: move exception table immediately after RO_DATA	Zong Li	1	-2/+4
	Move EXCEPTION_TABLE immediately after RO_DATA. Make it easy to set the attribution of the sections which should be read-only at a time. Add _data to specify the start of data section with write permission. This patch is prepared for STRICT_KERNEL_RWX support. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support	Zong Li	2	-0/+16
	ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap pages for debugging purposes. Implement the __kernel_map_pages functions to fill the poison pattern. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: add ARCH_HAS_SET_DIRECT_MAP support	Zong Li	3	-0/+28
	Add set_direct_map_() functions for setting the direct map alias for the page to its default permissions and to an invalid state that cannot be cached in a TLB. (See d253ca0c ("x86/mm/cpa: Add set_direct_map_() functions")) Add a similar implementation for RISC-V. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-26	riscv: add ARCH_HAS_SET_MEMORY support	Zong Li	4	-1/+176
	Add set_memory_ro/rw/x/nx architecture hooks to change the page attribution. Use own set_memory.h rather than generic set_memory.h (i.e. include/asm-generic/set_memory.h), because we want to add other function prototypes here. Signed-off-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-05	RISC-V: Move all address space definition macros to one place	Atish Patra	1	-37/+41
	If both CONFIG_KASAN and CONFIG_SPARSEMEM_VMEMMAP are set, we get the following compilation error. --------------------------------------------------------------- ./arch/riscv/include/asm/pgtable-64.h: In function ‘pud_page’: ./include/asm-generic/memory_model.h:54:29: error: ‘vmemmap’ undeclared (first use in this function); did you mean ‘mem_map’? #define __pfn_to_page(pfn) (vmemmap + (pfn)) ^~~~~~~ ./include/asm-generic/memory_model.h:82:21: note: in expansion of macro ‘__pfn_to_page’ #define pfn_to_page __pfn_to_page ^~~~~~~~~~~~~ ./arch/riscv/include/asm/pgtable-64.h:70:9: note: in expansion of macro ‘pfn_to_page’ return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT); --------------------------------------------------------------- Fix the compliation errors by moving all the address space definition macros before including pgtable-64.h. Fixes: 8ad8b72721d0 (riscv: Add KASAN support) Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-05	riscv: Use pd_leaf macros to define pd_huge	Alexandre Ghiti	1	-4/+2
	The newly introduced p*d_leaf macros allow to check if an entry of the page table map to a physical page instead of the next level. To avoid duplication of code, use those macros to determine if a page table entry points to a hugepage. Suggested-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03	RISC-V: Stop using LOCAL for the uaccess fixups	Palmer Dabbelt	1	-4/+2
	LLVM's integrated assembler doesn't support the LOCAL directive, which we're using when generating our uaccess fixup tables. Luckily the table fragment is small enough that there's only one internal symbol, so using a relative symbol reference doesn't really complicate anything. Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03	RISC-V: Inline the assembly register save/restore macros	Palmer Dabbelt	1	-82/+61
	These are only used once, and when reading the code I've always found them to be more of a headache than a benefit. While they were never worth removing before, LLVM's integrated assembler doesn't support LOCAL so rather that trying to figure out how to refactor the macros it seems saner to just inline them. Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03	RISC-V: Stop relying on GCC's register allocator's hueristics	Palmer Dabbelt	3	-7/+10
	GCC allows users to hint to the register allocation that a variable should be placed in a register by using a syntax along the lines of function(...) { register long in_REG __asm__("REG"); } We've abused this a bit throughout the RISC-V port to access fixed registers directly as C variables. In practice it's never going to blow up because GCC isn't going to allocate these registers, but it's not a well defined syntax so we really shouldn't be relying upon this. Luckily there is a very similar but well defined syntax that allows us to still access these registers directly as C variables, which is to simply declare the register variables globally. For fixed variables this doesn't change the ABI. LLVM disallows this ambiguous syntax, so this isn't just strictly a formatting change. Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03	RISC-V: Stop putting .sbss in .sdata	Palmer Dabbelt	1	-1/+0
	I don't know why we were doing this, as it's been there since the beginning. After d841f729e655 ("riscv: force hart_lottery to put in .sdata section") my guess would be that it made the kernel boot and we forgot to fix it more cleanly. The default .bss segment already contains the .sbss section, so we don't need to do anything additional to ensure the symbols in .sbss continue to work. Tested-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03	riscv: force hart_lottery to put in .sdata section	Zong Li	1	-2/+6
	In PIC code model, the zero initialized data always be put in .bss section, so when building kernel as PIE, the hart_lottery won't present in small data section, and it causes more than one harts to get the lottery, because the main hart clears the content of .bss section immediately after it getting the lottery. Signed-off-by: Zong Li <zong.li@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> [Palmer: added a comment] Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03	riscv: Delete CONFIG_SYSFS_SYSCALL from defconfigs	Deepa Dinamani	2	-0/+2
	According to init/Kconfig: "sys_sysfs is an obsolete system call no longer supported in libc. Note that disabling this option is more secure but might break compatibility with some systems." This syscall is not required for new architectures. Since the config defaults to 'y'. Set this to 'n' exlicitly. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-03	riscv: Use flush_icache_mm for flush_icache_user_range	Guo Ren	1	-1/+1
	The only call path is: __access_remote_vm -> copy_to_user_page -> flush_icache_user_range Seems it's ok to use flush_icache_mm instead of flush_icache_all and it could reduce flush_icache_all called on other harts. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> [Palmer: git-am wouldn't apply the patch, I did so manually] Fixes: 08f051eda33b ("RISC-V: Flush I$ when making a dirty page executable") Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2020-03-01	Linux 5.6-rc4	Linus Torvalds	1	-1/+1

2020-03-01	KVM: VMX: check descriptor table exits on instruction emulation	Oliver Upton	1	-0/+15
	KVM emulates UMIP on hardware that doesn't support it by setting the 'descriptor table exiting' VM-execution control and performing instruction emulation. When running nested, this emulation is broken as KVM refuses to emulate L2 instructions by default. Correct this regression by allowing the emulation of descriptor table instructions if L1 hasn't requested 'descriptor table exiting'. Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode") Reported-by: Jan Kiszka <jan.kiszka@web.de> Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-29	ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()	Dan Carpenter	1	-3/+3
	If sbi->s_flex_groups_allocated is zero and the first allocation fails then this code will crash. The problem is that "i--" will set "i" to -1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1 is type promoted to unsigned and becomes UINT_MAX. Since UINT_MAX is more than zero, the condition is true so we call kvfree(new_groups[-1]). The loop will carry on freeing invalid memory until it crashes. Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access") Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-29	macintosh: therm_windtunnel: fix regression when instantiating devices	Wolfram Sang	1	-21/+31
	Removing attach_adapter from this driver caused a regression for at least some machines. Those machines had the sensors described in their DT, too, so they didn't need manual creation of the sensor devices. The old code worked, though, because manual creation came first. Creation of DT devices then failed later and caused error logs, but the sensors worked nonetheless because of the manually created devices. When removing attach_adaper, manual creation now comes later and loses the race. The sensor devices were already registered via DT, yet with another binding, so the driver could not be bound to it. This fix refactors the code to remove the race and only manually creates devices if there are no DT nodes present. Also, the DT binding is updated to match both, the DT and manually created devices. Because we don't know which device creation will be used at runtime, the code to start the kthread is moved to do_probe() which will be called by both methods. Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter") Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723 Reported-by: Erhard Furtner <erhard_f@mailbox.org> Tested-by: Erhard Furtner <erhard_f@mailbox.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org # v4.19+
2020-02-29	jbd2: fix data races at struct journal_head	Qian Cai	1	-4/+4
	journal_head::b_transaction and journal_head::b_next_transaction could be accessed concurrently as noticed by KCSAN, LTP: starting fsync04 /dev/zero: Can't open blockdev EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null) ================================================================== BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2] write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70: __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2] __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569 jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2] (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034 kjournald2+0x13b/0x450 [jbd2] kthread+0x1cd/0x1f0 ret_from_fork+0x27/0x50 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68: jbd2_write_access_granted+0x1b2/0x250 [jbd2] jbd2_write_access_granted at fs/jbd2/transaction.c:1155 jbd2_journal_get_write_access+0x2c/0x60 [jbd2] __ext4_journal_get_write_access+0x50/0x90 [ext4] ext4_mb_mark_diskspace_used+0x158/0x620 [ext4] ext4_mb_new_blocks+0x54f/0xca0 [ext4] ext4_ind_map_blocks+0xc79/0x1b40 [ext4] ext4_map_blocks+0x3b4/0x950 [ext4] _ext4_get_block+0xfc/0x270 [ext4] ext4_get_block+0x3b/0x50 [ext4] __block_write_begin_int+0x22e/0xae0 __block_write_begin+0x39/0x50 ext4_write_begin+0x388/0xb50 [ext4] generic_perform_write+0x15d/0x290 ext4_buffered_write_iter+0x11f/0x210 [ext4] ext4_file_write_iter+0xce/0x9e0 [ext4] new_sync_write+0x29c/0x3b0 __vfs_write+0x92/0xa0 vfs_write+0x103/0x260 ksys_write+0x9d/0x130 __x64_sys_write+0x4c/0x60 do_syscall_64+0x91/0xb05 entry_SYSCALL_64_after_hwframe+0x49/0xbe 5 locks held by fsync04/25724: #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260 #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4] #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2] #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4] #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2] irq event stamp: 1407125 hardirqs last enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790 softirqs last enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0 Reported by Kernel Concurrency Sanitizer on: CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 The plain reads are outside of jh->b_state_lock critical section which result in data races. Fix them by adding pairs of READ\|WRITE_ONCE(). Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Qian Cai <cai@lca.pw> Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-02-28	kvm: x86: Limit the number of "kvm: disabled by bios" messages	Erwan Velu	1	-2/+2
	In older version of systemd(219), at boot time, udevadm is called with : /usr/bin/udevadm trigger --type=devices --action=add" This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent, leading to the "kvm: disabled by bios" message in case of your Bios disabled the virtualization extensions. On a modern system running up to 256 CPU threads, this pollutes the Kernel logs. This patch offers to ratelimit this message to avoid any userspace program triggering this uevent printing this message too often. This patch is only a workaround but greatly reduce the pollution without breaking the current behavior of printing a message if some try to instantiate KVM on a system that doesn't support it. Note that recent versions of systemd (>239) do not have trigger this behavior. This patch will be useful at least for some using older systemd with recent Kernels. Signed-off-by: Erwan Velu <e.velu@criteo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28	KVM: x86: avoid useless copy of cpufreq policy	Paolo Bonzini	1	-5/+5
	struct cpufreq_policy is quite big and it is not a good idea to allocate one on the stack. Just use cpufreq_cpu_get and cpufreq_cpu_put which is even simpler. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28	KVM: allow disabling -Werror	Paolo Bonzini	2	-1/+14
	Restrict -Werror to well-tested configurations and allow disabling it via Kconfig. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28	KVM: x86: allow compiling as non-module with W=1	Valdis Klētnieks	2	-0/+4
	Compile error with CONFIG_KVM_INTEL=y and W=1: CC arch/x86/kvm/vmx/vmx.o arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=] 68 \| static const struct x86_cpu_id vmx_cpu_id[] = { \| ^~~~~~~~~~ cc1: all warnings being treated as errors When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a reference to the structure (or any code at all). This makes W=1 compiles unhappy. Wrap both in a #ifdef to avoid the issue. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> [Do the same for CONFIG_KVM_AMD. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28	KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis	Wanpeng Li	1	-12/+21
	Nick Desaulniers Reported: When building with: $ make CC=clang arch/x86/ CFLAGS=-Wframe-larger-than=1000 The following warning is observed: arch/x86/kernel/kvm.c:494:13: warning: stack frame size of 1064 bytes in function 'kvm_send_ipi_mask_allbutself' [-Wframe-larger-than=] static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int vector) ^ Debugging with: https://github.com/ClangBuiltLinux/frame-larger-than via: $ python3 frame_larger_than.py arch/x86/kernel/kvm.o \ kvm_send_ipi_mask_allbutself points to the stack allocated `struct cpumask newmask` in `kvm_send_ipi_mask_allbutself`. The size of a `struct cpumask` is potentially large, as it's CONFIG_NR_CPUS divided by BITS_PER_LONG for the target architecture. CONFIG_NR_CPUS for X86_64 can be as high as 8192, making a single instance of a `struct cpumask` 1024 B. This patch fixes it by pre-allocate 1 cpumask variable per cpu and use it for both pv tlb and pv ipis.. Reported-by: Nick Desaulniers <ndesaulniers@google.com> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28	KVM: Introduce pv check helpers	Wanpeng Li	1	-10/+24
	Introduce some pv check helpers for consistency. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28	KVM: let declaration of kvm_get_running_vcpus match implementation	Christian Borntraeger	1	-1/+1
	Sparse notices that declaration and implementation do not match: arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: warning: incorrect type in return expression (different address spaces) arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: expected struct kvm_vcpu [noderef] <asn:3> ** arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: got struct kvm_vcpu [noderef] <asn:3> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28	KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter	Paolo Bonzini	1	-1/+2
	Even if APICv is disabled at startup, the backing page and ir_list need to be initialized in case they are needed later. The only case in which this can be skipped is for userspace irqchip, and that must be done because avic_init_backing_page dereferences vcpu->arch.apic (which is NULL for userspace irqchip). Tested-by: rmuncrief@humanavance.com Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579 Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-27	MAINTAINERS: Correct Cadence PCI driver path	Lukas Bulwahn	1	-1/+1
	de80f95ccb9c ("PCI: cadence: Move all files to per-device cadence directory") moved files of the PCI cadence drivers, but did not update the MAINTAINERS entry. Since then, ./scripts/get_maintainer.pl --self-test complains: warning: no file matches F: drivers/pci/controller/pcie-cadence* Repair the MAINTAINERS entry. Link: https://lore.kernel.org/r/20200221185402.4703-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>