| Age | Commit message (Collapse) | Author | Files | Lines |
|
Modules always live before the kernel, MODULES_END is fixed but
MODULES_VADDR isn't fixed, it depends on the kernel size.
Let's add it to virtual kernel memory layout dump.
As MODULES is only defined for CONFIG_64BIT, so we dump it when
CONFIG_64BIT=y.
eg,
MODULES_VADDR - MODULES_END
0xffffffff01133000 - 0xffffffff80000000
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-5-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Use existing stringify macro from the kernel headers.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20220623112905.253157-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Current task of executing crash kexec will be schedule out when panic is
triggered by RCU Stall, as it needs to wait rcu completion. It lead to
inability to enter the crash system.
The implementation of machine_crash_shutdown() is non-standard for RISC-V
according to other Arch's implementation(eg, x86, arm64), we need to send
IPI to stop secondary harts.
[224521.877268] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[224521.883471] rcu: 0-...0: (3 GPs behind) idle=cfa/0/0x1 softirq=3968793/3968793 fqs=2495
[224521.891742] (detected by 2, t=5255 jiffies, g=60855593, q=328)
[224521.897754] Task dump for CPU 0:
[224521.901074] task:swapper/0 state:R running task stack: 0 pid: 0 ppid: 0 flags:0x00000008
[224521.911090] Call Trace:
[224521.913638] [<ffffffe000c432de>] __schedule+0x208/0x5ea
[224521.918957] Kernel panic - not syncing: RCU Stall
[224521.923773] bad: scheduling from the idle thread!
[224521.928571] CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G O 5.10.113-yocto-standard #1
[224521.938658] Call Trace:
[224521.941200] [<ffffffe00020395c>] walk_stackframe+0x0/0xaa
[224521.946689] [<ffffffe000c34f8e>] show_stack+0x32/0x3e
[224521.951830] [<ffffffe000c39020>] dump_stack_lvl+0x7e/0xa2
[224521.957317] [<ffffffe000c39058>] dump_stack+0x14/0x1c
[224521.962459] [<ffffffe000243884>] dequeue_task_idle+0x2c/0x40
[224521.968207] [<ffffffe000c434f4>] __schedule+0x41e/0x5ea
[224521.973520] [<ffffffe000c43826>] schedule+0x34/0xe4
[224521.978487] [<ffffffe000c46cae>] schedule_timeout+0xc6/0x170
[224521.984234] [<ffffffe000c4491e>] wait_for_completion+0x98/0xf2
[224521.990157] [<ffffffe00026d9e2>] __wait_rcu_gp+0x148/0x14a
[224521.995733] [<ffffffe0002761c4>] synchronize_rcu+0x5c/0x66
[224522.001307] [<ffffffe00026f1a6>] rcu_sync_enter+0x54/0xe6
[224522.006795] [<ffffffe00025a436>] percpu_down_write+0x32/0x11c
[224522.012629] [<ffffffe000c4266a>] _cpu_down+0x92/0x21a
[224522.017771] [<ffffffe000219a0a>] smp_shutdown_nonboot_cpus+0x90/0x118
[224522.024299] [<ffffffe00020701e>] machine_crash_shutdown+0x30/0x4a
[224522.030483] [<ffffffe00029a3f8>] __crash_kexec+0x62/0xa6
[224522.035884] [<ffffffe000c3515e>] panic+0xfa/0x2b6
[224522.040678] [<ffffffe0002772be>] rcu_sched_clock_irq+0xc26/0xcb8
[224522.046774] [<ffffffe00027fc7a>] update_process_times+0x62/0x8a
[224522.052785] [<ffffffe00028d522>] tick_sched_timer+0x9e/0x102
[224522.058533] [<ffffffe000280c3a>] __hrtimer_run_queues+0x16a/0x318
[224522.064716] [<ffffffe0002812ec>] hrtimer_interrupt+0xd4/0x228
[224522.070551] [<ffffffe0009a69b6>] riscv_timer_interrupt+0x3c/0x48
[224522.076646] [<ffffffe000268f8c>] handle_percpu_devid_irq+0xb0/0x24c
[224522.083004] [<ffffffe00026428e>] __handle_domain_irq+0xa8/0x122
[224522.089014] [<ffffffe00062f954>] riscv_intc_irq+0x38/0x60
[224522.094501] [<ffffffe000201bd4>] ret_from_exception+0x0/0xc
[224522.100161] [<ffffffe000c42146>] rcu_eqs_enter.constprop.0+0x8c/0xb8
With the patch, it can enter crash system when RCU Stall occur.
Fixes: e53d28180d4d ("RISC-V: Add kdump support")
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-4-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
When use 'echo c > /proc/sysrq-trigger' to trigger kdump, riscv_crash_save_regs()
will be called to save regs for vmcore, we found "epc" value 00ffffffa5537400
is not a valid kernel virtual address, but is a user virtual address. Other
regs(eg, ra, sp, gp...) are correct kernel virtual address.
Actually 0x00ffffffb0dd9400 is the user mode PC of 'PID: 113 Comm: sh', which
is saved in the task's stack.
[ 21.201701] CPU: 0 PID: 113 Comm: sh Kdump: loaded Not tainted 5.18.9 #45
[ 21.201979] Hardware name: riscv-virtio,qemu (DT)
[ 21.202160] epc : 00ffffffa5537400 ra : ffffffff80088640 sp : ff20000010333b90
[ 21.202435] gp : ffffffff810dde38 tp : ff6000000226c200 t0 : ffffffff8032be7c
[ 21.202707] t1 : 0720072007200720 t2 : 30203a7375746174 s0 : ff20000010333cf0
[ 21.202973] s1 : 0000000000000000 a0 : ff20000010333b98 a1 : 0000000000000001
[ 21.203243] a2 : 0000000000000010 a3 : 0000000000000000 a4 : 28c8f0aeffea4e00
[ 21.203519] a5 : 28c8f0aeffea4e00 a6 : 0000000000000009 a7 : ffffffff8035c9b8
[ 21.203794] s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff20000010333b98
[ 21.204062] s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468
[ 21.204331] s8 : 00ffffffef451410 s9 : 0000000000000007 s10: 00aaaaaac0510700
[ 21.204606] s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00
[ 21.204876] t5 : ff60000001218000 t6 : ff200000103338b8
[ 21.205079] status: 0000000200000020 badaddr: 0000000000000000 cause: 0000000000000008
With the incorrect PC, the backtrace showed by crash tool as below, the first
stack frame is abnormal,
crash> bt
PID: 113 TASK: ff60000002269600 CPU: 0 COMMAND: "sh"
#0 [ff2000001039bb90] __efistub_.Ldebug_info0 at 00ffffffa5537400 <-- Abnormal
#1 [ff2000001039bcf0] panic at ffffffff806578ba
#2 [ff2000001039bd50] sysrq_reset_seq_param_set at ffffffff8038c030
#3 [ff2000001039bda0] __handle_sysrq at ffffffff8038c5f8
#4 [ff2000001039be00] write_sysrq_trigger at ffffffff8038cad8
#5 [ff2000001039be20] proc_reg_write at ffffffff801b7edc
#6 [ff2000001039be40] vfs_write at ffffffff80152ba6
#7 [ff2000001039be80] ksys_write at ffffffff80152ece
#8 [ff2000001039bed0] sys_write at ffffffff80152f46
With the patch, we can get current kernel mode PC, the output as below,
[ 17.607658] CPU: 0 PID: 113 Comm: sh Kdump: loaded Not tainted 5.18.9 #42
[ 17.607937] Hardware name: riscv-virtio,qemu (DT)
[ 17.608150] epc : ffffffff800078f8 ra : ffffffff8008862c sp : ff20000010333b90
[ 17.608441] gp : ffffffff810dde38 tp : ff6000000226c200 t0 : ffffffff8032be68
[ 17.608741] t1 : 0720072007200720 t2 : 666666666666663c s0 : ff20000010333cf0
[ 17.609025] s1 : 0000000000000000 a0 : ff20000010333b98 a1 : 0000000000000001
[ 17.609320] a2 : 0000000000000010 a3 : 0000000000000000 a4 : 0000000000000000
[ 17.609601] a5 : ff60000001c78000 a6 : 000000000000003c a7 : ffffffff8035c9a4
[ 17.609894] s2 : ffffffff810df0a8 s3 : ffffffff810df718 s4 : ff20000010333b98
[ 17.610186] s5 : 0000000000000000 s6 : 0000000000000007 s7 : ffffffff80c4a468
[ 17.610469] s8 : 00ffffffca281410 s9 : 0000000000000007 s10: 00aaaaaab5bb6700
[ 17.610755] s11: 0000000000000001 t3 : ff60000001218f00 t4 : ff60000001218f00
[ 17.611041] t5 : ff60000001218000 t6 : ff20000010333988
[ 17.611255] status: 0000000200000020 badaddr: 0000000000000000 cause: 0000000000000008
With the correct PC, the backtrace showed by crash tool as below,
crash> bt
PID: 113 TASK: ff6000000226c200 CPU: 0 COMMAND: "sh"
#0 [ff20000010333b90] riscv_crash_save_regs at ffffffff800078f8 <--- Normal
#1 [ff20000010333cf0] panic at ffffffff806578c6
#2 [ff20000010333d50] sysrq_reset_seq_param_set at ffffffff8038c03c
#3 [ff20000010333da0] __handle_sysrq at ffffffff8038c604
#4 [ff20000010333e00] write_sysrq_trigger at ffffffff8038cae4
#5 [ff20000010333e20] proc_reg_write at ffffffff801b7ee8
#6 [ff20000010333e40] vfs_write at ffffffff80152bb2
#7 [ff20000010333e80] ksys_write at ffffffff80152eda
#8 [ff20000010333ed0] sys_write at ffffffff80152f52
Fixes: e53d28180d4d ("RISC-V: Add kdump support")
Co-developed-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-3-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Use __smp_processor_id() to avoid check the preemption context when
CONFIG_DEBUG_PREEMPT enabled, as we will enter crash kernel and no
return.
Without the patch,
[ 103.781044] sysrq: Trigger a crash
[ 103.784625] Kernel panic - not syncing: sysrq triggered crash
[ 103.837634] CPU1: off
[ 103.889668] CPU2: off
[ 103.933479] CPU3: off
[ 103.939424] Starting crashdump kernel...
[ 103.943442] BUG: using smp_processor_id() in preemptible [00000000] code: sh/346
[ 103.950884] caller is debug_smp_processor_id+0x1c/0x26
[ 103.956051] CPU: 0 PID: 346 Comm: sh Kdump: loaded Not tainted 5.10.113-00002-gce03f03bf4ec-dirty #149
[ 103.965355] Call Trace:
[ 103.967805] [<ffffffe00020372a>] walk_stackframe+0x0/0xa2
[ 103.973206] [<ffffffe000bcf1f4>] show_stack+0x32/0x3e
[ 103.978258] [<ffffffe000bd382a>] dump_stack_lvl+0x72/0x8e
[ 103.983655] [<ffffffe000bd385a>] dump_stack+0x14/0x1c
[ 103.988705] [<ffffffe000bdc8fe>] check_preemption_disabled+0x9e/0xaa
[ 103.995057] [<ffffffe000bdc926>] debug_smp_processor_id+0x1c/0x26
[ 104.001150] [<ffffffe000206c64>] machine_kexec+0x22/0xd0
[ 104.006463] [<ffffffe000291a7e>] __crash_kexec+0x6a/0xa4
[ 104.011774] [<ffffffe000bcf3fa>] panic+0xfc/0x2b0
[ 104.016480] [<ffffffe000656ca4>] sysrq_reset_seq_param_set+0x0/0x70
[ 104.022745] [<ffffffe000657310>] __handle_sysrq+0x8c/0x154
[ 104.028229] [<ffffffe0006577e8>] write_sysrq_trigger+0x5a/0x6a
[ 104.034061] [<ffffffe0003d90e0>] proc_reg_write+0x58/0xd4
[ 104.039459] [<ffffffe00036cff4>] vfs_write+0x7e/0x254
[ 104.044509] [<ffffffe00036d2f6>] ksys_write+0x58/0xbe
[ 104.049558] [<ffffffe00036d36a>] sys_write+0xe/0x16
[ 104.054434] [<ffffffe000201b9a>] ret_from_syscall+0x0/0x2
[ 104.067863] Will call new kernel at ecc00000 from hart id 0
[ 104.074939] FDT image at fc5ee000
[ 104.079523] Bye...
With the patch we can got clear output,
[ 67.740553] sysrq: Trigger a crash
[ 67.744166] Kernel panic - not syncing: sysrq triggered crash
[ 67.809123] CPU1: off
[ 67.865210] CPU2: off
[ 67.909075] CPU3: off
[ 67.919123] Starting crashdump kernel...
[ 67.924900] Will call new kernel at ecc00000 from hart id 0
[ 67.932045] FDT image at fc5ee000
[ 67.935560] Bye...
Fixes: 0e105f1d0037 ("riscv: use hart id instead of cpu id on machine_kexec")
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220811074150.3020189-2-xianting.tian@linux.alibaba.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Implement support for the ZiHintPause extension.
The PAUSE instruction is a HINT that indicates the current hart’s rate
of instruction retirement should be temporarily reduced or paused.
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Dao Lu <daolu@rivosinc.com>
[Palmer: Some minor merge conflicts.]
Link: https://lore.kernel.org/all/20220620201530.3929352-1-daolu@rivosinc.com/
Link: https://lore.kernel.org/all/20220811053356.17375-1-palmer@rivosinc.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
LoongArch architecture changes for 5.20 depend on the irqchip and pci
changes to work, so merge them to create a base.
|
|
find_vqs() adds a new parameter sizes to specify the size of each vq
vring.
NULL as sizes means that all queues in find_vqs() use the maximum size.
A value in the array is 0, which means that the corresponding queue uses
the maximum size.
In the split scenario, the meaning of size is the largest size, because
it may be limited by memory, the virtio core will try a smaller size.
And the size is power of 2.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-34-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
virtio-net can display the maximum (supported by hardware) ring size in
ethtool -g eth0.
When the subsequent patch implements vring reset, it can judge whether
the ring size passed by the driver is legal based on this.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-2-xuanzhuo@linux.alibaba.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This series is based on the alternatives changes done in my svpbmt
series and thus also depends on Atish's isa-extension parsing series.
It implements using the cache-management instructions from the Zicbom-
extension to handle cache flush, etc actions on platforms needing them.
SoCs using cpu cores from T-Head like the Allwinne D1 implement a
different set of cache instructions. But while they are different,
instructions they provide the same functionality, so a variant can easly
hook into the existing alternatives mechanism on those.
[Palmer: Some minor fixups, including a RISCV_ISA_ZICBOM dependency on
MMU that's probably not strictly necessary. The Zicbom support will
trip up sparse for users that have new toolchains, I just sent a patch.]
Link: https://lore.kernel.org/all/20220706231536.2041855-1-heiko@sntech.de/
Link: https://lore.kernel.org/linux-sparse/20220811033138.20676-1-palmer@rivosinc.com/T/#u
* palmer/riscv-zicbom:
riscv: implement cache-management errata for T-Head SoCs
riscv: Add support for non-coherent devices using zicbom extension
dt-bindings: riscv: document cbom-block-size
of: also handle dma-noncoherent in of_dma_is_coherent()
|
|
Users of GNU ld (BFD) from binutils 2.39+ will observe multiple
instances of a new warning when linking kernels in the form:
ld: warning: arch/x86/boot/pmjump.o: missing .note.GNU-stack section implies executable stack
ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker
ld: warning: arch/x86/boot/compressed/vmlinux has a LOAD segment with RWX permissions
Generally, we would like to avoid the stack being executable. Because
there could be a need for the stack to be executable, assembler sources
have to opt-in to this security feature via explicit creation of the
.note.GNU-stack feature (which compilers create by default) or command
line flag --noexecstack. Or we can simply tell the linker the
production of such sections is irrelevant and to link the stack as
--noexecstack.
LLVM's LLD linker defaults to -z noexecstack, so this flag isn't
strictly necessary when linking with LLD, only BFD, but it doesn't hurt
to be explicit here for all linkers IMO. --no-warn-rwx-segments is
currently BFD specific and only available in the current latest release,
so it's wrapped in an ld-option check.
While the kernel makes extensive usage of ELF sections, it doesn't use
permissions from ELF segments.
Link: https://lore.kernel.org/linux-block/3af4127a-f453-4cf7-f133-a181cce06f73@kernel.dk/
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ba951afb99912da01a6e8434126b8fac7aa75107
Link: https://github.com/llvm/llvm-project/issues/57009
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Suggested-by: Fangrui Song <maskray@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This series should rid us of dtbs_check errors for the RISC-V Canaan
k210 based boards. To make keeping it that way a little easier, I
changed the Canaan devicetree Makefile so that it would build all of the
devicetrees in the directory if SOC_CANAAN.
Link: https://lore.kernel.org/all/mhng-85044754-c361-40bc-a6a2-7082f35930bb@palmer-ri-x1c9/
* remotes/palmer/riscv-canaan_dt_schema:
riscv: dts: canaan: build all devicetress if SOC_CANAAN
riscv: dts: canaan: add specific compatible for kd233's LCD
riscv: dts: canaan: fix bus {ranges,reg} warnings
riscv: dts: canaan: remove spi-max-frequency from controllers
riscv: dts: canaan: use custom compatible for k210 i2s
riscv: dts: canaan: fix kd233 display spi frequency
riscv: dts: canaan: fix mmc node names
riscv: dts: canaan: fix the k210's timer nodes
riscv: dts: canaan: fix the k210's memory node
dt-bindings: memory-controllers: add canaan k210 sram controller
dt-bindings: display: ili9341: document canaan kd233's lcd
dt-bindings: display: convert ilitek,ili9341.txt to dt-schema
|
|
Since commit 5d8544e2d007 ("RISC-V: Generic library routines and assembly")
and commit ebcbd75e3962 ("riscv: Fix the bug in memory access fixup code"),
if __clear_user and __copy_user return from an fixup branch,
CSR_STATUS SR_SUM bit will be set, it is a vulnerability, so that
S-mode memory accesses to pages that are accessible by U-mode will success.
Disable S-mode access to U-mode memory should clear SR_SUM bit.
Fixes: 5d8544e2d007 ("RISC-V: Generic library routines and assembly")
Fixes: ebcbd75e3962 ("riscv: Fix the bug in memory access fixup code")
Signed-off-by: Chen Lifu <chenlifu@huawei.com>
Reviewed-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://lore.kernel.org/r/20220615014714.1650349-1-chenlifu@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Now that the PMU is refreshed when MSR_IA32_PERF_CAPABILITIES is written
by host userspace, zero out the number of LBR records for a vCPU during
PMU refresh if PMU_CAP_LBR_FMT is not set in PERF_CAPABILITIES instead of
handling the check at run-time.
guest_cpuid_has() is expensive due to the linear search of guest CPUID
entries, intel_pmu_lbr_is_enabled() is checked on every VM-Enter, _and_
simply enumerating the same "Model" as the host causes KVM to set the
number of LBR records to a non-zero value.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220727233424.2968356-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Turn vcpu_to_lbr_desc() and vcpu_to_lbr_records() into functions in order
to provide type safety, to document exactly what they return, and to
allow consuming the helpers in vmx.h. Move the definitions as necessary
(the macros "reference" to_vmx() before its definition).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220727233424.2968356-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Refresh the PMU if userspace modifies MSR_IA32_PERF_CAPABILITIES. KVM
consumes the vCPU's PERF_CAPABILITIES when enumerating PEBS support, but
relies on CPUID updates to refresh the PMU. I.e. KVM will do the wrong
thing if userspace stuffs PERF_CAPABILITIES _after_ setting guest CPUID.
Opportunistically fix a curly-brace indentation.
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220727233424.2968356-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add compile-time and init-time sanity checks to ensure that the MMIO SPTE
mask doesn't overlap the MMIO SPTE generation or the MMU-present bit.
The generation currently avoids using bit 63, but that's as much
coincidence as it is strictly necessarly. That will change in the future,
as TDX support will require setting bit 63 (SUPPRESS_VE) in the mask.
Explicitly carve out the bits that are allowed in the mask so that any
future shuffling of SPTE bits doesn't silently break MMIO caching (KVM
has broken MMIO caching more than once due to overlapping the generation
with other things).
Suggested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220805194133.86299-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Rename the tracepoint function from trace_kvm_async_pf_doublefault() to
trace_kvm_async_pf_repeated_fault() to make it clear, since double fault
has nothing to do with this trace function.
Asynchronous Page Fault (APF) is an artifact generated by KVM when it
cannot find a physical page to satisfy an EPT violation. KVM uses APF to
tell the guest OS to do something else such as scheduling other guest
processes to make forward progress. However, when another guest process
also touches a previously APFed page, KVM halts the vCPU instead of
generating a repeated APF to avoid wasting cycles.
Double fault (#DF) clearly has a different meaning and a different
consequence when triggered. #DF requires two nested contributory exceptions
instead of two page faults faulting at the same address. A prevous bug on
APF indicates that it may trigger a double fault in the guest [1] and
clearly this trace function has nothing to do with it. So rename this
function should be a valid choice.
No functional change intended.
[1] https://www.spinics.net/lists/kvm/msg214957.html
Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220807052141.69186-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Stop Xen timer (if it's running) prior to changing the IRQ vector and
potentially (re)starting the timer. Changing the IRQ vector while the
timer is still running can result in KVM injecting a garbage event, e.g.
vm_xen_inject_timer_irqs() could see a non-zero xen.timer_pending from
a previous timer but inject the new xen.timer_virq.
Fixes: 536395260582 ("KVM: x86/xen: handle PV timers oneshot mode")
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42
Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220808190607.323899-3-dietschc@csp.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a check for existing xen timers before initializing a new one.
Currently kvm_xen_init_timer() is called on every
KVM_XEN_VCPU_ATTR_TYPE_TIMER, which is causing the following ODEBUG
crash when vcpu->arch.xen.timer is already set.
ODEBUG: init active (active state 0)
object type: hrtimer hint: xen_timer_callbac0
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:502
Call Trace:
__debug_object_init
debug_hrtimer_init
debug_init
hrtimer_init
kvm_xen_init_timer
kvm_xen_vcpu_set_attr
kvm_arch_vcpu_ioctl
kvm_vcpu_ioctl
vfs_ioctl
Fixes: 536395260582 ("KVM: x86/xen: handle PV timers oneshot mode")
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42
Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220808190607.323899-2-dietschc@csp.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs
generating #NPF(RSVD), which are reflected by the CPU into the guest as
a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access
to the guest instruction stream or register state and so can't directly
emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO
caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT),
and those flavors of #NPF cause automatic VM-Exits, not #VC.
Adjust KVM's MMIO masks to account for the C-bit location prior to doing
SEV(-ES) setup, and document that dependency between adjusting the MMIO
SPTE mask and SEV(-ES) setup.
Fixes: b09763da4dd8 ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
Reported-by: Michael Roth <michael.roth@amd.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220803224957.1285926-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Fully re-evaluate whether or not MMIO caching can be enabled when SPTE
masks change; simply clearing enable_mmio_caching when a configuration
isn't compatible with caching fails to handle the scenario where the
masks are updated, e.g. by VMX for EPT or by SVM to account for the C-bit
location, and toggle compatibility from false=>true.
Snapshot the original module param so that re-evaluating MMIO caching
preserves userspace's desire to allow caching. Use a snapshot approach
so that enable_mmio_caching still reflects KVM's actual behavior.
Fixes: 8b9e74bfbf8c ("KVM: x86/mmu: Use enable_mmio_caching to track if MMIO caching is enabled")
Reported-by: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220803224957.1285926-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Mark kvm_mmu_x86_module_init() with __init, the entire reason it exists
is to initialize variables when kvm.ko is loaded, i.e. it must never be
called after module initialization.
Fixes: 1d0e84806047 ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded")
Cc: stable@vger.kernel.org
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220803224957.1285926-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The emulator mishandles LEA with register source operand. Even though such
LEA is illegal, it can be encoded and fed to CPU. In which case real
hardware throws #UD. The emulator, instead, returns address of
x86_emulate_ctxt._regs. This info leak hurts host's kASLR.
Tell the decoder that illegal LEA is not to be emulated.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <20220729134801.1120-1-mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm_fixup_and_inject_pf_error() was introduced to fixup the error code(
e.g., to add RSVD flag) and inject the #PF to the guest, when guest
MAXPHYADDR is smaller than the host one.
When it comes to nested, L0 is expected to intercept and fix up the #PF
and then inject to L2 directly if
- L2.MAXPHYADDR < L0.MAXPHYADDR and
- L1 has no intention to intercept L2's #PF (e.g., L2 and L1 have the
same MAXPHYADDR value && L1 is using EPT for L2),
instead of constructing a #PF VM Exit to L1. Currently, with PFEC_MASK
and PFEC_MATCH both set to 0 in vmcs02, the interception and injection
may happen on all L2 #PFs.
However, failing to initialize 'fault' in kvm_fixup_and_inject_pf_error()
may cause the fault.async_page_fault being NOT zeroed, and later the #PF
being treated as a nested async page fault, and then being injected to L1.
Instead of zeroing 'fault' at the beginning of this function, we mannually
set the value of 'fault.async_page_fault', because false is the value we
really expect.
Fixes: 897861479c064 ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216178
Reported-by: Yang Lixiao <lixiao.yang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220718074756.53788-1-yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Bug the VM if retrieving the x2APIC MSR/register while processing an
accelerated vAPIC trap VM-Exit fails. In theory it's impossible for the
lookup to fail as hardware has already validated the register, but bugs
happen, and not checking the result of kvm_lapic_msr_read() would result
in consuming the uninitialized "val" if a KVM or hardware bug occurs.
Fixes: 1bd9dfec9fd4 ("KVM: x86: Do not block APIC write for non ICR registers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220804235028.1766253-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time
/ preempted status", 2021-11-11) open coded the previous call to
kvm_map_gfn, but in doing so it dropped the comparison between the cached
guest physical address and the one in the MSR. This cause an incorrect
cache hit if the guest modifies the steal time address while the memslots
remain the same. This can happen with kexec, in which case the preempted
bit is written at the address used by the old kernel instead of
the old one.
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time
/ preempted status", 2021-11-11) open coded the previous call to
kvm_map_gfn, but in doing so it dropped the comparison between the cached
guest physical address and the one in the MSR. This cause an incorrect
cache hit if the guest modifies the steal time address while the memslots
remain the same. This can happen with kexec, in which case the steal
time data is written at the address used by the old kernel instead of
the old one.
While at it, rename the variable from gfn to gpa since it is a plain
physical address and not a right-shifted one.
Reported-by: Dave Young <ruyang@redhat.com>
Reported-by: Xiaoying Yan <yiyan@redhat.com>
Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Fixes: 7e2175ebd695 ("KVM: x86: Fix recording of guest steal time / preempted status")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull remaining MM updates from Andrew Morton:
"Three patch series - two that perform cleanups and one feature:
- hugetlb_vmemmap cleanups from Muchun Song
- hardware poisoning support for 1GB hugepages, from Naoya Horiguchi
- highmem documentation fixups from Fabio De Francesco"
* tag 'mm-stable-2022-08-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (23 commits)
Documentation/mm: add details about kmap_local_page() and preemption
highmem: delete a sentence from kmap_local_page() kdocs
Documentation/mm: rrefer kmap_local_page() and avoid kmap()
Documentation/mm: avoid invalid use of addresses from kmap_local_page()
Documentation/mm: don't kmap*() pages which can't come from HIGHMEM
highmem: specify that kmap_local_page() is callable from interrupts
highmem: remove unneeded spaces in kmap_local_page() kdocs
mm, hwpoison: enable memory error handling on 1GB hugepage
mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
mm, hwpoison: make __page_handle_poison returns int
mm, hwpoison: set PG_hwpoison for busy hugetlb pages
mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage
mm, hwpoison, hugetlb: support saving mechanism of raw error pages
mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry
mm/hugetlb: check gigantic_page_runtime_supported() in return_unused_surplus_pages()
mm: hugetlb_vmemmap: use PTRS_PER_PTE instead of PMD_SIZE / PAGE_SIZE
mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst
mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability
mm: hugetlb_vmemmap: replace early_param() with core_param()
mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c
...
|
|
Pull cxl updates from Dan Williams:
"Compute Express Link (CXL) updates for 6.0:
- Introduce a 'struct cxl_region' object with support for
provisioning and assembling persistent memory regions.
- Introduce alloc_free_mem_region() to accompany the existing
request_free_mem_region() as a method to allocate physical memory
capacity out of an existing resource.
- Export insert_resource_expand_to_fit() for the CXL subsystem to
late-publish CXL platform windows in iomem_resource.
- Add a polled mode PCI DOE (Data Object Exchange) driver service and
use it in cxl_pci to retrieve the CDAT (Coherent Device Attribute
Table)"
* tag 'cxl-for-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (74 commits)
cxl/hdm: Fix skip allocations vs multiple pmem allocations
cxl/region: Disallow region granularity != window granularity
cxl/region: Fix x1 interleave to greater than x1 interleave routing
cxl/region: Move HPA setup to cxl_region_attach()
cxl/region: Fix decoder interleave programming
Documentation: cxl: remove dangling kernel-doc reference
cxl/region: describe targets and nr_targets members of cxl_region_params
cxl/regions: add padding for cxl_rr_ep_add nested lists
cxl/region: Fix IS_ERR() vs NULL check
cxl/region: Fix region reference target accounting
cxl/region: Fix region commit uninitialized variable warning
cxl/region: Fix port setup uninitialized variable warnings
cxl/region: Stop initializing interleave granularity
cxl/hdm: Fix DPA reservation vs cxl_endpoint_decoder lifetime
cxl/acpi: Minimize granularity for x1 interleaves
cxl/region: Delete 'region' attribute from root decoders
cxl/acpi: Autoload driver for 'cxl_acpi' test devices
cxl/region: decrement ->nr_targets on error in cxl_region_attach()
cxl/region: prevent underflow in ways_to_cxl()
cxl/region: uninitialized variable in alloc_hpa()
...
|
|
Pull pin control updates from Linus Walleij:
"Outside the pinctrl driver and DT bindings we hit some Arm DT files,
patched by the maintainers.
Other than that it is business as usual.
Core changes:
- Add PINCTRL_PINGROUP() helper macro (and use it in the AMD driver).
New drivers:
- Intel Meteor Lake support.
- Reneasas RZ/V2M and r8a779g0 (R-Car V4H).
- AXP209 variants AXP221, AXP223 and AXP809.
- Qualcomm MSM8909, PM8226, PMP8074 and SM6375.
- Allwinner D1.
Improvements:
- Proper pin multiplexing in the AMD driver.
- Mediatek MT8192 can use generic drive strength and pin bias, then
fixes on top plus some I2C pin group fixes.
- Have the Allwinner Sunplus SP7021 use the generic DT schema and
make interrupts optional.
- Handle Qualcomm SC7280 ADSP.
- Handle Qualcomm MSM8916 CAMSS GP clock muxing.
- High impedance bias on ZynqMP.
- Serialize StarFive access to MMIO.
- Immutable gpiochip for BCM2835, Ingenic, Qualcomm SPMI GPIO"
* tag 'pinctrl-v6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (117 commits)
dt-bindings: pinctrl: qcom,pmic-gpio: add PM8226 constraints
pinctrl: qcom: Make PINCTRL_SM8450 depend on PINCTRL_MSM
pinctrl: qcom: sm8250: Fix PDC map
pinctrl: amd: Fix an unused variable
dt-bindings: pinctrl: mt8186: Add and use drive-strength-microamp
dt-bindings: pinctrl: mt8186: Add gpio-line-names property
ARM: dts: imxrt1170-pinfunc: Add pinctrl binding header
pinctrl: amd: Use unicode for debugfs output
pinctrl: amd: Fix newline declaration in debugfs output
pinctrl: at91: Fix typo 'the the' in comment
dt-bindings: pinctrl: st,stm32: Correct 'resets' property name
pinctrl: mvebu: Missing a blank line after declarations.
pinctrl: qcom: Add SM6375 TLMM driver
dt-bindings: pinctrl: Add DT schema for SM6375 TLMM
dt-bindings: pinctrl: mt8195: Use drive-strength-microamp in examples
Revert "pinctrl: qcom: spmi-gpio: make the irqchip immutable"
pinctrl: imx93: Add MODULE_DEVICE_TABLE()
pinctrl: sunxi: Add driver for Allwinner D1
pinctrl: sunxi: Make some layout parameters dynamic
pinctrl: sunxi: Refactor register/offset calculation
...
|
|
Pull Kbuild updates from Masahiro Yamada:
- Remove the support for -O3 (CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3)
- Fix error of rpm-pkg cross-builds
- Support riscv for checkstack tool
- Re-enable -Wformwat warnings for Clang
- Clean up modpost, Makefiles, and misc scripts
* tag 'kbuild-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (30 commits)
modpost: remove .symbol_white_list field entirely
modpost: remove unneeded .symbol_white_list initializers
modpost: add PATTERNS() helper macro
modpost: shorten warning messages in report_sec_mismatch()
Revert "Kbuild, lto, workaround: Don't warn for initcall_reference in modpost"
modpost: use more reliable way to get fromsec in section_rel(a)()
modpost: add array range check to sec_name()
modpost: refactor get_secindex()
kbuild: set EXIT trap before creating temporary directory
modpost: remove unused Elf_Sword macro
Makefile.extrawarn: re-enable -Wformat for clang
kbuild: add dtbs_prepare target
kconfig: Qt5: tell the user which packages are required
modpost: use sym_get_data() to get module device_table data
modpost: drop executable ELF support
checkstack: add riscv support for scripts/checkstack.pl
kconfig: shorten the temporary directory name for cc-option
scripts: headers_install.sh: Update config leak ignore entries
kbuild: error out if $(INSTALL_MOD_PATH) contains % or :
kbuild: error out if $(KBUILD_EXTMOD) contains % or :
...
|
|
The sparse tool complains as follows:
arch/arm64/net/bpf_jit_comp.c:1684:16:
warning: incorrect type in assignment (different base types)
arch/arm64/net/bpf_jit_comp.c:1684:16:
expected unsigned int [usertype] *branch
arch/arm64/net/bpf_jit_comp.c:1684:16:
got restricted __le32 [usertype] *
arch/arm64/net/bpf_jit_comp.c:1700:52:
error: subtraction of different types can't work (different base
types)
arch/arm64/net/bpf_jit_comp.c:1734:29:
warning: incorrect type in assignment (different base types)
arch/arm64/net/bpf_jit_comp.c:1734:29:
expected unsigned int [usertype] *
arch/arm64/net/bpf_jit_comp.c:1734:29:
got restricted __le32 [usertype] *
arch/arm64/net/bpf_jit_comp.c:1918:52:
error: subtraction of different types can't work (different base
types)
This is because the variable branch in function invoke_bpf_prog and the
variable branches in function prepare_trampoline are defined as type
u32 *, which conflicts with ctx->image's type __le32 *, so sparse complains
when assignment or arithmetic operation are performed on these two
variables and ctx->image.
Since arm64 instructions are always little-endian, change the type of
these two variables to __le32 * and call cpu_to_le32() to convert
instruction to little-endian before writing it to memory. This is also
in line with emit() which internally does cpu_to_le32(), too.
Fixes: efc9909fdce0 ("bpf, arm64: Add bpf trampoline for arm64")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/bpf/20220808040735.1232002-1-xukuohai@huawei.com
|
|
Use GENMASK() to generate the masks of device type and device id,
fixing compilation errors due to the sign extension when using
older versions of GCC (such as is 7.5):
In function ‘kvm_vm_ioctl_set_device_addr.isra.38’,
inlined from ‘kvm_arch_vm_ioctl’ at arch/arm64/kvm/arm.c:1454:10:
././include/linux/compiler_types.h:354:38: error: call to ‘__compiletime_assert_599’ \
declared with attribute error: FIELD_GET: mask is not constant
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
Fixes: 9f968c9266aa ("KVM: arm64: vgic-v2: Add helper for legacy dist/cpuif base address setting")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
[maz: tidy up commit message]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220810013435.1525363-1-yangyingliang@huawei.com
|
|
clang 14 won't build because ret is uninitialised and can be returned if
both prop and fdtprop are NULL. Drop the ret variable and return an
error in that failure case.
Fixes: b1fc44eaa9ba ("pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-window")
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220810054331.373761-1-ruscur@russell.cc
|
|
PPC_RAW_TW() is erroneously defined with base code 0x7f000008
instead of 0x7c000008.
That's invisible because its only user is PPC_RAW_TRAP() which is
0x7fe00008, but fix it anyway to avoid any risk of future bug.
Fixes: d00d762daf12 ("powerpc/ppc-opcode: Define and use PPC_RAW_TRAP() and PPC_RAW_TW()")
Reported-by: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eca9251f1e1f82c4c46ec6380ddb28356ab3fdfe.1659527244.git.christophe.leroy@csgroup.eu
|
|
Clang doesn't support -mprofile-kernel ABI, so guard the checks against
CONFIG_DYNAMIC_FTRACE_WITH_REGS, rather than the elf ABI version.
Fixes: 23b44fc248f4 ("powerpc/ftrace: Make __ftrace_make_{nop/call}() common to PPC32 and PPC64")
Cc: stable@vger.kernel.org # v5.19+
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Reported-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Ondrej Mosnacek <omosnacek@gmail.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/llvm/llvm-project/issues/57031
Link: https://github.com/ClangBuiltLinux/linux/issues/1682
Link: https://lore.kernel.org/r/20220809095907.418764-1-naveen.n.rao@linux.vnet.ibm.com
|
|
Just like the first patch of this series, define a local 'eh' in order
to make the code clearer.
And IS_ENABLED() returns either 1 or 0 so no need to do
IS_ENABLED(CONFIG_PPC64) ? 1 : 0.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Use symbolic names, use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/629befaa2d05e2922346e58a383886510d6af55a.1659430931.git.christophe.leroy@csgroup.eu
|
|
The eh field must remain 0 for PPC32 and is only used
by PPC64.
Don't hide that behind a macro, just leave the responsibility
to the user.
At the time being, the only users of PPC_RAW_L{WDQ}ARX are
setting the eh field to 0, so the special handling of __PPC_EH
is useless. Just take the value given by the caller.
Same for DEFINE_TESTOP(), don't do special handling in that
macro, ensure the caller hands over the proper eh value.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8b9c8a1a14f9143552a85fcbf96698224a8c2469.1659430931.git.christophe.leroy@csgroup.eu
|
|
Commit 9401f4e46cf6 ("powerpc: Use lwarx/ldarx directly instead of
PPC_LWARX/LDARX macros") properly handled the eh field of lwarx
in asm/bitops.h but failed to clear it for PPC32 in
asm/simple_spinlock.h
So, do as in arch_atomic_try_cmpxchg_lock(), set it to 1 if PPC64
but set it to 0 if PPC32. For that use IS_ENABLED(CONFIG_PPC64) which
returns 1 when CONFIG_PPC64 is set and 0 otherwise.
Fixes: 9401f4e46cf6 ("powerpc: Use lwarx/ldarx directly instead of PPC_LWARX/LDARX macros")
Cc: stable@vger.kernel.org # v5.15+
Reported-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Tested-by: Pali Rohár <pali@kernel.org>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
[mpe: Use symbolic names, use 'n' constraint per Segher]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a1176e19e627dd6a1b8d24c6c457a8ab874b7d12.1659430931.git.christophe.leroy@csgroup.eu
|
|
Pull m68knommu fixes from Greg Ungerer:
- spelling in comment
- compilation when flexcan driver enabled
- sparse warning
* tag 'm68knommu-for-v5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k: Fix syntax errors in comments
m68k: coldfire: make symbol m523x_clk_lookup static
m68k: coldfire/device.c: protect FLEXCAN blocks
|
|
Pull x86 eIBRS fixes from Borislav Petkov:
"More from the CPU vulnerability nightmares front:
Intel eIBRS machines do not sufficiently mitigate against RET
mispredictions when doing a VM Exit therefore an additional RSB,
one-entry stuffing is needed"
* tag 'x86_bugs_pbrsb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Add LFENCE to RSB fill sequence
x86/speculation: Add RSB VM Exit protections
|
|
follow_pud_mask() does not support non-present pud entry now. As long as
I tested on x86_64 server, follow_pud_mask() still simply returns
no_page_table() for non-present_pud_entry() due to pud_bad(), so no severe
user-visible effect should happen. But generally we should call
follow_huge_pud() for non-present pud entry for 1GB hugetlb page.
Update pud_huge() and follow_huge_pud() to handle non-present pud entries.
The changes are similar to previous works for pud entries commit
e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()") and
commit cbef8478bee5 ("mm/hugetlb: pmd_huge() returns true for non-present
hugepage").
Link: https://lkml.kernel.org/r/20220714042420.1847125-3-naoya.horiguchi@linux.dev
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Simplify hugetlb vmemmap and improve its readability", v2.
This series aims to simplify hugetlb vmemmap and improve its readability.
This patch (of 8):
The name hugetlb_optimize_vmemmap_enabled() a bit confusing as it tests
two conditions (enabled and pages in use). Instead of coming up to an
appropriate name, we could just delete it. There is already a discussion
about deleting it in thread [1].
There is only one user of hugetlb_optimize_vmemmap_enabled() outside of
hugetlb_vmemmap, that is flush_dcache_page() in arch/arm64/mm/flush.c.
However, it does not need to call hugetlb_optimize_vmemmap_enabled() in
flush_dcache_page() since HugeTLB pages are always fully mapped and only
head page will be set PG_dcache_clean meaning only head page's flag may
need to be cleared (see commit cf5a501d985b). So it is easy to remove
hugetlb_optimize_vmemmap_enabled().
Link: https://lore.kernel.org/all/c77c61c8-8a5a-87e8-db89-d04d8aaab4cc@oracle.com/ [1]
Link: https://lkml.kernel.org/r/20220628092235.91270-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pull tty / serial driver updates from Greg KH:
"Here is the big set of tty and serial driver changes for 6.0-rc1.
It was delayed from last week as I wanted to make sure the last commit
here got some good testing in linux-next and elsewhere as it seemed to
show up only late in testing for some reason.
Nothing major here, just lots of cleanups from Jiri and Ilpo to make
the tty core cleaner (Jiri) and the rs485 code simpler to use (Ilpo).
Also included in here is the obligatory n_gsm updates from Daniel
Starke and lots of tiny driver updates and minor fixes and tweaks for
other smaller serial drivers.
All of these have been in linux-next for a while with no reported
problems"
* tag 'tty-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (186 commits)
tty: serial: qcom-geni-serial: Fix %lu -> %u in print statements
tty: amiserial: Fix comment typo
tty: serial: document uart_get_console()
tty: serial: serial_core, reformat kernel-doc for functions
Documentation: serial: link uart_ops properly
Documentation: serial: move GPIO kernel-doc to the functions
Documentation: serial: dedup kernel-doc for uart functions
Documentation: serial: move uart_ops documentation to the struct
dt-bindings: serial: snps-dw-apb-uart: Document Rockchip RV1126
serial: mvebu-uart: uart2 error bits clearing
tty: serial: fsl_lpuart: correct the count of break characters
serial: stm32: make info structs static to avoid sparse warnings
serial: fsl_lpuart: zero out parity bit in CS7 mode
tty: serial: qcom-geni-serial: Fix get_clk_div_rate() which otherwise could return a sub-optimal clock rate.
serial: 8250_bcm2835aux: Add missing clk_disable_unprepare()
tty: vt: initialize unicode screen buffer
serial: remove VR41XX serial driver
serial: 8250: lpc18xx: Remove redundant sanity check for RS485 flags
serial: 8250_dwlib: remove redundant sanity check for RS485 flags
dt_bindings: rs485: Correct delay values
...
|
|
AMD's "Technical Guidance for Mitigating Branch Type Confusion,
Rev. 1.0 2022-07-12" whitepaper, under section 6.1.2 "IBPB On
Privileged Mode Entry / SMT Safety" says:
Similar to the Jmp2Ret mitigation, if the code on the sibling thread
cannot be trusted, software should set STIBP to 1 or disable SMT to
ensure SMT safety when using this mitigation.
So, like already being done for retbleed=unret, and now also for
retbleed=ibpb, force STIBP on machines that have it, and report its SMT
vulnerability status accordingly.
[ bp: Remove the "we" and remove "[AMD]" applicability parameter which
doesn't work here. ]
Fixes: 3ebc17006888 ("x86/bugs: Add retbleed=ibpb")
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@vger.kernel.org # 5.10, 5.15, 5.19
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
Link: https://lore.kernel.org/r/20220804192201.439596-1-kim.phillips@amd.com
|
|
It is not necessary to allocate contiguous physical memory for BPF
program buffer using kcalloc. When the BPF program is large more than
memory page size, kcalloc allocates multiple memory pages from buddy
system. If the device can not provide sufficient memory, for example
in low-end android devices [0], memory allocation for BPF program is
likely to fail.
Test cases in lib/test_bpf.c all pass on ARM64 QEMU.
[0]
AndroidTestSuit: page allocation failure: order:4,
mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=foreground,mems_allowed=0
Call trace:
dump_stack+0xa4/0x114
warn_alloc+0xf8/0x14c
__alloc_pages_slowpath+0xac8/0xb14
__alloc_pages_nodemask+0x194/0x3d0
kmalloc_order_trace+0x44/0x1e8
__kmalloc+0x29c/0x66c
bpf_int_jit_compile+0x17c/0x568
bpf_prog_select_runtime+0x4c/0x1b0
bpf_prepare_filter+0x5fc/0x6bc
bpf_prog_create_from_user+0x118/0x1c0
seccomp_set_mode_filter+0x1c4/0x7cc
__do_sys_prctl+0x380/0x1424
__arm64_sys_prctl+0x20/0x2c
el0_svc_common+0xc8/0x22c
el0_svc_handler+0x1c/0x28
el0_svc+0x8/0x100
Signed-off-by: Aijun Sun <aijun.sun@unisoc.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220804025442.22524-1-aijun.sun@unisoc.com
|
|
MIPS doesn't declare find_pch_pic(), which makes a build warning:
>> drivers/irqchip/irq-loongson-pch-pic.c:51:5: warning: no previous prototype for function 'find_pch_pic' [-Wmissing-prototypes]
int find_pch_pic(u32 gsi)
^
drivers/irqchip/irq-loongson-pch-pic.c:51:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int find_pch_pic(u32 gsi)
^
static
1 warning generated.
Move find_pch_pic() into CONFIG_ACPI which only used by LoongArch to fix
the warning.
BTW, remove the duplicated declaration of find_pch_pic() in irq.h.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220808093205.3658485-1-chenhuacai@loongson.cn
|
|
Pull bitmap updates from Yury Norov:
- fix the duplicated comments on bitmap_to_arr64() (Qu Wenruo)
- optimize out non-atomic bitops on compile-time constants (Alexander
Lobakin)
- cleanup bitmap-related headers (Yury Norov)
- x86/olpc: fix 'logical not is only applied to the left hand side'
(Alexander Lobakin)
- lib/nodemask: inline wrappers around bitmap (Yury Norov)
* tag 'bitmap-6.0-rc1' of https://github.com/norov/linux: (26 commits)
lib/nodemask: inline next_node_in() and node_random()
powerpc: drop dependency on <asm/machdep.h> in archrandom.h
x86/olpc: fix 'logical not is only applied to the left hand side'
lib/cpumask: move some one-line wrappers to header file
headers/deps: mm: align MANITAINERS and Docs with new gfp.h structure
headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>
headers/deps: mm: Optimize <linux/gfp.h> header dependencies
lib/cpumask: move trivial wrappers around find_bit to the header
lib/cpumask: change return types to unsigned where appropriate
cpumask: change return types to bool where appropriate
lib/bitmap: change type of bitmap_weight to unsigned long
lib/bitmap: change return types to bool where appropriate
arm: align find_bit declarations with generic kernel
iommu/vt-d: avoid invalid memory access via node_online(NUMA_NO_NODE)
lib/test_bitmap: test the tail after bitmap_to_arr64()
lib/bitmap: fix off-by-one in bitmap_to_arr64()
lib: test_bitmap: add compile-time optimization/evaluations assertions
bitmap: don't assume compiler evaluates small mem*() builtins calls
net/ice: fix initializing the bitmap in the switch code
bitops: let optimize out non-atomic bitops on compile-time constants
...
|
|
Pull misc updates from Andrew Morton:
"Updates to various subsystems which I help look after. lib, ocfs2,
fatfs, autofs, squashfs, procfs, etc. A relatively small amount of
material this time"
* tag 'mm-nonmm-stable-2022-08-06-2' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (72 commits)
scripts/gdb: ensure the absolute path is generated on initial source
MAINTAINERS: kunit: add David Gow as a maintainer of KUnit
mailmap: add linux.dev alias for Brendan Higgins
mailmap: update Kirill's email
profile: setup_profiling_timer() is moslty not implemented
ocfs2: fix a typo in a comment
ocfs2: use the bitmap API to simplify code
ocfs2: remove some useless functions
lib/mpi: fix typo 'the the' in comment
proc: add some (hopefully) insightful comments
bdi: remove enum wb_congested_state
kernel/hung_task: fix address space of proc_dohung_task_timeout_secs
lib/lzo/lzo1x_compress.c: replace ternary operator with min() and min_t()
squashfs: support reading fragments in readahead call
squashfs: implement readahead
squashfs: always build "file direct" version of page actor
Revert "squashfs: provide backing_dev_info in order to disable read-ahead"
fs/ocfs2: Fix spelling typo in comment
ia64: old_rr4 added under CONFIG_HUGETLB_PAGE
proc: fix test for "vsyscall=xonly" boot option
...
|