aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-04-05KVM: PPC: Book3S HV: Perserve PSSCR FAKE_SUSPEND bit on guest exitSuraj Jitindar Singh1-1/+3
There is a hardware bug in some POWER9 processors where a treclaim in fake suspend mode can cause an inconsistency in the XER[SO] bit across the threads of a core, the workaround being to force the core into SMT4 when doing the treclaim. The FAKE_SUSPEND bit (bit 10) in the PSSCR is used to control whether a thread is in fake suspend or real suspend. The important difference here being that thread reconfiguration is blocked in real suspend but not fake suspend mode. When we exit a guest which was in fake suspend mode, we force the core into SMT4 while we do the treclaim in kvmppc_save_tm_hv(). However on the new exit path introduced with the function kvmhv_run_single_vcpu() we restore the host PSSCR before calling kvmppc_save_tm_hv() which means that if we were in fake suspend mode we put the thread into real suspend mode when we clear the PSSCR[FAKE_SUSPEND] bit. This means that we block thread reconfiguration and the thread which is trying to get the core into SMT4 before it can do the treclaim spins forever since it itself is blocking thread reconfiguration. The result is that that core is essentially lost. This results in a trace such as: [ 93.512904] CPU: 7 PID: 13352 Comm: qemu-system-ppc Not tainted 5.0.0 #4 [ 93.512905] NIP: c000000000098a04 LR: c0000000000cc59c CTR: 0000000000000000 [ 93.512908] REGS: c000003fffd2bd70 TRAP: 0100 Not tainted (5.0.0) [ 93.512908] MSR: 9000000302883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE,TM[SE]> CR: 22222444 XER: 00000000 [ 93.512914] CFAR: c000000000098a5c IRQMASK: 3 [ 93.512915] PACATMSCRATCH: 0000000000000001 [ 93.512916] GPR00: 0000000000000001 c000003f6cc1b830 c000000001033100 0000000000000004 [ 93.512928] GPR04: 0000000000000004 0000000000000002 0000000000000004 0000000000000007 [ 93.512930] GPR08: 0000000000000000 0000000000000004 0000000000000000 0000000000000004 [ 93.512932] GPR12: c000203fff7fc000 c000003fffff9500 0000000000000000 0000000000000000 [ 93.512935] GPR16: 2000000000300375 000000000000059f 0000000000000000 0000000000000000 [ 93.512951] GPR20: 0000000000000000 0000000000080053 004000000256f41f c000003f6aa88ef0 [ 93.512953] GPR24: c000003f6aa89100 0000000000000010 0000000000000000 0000000000000000 [ 93.512956] GPR28: c000003f9e9a0800 0000000000000000 0000000000000001 c000203fff7fc000 [ 93.512959] NIP [c000000000098a04] pnv_power9_force_smt4_catch+0x1b4/0x2c0 [ 93.512960] LR [c0000000000cc59c] kvmppc_save_tm_hv+0x40/0x88 [ 93.512960] Call Trace: [ 93.512961] [c000003f6cc1b830] [0000000000080053] 0x80053 (unreliable) [ 93.512965] [c000003f6cc1b8a0] [c00800001e9cb030] kvmhv_p9_guest_entry+0x508/0x6b0 [kvm_hv] [ 93.512967] [c000003f6cc1b940] [c00800001e9cba44] kvmhv_run_single_vcpu+0x2dc/0xb90 [kvm_hv] [ 93.512968] [c000003f6cc1ba10] [c00800001e9cc948] kvmppc_vcpu_run_hv+0x650/0xb90 [kvm_hv] [ 93.512969] [c000003f6cc1bae0] [c00800001e8f620c] kvmppc_vcpu_run+0x34/0x48 [kvm] [ 93.512971] [c000003f6cc1bb00] [c00800001e8f2d4c] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] [ 93.512972] [c000003f6cc1bb90] [c00800001e8e3918] kvm_vcpu_ioctl+0x460/0x7d0 [kvm] [ 93.512974] [c000003f6cc1bd00] [c0000000003ae2c0] do_vfs_ioctl+0xe0/0x8e0 [ 93.512975] [c000003f6cc1bdb0] [c0000000003aeb24] ksys_ioctl+0x64/0xe0 [ 93.512978] [c000003f6cc1be00] [c0000000003aebc8] sys_ioctl+0x28/0x80 [ 93.512981] [c000003f6cc1be20] [c00000000000b3a4] system_call+0x5c/0x70 [ 93.512983] Instruction dump: [ 93.512986] 419dffbc e98c0000 2e8b0000 38000001 60000000 60000000 60000000 40950068 [ 93.512993] 392bffff 39400000 79290020 39290001 <7d2903a6> 60000000 60000000 7d235214 To fix this we preserve the PSSCR[FAKE_SUSPEND] bit until we call kvmppc_save_tm_hv() which will mean the core can get into SMT4 and perform the treclaim. Note kvmppc_save_tm_hv() clears the PSSCR[FAKE_SUSPEND] bit again so there is no need to explicitly do that. Fixes: 95a6432ce9038 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2019-03-28Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGIONPaolo Bonzini1-8/+10
The documentation does not mention how to delete a slot, add the information. Reported-by: Nathaniel McCallum <npmccallum@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: doc: Document the life cycle of a VM and its resourcesSean Christopherson1-0/+17
The series to add memcg accounting to KVM allocations[1] states: There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. While it is correct to account KVM kernel allocations to the cgroup of the process that created the VM, it's technically incorrect to state that the KVM kernel memory allocations are tied to the life of the VM process. This is because the VM itself, i.e. struct kvm, is not tied to the life of the process which created it, rather it is tied to the life of its associated file descriptor. In other words, kvm_destroy_vm() is not invoked until fput() decrements its associated file's refcount to zero. A simple example is to fork() in Qemu and have the child sleep indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file descriptor *and* the rogue child is killed. The allocations are guaranteed to be *accounted* to the process which created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the ioctl() with -EIO if kvm->mm != current->mm. I.e. the child can keep the VM "alive" but can't do anything useful with its reference. Note that because 'struct kvm' also holds a reference to the mm_struct of its owner, the above behavior also applies to userspace allocations. Given that mucking with a VM's file descriptor can lead to subtle and undesirable behavior, e.g. memcg charges persisting after a VM is shut down, explicitly document a VM's lifecycle and its impact on the VM's resources. Alternatively, KVM could aggressively free resources when the creating process exits, e.g. via mmu_notifier->release(). However, mmu_notifier isn't guaranteed to be available, and freeing resources when the creator exits is likely to be error prone and fragile as KVM would need to ensure that it only freed resources that are truly out of reach. In practice, the existing behavior shouldn't be problematic as a properly configured system will prevent a child process from being moved out of the appropriate cgroup hierarchy, i.e. prevent hiding the process from the OOM killer, and will prevent an unprivileged user from being able to to hold a reference to struct kvm via another method, e.g. debugfs. [1]https://patchwork.kernel.org/patch/10806707/ Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: selftests: complete IO before migrating guest stateSean Christopherson3-2/+33
Documentation/virtual/kvm/api.txt states: NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace can re-enter the guest with an unmasked signal pending to complete pending operations. Because guest state may be inconsistent, starting state migration after an IO exit without first completing IO may result in test failures, e.g. a proposed change to KVM's handling of %rip in its fast PIO handling[1] will cause the new VM, i.e. the post-migration VM, to have its %rip set to the IN instruction that triggered KVM_EXIT_IO, leading to a test assertion due to a stage mismatch. For simplicitly, require KVM_CAP_IMMEDIATE_EXIT to complete IO and skip the test if it's not available. The addition of KVM_CAP_IMMEDIATE_EXIT predates the state selftest by more than a year. [1] https://patchwork.kernel.org/patch/10848545/ Fixes: fa3899add1056 ("kvm: selftests: add basic test for state save and restore") Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: selftests: disable stack protector for all KVM testsSean Christopherson1-2/+2
Since 4.8.3, gcc has enabled -fstack-protector by default. This is problematic for the KVM selftests as they do not configure fs or gs segments (the stack canary is pulled from fs:0x28). With the default behavior, gcc will insert a stack canary on any function that creates buffers of 8 bytes or more. As a result, ucall() will hit a triple fault shutdown due to reading a bad fs segment when inserting its stack canary, i.e. every test fails with an unexpected SHUTDOWN. Fixes: 14c47b7530e2d ("kvm: selftests: introduce ucall") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: selftests: explicitly disable PIE for testsSean Christopherson1-1/+1
KVM selftests embed the guest "image" as a function in the test itself and extract the guest code at runtime by manually parsing the elf headers. The parsing is very simple and doesn't supporting fancy things like position independent executables. Recent versions of gcc enable pie by default, which results in triple fault shutdowns in the guest due to the virtual address in the headers not matching up with the virtual address retrieved from the function pointer. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: selftests: assert on exit reason in CR4/cpuid sync testSean Christopherson1-16/+19
...so that the test doesn't end up in an infinite loop if it fails for whatever reason, e.g. SHUTDOWN due to gcc inserting stack canary code into ucall() and attempting to derefence a null segment. Fixes: ca359066889f7 ("kvm: selftests: add cr4_cpuid_sync_test") Cc: Wei Huang <wei@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: update %rip after emulating IOSean Christopherson2-10/+27
Most (all?) x86 platforms provide a port IO based reset mechanism, e.g. OUT 92h or CF9h. Userspace may emulate said mechanism, i.e. reset a vCPU in response to KVM_EXIT_IO, without explicitly announcing to KVM that it is doing a reset, e.g. Qemu jams vCPU state and resumes running. To avoid corruping %rip after such a reset, commit 0967b7bf1c22 ("KVM: Skip pio instruction when it is emulated, not executed") changed the behavior of PIO handlers, i.e. today's "fast" PIO handling to skip the instruction prior to exiting to userspace. Full emulation doesn't need such tricks becase re-emulating the instruction will naturally handle %rip being changed to point at the reset vector. Updating %rip prior to executing to userspace has several drawbacks: - Userspace sees the wrong %rip on the exit, e.g. if PIO emulation fails it will likely yell about the wrong address. - Single step exits to userspace for are effectively dropped as KVM_EXIT_DEBUG is overwritten with KVM_EXIT_IO. - Behavior of PIO emulation is different depending on whether it goes down the fast path or the slow path. Rather than skip the PIO instruction before exiting to userspace, snapshot the linear %rip and cancel PIO completion if the current value does not match the snapshot. For a 64-bit vCPU, i.e. the most common scenario, the snapshot and comparison has negligible overhead as VMCS.GUEST_RIP will be cached regardless, i.e. there is no extra VMREAD in this case. All other alternatives to snapshotting the linear %rip that don't rely on an explicit reset announcenment suffer from one corner case or another. For example, canceling PIO completion on any write to %rip fails if userspace does a save/restore of %rip, and attempting to avoid that issue by canceling PIO only if %rip changed then fails if PIO collides with the reset %rip. Attempting to zero in on the exact reset vector won't work for APs, which means adding more hooks such as the vCPU's MP_STATE, and so on and so forth. Checking for a linear %rip match technically suffers from corner cases, e.g. userspace could theoretically rewrite the underlying code page and expect a different instruction to execute, or the guest hardcodes a PIO reset at 0xfffffff0, but those are far, far outside of what can be considered normal operation. Fixes: 432baf60eee3 ("KVM: VMX: use kvm_fast_pio_in for handling IN I/O") Cc: <stable@vger.kernel.org> Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28x86/kvm/hyper-v: avoid spurious pending stimer on vCPU initVitaly Kuznetsov1-2/+7
When userspace initializes guest vCPUs it may want to zero all supported MSRs including Hyper-V related ones including HV_X64_MSR_STIMERn_CONFIG/ HV_X64_MSR_STIMERn_COUNT. With commit f3b138c5d89a ("kvm/x86: Update SynIC timers on guest entry only") we began doing stimer_mark_pending() unconditionally on every config change. The issue I'm observing manifests itself as following: - Qemu writes 0 to STIMERn_{CONFIG,COUNT} MSRs and marks all stimers as pending in stimer_pending_bitmap, arms KVM_REQ_HV_STIMER; - kvm_hv_has_stimer_pending() starts returning true; - kvm_vcpu_has_events() starts returning true; - kvm_arch_vcpu_runnable() starts returning true; - when kvm_arch_vcpu_ioctl_run() gets into (vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED) case: - kvm_vcpu_block() gets in 'kvm_vcpu_check_block(vcpu) < 0' and returns immediately, avoiding normal wait path; - -EAGAIN is returned from kvm_arch_vcpu_ioctl_run() immediately forcing userspace to retry. So instead of normal wait path we get a busy loop on all secondary vCPUs before they get INIT signal. This seems to be undesirable, especially given that this happens even when Hyper-V extensions are not used. Generally, it seems to be pointless to mark an stimer as pending in stimer_pending_bitmap and arm KVM_REQ_HV_STIMER as the only thing kvm_hv_process_stimers() will do is clear the corresponding bit. We may just not mark disabled timers as pending instead. Fixes: f3b138c5d89a ("kvm/x86: Update SynIC timers on guest entry only") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrsXiaoyao Li1-1/+2
Since MSR_IA32_ARCH_CAPABILITIES is emualted unconditionally even if host doesn't suppot it. We should move it to array emulated_msrs from arry msrs_to_save, to report to userspace that guest support this msr. Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hostsSean Christopherson4-14/+13
The CPUID flag ARCH_CAPABILITIES is unconditioinally exposed to host userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES regardless of hardware support under the pretense that KVM fully emulates MSR_IA32_ARCH_CAPABILITIES. Unfortunately, only VMX hosts handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts). Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so that it's emulated on AMD hosts. Fixes: 1eaafe91a0df4 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported") Cc: stable@vger.kernel.org Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com> Cc: Jim Mattson <jmattson@google.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28kvm: don't redefine flags as something elseSebastian Andrzej Siewior1-3/+3
The function irqfd_wakeup() has flags defined as __poll_t and then it has additional flags which is used for irqflags. Redefine the inner flags variable as iflags so it does not shadow the outer flags. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28kvm: mmu: Used range based flushing in slot_handle_level_rangeBen Gardon1-2/+5
Replace kvm_flush_remote_tlbs with kvm_flush_remote_tlbs_with_address in slot_handle_level_range. When range based flushes are not enabled kvm_flush_remote_tlbs_with_address falls back to kvm_flush_remote_tlbs. This changes the behavior of many functions that indirectly use slot_handle_level_range, iff the range based flushes are enabled. The only potential problem I see with this is that kvm->tlbs_dirty will be cleared less often, however the only caller of slot_handle_level_range that checks tlbs_dirty is kvm_mmu_notifier_invalidate_range_start which checks it and does a kvm_flush_remote_tlbs after calling kvm_unmap_hva_range anyway. Tested: Ran all kvm-unit-tests on a Intel Haswell machine with and without this patch. The patch introduced no new failures. Signed-off-by: Ben Gardon <bgardon@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supportedMasahiro Yamada33-20/+18
I do not see any consistency about headers_install of <linux/kvm_para.h> and <asm/kvm_para.h>. According to my analysis of Linux 5.1-rc1, there are 3 groups: [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported alpha, arm, hexagon, mips, powerpc, s390, sparc, x86 [2] <asm/kvm_para.h> is exported, but <linux/kvm_para.h> is not arc, arm64, c6x, h8300, ia64, m68k, microblaze, nios2, openrisc, parisc, sh, unicore32, xtensa [3] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported csky, nds32, riscv This does not match to the actual KVM support. At least, [2] is half-baked. Nor do arch maintainers look like they care about this. For example, commit 0add53713b1c ("microblaze: Add missing kvm_para.h to Kbuild") exported <asm/kvm_para.h> to user-space in order to fix an in-kernel build error. We have two ways to make this consistent: [A] export both <linux/kvm_para.h> and <asm/kvm_para.h> for all architectures, irrespective of the KVM support [B] Match the header export of <linux/kvm_para.h> and <asm/kvm_para.h> to the KVM support My first attempt was [A] because the code looks cleaner, but Paolo suggested [B]. So, this commit goes with [B]. For most architectures, <asm/kvm_para.h> was moved to the kernel-space. I changed include/uapi/linux/Kbuild so that it checks generated asm/kvm_para.h as well as check-in ones. After this commit, there will be two groups: [1] Both <linux/kvm_para.h> and <asm/kvm_para.h> are exported arm, arm64, mips, powerpc, s390, x86 [2] Neither <linux/kvm_para.h> nor <asm/kvm_para.h> is exported alpha, arc, c6x, csky, h8300, hexagon, ia64, m68k, microblaze, nds32, nios2, openrisc, parisc, riscv, sh, sparc, unicore32, xtensa Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()Wei Yang3-8/+4
* nr_mmu_pages would be non-zero only if kvm->arch.n_requested_mmu_pages is non-zero. * nr_mmu_pages is always non-zero, since kvm_mmu_calculate_mmu_pages() never return zero. Based on these two reasons, we can merge the two *if* clause and use the return value from kvm_mmu_calculate_mmu_pages() directly. This simplify the code and also eliminate the possibility for reader to believe nr_mmu_pages would be zero. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fieldsKrish Sadhukhan1-0/+5
According to section "Checks on VMX Controls" in Intel SDM vol 3C, the following check is performed on vmentry of L2 guests: On processors that support Intel 64 architecture, the IA32_SYSENTER_ESP field and the IA32_SYSENTER_EIP field must each contain a canonical address. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)Singh, Brijesh4-3/+45
Errata#1096: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then log the message and request a guest shutdown. Reported-by: Venkatesh Srinivas <venkateshs@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: Reject device ioctls from processes other than the VM's creatorSean Christopherson2-5/+14
KVM's API requires thats ioctls must be issued from the same process that created the VM. In other words, userspace can play games with a VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the creator can do anything useful. Explicitly reject device ioctls that are issued by a process other than the VM's creator, and update KVM's API documentation to extend its requirements to device ioctls. Fixes: 852b6d57dc7f ("kvm: add device control API") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: doc: Fix incorrect word ordering regarding supported use of APIsSean Christopherson1-12/+16
Per Paolo[1], instantiating multiple VMs in a single process is legal; but this conflicts with KVM's API documentation, which states: The only supported use is one virtual machine per process, and one vcpu per thread. However, an earlier section in the documentation states: Only run VM ioctls from the same process (address space) that was used to create the VM. and: Only run vcpu ioctls from the same thread that was used to create the vcpu. This suggests that the conflicting documentation is simply an incorrect ordering of of words, i.e. what's really meant is that a virtual machine can't be shared across multiple processes and a vCPU can't be shared across multiple threads. Tweak the blurb on issuing ioctls to use a more assertive tone, and rewrite the "supported use" sentence to reference said blurb instead of poorly restating it in different terms. Opportunistically add missing punctuation. [1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com Fixes: 9c1b96e34717 ("KVM: Document basic API") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Improve notes on asynchronous ioctl] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'Sean Christopherson4-22/+35
The cr4_pae flag is a bit of a misnomer, its purpose is really to track whether the guest PTE that is being shadowed is a 4-byte entry or an 8-byte entry. Prior to supporting nested EPT, the size of the gpte was reflected purely by CR4.PAE. KVM fudged things a bit for direct sptes, but it was mostly harmless since the size of the gpte never mattered. Now that a spte may be tracking an indirect EPT entry, relying on CR4.PAE is wrong and ill-named. For direct shadow pages, force the gpte_size to '1' as they are always 8-byte entries; EPT entries can only be 8-bytes and KVM always uses 8-byte entries for NPT and its identity map (when running with EPT but not unrestricted guest). Likewise, nested EPT entries are always 8-bytes. Nested EPT presents a unique scenario as the size of the entries are not dictated by CR4.PAE, but neither is the shadow page a direct map. To handle this scenario, set cr0_wp=1 and smap_andnot_wp=1, an otherwise impossible combination, to denote a nested EPT shadow page. Use the information to avoid incorrectly zapping an unsync'd indirect page in __kvm_sync_page(). Providing a consistent and accurate gpte_size fixes a bug reported by Vitaly where fast_cr3_switch() always fails when switching from L2 to L1 as kvm_mmu_get_page() would force role.cr4_pae=0 for direct pages, whereas kvm_calc_mmu_role_common() would set it according to CR4.PAE. Fixes: 7dcd575520082 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPTSean Christopherson1-4/+9
Explicitly zero out quadrant and invalid instead of inheriting them from the root_mmu. Functionally, this patch is a nop as we (should) never set quadrant for a direct mapped (EPT) root_mmu and nested EPT is only allowed if EPT is used for L1, and the root_mmu will never be invalid at this point. Explicitly setting flags sets the stage for repurposing the legacy paging bits in role, e.g. nxe, cr0_wp, and sm{a,e}p_andnot_wp, at which point 'smm' would be the only flag to be inherited from root_mmu. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-28KVM: arm/arm64: Comments cleanup in mmu.cZenghui Yu1-14/+9
Some comments in virt/kvm/arm/mmu.c are outdated. Update them to reflect the current state of the code. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2019-03-24Linux 5.1-rc2Linus Torvalds1-1/+1
2019-03-24clocksource/drivers/clps711x: Remove board supportAlexander Shiyan1-32/+13
Since board support for the CLPS711X platform was removed, remove the board support from the clps711x-timer driver. Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lkml.kernel.org/r/20181220111626.17140-1-shc_work@mail.ru
2019-03-23ext4: prohibit fstrim in norecovery modeDarrick J. Wong1-0/+7
The ext4 fstrim implementation uses the block bitmaps to find free space that can be discarded. If we haven't replayed the journal, the bitmaps will be stale and we absolutely *cannot* use stale metadata to zap the underlying storage. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-03-23ext4: cleanup bh release code in ext4_ind_remove_space()zhangyi (F)1-25/+22
Currently, we are releasing the indirect buffer where we are done with it in ext4_ind_remove_space(), so we can see the brelse() and BUFFER_TRACE() everywhere. It seems fragile and hard to read, and we may probably forget to release the buffer some day. This patch cleans up the code by putting of the code which releases the buffers to the end of the function. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2019-03-23ext4: brelse all indirect buffer in ext4_ind_remove_space()zhangyi (F)1-4/+8
All indirect buffers get by ext4_find_shared() should be released no mater the branch should be freed or not. But now, we forget to release the lower depth indirect buffers when removing space from the same higher depth indirect block. It will lead to buffer leak and futher more, it may lead to quota information corruption when using old quota, consider the following case. - Create and mount an empty ext4 filesystem without extent and quota features, - quotacheck and enable the user & group quota, - Create some files and write some data to them, and then punch hole to some files of them, it may trigger the buffer leak problem mentioned above. - Disable quota and run quotacheck again, it will create two new aquota files and write the checked quota information to them, which probably may reuse the freed indirect block(the buffer and page cache was not freed) as data block. - Enable quota again, it will invoke vfs_load_quota_inode()->invalidate_bdev() to try to clean unused buffers and pagecache. Unfortunately, because of the buffer of quota data block is still referenced, quota code cannot read the up to date quota info from the device and lead to quota information corruption. This problem can be reproduced by xfstests generic/231 on ext3 file system or ext4 file system without extent and quota features. This patch fix this problem by releasing the missing indirect buffers, in ext4_ind_remove_space(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
2019-03-23genirq: Mark expected switch case fall-throughGustavo A. R. Silva1-0/+1
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. With -Wimplicit-fallthrough added to CFLAGS: kernel/irq/manage.c: In function ‘irq_do_set_affinity’: kernel/irq/manage.c:198:3: warning: this statement may fall through [-Wimplicit-fallthrough=] cpumask_copy(desc->irq_common_data.affinity, mask); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/irq/manage.c:199:2: note: here case IRQ_SET_MASK_OK_NOCOPY: ^~~~ Annotate it. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20190228213714.GA9246@embeddedor
2019-03-23clocksource/drivers/riscv: Fix clocksource maskAtish Patra1-3/+2
For all riscv architectures (RV32, RV64 and RV128), the clocksource is a 64 bit incrementing counter. Fix the clock source mask accordingly. Tested on both 64bit and 32 bit virt machine in QEMU. Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver") Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Anup Patel <anup@brainfault.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-riscv@lists.infradead.org Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Anup Patel <Anup.Patel@wdc.com> Cc: Damien Le Moal <Damien.LeMoal@wdc.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190322215411.19362-1-atish.patra@wdc.com
2019-03-23x86/gart: Exclude GART aperture from kcoreKairui Song3-7/+42
On machines where the GART aperture is mapped over physical RAM, /proc/kcore contains the GART aperture range. Accessing the GART range via /proc/kcore results in a kernel crash. vmcore used to have the same issue, until it was fixed with commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging existing hook infrastructure in vmcore to let /proc/vmcore return zeroes when attempting to read the aperture region, and so it won't read from the actual memory. Apply the same workaround for kcore. First implement the same hook infrastructure for kcore, then reuse the hook functions introduced in the previous vmcore fix. Just with some minor adjustment, rename some functions for more general usage, and simplify the hook infrastructure a bit as there is no module usage yet. Suggested-by: Baoquan He <bhe@redhat.com> Signed-off-by: Kairui Song <kasong@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jiri Bohac <jbohac@suse.cz> Acked-by: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Omar Sandoval <osandov@fb.com> Cc: Dave Young <dyoung@redhat.com> Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
2019-03-22cifs: update internal module version numberSteve French1-1/+1
To 2.19 Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-22SMB3: Fix SMB3.1.1 guest mounts to SambaSteve French1-1/+4
Workaround problem with Samba responses to SMB3.1.1 null user (guest) mounts. The server doesn't set the expected flag in the session setup response so we have to do a similar check to what is done in smb3_validate_negotiate where we also check if the user is a null user (but not sec=krb5 since username might not be passed in on mount for Kerberos case). Note that the commit below tightened the conditions and forced signing for the SMB2-TreeConnect commands as per MS-SMB2. However, this should only apply to normal user sessions and not for cases where there is no user (even if server forgets to set the flag in the response) since we don't have anything useful to sign with. This is especially important now that the more secure SMB3.1.1 protocol is in the default dialect list. An earlier patch ("cifs: allow guest mounts to work for smb3.11") fixed the guest mounts to Windows. Fixes: 6188f28bf608 ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara <palcantara@suse.de> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-22cifs: Fix slab-out-of-bounds when tracing SMB tconPaulo Alcantara (SUSE)1-3/+3
This patch fixes the following KASAN report: [ 779.044746] BUG: KASAN: slab-out-of-bounds in string+0xab/0x180 [ 779.044750] Read of size 1 at addr ffff88814f327968 by task trace-cmd/2812 [ 779.044756] CPU: 1 PID: 2812 Comm: trace-cmd Not tainted 5.1.0-rc1+ #62 [ 779.044760] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014 [ 779.044761] Call Trace: [ 779.044769] dump_stack+0x5b/0x90 [ 779.044775] ? string+0xab/0x180 [ 779.044781] print_address_description+0x6c/0x23c [ 779.044787] ? string+0xab/0x180 [ 779.044792] ? string+0xab/0x180 [ 779.044797] kasan_report.cold.3+0x1a/0x32 [ 779.044803] ? string+0xab/0x180 [ 779.044809] string+0xab/0x180 [ 779.044816] ? widen_string+0x160/0x160 [ 779.044822] ? vsnprintf+0x5bf/0x7f0 [ 779.044829] vsnprintf+0x4e7/0x7f0 [ 779.044836] ? pointer+0x4a0/0x4a0 [ 779.044841] ? seq_buf_vprintf+0x79/0xc0 [ 779.044848] seq_buf_vprintf+0x62/0xc0 [ 779.044855] trace_seq_printf+0x113/0x210 [ 779.044861] ? trace_seq_puts+0x110/0x110 [ 779.044867] ? trace_raw_output_prep+0xd8/0x110 [ 779.044876] trace_raw_output_smb3_tcon_class+0x9f/0xc0 [ 779.044882] print_trace_line+0x377/0x890 [ 779.044888] ? tracing_buffers_read+0x300/0x300 [ 779.044893] ? ring_buffer_read+0x58/0x70 [ 779.044899] s_show+0x6e/0x140 [ 779.044906] seq_read+0x505/0x6a0 [ 779.044913] vfs_read+0xaf/0x1b0 [ 779.044919] ksys_read+0xa1/0x130 [ 779.044925] ? kernel_write+0xa0/0xa0 [ 779.044931] ? __do_page_fault+0x3d5/0x620 [ 779.044938] do_syscall_64+0x63/0x150 [ 779.044944] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.044949] RIP: 0033:0x7f62c2c2db31 [ 779.044955] Code: fe ff ff 48 8d 3d 17 9e 09 00 48 83 ec 08 e8 96 02 02 00 66 0f 1f 44 00 00 8b 05 fa fc 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89 [ 779.044958] RSP: 002b:00007ffd6e116678 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 779.044964] RAX: ffffffffffffffda RBX: 0000560a38be9260 RCX: 00007f62c2c2db31 [ 779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003 [ 779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003 [ 779.044969] RBP: 00007f62c2ef5420 R08: 0000000000000000 R09: 0000000000000003 [ 779.044972] R10: ffffffffffffffa8 R11: 0000000000000246 R12: 00007ffd6e116710 [ 779.044975] R13: 0000000000002000 R14: 0000000000000d68 R15: 0000000000002000 [ 779.044981] Allocated by task 1257: [ 779.044987] __kasan_kmalloc.constprop.5+0xc1/0xd0 [ 779.044992] kmem_cache_alloc+0xad/0x1a0 [ 779.044997] getname_flags+0x6c/0x2a0 [ 779.045003] user_path_at_empty+0x1d/0x40 [ 779.045008] do_faccessat+0x12a/0x330 [ 779.045012] do_syscall_64+0x63/0x150 [ 779.045017] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.045019] Freed by task 1257: [ 779.045023] __kasan_slab_free+0x12e/0x180 [ 779.045029] kmem_cache_free+0x85/0x1b0 [ 779.045034] filename_lookup.part.70+0x176/0x250 [ 779.045039] do_faccessat+0x12a/0x330 [ 779.045043] do_syscall_64+0x63/0x150 [ 779.045048] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 779.045052] The buggy address belongs to the object at ffff88814f326600 which belongs to the cache names_cache of size 4096 [ 779.045057] The buggy address is located 872 bytes to the right of 4096-byte region [ffff88814f326600, ffff88814f327600) [ 779.045058] The buggy address belongs to the page: [ 779.045062] page:ffffea00053cc800 count:1 mapcount:0 mapping:ffff88815b191b40 index:0x0 compound_mapcount: 0 [ 779.045067] flags: 0x200000000010200(slab|head) [ 779.045075] raw: 0200000000010200 dead000000000100 dead000000000200 ffff88815b191b40 [ 779.045081] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 779.045083] page dumped because: kasan: bad access detected [ 779.045085] Memory state around the buggy address: [ 779.045089] ffff88814f327800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045093] ffff88814f327880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045097] >ffff88814f327900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045099] ^ [ 779.045103] ffff88814f327980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045107] ffff88814f327a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 779.045109] ================================================================== [ 779.045110] Disabling lock debugging due to kernel taint Correctly assign tree name str for smb3_tcon event. Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-22cifs: allow guest mounts to work for smb3.11Ronnie Sahlberg1-2/+6
Fix Guest/Anonymous sessions so that they work with SMB 3.11. The commit noted below tightened the conditions and forced signing for the SMB2-TreeConnect commands as per MS-SMB2. However, this should only apply to normal user sessions and not for Guest/Anonumous sessions. Fixes: 6188f28bf608 ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares") Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-22fix incorrect error code mapping for OBJECTID_NOT_FOUNDSteve French1-1/+2
It was mapped to EIO which can be confusing when user space queries for an object GUID for an object for which the server file system doesn't support (or hasn't saved one). As Amir Goldstein suggested this is similar to ENOATTR (equivalently ENODATA in Linux errno definitions) so changing NT STATUS code mapping for OBJECTID_NOT_FOUND to ENODATA. Signed-off-by: Steve French <stfrench@microsoft.com> CC: Amir Goldstein <amir73il@gmail.com>
2019-03-22cifs: fix that return -EINVAL when do dedupe operationXiaoli Feng1-1/+1
dedupe_file_range operations is combiled into remap_file_range. But it's always skipped for dedupe operations in function cifs_remap_file_range. Example to test: Before this patch: # dd if=/dev/zero of=cifs/file bs=1M count=1 # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file XFS_IOC_FILE_EXTENT_SAME: Invalid argument After this patch: # dd if=/dev/zero of=cifs/file bs=1M count=1 # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file XFS_IOC_FILE_EXTENT_SAME: Operation not supported Influence for xfstests: generic/091 generic/112 generic/127 generic/263 These tests report this error "do_copy_range:: Invalid argument" instead of "FIDEDUPERANGE: Invalid argument". Because there are still two bugs cause these test failed. https://bugzilla.kernel.org/show_bug.cgi?id=202935 https://bugzilla.kernel.org/show_bug.cgi?id=202785 Signed-off-by: Xiaoli Feng <fengxiaoli0714@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2019-03-22CIFS: Fix an issue with re-sending rdata when transport returning -EAGAINLong Li1-30/+41
When sending a rdata, transport may return -EAGAIN. In this case we should re-obtain credits because the session may have been reconnected. Change in v2: adjust_credits before re-sending Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-03-22CIFS: Fix an issue with re-sending wdata when transport returning -EAGAINLong Li1-32/+45
When sending a wdata, transport may return -EAGAIN. In this case we should re-obtain credits because the session may have been reconnected. Change in v2: adjust_credits before re-sending Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-03-22clocksource/drivers/mips-gic-timer: Make gic_compare_irqaction staticYueHaibing1-1/+1
Fix sparse warning: drivers/clocksource/mips-gic-timer.c:70:18: warning: symbol 'gic_compare_irqaction' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <daniel.lezcano@linaro.org> Link: https://lkml.kernel.org/r/20190322144359.19516-1-yuehaibing@huawei.com
2019-03-22clocksource/drivers/timer-ti-dm: Make omap_dm_timer_set_load_start() staticYueHaibing1-2/+2
Fix sparse warning: drivers/clocksource/timer-ti-dm.c:589:5: warning: symbol 'omap_dm_timer_set_load_start' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <daniel.lezcano@linaro.org> Link: https://lkml.kernel.org/r/20190322144302.6704-1-yuehaibing@huawei.com
2019-03-22clocksource/drivers/tcb_clksrc: Make tc_clksrc_suspend/resume() staticYueHaibing1-2/+2
Fix sparse warnings: drivers/clocksource/tcb_clksrc.c:74:6: warning: symbol 'tc_clksrc_suspend' was not declared. Should it be static? drivers/clocksource/tcb_clksrc.c:89:6: warning: symbol 'tc_clksrc_resume' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <nicolas.ferre@microchip.com> Cc: <daniel.lezcano@linaro.org> Cc: <linux-arm-kernel@lists.infradead.org> Link: https://lkml.kernel.org/r/20190322143940.12396-1-yuehaibing@huawei.com
2019-03-22clocksource/drivers/clps711x: Make clps711x_clksrc_init() staticYueHaibing1-2/+3
Fix sparse warning: drivers/clocksource/clps711x-timer.c:96:13: warning: symbol 'clps711x_clksrc_init' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <daniel.lezcano@linaro.org> Cc: <shc_work@mail.ru> Cc: <linux-arm-kernel@lists.infradead.org> Link: https://lkml.kernel.org/r/20190322143708.12716-1-yuehaibing@huawei.com
2019-03-22sbitmap: trivial - update comment for sbitmap_deferred_clear_bitShenghui Wang1-1/+1
"sbitmap_batch_clear" should be "sbitmap_deferred_clear" Acked-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Shenghui Wang <shhuiw@foxmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-22x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an errorNathan Chancellor1-0/+1
When building with -Wsometimes-uninitialized, Clang warns: arch/x86/kernel/hw_breakpoint.c:355:2: warning: variable 'align' is used uninitialized whenever switch default is taken [-Wsometimes-uninitialized] The default cannot be reached because arch_build_bp_info() initializes hw->len to one of the specified cases. Nevertheless the warning is valid and returning -EINVAL makes sure that this cannot be broken by future modifications. Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: clang-built-linux@googlegroups.com Link: https://github.com/ClangBuiltLinux/linux/issues/392 Link: https://lkml.kernel.org/r/20190307212756.4648-1-natechancellor@gmail.com
2019-03-22watchdog/core: Make variables staticValdis Kletnieks1-2/+2
sparse complains: CHECK kernel/watchdog.c kernel/watchdog.c:45:19: warning: symbol 'nmi_watchdog_available' was not declared. Should it be static? kernel/watchdog.c:47:16: warning: symbol 'watchdog_allowed_mask' was not declared. Should it be static? They're not referenced by name from anyplace else, make them static. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/7855.1552383228@turing-police
2019-03-22time/jiffies: Make refined_jiffies staticValdis Kletnieks1-1/+1
sparse complains: CHECK kernel/time/jiffies.c kernel/time/jiffies.c:92:20: warning: symbol 'refined_jiffies' was not declared. Should it be static? Its only used in file scope. Make it static. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/32342.1552379915@turing-police
2019-03-22genirq/devres: Remove excess parameter from kernel docValdis Kletnieks1-2/+0
Building with 'make W=1' complains: CC kernel/irq/devres.o kernel/irq/devres.c:104: warning: Excess function parameter 'thread_fn' description in 'devm_request_any_context_irq' Remove it. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/31207.1552378676@turing-police
2019-03-22x86/mm/pti: Make local symbols staticValdis Kletnieks1-2/+2
With 'make C=2 W=1', sparse and gcc both complain: CHECK arch/x86/mm/pti.c arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static? arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static? CC arch/x86/mm/pti.o arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes] 605 | void pti_set_kernel_image_nonglobal(void) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated local (static) symbol. Make both static. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/27680.1552376873@turing-police
2019-03-22futex: Ensure that futex address is aligned in handle_futex_death()Chen Jie1-0/+4
The futex code requires that the user space addresses of futexes are 32bit aligned. sys_futex() checks this in futex_get_keys() but the robust list code has no alignment check in place. As a consequence the kernel crashes on architectures with strict alignment requirements in handle_futex_death() when trying to cmpxchg() on an unaligned futex address which was retrieved from the robust list. [ tglx: Rewrote changelog, proper sizeof() based alignement check and add comment ] Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core") Signed-off-by: Chen Jie <chenjie6@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <dvhart@infradead.org> Cc: <peterz@infradead.org> Cc: <zengweilin@huawei.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.com
2019-03-22iommu/vt-d: Save the right domain ID used by hardwareLu Baolu1-1/+1
The driver sets a default domain id (FLPT_DEFAULT_DID) in the first level only pasid entry, but saves a different domain id in @sdev->did. The value saved in @sdev->did will be used to invalidate the translation caches. Hence, the driver might result in invalidating the caches with a wrong domain id. Cc: Ashok Raj <ashok.raj@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode") Signed-off-by: Liu Yi L <yi.l.liu@intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>