aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-02-05kvm: mmu: Separate generating and setting mmio ptesBen Gardon1-2/+13
Separate the functions for generating MMIO page table entries from the function that inserts them into the paging structure. This refactoring will facilitate changes to the MMU sychronization model to use atomic compare / exchanges (which are not guaranteed to succeed) instead of a monolithic MMU lock. No functional change expected. Tested by running kvm-unit-tests on an Intel Haswell machine. This commit introduced no new failures. Signed-off-by: Ben Gardon <bgardon@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: mmu: Replace unsigned with unsigned int for PTE accessBen Gardon1-11/+13
There are several functions which pass an access permission mask for SPTEs as an unsigned. This works, but checkpatch complains about it. Switch the occurrences of unsigned to unsigned int to satisfy checkpatch. No functional change expected. Tested by running kvm-unit-tests on an Intel Haswell machine. This commit introduced no new failures. Signed-off-by: Ben Gardon <bgardon@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()Sean Christopherson1-4/+4
The blurb pertaining to the return value of nested_vmx_load_cr3() no longer matches reality, remove it entirely as the behavior it is attempting to document is quite obvious when reading the actual code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: MIPS: Fold comparecount_func() into comparecount_wakeup()Sean Christopherson1-10/+5
Fold kvm_mips_comparecount_func() into kvm_mips_comparecount_wakeup() to eliminate the nondescript function name as well as its unnecessary cast of a vcpu to "unsigned long" and back to a vcpu. Presumably func() was used as a callback at some point during pre-upstream development, as wakeup() is the only user of func() and has been the only user since both with introduced by commit 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM"). Cc: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: MIPS: Fix a build error due to referencing not-yet-defined functionSean Christopherson1-21/+21
Hoist kvm_mips_comparecount_wakeup() above its only user, kvm_arch_vcpu_create() to fix a compilation error due to referencing an undefined function. Fixes: d11dfed5d700 ("KVM: MIPS: Move all vcpu init code into kvm_arch_vcpu_create()") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05x86/kvm: do not setup pv tlb flush when not paravirtualizedThadeu Lima de Souza Cascardo1-0/+3
kvm_setup_pv_tlb_flush will waste memory and print a misguiding message when KVM paravirtualization is not available. Intel SDM says that the when cpuid is used with EAX higher than the maximum supported value for basic of extended function, the data for the highest supported basic function will be returned. So, in some systems, kvm_arch_para_features will return bogus data, causing kvm_setup_pv_tlb_flush to detect support for pv tlb flush. Testing for kvm_para_available will work as it checks for the hypervisor signature. Besides, when the "nopv" command line parameter is used, it should not continue as well, as kvm_guest_init will no be called in that case. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: fix overflow of zero page refcount with ksm runningZhuang Yanying1-0/+1
We are testing Virtual Machine with KSM on v5.4-rc2 kernel, and found the zero_page refcount overflow. The cause of refcount overflow is increased in try_async_pf (get_user_page) without being decreased in mmu_set_spte() while handling ept violation. In kvm_release_pfn_clean(), only unreserved page will call put_page. However, zero page is reserved. So, as well as creating and destroy vm, the refcount of zero page will continue to increase until it overflows. step1: echo 10000 > /sys/kernel/pages_to_scan/pages_to_scan echo 1 > /sys/kernel/pages_to_scan/run echo 1 > /sys/kernel/pages_to_scan/use_zero_pages step2: just create several normal qemu kvm vms. And destroy it after 10s. Repeat this action all the time. After a long period of time, all domains hang because of the refcount of zero page overflow. Qemu print error log as follow: … error: kvm run failed Bad address EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4 EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA] SS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] DS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] FS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] GS =0010 00000000 ffffffff 00c09300 DPL=0 DS [-WA] LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy GDT= 000f7070 00000037 IDT= 000f70ae 00000000 CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000 DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000 DR6=00000000ffff0ff0 DR7=0000000000000400 EFER=0000000000000000 Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04 … Meanwhile, a kernel warning is departed. [40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30 [40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G OE 5.2.0-rc2 #5 [40914.836415] RIP: 0010:try_get_page+0x1f/0x30 [40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 0 0 00 00 00 48 8b 47 08 a8 [40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286 [40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000 [40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440 [40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000 [40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8 [40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440 [40914.836423] FS: 00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000 [40914.836425] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0 [40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [40914.836428] Call Trace: [40914.836433] follow_page_pte+0x302/0x47b [40914.836437] __get_user_pages+0xf1/0x7d0 [40914.836441] ? irq_work_queue+0x9/0x70 [40914.836443] get_user_pages_unlocked+0x13f/0x1e0 [40914.836469] __gfn_to_pfn_memslot+0x10e/0x400 [kvm] [40914.836486] try_async_pf+0x87/0x240 [kvm] [40914.836503] tdp_page_fault+0x139/0x270 [kvm] [40914.836523] kvm_mmu_page_fault+0x76/0x5e0 [kvm] [40914.836588] vcpu_enter_guest+0xb45/0x1570 [kvm] [40914.836632] kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm] [40914.836645] kvm_vcpu_ioctl+0x26e/0x5d0 [kvm] [40914.836650] do_vfs_ioctl+0xa9/0x620 [40914.836653] ksys_ioctl+0x60/0x90 [40914.836654] __x64_sys_ioctl+0x16/0x20 [40914.836658] do_syscall_64+0x5b/0x180 [40914.836664] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [40914.836666] RIP: 0033:0x7fb61cb6bfc7 Signed-off-by: LinFeng <linfeng23@huawei.com> Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: Take a u64 when checking for a valid dr7 valueSean Christopherson1-1/+1
Take a u64 instead of an unsigned long in kvm_dr7_valid() to fix a build warning on i386 due to right-shifting a 32-bit value by 32 when checking for bits being set in dr7[63:32]. Alternatively, the warning could be resolved by rewriting the check to use an i386-friendly method, but taking a u64 fixes another oddity on 32-bit KVM. Beause KVM implements natural width VMCS fields as u64s to avoid layout issues between 32-bit and 64-bit, a devious guest can stuff vmcs12->guest_dr7 with a 64-bit value even when both the guest and host are 32-bit kernels. KVM eventually drops vmcs12->guest_dr7[63:32] when propagating vmcs12->guest_dr7 to vmcs02, but ideally KVM would not rely on that behavior for correctness. Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Fixes: ecb697d10f70 ("KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests") Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: use raw clock values consistentlyPaolo Bonzini1-15/+23
Commit 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock") changed kvmclock to use tkr_raw instead of tkr_mono. However, the default kvmclock_offset for the VM was still based on the monotonic clock and, if the raw clock drifted enough from the monotonic clock, this could cause a negative system_time to be written to the guest's struct pvclock. RHEL5 does not like it and (if it boots fast enough to observe a negative time value) it hangs. There is another thing to be careful about: getboottime64 returns the host boot time with tkr_mono frequency, and subtracting the tkr_raw-based kvmclock value will cause the wallclock to be off if tkr_raw drifts from tkr_mono. To avoid this, compute the wallclock delta from the current time instead of being clever and using getboottime64. Fixes: 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock") Cc: stable@vger.kernel.org Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: reorganize pvclock_gtod_data membersPaolo Bonzini1-17/+12
We will need a copy of tk->offs_boot in the next patch. Store it and cleanup the struct: instead of storing tk->tkr_xxx.base with the tk->offs_boot included, store the raw value in struct pvclock_clock and sum it in do_monotonic_raw and do_realtime. tk->tkr_xxx.xtime_nsec also moves to struct pvclock_clock. While at it, fix a (usually harmless) typo in do_monotonic_raw, which was using gtod->clock.shift instead of gtod->raw_clock.shift. Fixes: 53fafdbb8b21f ("KVM: x86: switch KVMCLOCK base to monotonic raw clock") Cc: stable@vger.kernel.org Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: nVMX: delete meaningless nested_vmx_run() declarationMiaohe Lin1-2/+0
The function nested_vmx_run() declaration is below its implementation. So this is meaningless and should be removed. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: SVM: allow AVIC without split irqchipPaolo Bonzini1-1/+1
SVM is now able to disable AVIC dynamically whenever the in-kernel PIT sets up an ack notifier, so we can enable it even if in-kernel IOAPIC/PIC/PIT are in use. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: ioapic: Lazy update IOAPIC EOISuravee Suthikulpanit1-0/+39
In-kernel IOAPIC does not receive EOI with AMD SVM AVIC since the processor accelerate write to APIC EOI register and does not trap if the interrupt is edge-triggered. Workaround this by lazy check for pending APIC EOI at the time when setting new IOPIC irq, and update IOAPIC EOI if no pending APIC EOI. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: ioapic: Refactor kvm_ioapic_update_eoi()Suravee Suthikulpanit1-54/+56
Refactor code for handling IOAPIC EOI for subsequent patch. There is no functional change. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.Suravee Suthikulpanit3-2/+22
AMD SVM AVIC accelerates EOI write and does not trap. This causes in-kernel PIT re-injection mode to fail since it relies on irq-ack notifier mechanism. So, APICv is activated only when in-kernel PIT is in discard mode e.g. w/ qemu option: -global kvm-pit.lost_tick_policy=discard Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this reason. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Temporarily deactivate AVIC during ExtINT handlingSuravee Suthikulpanit2-4/+30
AMD AVIC does not support ExtINT. Therefore, AVIC must be temporary deactivated and fall back to using legacy interrupt injection via vINTR and interrupt window. Also, introduce APICV_INHIBIT_REASON_IRQWIN to be used for this reason. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Rename svm_request_update_avic to svm_toggle_avic_for_extint. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Deactivate AVIC when launching guest with nested SVM supportSuravee Suthikulpanit2-1/+11
Since AVIC does not currently work w/ nested virtualization, deactivate AVIC for the guest if setting CPUID Fn80000001_ECX[SVM] (i.e. indicate support for SVM, which is needed for nested virtualization). Also, introduce a new APICV_INHIBIT_REASON_NESTED bit to be used for this reason. Suggested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: hyperv: Use APICv update request interfaceSuravee Suthikulpanit5-18/+8
Since disabling APICv has to be done for all vcpus on AMD-based system, adopt the newly introduced kvm_request_apicv_update() interface, and introduce a new APICV_INHIBIT_REASON_HYPERV. Also, remove the kvm_vcpu_deactivate_apicv() since no longer used. Cc: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05svm: Add support for dynamic APICvSuravee Suthikulpanit1-10/+28
Add necessary logics to support (de)activate AVIC at runtime. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce x86 ops hook for pre-update APICvSuravee Suthikulpanit3-0/+9
AMD SVM AVIC needs to update APIC backing page mapping before changing APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl function hook to be called prior KVM APICv update request to each vcpu. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasonsSuravee Suthikulpanit4-0/+21
Inibit reason bits are used to determine if APICv deactivation is applicable for a particular hardware virtualization architecture. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: svm: avic: Add support for dynamic setup/teardown of virtual APIC backing pageSuravee Suthikulpanit1-6/+5
Re-factor avic_init_access_page() to avic_update_access_page() since activate/deactivate AVIC requires setting/unsetting the memory region used for virtual APIC backing page (APIC_ACCESS_PAGE_PRIVATE_MEMSLOT). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: svm: Add support to (de)activate posted interruptsSuravee Suthikulpanit1-1/+36
Introduce interface for (de)activate posted interrupts, and implement SVM hooks to toggle AMD IOMMU guest virtual APIC mode. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Add APICv (de)activate request trace pointsSuravee Suthikulpanit2-0/+21
Add trace points when sending request to (de)activate APICv. Suggested-by: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Add support for dynamic APICv activationSuravee Suthikulpanit2-0/+42
Certain runtime conditions require APICv to be temporary deactivated during runtime. The current implementation only support run-time deactivation of APICv when Hyper-V SynIC is enabled, which is not temporary. In addition, for AMD, when APICv is (de)activated at runtime, all vcpus in the VM have to operate in the same mode. Thus the requesting vcpu must notify the others. So, introduce the following: * A new KVM_REQ_APICV_UPDATE request bit * Interfaces to request all vcpus to update APICv status * A new interface to update APICV-related parameters for each vcpu Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05KVM: x86: remove get_enable_apicv from kvm_x86_opsPaolo Bonzini3-13/+0
It is unused now. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: x86: Introduce APICv inhibit reason bitsSuravee Suthikulpanit4-2/+38
There are several reasons in which a VM needs to deactivate APICv e.g. disable APICv via parameter during module loading, or when enable Hyper-V SynIC support. Additional inhibit reasons will be introduced later on when dynamic APICv is supported, Introduce KVM APICv inhibit reason bits along with a new variable, apicv_inhibit_reasons, to help keep track of APICv state for each VM, Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate the case where APICv is disabled during KVM module load. (e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0). Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> [Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05kvm: lapic: Introduce APICv update helper functionSuravee Suthikulpanit2-5/+18
Re-factor code into a helper function for setting lapic parameters when activate/deactivate APICv, and export the function for subsequent usage. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30x86/KVM: Clean up host's steal time structureBoris Ostrovsky2-10/+4
Now that we are mapping kvm_steal_time from the guest directly we don't need keep a copy of it in kvm_vcpu_arch.st. The same is true for the stime field. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missedBoris Ostrovsky1-21/+30
There is a potential race in record_steal_time() between setting host-local vcpu->arch.st.steal.preempted to zero (i.e. clearing KVM_VCPU_PREEMPTED) and propagating this value to the guest with kvm_write_guest_cached(). Between those two events the guest may still see KVM_VCPU_PREEMPTED in its copy of kvm_steal_time, set KVM_VCPU_FLUSH_TLB and assume that hypervisor will do the right thing. Which it won't. Instad of copying, we should map kvm_steal_time and that will guarantee atomicity of accesses to @preempted. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30x86/kvm: Cache gfn to pfn translationBoris Ostrovsky5-22/+103
__kvm_map_gfn()'s call to gfn_to_pfn_memslot() is * relatively expensive * in certain cases (such as when done from atomic context) cannot be called Stashing gfn-to-pfn mapping should help with both cases. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30x86/kvm: Introduce kvm_(un)map_gfn()Boris Ostrovsky2-5/+26
kvm_vcpu_(un)map operates on gfns from any current address space. In certain cases we want to make sure we are not mapping SMRAM and for that we can use kvm_(un)map_gfn() that we are introducing in this patch. This is part of CVE-2019-3016. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bitBoris Ostrovsky1-0/+3
kvm_steal_time_set_preempted() may accidentally clear KVM_VCPU_FLUSH_TLB bit if it is called more than once while VCPU is preempted. This is part of CVE-2019-3016. (This bug was also independently discovered by Jim Mattson <jmattson@google.com>) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-30net/core: Do not clear VF index for node/port GUIDs queryLeon Romanovsky1-2/+2
VF numbers were assigned to node_guid and port_guid, but cleared right before such query calls were issued. It caused to return node/port GUIDs of VF index 0 for all VFs. Fixes: 30aad41721e0 ("net/core: Add support for getting VF GUIDs") Reported-by: Adrian Chiris <adrianc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30y2038: sparc: remove use of struct timexArnd Bergmann2-16/+19
'struct timex' is one of the last users of 'struct timeval' and is only referenced in one place in the kernel any more, to convert the user space timex into the kernel-internal version on sparc64, with a different tv_usec member type. As a preparation for hiding the time_t definition and everything using that in the kernel, change the implementation once more to only convert the timeval member, and then enclose the struct definition in an #ifdef. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Julian Calaby <julian.calaby@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30sparc64: add support for folded p4d page tablesMike Rapoport7-32/+84
Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate and replace 5level-fixup.h with pgtable-nop4d.h. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30ide: make drive->dn read onlyDan Carpenter2-1/+5
The IDE core always sets ->dn correctly so changing it is never required. Setting it to a different value than assigned by IDE core is very likely to result in data corruption (due to wrong transfer timings being set on the controller etc.) Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Tested-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30mptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6Geert Uytterhoeven3-12/+9
If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m: ERROR: "mptcp_handle_ipv6_mapped" [net/ipv6/ipv6.ko] undefined! This does not happen if CONFIG_MPTCP_IPV6=y, as CONFIG_MPTCP_IPV6 selects CONFIG_IPV6, and thus forces CONFIG_IPV6 builtin. As exporting a symbol for an empty function would be a bit wasteful, fix this by providing a dummy version of mptcp_handle_ipv6_mapped() for the CONFIG_MPTCP_IPV6=n case. Rename mptcp_handle_ipv6_mapped() to mptcpv6_handle_mapped(), to make it clear this is a pure-IPV6 function, just like mptcpv6_init(). Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30net: drop_monitor: Use kstrdupJoe Perches1-6/+2
Convert the equivalent but rather odd uses of kmemdup with __GFP_ZERO to the more common kstrdup and avoid unnecessary zeroing of copied over memory. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30udp: document udp_rcv_segment special case for looped packetsWillem de Bruijn1-0/+7
Commit 6cd021a58c18a ("udp: segment looped gso packets correctly") fixes an issue with rare udp gso multicast packets looped onto the receive path. The stable backport makes the narrowest change to target only these packets, when needed. As opposed to, say, expanding __udp_gso_segment, which is harder to reason to be free from unintended side-effects. But the resulting code is hardly self-describing. Document its purpose and rationale. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30mptcp: MPTCP_HMAC_TEST should depend on MPTCPGeert Uytterhoeven1-2/+4
As the MPTCP HMAC test is integrated into the MPTCP code, it can be built only when MPTCP is enabled. Hence when MPTCP is disabled, asking the user if the test code should be enabled is futile. Wrap the whole block of MPTCP-specific config options inside a check for MPTCP. While at it, drop the "default n" for MPTCP_HMAC_TEST, as that is the default anyway. Fixes: 65492c5a6ab5df50 ("mptcp: move from sha1 (v0) to sha256 (v1)") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-30mptcp: Fix incorrect IPV6 dependency checkGeert Uytterhoeven1-1/+1
If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m: net/mptcp/protocol.o: In function `__mptcp_tcp_fallback': protocol.c:(.text+0x786): undefined reference to `inet6_stream_ops' Fix this by checking for CONFIG_MPTCP_IPV6 instead of CONFIG_IPV6, like is done in all other places in the mptcp code. Fixes: 8ab183deb26a3b79 ("mptcp: cope with later TCP fallback") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29io_uring: add support for epoll_ctl(2)Jens Axboe2-0/+72
This adds IORING_OP_EPOLL_CTL, which can perform the same work as the epoll_ctl(2) system call. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29eventpoll: support non-blocking do_epoll_ctl() callsJens Axboe2-13/+42
Also make it available outside of epoll, along with the helper that decides if we need to copy the passed in epoll_event. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29eventpoll: abstract out epoll_ctl() handlerJens Axboe1-20/+25
No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29io_uring: fix linked command file table usageJens Axboe3-14/+21
We're not consistent in how the file table is grabbed and assigned if we have a command linked that requires the use of it. Add ->file_table to the io_op_defs[] array, and use that to determine when to grab the table instead of having the handlers set it if they need to defer. This also means we can kill the IO_WQ_WORK_NEEDS_FILES flag. We always initialize work->files, so io-wq can just check for that. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-29Revert "MAINTAINERS: mptcp@ mailing list is moderated"Mat Martineau1-1/+1
This reverts commit 74759e1693311a8d1441de836c4080c192374238. mptcp@lists.01.org accepts messages from non-subscribers. There was an invisible and unexpected server-wide rule limiting the number of recipients for subscribers and non-subscribers alike, and that has now been turned off for this list. Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29mptcp: handle tcp fallback when using syn cookiesFlorian Westphal5-3/+14
We can't deal with syncookie mode yet, the syncookie rx path will create tcp reqsk, i.e. we get OOB access because we treat tcp reqsk as mptcp reqsk one: TCP: SYN flooding on port 20002. Sending cookies. BUG: KASAN: slab-out-of-bounds in subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191 Read of size 1 at addr ffff8881167bc148 by task syz-executor099/2120 subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191 tcp_get_cookie_sock+0xcf/0x520 net/ipv4/syncookies.c:209 cookie_v6_check+0x15a5/0x1e90 net/ipv6/syncookies.c:252 tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1123 [inline] [..] Bug can be reproduced via "sysctl net.ipv4.tcp_syncookies=2". Note that MPTCP should work with syncookies (4th ack would carry needed state), but it appears better to sort that out in -next so do tcp fallback for now. I removed the MPTCP ifdef for tcp_rsk "is_mptcp" member because if (IS_ENABLED()) is easier to read than "#ifdef IS_ENABLED()/#endif" pair. Cc: Eric Dumazet <edumazet@google.com> Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections") Reported-by: Christoph Paasch <cpaasch@apple.com> Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29mptcp: avoid a lockdep splat when mcast group was joinedFlorian Westphal1-2/+6
syzbot triggered following lockdep splat: ffffffff82d2cd40 (rtnl_mutex){+.+.}, at: ip_mc_drop_socket+0x52/0x180 but task is already holding lock: ffff8881187a2310 (sk_lock-AF_INET){+.+.}, at: mptcp_close+0x18/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (sk_lock-AF_INET){+.+.}: lock_acquire+0xee/0x230 lock_sock_nested+0x89/0xc0 do_ip_setsockopt.isra.0+0x335/0x22f0 ip_setsockopt+0x35/0x60 tcp_setsockopt+0x5d/0x90 __sys_setsockopt+0xf3/0x190 __x64_sys_setsockopt+0x61/0x70 do_syscall_64+0x72/0x300 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (rtnl_mutex){+.+.}: check_prevs_add+0x2b7/0x1210 __lock_acquire+0x10b6/0x1400 lock_acquire+0xee/0x230 __mutex_lock+0x120/0xc70 ip_mc_drop_socket+0x52/0x180 inet_release+0x36/0xe0 __sock_release+0xfd/0x130 __mptcp_close+0xa8/0x1f0 inet_release+0x7f/0xe0 __sock_release+0x69/0x130 sock_close+0x18/0x20 __fput+0x179/0x400 task_work_run+0xd5/0x110 do_exit+0x685/0x1510 do_group_exit+0x7e/0x170 __x64_sys_exit_group+0x28/0x30 do_syscall_64+0x72/0x300 entry_SYSCALL_64_after_hwframe+0x49/0xbe The trigger is: socket(AF_INET, SOCK_STREAM, 0x106 /* IPPROTO_MPTCP */) = 4 setsockopt(4, SOL_IP, MCAST_JOIN_GROUP, {gr_interface=7, gr_group={sa_family=AF_INET, sin_port=htons(20003), sin_addr=inet_addr("224.0.0.2")}}, 136) = 0 exit(0) Which results in a call to rtnl_lock while we are holding the parent mptcp socket lock via mptcp_close -> lock_sock(msk) -> inet_release -> ip_mc_drop_socket -> rtnl_lock(). >From lockdep point of view we thus have both 'rtnl_lock; lock_sock' and 'lock_sock; rtnl_lock'. Fix this by stealing the msk conn_list and doing the subflow close without holding the msk lock. Fixes: cec37a6e41aae7bf ("mptcp: Handle MP_CAPABLE options for outgoing connections") Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-29mptcp: fix panic on user pointer accessFlorian Westphal1-18/+22
Its not possible to call the kernel_(s|g)etsockopt functions here, the address points to user memory: General protection fault in user access. Non-canonical address? WARNING: CPU: 1 PID: 5352 at arch/x86/mm/extable.c:77 ex_handler_uaccess+0xba/0xe0 arch/x86/mm/extable.c:77 Kernel panic - not syncing: panic_on_warn set ... [..] Call Trace: fixup_exception+0x9d/0xcd arch/x86/mm/extable.c:178 general_protection+0x2d/0x40 arch/x86/entry/entry_64.S:1202 do_ip_getsockopt+0x1f6/0x1860 net/ipv4/ip_sockglue.c:1323 ip_getsockopt+0x87/0x1c0 net/ipv4/ip_sockglue.c:1561 tcp_getsockopt net/ipv4/tcp.c:3691 [inline] tcp_getsockopt+0x8c/0xd0 net/ipv4/tcp.c:3685 kernel_getsockopt+0x121/0x1f0 net/socket.c:3736 mptcp_getsockopt+0x69/0x90 net/mptcp/protocol.c:830 __sys_getsockopt+0x13a/0x220 net/socket.c:2175 We can call tcp_get/setsockopt functions instead. Doing so fixes crashing, but still leaves rtnl related lockdep splat: WARNING: possible circular locking dependency detected 5.5.0-rc6 #2 Not tainted ------------------------------------------------------ syz-executor.0/16334 is trying to acquire lock: ffffffff84f7a080 (rtnl_mutex){+.+.}, at: do_ip_setsockopt.isra.0+0x277/0x3820 net/ipv4/ip_sockglue.c:644 but task is already holding lock: ffff888116503b90 (sk_lock-AF_INET){+.+.}, at: lock_sock include/net/sock.h:1516 [inline] ffff888116503b90 (sk_lock-AF_INET){+.+.}, at: mptcp_setsockopt+0x28/0x90 net/mptcp/protocol.c:1284 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (sk_lock-AF_INET){+.+.}: lock_sock_nested+0xca/0x120 net/core/sock.c:2944 lock_sock include/net/sock.h:1516 [inline] do_ip_setsockopt.isra.0+0x281/0x3820 net/ipv4/ip_sockglue.c:645 ip_setsockopt+0x44/0xf0 net/ipv4/ip_sockglue.c:1248 udp_setsockopt+0x5d/0xa0 net/ipv4/udp.c:2639 __sys_setsockopt+0x152/0x240 net/socket.c:2130 __do_sys_setsockopt net/socket.c:2146 [inline] __se_sys_setsockopt net/socket.c:2143 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143 do_syscall_64+0xbd/0x5b0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (rtnl_mutex){+.+.}: check_prev_add kernel/locking/lockdep.c:2475 [inline] check_prevs_add kernel/locking/lockdep.c:2580 [inline] validate_chain kernel/locking/lockdep.c:2970 [inline] __lock_acquire+0x1fb2/0x4680 kernel/locking/lockdep.c:3954 lock_acquire+0x127/0x330 kernel/locking/lockdep.c:4484 __mutex_lock_common kernel/locking/mutex.c:956 [inline] __mutex_lock+0x158/0x1340 kernel/locking/mutex.c:1103 do_ip_setsockopt.isra.0+0x277/0x3820 net/ipv4/ip_sockglue.c:644 ip_setsockopt+0x44/0xf0 net/ipv4/ip_sockglue.c:1248 tcp_setsockopt net/ipv4/tcp.c:3159 [inline] tcp_setsockopt+0x8c/0xd0 net/ipv4/tcp.c:3153 kernel_setsockopt+0x121/0x1f0 net/socket.c:3767 mptcp_setsockopt+0x69/0x90 net/mptcp/protocol.c:1288 __sys_setsockopt+0x152/0x240 net/socket.c:2130 __do_sys_setsockopt net/socket.c:2146 [inline] __se_sys_setsockopt net/socket.c:2143 [inline] __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143 do_syscall_64+0xbd/0x5b0 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sk_lock-AF_INET); lock(rtnl_mutex); lock(sk_lock-AF_INET); lock(rtnl_mutex); The lockdep complaint is because we hold mptcp socket lock when calling the sk_prot get/setsockopt handler, and those might need to acquire the rtnl mutex. Normally, order is: rtnl_lock(sk) -> lock_sock Whereas for mptcp the order is lock_sock(mptcp_sk) rtnl_lock -> lock_sock(subflow_sk) We can avoid this by releasing the mptcp socket lock early, but, as Paolo points out, we need to get/put the subflow socket refcount before doing so to avoid race with concurrent close(). Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations") Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>