linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-05-28	iommu/vt-d: Remove static identity map code	Lu Baolu	1	-143/+1
	The code to prepare the static identity map for various reserved memory ranges in intel_iommu_init() is duplicated with the default domain mechanism now. Remove it to avoid duplication. Signed-off-by: James Sewart <jamessewart@arista.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Remove duplicated code for device hotplug	Lu Baolu	1	-34/+0
	The iommu generic code has handled the device hotplug cases. Remove the duplicated code. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Remove startup parameter from device_def_domain_type()	Lu Baolu	1	-7/+7
	It isn't used anywhere. Remove it to make code concise. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Cleanup get_valid_domain_for_dev()	Lu Baolu	2	-11/+8
	Previously, get_valid_domain_for_dev() is used to retrieve the DMA domain which has been attached to the device or allocate one if no domain has been attached yet. As we have delegated the DMA domain management to upper layer, this function is used purely to allocate a private DMA domain if the default domain doesn't work for ths device. Cleanup the code for readability. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Implement is_attach_deferred iommu ops entry	Lu Baolu	1	-0/+23
	As a domain is now attached to a device earlier, we should implement the is_attach_deferred call-back and use it to defer the domain attach from iommu driver init to device driver init when iommu is pre-enabled in kdump kernel. Suggested-by: Tom Murphy <tmurphy@arista.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Probe DMA-capable ACPI name space devices	Lu Baolu	1	-0/+45
	Some platforms may support ACPI name-space enumerated devices that are capable of generating DMA requests. Platforms which support DMA remapping explicitly declares any such DMA-capable ACPI name-space devices in the platform through ACPI Name-space Device Declaration (ANDD) structure and enumerate them through the Device Scope of the appropriate remapping hardware unit. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Handle 32bit device with identity default domain	Lu Baolu	1	-33/+22
	The iommu driver doesn't know whether the bit width of a PCI device is sufficient for access to the whole system memory. Hence, the driver checks this when the driver calls into the dma APIs. If a device is using an identity domain, but the bit width is less than the system requirement, we need to use a dma domain instead. This also applies after we delegated the domain life cycle management to the upper layer. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Identify default domains replaced with private	Lu Baolu	1	-1/+63
	When we put a device into an iommu group, the group's default domain will be attached to the device. There are some corner cases where the type (identity or dma) of the default domain doesn't work for the device and the request of a new default domain results in failure (e.x. multiple devices have already existed in the group). In order to be compatible with the past, we used a private domain. Mark the private domains and disallow some iommu apis (map/unmap/iova_to_phys) on them. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Delegate the dma domain to upper layer	Lu Baolu	1	-55/+19
	This allows the iommu generic layer to allocate a dma domain and attach it to a device through the iommu api's. With all types of domains being delegated to upper layer, we can remove an internal flag which was used to distinguish domains mananged internally or externally. Signed-off-by: James Sewart <jamessewart@arista.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Delegate the identity domain to upper layer	Lu Baolu	1	-32/+58
	This allows the iommu generic layer to allocate an identity domain and attach it to a device. Hence, the identity domain is delegated to upper layer. As a side effect, iommu_identity_mapping can't be used to check the existence of identity domains any more. Signed-off-by: James Sewart <jamessewart@arista.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Add device_def_domain_type() helper	Lu Baolu	1	-13/+27
	This helper returns the default domain type that the device requires. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Enable DMA remapping after rmrr mapped	Lu Baolu	1	-6/+10
	The rmrr devices require identity map of the rmrr regions before enabling DMA remapping. Otherwise, there will be a window during which DMA from/to the rmrr regions will be blocked. In order to alleviate this, we move enabling DMA remapping after all rmrr regions get mapped. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-28	iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions	Lu Baolu	1	-0/+13
	To support mapping ISA region via iommu_group_create_direct_mappings, make sure its exposed by iommu_get_resv_regions. Signed-off-by: James Sewart <jamessewart@arista.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-27	iommu/vt-d: Implement apply_resv_region iommu ops entry	James Sewart	1	-0/+14
	Used by iommu.c before creating identity mappings for reserved ranges to ensure dma-ops won't ever remap these ranges. Signed-off-by: James Sewart <jamessewart@arista.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-27	iommu: Add API to request DMA domain for device	Lu Baolu	2	-11/+31
	Normally during iommu probing a device, a default doamin will be allocated and attached to the device. The domain type of the default domain is statically defined, which results in a situation where the allocated default domain isn't suitable for the device due to some limitations. We already have API iommu_request_dm_for_dev() to replace a DMA domain with an identity one. This adds iommu_request_dma_domain_for_dev() to request a dma domain if an allocated identity domain isn't suitable for the device in question. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-27	iommu/vt-d: Add debugfs support to show scalable mode DMAR table internals	Sai Praneeth Prakhya	1	-3/+75
	A DMAR table walk would typically follow the below process. 1. Bus number is used to index into root table which points to a context table. 2. Device number and Function number are used together to index into context table which then points to a pasid directory. 3. PASID[19:6] is used to index into PASID directory which points to a PASID table. 4. PASID[5:0] is used to index into PASID table which points to all levels of page tables. Whenever a user opens the file "/sys/kernel/debug/iommu/intel/dmar_translation_struct", the above described DMAR table walk is performed and the contents of the table are dumped into the file. The dump could be handy while dealing with devices that use PASID. Example of such dump: cat /sys/kernel/debug/iommu/intel/dmar_translation_struct (Please note that because of 80 char limit, entries that should have been in the same line are broken into different lines) IOMMU dmar0: Root Table Address: 0x436f7c000 B.D.F Root_entry Context_entry PASID PASID_table_entry 00:0a.0 0x0000000000000000:0x000000044dd3f001 0x0000000000100000:0x0000000435460e1d 0 0x000000044d6e1089:0x0000000000000003:0x0000000000000001 00:0a.0 0x0000000000000000:0x000000044dd3f001 0x0000000000100000:0x0000000435460e1d 1 0x0000000000000049:0x0000000000000001:0x0000000003c0e001 Note that the above format is followed even for legacy DMAR table dump which doesn't support PASID and hence in such cases PASID is defaulted to -1 indicating that PASID and it's related fields are invalid. Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-27	iommu/vt-d: Introduce macros useful for dumping DMAR table	Sai Praneeth Prakhya	4	-22/+33
	A scalable mode DMAR table walk would involve looking at bits in each stage of walk, like, 1. Is PASID enabled in the context entry? 2. What's the size of PASID directory? 3. Is the PASID directory entry present? 4. Is the PASID table entry present? 5. Number of PASID table entries? Hence, add these macros that will later be used during this walk. Apart from adding new macros, move existing macros (like pasid_pde_is_present(), get_pasid_table_from_pde() and pasid_supported()) to appropriate header files so that they could be reused. Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-27	iommu/vt-d: Modify the format of intel DMAR tables dump	Sai Praneeth Prakhya	1	-24/+41
	Presently, "/sys/kernel/debug/iommu/intel/dmar_translation_struct" file dumps DMAR tables in the below format IOMMU dmar2: Root Table Address:4362cc000 Root Table Entries: Bus: 0 H: 0 L: 4362f0001 Context Table Entries for Bus: 0 Entry B:D.F High Low 160 00:14.0 102 4362ef001 184 00:17.0 302 435ec4001 248 00:1f.0 202 436300001 This format has few short comings like 1. When extended for dumping scalable mode DMAR table it will quickly be very clumsy, making it unreadable. 2. It has information like the Bus number and Entry which are basically part of B:D.F, hence are a repetition and are not so useful. So, change it to a new format which could be easily extended to dump scalable mode DMAR table. The new format looks as below: IOMMU dmar2: Root Table Address: 0x436f7d000 B.D.F Root_entry Context_entry 00:14.0 0x0000000000000000:0x0000000436fbd001 0x0000000000000102:0x0000000436fbc001 00:17.0 0x0000000000000000:0x0000000436fbd001 0x0000000000000302:0x0000000436af4001 00:1f.0 0x0000000000000000:0x0000000436fbd001 0x0000000000000202:0x0000000436fcd001 Cc: Joerg Roedel <joro@8bytes.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Sohil Mehta <sohil.mehta@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-27	iommu/vt-d: Remove unnecessary rcu_read_locks	Lukasz Odzioba	1	-4/+0
	We use RCU's for rarely updated lists like iommus, rmrr, atsr units. I'm not sure why domain_remove_dev_info() in domain_exit() was surrounded by rcu_read_lock. Lock was present before refactoring in d160aca527, but it was related to rcu list, not domain_remove_dev_info function. dmar_remove_one_dev_info() doesn't touch any of those lists, so it doesn't require a lock. In fact it is called 6 times without it anyway. Fixes: d160aca5276d ("iommu/vt-d: Unify domain->iommu attach/detachment") Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-27	iommu/vt-d: Fix bind svm with multiple devices	Jacob Pan	1	-0/+15
	If multiple devices try to bind to the same mm/PASID, we need to set up first level PASID entries for all the devices. The current code does not consider this case which results in failed DMA for devices after the first bind. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Reported-by: Mike Campin <mike.campin@intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>
2019-05-26	Linux 5.2-rc2	Linus Torvalds	1	-2/+2

2019-05-26	random: fix soft lockup when trying to read from an uninitialized blocking pool	Theodore Ts'o	1	-3/+13
	Fixes: eb9d1bf079bb: "random: only read from /dev/random after its pool has received 128 bits" Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-05-25	tracing: Silence GCC 9 array bounds warning	Miguel Ojeda	3	-10/+20
	Starting with GCC 9, -Warray-bounds detects cases when memset is called starting on a member of a struct but the size to be cleared ends up writing over further members. Such a call happens in the trace code to clear, at once, all members after and including `seq` on struct trace_iterator: In function 'memset', inlined from 'ftrace_dump' at kernel/trace/trace.c:8914:3: ./include/linux/string.h:344:9: warning: '__builtin_memset' offset [8505, 8560] from the object at 'iter' is out of the bounds of referenced subobject 'seq' with type 'struct trace_seq' at offset 4368 [-Warray-bounds] 344 \| return __builtin_memset(p, c, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ In order to avoid GCC complaining about it, we compute the address ourselves by adding the offsetof distance instead of referring directly to the member. Since there are two places doing this clear (trace.c and trace_kdb.c), take the chance to move the workaround into a single place in the internal header. Link: http://lkml.kernel.org/r/20190523124535.GA12931@gmail.com Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [ Removed unnecessary parenthesis around "iter" ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-05-24	ext4: fix dcache lookup of !casefolded directories	Gabriel Krisman Bertazi	1	-1/+1
	Found by visual inspection, this wasn't caught by my xfstest, since it's effect is ignoring positive dentries in the cache the fallback just goes to the disk. it was introduced in the last iteration of the case-insensitive patch. d_compare should return 0 when the entries match, so make sure we are correctly comparing the entire string if the encoding feature is set and we are on a case-INsensitive directory. Fixes: b886ee3e778e ("ext4: Support case-insensitive file name lookups") Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-05-24	locking/lock_events: Use this_cpu_add() when necessary	Waiman Long	1	-2/+40
	The kernel test robot has reported that the use of __this_cpu_add() causes bug messages like: BUG: using __this_cpu_add() in preemptible [00000000] code: ... Given the imprecise nature of the count and the possibility of resetting the count and doing the measurement again, this is not really a big problem to use the unprotected __this_cpu_() functions. To make the preemption checking code happy, the this_cpu_() functions will be used if CONFIG_DEBUG_PREEMPT is defined. The imprecise nature of the locking counts are also documented with the suggestion that we should run the measurement a few times with the counts reset in between to get a better picture of what is going on under the hood. Fixes: a8654596f0371 ("locking/rwsem: Enable lock event counting") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-24	KVM: x86: fix return value for reserved EFER	Paolo Bonzini	1	-1/+1
	Commit 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes", 2019-04-02) introduced a "return false" in a function returning int, and anyway set_efer has a "nonzero on error" conventon so it should be returning 1. Reported-by: Pavel Machek <pavel@denx.de> Fixes: 11988499e62b ("KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes") Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	tools/kvm_stat: fix fields filter for child events	Stefan Raspl	2	-4/+14
	The fields filter would not work with child fields, as the respective parents would not be included. No parents displayed == no childs displayed. To reproduce, run on s390 (would work on other platforms, too, but would require a different filter name): - Run 'kvm_stat -d' - Press 'f' - Enter 'instruct' Notice that events like instruction_diag_44 or instruction_diag_500 are not displayed - the output remains empty. With this patch, we will filter by matching events and their parents. However, consider the following example where we filter by instruction_diag_44: kvm statistics - summary regex filter: instruction_diag_44 Event Total %Total CurAvg/s exit_instruction 276 100.0 12 instruction_diag_44 256 92.8 11 Total 276 12 Note that the parent ('exit_instruction') displays the total events, but the childs listed do not match its total (256 instead of 276). This is intended (since we're filtering all but one child), but might be confusing on first sight. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: selftests: Wrap vcpu_nested_state_get/set functions with x86 guard	Thomas Huth	2	-0/+4
	struct kvm_nested_state is only available on x86 so far. To be able to compile the code on other architectures as well, we need to wrap the related code with #ifdefs. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: selftests: aarch64: compile with warnings on	Andrew Jones	1	-4/+5
	aarch64 fixups needed to compile with warnings as errors. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: selftests: aarch64: fix default vm mode	Andrew Jones	1	-1/+1
	VM_MODE_P52V48_4K is not a valid mode for AArch64. Replace its use in vm_create_default() with a mode that works and represents a good AArch64 default. (We didn't ever see a problem with this because we don't have any unit tests using vm_create_default(), but it's good to get it fixed in advance.) Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size	Andrew Jones	1	-1/+1
	The memory slot size must be aligned to the host's page size. When testing a guest with a 4k page size on a host with a 64k page size, then 3 guest pages are not host page size aligned. Since we just need a nearly arbitrary number of extra pages to ensure the memslot is not aligned to a 64 host-page boundary for this test, then we can use 16, as that's 64k aligned, but not 64 * 64k aligned. Fixes: 76d58e0f07ec ("KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size", 2019-04-17) Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION	Christian Borntraeger	1	-14/+21
	kselftests exposed a problem in the s390 handling for memory slots. Right now we only do proper memory slot handling for creation of new memory slots. Neither MOVE, nor DELETION are handled properly. Let us implement those. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: x86/pmu: do not mask the value that is written to fixed PMUs	Paolo Bonzini	1	-5/+8
	According to the SDM, for MSR_IA32_PERFCTR0/1 "the lower-order 32 bits of each MSR may be written with any value, and the high-order 8 bits are sign-extended according to the value of bit 31", but the fixed counters in real hardware are limited to the width of the fixed counters ("bits beyond the width of the fixed-function counter are reserved and must be written as zeros"). Fix KVM to do the same. Reported-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: x86/pmu: mask the result of rdpmc according to the width of the counters	Paolo Bonzini	4	-13/+15
	This patch will simplify the changes in the next, by enforcing the masking of the counters to RDPMC and RDMSR. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	x86/kvm/pmu: Set AMD's virt PMU version to 1	Borislav Petkov	1	-1/+1
	After commit: 672ff6cff80c ("KVM: x86: Raise #GP when guest vCPU do not support PMU") my AMD guests started #GPing like this: general protection fault: 0000 [#1] PREEMPT SMP CPU: 1 PID: 4355 Comm: bash Not tainted 5.1.0-rc6+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:x86_perf_event_update+0x3b/0xa0 with Code: pointing to RDPMC. It is RDPMC because the guest has the hardware watchdog CONFIG_HARDLOCKUP_DETECTOR_PERF enabled which uses perf. Instrumenting kvm_pmu_rdpmc() some, showed that it fails due to: if (!pmu->version) return 1; which the above commit added. Since AMD's PMU leaves the version at 0, that causes the #GP injection into the guest. Set pmu->version arbitrarily to 1 and move it above the non-applicable struct kvm_pmu members. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: kvm@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Cc: Mihai Carabas <mihai.carabas@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: stable@vger.kernel.org Fixes: 672ff6cff80c ("KVM: x86: Raise #GP when guest vCPU do not support PMU") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: x86: do not spam dmesg with VMCS/VMCB dumps	Paolo Bonzini	2	-8/+27
	Userspace can easily set up invalid processor state in such a way that dmesg will be filled with VMCS or VMCB dumps. Disable this by default using a module parameter. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: Check irqchip mode before assign irqfd	Peter Xu	3	-0/+17
	When assigning kvm irqfd we didn't check the irqchip mode but we allow KVM_IRQFD to succeed with all the irqchip modes. However it does not make much sense to create irqfd even without the kernel chips. Let's provide a arch-dependent helper to check whether a specific irqfd is allowed by the arch. At least for x86, it should make sense to check: - when irqchip mode is NONE, all irqfds should be disallowed, and, - when irqchip mode is SPLIT, irqfds that are with resamplefd should be disallowed. For either of the case, previously we'll silently ignore the irq or the irq ack event if the irqchip mode is incorrect. However that can cause misterious guest behaviors and it can be hard to triage. Let's fail KVM_IRQFD even earlier to detect these incorrect configurations. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Radim Krčmář <rkrcmar@redhat.com> CC: Alex Williamson <alex.williamson@redhat.com> CC: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: svm/avic: fix off-by-one in checking host APIC ID	Suthikulpanit, Suravee	1	-1/+5
	Current logic does not allow VCPU to be loaded onto CPU with APIC ID 255. This should be allowed since the host physical APIC ID field in the AVIC Physical APIC table entry is an 8-bit value, and APIC ID 255 is valid in system with x2APIC enabled. Instead, do not allow VCPU load if the host APIC ID cannot be represented by an 8-bit value. Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK instead of AVIC_MAX_PHYSICAL_ID_COUNT. Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: selftests: do not blindly clobber registers in guest asm	Paolo Bonzini	1	-24/+30
	The guest_code of sync_regs_test is assuming that the compiler will not touch %r11 outside the asm that increments it, which is a bit brittle. Instead, we can increment a variable and use a dummy asm to ensure the increment is not optimized away. However, we also need to use a callee-save register or the compiler will insert a save/restore around the vmexit, breaking the whole idea behind the test. (Yes, "if it ain't broken...", but I would like the test to be clean before it is copied into the upcoming s390 selftests). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: selftests: Remove duplicated TEST_ASSERT in hyperv_cpuid.c	Thomas Huth	1	-3/+0
	The check for entry->index == 0 is done twice. One time should be sufficient. Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace	Wanpeng Li	1	-0/+18
	Expose per-vCPU timer_advance_ns to userspace, so it is able to query the auto-adjusted value. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow	Wanpeng Li	1	-1/+1
	After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement), '-1' enables adaptive tuning starting from default advancment of 1000ns. However, we should expose an int instead of an overflow uint module parameter. Before patch: /sys/module/kvm/parameters/lapic_timer_advance_ns:4294967295 After patch: /sys/module/kvm/parameters/lapic_timer_advance_ns:-1 Fixes: c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: vmx: Fix -Wmissing-prototypes warnings	Yi Wang	1	-0/+1
	We get a warning when build kernel W=1: arch/x86/kvm/vmx/vmx.c:6365:6: warning: no previous prototype for ‘vmx_update_host_rsp’ [-Wmissing-prototypes] void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp) Add the missing declaration to fix this. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: nVMX: Fix using __this_cpu_read() in preemptible context	Wanpeng Li	1	-2/+2
	BUG: using __this_cpu_read() in preemptible [00000000] code: qemu-system-x86/4590 caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel] CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G OE 5.1.0-rc4+ #1 Call Trace: dump_stack+0x67/0x95 __this_cpu_preempt_check+0xd2/0xe0 nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel] nested_vmx_run+0xda/0x2b0 [kvm_intel] handle_vmlaunch+0x13/0x20 [kvm_intel] vmx_handle_exit+0xbd/0x660 [kvm_intel] kvm_arch_vcpu_ioctl_run+0xa2c/0x1e50 [kvm] kvm_vcpu_ioctl+0x3ad/0x6d0 [kvm] do_vfs_ioctl+0xa5/0x6e0 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x6f/0x6c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Accessing per-cpu variable should disable preemption, this patch extends the preemption disable region for __this_cpu_read(). Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Fixes: 52017608da33 ("KVM: nVMX: add option to perform early consistency checks via H/W") Cc: stable@vger.kernel.org Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: fix compilation on s390	Paolo Bonzini	1	-0/+2
	s390 does not have memremap, even though in this particular case it would be useful. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: x86: Include CPUID leaf 0x8000001e in kvm's supported CPUID	Jim Mattson	1	-0/+1
	Kvm now supports extended CPUID functions through 0x8000001f. CPUID leaf 0x8000001e is AMD's Processor Topology Information leaf. This contains similar information to CPUID leaf 0xb (Intel's Extended Topology Enumeration leaf), and should be included in the output of KVM_GET_SUPPORTED_CPUID, even though userspace is likely to override some of this information based upon the configuration of the particular VM. Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Borislav Petkov <bp@suse.de> Fixes: 8765d75329a38 ("KVM: X86: Extend CPUID range to include new leaf") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: x86: Include multiple indices with CPUID leaf 0x8000001d	Jim Mattson	1	-4/+3
	Per the APM, "CPUID Fn8000_001D_E[D,C,B,A]X reports cache topology information for the cache enumerated by the value passed to the instruction in ECX, referred to as Cache n in the following description. To gather information for all cache levels, software must repeatedly execute CPUID with 8000_001Dh in EAX and ECX set to increasing values beginning with 0 until a value of 00h is returned in the field CacheType (EAX[4:0]) indicating no more cache descriptions are available for this processor." The termination condition is the same as leaf 4, so we can reuse that code block for leaf 0x8000001d. Fixes: 8765d75329a38 ("KVM: X86: Extend CPUID range to include new leaf") Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: selftests: Compile code with warnings enabled	Thomas Huth	12	-31/+16
	So far the KVM selftests are compiled without any compiler warnings enabled. That's quite bad, since we miss a lot of possible bugs this way. Let's enable at least "-Wall" and some other useful warning flags now, and fix at least the trivial problems in the code (like unused variables). Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	kvm: selftests: avoid type punning	Paolo Bonzini	2	-2/+2
	Avoid warnings from -Wstrict-aliasing by using memcpy. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-05-24	KVM: selftests: Fix a condition in test_hv_cpuid()	Dan Carpenter	1	-3/+2
	The code is trying to check that all the padding is zeroed out and it does this: entry->padding[0] == entry->padding[1] == entry->padding[2] == 0 Assume everything is zeroed correctly, then the first comparison is true, the next comparison is false and false is equal to zero so the overall condition is true. This bug doesn't affect run time very badly, but the code should instead just check that all three paddings are zero individually. Also the error message was copy and pasted from an earlier error and it wasn't correct. Fixes: 7edcb7343327 ("KVM: selftests: Add hyperv_cpuid test") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>