2016-03-08  KVM: MMU: simplify last_pte_bitmap  (Paolo Bonzini)  2 files, -30/+28
Branch-free code is fun and everybody knows how much Avi loves it, but last_pte_bitmap takes it a bit to the extreme. Since the code is simply doing a range check, like (level == 1 || ((gpte & PT_PAGE_SIZE_MASK) && level < N)), we can make it branch-free without storing the entire truth table; it is enough to cache N. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
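Below is a small standalone sketch of the range-check idea from this commit. It is not the kernel code: the PT_PAGE_SIZE_MASK value, the mmu_ctx struct, and the field name last_nonleaf_level are stand-ins for whatever the real patch caches as N.

```c
/*
 * Standalone sketch (not the kvm code): instead of a precomputed truth-table
 * bitmap indexed by level and the PS bit, cache only N -- the highest level
 * at which a large-page PTE is possible -- and do a simple range check.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PT_PAGE_SIZE_MASK (1ULL << 7)   /* PS bit in an x86 page-table entry */

struct mmu_ctx {
	uint8_t last_nonleaf_level;     /* the cached "N" */
};

static bool is_last_gpte(const struct mmu_ctx *mmu, unsigned level, uint64_t gpte)
{
	/* Level 1 entries are always leaves; higher levels are leaves only
	 * when the PS bit is set and the level still allows a large page. */
	return level == 1 ||
	       ((gpte & PT_PAGE_SIZE_MASK) && level < mmu->last_nonleaf_level);
}

int main(void)
{
	struct mmu_ctx mmu = { .last_nonleaf_level = 3 };  /* up to 2MB pages */

	printf("%d\n", is_last_gpte(&mmu, 1, 0));                  /* 1: level 1 is a leaf */
	printf("%d\n", is_last_gpte(&mmu, 2, PT_PAGE_SIZE_MASK));  /* 1: 2MB large page */
	printf("%d\n", is_last_gpte(&mmu, 3, PT_PAGE_SIZE_MASK));  /* 0: not allowed here */
	return 0;
}
```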
2016-03-08  KVM: MMU: coalesce more page zapping in mmu_sync_children  (Paolo Bonzini)  1 file, -4/+11
mmu_sync_children can only process up to 16 pages at a time. Check if we need to reschedule, and do not bother zapping the pages until that happens. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: move zap/flush to kvm_mmu_get_page  (Paolo Bonzini)  1 file, -20/+20
kvm_mmu_get_page is the only caller of kvm_sync_page_transient and kvm_sync_pages. Moving the handling of the invalid_list there removes the need for the underdocumented kvm_sync_page_transient function. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: invert return value of mmu.sync_page and *kvm_sync_page*  (Paolo Bonzini)  2 files, -19/+16
Return true if the page was synced (and the TLB must be flushed) and false if the page was zapped. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: cleanup __kvm_sync_page and its callers  (Paolo Bonzini)  1 file, -6/+4
Calling kvm_unlink_unsync_page in the middle of __kvm_sync_page makes things unnecessarily tricky. If kvm_mmu_prepare_zap_page is called, it will call kvm_unlink_unsync_page too. So kvm_unlink_unsync_page can be called just as well at the beginning or the end of __kvm_sync_page... which means that we might do it in kvm_sync_page too and remove the parameter. kvm_sync_page ends up being the same code that kvm_sync_pages used to have before the previous patch. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: use kvm_sync_page in kvm_sync_pages  (Paolo Bonzini)  1 file, -2/+1
If the last argument is true, kvm_unlink_unsync_page is called anyway in __kvm_sync_page (either by kvm_mmu_prepare_zap_page or by __kvm_sync_page itself). Therefore, kvm_sync_pages can just call kvm_sync_page, instead of going through kvm_unlink_unsync_page+__kvm_sync_page. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: move TLB flush out of __kvm_sync_page  (Paolo Bonzini)  1 file, -29/+24
By doing this, kvm_sync_pages can use __kvm_sync_page instead of reinventing it. Because of kvm_mmu_flush_or_zap, the code does not end up being more complex than before, and more cleanups to kvm_sync_pages will come in the next patches. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08  KVM: MMU: introduce kvm_mmu_flush_or_zap  (Paolo Bonzini)  1 file, -9/+10
This is a generalization of mmu_pte_write_flush_tlb, that also takes care of calling kvm_mmu_commit_zap_page. The next patches will introduce more uses. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: drop local copy of mul_u64_u32_div  (Paolo Bonzini)  1 file, -23/+3
A function that does the same as i8254.c's muldiv64 has been added (for KVM's own use, in fact!) in include/linux/math64.h. Use it instead of muldiv64. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
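For context, a userspace model of what the conversion amounts to: the local muldiv64(a, b, c) helper, which computes (a * b) / c without overflowing 64 bits, is replaced by the generic mul_u64_u32_div() from include/linux/math64.h. The implementation below is an illustrative stand-in (using a compiler-provided 128-bit type), not the kernel's, and the PIT numbers in main() are only an example.

```c
/*
 * Illustrative stand-in for the math64.h helper; requires a compiler with
 * unsigned __int128 (gcc/clang).  Not the kernel implementation.
 */
#include <stdint.h>
#include <stdio.h>

static uint64_t mul_u64_u32_div(uint64_t a, uint32_t mul, uint32_t divisor)
{
	/* A 128-bit intermediate keeps a * mul from overflowing. */
	return (uint64_t)(((unsigned __int128)a * mul) / divisor);
}

int main(void)
{
	/* Example in the spirit of the PIT: turn counter ticks into
	 * nanoseconds, roughly ns = ticks * NSEC_PER_SEC / PIT_FREQ
	 * (numbers illustrative). */
	uint64_t ticks = 65536;

	printf("%llu ns\n",
	       (unsigned long long)mul_u64_u32_div(ticks, 1000000000u, 1193182u));
	return 0;
}
```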
2016-03-04  KVM: MMU: check kvm_mmu_pages and mmu_page_path indices  (Xiao Guangrong)  1 file, -1/+6
Give a special invalid index to the root of the walk, so that we can check the consistency of kvm_mmu_pages and mmu_page_path. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> [Extracted from a bigger patch proposed by Guangrong. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: MMU: Fix ubsan warnings  (Paolo Bonzini)  1 file, -24/+33
kvm_mmu_pages_init is doing some really yucky stuff. It is setting up a sentinel for mmu_page_clear_parents; however, because of a) the way levels are numbered starting from 1 and b) the way mmu_page_path sizes its arrays with PT64_ROOT_LEVEL-1 elements, the access can be out of bounds. This is harmless because the code overwrites up to the first two elements of parents->idx and these are initialized, and because the sentinel is not needed in this case---mmu_page_clear_parents exits anyway when it gets to the end of the array. However ubsan complains, and everyone else should too. This fix does three things. First it makes the mmu_page_path arrays PT64_ROOT_LEVEL elements in size, so that we can write to them without checking the level in advance. Second it disintegrates kvm_mmu_pages_init between mmu_unsync_walk (to reset the struct kvm_mmu_pages) and for_each_sp (to place the NULL sentinel at the end of the current path). This is okay because the mmu_page_path is only used in mmu_pages_clear_parents; mmu_pages_clear_parents itself is called within a for_each_sp iterator, and hence always after a call to mmu_pages_next. Third it changes mmu_pages_clear_parents to just use the sentinel to stop iteration, without checking the bounds on level. Reported-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: Mike Krinkin <krinkin.m.u@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
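A hedged sketch of the sentinel idea: size the parent array for the full root level so the terminator can always be written, then let the clearing loop stop at the NULL sentinel instead of bounds-checking the level. Names and sizes below are illustrative, not the kvm_mmu_pages/mmu_page_path definitions.

```c
/* Sketch of sentinel-terminated parent clearing; not the kvm code. */
#include <stdio.h>

#define PT64_ROOT_LEVEL 4

struct page { const char *name; };

struct page_path {
	/* Full-size array, so writing the sentinel never goes out of bounds. */
	struct page *parent[PT64_ROOT_LEVEL];
};

static void clear_parents(struct page_path *path)
{
	/* Walk until the NULL sentinel instead of checking level bounds. */
	for (unsigned i = 0; path->parent[i]; i++)
		printf("clearing %s\n", path->parent[i]->name);
}

int main(void)
{
	struct page a = { "level-2 parent" }, b = { "level-3 parent" };
	struct page_path path = { .parent = { &a, &b, NULL } };

	clear_parents(&path);
	return 0;
}
```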
2016-03-04  KVM: MMU: cleanup handle_abnormal_pfn  (Paolo Bonzini)  1 file, -6/+2
The goto and temporary variable are unnecessary, just use return statements. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: VMX: use vmcs_clear/set_bits for debug register exits  (Paolo Bonzini)  1 file, -11/+3
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: ensure __gfn_to_pfn_memslot initializes *writable  (Paolo Bonzini)  1 file, -2/+8
In the kvm_is_error_hva case, ubsan complains if the uninitialized writable is passed to __direct_map, even though the value itself is not used (__direct_map goes to mmu_set_spte->set_spte->set_mmio_spte but never looks at that argument). Ensuring that __gfn_to_pfn_memslot initializes *writable is cheap and avoids this kind of issue. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
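The pattern being applied, shown as a generic userspace sketch (gfn_to_pfn_sketch and ERROR_PFN are made-up stand-ins): set the out-parameter on every path, including the early error return, so callers never see an indeterminate value.

```c
/* Illustration of the out-parameter pattern; not the kvm code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ERROR_PFN ((uint64_t)-1)

static uint64_t gfn_to_pfn_sketch(uint64_t hva, bool *writable)
{
	if (writable)
		*writable = false;        /* initialize up front, it is cheap */

	if (hva == (uint64_t)-1)          /* "error hva" case */
		return ERROR_PFN;

	if (writable)
		*writable = true;         /* normal path decides for real */
	return hva >> 12;
}

int main(void)
{
	bool writable;                    /* deliberately left uninitialized */
	uint64_t pfn = gfn_to_pfn_sketch((uint64_t)-1, &writable);

	/* writable is well-defined even on the error path */
	printf("pfn=%llu writable=%d\n", (unsigned long long)pfn, writable);
	return 0;
}
```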
2016-03-04  KVM: document KVM_REINJECT_CONTROL ioctl  (Radim Krčmář)  1 file, -0/+24
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: turn kvm_kpit_state.reinject into atomic_t  (Radim Krčmář)  2 files, -5/+5
Document possible races between readers and concurrent update to the ioctl. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: move PIT timer function initialization  (Radim Krčmář)  1 file, -2/+1
We can do it just once. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: don't assume layout of kvm_kpit_state  (Radim Krčmář)  1 file, -4/+8
channels has offset 0 and correct size now, but that can change. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: remove pointless dereference of PIT  (Radim Krčmář)  1 file, -2/+2
PIT is known at that point. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: remove pit and kvm from kvm_kpit_state  (Radim Krčmář)  2 files, -7/+9
kvm isn't ever used and pit can be accessed with container_of. If you *really* need kvm, pit_state_to_pit(ps)->kvm. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: refactor kvm_free_pit  (Radim Krčmář)  1 file, -13/+11
Could be easier to read, but git history will become deeper. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: refactor kvm_create_pit  (Radim Krčmář)  1 file, -15/+12
Locks are gone, so we don't need to duplicate error paths. Use goto everywhere. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: remove notifiers from PIT discard policy  (Radim Krčmář)  3 files, -13/+38
Discard policy doesn't rely on information from notifiers, so we don't need to register notifiers unconditionally. We kept correct counts in case userspace switched between policies during runtime, but that can be avoided by resetting the state. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: remove unnecessary uses of PIT state lock  (Radim Krčmář)  3 files, -14/+8
- kvm_create_pit had to lock only because it exposed kvm->arch.vpit very early, but initialization doesn't use kvm->arch.vpit since the last patch, so we can drop locking.
- kvm_free_pit is only run after there are no users of KVM and therefore is the sole actor.
- Locking in kvm_vm_ioctl_reinject doesn't do anything, because reinject is only protected at that place.
- kvm_pit_reset isn't used anywhere and its locking can be dropped if we hide it.
Removing the useless locking allows us to see what actually is being protected by the PIT state lock (values accessible from the guest). Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: pass struct kvm_pit instead of kvm in PIT  (Radim Krčmář)  3 files, -72/+70
This patch passes struct kvm_pit into internal PIT functions. Those functions used to get PIT through kvm->arch.vpit, even though most of them never used *kvm for other purposes. Another benefit is that we don't need to set kvm->arch.vpit during initialization. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: tone down WARN_ON pit.state_lock  (Radim Krčmář)  1 file, -14/+3
If the guest could hit this, it would hang the host kernel because of the sheer number of those reports. Internal callers have to be sensible anyway, so we now only check for it in an API function. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: use atomic_t instead of pit.inject_lock  (Radim Krčmář)  2 files, -35/+24
The lock was overkill; the same can be done with atomics. A mb() was added in kvm_pit_ack_irq, to pair with the implicit barrier between pit_timer_fn and pit_do_work. The mb() prevents a race that could happen if pending == 0 and irq_ack == 0:

  kvm_pit_ack_irq:                  |  pit_timer_fn:
   p = atomic_read(&ps->pending);   |
                                    |  atomic_inc(&ps->pending);
                                    |  queue_work(pit_do_work);
                                    |  pit_do_work:
                                    |   atomic_xchg(&ps->irq_ack, 0);
                                    |   return;
   atomic_set(&ps->irq_ack, 1);     |
   if (p == 0) return;              |

where the interrupt would not be delivered in this tick of pit_timer_fn. PIT would have eventually delivered the interrupt, but we sacrifice performance to make sure that interrupts are not needlessly delayed. sfence isn't enough: atomic_dec_if_positive does atomic_read first and x86 can reorder loads before stores. lfence isn't enough: a store can pass lfence, turning it into a nop. A compiler barrier would be more than enough as the CPU needs to stall for an unbelievably long time to use fences. This patch doesn't do anything in kvm_pit_reset_reinject, because any order of resets can race, but the result differs by at most one interrupt, which is ok, because it's the same result as if the reset happened at a slightly different time. (Original code didn't protect the reset path with a proper lock, so users have to be robust.) Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
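A C11 model of the ordering requirement described above, using atomic_thread_fence() where the patch adds mb(). The functions are simplified stand-ins for pit_timer_fn/pit_do_work and kvm_pit_ack_irq, not the kvm code; the behavior in main() only illustrates the deferred-injection path the barrier protects.

```c
/*
 * Sketch: ack() must publish irq_ack = 1 *before* it reads pending,
 * otherwise the load can be reordered above the store (x86 allows
 * store-load reordering) and a tick raised in between is acknowledged
 * by nobody.  The seq_cst fence plays the role of the mb().
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int pending;
static atomic_int irq_ack;

static void timer_tick(void)           /* models pit_timer_fn + pit_do_work */
{
	atomic_fetch_add(&pending, 1);
	if (atomic_exchange(&irq_ack, 0)) {
		atomic_fetch_sub(&pending, 1);
		printf("inject interrupt\n");
	}
	/* else: the guest has not acked the previous one, leave it pending */
}

static void ack_irq(void)               /* models kvm_pit_ack_irq */
{
	atomic_store(&irq_ack, 1);
	atomic_thread_fence(memory_order_seq_cst);   /* the mb() pairing */
	if (atomic_load(&pending) > 0) {
		atomic_store(&irq_ack, 0);
		atomic_fetch_sub(&pending, 1);
		printf("inject deferred interrupt\n");
	}
}

int main(void)
{
	atomic_store(&irq_ack, 1);  /* reset state: previous interrupt acked */

	timer_tick();   /* pending tick, guest had acked -> inject           */
	timer_tick();   /* guest has not acked yet -> tick stays pending     */
	ack_irq();      /* ack arrives, sees the pending tick -> inject it   */
	return 0;
}
```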
2016-03-04  KVM: i8254: add kvm_pit_reset_reinject  (Radim Krčmář)  1 file, -8/+10
pit_state.pending and pit_state.irq_ack are always reset at the same time. Create a function for them. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: simplify atomics in kvm_pit_ack_irq  (Radim Krčmář)  1 file, -11/+1
We already have a helper that does the same thing. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-04  KVM: i8254: change PIT discard tick policy  (Radim Krčmář)  1 file, -5/+7
Discard policy uses ack_notifiers to prevent injection of PIT interrupts before EOI from the last one. This patch changes the policy to always try to deliver the interrupt, which makes a difference when its vector is in ISR. The old implementation would drop the interrupt, but the proposed one injects it into IRR, like real hardware would. The old policy breaks legacy NMI watchdogs, where PIT is used through virtual wire (LVT0): PIT never sends an interrupt before receiving EOI, thus a guest deadlock with disabled interrupts will stop NMIs. Note that NMI doesn't do EOI, so PIT also had to send a normal interrupt through IOAPIC. (KVM's PIT is deeply rotten and luckily not used much in modern systems.) Even though there is a chance of regressions, I think we can fix the LVT0 NMI bug without introducing a new tick policy. Cc: <stable@vger.kernel.org> Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: MMU: apply page track notifier  (Xiao Guangrong)  3 files, -6/+22
Register the notifier to receive write-track events so that we can update our shadow page table. This makes kvm_mmu_pte_write() the callback of the notifier; no functionality is changed. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: MMU: simplify mmu_need_write_protect  (Xiao Guangrong)  1 file, -22/+7
Now that all non-leaf shadow pages are page tracked, if a gfn is not tracked then no non-leaf shadow page for that gfn exists, so we can directly mark the shadow pages of the gfn as unsync. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: MMU: use page track for non-leaf shadow pages  (Xiao Guangrong)  1 file, -5/+21
Non-leaf shadow pages are always write protected, so they can be users of page track. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: page track: add notifier support  (Xiao Guangrong)  4 files, -0/+114
A notifier list is introduced so that any node that wants to receive track events can register to the list. Two APIs are introduced here:
- kvm_page_track_register_notifier(): register the notifier to receive track events
- kvm_page_track_unregister_notifier(): stop receiving track events by unregistering the notifier
The callback, node->track_write(), is called when a write access on a write-tracked page happens. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
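A minimal userspace sketch of such a notifier list, assuming a plain singly linked list; the real API lives in kvm structures and uses SRCU for traversal, and the names below are stand-ins.

```c
/* Minimal notifier-list pattern; not the kvm_page_track implementation. */
#include <stdint.h>
#include <stdio.h>

struct track_notifier_node {
	void (*track_write)(uint64_t gpa, const void *data, int len);
	struct track_notifier_node *next;
};

static struct track_notifier_node *track_head;

static void track_register_notifier(struct track_notifier_node *n)
{
	n->next = track_head;
	track_head = n;
}

static void track_unregister_notifier(struct track_notifier_node *n)
{
	struct track_notifier_node **pp;

	for (pp = &track_head; *pp; pp = &(*pp)->next)
		if (*pp == n) {
			*pp = n->next;
			return;
		}
}

static void track_write_event(uint64_t gpa, const void *data, int len)
{
	/* Called on a write to a tracked page; fan out to every listener. */
	for (struct track_notifier_node *n = track_head; n; n = n->next)
		n->track_write(gpa, data, len);
}

static void mmu_pte_write(uint64_t gpa, const void *data, int len)
{
	(void)data;
	printf("shadow MMU updating PTEs for write at 0x%llx (%d bytes)\n",
	       (unsigned long long)gpa, len);
}

int main(void)
{
	struct track_notifier_node mmu_node = { .track_write = mmu_pte_write };
	uint32_t value = 0x1234;

	track_register_notifier(&mmu_node);
	track_write_event(0x4000, &value, sizeof(value));
	track_unregister_notifier(&mmu_node);
	return 0;
}
```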
2016-03-03  KVM: MMU: clear write-flooding on the fast path of tracked page  (Xiao Guangrong)  3 files, -4/+24
If the page fault is caused by a write access on a write-tracked page, the real shadow page walk is skipped, so we lose the chance to clear write flooding for the page structure the current vcpu is using. Fix it by locklessly walking the shadow page table to clear write flooding on the shadow page structure outside of mmu-lock; to allow that, the count is changed to an atomic_t. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: MMU: let page fault handler be aware tracked page  (Xiao Guangrong)  4 files, -7/+57
A page fault caused by a write access on a write-tracked page can not be fixed; it always needs to be emulated. page_fault_handle_page_track() is the fast path we introduce here to skip holding mmu-lock and walking the shadow page table. However, if the page table is not present, it is worth making the page table entry present and readonly to make the read access happy. mmu_need_write_protect() needs to be adjusted to avoid the page becoming writable when making the page table present or when syncing/prefetching shadow page table entries. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: page track: introduce kvm_slot_page_track_{add,remove}_page  (Xiao Guangrong)  2 files, -0/+92
These two functions are the user APIs:
- kvm_slot_page_track_add_page(): add the page to the tracking pool; after that, the specified access on that page will be tracked
- kvm_slot_page_track_remove_page(): remove the page from the tracking pool; the specified access on the page is no longer tracked after the last user is gone
Both of these are called under the protection of both mmu-lock and kvm->srcu or kvm->slots_lock. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: page track: add the framework of guest page tracking  (Xiao Guangrong)  5 files, -1/+82
The array gfn_track[mode][gfn] is introduced in the memory slot for every guest page; this is the tracking count for the guest page in different modes. If the page is tracked then the count is increased; the page is no longer tracked once the count reaches zero. We use 'unsigned short' as the tracking count, which should be enough, as the shadow page table can only use 2^14 role combinations (2^3 for level, 2^1 for cr4_pae, 2^2 for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm) at most, so there is enough room for other trackers. Two callbacks, kvm_page_track_create_memslot() and kvm_page_track_free_memslot(), are implemented in this patch; they are used internally to initialize and reclaim the memory of the array. Currently, only write track mode is supported. Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
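An illustrative userspace model of the per-memslot count array (the real one hangs off struct kvm_memory_slot and is updated under mmu-lock; the names below are stand-ins). It also shows the arithmetic for why 16 bits suffice.

```c
/*
 * Userspace model of gfn_track[mode][gfn].  The shadow MMU can create at
 * most 2^(3+1+2+3+1+1+1+1+1) = 2^14 distinct shadow-page roles per gfn,
 * so a 16-bit count cannot overflow.  Not the kvm implementation.
 */
#include <stdio.h>
#include <stdlib.h>

enum track_mode { TRACK_WRITE, TRACK_MAX };

struct memslot {
	unsigned long npages;
	unsigned short *gfn_track[TRACK_MAX];
};

static int track_create_memslot(struct memslot *slot, unsigned long npages)
{
	slot->npages = npages;
	for (int i = 0; i < TRACK_MAX; i++) {
		slot->gfn_track[i] = calloc(npages, sizeof(unsigned short));
		if (!slot->gfn_track[i])
			return -1;
	}
	return 0;
}

static void track_add_page(struct memslot *slot, unsigned long gfn, enum track_mode m)
{
	if (++slot->gfn_track[m][gfn] == 1)
		printf("gfn %lu: first tracker, write-protect it\n", gfn);
}

static void track_remove_page(struct memslot *slot, unsigned long gfn, enum track_mode m)
{
	if (--slot->gfn_track[m][gfn] == 0)
		printf("gfn %lu: last tracker gone, stop tracking\n", gfn);
}

int main(void)
{
	struct memslot slot;

	if (track_create_memslot(&slot, 512))
		return 1;
	track_add_page(&slot, 7, TRACK_WRITE);
	track_add_page(&slot, 7, TRACK_WRITE);    /* second tracker, count only */
	track_remove_page(&slot, 7, TRACK_WRITE);
	track_remove_page(&slot, 7, TRACK_WRITE);
	for (int i = 0; i < TRACK_MAX; i++)
		free(slot.gfn_track[i]);
	return 0;
}
```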
2016-03-03  KVM: MMU: introduce kvm_mmu_slot_gfn_write_protect  (Xiao Guangrong)  2 files, -5/+13
Split rmap_write_protect() and introduce a function that abstracts the write protection based on the slot. This function will be used in a later patch. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: MMU: introduce kvm_mmu_gfn_{allow,disallow}_lpage  (Xiao Guangrong)  2 files, -13/+28
Abstract the common operations from account_shadowed() and unaccount_shadowed(), then introduce kvm_mmu_gfn_disallow_lpage() and kvm_mmu_gfn_allow_lpage(). These two functions will be used by page tracking in a later patch. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed  (Xiao Guangrong)  4 files, -22/+25
kvm_lpage_info->write_count is used to detect whether the large page mapping for the gfn on the specified level is allowed; rename it to disallow_lpage to reflect its purpose. We also rename has_wrprotected_page() to mmu_gfn_lpage_is_disallowed() to make the code clearer. Later we will extend this mechanism for page tracking: if the gfn is tracked then large mappings for that gfn on any level are not allowed. The new name is more straightforward. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  kvm: x86: Check dest_map->vector to match eoi signals for rtc  (Joerg Roedel)  1 file, -3/+14
Using the vector stored at interrupt delivery makes the eoi matching safe against irq migration in the ioapic. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  kvm: x86: Track irq vectors in ioapic->rtc_status.dest_map  (Joerg Roedel)  2 files, -1/+10
This allows backtracking later in case the rtc irq has been moved to another vcpu/vector. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03  kvm: x86: Convert ioapic->rtc_status.dest_map to a struct  (Joerg Roedel)  5 files, -16/+26
Currently this is a bitmap which tracks which CPUs we expect an EOI from. Move this bitmap to a struct so that we can track additional information there. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
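A sketch of the shape of this change, with made-up field names: the bare bitmap becomes a struct so a per-vcpu vector array can sit next to it, which the later dest_map patches (listed above) use for EOI matching.

```c
/* Illustrative only; not the kvm ioapic definitions. */
#include <stdint.h>
#include <stdio.h>

#define MAX_VCPUS 64

struct dest_map {
	uint64_t map;                 /* was: the bare "expect an EOI" bitmap */
	uint8_t  vectors[MAX_VCPUS];  /* added: vector delivered per vcpu     */
};

static void record_delivery(struct dest_map *d, int vcpu, uint8_t vector)
{
	d->map |= 1ULL << vcpu;
	d->vectors[vcpu] = vector;
}

static int eoi_matches(const struct dest_map *d, int vcpu, uint8_t vector)
{
	/* An EOI only counts if it is for the vector actually delivered,
	 * which keeps the matching safe across irq migration. */
	return (d->map & (1ULL << vcpu)) && d->vectors[vcpu] == vector;
}

int main(void)
{
	struct dest_map rtc = { 0 };

	record_delivery(&rtc, 2, 0x28);
	printf("eoi vec 0x28 on vcpu 2: %d\n", eoi_matches(&rtc, 2, 0x28)); /* 1 */
	printf("eoi vec 0x30 on vcpu 2: %d\n", eoi_matches(&rtc, 2, 0x30)); /* 0 */
	return 0;
}
```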
2016-03-02  KVM: PPC: Add support for 64bit TCE windows  (Alexey Kardashevskiy)  6 files, -5/+75
The existing KVM_CREATE_SPAPR_TCE only supports 32bit windows which is not enough for directly mapped windows as the guest can get more than 4GB. This adds KVM_CREATE_SPAPR_TCE_64 ioctl and advertises it via KVM_CAP_SPAPR_TCE_64 capability. The table size is checked against the locked memory limit. Since 64bit windows are to support Dynamic DMA windows (DDW), let's add @bus_offset and @page_shift which are also required by DDW. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-02  KVM: PPC: Add @offset to kvmppc_spapr_tce_table  (Alexey Kardashevskiy)  2 files, -2/+6
This enables userspace view of TCE tables to start from non-zero offset on a bus. This will be used for huge DMA windows. This only changes the internal structure, the user interface needs to change in order to use an offset. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-03-02  KVM: PPC: Add @page_shift to kvmppc_spapr_tce_table  (Alexey Kardashevskiy)  3 files, -23/+23
At the moment the kvmppc_spapr_tce_table struct can only describe 4GB windows and handle fixed size (4K) pages. Dynamic DMA windows support more so these limits need to be extended. This replaces window_size (in bytes, 4GB max) with page_shift (32bit) and size (64bit, in pages). This should cause no behavioural change as this is changing the internal structures only - the user interface still only allows one to create a 32-bit table with 4KiB pages at this stage. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
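Illustrative only: the structs below mimic the before/after shape of this change (a 32-bit byte size replaced by page_shift plus a size in pages); they are not the kernel's kvmppc_spapr_tce_table, and the numbers in main() are just an example of a window the old field could not express.

```c
/* Sketch of the window-size representation change; not the kernel structs. */
#include <stdint.h>
#include <stdio.h>

struct tce_table_old {
	uint32_t window_size;   /* bytes, so at most 4GB with 4K pages */
};

struct tce_table_new {
	uint32_t page_shift;    /* e.g. 12 (4K), 16 (64K), 24 (16M)        */
	uint64_t size;          /* window size in pages                    */
	uint64_t offset;        /* bus offset (from the @offset patch above) */
};

static uint64_t window_bytes(const struct tce_table_new *t)
{
	return t->size << t->page_shift;
}

int main(void)
{
	struct tce_table_new huge = { .page_shift = 16, .size = 1ULL << 21 };

	/* 2^21 pages of 64K = 128 GiB, impossible to express in the old field */
	printf("window = %llu GiB\n",
	       (unsigned long long)(window_bytes(&huge) >> 30));
	return 0;
}
```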
2016-03-02  KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_64 capability number  (Alexey Kardashevskiy)  1 file, -0/+1
This adds a capability number for 64-bit TCE tables support. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-29  KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection  (Suresh E. Warrier)  3 files, -1/+16
Redirecting the wakeup of a VCPU from the H_IPI hypercall to a core running in the host is usually a good idea, most workloads seemed to benefit. However, in one heavily interrupt-driven SMT1 workload, some regression was observed. This patch adds a kvm_hv module parameter called h_ipi_redirect to control this feature. The default value for this tunable is 1 - that is enable the feature. Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
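A sketch of how such a tunable is typically declared in a kernel module. module_param()/MODULE_PARM_DESC() are the standard macros, but the exact permissions, description text, and surrounding module boilerplate below are assumptions, not the kvm_hv code.

```c
/*
 * Sketch of a module-parameter tunable defaulting to enabled.  Build as a
 * kernel module; the wording and permissions are assumptions.
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>

static int h_ipi_redirect = 1;                 /* default: feature enabled */
module_param(h_ipi_redirect, int, 0644);
MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core");

static int __init demo_init(void)
{
	pr_info("h_ipi_redirect = %d\n", h_ipi_redirect);
	return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```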
2016-02-29  KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU  (Suresh E. Warrier)  3 files, -3/+110
This patch adds support to real-mode KVM to search for a core running in the host partition and send it an IPI message with the VCPU to be woken. This avoids having to switch to the host partition to complete an H_IPI hypercall when the VCPU which is the target of the H_IPI is not loaded (is not running in the guest). The patch also includes the support in the IPI handler running in the host to do the wakeup by calling kvmppc_xics_ipi_action for the PPC_MSG_RM_HOST_ACTION message. When a guest is being destroyed, we need to ensure that there are no pending IPIs waiting to wake up a VCPU before we free the VCPUs of the guest. This is accomplished by:
- Forcing a PPC_MSG_CALL_FUNCTION IPI to be completed by all CPUs before freeing any VCPUs in kvm_arch_destroy_vm().
- Ensuring that any PPC_MSG_RM_HOST_ACTION messages are executed before any other PPC_MSG_CALL_FUNCTION messages.
Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>