aboutsummaryrefslogtreecommitdiffstats
path: root/virt/kvm/iommu.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2011-07-24KVM: MMU: trace mmio page faultXiao Guangrong4-1/+80
Add tracepoints to trace mmio page fault Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: mmio page fault supportXiao Guangrong6-14/+255
The idea is from Avi: | We could cache the result of a miss in an spte by using a reserved bit, and | checking the page fault error code (or seeing if we get an ept violation or | ept misconfiguration), so if we get repeated mmio on a page, we don't need to | search the slot list/tree. | (https://lkml.org/lkml/2011/2/22/221) When the page fault is caused by mmio, we cache the info in the shadow page table, and also set the reserved bits in the shadow page table, so if the mmio is caused again, we can quickly identify it and emulate it directly Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it can be reduced by this feature, and also avoid walking guest page table for soft mmu. [jan: fix operator precedence issue] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: reorganize struct kvm_shadow_walk_iteratorXiao Guangrong1-1/+1
Reorganize it for good using the cache Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: lockless walking shadow page tableXiao Guangrong2-8/+132
Use rcu to protect shadow pages table to be freed, so we can safely walk it, it should run fastly and is needed by mmio page fault Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: do not need atomicly to set/clear spteXiao Guangrong1-15/+71
Now, the spte is just from nonprsent to present or present to nonprsent, so we can use some trick to set/clear spte non-atomicly as linux kernel does Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: introduce the rules to modify shadow page tableXiao Guangrong1-34/+69
Introduce some interfaces to modify spte as linux kernel does: - mmu_spte_clear_track_bits, it set the spte from present to nonpresent, and track the stat bits(accessed/dirty) of spte - mmu_spte_clear_no_track, the same as mmu_spte_clear_track_bits except tracking the stat bits - mmu_spte_set, set spte from nonpresent to present - mmu_spte_update, only update the stat bits Now, it does not allowed to set spte from present to present, later, we can drop the atomicly opration for X86_32 host, and it is the preparing work to get spte on X86_32 host out of the mmu lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: abstract some functions to handle fault pfnXiao Guangrong2-18/+41
Introduce handle_abnormal_pfn to handle fault pfn on page fault path, introduce mmu_invalid_pfn to handle fault pfn on prefetch path It is the preparing work for mmio page fault support Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: filter out the mmio pfn from the fault pfnXiao Guangrong3-4/+21
If the page fault is caused by mmio, the gfn can not be found in memslots, and 'bad_pfn' is returned on gfn_to_hva path, so we can use 'bad_pfn' to identify the mmio page fault. And, to clarify the meaning of mmio pfn, we return fault page instead of bad page when the gfn is not allowd to prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: remove bypass_guest_pfXiao Guangrong7-132/+33
The idea is from Avi: | Maybe it's time to kill off bypass_guest_pf=1. It's not as effective as | it used to be, since unsync pages always use shadow_trap_nonpresent_pte, | and since we convert between the two nonpresent_ptes during sync and unsync. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: split kvm_mmu_free_pageXiao Guangrong1-3/+18
Split kvm_mmu_free_page to kvm_mmu_isolate_page and kvm_mmu_free_page One is used to remove the page from cache under mmu lock and the other is used to free page table out of mmu lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: count used shadow pages on prepareing pathXiao Guangrong1-5/+5
Move counting used shadow pages from commiting path to preparing path to reduce tlb flush on some paths Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: rename 'pt_write' to 'emulate'Xiao Guangrong2-13/+13
If 'pt_write' is true, we need to emulate the fault. And in later patch, we need to emulate the fault even though it is not a pt_write event, so rename it to better fit the meaning Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cleanup for FNAME(fetch)Xiao Guangrong1-2/+2
gw->pte_access is the final access permission, since it is unified with gw->pt_access when we walked guest page table: FNAME(walk_addr_generic): pte_access = pt_access & FNAME(gpte_access)(vcpu, pte, true); Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: optimize to handle dirty bitXiao Guangrong2-34/+25
If dirty bit is not set, we can make the pte access read-only to avoid handing dirty bit everywhere Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: cache mmio info on page fault pathXiao Guangrong6-21/+96
If the page fault is caused by mmio, we can cache the mmio info, later, we do not need to walk guest page table and quickly know it is a mmio fault while we emulate the mmio instruction Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the codeXiao Guangrong1-11/+31
Introduce vcpu_mmio_gva_to_gpa to translate the gva to gpa, we can use it to cleanup the code between read emulation and write emulation Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: do not update slot bitmap if spte is nonpresentXiao Guangrong1-8/+7
Set slot bitmap only if the spte is present Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM: MMU: fix walking shadow page tableXiao Guangrong1-4/+5
Properly check the last mapping, and do not walk to the next level if last spte is met Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24KVM guest: KVM Steal time registrationGlauber Costa4-0/+84
This patch implements the kvm bits of the steal time infrastructure. The most important part of it, is the steal time clock. It is an continuous clock that shows the accumulated amount of steal time since vcpu creation. It is supposed to survive cpu offlining/onlining. [marcelo: fix build with CONFIG_KVM_GUEST=n] Signed-off-by: Glauber Costa <glommer@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Eric B Munson <emunson@mgebm.net> CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> CC: Peter Zijlstra <peterz@infradead.org> CC: Avi Kivity <avi@redhat.com> CC: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-24[S390] use siginfo for sigtrap signalsMartin Schwidefsky1-6/+17
Provide additional information on SIGTRAP by using a sig_info signal. Use TRAP_BRKPT for breakpoints via illegal operation and TRAP_HWBKPT for breakpoints via program event recording. Provide the address of the instruction that caused the breakpoint via si_addr. While we are at it get rid of tracehook_consider_fatal_signal. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] dasd: add enhanced DASD statistics interfaceStefan Weinhuber4-91/+686
This patch extends the DASD statistics to allow for a more detailed analysis of DASD I/O operations. In particular we want the statistics to provide answers to the following questions: - How many requests used a PAV alias? - How many requests used High Performance FICON? - How do read request perform versus write requests? The existing DASD statistics interface has several shortcomings - The interface for global data is a formatted text table in procfs (/proc/dasd/statistics). The layout is meant for human readers and is not to easy to parse. If values get to large for the table layout, they get scaled down. - The statistics which are collected per block device can be accessed via an ioctl interface, which can only be extended by defining a new ioctl. - There is no statistics interface for individual PAV base and alias devices. To overcome theses shortcomings we create a new DASD statistics interface in debugfs. This interface will contain one entry for global data, one per DASD block device, and one per DASD base and alias device. Each file contains the statistic data in easy to parse name/value and name/array pairs. The existing interfaces will remain functional, but they will not be extended. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm: make sigp emerg smp capableChristian Ehrhardt2-0/+9
SIGP emerg needs to pass the source vpu adress into __LC_CPU_ADDRESS of the target guest. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] disable cpu measurement alerts on a dying cpuJan Glauber1-1/+2
The cpu measurement alerts that are used for instance by oprofile for hardware sampling are not turned off on a cpu that is going offline. Add the appropriate control register bit that should be disabled to the list. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] initial cr0 bitsMartin Schwidefsky1-1/+1
Remove outdated bits from the initial cr0 register. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] iucv cr0 enablement bitMartin Schwidefsky3-4/+7
Do not set the cr0 enablement bit for iucv by default in head[31|64].S, move the enablement to iucv_init in the iucv base layer. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] race safe external interrupt registrationJan Glauber1-37/+46
The (un-)register_external_interrupt functions are not race safe if more than one interrupt handler is added or deleted for an external interrupt concurrently. Make the registration / unregistration of external interrupts race safe by using RCU and a spinlock. RCU is used to avoid a performance penalty in the external interrupt handler, the register and unregister functions are protected by the spinlock and are not performance critical. call_rcu must be used since the SCLP driver uses the interface with IRQs disabled. Also use the generic list implementation rather than homebrewn list code. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] remove tape block docuMartin Schwidefsky1-122/+0
After git commit 66ceed5ad1318863c21710f316942bcefff8081c removed the tape block device driver, remove its documentation as well. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] ap: toleration support for ap device type 10Holger Dengler2-27/+91
Add toleration support for ap devices with device type 10. Signed-off-by: Holger Dengler <hd@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] cleanup program check handler prototypesMartin Schwidefsky2-13/+7
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] remove kvm mmu reload on s390Carsten Otte2-19/+2
This patch removes the mmu reload logic for kvm on s390. Via Martin's new gmap interface, we can safely add or remove memory slots while guest CPUs are in-flight. Thus, the mmu reload logic is not needed anymore. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] Use gmap translation for accessing guest memoryCarsten Otte6-110/+194
This patch removes kvm-s390 internal assumption of a linear mapping of guest address space to user space. Previously, guest memory was translated to user addresses using a fixed offset (gmsor). The new code uses gmap_fault to resolve guest addresses. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] use gmap address spaces for kvm guest imagesCarsten Otte3-12/+40
This patch switches kvm from using (Qemu's) user address space to Martin's gmap address space. This way QEMU does not have to use a linker script in order to fit large guests at low addresses in its address space. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm guest address space mappingMartin Schwidefsky11-36/+473
Add code that allows KVM to control the virtual memory layout that is seen by a guest. The guest address space uses a second page table that shares the last level pte-tables with the process page table. If a page is unmapped from the process page table it is automatically unmapped from the guest page table as well. The guest address space mapping starts out empty, KVM can map any individual 1MB segments from the process virtual memory to any 1MB aligned location in the guest virtual memory. If a target segment in the process virtual memory does not exist or is unmapped while a guest mapping exists the desired target address is stored as an invalid segment table entry in the guest page table. The population of the guest page table is fault driven. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] fix s390 assembler code alignmentsJan Glauber21-680/+378
The alignment is missing for various global symbols in s390 assembly code. With a recent gcc and an instruction like stgrl this can lead to a specification exception if the instruction uses such a mis-aligned address. Specify the alignment explicitely and while add it define __ALIGN for s390 and use the ENTRY define to save some lines of code. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] move sie code to entry.SMartin Schwidefsky5-107/+79
The entry to / exit from sie has subtle dependencies to the first level interrupt handler. Move the sie assembler code to entry64.S and replace the SIE_HOOK callback with a test and the new _TIF_SIE bit. In addition this patch fixes several problems in regard to the check for the_TIF_EXIT_SIE bits. The old code checked the TIF bits before executing the interrupt handler and it only modified the instruction address if it pointed directly to the sie instruction. In both cases it could miss a TIF bit that normally would cause an exit from the guest and would reenter the guest context. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kvm: handle tprot interceptsChristian Borntraeger5-0/+53
When running a kvm guest we can get intercepts for tprot, if the host page table is read-only or not populated. This patch implements the most common case (linux memory detection). This also allows host copy on write for guest memory on newer systems. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] qdio: clear shared DSCI before scheduling the queue handlerJan Glauber1-10/+5
The following race can occur with qdio devices that use the shared device state change indicator: Device (Shared DSCI) CPU0 CPU1 =============================================================================== 1. DSCI 0 => 1, INT pending 2. Thinint handler * si_used = 1 * Inbound tasklet_schedule * DSCI 1 => 0 3. DSCI 0 => 1, INT pending 4. Thinint handler * si_used = 1 * Inbound tasklet_schedu le => NOP 5. Inbound tasklet run 6. DSCI = 1, INT surpressed 7. DSCI 1 => 0 The race would lead to a stall where new data in the input queue is not recognized so the device stops working in case of no further traffic. Fix the race by resetting the DSCI before scheduling the inbound tasklet so the device generates an interrupt if new data arrives in the above scenario in step 6. Reviewed-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] reference bit testing for unmapped pagesMartin Schwidefsky1-3/+3
On x86 a page without a mapper is by definition not referenced / old. The s390 architecture keeps the reference bit in the storage key and the current code will check the storage key for page without a mapper. This leads to an interesting effect: the first time an s390 system needs to write pages to swap it only finds referenced pages. This causes a lot of pages to get added and written to the swap device. To avoid this behaviour change page_referenced to query the storage key only if there is a mapper of the page. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] irqs: Do not trace arch_local_{*,irq_*} functionsSteven Rostedt1-8/+8
Do not trace arch_local_save_flags(), arch_local_irq_*() and friends. Although they are marked inline, gcc may still make a function out of them and add it to the pool of functions that are traced by the function tracer. This can cause undesirable results (kernel panic, triple faults, etc). Add the notrace notation to prevent them from ever being traced. Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-24[S390] kconfig: remove tape interface support commentMartin Schwidefsky1-3/+0
There is nothing below the menu entry "S/390 tape interface support". Remove it. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2011-07-23of: fix missing include from of_pci.cGrant Likely1-0/+1
of_pci.c references symbols from linux/of.h. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-07-23gre: fix improper error handlingxeb@mail.ru1-15/+6
Fix improper protocol err_handler, current implementation is fully unapplicable and may cause kernel crash due to double kfree_skb. Signed-off-by: Dmitry Kozlov <xeb@mail.ru> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23ipv4: use RT_TOS after some rt_tos conversionsJulian Anastasov2-2/+2
rt_tos was changed to iph->tos but it must be filtered by RT_TOS Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23via-velocity: remove duplicated #includeHuang Weiyi1-1/+0
Remove duplicated #include('s) in drivers/net/via-velocity.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23qlge: remove duplicated #includeHuang Weiyi1-1/+0
Remove duplicated #include('s) in drivers/net/qlge/qlge_main.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23igb: remove duplicated #includeHuang Weiyi1-1/+0
Remove duplicated #include('s) in drivers/net/igb/igb_main.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23can: c_can: remove duplicated #includeHuang Weiyi2-2/+0
Remove duplicated #include('s) in drivers/net/can/c_can/c_can.c drivers/net/can/c_can/c_can_platform.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23bnad: remove duplicated #includeHuang Weiyi1-1/+0
Remove duplicated #include('s) in drivers/net/bna/bnad.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-23ata: PATA_ARASAN_CF depends on DMADEVICESRandy Dunlap1-0/+1
Fix kconfig unmet dependency warning: warning: (PATA_ARASAN_CF && VIDEO_TIMBERDALE && SND_SOC_SH4_SIU) selects DMA_ENGINE which has unmet direct dependencies (DMADEVICES) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Viresh Kumar <viresh.kumar@st.com> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2011-07-23ata: remove unnecessary codeGreg Dietsche1-7/+1
Compile tested. remove unnecessary code that matches this coccinelle pattern if (...) return ret; return ret; Signed-off-by: Greg Dietsche <Gregory.Dietsche@cuw.edu> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>