aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-08-15drm/etnaviv: skip command stream validation on PPAS capable GPUsLucas Stach1-1/+2
With per-process address spaces in place, a rogue process submitting bogus command streams can only hurt itself. There is no need to validate the command stream before execution anymore. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: implement per-process address spaces on MMUv2Lucas Stach13-116/+209
This builds on top of the MMU contexts introduced earlier. Instead of having one context per GPU core, each GPU client receives its own context. On MMUv1 this still means a single shared pagetable set is used by all clients, but on MMUv2 there is now a distinct set of pagetables for each client. As the command fetch is also translated via the MMU on MMUv2 the kernel command ringbuffer is mapped into each of the client pagetables. As the MMU context switch is a bit of a heavy operation, due to the needed cache and TLB flushing, this patch implements a lazy way of switching the MMU context. The kernel does not have its own MMU context, but reuses the last client context for all of its operations. This has some visible impact, as the GPU can now only be started once a client has submitted some work and we got the client MMU context assigned. Also the MMU context has a different lifetime than the general client context, as the GPU might still execute the kernel command buffer in the context of a client even after the client has completed all GPU work and has been terminated. Only when the GPU is runtime suspended or switches to another clients MMU context is the old context freed up. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: provide MMU context to etnaviv_gem_mapping_getLucas Stach3-13/+21
In preparation to having a context per process, etnaviv_gem_mapping_get should not use the current GPU context, but needs to be told which context to use. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: split out starting of FE idle loopLucas Stach1-10/+16
Move buffer setup and starting of the FE loop in the kernel ringbuffer into a separate function. This is a preparation to start the FE later in the submit process. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: rework MMU handlingLucas Stach15-430/+461
This reworks the MMU handling to make it possible to have multiple MMU contexts. A context is basically one instance of GPU page tables. Currently we have one set of page tables per GPU, which isn't all that clever, as it has the following two consequences: 1. All GPU clients (aka processes) are sharing the same pagetables, which means there is no isolation between clients, but only between GPU assigned memory spaces and the rest of the system. Better than nothing, but also not great. 2. Clients operating on the same set of buffers with different etnaviv GPU cores, e.g. a workload using both the 2D and 3D GPU, need to map the used buffers into the pagetable sets of each used GPU. This patch reworks all the MMU handling to introduce the abstraction of the MMU context. A context can be shared across different GPU cores, as long as they have compatible MMU implementations, which is the case for all systems with Vivante GPUs seen in the wild. As MMUv1 is not able to change pagetables on the fly, without a "stop the world" operation, which stops GPU, changes pagetables via CPU interaction, restarts GPU, the implementation introduces a shared context on MMUv1, which is returned whenever there is a request for a new context. This patch assigns a MMU context to each GPU, so on MMUv2 systems there is still one set of pagetables per GPU, but due to the shared context MMUv1 systems see a change in behavior as now a single pagetable set is used across all GPU cores. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: replace MMU flush marker with flush sequenceLucas Stach4-8/+11
If a MMU is shared between multiple GPUs, all of them need to flush their TLBs, so a single marker that gets reset on the first flush won't do. Replace the flush marker with a sequence number, so that it's possible to check if the TLB is in sync with the current page table state for each GPU. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: share a single cmdbuf suballoc region across all GPUsLucas Stach7-28/+31
There is no need for each GPU to have it's own cmdbuf suballocation region. Only allocate a single one for the the etnaviv virtual device and share it across all GPUs. As the suballoc space is now potentially shared by more hardware jobs running in parallel, double its size to 512KB to avoid contention. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: split out cmdbuf mapping into address spaceLucas Stach8-65/+113
This allows to decouple the cmdbuf suballocator create and mapping the region into the GPU address space. Allowing multiple AS to share a single cmdbuf suballoc. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: simplify unbind checksLucas Stach2-13/+7
Remember if the GPU has been sucessfully initialized. Only in that case do we need to clean up various structures in the unbind path. If the GPU hasn't been sucessfully initialized all the cleanups should happen in the failure paths of the init function. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-15drm/etnaviv: pass mmu pointer to etnaviv_core_dump_mmuLucas Stach2-5/+5
This function does only need the mmu part part of the gpu struct. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-08-15drm/etnaviv: dump only failing submitLucas Stach2-31/+16
Due to the tracking provided by the scheduler we know exactly which submit is failing. Only dump this single submit and the required auxiliary information. This cuts down the size of the devcoredumps by only including relevant information. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
2019-08-09etnaviv: perfmon: fix total and idle HI cyleces readoutChristian Gmeiner1-11/+33
As seen at CodeAurora's linux-imx git repo in imx_4.19.35_1.0.0 branch. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-09etnaviv: fix whitespace errorsChristian Gmeiner1-2/+2
Changes in V2: - use indentation as suggested by Philipp Zabel. Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-09drm/etnaviv: remove unused function etnaviv_gem_mapping_referenceLucas Stach2-13/+0
Hasn't been used for quite a while. There is no point in keeping unused code around. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-09drm/etnaviv: fix etnaviv_cmdbuf_suballoc_new return valueLucas Stach1-2/+4
The call site expects to get either a valid suballoc or an error pointer, so a NULL return will not be treated as an error. Make sure to always return a proper error pointer in case something goes wrong. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de> Reviewed-by: Guido Günther <agx@sigxcpu.org>
2019-08-02drm/etnaviv: clean up includesLucas Stach8-14/+10
Drop unused includes, move more includes from the generic etnaviv_drv.h to the units where they are actually used, sort includes. Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Acked-by: Sam Ravnborg <sam@ravnborg.org>
2019-08-02drm/etnaviv: Use devm_platform_ioremap_resource()Fabio Estevam1-3/+1
Use devm_platform_ioremap_resource() to simplify the code a bit. Signed-off-by: Fabio Estevam <festevam@gmail.com> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2019-08-02drm/etnaviv: drop use of drmP.hSam Ravnborg10-9/+35
Drop use of the deprecated drmP.h header file. Fix fallout in all .c files. The etnaviv_drv.h header file was made self-contained, and missing includes was then added to the .c files that needed them. In a few cases the list of include files was sorted. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: etnaviv@lists.freedesktop.org Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
2019-07-21Linus 5.3-rc1Linus Torvalds1-2/+2
2019-07-21iommu/amd: fix a crash in iova_magazine_free_pfnsQian Cai1-1/+1
The commit b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method") incorrectly changed the checking from dma_ops_alloc_iova() in map_sg() causes a crash under memory pressure as dma_ops_alloc_iova() never return DMA_MAPPING_ERROR on failure but 0, so the error handling is all wrong. kernel BUG at drivers/iommu/iova.c:801! Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0 Call Trace: free_cpu_cached_iovas+0xbd/0x150 alloc_iova_fast+0x8c/0xba dma_ops_alloc_iova.isra.6+0x65/0xa0 map_sg+0x8c/0x2a0 scsi_dma_map+0xc6/0x160 pqi_aio_submit_io+0x1f6/0x440 [smartpqi] pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi] scsi_queue_rq+0x79c/0x1200 blk_mq_dispatch_rq_list+0x4dc/0xb70 blk_mq_sched_dispatch_requests+0x249/0x310 __blk_mq_run_hw_queue+0x128/0x200 blk_mq_run_work_fn+0x27/0x30 process_one_work+0x522/0xa10 worker_thread+0x63/0x5b0 kthread+0x1d2/0x1f0 ret_from_fork+0x22/0x40 Fixes: b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-21hexagon: switch to generic version of pte allocationMike Rapoport1-32/+2
The hexagon implementation pte_alloc_one(), pte_alloc_one_kernel(), pte_free_kernel() and pte_free() is identical to the generic except of lack of __GFP_ACCOUNT for the user PTEs allocation. Switch hexagon to use generic version of these functions. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-20typo fix: it's d_make_root, not d_make_inode...Al Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-20dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examplesRob Herring1-0/+4
Now that examples are validated against the DT schema, an error with required 'clocks' property missing is exposed: Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \ pinctrl@40020000: gpio@0: 'clocks' is a required property Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \ pinctrl@50020000: gpio@1000: 'clocks' is a required property Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \ pinctrl@50020000: gpio@2000: 'clocks' is a required property Add the missing 'clocks' properties to the examples to fix the errors. Fixes: 2c9239c125f0 ("dt-bindings: pinctrl: Convert stm32 pinctrl bindings to json-schema") Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: linux-gpio@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20dt-bindings: iio: ad7124: Fix dtc warnings in exampleRob Herring1-33/+38
With the conversion to DT schema, the examples are now compiled with dtc. The ad7124 binding example has the following warning: Documentation/devicetree/bindings/iio/adc/adi,ad7124.example.dts:19.11-21: \ Warning (reg_format): /example-0/adc@0:reg: property has invalid length (4 bytes) (#address-cells == 1, #size-cells == 1) There's a default #size-cells and #address-cells values of 1 for examples. For examples needing different values such as this one on a SPI bus, they need to provide a SPI bus parent node. Fixes: 26ae15e62d3c ("Convert AD7124 bindings documentation to YAML format.") Cc: Jonathan Cameron <jic23@kernel.org> Cc: linux-iio@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20dt-bindings: iio: avia-hx711: Fix avdd-supply typo in exampleRob Herring1-1/+1
Now that examples are validated against the DT schema, a typo in avia-hx711 example generates a warning: Documentation/devicetree/bindings/iio/adc/avia-hx711.example.dt.yaml: weight: 'avdd-supply' is a required property Fix the typo. Fixes: 5150ec3fe125 ("avia-hx711.yaml: transform DT binding to YAML") Cc: Andreas Klinger <ak@it-klinger.de> Cc: Jonathan Cameron <jic23@kernel.org> Cc: linux-iio@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20dt-bindings: pinctrl: aspeed: Fix AST2500 example errorsRob Herring1-4/+1
The schema examples are now validated against the schema itself. The AST2500 pinctrl schema has a couple of errors: Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.example.dt.yaml: \ example-0: $nodename:0: 'example-0' does not match '^(bus|soc|axi|ahb|apb)(@[0-9a-f]+)?$' Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.example.dt.yaml: \ pinctrl: aspeed,external-nodes: [[1, 2]] is too short Fixes: 0a617de16730 ("dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema") Cc: Andrew Jeffery <andrew@aj.id.au> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Joel Stanley <joel@jms.id.au> Cc: linux-aspeed@lists.ozlabs.org Cc: linux-gpio@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Acked-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errorsRob Herring2-2/+6
The Aspeed pinctl schema have errors in the 'compatible' schema: Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.yaml: \ properties:compatible:enum: ['aspeed', 'ast2400-pinctrl', 'aspeed', 'g4-pinctrl'] has non-unique elements Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.yaml: \ properties:compatible:enum: ['aspeed', 'ast2500-pinctrl', 'aspeed', 'g5-pinctrl'] has non-unique elements Flow style sequences have to be quoted if the vales contain ','. Fix this by using the more common one line per entry formatting. Fixes: 0a617de16730 ("dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema") Fixes: 07457937bb5c ("dt-bindings: pinctrl: aspeed: Convert AST2400 bindings to json-schema") Cc: Andrew Jeffery <andrew@aj.id.au> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Joel Stanley <joel@jms.id.au> Cc: linux-aspeed@lists.ozlabs.org Cc: linux-gpio@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Acked-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodesRob Herring1-82/+61
Matching on the 'cpus' node was a bad choice because the schema is incorrectly applied to non-RiscV cpus nodes. As we now have a common cpus schema which checks the general structure, it is also redundant to do so in the Risc-V CPU schema. The downside is one could conceivably mix different architecture's cpu nodes or have typos in the compatible string. The latter problem pretty much exists for every schema. Acked-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20dt-bindings: Ensure child nodes are of type 'object'Rob Herring6-0/+8
Properties which are child node definitions need to have an explict type. Otherwise, a matching (DT) property can silently match when an error is desired. Fix this up tree-wide. Once this is fixed, the meta-schema will enforce this on any child node definitions. Cc: Chen-Yu Tsai <wens@csie.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Marek Vasut <marek.vasut@gmail.com> Cc: Richard Weinberger <richard@nod.at> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: linux-mtd@lists.infradead.org Cc: linux-gpio@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Cc: linux-spi@vger.kernel.org Acked-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20x86/entry/64: Prevent clobbering of saved CR2 valueThomas Gleixner1-1/+10
The recent fix for CR2 corruption introduced a new way to reliably corrupt the saved CR2 value. CR2 is saved early in the entry code in RDX, which is the third argument to the fault handling functions. But it missed that between saving and invoking the fault handler enter_from_user_mode() can be called. RDX is a caller saved register so the invoked function can freely clobber it with the obvious consequences. The TRACE_IRQS_OFF call is safe as it calls through the thunk which preserves RDX, but TRACE_IRQS_OFF_DEBUG is not because it also calls into C-code outside of the thunk. Store CR2 in R12 instead which is a callee saved register and move R12 to RDX just before calling the fault handler. Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907201020540.1782@nanos.tec.linutronix.de
2019-07-20smp: Warn on function calls from softirq contextPeter Zijlstra1-0/+16
It's clearly documented that smp function calls cannot be invoked from softirq handling context. Unfortunately nothing enforces that or emits a warning. A single function call can be invoked from softirq context only via smp_call_function_single_async(). The only legit context is task context, so add a warning to that effect. Reported-by: luferry <luferry@163.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190718160601.GP3402@hirez.programming.kicks-ass.net
2019-07-20KVM: x86: Add fixed counters to PMU filterEric Hankland3-12/+35
Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist fixed counters. Signed-off-by: Eric Hankland <ehankland@google.com> [No need to check padding fields for zero. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: nVMX: do not use dangling shadow VMCS after guest resetPaolo Bonzini1-1/+7
If a KVM guest is reset while running a nested guest, free_nested will disable the shadow VMCS execution control in the vmcs01. However, on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync the VMCS12 to the shadow VMCS which has since been freed. This causes a vmptrld of a NULL pointer on my machime, but Jan reports the host to hang altogether. Let's see how much this trivial patch fixes. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: VMX: dump VMCS on failed entryPaolo Bonzini1-0/+1
This is useful for debugging, and is ratelimited nowadays. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: x86/vPMU: refine kvm_pmu err msg when event creation failedLike Xu1-2/+2
If a perf_event creation fails due to any reason of the host perf subsystem, it has no chance to log the corresponding event for guest which may cause abnormal sampling data in guest result. In debug mode, this message helps to understand the state of vPMC and we may not limit the number of occurrences but not in a spamming style. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeupWanpeng Li1-20/+3
Use kvm_vcpu_wake_up() in kvm_s390_vcpu_wakeup(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: Boost vCPUs that are delivering interruptsWanpeng Li3-5/+10
Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during vcpu wakeup and interrupt delivery), we want to also boost not just lock holders but also vCPUs that are delivering interrupts. Most smp_call_function_many calls are synchronous, so the IPI target vCPUs are also good yield candidates. This patch introduces vcpu->ready to boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected (VT-d PI handles voluntary preemption separately, in pi_pre_block). Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM: ebizzy -M vanilla boosting improved 1VM 21443 23520 9% 2VM 2800 8000 180% 3VM 1800 3100 72% Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs, one running ebizzy -M, the other running 'stress --cpu 2': w/ boosting + w/o pv sched yield(vanilla) vanilla boosting improved 1570 4000 155% w/ boosting + w/ pv sched yield(vanilla) vanilla boosting improved 1844 5157 179% w/o boosting, perf top in VM: 72.33% [kernel] [k] smp_call_function_many 4.22% [kernel] [k] call_function_i 3.71% [kernel] [k] async_page_fault w/ boosting, perf top in VM: 38.43% [kernel] [k] smp_call_function_many 6.31% [kernel] [k] async_page_fault 6.13% libc-2.23.so [.] __memcpy_avx_unaligned 4.88% [kernel] [k] call_function_interrupt Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: selftests: Remove superfluous define from vmx.cThomas Huth1-2/+0
The code in vmx.c does not use "program_invocation_name", so there is no need to "#define _GNU_SOURCE" here. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: SVM: Fix detection of AMD Errata 1096Liran Alon1-7/+35
When CPU raise #NPF on guest data access and guest CR4.SMAP=1, it is possible that CPU microcode implementing DecodeAssist will fail to read bytes of instruction which caused #NPF. This is AMD errata 1096 and it happens because CPU microcode reading instruction bytes incorrectly attempts to read code as implicit supervisor-mode data accesses (that is, just like it would read e.g. a TSS), which are susceptible to SMAP faults. The microcode reads CS:RIP and if it is a user-mode address according to the page tables, the processor gives up and returns no instruction bytes. In this case, GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly return 0 instead of the correct guest instruction bytes. Current KVM code attemps to detect and workaround this errata, but it has multiple issues: 1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1, which is required for encountering a SMAP fault. 2) It assumes SMAP faults can only occur when guest CPL==3. However, in case guest CR4.SMEP=0, the guest can execute an instruction which reside in a user-accessible page with CPL<3 priviledge. If this instruction raise a #NPF on it's data access, then CPU DecodeAssist microcode will still encounter a SMAP violation. Even though no sane OS will do so (as it's an obvious priviledge escalation vulnerability), we still need to handle this semanticly correct in KVM side. Note that (2) *is* a useful optimization, because CR4.SMAP=1 is an easy triggerable condition and guests usually enable SMAP together with SMEP. If the vCPU has CR4.SMEP=1, the errata could indeed be encountered onlt at guest CPL==3; otherwise, the CPU would raise a SMEP fault to guest instead of #NPF. We keep this condition to avoid false positives in the detection of the errata. In addition, to avoid future confusion and improve code readbility, include details of the errata in code and not just in commit message. Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)") Cc: Singh Brijesh <brijesh.singh@amd.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20KVM: LAPIC: Inject timer interrupt via posted interruptWanpeng Li7-36/+87
Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs reside. There is no hardware virtual timer on Intel for guest like ARM, so both programming timer in guest and the emulated timer fires incur vmexits. This patch tries to avoid vmexit when the emulated timer fires, at least in dedicated instance scenario when nohz_full is enabled. In that case, the emulated timers can be offload to the nearest busy housekeeping cpus since APICv has been found for several years in server processors. The guest timer interrupt can then be injected via posted interrupts, which are delivered by the housekeeping cpu once the emulated timer fires. The host should tuned so that vCPUs are placed on isolated physical processors, and with several pCPUs surplus for busy housekeeping. If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode, ~3% redis performance benefit can be observed on Skylake server, and the number of external interrupt vmexits drops substantially. Without patch VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% ) While with patch: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% ) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20kbuild: add -fcf-protection=none when using retpoline flagsSeth Forshee1-0/+6
The gcc -fcf-protection=branch option is not compatible with -mindirect-branch=thunk-extern. The latter is used when CONFIG_RETPOLINE is selected, and this will fail to build with a gcc which has -fcf-protection=branch enabled by default. Adding -fcf-protection=none when building with retpoline enabled prevents such build failures. Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-07-20kbuild: update compile-test header list for v5.3-rc1Masahiro Yamada2-16/+6
- Some headers graduated from the blacklist - hyperv_timer.h joined the header-test when CONFIG_X86=y - nf_tables*.h joined the header-test when CONFIG_NF_TABLES is enabled. - The entry for nf_tables_offload.h was added to fix build error for the combination of CONFIG_NF_TABLES=n and CONFIG_KERNEL_HEADER_TEST=y. - The entry for iomap.h was added because this header is supposed to be included only when CONFIG_BLOCK=y Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2019-07-19Remove references to dead website.Dave Jones3-10/+0
This fell into disrepair a while ago, and the majority of hits to the snapshots were from bots, so it's more trouble to keep running than it's worth. Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-19tracing: Fix user stack trace "??" outputEiichi Tsukata1-8/+1
Commit c5c27a0a5838 ("x86/stacktrace: Remove the pointless ULONG_MAX marker") removes ULONG_MAX marker from user stack trace entries but trace_user_stack_print() still uses the marker and it outputs unnecessary "??". For example: less-1911 [001] d..2 34.758944: <user stack trace> => <00007f16f2295910> => ?? => ?? => ?? => ?? => ?? => ?? => ?? The user stack trace code zeroes the storage before saving the stack, so if the trace is shorter than the maximum number of entries it can terminate the print loop if a zero entry is detected. Link: http://lkml.kernel.org/r/20190630085438.25545-1-devel@etsukata.com Cc: stable@vger.kernel.org Fixes: 4285f2fcef80 ("tracing: Remove the ULONG_MAX stack trace hackery") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-07-19dma-direct: correct the physical addr in dma_direct_sync_sg_for_cpu/deviceFugang Duan1-7/+11
dma_map_sg() may use swiotlb buffer when the kernel command line includes "swiotlb=force" or the dma_addr is out of dev->dma_mask range. After DMA complete the memory moving from device to memory, then user call dma_sync_sg_for_cpu() to sync with DMA buffer, and copy the original virtual buffer to other space. So dma_direct_sync_sg_for_cpu() should use swiotlb physical addr, not the original physical addr from sg_phys(sg). dma_direct_sync_sg_for_device() also has the same issue, correct it as well. Fixes: 55897af63091("dma-direct: merge swiotlb_dma_ops into the dma_direct code") Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-07-19Input: alps - fix a mismatch between a condition check and its commentHui Wang1-1/+1
In the function alps_is_cs19_trackpoint(), we check if the param[1] is in the 0x20~0x2f range, but the code we wrote for this checking is not correct: (param[1] & 0x20) does not mean param[1] is in the range of 0x20~0x2f, it also means the param[1] is in the range of 0x30~0x3f, 0x60~0x6f... Now fix it with a new condition checking ((param[1] & 0xf0) == 0x20). Fixes: 7e4935ccc323 ("Input: alps - don't handle ALPS cs19 trackpoint-only device") Cc: stable@vger.kernel.org Signed-off-by: Hui Wang <hui.wang@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-07-19Input: psmouse - fix build error of multiple definitionYueHaibing1-1/+2
trackpoint_detect() should be static inline while CONFIG_MOUSE_PS2_TRACKPOINT is not set, otherwise, we build fails: drivers/input/mouse/alps.o: In function `trackpoint_detect': alps.c:(.text+0x8e00): multiple definition of `trackpoint_detect' drivers/input/mouse/psmouse-base.o:psmouse-base.c:(.text+0x1b50): first defined here Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 55e3d9224b60 ("Input: psmouse - allow disabing certain protocol extensions") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-07-19Input: applespi - remove set but not used variables 'sts'Mao Wenan1-2/+1
Fixes gcc '-Wunused-but-set-variable' warning: drivers/input/keyboard/applespi.c: In function applespi_set_bl_level: drivers/input/keyboard/applespi.c:902:6: warning: variable sts set but not used [-Wunused-but-set-variable] Fixes: b426ac0452093d ("Input: add Apple SPI keyboard and trackpad driver") Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-07-19Input: add Apple SPI keyboard and trackpad driverRonald Tschalär5-0/+2117
The keyboard and trackpad on recent MacBook's (since 8,1) and MacBookPro's (13,* and 14,*) are attached to an SPI controller instead of USB, as previously. The higher level protocol is not publicly documented and hence has been reverse engineered. As a consequence there are still a number of unknown fields and commands. However, the known parts have been working well and received extensive testing and use. In order for this driver to work, the proper SPI drivers need to be loaded too; for MB8,1 these are spi_pxa2xx_platform and spi_pxa2xx_pci; for all others they are spi_pxa2xx_platform and intel_lpss_pci. Link: https://bugzilla.kernel.org/show_bug.cgi?id=99891 Link: https://bugzilla.kernel.org/show_bug.cgi?id=108331 Signed-off-by: Ronald Tschalär <ronald@innovation.ch> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-07-19x86/hyper-v: Zero out the VP ASSIST PAGE on allocationDexuan Cui1-2/+11
The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section 5.2.1 "GPA Overlay Pages" for the details) and here is an excerpt: "The hypervisor defines several special pages that "overlay" the guest's Guest Physical Addresses (GPA) space. Overlays are addressed GPA but are not included in the normal GPA map maintained internally by the hypervisor. Conceptually, they exist in a separate map that overlays the GPA map. If a page within the GPA space is overlaid, any SPA page mapped to the GPA page is effectively "obscured" and generally unreachable by the virtual processor through processor memory accesses. If an overlay page is disabled, the underlying GPA page is "uncovered", and an existing mapping becomes accessible to the guest." SPA = System Physical Address = the final real physical address. When a CPU (e.g. CPU1) is onlined, hv_cpu_init() allocates the VP ASSIST PAGE and enables the EOI optimization for this CPU by writing the MSR HV_X64_MSR_VP_ASSIST_PAGE. From now on, hvp->apic_assist belongs to the special SPA page, and this CPU *always* uses hvp->apic_assist (which is shared with the hypervisor) to decide if it needs to write the EOI MSR. When a CPU is offlined then on the outgoing CPU: 1. hv_cpu_die() disables the EOI optimizaton for this CPU, and from now on hvp->apic_assist belongs to the original "normal" SPA page; 2. the remaining work of stopping this CPU is done 3. this CPU is completely stopped. Between 1 and 3, this CPU can still receive interrupts (e.g. reschedule IPIs from CPU0, and Local APIC timer interrupts), and this CPU *must* write the EOI MSR for every interrupt received, otherwise the hypervisor may not deliver further interrupts, which may be needed to completely stop the CPU. So, after the EOI optimization is disabled in hv_cpu_die(), it's required that the hvp->apic_assist's bit0 is zero, which is not guaranteed by the current allocation mode because it lacks __GFP_ZERO. As a consequence the bit might be set and interrupt handling would not write the EOI MSR causing interrupt delivery to become stuck. Add the missing __GFP_ZERO to the allocation. Note 1: after the "normal" SPA page is allocted and zeroed out, neither the hypervisor nor the guest writes into the page, so the page remains with zeros. Note 2: see Section 10.3.5 "EOI Assist" for the details of the EOI optimization. When the optimization is enabled, the guest can still write the EOI MSR register irrespective of the "No EOI required" value, but that's slower than the optimized assist based variant. Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist") Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/ <PU1P153MB0169B716A637FABF07433C04BFCB0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM