aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/platforms/pseries/setup.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-09-30powerpc/pseries: Add firmware details to the hardware descriptionMichael Ellerman1-0/+30
Add firmware version details to the hardware description, which is printed at boot and in case of an oops. Use /hypervisor if we find it, though currently it only exists if we're running under qemu. Look for "ibm,powervm-partition" which is specified in PAPR+ v2.11 and tells us we're running under PowerVM. Failing that look for "ibm,fw-net-version" which is seen on PowerVM going back to at least Power6. eg: Hardware name: ... of:IBM,FW860.42 (SV860_138) hv:phyp Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220930082709.55830-6-mpe@ellerman.id.au
2022-09-05powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTINGNicholas Piggin1-0/+19
CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not provide stolen time accounting unless CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled. Implement this using the VPA accumulated wait counters. Note this will not work on current KVM hosts because KVM does not implement the VPA dispatch counters (yet). It could be implemented with the dispatch trace log as it is for VIRT_CPU_ACCOUNTING_NATIVE, but that is not necessary for the more limited accounting provided by PARAVIRT_TIME_ACCOUNTING, and it is more expensive, complex, and has downsides like potential log wrap. From Shrikanth: [...] it was tested on Power10 [PowerVM] Shared LPAR. system has two LPAR. we will call first one LPAR1 and second one as LPAR2. Test was carried out in SMT=1. Similar observation was seen in SMT=8 as well. LPAR config header from each LPAR is below. LPAR1 is twice as big as LPAR2. Since Both are sharing the same underlying hardware, work stealing will happen when both the LPAR's are contending for the same resource. LPAR1: type=Shared mode=Uncapped smt=Off lcpu=40 cpus=40 ent=20.00 LPAR2: type=Shared mode=Uncapped smt=Off lcpu=20 cpus=40 ent=10.00 mpstat was used to check for the utilization. stress-ng has been used as the workload. Few cases are tested. when the both LPAR are idle there is no steal time. when LPAR1 starts running at 100% which consumes all of the physical resource, steal time starts to get accounted. With LPAR1 running at 100% and LPAR2 starts running, steal time starts increasing. This is as expected. When the LPAR2 Load is increased further, steal time increases further. Case 1: 0% LPAR1; 0% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 99.95 Case 2: 100% LPAR1; 0% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 97.68 0.00 0.00 0.00 0.00 0.00 2.32 0.00 0.00 0.00 Case 3: 100% LPAR1; 50% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 86.34 0.00 0.10 0.00 0.00 0.03 13.54 0.00 0.00 0.00 Case 4: 100% LPAR1; 100% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 78.54 0.00 0.07 0.00 0.00 0.02 21.36 0.00 0.00 0.00 Case 5: 50% LPAR1; 100% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 49.37 0.00 0.00 0.00 0.00 0.00 1.17 0.00 0.00 49.47 Patch is accounting for the steal time and basic tests are holding good. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> [mpe: Add SPDX tag to new paravirt_api_clock.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220902085316.2071519-3-npiggin@gmail.com
2022-07-20powerpc/pseries: register pseries-wdt device with platform busScott Cheloha1-0/+13
PAPR v2.12 defines a new hypercall, H_WATCHDOG. The hypercall permits guest control of one or more virtual watchdog timers. These timers do not conform to PowerPC device conventions. They are not affixed to any extant bus, nor do they have full representation in the device tree. As a workaround we represent them as platform devices. This patch registers a single platform device, "pseries-wdt", with the platform bus if the FW_FEATURE_WATCHDOG flag is set. A driver for this device, "pseries-wdt", will be introduced in a subsequent patch. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220713202335.1217647-4-cheloha@linux.ibm.com
2022-07-09Merge branch 'fixes' into nextMichael Ellerman1-0/+1
Merge our fixes branch. In particular this brings in commit 986481618023 ("powerpc/book3e: Fix PUD allocation size in map_kernel_page()") which fixes a build failure in next, because commit 2db2008e6363 ("powerpc/64e: Rewrite p4d_populate() as a static inline function") depends on it.
2022-06-29powerpc/64s: Don't read H_BLOCK_REMOVE characteristics in radix modeLaurent Dufour1-3/+2
There is no need to read the H_BLOCK_REMOVE characteristics when running in Radix mode because this hcall is never called. Furthermore since the commit 387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU") define pseries_lpar_read_hblkrm_characteristics as un empty function if CONFIG_PPC_64S_HASH_MMU is not set, the #ifdef block can be removed. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220523164353.26441-1-ldufour@linux.ibm.com
2022-06-20powerpc: Don't include asm/setup.h in asm/machdep.hChristophe Leroy1-0/+1
asm/machdep.h doesn't need asm/setup.h Remove it. Add it directly in files that needs it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3b1dfb19a2c3265fb4abc2bfc7b6eae9261a998b.1654966508.git.christophe.leroy@csgroup.eu
2022-06-18powerpc/pseries: wire up rng during setup_arch()Jason A. Donenfeld1-0/+1
The platform's RNG must be available before random_init() in order to be useful for initial seeding, which in turn means that it needs to be called from setup_arch(), rather than from an init call. Fortunately, each platform already has a setup_arch function pointer, which means it's easy to wire this up. This commit also removes some noisy log messages that don't add much. Fixes: a489043f4626 ("powerpc/pseries: Implement arch_get_random_long() based on H_RANDOM") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220611151015.548325-4-Jason@zx2c4.com
2022-05-28Merge tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-14/+4
Pull powerpc updates from Michael Ellerman: - Convert to the generic mmap support (ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT) - Add support for outline-only KASAN with 64-bit Radix MMU (P9 or later) - Increase SIGSTKSZ and MINSIGSTKSZ and add support for AT_MINSIGSTKSZ - Enable the DAWR (Data Address Watchpoint) on POWER9 DD2.3 or later - Drop support for system call instruction emulation - Many other small features and fixes Thanks to Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Bagas Sanjaya, Bjorn Helgaas, Bo Liu, Chen Huang, Christophe Leroy, Colin Ian King, Daniel Axtens, Dwaipayan Ray, Fabiano Rosas, Finn Thain, Frank Rowand, Fuqian Huang, Guilherme G. Piccoli, Hangyu Hua, Haowen Bai, Haren Myneni, Hari Bathini, He Ying, Jason Wang, Jiapeng Chong, Jing Yangyang, Joel Stanley, Julia Lawall, Kajol Jain, Kevin Hao, Krzysztof Kozlowski, Laurent Dufour, Lv Ruyi, Madhavan Srinivasan, Magali Lemes, Miaoqian Lin, Minghao Chi, Nathan Chancellor, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Oscar Salvador, Pali Rohár, Paul Mackerras, Peng Wu, Qing Wang, Randy Dunlap, Reza Arbab, Russell Currey, Sohaib Mohamed, Vaibhav Jain, Vasant Hegde, Wang Qing, Wang Wensheng, Xiang wangx, Xiaomeng Tong, Xu Wang, Yang Guang, Yang Li, Ye Bin, YueHaibing, Yu Kuai, Zheng Bin, Zou Wei, and Zucheng Zheng. * tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits) powerpc/64: Include cache.h directly in paca.h powerpc/64s: Only set HAVE_ARCH_UNMAPPED_AREA when CONFIG_PPC_64S_HASH_MMU is set powerpc/xics: Include missing header powerpc/powernv/pci: Drop VF MPS fixup powerpc/fsl_book3e: Don't set rodata RO too early powerpc/microwatt: Add mmu bits to device tree powerpc/powernv/flash: Check OPAL flash calls exist before using powerpc/powermac: constify device_node in of_irq_parse_oldworld() powerpc/powermac: add missing g5_phy_disable_cpu1() declaration selftests/powerpc/pmu: fix spelling mistake "mis-match" -> "mismatch" powerpc: Enable the DAWR on POWER9 DD2.3 and above powerpc/64s: Add CPU_FTRS_POWER10 to ALWAYS mask powerpc/64s: Add CPU_FTRS_POWER9_DD2_2 to CPU_FTRS_ALWAYS mask powerpc: Fix all occurences of "the the" selftests/powerpc/pmu/ebb: remove fixed_instruction.S powerpc/platforms/83xx: Use of_device_get_match_data() powerpc/eeh: Drop redundant spinlock initialization powerpc/iommu: Add missing of_node_put in iommu_init_early_dart powerpc/pseries/vas: Call misc_deregister if sysfs init fails powerpc/papr_scm: Fix leaking nvdimm_events_map elements ...
2022-05-22powerpc/kasan: Disable address sanitization in kexec pathsDaniel Axtens1-11/+1
The kexec code paths involve code that necessarily run in real mode, as CPUs are disabled and control is transferred to the new kernel. Disable address sanitization for the kexec code and the functions called in real mode on CPUs being disabled. [paulus@ozlabs.org: combined a few work-in-progress commits of Daniel's and wrote the commit message.] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Move pseries_machine_kexec() into kexec.c so setup.c can be instrumented] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/YoTFSQ2TUSEaDdVC@cleo
2022-05-08powerpc: Add missing headersChristophe Leroy1-1/+1
Don't inherit headers "by chances" from asm/prom.h, asm/mpc52xx.h, asm/pci.h etc... Include the needed headers, and remove asm/prom.h when it was needed exclusively for pulling necessary headers. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
2022-05-05powerpc: fix typos in commentsJulia Lawall1-2/+2
Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr
2022-04-18swiotlb: make the swiotlb_init interface more usefulChristoph Hellwig1-3/+0
Pass a boolean flag to indicate if swiotlb needs to be enabled based on the addressing needs, and replace the verbose argument with a set of flags, including one to force enable bounce buffering. Note that this patch removes the possibility to force xen-swiotlb use with the swiotlb=force parameter on the command line on x86 (arm and arm64 never supported that), but this interface will be restored shortly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-03-28Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-1/+12
Merge some more commits from our KVM topic branch. In particular this brings in some commits that depend on a new capability that was merged via the KVM tree for v5.18.
2022-03-08KVM: PPC: Use KVM_CAP_PPC_AIL_MODE_3Nicholas Piggin1-1/+12
Use KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL resource mode to 3 with the H_SET_MODE hypercall. This capability differs between processor types and KVM types (PR, HV, Nested HV), and affects guest-visible behaviour. QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and use the KVM CAP if available to determine KVM support[2]. [1] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00437.html [2] https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00439.html Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> [mpe: Rebase onto 93b71801a827 from kvm-ppc-cap-210 branch, add EXPORT_SYMBOL] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220222064727.2314380-4-npiggin@gmail.com
2022-03-08powerpc/mce: Avoid using irq_work_queue() in realmodeGanesh Goudar1-0/+1
In realmode mce handler we use irq_work_queue() to defer the processing of mce events, irq_work_queue() can only be called when translation is enabled because it touches memory outside RMA, hence we enable translation before calling irq_work_queue and disable on return, though it is not safe to do in realmode. To avoid this, program the decrementer and call the event processing functions from timer handler. Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220120121931.517974-1-ganeshgr@linux.ibm.com
2021-12-23powerpc/pseries: Add __init attribute to eligible functionsNick Child1-2/+2
Some functions defined in 'arch/powerpc/platforms/pseries' are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-13-nick.child@ibm.com
2021-12-09powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMUNicholas Piggin1-2/+4
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves 128kB kernel image size (90kB text) on powernv_defconfig minus KVM, 350kB on pseries_defconfig minus KVM, 40kB on a tiny config. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG. Fix radix_enabled() use in setup_initial_memory_limit(). Add some stubs to reduce number of ifdefs.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
2021-11-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...
2021-11-06mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSEDavid Hildenbrand1-1/+1
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE. Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> [kselftest] Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Alex Shi <alexs@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-09powerpc: Drop superfluous pci_dev_is_added() callsNiklas Schnelle1-2/+1
On powerpc, pci_dev_is_added() is called as part of SR-IOV fixups that are done under pcibios_add_device() which in turn is only called in pci_device_add() whih is called when a PCI device is scanned. pci_dev_assign_added() is called in pci_bus_add_device() which is only called after scanning the device. Thus pci_dev_is_added() is always false and can be dropped. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Tweak change log slightly to reflect Oliver's comments] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210910141940.2598035-2-schnelle@linux.ibm.com
2021-09-03Merge branch 'fixes' into nextMichael Ellerman1-3/+4
Merge our fixes branch into next. That lets us resolve a conflict in arch/powerpc/sysdev/xive/common.c. Between cbc06f051c52 ("powerpc/xive: Do not skip CPU-less nodes when creating the IPIs"), which moved request_irq() out of xive_init_ipis(), and 17df41fec5b8 ("powerpc: use IRQF_NO_DEBUG for IPIs") which added IRQF_NO_DEBUG to that request_irq() call, which has now moved.
2021-08-10powerpc/pseries/pci: Add MSI domainsCédric Le Goater1-0/+2
Two IRQ domains are added on top of default machine IRQ domain. First, the top level "pSeries-PCI-MSI" domain deals with the MSI specificities. In this domain, the HW IRQ numbers are generated by the PCI MSI layer, they compose a unique ID for an MSI source with the PCI device identifier and the MSI vector number. These numbers can be quite large on a pSeries machine running under the IBM Hypervisor and /sys/kernel/irq/ and /proc/interrupts will require small fixes to show them correctly. Second domain is the in-the-middle "pSeries-MSI" domain which acts as a proxy between the PCI MSI subsystem and the machine IRQ subsystem. It usually allocate the MSI vector numbers but, on pSeries machines, this is done by the RTAS FW and RTAS returns IRQ numbers in the IRQ number space of the machine. This is why the in-the-middle "pSeries-MSI" domain has the same HW IRQ numbers as its parent domain. Only the XIVE (P9/P10) parent domain is supported for now. We still need to add support for IRQ domain hierarchy under XICS. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-6-clg@kaod.org
2021-08-07powerpc/pseries: Fix update of LPAR security flavor after LPMLaurent Dufour1-2/+3
After LPM, when migrating from a system with security mitigation enabled to a system with mitigation disabled, the security flavor exposed in /proc is not correctly set back to 0. Do not assume the value of the security flavor is set to 0 when entering init_cpu_char_feature_flags(), so when called after a LPM, the value is set correctly even if the mitigation are not turned off. Fixes: 6ce56e1ac380 ("powerpc/pseries: export LPAR security flavor in lparcfg") Cc: stable@vger.kernel.org # v5.13+ Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805152308.33988-1-ldufour@linux.ibm.com
2021-07-29powerpc/pseries: Fix regression while building external modulesSrikar Dronamraju1-1/+1
With commit c9f3401313a5 ("powerpc: Always enable queued spinlocks for 64s, disable for others") CONFIG_PPC_QUEUED_SPINLOCKS is always enabled on ppc64le, external modules that use spinlock APIs are failing. ERROR: modpost: GPL-incompatible module XXX.ko uses GPL-only symbol 'shared_processor' Before the above commit, modules were able to build without any issues. Also this problem is not seen on other architectures. This problem can be workaround if CONFIG_UNINLINE_SPIN_UNLOCK is enabled in the config. However CONFIG_UNINLINE_SPIN_UNLOCK is not enabled by default and only enabled in certain conditions like CONFIG_DEBUG_SPINLOCKS is set in the kernel config. #include <linux/module.h> spinlock_t spLock; static int __init spinlock_test_init(void) { spin_lock_init(&spLock); spin_lock(&spLock); spin_unlock(&spLock); return 0; } static void __exit spinlock_test_exit(void) { printk("spinlock_test unloaded\n"); } module_init(spinlock_test_init); module_exit(spinlock_test_exit); MODULE_DESCRIPTION ("spinlock_test"); MODULE_LICENSE ("non-GPL"); MODULE_AUTHOR ("Srikar Dronamraju"); Given that spin locks are one of the basic facilities for module code, this effectively makes it impossible to build/load almost any non GPL modules on ppc64le. This was first reported at https://github.com/openzfs/zfs/issues/11172 Currently shared_processor is exported as GPL only symbol. Fix this for parity with other architectures by exposing shared_processor to non-GPL modules too. Fixes: 14c73bd344da ("powerpc/vcpu: Assume dedicated processors as non-preempt") Cc: stable@vger.kernel.org # v5.5+ Reported-by: marc.c.dionne@gmail.com Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210729060449.292780-1-srikar@linux.vnet.ibm.com
2021-06-21powerpc/pesries: Get STF barrier requirement from H_GET_CPU_CHARACTERISTICSNicholas Piggin1-0/+3
This allows the hypervisor / firmware to describe this workarounds to the guest. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503130243.891868-4-npiggin@gmail.com
2021-06-21powerpc/pseries: Get entry and uaccess flush required bits from H_GET_CPU_CHARACTERISTICSNicholas Piggin1-0/+6
This allows the hypervisor / firmware to describe these workarounds to the guest. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210503130243.891868-2-npiggin@gmail.com
2021-03-26powerpc/pseries: export LPAR security flavor in lparcfgLaurent Dufour1-0/+7
This is helpful to read the security flavor from inside the LPAR. In /sys/kernel/debug/powerpc/security_features it can be seen if mitigations are on or off but not the level set through the ASMI menu. Furthermore, reporting it through /proc/powerpc/lparcfg allows an easy processing by the lparstat command [1]. Export it like this in /proc/powerpc/lparcfg: $ grep security_flavor /proc/powerpc/lparcfg security_flavor=1 Value follows what is documented on the IBM support page [2]: 0 Speculative execution fully enabled 1 Speculative execution controls to mitigate user-to-kernel attacks 2 Speculative execution controls to mitigate user-to-kernel and user-to-user side-channel attacks [1] https://groups.google.com/g/powerpc-utils-devel/c/NaKXvdyl_UI/m/wa2stpIDAQAJ [2] https://www.ibm.com/support/pages/node/715841 Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210305125554.5165-1-ldufour@linux.ibm.com
2021-03-24powerpc/pseries: Move hvc_vio_init_early() prototype to shared locationLee Jones1-0/+1
Fixes the following W=1 kernel build warning(s): drivers/tty/hvc/hvc_vio.c:385:13: warning: no previous prototype for ‘hvc_vio_init_early’ 385 | void __init hvc_vio_init_early(void) | ^~~~~~~~~~~~~~~~~~ Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210303124603.3150175-1-lee.jones@linaro.org
2021-02-09powerpc/pci: Move PHB discovery for PCI_DN using platformsOliver O'Halloran1-2/+5
Make powernv, pseries, powermac and maple use ppc_mc.discover_phbs. These platforms need to be done together because they all depend on pci_dn's being created from the DT. The pci_dn contains a pointer to the relevant pci_controller so they need to be created after the pci_controller structures are available, but before PCI devices are scanned. Currently this ordering is provided by initcalls and the sequence is: 1. PHBs are discovered (setup_arch) (early boot, pre-initcalls) 2. pci_dn are created from the unflattended DT (core initcall) 3. PHBs are scanned pcibios_init() (subsys initcall) The new ppc_md.discover_phbs() function is also a core_initcall so we can't guarantee ordering between the creation of pci_controllers and the creation of pci_dn's which require a pci_controller. We could use the postcore, or core_sync initcall levels, but it's cleaner to just move the pci_dn setup into the per-PHB inits which occur inside of .discover_phb() for these platforms. This brings the boot-time path in line with the PHB hotplug path that is used for pseries DLPAR operations too. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Squash powermac & maple in to avoid breakage those platforms, convert memblock allocs to use kmalloc to avoid warnings] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201103043523.916109-2-oohall@gmail.com
2021-01-30powerpc/pseries: Make IOV setup routines staticCédric Le Goater1-4/+4
These are only used locally. It fixes these W=1 compile errors : ../arch/powerpc/platforms/pseries/setup.c:610:17: error: no previous prototype for ‘pseries_get_iov_fw_value’ [-Werror=missing-prototypes] 610 | resource_size_t pseries_get_iov_fw_value(struct pci_dev *dev, int resno, | ^~~~~~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/platforms/pseries/setup.c:646:6: error: no previous prototype for ‘of_pci_set_vf_bar_size’ [-Werror=missing-prototypes] 646 | void of_pci_set_vf_bar_size(struct pci_dev *dev, const int *indexes) | ^~~~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/platforms/pseries/setup.c:668:6: error: no previous prototype for ‘of_pci_parse_iov_addrs’ [-Werror=missing-prototypes] 668 | void of_pci_parse_iov_addrs(struct pci_dev *dev, const int *indexes) | ^~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-22-clg@kaod.org
2020-11-19powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigationsDaniel Axtens1-3/+4
pseries|pnv_setup_rfi_flush already does the count cache flush setup, and we just added entry and uaccess flushes. So the name is not very accurate any more. In both platforms we then also immediately setup the STF flush. Rename them to _setup_security_mitigations and fold the STF flush in. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D after user accessesNicholas Piggin1-0/+4
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-11-19powerpc/64s: flush L1D on kernel entryNicholas Piggin1-0/+4
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-10-06powerpc/pseries: add new branch prediction security bits for link stackNicholas Piggin1-0/+6
The hypervisor interface has defined branch prediction security bits for handling the link stack. Wire them up. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825075612.224656-1-npiggin@gmail.com
2020-08-07Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-6/+18
Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
2020-07-29powerpc/book3s64/radix: Add kernel command line option to disable radix GTSEAneesh Kumar K.V1-0/+5
This adds a kernel command line option that can be used to disable GTSE support. Disabling GTSE implies kernel will make hcalls to invalidate TLB entries. This was done so that we can do VM migration between configs that enable/disable GTSE support via hypervisor. To migrate a VM from a system that supports GTSE to a system that doesn't, we can boot the guest with radix_hcall_invalidate=on, thereby forcing the guest to use hcalls for TLB invalidates. The check for hcall availability is done in pSeries_setup_arch so that the panic message appears on the console. This should only happen on a hypervisor that doesn't force the guest to hash translation even though it can't handle the radix GTSE=0 request via CAS. With radix_hcall_invalidate=on if the hypervisor doesn't support hcall_rpt_invalidate hcall it should force the LPAR to hash translation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200727085908.420806-1-aneesh.kumar@linux.ibm.com
2020-07-27powerpc/pseries: Implement paravirt qspinlocks for SPLPARNicholas Piggin1-1/+3
This implements the generic paravirt qspinlocks using H_PROD and H_CONFER to kick and wait. This uses an un-directed yield to any CPU rather than the directed yield to a pre-empted lock holder that paravirtualised simple spinlocks use, that requires no kick hcall. This is something that could be investigated and improved in future. Performance results can be found in the commit which added queued spinlocks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724131423.1362108-5-npiggin@gmail.com
2020-07-26powerpc/watchpoint: Guest support for 2nd DAWR hcallRavi Bangoria1-2/+5
2nd DAWR can be set/unset using H_SET_MODE hcall with resource value 5. Enable powervm guest support with that. This has no effect on kvm guest because kvm will return error if guest does hcall with resource value 5. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200723090813.303838-9-ravi.bangoria@linux.ibm.com
2020-07-22powerpc/64s: system call support for scv/rfscv instructionsNicholas Piggin1-3/+5
Add support for the scv instruction on POWER9 and later CPUs. For now this implements the zeroth scv vector 'scv 0', as identical to 'sc' system calls, with the exception that LR is not preserved, nor are volatile CR registers, and error is not indicated with CR0[SO], but by returning a negative errno. rfscv is implemented to return from scv type system calls. It can not be used to return from sc system calls because those are defined to preserve LR. getpid syscall throughput on POWER9 is improved by 26% (428 to 318 cycles), largely due to reducing mtmsr and mtspr. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix ppc64e build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200611081203.995112-3-npiggin@gmail.com
2020-07-10powerpc64: Break asm/percpu.h vs spinlock_types.h dependencyPeter Zijlstra1-0/+1
In order to use <asm/percpu.h> in lockdep.h, we need to make sure asm/percpu.h does not itself depend on lockdep. The below seems to make that so and builds powerpc64-defconfig + PROVE_LOCKING. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> https://lkml.kernel.org/r/20200623083721.336906073@infradead.org
2020-06-09mm: don't include asm/pgtable.h if linux/mm.h is already includedMike Rapoport1-1/+0
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-18powerpc/pseries/ras: Avoid calling rtas_token() in NMI pathsNicholas Piggin1-4/+10
In the interest of reducing code and possible failures in the machine check and system reset paths, grab the "ibm,nmi-interlock" token at init time. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Link: https://lore.kernel.org/r/20200508043408.886394-6-npiggin@gmail.com
2020-04-30powerpc/pseries: Account for SPURR ticks on idle CPUsGautham R. Shenoy1-0/+2
On Pseries LPARs, to calculate utilization, we need to know the [S]PURR ticks when the CPUs were busy or idle. Via pseries_idle_prolog(), pseries_idle_epilog(), we track the idle PURR ticks in the VPA variable "wait_state_cycles". This patch extends the support to account for the idle SPURR ticks. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1586249263-14048-4-git-send-email-ego@linux.vnet.ibm.com
2020-04-30powerpc/idle: Store PURR snapshot in a per-cpu global variableGautham R. Shenoy1-4/+3
Currently when CPU goes idle, we take a snapshot of PURR via pseries_idle_prolog() which is used at the CPU idle exit to compute the idle PURR cycles via the function pseries_idle_epilog(). Thus, the value of idle PURR cycle thus read before pseries_idle_prolog() and after pseries_idle_epilog() is always correct. However, if we were to read the idle PURR cycles from an interrupt context between pseries_idle_prolog() and pseries_idle_epilog() (this will be done in a future patch), then, the value of the idle PURR thus read will not include the cycles spent in the most recent idle period. Thus, in that interrupt context, we will need access to the snapshot of the PURR before going idle, in order to compute the idle PURR cycles for the latest idle duration. In this patch, we save the snapshot of PURR in pseries_idle_prolog() in a per-cpu variable, instead of on the stack, so that it can be accessed from an interrupt context. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1586249263-14048-3-git-send-email-ego@linux.vnet.ibm.com
2020-04-30powerpc: Move idle_loop_prolog()/epilog() functions to header fileGautham R. Shenoy1-2/+5
Currently prior to entering an idle state on a Linux Guest, the pseries cpuidle driver implement an idle_loop_prolog() and idle_loop_epilog() functions which ensure that idle_purr is correctly computed, and the hypervisor is informed that the CPU cycles have been donated. These prolog and epilog functions are also required in the default idle call, i.e pseries_lpar_idle(). Hence move these accessor functions to a common header file and call them from pseries_lpar_idle(). Since the existing header files such as asm/processor.h have enough clutter, create a new header file asm/idle.h. Finally rename idle_loop_prolog() and idle_loop_epilog() to pseries_idle_prolog() and pseries_idle_epilog() as they are only relavent for on pseries guests. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1586249263-14048-2-git-send-email-ego@linux.vnet.ibm.com
2019-12-13powerpc/vcpu: Assume dedicated processors as non-preemptSrikar Dronamraju1-0/+7
With commit 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs"), the scheduler avoids preempted vCPUs to schedule tasks on wakeup. This leads to wrong choice of CPU, which in-turn leads to larger wakeup latencies. Eventually, it leads to performance regression in latency sensitive benchmarks like soltp, schbench etc. On Powerpc, vcpu_is_preempted() only looks at yield_count. If the yield_count is odd, the vCPU is assumed to be preempted. However yield_count is increased whenever the LPAR enters CEDE state (idle). So any CPU that has entered CEDE state is assumed to be preempted. Even if vCPU of dedicated LPAR is preempted/donated, it should have right of first-use since they are supposed to own the vCPU. On a Power9 System with 32 cores: # lscpu Architecture: ppc64le Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 8 Core(s) per socket: 1 Socket(s): 16 NUMA node(s): 2 Model: 2.2 (pvr 004e 0202) Model name: POWER9 (architected), altivec supported Hypervisor vendor: pHyp Virtualization type: para L1d cache: 32K L1i cache: 32K L2 cache: 512K L3 cache: 10240K NUMA node0 CPU(s): 0-63 NUMA node1 CPU(s): 64-127 # perf stat -a -r 5 ./schbench v5.4 v5.4 + patch Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 45 75.0000th: 62 75.0th: 63 90.0000th: 71 90.0th: 74 95.0000th: 77 95.0th: 78 *99.0000th: 91 *99.0th: 82 99.5000th: 707 99.5th: 83 99.9000th: 6920 99.9th: 86 min=0, max=10048 min=0, max=96 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 46 75.0000th: 61 75.0th: 64 90.0000th: 72 90.0th: 75 95.0000th: 79 95.0th: 79 *99.0000th: 691 *99.0th: 83 99.5000th: 3972 99.5th: 85 99.9000th: 8368 99.9th: 91 min=0, max=16606 min=0, max=117 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 46 75.0000th: 61 75.0th: 64 90.0000th: 71 90.0th: 75 95.0000th: 77 95.0th: 79 *99.0000th: 106 *99.0th: 83 99.5000th: 2364 99.5th: 84 99.9000th: 7480 99.9th: 90 min=0, max=10001 min=0, max=95 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 45 50.0th: 47 75.0000th: 62 75.0th: 65 90.0000th: 72 90.0th: 75 95.0000th: 78 95.0th: 79 *99.0000th: 93 *99.0th: 84 99.5000th: 108 99.5th: 85 99.9000th: 6792 99.9th: 90 min=0, max=17681 min=0, max=117 Latency percentiles (usec) Latency percentiles (usec) 50.0000th: 46 50.0th: 45 75.0000th: 62 75.0th: 64 90.0000th: 73 90.0th: 75 95.0000th: 79 95.0th: 79 *99.0000th: 113 *99.0th: 82 99.5000th: 2724 99.5th: 83 99.9000th: 6184 99.9th: 93 min=0, max=9887 min=0, max=111 Performance counter stats for 'system wide' (5 runs): context-switches 43,373 ( +- 0.40% ) 44,597 ( +- 0.55% ) cpu-migrations 1,211 ( +- 5.04% ) 220 ( +- 6.23% ) page-faults 15,983 ( +- 5.21% ) 15,360 ( +- 3.38% ) Waiman Long suggested using static_keys. Fixes: 247f2f6f3c70 ("sched/core: Don't schedule threads on pre-empted vCPUs") Cc: stable@vger.kernel.org # v4.18+ Reported-by: Parth Shah <parth@linux.ibm.com> Reported-by: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Waiman Long <longman@redhat.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Phil Auld <pauld@redhat.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Tested-by: Parth Shah <parth@linux.ibm.com> [mpe: Move the key and setting of the key to pseries/setup.c] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191213035036.6913-1-mpe@ellerman.id.au
2019-09-24powerpc/pseries: Read TLB Block Invalidate CharacteristicsLaurent Dufour1-0/+1
The PAPR document specifies the TLB Block Invalidate Characteristics which tells for each pair of segment base page size, actual page size, the size of the block the hcall H_BLOCK_REMOVE supports. These characteristics are loaded at boot time in a new table hblkr_size. The table is separate from the mmu_psize_def because this is specific to the pseries platform. A new init function, pseries_lpar_read_hblkrm_characteristics() is added to read the characteristics. It is called from pSeries_setup_arch(). Fixes: ba2dd8a26baa ("powerpc/pseries/mm: call H_BLOCK_REMOVE") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190920130523.20441-2-ldufour@linux.ibm.com
2019-09-12powerpc/pseries: correctly track irq state in default idleNathan Lynch1-0/+3
prep_irq_for_idle() is intended to be called before entering H_CEDE (and it is used by the pseries cpuidle driver). However the default pseries idle routine does not call it, leading to mismanaged lazy irq state when the cpuidle driver isn't in use. Manifestations of this include: * Dropped IPIs in the time immediately after a cpu comes online (before it has installed the cpuidle handler), making the online operation block indefinitely waiting for the new cpu to respond. * Hitting this WARN_ON in arch_local_irq_restore(): /* * We should already be hard disabled here. We had bugs * where that wasn't the case so let's dbl check it and * warn if we are wrong. Only do that when IRQ tracing * is enabled as mfmsr() can be costly. */ if (WARN_ON_ONCE(mfmsr() & MSR_EE)) __hard_irq_disable(); Call prep_irq_for_idle() from pseries_lpar_idle() and honor its result. Fixes: 363edbe2614a ("powerpc: Default arch idle could cede processor on pseries") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190910225244.25056-1-nathanl@linux.ibm.com
2019-08-30powerpc/64s/powernv: machine check dump SLB contentsNicholas Piggin1-11/+13
Re-use the code introduced in pseries to save and dump the contents of the SLB in the case of an SLB involved machine check exception. This patch also avoids allocating the SLB save array on pseries radix. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190802105709.27696-9-npiggin@gmail.com
2019-08-30powerpc/pseries/svm: Use shared memory for Debug Trace Log (DTL)Anshuman Khandual1-1/+4
Secure guests need to share the DTL buffers with the hypervisor. To that end, use a kmem_cache constructor which converts the underlying buddy allocated SLUB cache pages into shared memory. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190820021326.6884-10-bauerman@linux.ibm.com