aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/platforms/pseries (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-10-18powerpc/pseries: Use lparcfg to reconfig VAS windows for DLPAR CPUHaren Myneni3-14/+45
The hypervisor assigns VAS (Virtual Accelerator Switchboard) windows depends on cores configured in LPAR. The kernel uses OF reconfig notifier to reconfig VAS windows for DLPAR CPU event. In the case of shared CPU mode partition, the hypervisor assigns VAS windows depends on CPU entitled capacity, not based on vcpus. When the user changes CPU entitled capacity for the partition, drmgr uses /proc/ppc64/lparcfg interface to notify the kernel. This patch adds the following changes to update VAS resources for shared mode: - Call vas reconfig windows from lparcfg_write() - Ignore reconfig changes in the VAS notifier Signed-off-by: Haren Myneni <haren@linux.ibm.com> [mpe: Rework error handling, report any errors as EIO] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/efa9c16e4a78dda4567a16f13dabfd73cb4674a2.camel@linux.ibm.com
2022-10-18powerpc/pseries/vas: Add VAS IRQ primary handlerHaren Myneni2-7/+34
irq_default_primary_handler() can be used only with IRQF_ONESHOT flag, but the flag disables IRQ before executing the thread handler and enables it after the interrupt is handled. But this IRQ disable sets the VAS IRQ OFF state in the hypervisor. In case if NX faults during this window, the hypervisor will not deliver the fault interrupt to the partition and the user space may wait continuously for the CSB update. So use VAS specific IRQ handler instead of calling the default primary handler. Increment pending_faults counter in IRQ handler and the bottom thread handler will process all faults based on this counter. In case if the another interrupt is received while the thread is running, it will be processed using this counter. The synchronization of top and bottom handlers will be done with IRQTF_RUNTHREAD flag and will re-enter to bottom half if this flag is set. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aaad8813b4762a6753cfcd0b605a7574a5192ec7.camel@linux.ibm.com
2022-10-13powerpc/pseries: Fix CONFIG_DTL=n buildNicholas Piggin2-74/+80
The recently moved dtl code must be compiled-in if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y even if CONFIG_DTL=n. Fixes: 6ba5aa541aaa0 ("powerpc/pseries: Move dtl scanning and steal time accounting to pseries platform") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20221013073131.1485742-1-npiggin@gmail.com
2022-09-30powerpc/pseries: Add firmware details to the hardware descriptionMichael Ellerman1-0/+30
Add firmware version details to the hardware description, which is printed at boot and in case of an oops. Use /hypervisor if we find it, though currently it only exists if we're running under qemu. Look for "ibm,powervm-partition" which is specified in PAPR+ v2.11 and tells us we're running under PowerVM. Failing that look for "ibm,fw-net-version" which is seen on PowerVM going back to at least Power6. eg: Hardware name: ... of:IBM,FW860.42 (SV860_138) hv:phyp Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220930082709.55830-6-mpe@ellerman.id.au
2022-09-30powerpc/pseries/vas: Pass hw_cpu_id to node associativity HCALLHaren Myneni1-1/+1
Generally the hypervisor decides to allocate a window on different VAS instances. But if user space wishes to allocate on the current VAS instance where the process is executing, the kernel has to pass associativity domain IDs to allocate VAS window HCALL. To determine the associativity domain IDs for the current CPU, smp_processor_id() is passed to node associativity HCALL which may return H_P2 (-55) error during DLPAR CPU event. This is because Linux CPU numbers (smp_processor_id()) are not the same as the hypervisor's view of CPU numbers. Fix the issue by passing hard_smp_processor_id() with VPHN_FLAG_VCPU flag (PAPR 14.11.6.1 H_HOME_NODE_ASSOCIATIVITY). Fixes: b22f2d88e435 ("powerpc/pseries/vas: Integrate API with open/close windows") Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> [mpe: Update change log to mention Linux vs HV CPU numbers] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/55380253ea0c11341824cd4c0fc6bbcfc5752689.camel@linux.ibm.com
2022-09-28powerpc/pseries/vas: Remove the unneeded result variableye xingchen1-5/+1
Return the value vas_register_coproc_api() directly instead of storing it in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220825072657.229168-1-ye.xingchen@zte.com.cn
2022-09-28powerpc/pseries: block untrusted device tree changes when locked downNathan Lynch1-0/+5
The /proc/powerpc/ofdt interface allows the root user to freely alter the in-kernel device tree, enabling arbitrary physical address writes via drivers that could bind to malicious device nodes, thus making it possible to disable lockdown. Historically this interface has been used on the pseries platform to facilitate the runtime addition and removal of processor, memory, and device resources (aka Dynamic Logical Partitioning or DLPAR). Years ago, the processor and memory use cases were migrated to designs that happen to be lockdown-friendly: device tree updates are communicated directly to the kernel from firmware without passing through untrusted user space. I/O device DLPAR via the "drmgr" command in powerpc-utils remains the sole legitimate user of /proc/powerpc/ofdt, but it is already broken in lockdown since it uses /dev/mem to allocate argument buffers for the rtas syscall. So only illegitimate uses of the interface should see a behavior change when running on a locked down kernel. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926131643.146502-2-nathanl@linux.ibm.com
2022-09-28powerpc/pseries: Move vas_migration_handler early during migrationHaren Myneni1-3/+12
When the migration is initiated, the hypervisor changes VAS mappings as part of pre-migration event. Then the OS gets the migration event which closes all VAS windows before the migration starts. NX generates continuous faults until windows are closed and the user space can not differentiate these NX faults coming from the actual migration. So to reduce this time window, close VAS windows first in pseries_migrate_partition(). Signed-off-by: Haren Myneni <haren@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d8efade91dda831c9ed4abb226dab627da594c5f.camel@linux.ibm.com
2022-09-26powerpc/pseries: move hcall_tracepoint_refcount out of .tocNicholas Piggin1-2/+2
The .toc section is not really intended for arbitrary data. Writable data in particular prevents making the TOC read-only after relocation. Move hcall_tracepoint_refcount into the .data section. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220926053823.2668799-1-npiggin@gmail.com
2022-09-24Merge branch 'fixes' into nextMichael Ellerman2-62/+31
Merge our fixes branch to bring in a few things that new feature patches rely on or conflict with.
2022-09-08powerpc/pseries: Fix plpks crash on non-pseriesMichael Ellerman1-1/+2
As reported[1] by Nathan, the recently added plpks driver will crash if it's built into the kernel and booted on a non-pseries machine, eg powernv: kernel BUG at arch/powerpc/kernel/syscall.c:39! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV ... NIP system_call_exception+0x90/0x3d0 LR system_call_common+0xec/0x250 Call Trace: 0xc0000000035c3e10 (unreliable) system_call_common+0xec/0x250 --- interrupt: c00 at plpar_hcall+0x38/0x60 NIP: c0000000000e4300 LR: c00000000202945c CTR: 0000000000000000 REGS: c0000000035c3e80 TRAP: 0c00 Not tainted (6.0.0-rc4) MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000284 XER: 00000000 ... NIP plpar_hcall+0x38/0x60 LR pseries_plpks_init+0x64/0x23c --- interrupt: c00 On powernv Linux is the hypervisor, so a hypercall just ends up going to the syscall path, which BUGs if the syscall (hypercall) didn't come from userspace. The fix is simply to not probe the plpks driver on non-pseries machines. [1] https://lore.kernel.org/linuxppc-dev/Yxe06fbq18Wv9y3W@dev-arch.thelio-3990X/ Fixes: 2454a7af0f2a ("powerpc/pseries: define driver for Platform KeyStore") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Dan Horák <dan@danny.cz> Reviewed-by: Dan Horák <dan@danny.cz> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20220907065038.1604504-1-mpe@ellerman.id.au
2022-09-06powerpc/mobility: fix repeated words in commentsJilin Yuan1-1/+1
Delete the redundant word 'the'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220831005109.38314-1-yuanjilin@cdjrlc.com
2022-09-05powerpc/pseries: Add missing of_node_put()s in hotplug-cpu.cLiang He1-4/+11
In pseries_cpuhp_cache_use_count() and pseries_cpuhp_detach_nodes(), we need carefully hold the reference returned by of_find_next_cache_node() and use it to call of_node_put() to keep refcount balance. Signed-off-by: Liang He <windhl@126.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220621111701.4082889-1-windhl@126.com
2022-09-05powerpc/pseries: Add missing of_node_put() in ibmebusLiang He1-1/+5
In ibmebus_match_path(), use of_node_put() to drop the reference returned by of_find_node_by_path() before testing for equality of the pointers. Signed-off-by: Liang He <windhl@126.com> [mpe: Rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220619074016.4068105-1-windhl@126.com
2022-09-05powerpc/pseries: Move dtl scanning and steal time accounting to pseries platformNicholas Piggin1-0/+81
dtl is the PAPR Dispatch Trace Log, which is entirely a pseries feature. The pseries platform alrady has a file dealing with the dtl, so move scanning for stolen time accounting there from kernel/time.c. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220902085316.2071519-5-npiggin@gmail.com
2022-09-05powerpc/pseries: Implement CONFIG_PARAVIRT_TIME_ACCOUNTINGNicholas Piggin3-0/+38
CONFIG_VIRT_CPU_ACCOUNTING_GEN under pseries does not provide stolen time accounting unless CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled. Implement this using the VPA accumulated wait counters. Note this will not work on current KVM hosts because KVM does not implement the VPA dispatch counters (yet). It could be implemented with the dispatch trace log as it is for VIRT_CPU_ACCOUNTING_NATIVE, but that is not necessary for the more limited accounting provided by PARAVIRT_TIME_ACCOUNTING, and it is more expensive, complex, and has downsides like potential log wrap. From Shrikanth: [...] it was tested on Power10 [PowerVM] Shared LPAR. system has two LPAR. we will call first one LPAR1 and second one as LPAR2. Test was carried out in SMT=1. Similar observation was seen in SMT=8 as well. LPAR config header from each LPAR is below. LPAR1 is twice as big as LPAR2. Since Both are sharing the same underlying hardware, work stealing will happen when both the LPAR's are contending for the same resource. LPAR1: type=Shared mode=Uncapped smt=Off lcpu=40 cpus=40 ent=20.00 LPAR2: type=Shared mode=Uncapped smt=Off lcpu=20 cpus=40 ent=10.00 mpstat was used to check for the utilization. stress-ng has been used as the workload. Few cases are tested. when the both LPAR are idle there is no steal time. when LPAR1 starts running at 100% which consumes all of the physical resource, steal time starts to get accounted. With LPAR1 running at 100% and LPAR2 starts running, steal time starts increasing. This is as expected. When the LPAR2 Load is increased further, steal time increases further. Case 1: 0% LPAR1; 0% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 0.00 0.00 0.05 0.00 0.00 0.00 0.00 0.00 0.00 99.95 Case 2: 100% LPAR1; 0% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 97.68 0.00 0.00 0.00 0.00 0.00 2.32 0.00 0.00 0.00 Case 3: 100% LPAR1; 50% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 86.34 0.00 0.10 0.00 0.00 0.03 13.54 0.00 0.00 0.00 Case 4: 100% LPAR1; 100% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 78.54 0.00 0.07 0.00 0.00 0.02 21.36 0.00 0.00 0.00 Case 5: 50% LPAR1; 100% LPAR2 %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 49.37 0.00 0.00 0.00 0.00 0.00 1.17 0.00 0.00 49.47 Patch is accounting for the steal time and basic tests are holding good. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> [mpe: Add SPDX tag to new paravirt_api_clock.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220902085316.2071519-3-npiggin@gmail.com
2022-09-02powerpc/papr_scm: Ensure rc is always initialized in papr_scm_pmu_register()Nathan Chancellor1-1/+3
Clang warns: arch/powerpc/platforms/pseries/papr_scm.c:492:6: warning: variable 'rc' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!p->stat_buffer_len) ^~~~~~~~~~~~~~~~~~~ arch/powerpc/platforms/pseries/papr_scm.c:523:64: note: uninitialized use occurs here dev_info(&p->pdev->dev, "nvdimm pmu didn't register rc=%d\n", rc); ^~ include/linux/dev_printk.h:150:67: note: expanded from macro 'dev_info' dev_printk_index_wrap(_dev_info, KERN_INFO, dev, dev_fmt(fmt), ##__VA_ARGS__) ^~~~~~~~~~~ include/linux/dev_printk.h:110:23: note: expanded from macro 'dev_printk_index_wrap' _p_func(dev, fmt, ##__VA_ARGS__); \ ^~~~~~~~~~~ arch/powerpc/platforms/pseries/papr_scm.c:492:2: note: remove the 'if' if its condition is always false if (!p->stat_buffer_len) ^~~~~~~~~~~~~~~~~~~~~~~~ arch/powerpc/platforms/pseries/papr_scm.c:484:8: note: initialize the variable 'rc' to silence this warning int rc, nodeid; ^ = 0 1 warning generated. The call to papr_scm_pmu_check_events() was eliminated but a return code was not added to the if statement. Add the same return code from papr_scm_pmu_check_events() for this condition so there is no more warning. Fixes: 9b1ac04698a4 ("powerpc/papr_scm: Fix nvdimm event mappings") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/1701 Link: https://lore.kernel.org/r/20220830151256.1473169-1-nathan@kernel.org
2022-08-26powerpc: move from strlcpy with unused retval to strscpyWolfram Sang1-1/+1
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Link: https://lore.kernel.org/r/20220818205946.6336-1-wsa+renesas@sang-engineering.com
2022-08-23powerpc/papr_scm: Fix nvdimm event mappingsKajol Jain1-61/+27
Commit 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") added performance monitoring support for papr-scm nvdimm devices via perf interface. Commit also added an array in papr_scm_priv structure called "nvdimm_events_map", which got filled based on the result of H_SCM_PERFORMANCE_STATS hcall. Currently there is an assumption that the order of events in the stats buffer, returned by the hypervisor is same. And order also happens to matches with the events specified in nvdimm driver code. But this assumption is not documented in Power Architecture Platform Requirements (PAPR) document. Although the order of events happens to be same on current generation od system, but it might not be true in future generation systems. Fix the issue, by adding a static mapping for nvdimm events to corresponding stat-id, and removing the dynamic map from papr_scm_priv structure. Also remove the function papr_scm_pmu_check_events from papr_scm.c file, as we no longer need to copy stat-ids dynamically. Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220804074852.55157-1-kjain@linux.ibm.com
2022-08-06Merge tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds14-55/+712
Pull powerpc updates from Michael Ellerman: - Add support for syscall stack randomization - Add support for atomic operations to the 32 & 64-bit BPF JIT - Full support for KASAN on 64-bit Book3E - Add a watchdog driver for the new PowerVM hypervisor watchdog - Add a number of new selftests for the Power10 PMU support - Add a driver for the PowerVM Platform KeyStore - Increase the NMI watchdog timeout during live partition migration, to avoid timeouts due to increased memory access latency - Add support for using the 'linux,pci-domain' device tree property for PCI domain assignment - Many other small features and fixes Thanks to Alexey Kardashevskiy, Andy Shevchenko, Arnd Bergmann, Athira Rajeev, Bagas Sanjaya, Christophe Leroy, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Greg Kurz, Haowen Bai, Hari Bathini, Jason A. Donenfeld, Jason Wang, Jiang Jian, Joel Stanley, Juerg Haefliger, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Masahiro Yamada, Maxime Bizon, Miaoqian Lin, Murilo Opsfelder Araújo, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Ning Qiang, Pali Rohár, Petr Mladek, Rashmica Gupta, Sachin Sant, Scott Cheloha, Segher Boessenkool, Stephen Rothwell, Uwe Kleine-König, Wolfram Sang, Xiu Jianfeng, and Zhouyi Zhou. * tag 'powerpc-6.0-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (191 commits) powerpc/64e: Fix kexec build error EDAC/ppc_4xx: Include required of_irq header directly powerpc/pci: Fix PHB numbering when using opal-phbid powerpc/64: Init jump labels before parse_early_param() selftests/powerpc: Avoid GCC 12 uninitialised variable warning powerpc/cell/axon_msi: Fix refcount leak in setup_msi_msg_address powerpc/xive: Fix refcount leak in xive_get_max_prio powerpc/spufs: Fix refcount leak in spufs_init_isolated_loader powerpc/perf: Include caps feature for power10 DD1 version powerpc: add support for syscall stack randomization powerpc: Move system_call_exception() to syscall.c powerpc/powernv: rename remaining rng powernv_ functions to pnv_ powerpc/powernv/kvm: Use darn for H_RANDOM on Power9 powerpc/powernv: Avoid crashing if rng is NULL selftests/powerpc: Fix matrix multiply assist test powerpc/signal: Update comment for clarity powerpc: make facility_unavailable_exception 64s powerpc/platforms/83xx/suspend: Remove write-only global variable powerpc/platforms/83xx/suspend: Prevent unloading the driver powerpc/platforms/83xx/suspend: Reorder to get rid of a forward declaration ...
2022-08-03Merge tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds1-57/+3
Pull folio updates from Matthew Wilcox: - Fix an accounting bug that made NR_FILE_DIRTY grow without limit when running xfstests - Convert more of mpage to use folios - Remove add_to_page_cache() and add_to_page_cache_locked() - Convert find_get_pages_range() to filemap_get_folios() - Improvements to the read_cache_page() family of functions - Remove a few unnecessary checks of PageError - Some straightforward filesystem conversions to use folios - Split PageMovable users out from address_space_operations into their own movable_operations - Convert aops->migratepage to aops->migrate_folio - Remove nobh support (Christoph Hellwig) * tag 'folio-6.0' of git://git.infradead.org/users/willy/pagecache: (78 commits) fs: remove the NULL get_block case in mpage_writepages fs: don't call ->writepage from __mpage_writepage fs: remove the nobh helpers jfs: stop using the nobh helper ext2: remove nobh support ntfs3: refactor ntfs_writepages mm/folio-compat: Remove migration compatibility functions fs: Remove aops->migratepage() secretmem: Convert to migrate_folio hugetlb: Convert to migrate_folio aio: Convert to migrate_folio f2fs: Convert to filemap_migrate_folio() ubifs: Convert to filemap_migrate_folio() btrfs: Convert btrfs_migratepage to migrate_folio mm/migrate: Add filemap_migrate_folio() mm/migrate: Convert migrate_page() to migrate_folio() nfs: Convert to migrate_folio btrfs: Convert btree_migratepage to migrate_folio mm/migrate: Convert expected_page_refs() to folio_expected_refs() mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio() ...
2022-08-02mm: Convert all PageMovable users to movable_operationsMatthew Wilcox (Oracle)1-57/+3
These drivers are rather uncomfortably hammered into the address_space_operations hole. They aren't filesystems and don't behave like filesystems. They just need their own movable_operations structure, which we can point to directly from page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-07-28powerpc/pseries/vas: Fix comment typoJason Wang1-1/+1
The double `the' in line 807 is duplicated, remove one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220718075553.70897-1-wangborong@cdjrlc.com
2022-07-28powerpc/pseries: define driver for Platform KeyStoreNayna Jain4-0/+545
PowerVM provides an isolated Platform Keystore(PKS) storage allocation for each LPAR with individually managed access controls to store sensitive information securely. It provides a new set of hypervisor calls for Linux kernel to access PKS storage. Define POWER LPAR Platform KeyStore(PLPKS) driver using H_CALL interface to access PKS storage. Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220723113048.521744-2-nayna@linux.ibm.com
2022-07-28pseries/iommu/ddw: Fix kdump to work in absence of ibm,dma-windowAlexey Kardashevskiy1-41/+48
The pseries platform uses 32bit default DMA window (always 4K pages) and optional 64bit DMA window available via DDW ("Dynamic DMA Windows"), 64K or 2M pages. For ages the default one was not removed and a huge window was created in addition. Things changed with SRIOV-enabled PowerVM which creates a default-and-bigger DMA window in 64bit space (still using 4K pages) for IOV VFs so certain OSes do not need to use the DDW API in order to utilize all available TCE budget. Linux on the other hand removes the default window and creates a bigger one (with more TCEs or/and a bigger page size - 64K/2M) in a bid to map the entire RAM, and if the new window size is smaller than that - it still uses this new bigger window. The result is that the default window is removed but the "ibm,dma-window" property is not. When kdump is invoked, the existing code tries reusing the existing 64bit DMA window which location and parameters are stored in the device tree but this fails as the new property does not make it to the kdump device tree blob. So the code falls back to the default window which does not exist anymore although the device tree says that it does. The result of that is that PCI devices become unusable and cannot be used for kdumping. This preserves the DMA64 and DIRECT64 properties in the device tree blob for the crash kernel. Since the crash kernel setup is done after device drivers are loaded and probed, the proper DMA config is stored at least for boot time devices. Because DDW window is optional and the code configures the default window first, the existing code creates an IOMMU table descriptor for the non-existing default DMA window. It is harmless for kdump as it does not touch the actual window (only reads what is mapped and marks those IO pages as used) but it is bad for kexec which clears it thinking it is a smaller default window rather than a bigger DDW window. This removes the "ibm,dma-window" property from the device tree after a bigger window is created and the crash kernel setup picks it up. Fixes: 381ceda88c4c ("powerpc/pseries/iommu: Make use of DDW for indirect mapping") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220629060614.1680476-1-aik@ozlabs.ru
2022-07-27powerpc/pseries/mobility: set NMI watchdog factor during an LPMLaurent Dufour1-0/+43
During an LPM, while the memory transfer is in progress on the arrival side, some latencies are generated when accessing not yet transferred pages on the arrival side. Thus, the NMI watchdog may be triggered too frequently, which increases the risk to hit an NMI interrupt in a bad place in the kernel, leading to a kernel panic. Disabling the Hard Lockup Watchdog until the memory transfer could be a too strong work around, some users would want this timeout to be eventually triggered if the system is hanging even during an LPM. Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply a factor to the NMI watchdog timeout during an LPM. Just before the CPUs are stopped for the switchover sequence, the NMI watchdog timer is set to watchdog_thresh + factor% A value of 0 has no effect. The default value is 200, meaning that the NMI watchdog is set to 30s during LPM (based on a 10s watchdog_thresh value). Once the memory transfer is achieved, the factor is reset to 0. Setting this value to a high number is like disabling the NMI watchdog during an LPM. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220713154729.80789-5-ldufour@linux.ibm.com
2022-07-25powerpc/mobility: wait for memory transfer to completeLaurent Dufour1-2/+46
In pseries_migration_partition(), loop until the memory transfer is complete. This way the calling drmgr process will not exit earlier, allowing callbacks to be run only once the migration is fully completed. If reading the VASI state is done after the hypervisor has completed the migration, the HCALL is returning H_PARAMETER. We can safely assume that the memory transfer is achieved if this happens. This will also allow to manage the NMI watchdog state in the next commits. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220713154729.80789-2-ldufour@linux.ibm.com
2022-07-25powerpc: Fix all occurences of duplicate wordsMichael Ellerman2-2/+2
Since commit 87c78b612f4f ("powerpc: Fix all occurences of "the the"") fixed "the the", there's now a steady stream of patches fixing other duplicate words. Just fix them all at once, to save the overhead of dealing with individual patches for each case. This leaves a few cases of "that that", which in some contexts is correct. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220718095158.326606-1-mpe@ellerman.id.au
2022-07-20powerpc/pseries: register pseries-wdt device with platform busScott Cheloha1-0/+13
PAPR v2.12 defines a new hypercall, H_WATCHDOG. The hypercall permits guest control of one or more virtual watchdog timers. These timers do not conform to PowerPC device conventions. They are not affixed to any extant bus, nor do they have full representation in the device tree. As a workaround we represent them as platform devices. This patch registers a single platform device, "pseries-wdt", with the platform bus if the FW_FEATURE_WATCHDOG flag is set. A driver for this device, "pseries-wdt", will be introduced in a subsequent patch. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220713202335.1217647-4-cheloha@linux.ibm.com
2022-07-20powerpc/pseries: add FW_FEATURE_WATCHDOG flagScott Cheloha1-0/+1
PAPR v2.12 specifies a new optional function set, "hcall-watchdog", for the /rtas/ibm,hypertas-functions property. The presence of this function set indicates support for the H_WATCHDOG hypercall. Check for this function set and, if present, set the new FW_FEATURE_WATCHDOG flag. Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220713202335.1217647-3-cheloha@linux.ibm.com
2022-07-18random: remove CONFIG_ARCH_RANDOMJason A. Donenfeld1-1/+0
When RDRAND was introduced, there was much discussion on whether it should be trusted and how the kernel should handle that. Initially, two mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and "nordrand", a boot-time switch. Later the thinking evolved. With a properly designed RNG, using RDRAND values alone won't harm anything, even if the outputs are malicious. Rather, the issue is whether those values are being *trusted* to be good or not. And so a new set of options were introduced as the real ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". With these options, RDRAND is used, but it's not always credited. So in the worst case, it does nothing, and in the best case, maybe it helps. Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the center and became something certain platforms force-select. The old options don't really help with much, and it's a bit odd to have special handling for these instructions when the kernel can deal fine with the existence or untrusted existence or broken existence or non-existence of that CPU capability. Simplify the situation by removing CONFIG_ARCH_RANDOM and using the ordinary asm-generic fallback pattern instead, keeping the two options that are actually used. For now it leaves "nordrand" for now, as the removal of that will take a different route. Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-09Merge branch 'fixes' into nextMichael Ellerman3-8/+6
Merge our fixes branch. In particular this brings in commit 986481618023 ("powerpc/book3e: Fix PUD allocation size in map_kernel_page()") which fixes a build failure in next, because commit 2db2008e6363 ("powerpc/64e: Rewrite p4d_populate() as a static inline function") depends on it.
2022-06-29powerpc/pseries/iommu: Print ibm,query-pe-dma-windows parametersAlexey Kardashevskiy1-3/+5
PowerVM has a stricter policy about allocating TCEs for LPARs and often there is not enough TCEs for 1:1 mapping, this adds the supported numbers into dev_info() to help analyzing bugreports. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220601040117.1467710-1-aik@ozlabs.ru
2022-06-29powerpc/64s: Don't read H_BLOCK_REMOVE characteristics in radix modeLaurent Dufour1-3/+2
There is no need to read the H_BLOCK_REMOVE characteristics when running in Radix mode because this hcall is never called. Furthermore since the commit 387e220a2e5e ("powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU") define pseries_lpar_read_hblkrm_characteristics as un empty function if CONFIG_PPC_64S_HASH_MMU is not set, the #ifdef block can be removed. Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220523164353.26441-1-ldufour@linux.ibm.com
2022-06-29powerpc/papr_scm: use dev_get_drvdataHaowen Bai1-1/+1
Eliminate direct accesses to the driver_data field. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1653988790-19999-1-git-send-email-baihaowen@meizu.com
2022-06-29powerpc: Include asm/firmware.h in all users of firmware_has_feature()Christophe Leroy2-0/+2
Trying to remove asm/ppc_asm.h from all places that don't need it leads to several failures linked to firmware_has_feature(). To fix it, include asm/firmware.h in all files using firmware_has_feature() All users found with: git grep -L "firmware\.h" ` git grep -l "firmware_has_feature("` Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/11956ec181a034b51a881ac9c059eea72c679a73.1651828453.git.christophe.leroy@csgroup.eu
2022-06-20powerpc: Don't include asm/setup.h in asm/machdep.hChristophe Leroy3-2/+3
asm/machdep.h doesn't need asm/setup.h Remove it. Add it directly in files that needs it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3b1dfb19a2c3265fb4abc2bfc7b6eae9261a998b.1654966508.git.christophe.leroy@csgroup.eu
2022-06-18powerpc/pseries: wire up rng during setup_arch()Jason A. Donenfeld3-8/+6
The platform's RNG must be available before random_init() in order to be useful for initial seeding, which in turn means that it needs to be called from setup_arch(), rather than from an init call. Fortunately, each platform already has a setup_arch function pointer, which means it's easy to wire this up. This commit also removes some noisy log messages that don't add much. Fixes: a489043f4626 ("powerpc/pseries: Implement arch_get_random_long() based on H_RANDOM") Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220611151015.548325-4-Jason@zx2c4.com
2022-05-31powerpc/papr_scm: don't requests stats with '0' sized stats bufferVaibhav Jain1-0/+3
Sachin reported [1] that on a POWER-10 lpar he is seeing a kernel panic being reported with vPMEM when papr_scm probe is being called. The panic is of the form below and is observed only with following option disabled(profile) for the said LPAR 'Enable Performance Information Collection' in the HMC: Kernel attempted to write user page (1c) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on write at 0x0000001c Faulting instruction address: 0xc008000001b90844 Oops: Kernel access of bad area, sig: 11 [#1] <snip> NIP [c008000001b90844] drc_pmem_query_stats+0x5c/0x270 [papr_scm] LR [c008000001b92794] papr_scm_probe+0x2ac/0x6ec [papr_scm] Call Trace: 0xc00000000941bca0 (unreliable) papr_scm_probe+0x2ac/0x6ec [papr_scm] platform_probe+0x98/0x150 really_probe+0xfc/0x510 __driver_probe_device+0x17c/0x230 <snip> ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Fatal exception On investigation looks like this panic was caused due to a 'stat_buffer' of size==0 being provided to drc_pmem_query_stats() to fetch all performance stats-ids of an NVDIMM. However drc_pmem_query_stats() shouldn't have been called since the vPMEM NVDIMM doesn't support and performance stat-id's. This was caused due to missing check for 'p->stat_buffer_len' at the beginning of papr_scm_pmu_check_events() which indicates that the NVDIMM doesn't support performance-stats. Fix this by introducing the check for 'p->stat_buffer_len' at the beginning of papr_scm_pmu_check_events(). [1] https://lore.kernel.org/all/6B3A522A-6A5F-4CC9-B268-0C63AA6E07D3@linux.ibm.com Fixes: 0e0946e22f3665d2732 ("powerpc/papr_scm: Fix leaking nvdimm_events_map elements") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220524112353.1718454-1-vaibhav@linux.ibm.com
2022-05-28Merge tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds23-79/+77
Pull powerpc updates from Michael Ellerman: - Convert to the generic mmap support (ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT) - Add support for outline-only KASAN with 64-bit Radix MMU (P9 or later) - Increase SIGSTKSZ and MINSIGSTKSZ and add support for AT_MINSIGSTKSZ - Enable the DAWR (Data Address Watchpoint) on POWER9 DD2.3 or later - Drop support for system call instruction emulation - Many other small features and fixes Thanks to Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Bagas Sanjaya, Bjorn Helgaas, Bo Liu, Chen Huang, Christophe Leroy, Colin Ian King, Daniel Axtens, Dwaipayan Ray, Fabiano Rosas, Finn Thain, Frank Rowand, Fuqian Huang, Guilherme G. Piccoli, Hangyu Hua, Haowen Bai, Haren Myneni, Hari Bathini, He Ying, Jason Wang, Jiapeng Chong, Jing Yangyang, Joel Stanley, Julia Lawall, Kajol Jain, Kevin Hao, Krzysztof Kozlowski, Laurent Dufour, Lv Ruyi, Madhavan Srinivasan, Magali Lemes, Miaoqian Lin, Minghao Chi, Nathan Chancellor, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Oscar Salvador, Pali Rohár, Paul Mackerras, Peng Wu, Qing Wang, Randy Dunlap, Reza Arbab, Russell Currey, Sohaib Mohamed, Vaibhav Jain, Vasant Hegde, Wang Qing, Wang Wensheng, Xiang wangx, Xiaomeng Tong, Xu Wang, Yang Guang, Yang Li, Ye Bin, YueHaibing, Yu Kuai, Zheng Bin, Zou Wei, and Zucheng Zheng. * tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits) powerpc/64: Include cache.h directly in paca.h powerpc/64s: Only set HAVE_ARCH_UNMAPPED_AREA when CONFIG_PPC_64S_HASH_MMU is set powerpc/xics: Include missing header powerpc/powernv/pci: Drop VF MPS fixup powerpc/fsl_book3e: Don't set rodata RO too early powerpc/microwatt: Add mmu bits to device tree powerpc/powernv/flash: Check OPAL flash calls exist before using powerpc/powermac: constify device_node in of_irq_parse_oldworld() powerpc/powermac: add missing g5_phy_disable_cpu1() declaration selftests/powerpc/pmu: fix spelling mistake "mis-match" -> "mismatch" powerpc: Enable the DAWR on POWER9 DD2.3 and above powerpc/64s: Add CPU_FTRS_POWER10 to ALWAYS mask powerpc/64s: Add CPU_FTRS_POWER9_DD2_2 to CPU_FTRS_ALWAYS mask powerpc: Fix all occurences of "the the" selftests/powerpc/pmu/ebb: remove fixed_instruction.S powerpc/platforms/83xx: Use of_device_get_match_data() powerpc/eeh: Drop redundant spinlock initialization powerpc/iommu: Add missing of_node_put in iommu_init_early_dart powerpc/pseries/vas: Call misc_deregister if sysfs init fails powerpc/papr_scm: Fix leaking nvdimm_events_map elements ...
2022-05-25Merge tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2-28/+1
Pull dma-mapping updates from Christoph Hellwig: - don't over-decrypt memory (Robin Murphy) - takes min align mask into account for the swiotlb max mapping size (Tianyu Lan) - use GFP_ATOMIC in dma-debug (Mikulas Patocka) - fix DMA_ATTR_NO_KERNEL_MAPPING on xen/arm (me) - don't fail on highmem CMA pages in dma_direct_alloc_pages (me) - cleanup swiotlb initialization and share more code with swiotlb-xen (me, Stefano Stabellini) * tag 'dma-mapping-5.19-2022-05-25' of git://git.infradead.org/users/hch/dma-mapping: (23 commits) dma-direct: don't over-decrypt memory swiotlb: max mapping size takes min align mask into account swiotlb: use the right nslabs-derived sizes in swiotlb_init_late swiotlb: use the right nslabs value in swiotlb_init_remap swiotlb: don't panic when the swiotlb buffer can't be allocated dma-debug: change allocation mode from GFP_NOWAIT to GFP_ATIOMIC dma-direct: don't fail on highmem CMA pages in dma_direct_alloc_pages swiotlb-xen: fix DMA_ATTR_NO_KERNEL_MAPPING on arm x86: remove cruft from <asm/dma-mapping.h> swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl swiotlb: merge swiotlb-xen initialization into swiotlb swiotlb: provide swiotlb_init variants that remap the buffer swiotlb: pass a gfp_mask argument to swiotlb_init_late swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction swiotlb: make the swiotlb_init interface more useful x86: centralize setting SWIOTLB_FORCE when guest memory encryption is enabled x86: remove the IOMMU table infrastructure MIPS/octeon: use swiotlb_init instead of open coding it arm/xen: don't check for xen_initial_domain() in xen_create_contiguous_region swiotlb: rename swiotlb_late_init_with_default_size ...
2022-05-22powerpc/eeh: Drop redundant spinlock initializationHaowen Bai1-2/+1
slot_errbuf_lock has declared and initialized by DEFINE_SPINLOCK, so we don't need to spin_lock_init again, drop it. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1652232476-9696-1-git-send-email-baihaowen@meizu.com
2022-05-22powerpc/pseries/vas: Call misc_deregister if sysfs init failsZheng Bin1-0/+2
Undo effects of misc_register if sysfs init fails after misc_register. Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220511033507.2745992-1-zhengbin13@huawei.com
2022-05-22powerpc/papr_scm: Fix leaking nvdimm_events_map elementsVaibhav Jain1-30/+24
Right now 'char *' elements allocated for individual 'stat_id' in 'papr_scm_priv.nvdimm_events_map[]' during papr_scm_pmu_check_events(), get leaked in papr_scm_remove() and papr_scm_pmu_register(), papr_scm_pmu_check_events() error paths. Also individual 'stat_id' arent NULL terminated 'char *' instead they are fixed 8-byte sized identifiers. However papr_scm_pmu_register() assumes it to be a NULL terminated 'char *' and at other places it assumes it to be a 'papr_scm_perf_stat.stat_id' sized string which is 8-byes in size. Fix this by allocating the memory for papr_scm_priv.nvdimm_events_map to also include space for 'stat_id' entries. This is possible since number of available events/stat_ids are known upfront. This saves some memory and one extra level of indirection from 'nvdimm_events_map' to 'stat_id'. Also rest of the code can continue to call 'kfree(papr_scm_priv.nvdimm_events_map)' without needing to iterate over the array and free up individual elements. Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support") Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220511082637.646714-1-vaibhav@linux.ibm.com
2022-05-22powerpc/numa: Associate numa node to its cpu earlierOscar Salvador1-1/+1
powerpc is the only platform that do not rely on cpu_up()->try_online_node() to bring up a numa node, and special cases it, instead, deep in its own machinery: dlpar_online_cpu find_and_online_cpu_nid try_online_node This should not be needed, but the thing is that the try_online_node() from cpu_up() will not apply on the right node, because cpu_to_node() will return the old mapping numa<->cpu that gets set on boot stage for all possible cpus. That can be seen easily if we try to print out the numa node passed to try_online_node() in cpu_up(). The thing is that the numa<->cpu mapping does not get updated till a much later stage in start_secondary: start_secondary: set_numa_node(numa_cpu_lookup_table[cpu]) But we do not really care, as we already now the CPU <-> NUMA associativity back in find_and_online_cpu_nid(), so let us make use of that and set the proper numa<->cpu mapping, so cpu_to_node() in cpu_up() returns the right node and try_online_node() can do its work. Signed-off-by: Oscar Salvador <osalvador@suse.de> Tested-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220411074934.4632-1-osalvador@suse.de
2022-05-22powerpc: Book3S 64-bit outline-only KASAN supportDaniel Axtens1-0/+2
Implement a limited form of KASAN for Book3S 64-bit machines running under the Radix MMU, supporting only outline mode. - Enable the compiler instrumentation to check addresses and maintain the shadow region. (This is the guts of KASAN which we can easily reuse.) - Require kasan-vmalloc support to handle modules and anything else in vmalloc space. - KASAN needs to be able to validate all pointer accesses, but we can't instrument all kernel addresses - only linear map and vmalloc. On boot, set up a single page of read-only shadow that marks all iomap and vmemmap accesses as valid. - Document KASAN in powerpc docs. Background ---------- KASAN support on Book3S is a bit tricky to get right: - It would be good to support inline instrumentation so as to be able to catch stack issues that cannot be caught with outline mode. - Inline instrumentation requires a fixed offset. - Book3S runs code with translations off ("real mode") during boot, including a lot of generic device-tree parsing code which is used to determine MMU features. [ppc64 mm note: The kernel installs a linear mapping at effective address c000...-c008.... This is a one-to-one mapping with physical memory from 0000... onward. Because of how memory accesses work on powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the same memory both with translations on (accessing as an 'effective address'), and with translations off (accessing as a 'real address'). This works in both guests and the hypervisor. For more details, see s5.7 of Book III of version 3 of the ISA, in particular the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this KASAN implementation currently only supports Radix.] - Some code - most notably a lot of KVM code - also runs with translations off after boot. - Therefore any offset has to point to memory that is valid with translations on or off. One approach is just to give up on inline instrumentation. This way boot-time checks can be delayed until after the MMU is set is up, and we can just not instrument any code that runs with translations off after booting. Take this approach for now and require outline instrumentation. Previous attempts allowed inline instrumentation. However, they came with some unfortunate restrictions: only physically contiguous memory could be used and it had to be specified at compile time. Maybe we can do better in the future. [paulus@ozlabs.org - Rebased onto 5.17. Note that a kernel with CONFIG_KASAN=y will crash during boot on a machine using HPT translation because not all the entry points to the generic KASAN code are protected with a call to kasan_arch_is_ready().] Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Update copyright year and comment formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
2022-05-22powerpc/kasan: Disable address sanitization in kexec pathsDaniel Axtens4-11/+12
The kexec code paths involve code that necessarily run in real mode, as CPUs are disabled and control is transferred to the new kernel. Disable address sanitization for the kexec code and the functions called in real mode on CPUs being disabled. [paulus@ozlabs.org: combined a few work-in-progress commits of Daniel's and wrote the commit message.] Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> [mpe: Move pseries_machine_kexec() into kexec.c so setup.c can be instrumented] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/YoTFSQ2TUSEaDdVC@cleo
2022-05-22powerpc/pseries/vas: sysfs comments with the correct entriesHaren Myneni1-7/+7
VAS entry is created as a misc device and the sysfs comments should list the proper entries Reported-by: Matheus Castanho <mscastanho@ibm.com> Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6dee950c7b72a4965c102208041f14a063cf5a8c.camel@linux.ibm.com
2022-05-19Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-2/+1
Merge our KVM topic branch.
2022-05-19KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlersAlexey Kardashevskiy1-2/+1
LoPAPR defines guest visible IOMMU with hypercalls to use it - H_PUT_TCE/etc. Implemented first on POWER7 where hypercalls would trap in the KVM in the real mode (with MMU off). The problem with the real mode is some memory is not available and some API usage crashed the host but enabling MMU was an expensive operation. The problems with the real mode handlers are: 1. Occasionally these cannot complete the request so the code is copied+modified to work in the virtual mode, very little is shared; 2. The real mode handlers have to be linked into vmlinux to work; 3. An exception in real mode immediately reboots the machine. If the small DMA window is used, the real mode handlers bring better performance. However since POWER8, there has always been a bigger DMA window which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE) which virtual mode handler is even closer to its real mode version. On POWER9 hypercalls trap straight to the virtual mode so the real mode handlers never execute on POWER9 and later CPUs. So with the current use of the DMA windows and MMU improvements in POWER9 and later, there is no point in duplicating the code. The 32bit passed through devices may slow down but we do not have many of these in practice. For example, with this applied, a 1Gbit ethernet adapter still demostrates above 800Mbit/s of actual throughput. This removes the real mode handlers from KVM and related code from the powernv platform. This updates the list of implemented hcalls in KVM-HV as the realmode handlers are removed. This changes ABI - kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is static now. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru