aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/platforms (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-03-16Merge tag 'powerpc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-0/+1
Pull powerpc fixes from Michael Ellerman: "One fix to prevent runtime allocation of 16GB pages when running in a VM (as opposed to bare metal), because it doesn't work. A small fix to our recently added KCOV support to exempt some more code from being instrumented. Plus a few minor build fixes, a small dead code removal and a defconfig update. Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Christophe Leroy, Jason Yan, Joel Stanley, Mahesh Salgaonkar, Mathieu Malaterre" * tag 'powerpc-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Include <asm/nmi.h> header file to fix a warning powerpc/powernv: Fix compile without CONFIG_TRACEPOINTS powerpc/mm: Disable kcov for SLB routines powerpc: remove dead code in head_fsl_booke.S powerpc/configs: Sync skiroot defconfig powerpc/hugetlb: Don't do runtime allocation of 16G pages in LPAR configuration
2019-03-13powerpc/powernv: Fix compile without CONFIG_TRACEPOINTSAlexey Kardashevskiy1-0/+1
The functions returns s64 but the return statement is missing. This adds the missing return statement. Fixes: 75d9fc7fd94e ("powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-03-12treewide: add checks for the return value of memblock_alloc*()Mike Rapoport5-0/+20
Add check for the return value of memblock_alloc*() functions and call panic() in case of error. The panic message repeats the one used by panicing memblock allocators with adjustment of parameters to include only relevant ones. The replacement was mostly automated with semantic patches like the one below with manual massaging of format strings. @@ expression ptr, size, align; @@ ptr = memblock_alloc(size, align); + if (!ptr) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, size, align); [anders.roxell@linaro.org: use '%pa' with 'phys_addr_t' type] Link: http://lkml.kernel.org/r/20190131161046.21886-1-anders.roxell@linaro.org [rppt@linux.ibm.com: fix format strings for panics after memblock_alloc] Link: http://lkml.kernel.org/r/1548950940-15145-1-git-send-email-rppt@linux.ibm.com [rppt@linux.ibm.com: don't panic if the allocation in sparse_buffer_init fails] Link: http://lkml.kernel.org/r/20190131074018.GD28876@rapoport-lnx [akpm@linux-foundation.org: fix xtensa printk warning] Link: http://lkml.kernel.org/r/1548057848-15136-20-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Guo Ren <ren_guo@c-sky.com> [c-sky] Acked-by: Paul Burton <paul.burton@mips.com> [MIPS] Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390] Reviewed-by: Juergen Gross <jgross@suse.com> [Xen] Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Max Filippov <jcmvbkbc@gmail.com> [xtensa] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dennis Zhou <dennis@kernel.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Mark Salter <msalter@redhat.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Petr Mladek <pmladek@suse.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Rob Herring <robh@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds3-8/+18
Merge more updates from Andrew Morton: - some of the rest of MM - various misc things - dynamic-debug updates - checkpatch - some epoll speedups - autofs - rapidio - lib/, lib/lzo/ updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) samples/mic/mpssd/mpssd.h: remove duplicate header kernel/fork.c: remove duplicated include include/linux/relay.h: fix percpu annotation in struct rchan arch/nios2/mm/fault.c: remove duplicate include unicore32: stop printing the virtual memory layout MAINTAINERS: fix GTA02 entry and mark as orphan mm: create the new vm_fault_t type arm, s390, unicore32: remove oneliner wrappers for memblock_alloc() arch: simplify several early memory allocations openrisc: simplify pte_alloc_one_kernel() sh: prefer memblock APIs returning virtual address microblaze: prefer memblock API returning virtual address powerpc: prefer memblock APIs returning virtual address lib/lzo: separate lzo-rle from lzo lib/lzo: implement run-length encoding lib/lzo: fast 8-byte copy on arm64 lib/lzo: 64-bit CTZ on arm64 lib/lzo: tidy-up ifdefs ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size ipc: annotate implicit fall through ...
2019-03-07arch: simplify several early memory allocationsMike Rapoport1-2/+1
There are several early memory allocations in arch/ code that use memblock_phys_alloc() to allocate memory, convert the returned physical address to the virtual address and then set the allocated memory to zero. Exactly the same behaviour can be achieved simply by calling memblock_alloc(): it allocates the memory in the same way as memblock_phys_alloc(), then it performs the phys_to_virt() conversion and clears the allocated memory. Replace the longer sequence with a simpler call to memblock_alloc(). Link: http://lkml.kernel.org/r/1546248566-14910-6-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-07powerpc: prefer memblock APIs returning virtual addressMike Rapoport2-6/+17
Patch series "memblock: simplify several early memory allocation", v4. These patches simplify some of the early memory allocations by replacing usage of older memblock APIs with newer and shinier ones. Quite a few places in the arch/ code allocated memory using a memblock API that returns a physical address of the allocated area, then converted this physical address to a virtual one and then used memset(0) to clear the allocated range. More recent memblock APIs do all the three steps in one call and their usage simplifies the code. It's important to note that regardless of API used, the core allocation is nearly identical for any set of memblock allocators: first it tries to find a free memory with all the constraints specified by the caller and then falls back to the allocation with some or all constraints disabled. The first three patches perform the conversion of call sites that have exact requirements for the node and the possible memory range. The fourth patch is a bit one-off as it simplifies openrisc's implementation of pte_alloc_one_kernel(), and not only the memblock usage. The fifth patch takes care of simpler cases when the allocation can be satisfied with a simple call to memblock_alloc(). The sixth patch removes one-liner wrappers for memblock_alloc on arm and unicore32, as suggested by Christoph. This patch (of 6): There are a several places that allocate memory using memblock APIs that return a physical address, convert the returned address to the virtual address and frequently also memset(0) the allocated range. Update these places to use memblock allocators already returning a virtual address. Use memblock functions that clear the allocated memory instead of calling memset(0) where appropriate. The calls to memblock_alloc_base() that were not followed by memset(0) are replaced with memblock_alloc_try_nid_raw(). Since the latter does not panic() when the allocation fails, the appropriate panic() calls are added to the call sites. Link: http://lkml.kernel.org/r/1546248566-14910-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Greentime Hu <green.hu@gmail.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Mark Salter <msalter@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Stafford Horne <shorne@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michal Simek <michal.simek@xilinx.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-07Merge tag 'powerpc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds39-850/+556
Pull powerpc updates from Michael Ellerman: "Notable changes: - Enable THREAD_INFO_IN_TASK to move thread_info off the stack. - A big series from Christoph reworking our DMA code to use more of the generic infrastructure, as he said: "This series switches the powerpc port to use the generic swiotlb and noncoherent dma ops, and to use more generic code for the coherent direct mapping, as well as removing a lot of dead code." - Increase our vmalloc space to 512T with the Hash MMU on modern CPUs, allowing us to support machines with larger amounts of total RAM or distance between nodes. - Two series from Christophe, one to optimise TLB miss handlers on 6xx, and another to optimise the way STRICT_KERNEL_RWX is implemented on some 32-bit CPUs. - Support for KCOV coverage instrumentation which means we can run syzkaller and discover even more bugs in our code. And as always many clean-ups, reworks and minor fixes etc. Thanks to: Alan Modra, Alexey Kardashevskiy, Alistair Popple, Andrea Arcangeli, Andrew Donnellan, Aneesh Kumar K.V, Aravinda Prasad, Balbir Singh, Brajeswar Ghosh, Breno Leitao, Christian Lamparter, Christian Zigotzky, Christophe Leroy, Christoph Hellwig, Corentin Labbe, Daniel Axtens, David Gibson, Diana Craciun, Firoz Khan, Gustavo A. R. Silva, Igor Stoppa, Joe Lawrence, Joel Stanley, Jonathan Neuschäfer, Jordan Niethe, Laurent Dufour, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Masahiro Yamada, Mathieu Malaterre, Matteo Croce, Meelis Roos, Michael W. Bringmann, Nathan Chancellor, Nathan Fontenot, Nicholas Piggin, Nick Desaulniers, Nicolai Stange, Oliver O'Halloran, Paul Mackerras, Peter Xu, PrasannaKumar Muralidharan, Qian Cai, Rashmica Gupta, Reza Arbab, Robert P. J. Day, Russell Currey, Sabyasachi Gupta, Sam Bobroff, Sandipan Das, Sergey Senozhatsky, Souptick Joarder, Stewart Smith, Tyrel Datwyler, Vaibhav Jain, YueHaibing" * tag 'powerpc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits) powerpc/32: Clear on-stack exception marker upon exception return powerpc: Remove export of save_stack_trace_tsk_reliable() powerpc/mm: fix "section_base" set but not used powerpc/mm: Fix "sz" set but not used warning powerpc/mm: Check secondary hash page table powerpc: remove nargs from __SYSCALL powerpc/64s: Fix unrelocated interrupt trampoline address test powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables powerpc/fsl: Fix the flush of branch predictor. powerpc/powernv: Make opal log only readable by root powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C powerpc/64s: Fix data interrupts vs d-side MCE reentrancy powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy powerpc/64s: system reset interrupt preserve HSRRs powerpc/64s: Fix HV NMI vs HV interrupt recoverability test powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback selftests/powerpc: Remove duplicate header powerpc sstep: Add support for modsd, modud instructions ...
2019-03-06Merge tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds1-6/+4
Pull driver core updates from Greg KH: "Here is the big driver core patchset for 5.1-rc1 More patches than "normal" here this merge window, due to some work in the driver core by Alexander Duyck to rework the async probe functionality to work better for a number of devices, and independant work from Rafael for the device link functionality to make it work "correctly". Also in here is: - lots of BUS_ATTR() removals, the macro is about to go away - firmware test fixups - ihex fixups and simplification - component additions (also includes i915 patches) - lots of minor coding style fixups and cleanups. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits) driver core: platform: remove misleading err_alloc label platform: set of_node in platform_device_register_full() firmware: hardcode the debug message for -ENOENT driver core: Add missing description of new struct device_link field driver core: Fix PM-runtime for links added during consumer probe drivers/component: kerneldoc polish async: Add cmdline option to specify drivers to be async probed driver core: Fix possible supplier PM-usage counter imbalance PM-runtime: Fix __pm_runtime_set_status() race with runtime resume driver: platform: Support parsing GpioInt 0 in platform_get_irq() selftests: firmware: fix verify_reqs() return value Revert "selftests: firmware: remove use of non-standard diff -Z option" Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config" device: Fix comment for driver_data in struct device kernfs: Allocating memory for kernfs_iattrs with kmem_cache. sysfs: remove unused include of kernfs-internal.h driver core: Postpone DMA tear-down until after devres release driver core: Document limitation related to DL_FLAG_RPM_ACTIVE PM-runtime: Take suppliers into account in __pm_runtime_set_status() device.h: Add __cold to dev_<level> logging functions ...
2019-03-06Merge tag 'char-misc-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds8-13/+23
Pull char/misc driver updates from Greg KH: "Here is the big char/misc driver patch pull request for 5.1-rc1. The largest thing by far is the new habanalabs driver for their AI accelerator chip. For now it is in the drivers/misc directory but will probably move to a new directory soon along with other drivers of this type. Other than that, just the usual set of individual driver updates and fixes. There's an "odd" merge in here from the DRM tree that they asked me to do as the MEI driver is starting to interact with the i915 driver, and it needed some coordination. All of those patches have been properly acked by the relevant subsystem maintainers. All of these have been in linux-next with no reported issues, most for quite some time" * tag 'char-misc-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (219 commits) habanalabs: adjust Kconfig to fix build errors habanalabs: use %px instead of %p in error print habanalabs: use do_div for 64-bit divisions intel_th: gth: Fix an off-by-one in output unassigning habanalabs: fix little-endian<->cpu conversion warnings habanalabs: use NULL to initialize array of pointers habanalabs: fix little-endian<->cpu conversion warnings habanalabs: soft-reset device if context-switch fails habanalabs: print pointer using %p habanalabs: fix memory leak with CBs with unaligned size habanalabs: return correct error code on MMU mapping failure habanalabs: add comments in uapi/misc/habanalabs.h habanalabs: extend QMAN0 job timeout habanalabs: set DMA0 completion to SOB 1007 habanalabs: fix validation of WREG32 to DMA completion habanalabs: fix mmu cache registers init habanalabs: disable CPU access on timeouts habanalabs: add MMU DRAM default page mapping habanalabs: Dissociate RAZWI info from event types misc/habanalabs: adjust Kconfig to fix build errors ...
2019-03-05mm: replace all open encodings for NUMA_NO_NODEAnshuman Khandual1-2/+3
Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-02powerpc: remove nargs from __SYSCALLFiroz Khan1-1/+1
The __SYSCALL macro's arguments are system call number, system call entry name and number of arguments for the system call. Argument- nargs in __SYSCALL(nr, entry, nargs) is neither calculated nor used anywhere. So it would be better to keep the implementaion as __SYSCALL(nr, entry). This will unifies the implementation with some other architetures too. Signed-off-by: Firoz Khan <firoz.khan@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-28powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tablesAlexey Kardashevskiy2-2/+6
We store 2 multilevel tables in iommu_table - one for the hardware and one with the corresponding userspace addresses. Before allocating the tables, the iommu_table_group_ops::get_table_size() hook returns the combined size of the two and VFIO SPAPR TCE IOMMU driver adjusts the locked_vm counter correctly. When the table is actually allocated, the amount of allocated memory is stored in iommu_table::it_allocated_size and used to decrement the locked_vm counter when we release the memory used by the table; .get_table_size() and .create_table() calculate it independently but the result is expected to be the same. However the allocator does not add the userspace table size to .it_allocated_size so when we destroy the table because of VFIO PCI unplug (i.e. VFIO container is gone but the userspace keeps running), we decrement locked_vm by just a half of size of memory we are releasing. To make things worse, since we enabled on-demand allocation of indirect levels, it_allocated_size contains only the amount of memory actually allocated at the table creation time which can just be a fraction. It is not a problem with incrementing locked_vm (as get_table_size() value is used) but it is with decrementing. As the result, we leak locked_vm and may not be able to allocate more IOMMU tables after few iterations of hotplug/unplug. This sets it_allocated_size in the pnv_pci_ioda2_ops::create_table() hook to what pnv_pci_ioda2_get_table_size() returns so from now on we have a single place which calculates the maximum memory a table can occupy. The original meaning of it_allocated_size is somewhat lost now though. We do not ditch it_allocated_size whatsoever here and we do not call get_table_size() from vfio_iommu_spapr_tce.c when decrementing locked_vm as we may have multiple IOMMU groups per container and even though they all are supposed to have the same get_table_size() implementation, there is a small chance for failure or confusion. Fixes: 090bad39b237 ("powerpc/powernv: Add indirect levels to it_userspace") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-27powerpc/powernv: Make opal log only readable by rootJordan Niethe1-1/+1
Currently the opal log is globally readable. It is kernel policy to limit the visibility of physical addresses / kernel pointers to root. Given this and the fact the opal log may contain this information it would be better to limit the readability to root. Fixes: bfc36894a48b ("powerpc/powernv: Add OPAL message log interface") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-26powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to CNicholas Piggin3-308/+324
The OPAL call wrapper gets interrupt disabling wrong. It disables interrupts just by clearing MSR[EE], which has two problems: - It doesn't call into the IRQ tracing subsystem, which means tracing across OPAL calls does not always notice IRQs have been disabled. - It doesn't go through the IRQ soft-mask code, which causes a minor bug. MSR[EE] can not be restored by saving the MSR then clearing MSR[EE], because a racing interrupt while soft-masked could clear MSR[EE] between the two steps. This can cause MSR[EE] to be incorrectly enabled when the OPAL call returns. Fortunately that should only result in another masked interrupt being taken to disable MSR[EE] again, but it's a bit sloppy. The existing code also saves MSR to PACA, which is not re-entrant if there is a nested OPAL call from different MSR contexts, which can happen these days with SRESET interrupts on bare metal. To fix these issues, move the tracing and IRQ handling code to C, and call into asm just for the low level call when everything is ready to go. Save the MSR on stack rather than PACA. Performance cost is kept to a minimum with a few optimisations: - The endian switch upon return is combined with the MSR restore, which avoids an expensive context synchronizing operation for LE kernels. This makes up for the additional mtmsrd to enable interrupts with local_irq_enable(). - blr is now used to return from the opal_* functions that are called as C functions, to avoid link stack corruption. This requires a skiboot fix as well to keep the call stack balanced. A NULL call is more costly after this, (410ns->430ns on POWER9), but OPAL calls are generally not performance critical at this scale. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23Merge tag 'powerpc-5.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds2-0/+4
Pull powerpc fix from Michael Ellerman: "One fix for an oops when using SRIOV, introduced by the recent changes to support compound IOMMU groups. Thanks to Alexey Kardashevskiy" * tag 'powerpc-5.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv/sriov: Register IOMMU groups for VFs
2019-02-23powerpc/wii: remove wii_mmu_mapin_mem2()Christophe Leroy1-28/+0
wii_mmu_mapin_mem2() is not used anymore, remove it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-23powerpc/wii: properly disable use of BATs when requested.Christophe Leroy1-0/+4
'nobats' kernel parameter or some options like CONFIG_DEBUG_PAGEALLOC deny the use of BATS for mapping memory. This patch makes sure that the specific wii RAM mapping function takes it into account as well. Fixes: de32400dd26e ("wii: use both mem1 and mem2 as ram") Cc: stable@vger.kernel.org Reviewed-by: Jonathan Neuschafer <j.neuschaefer@gmx.net> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22powerpc/83xx: Also save/restore SPRG4-7 during suspendChristophe Leroy1-7/+27
The 83xx has 8 SPRG registers and uses at least SPRG4 for DTLB handling LRU. Fixes: 2319f1239592 ("powerpc/mm: e300c2/c3/c4 TLB errata workaround") Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22powerpc/powernv: Don't reprogram SLW image on every KVM guest entry/exitPaul Mackerras2-25/+27
Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg() inside pnv_cpu_offline(), with the aim of changing the LPCR value in the SLW image to disable wakeups from the decrementer while a CPU is offline. However, pnv_cpu_offline() gets called each time a secondary CPU thread is woken up to participate in running a KVM guest, that is, not just when a CPU is offlined. Since opal_slw_set_reg() is a very slow operation (with observed execution times around 20 milliseconds), this means that an offline secondary CPU can often be busy doing the opal_slw_set_reg() call when the primary CPU wants to grab all the secondary threads so that it can run a KVM guest. This leads to messages like "KVM: couldn't grab CPU n" being printed and guest execution failing. There is no need to reprogram the SLW image on every KVM guest entry and exit. So that we do it only when a CPU is really transitioning between online and offline, this moves the calls to pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self(). Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22powerpc/pseries: export timebase register sample in lparcfgTyrel Datwyler1-0/+1
The Processor Utilzation of Resource Registers (PURR) provide an estimate of resources used by a cpu thread. Section 7.6 in Book III of the ISA outlines how to calculate the percentage of shared resources for threads using the ratio of the PURR delta and Timebase Register delta for a sampled period. This calculation is currently done erroneously by the lparstat tool from the powerpc-utils package. This patch exports the current timebase value after we sample the PURRs and exposes it to userspace accounting tools via /proc/ppc64/lparcfg. Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22powerpc/44x: Force PCI on for CURRITUCKMichael Ellerman1-0/+1
The recent rework of PCI kconfig symbols exposed an existing bug in the CURRITUCK kconfig logic. It selects PPC4xx_PCI_EXPRESS which depends on PCI, but PCI is user selectable and might be disabled, leading to a warning: WARNING: unmet direct dependencies detected for PPC4xx_PCI_EXPRESS Depends on [n]: PCI [=n] && 4xx [=y] Selected by [y]: - CURRITUCK [=y] && PPC_47x [=y] Prior to commit eb01d42a7778 ("PCI: consolidate PCI config entry in drivers/pci") PCI was enabled by default for currituck_defconfig so we didn't see the warning. The bad logic was still there, it just required someone disabling PCI in their .config to hit it. Fix it by forcing PCI on for CURRITUCK, which seems was always the expectation anyway. Fixes: eb01d42a7778 ("PCI: consolidate PCI config entry in drivers/pci") Reported-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22powerpc/powernv/npu: Remove redundant change_pte() hookPeter Xu1-10/+0
The change_pte() notifier was designed to use as a quick path to update secondary MMU PTEs on write permission changes or PFN changes. For KVM, it could reduce the vm-exits when vcpu faults on the pages that was touched up by KSM. It's not used to do cache invalidations, for example, if we see the notifier will be called before the real PTE update after all (please see set_pte_at_notify that set_pte_at was called later). All the necessary cache invalidation should all be done in invalidate_range() already. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-22Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-1/+1
Merge commits we're sharing with kvm-ppc tree.
2019-02-21powerpc/64s: Better printing of machine check info for guest MCEsPaul Mackerras1-1/+1
This adds an "in_guest" parameter to machine_check_print_event_info() so that we can avoid trying to translate guest NIP values into symbolic form using the host kernel's symbol table. Reviewed-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-21Merge branch 'topic/dma' into nextMichael Ellerman20-453/+130
Merge hch's big DMA rework series. This is in a topic branch in case he wants to merge it to minimise conflicts.
2019-02-19Merge branch 'fixes' into nextMichael Ellerman5-6/+12
There's a few important fixes in our fixes branch, in particular the pgd/pud_present() one, so merge it now.
2019-02-19powerpc/powernv/sriov: Register IOMMU groups for VFsAlexey Kardashevskiy2-0/+4
The compound IOMMU group rework moved iommu_register_group() together in pnv_pci_ioda_setup_iommu_api() (which is a part of ppc_md.pcibios_fixup). As the result, pnv_ioda_setup_bus_iommu_group() does not create groups any more, it only adds devices to groups. This works fine for boot time devices. However IOMMU groups for SRIOV's VFs were added by pnv_ioda_setup_bus_iommu_group() so this got broken: pnv_tce_iommu_bus_notifier() expects a group to be registered for VF and it is not. This adds missing group registration and adds a NULL pointer check into the bus notifier so we won't crash if there is no group, although it is not expected to happen now because of the change above. Example oops seen prior to this patch: $ echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs Unable to handle kernel paging request for data at address 0x00000030 Faulting instruction address: 0xc0000000004a6018 Oops: Kernel access of bad area, sig: 11 [#1] LE SMP NR_CPUS=2048 NUMA PowerNV CPU: 46 PID: 7006 Comm: bash Not tainted 4.15-ish NIP: c0000000004a6018 LR: c0000000004a6014 CTR: 0000000000000000 REGS: c000008fc876b400 TRAP: 0300 Not tainted (4.15-ish) MSR: 900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CFAR: c000000000d0be20 DAR: 0000000000000030 DSISR: 40000000 SOFTE: 1 ... NIP sysfs_do_create_link_sd.isra.0+0x68/0x150 LR sysfs_do_create_link_sd.isra.0+0x64/0x150 Call Trace: pci_dev_type+0x0/0x30 (unreliable) iommu_group_add_device+0x8c/0x600 iommu_add_device+0xe8/0x180 pnv_tce_iommu_bus_notifier+0xb0/0xf0 notifier_call_chain+0x9c/0x110 blocking_notifier_call_chain+0x64/0xa0 device_add+0x524/0x7d0 pci_device_add+0x248/0x450 pci_iov_add_virtfn+0x294/0x3e0 pci_enable_sriov+0x43c/0x580 mlx5_core_sriov_configure+0x15c/0x2f0 [mlx5_core] sriov_numvfs_store+0x180/0x240 dev_attr_store+0x3c/0x60 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1ac/0x240 __vfs_write+0x3c/0x70 vfs_write+0xd8/0x220 SyS_write+0x6c/0x110 system_call+0x58/0x6c Fixes: 0bd971676e68 ("powerpc/powernv/npu: Add compound IOMMU groups") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reported-by: Santwana Samantray <santwana.samantray@in.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/dma: trim the fat from <asm/dma-mapping.h>Christoph Hellwig3-0/+3
There is no need to provide anything but get_arch_dma_ops to <linux/dma-mapping.h>. More the remaining declarations to <asm/iommu.h> and drop all the includes. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/dma: remove set_dma_offsetChristoph Hellwig3-10/+7
There is no good reason for this helper, just opencode it. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/dma: use the generic direct mapping bypassChristoph Hellwig15-79/+10
Now that we've switched all the powerpc nommu and swiotlb methods to use the generic dma_direct_* calls we can remove these ops vectors entirely and rely on the common direct mapping bypass that avoids indirect function calls entirely. This also allows to remove a whole lot of boilerplate code related to setting up these operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/dma: use the dma_direct mapping routinesChristoph Hellwig1-0/+2
Switch the streaming DMA mapping and ownership transfer methods to the functionally identical dma_direct_ versions. Factor the cache maintainance helpers into the form expected by the common code for that. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/dma: remove dma_nommu_mmap_coherentChristoph Hellwig2-1/+1
The coherent cache version of this function already is functionally identicall to the default version, and by defining the arch_dma_coherent_to_pfn hook the same is ture for the noncoherent version as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/dma: remove get_pci_dma_opsChristoph Hellwig1-9/+8
This function is only used by the Cell iommu code, which can keep track if it is using the iommu internally just as good. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/powernv: use the generic iommu bypass codeChristoph Hellwig1-70/+25
Use the generic iommu bypass code instead of overriding set_dma_mask. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/powernv: remove pnv_npu_dma_set_maskChristoph Hellwig1-9/+0
These devices are not PCIe devices and do not have associated dma map ops, so this is just dead code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/powernv: remove pnv_pci_ioda_pe_single_vendorChristoph Hellwig1-26/+2
This function is completely bogus - the fact that two PCIe devices come from the same vendor has absolutely nothing to say about the DMA capabilities and characteristics. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/cell: use the generic iommu bypass codeChristoph Hellwig1-131/+9
This gets rid of a lot of clumsy code and finally allows us to mark dma_iommu_ops const. Includes fixes from Michael Ellerman. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/cell: move dma direct window setup out of dma_configureChristoph Hellwig1-9/+15
Configure the dma settings at device setup time, and stop playing games with get_pci_dma_ops. This prepares for using the common dma_configure code later on. Includes fixes from Michael Ellerman. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/pseries: use the generic iommu bypass codeChristoph Hellwig1-73/+27
Use the generic iommu bypass code instead of overriding set_dma_mask. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/pseries: unwind dma_get_required_mask_pSeriesLP a bitChristoph Hellwig1-1/+1
Call dma_get_required_mask_pSeriesLP directly instead of dma_iommu_ops to simply the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-18powerpc/dma: untangle vio_dma_mapping_ops from dma_iommu_opsChristoph Hellwig1-51/+36
vio_dma_mapping_ops currently does a lot of indirect calls through dma_iommu_ops, which not only make the code harder to follow but are also expensive in the post-spectre world. Unwind the indirect calls by calling the ppc_iommu_* or iommu_* APIs directly applicable, or just use the dma_iommu_* methods directly where we can. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-11Merge 5.0-rc6 into driver-core-nextGreg Kroah-Hartman6-8/+13
We need the debugfs fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-11Merge 5.0-rc6 into char-misc-nextGreg Kroah-Hartman1-1/+4
We need the char-misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-02-08Merge tag 'powerpc-5.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-1/+4
Pull powerpc fixes from Michael Ellerman: "Just two fixes, both going to stable. - Our support for split pmd page table lock had a bug which could lead to a crash on mremap() when using the Radix MMU (Power9 only). - A fix for the PAPR SCM driver (nvdimm) we added last release, which had a bug where we might mis-handle a hypervisor response leading to us failing to attach the memory region. Thanks to: Aneesh Kumar K.V, Oliver O'Halloran" * tag 'powerpc-5.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/papr_scm: Use the correct bind address powerpc/radix: Fix kernel crash with mremap()
2019-02-07powerpc/powernv: Escalate reset when IODA reset failsOliver O'Halloran1-2/+5
The IODA reset is used to flush out any OS controlled state from the PHB. This reset can fail if a PHB fatal error has occurred in early boot, probably due to a because of a bad device. We already do a fundemental reset of the device in some cases, so this patch just adds a test to force a full reset if firmware reports an error when performing the IODA reset. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-03Move static keyword at beginning of declarationMathieu Malaterre2-4/+4
Move the static keyword around to remove the following warnings (W=1): arch/powerpc/platforms/ps3/os-area.c:212:1: error: 'static' is not at beginning of declaration [-Werror=old-style-declaration] arch/powerpc/platforms/ps3/system-bus.c:45:1: error: 'static' is not at beginning of declaration [-Werror=old-style-declaration] Signed-off-by: Mathieu Malaterre <malat@debian.org> Acked-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-02-01powerpc/papr_scm: Use the correct bind addressOliver O'Halloran1-1/+4
When binding an SCM volume to a physical address the hypervisor has the option to return early with a continue token with the expectation that the guest will resume the bind operation until it completes. A quirk of this interface is that the bind address will only be returned by the first bind h-call and the subsequent calls will return 0xFFFF_FFFF_FFFF_FFFF for the bind address. We currently do not save the address returned by the first h-call. As a result we will use the junk address as the base of the bound region if the hypervisor decides to split the bind across multiple h-calls. This bug was found when testing with very large SCM volumes where the bind process would take more time than they hypervisor's internal h-call time limit would allow. This patch fixes the issue by saving the bind address from the first call. Cc: stable@vger.kernel.org Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-31powerpc/cell: Remove duplicate headerSabyasachi Gupta1-1/+0
Remove linux/syscalls.h which is included more than once Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-30powerpc/powernv: Remove duplicate headerSabyasachi Gupta1-1/+0
Remove linux/printk.h which is included more than once. Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Acked-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-01-30powerpc/pseries: Perform full re-add of CPU for topology update post-migrationNathan Fontenot1-0/+19
On pseries systems, performing a partition migration can result in altering the nodes a CPU is assigned to on the destination system. For exampl, pre-migration on the source system CPUs are in node 1 and 3, post-migration on the destination system CPUs are in nodes 2 and 3. Handling the node change for a CPU can cause corruption in the slab cache if we hit a timing where a CPUs node is changed while cache_reap() is invoked. The corruption occurs because the slab cache code appears to rely on the CPU and slab cache pages being on the same node. The current dynamic updating of a CPUs node done in arch/powerpc/mm/numa.c does not prevent us from hitting this scenario. Changing the device tree property update notification handler that recognizes an affinity change for a CPU to do a full DLPAR remove and add of the CPU instead of dynamically changing its node resolves this issue. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com> Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>