AgeCommit message (Collapse)AuthorFilesLines
2019-07-02powerpc/64s/exception: remove the "extra" macro parameterNicholas Piggin2-122/+114
Rather than pass in the soft-masking and KVM tests via macro that is passed to another macro to expand it, switch to usig gas macros and conditionally expand the soft-masking and KVM tests. The system reset with its idle test is open coded as it is a one-off. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: fix sreset KVM test codeNicholas Piggin1-3/+3
The sreset handler KVM test theoretically should not depend on P7. In practice KVM now only supports P7 and up so no real bug fix, but this change is made now so the quirk is not propagated through cleanup patches. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: move and tidy EXCEPTION_PROLOG_2 variantsNicholas Piggin2-54/+51
- Re-name the macros to _REAL and _VIRT suffixes rather than no and _RELON suffix. - Move the macro definitions together in the file. - Move RELOCATABLE ifdef inside the _VIRT macro. Further consolidation between variants does not buy much here. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: consolidate EXCEPTION_PROLOG_2 with _NORI variantNicholas Piggin2-38/+17
Switch to a gas macro that conditionally expands the RI clearing instruction. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: remove H concatenation for EXC_HV variantsNicholas Piggin3-184/+253
Replace all instances of this with gas macros that test the hsrr parameter and use the appropriate register names / labels. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Remove extraneous 2nd check for 0xea0 in SOFTEN_TEST] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-02powerpc/64s/exception: Remove unused SOFTEN_VALUE_0x980Michael Ellerman1-1/+0
Remove SOFTEN_VALUE_0x980, it's been unused since commit dabe859ec636 ("powerpc: Give hypervisor decrementer interrupts their own handler") (Sep 2012). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/64s/exception: fix line wrap and semicolon inconsistencies in macrosNicholas Piggin2-52/+52
By convention, all lines should be separated by a semicolons. Last line should have neither semicolon or line wrap. No generated code change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/powernv: remove the unused vas_win_paste_addr and vas_win_id functionsChristoph Hellwig3-49/+0
These two function have never been used anywhere in the kernel tree since they were added to the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/powernv: remove unused NPU DMA codeChristoph Hellwig4-581/+0
None of these routines were ever used anywhere in the kernel tree since they were added to the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/powernv: remove the unused tunneling exportsChristoph Hellwig4-77/+3
These have been unused anywhere in the kernel tree ever since they've been added to the kernel. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/powernv: remove the unused pnv_pci_set_p2p functionChristoph Hellwig5-84/+0
This function has never been used anywhere in the kernel tree since it was added to the tree. We also now have proper PCIe P2P APIs in the core kernel, and any new P2P support should be using those. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/xmon: Fix disabling tracing while in xmonNaveen N. Rao1-2/+4
Commit ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering xmon") added code to disable recording trace entries while in xmon. The commit introduced a variable 'tracing_enabled' to record if tracing was enabled on xmon entry, and used this to conditionally enable tracing during exit from xmon. However, we are not checking the value of 'fromipi' variable in xmon_core() when setting 'tracing_enabled'. Due to this, when secondary cpus enter xmon, they will see tracing as being disabled already and tracing won't be re-enabled on exit. Fix the same. Fixes: ed49f7fd6438d ("powerpc/xmon: Disable tracing when entering xmon") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01selftests/powerpc: ppc_asm.h: typo in the header guardDenis Efremov1-1/+1
The guard macro __PPC_ASM_H in the header ppc_asm.h doesn't match the #ifndef macro _PPC_ASM_H. The patch makes them the same. Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/cacheflush: fix variable set but not usedQian Cai1-2/+5
The powerpc's flush_cache_vmap() is defined as a macro and never use both of its arguments, so it will generate a compilation warning, lib/ioremap.c: In function 'ioremap_page_range': lib/ioremap.c:203:16: warning: variable 'start' set but not used [-Wunused-but-set-variable] Fix it by making it an inline function. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/eeh_cache: fix a W=1 kernel-doc warningQian Cai1-0/+3
The opening comment mark "/**" is reserved for kernel-doc comments, so it will generate a warning with "make W=1". arch/powerpc/kernel/eeh_cache.c:37: warning: cannot understand function prototype: 'struct pci_io_addr_range Since this is not a kernel-doc for the struct below, but rather an overview of this source eeh_cache.c, just use the free-form comments kernel-doc syntax instead. Signed-off-by: Qian Cai <cai@lca.pw> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/ftrace: Enable C Version of recordmcountChristophe Leroy1-0/+1
Selects HAVE_C_RECORDMCOUNT to use the C version of the recordmcount intead of the old Perl Version of recordmcount. This should improve build time. It also seems like the old Perl Version misses some calls to _mcount that the C version finds. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01recordmcount: Fix spurious mcount entries on powerpcNaveen N. Rao1-1/+2
An impending change to enable HAVE_C_RECORDMCOUNT on powerpc leads to warnings such as the following: # modprobe kprobe_example ftrace-powerpc: Not expected bl: opcode is 3c4c0001 WARNING: CPU: 0 PID: 227 at kernel/trace/ftrace.c:2001 ftrace_bug+0x90/0x318 Modules linked in: CPU: 0 PID: 227 Comm: modprobe Not tainted 5.2.0-rc6-00678-g1c329100b942 #2 NIP: c000000000264318 LR: c00000000025d694 CTR: c000000000f5cd30 REGS: c000000001f2b7b0 TRAP: 0700 Not tainted (5.2.0-rc6-00678-g1c329100b942) MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28228222 XER: 00000000 CFAR: c0000000002642fc IRQMASK: 0 <snip> NIP [c000000000264318] ftrace_bug+0x90/0x318 LR [c00000000025d694] ftrace_process_locs+0x4f4/0x5e0 Call Trace: [c000000001f2ba40] [0000000000000004] 0x4 (unreliable) [c000000001f2bad0] [c00000000025d694] ftrace_process_locs+0x4f4/0x5e0 [c000000001f2bb90] [c00000000020ff10] load_module+0x25b0/0x30c0 [c000000001f2bd00] [c000000000210cb0] sys_finit_module+0xc0/0x130 [c000000001f2be20] [c00000000000bda4] system_call+0x5c/0x70 Instruction dump: 419e0018 2f83ffff 419e00bc 2f83ffea 409e00cc 4800001c 0fe00000 3c62ff96 39000001 39400000 386386d0 480000c4 <0fe00000> 3ce20003 39000001 3c62ff96 ---[ end trace 4c438d5cebf78381 ]--- ftrace failed to modify [<c0080000012a0008>] 0xc0080000012a0008 actual: 01:00:4c:3c Initializing ftrace call sites ftrace record flags: 2000000 (0) expected tramp: c00000000006af4c Looking at the relocation records in __mcount_loc shows a few spurious entries: RELOCATION RECORDS FOR [__mcount_loc]: OFFSET TYPE VALUE 0000000000000000 R_PPC64_ADDR64 .text.unlikely+0x0000000000000008 0000000000000008 R_PPC64_ADDR64 .text.unlikely+0x0000000000000014 0000000000000010 R_PPC64_ADDR64 .text.unlikely+0x0000000000000060 0000000000000018 R_PPC64_ADDR64 .text.unlikely+0x00000000000000b4 0000000000000020 R_PPC64_ADDR64 .init.text+0x0000000000000008 0000000000000028 R_PPC64_ADDR64 .init.text+0x0000000000000014 The first entry in each section is incorrect. Looking at the relocation records, the spurious entries correspond to the R_PPC64_ENTRY records: RELOCATION RECORDS FOR [.text.unlikely]: OFFSET TYPE VALUE 0000000000000000 R_PPC64_REL64 .TOC.-0x0000000000000008 0000000000000008 R_PPC64_ENTRY *ABS* 0000000000000014 R_PPC64_REL24 _mcount <snip> The problem is that we are not validating the return value from get_mcountsym() in sift_rel_mcount(). With this entry, mcountsym is 0, but Elf_r_sym(relp) also ends up being 0. Fix this by ensuring mcountsym is valid before processing the entry. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Tested-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc/rtas: retry when cpu offline races with suspend/migrationNathan Lynch1-4/+3
The protocol for suspending or migrating an LPAR requires all present processor threads to enter H_JOIN. So if we have threads offline, we have to temporarily bring them up. This can race with administrator actions such as SMT state changes. As of dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration"), rtas_ibm_suspend_me() accounts for this, but errors out with -EBUSY for what almost certainly is a transient condition in any reasonable scenario. Callers of rtas_ibm_suspend_me() already retry when -EAGAIN is returned, and it is typical during a migration for that to happen repeatedly for several minutes polling the H_VASI_STATE hcall result before proceeding to the next stage. So return -EAGAIN instead of -EBUSY when this race is encountered. Additionally: logging this event is still appropriate but use pr_info instead of pr_err; and remove use of unlikely() while here as this is not a hot path at all. Fixes: dfd718a2ed1f ("powerpc/rtas: Fix a potential race between CPU-Offline & Migration") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01powerpc: Document xive=off optionMichael Neuling1-0/+9
commit 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") added an option to turn off Linux native XIVE usage via the xive=off kernel command line option. This documents this option. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-07-01Merge branch 'fixes' into nextMichael Ellerman21-24/+230
Merge our fixes branch into next, this brings in a number of commits that fix bugs we don't want to hit in next, in particular the fix for CVE-2019-12817.
2019-07-01Merge tag 'powerpc-5.2-6' into fixesMichael Ellerman4-10/+139
This merges the commits that were the fix for CVE-2019-12817, which was developed under embargo. They have already been merged by Linus Merge them into fixes now so that this branch contains all the fixes for this release.
2019-06-25powerpc/64s/exception: Fix machine check early corrupting AMRNicholas Piggin1-1/+1
The early machine check runs in real mode, so locking is unnecessary. Worse, the windup does not restore AMR, so this can result in a false KUAP fault after a recoverable machine check hits inside a user copy operation. Fix this similarly to HMI by just avoiding the kuap lock in the early machine check handler (it will be set by the late handler that runs in virtual mode if that runs). If the virtual mode handler is reached, it will lock and restore the AMR. Fixes: 890274c2dc4c0 ("powerpc/64s: Implement KUAP for Radix MMU") Cc: Russell Currey <ruscur@russell.cc> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-20KVM: PPC: Book3S HV: Clear pending decrementer exceptions on nested guest entrySuraj Jitindar Singh1-2/+9
If we enter an L1 guest with a pending decrementer exception then this is cleared on guest exit if the guest has writtien a positive value into the decrementer (indicating that it handled the decrementer exception) since there is no other way to detect that the guest has handled the pending exception and that it should be dequeued. In the event that the L1 guest tries to run a nested (L2) guest immediately after this and the L2 guest decrementer is negative (which is loaded by L1 before making the H_ENTER_NESTED hcall), then the pending decrementer exception isn't cleared and the L2 entry is blocked since L1 has a pending exception, even though L1 may have already handled the exception and written a positive value for it's decrementer. This results in a loop of L1 trying to enter the L2 guest and L0 blocking the entry since L1 has an interrupt pending with the outcome being that L2 never gets to run and hangs. Fix this by clearing any pending decrementer exceptions when L1 makes the H_ENTER_NESTED hcall since it won't do this if it's decrementer has gone negative, and anyway it's decrementer has been communicated to L0 in the hdec_expires field and L0 will return control to L1 when this goes negative by delivering an H_DECREMENTER exception. Fixes: 95a6432ce903 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-20KVM: PPC: Book3S HV: Signed extend decrementer value if not using large decrementerSuraj Jitindar Singh1-0/+2
On POWER9 the decrementer can operate in large decrementer mode where the decrementer is 56 bits and signed extended to 64 bits. When not operating in this mode the decrementer behaves as a 32 bit decrementer which is NOT signed extended (as on POWER8). Currently when reading a guest decrementer value we don't take into account whether the large decrementer is enabled or not, and this means the value will be incorrect when the guest is not using the large decrementer. Fix this by sign extending the value read when the guest isn't using the large decrementer. Fixes: 95a6432ce903 ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-20KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entriesSuraj Jitindar Singh1-0/+1
When a guest vcpu moves from one physical thread to another it is necessary for the host to perform a tlb flush on the previous core if another vcpu from the same guest is going to run there. This is because the guest may use the local form of the tlb invalidation instruction meaning stale tlb entries would persist where it previously ran. This is handled on guest entry in kvmppc_check_need_tlb_flush() which calls flush_guest_tlb() to perform the tlb flush. Previously the generic radix__local_flush_tlb_lpid_guest() function was used, however the functionality was reimplemented in flush_guest_tlb() to avoid the trace_tlbie() call as the flushing may be done in real mode. The reimplementation in flush_guest_tlb() was missing an erat invalidation after flushing the tlb. This lead to observable memory corruption in the guest due to the caching of stale translations. Fix this by adding the erat invalidation. Fixes: 70ea13f6e609 ("KVM: PPC: Book3S HV: Flush TLB on secondary radix threads") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-20powerpc/pci/of: Fix OF flags parsing for 64bit BARsAlexey Kardashevskiy1-0/+2
When the firmware does PCI BAR resource allocation, it passes the assigned addresses and flags (prefetch/64bit/...) via the "reg" property of a PCI device device tree node so the kernel does not need to do resource allocation. The flags are stored in resource::flags - the lower byte stores PCI_BASE_ADDRESS_SPACE/etc bits and the other bytes are IORESOURCE_IO/etc. Some flags from PCI_BASE_ADDRESS_xxx and IORESOURCE_xxx are duplicated, such as PCI_BASE_ADDRESS_MEM_PREFETCH/PCI_BASE_ADDRESS_MEM_TYPE_64/etc. When parsing the "reg" property, we copy the prefetch flag but we skip on PCI_BASE_ADDRESS_MEM_TYPE_64 which leaves the flags out of sync. The missing IORESOURCE_MEM_64 flag comes into play under 2 conditions: 1. we remove PCI_PROBE_ONLY for pseries (by hacking pSeries_setup_arch() or by passing "/chosen/linux,pci-probe-only"); 2. we request resource alignment (by passing pci=resource_alignment= via the kernel cmd line to request PAGE_SIZE alignment or defining ppc_md.pcibios_default_alignment which returns anything but 0). Note that the alignment requests are ignored if PCI_PROBE_ONLY is enabled. With 1) and 2), the generic PCI code in the kernel unconditionally decides to: - reassign the BARs in pci_specified_resource_alignment() (works fine) - write new BARs to the device - this fails for 64bit BARs as the generic code looks at IORESOURCE_MEM_64 (not set) and writes only lower 32bits of the BAR and leaves the upper 32bit unmodified which breaks BAR mapping in the hypervisor. This fixes the issue by copying the flag. This is useful if we want to enforce certain BAR alignment per platform as handling subpage sized BARs is proven to cause problems with hotplug (SLOF already aligns BARs to 64k). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Shawn Anastasio <shawn@anastas.io> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc: enable a 30-bit ZONE_DMA for 32-bit pmacChristoph Hellwig3-1/+10
With the strict dma mask checking introduced with the switch to the generic DMA direct code common wifi chips on 32-bit powerbooks stopped working. Add a 30-bit ZONE_DMA to the 32-bit pmac builds to allow them to reliably allocate dma coherent memory. Fixes: 65a21b71f948 ("powerpc/dma: remove dma_nommu_dma_supported") Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Larry Finger <Larry.Finger@lwfinger.net> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAPNicholas Piggin4-1/+110
This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required page table functions. This enables huge (2MB and 1GB) ioremap mappings. I don't have a benchmark for this change, but huge vmap will be used by a later core kernel change to enable huge vmalloc memory mappings. This improves cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB size dentry cache hash. Profiling git diff dTLB misses with a vanilla kernel: 81.75% git [kernel.vmlinux] [k] __d_lookup_rcu 7.21% git [kernel.vmlinux] [k] strncpy_from_user 1.77% git [kernel.vmlinux] [k] find_get_entry 1.59% git [kernel.vmlinux] [k] kmem_cache_free 40,168 dTLB-miss 0.100342754 seconds time elapsed With powerpc huge vmalloc: 2,987 dTLB-miss 0.095933138 seconds time elapsed Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/64s/radix: ioremap use ioremap_page_rangeNicholas Piggin4-1/+46
Radix can use ioremap_page_range for ioremap, after slab is available. This makes it possible to enable huge ioremap mapping support. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/64: __ioremap_at clean up in the error caseNicholas Piggin1-7/+20
__ioremap_at error handling is wonky, it requires caller to clean up after it. Implement a helper that does the map and error cleanup and remove the requirement from the caller. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/perf: Use cpumask_last() to determine the designated cpu for nest/core units.Anju T Sudhakar1-2/+12
Nest and core IMC (In-Memory Collection counters) assigns a particular cpu as the designated target for counter data collection. During system boot, the first online cpu in a chip gets assigned as the designated cpu for that chip(for nest-imc) and the first online cpu in a core gets assigned as the designated cpu for that core(for core-imc). If the designated cpu goes offline, the next online cpu from the same chip(for nest-imc)/core(for core-imc) is assigned as the next target, and the event context is migrated to the target cpu. Currently, cpumask_any_but() function is used to find the target cpu. Though this function is expected to return a `random` cpu, this always returns the next online cpu. If all cpus in a chip/core is offlined in a sequential manner, starting from the first cpu, the event migration has to happen for all the cpus which goes offline. Since the migration process involves a grace period, the total time taken to offline all the cpus will be significantly high. Example: In a system which has 2 sockets, with NUMA node0 CPU(s): 0-87 NUMA node8 CPU(s): 88-175 Time taken to offline cpu 88-175: real 2m56.099s user 0m0.191s sys 0m0.000s Use cpumask_last() to choose the target cpu, when the designated cpu goes online, so the migration will happen only when the last_cpu in the mask goes offline. This way the time taken to offline all cpus in a chip/core can be reduced. With the patch: Time taken to offline cpu 88-175: real 0m12.207s user 0m0.171s sys 0m0.000s Offlining all cpus in reverse order is also taken care because, cpumask_any_but() is used to find the designated cpu if the last cpu in the mask goes offline. Since cpumask_any_but() always return the first cpu in the mask, that becomes the designated cpu and migration will happen only when the first_cpu in the mask goes offline. Example: With the patch, Time taken to offline cpu from 175-88: real 0m9.330s user 0m0.110s sys 0m0.000s Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/64s: Fix misleading SPR and timebase informationShaokun Zhang1-2/+2
pr_info shows SPR and timebase as a decimal value with a '0x' prefix, which is somewhat misleading. Fix it to print hexadecimal, as was intended. Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C") Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/pseries: avoid blocking in irq when queuing hotplug eventsNathan Lynch1-4/+4
A couple of bugs in queue_hotplug_event(): 1. Unchecked kmalloc result which could lead to an oops. 2. Use of GFP_KERNEL allocations in interrupt context (this code's only caller is ras_hotplug_interrupt()). Use kmemdup to avoid open-coding the allocation+copy and check for failure; use GFP_ATOMIC for both allocations. Ultimately it probably would be better to avoid or reduce allocations in this path if possible. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/watchpoint: Restore NV GPRs while returning from exceptionRavi Bangoria1-2/+7
powerpc hardware triggers watchpoint before executing the instruction. To make trigger-after-execute behavior, kernel emulates the instruction. If the instruction is 'load something into non-volatile register', exception handler should restore emulated register state while returning back, otherwise there will be register state corruption. eg, adding a watchpoint on a list can corrput the list: # cat /proc/kallsyms | grep kthread_create_list c00000000121c8b8 d kthread_create_list Add watchpoint on kthread_create_list->prev: # perf record -e mem:0xc00000000121c8c0 Run some workload such that new kthread gets invoked. eg, I just logged out from console: list_add corruption. next->prev should be prev (c000000001214e00), \ but was c00000000121c8b8. (next=c00000000121c8b8). WARNING: CPU: 59 PID: 309 at lib/list_debug.c:25 __list_add_valid+0xb4/0xc0 CPU: 59 PID: 309 Comm: kworker/59:0 Kdump: loaded Not tainted 5.1.0-rc7+ #69 ... NIP __list_add_valid+0xb4/0xc0 LR __list_add_valid+0xb0/0xc0 Call Trace: __list_add_valid+0xb0/0xc0 (unreliable) __kthread_create_on_node+0xe0/0x260 kthread_create_on_node+0x34/0x50 create_worker+0xe8/0x260 worker_thread+0x444/0x560 kthread+0x160/0x1a0 ret_from_kernel_thread+0x5c/0x70 List corruption happened because it uses 'load into non-volatile register' instruction: Snippet from __kthread_create_on_node: c000000000136be8: addis r29,r2,-19 c000000000136bec: ld r29,31424(r29) if (!__list_add_valid(new, prev, next)) c000000000136bf0: mr r3,r30 c000000000136bf4: mr r5,r28 c000000000136bf8: mr r4,r29 c000000000136bfc: bl c00000000059a2f8 <__list_add_valid+0x8> Register state from WARN_ON(): GPR00: c00000000059a3a0 c000007ff23afb50 c000000001344e00 0000000000000075 GPR04: 0000000000000000 0000000000000000 0000001852af8bc1 0000000000000000 GPR08: 0000000000000001 0000000000000007 0000000000000006 00000000000004aa GPR12: 0000000000000000 c000007ffffeb080 c000000000137038 c000005ff62aaa00 GPR16: 0000000000000000 0000000000000000 c000007fffbe7600 c000007fffbe7370 GPR20: c000007fffbe7320 c000007fffbe7300 c000000001373a00 0000000000000000 GPR24: fffffffffffffef7 c00000000012e320 c000007ff23afcb0 c000000000cb8628 GPR28: c00000000121c8b8 c000000001214e00 c000007fef5b17e8 c000007fef5b17c0 Watchpoint hit at 0xc000000000136bec. addis r29,r2,-19 => r29 = 0xc000000001344e00 + (-19 << 16) => r29 = 0xc000000001214e00 ld r29,31424(r29) => r29 = *(0xc000000001214e00 + 31424) => r29 = *(0xc00000000121c8c0) 0xc00000000121c8c0 is where we placed a watchpoint and thus this instruction was emulated by emulate_step. But because handle_dabr_fault did not restore emulated register state, r29 still contains stale value in above register state. Fixes: 5aae8a5370802 ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: stable@vger.kernel.org # 2.6.36+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19cxl: no need to check return value of debugfs_create functionsGreg Kroah-Hartman2-34/+17
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Because there's no need to check, also make the return value of the local debugfs_create_io_x64() call void, as no one ever did anything with the return value (as they did not need to.) And make the cxl_debugfs_* calls return void as no one was even checking their return value at all. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Acked-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/ps3: Use [] to denote a flexible array memberGeert Uytterhoeven1-1/+1
Flexible array members should be denoted using [] instead of [0], else gcc will not warn when they are no longer at the end of the structure. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/mm/32s: fix condition that is always trueAndreas Schwab1-1/+1
Move a misplaced paren that makes the condition always true. Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19powerpc/32s: fix suspend/resume when IBATs 4-7 are usedChristophe Leroy2-13/+128
Previously, only IBAT1 and IBAT2 were used to map kernel linear mem. Since commit 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX"), we may have all 8 BATs used for mapping kernel text. But the suspend/restore functions only save/restore BATs 0 to 3, and clears BATs 4 to 7. Make suspend and restore functions respectively save and reload the 8 BATs on CPUs having MMU_FTR_USE_HIGH_BATS feature. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Cc: stable@vger.kernel.org Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-19selftests/powerpc: Fix earlyclobber in tm-vmxcopyGustavo Romero1-1/+1
In some cases, compiler can allocate the same register for operand 'res' and 'vecoutptr', resulting in segfault at 'stxvd2x 40,0,%[vecoutptr]' because base register will contain 1, yielding a false-positive. This is because output 'res' must be marked as an earlyclobber operand so it may not overlap an input operand ('vecoutptr'). Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-18KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real modeSuraj Jitindar Singh1-1/+10
The hcall H_SET_DAWR is used by a guest to set the data address watchpoint register (DAWR). This hcall is handled in the host in kvmppc_h_set_dawr() which can be called in either real mode on the guest exit path from hcall_try_real_mode() in book3s_hv_rmhandlers.S, or in virtual mode when called from kvmppc_pseries_do_hcall() in book3s_hv.c. The function kvmppc_h_set_dawr() updates the dawr and dawrx fields in the vcpu struct accordingly and then also writes the respective values into the DAWR and DAWRX registers directly. It is necessary to write the registers directly here when calling the function in real mode since the path to re-enter the guest won't do this. However when in virtual mode the host DAWR and DAWRX values have already been restored, and so writing the registers would overwrite these. Additionally there is no reason to write the guest values here as these will be read from the vcpu struct and written to the registers appropriately the next time the vcpu is run. This also avoids the case when handling h_set_dawr for a nested guest where the guest hypervisor isn't able to write the DAWR and DAWRX registers directly and must rely on the real hypervisor to do this for it when it calls H_ENTER_NESTED. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-18KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()Michael Neuling1-1/+3
Commit c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") screwed up some assembler and corrupted a pointer in r3. This resulted in crashes like the below: BUG: Kernel NULL pointer dereference at 0x000013bf Faulting instruction address: 0xc00000000010b044 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 8 PID: 1771 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc4+ #3 NIP: c00000000010b044 LR: c0080000089dacf4 CTR: c00000000010aff4 REGS: c00000179b397710 TRAP: 0300 Not tainted (5.2.0-rc4+) MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 42244842 XER: 00000000 CFAR: c00000000010aff8 DAR: 00000000000013bf DSISR: 42000000 IRQMASK: 0 GPR00: c0080000089dd6bc c00000179b3979a0 c008000008a04300 ffffffffffffffff GPR04: 0000000000000000 0000000000000003 000000002444b05d c0000017f11c45d0 ... NIP kvmppc_h_set_dabr+0x50/0x68 LR kvmppc_pseries_do_hcall+0xa3c/0xeb0 [kvm_hv] Call Trace: 0xc0000017f11c0000 (unreliable) kvmppc_vcpu_run_hv+0x694/0xec0 [kvm_hv] kvmppc_vcpu_run+0x34/0x48 [kvm] kvm_arch_vcpu_ioctl_run+0x2f4/0x400 [kvm] kvm_vcpu_ioctl+0x460/0x850 [kvm] do_vfs_ioctl+0xe4/0xb40 ksys_ioctl+0xc4/0x110 sys_ioctl+0x28/0x80 system_call+0x5c/0x70 Instruction dump: 4082fff4 4c00012c 38600000 4e800020 e96280c0 896b0000 2c2b0000 3860ffff 4d820020 50852e74 508516f6 78840724 <f88313c0> f8a313c8 7c942ba6 7cbc2ba6 Fix the bug by only changing r3 when we are returning immediately. Fixes: c1fe190c0672 ("powerpc: Add force enable of DAWR on P9 option") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-16powerpc/32: fix build failure on book3e with KVMChristophe Leroy2-3/+3
Build failure was introduced by the commit identified below, due to missed macro expension leading to wrong called function's name. arch/powerpc/kernel/head_fsl_booke.o: In function `SystemCall': arch/powerpc/kernel/head_fsl_booke.S:416: undefined reference to `kvmppc_handler_BOOKE_INTERRUPT_SYSCALL_SPRN_SRR1' Makefile:1052: recipe for target 'vmlinux' failed The called function should be kvmppc_handler_8_0x01B(). This patch fixes it. Reported-by: Paul Mackerras <paulus@ozlabs.org> Fixes: 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-16powerpc/64: mark start_here_multiplatform as __refChristophe Leroy1-0/+2
Otherwise, the following warning is encountered: WARNING: vmlinux.o(.text+0x3dc6): Section mismatch in reference from the variable start_here_multiplatform to the function .init.text:.early_setup() The function start_here_multiplatform() references the function __init .early_setup(). This is often because start_here_multiplatform lacks a __init annotation or the annotation of .early_setup is wrong. Fixes: 56c46bba9bbf ("powerpc/64: Fix booting large kernels with STRICT_KERNEL_RWX") Cc: Russell Currey <ruscur@russell.cc> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-15powerpc/booke: fix fast syscall entry on SMPChristophe Leroy1-3/+3
Use r10 instead of r9 to calculate CPU offset as r9 contains the value from SRR1 which is used later. Fixes: 1a4b739bbb4f ("powerpc/32: implement fast entry for syscalls on BOOKE") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-15powerpc/32s: fix initial setup of segment registers on secondary CPUChristophe Leroy1-0/+1
The patch referenced below moved the loading of segment registers out of load_up_mmu() in order to do it earlier in the boot sequence. However, the secondary CPU still needs it to be done when loading up the MMU. Reported-by: Erhard F. <erhard_f@mailbox.org> Fixes: 215b823707ce ("powerpc/32s: set up an early static hash table for KASAN") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-15powerpc/pseries/mobility: rebuild cacheinfo hierarchy post-migrationNathan Lynch1-0/+10
It's common for the platform to replace the cache device nodes after a migration. Since the cacheinfo code is never informed about this, it never drops its references to the source system's cache nodes, causing it to wind up in an inconsistent state resulting in warnings and oopses as soon as CPU online/offline occurs after the migration, e.g. cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified) WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0 [...] NIP release_cache+0x1bc/0x1d0 LR release_cache+0x1b8/0x1d0 Call Trace: release_cache+0x1b8/0x1d0 (unreliable) cacheinfo_cpu_offline+0x1c4/0x2c0 unregister_cpu_online+0x1b8/0x260 cpuhp_invoke_callback+0x114/0xf40 cpuhp_thread_fun+0x270/0x310 smpboot_thread_fn+0x2c8/0x390 kthread+0x1b8/0x1c0 ret_from_kernel_thread+0x5c/0x68 Using device tree notifiers won't work since we want to rebuild the hierarchy only after all the removals and additions have occurred and the device tree is in a consistent state. Call cacheinfo_teardown() before processing device tree updates, and rebuild the hierarchy afterward. Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-15powerpc/pseries/mobility: prevent cpu hotplug during DT updateNathan Lynch1-0/+9
CPU online/offline code paths are sensitive to parts of the device tree (various cpu node properties, cache nodes) that can be changed as a result of a migration. Prevent CPU hotplug while the device tree potentially is inconsistent. Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-15powerpc/cacheinfo: add cacheinfo_teardown, cacheinfo_rebuildNathan Lynch2-0/+25
Allow external callers to force the cacheinfo code to release all its references to cache nodes, e.g. before processing device tree updates post-migration, and to rebuild the hierarchy afterward. CPU online/offline must be blocked by callers; enforce this. Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-14powerpc/pseries: Fix oops in hotplug memory notifierNathan Lynch1-0/+3
During post-migration device tree updates, we can oops in pseries_update_drconf_memory() if the source device tree has an ibm,dynamic-memory-v2 property and the destination has a ibm,dynamic_memory (v1) property. The notifier processes an "update" for the ibm,dynamic-memory property but it's really an add in this scenario. So make sure the old property object is there before dereferencing it. Fixes: 2b31e3aec1db ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-06-14powerpc/pseries/hvconsole: Fix stack overread via udbgDaniel Axtens2-2/+16
While developing KASAN for 64-bit book3s, I hit the following stack over-read. It occurs because the hypercall to put characters onto the terminal takes 2 longs (128 bits/16 bytes) of characters at a time, and so hvc_put_chars() would unconditionally copy 16 bytes from the argument buffer, regardless of supplied length. However, udbg_hvc_putc() can call hvc_put_chars() with a single-byte buffer, leading to the error. ================================================================== BUG: KASAN: stack-out-of-bounds in hvc_put_chars+0xdc/0x110 Read of size 8 at addr c0000000023e7a90 by task swapper/0 CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc2-next-20190528-02824-g048a6ab4835b #113 Call Trace: dump_stack+0x104/0x154 (unreliable) print_address_description+0xa0/0x30c __kasan_report+0x20c/0x224 kasan_report+0x18/0x30 __asan_report_load8_noabort+0x24/0x40 hvc_put_chars+0xdc/0x110 hvterm_raw_put_chars+0x9c/0x110 udbg_hvc_putc+0x154/0x200 udbg_write+0xf0/0x240 console_unlock+0x868/0xd30 register_console+0x970/0xe90 register_early_udbg_console+0xf8/0x114 setup_arch+0x108/0x790 start_kernel+0x104/0x784 start_here_common+0x1c/0x534 Memory state around the buggy address: c0000000023e7980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0000000023e7a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 >c0000000023e7a80: f1 f1 01 f2 f2 f2 00 00 00 00 00 00 00 00 00 00 ^ c0000000023e7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0000000023e7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Document that a 16-byte buffer is requred, and provide it in udbg. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>