aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel/idle_book3s.S (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-08-04powerpc/perf: POWER9 PMU stops after idle workaroundNicholas Piggin1-1/+7
POWER9 DD2 PMU can stop after a state-loss idle in some conditions. A solution is to set then clear MMCRA[60] after wake from state-loss idle. MMCRA[60] is a non-architected bit, see the user manual for details. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-18powerpc/perf: Avoid spurious PMU interrupts after idleNicholas Piggin1-1/+14
POWER9 DD2 can see spurious PMU interrupts after state-loss idle in some conditions. A solution is to save and reload MMCR0 over state-loss idle. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28powerpc/powernv/idle: Clear r12 on wakeup from stop liteAkshay Adiga1-0/+13
pnv_wakeup_noloss() expects r12 to contain SRR1 value to determine if the wakeup reason is an HMI in CHECK_HMI_INTERRUPT. When we wakeup with ESL=0, SRR1 will not contain the wakeup reason, so there is no point setting r12 to SRR1. However, we don't set r12 at all so r12 contains garbage (likely a kernel pointer), and is still used to check HMI assuming that it contained SRR1. This causes the OPAL msglog to be filled with the following print: HMI: Received HMI interrupt: HMER = 0x0040000000000000 This patch clears r12 after waking up from stop with ESL=EC=0, so that we don't accidentally enter the HMI handler in pnv_wakeup_noloss() if the value of r12[42:45] corresponds to HMI as wakeup reason. Prior to commit 9d29250136f6 ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths") this bug existed, in that we would incorrectly look at SRR1 to check for a HMI when SRR1 didn't contain a wakeup reason. However the SRR1 value would just happen to never have bits 42:45 set. Fixes: 9d29250136f6 ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths") Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log and comment massaging] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27powerpc/64s: Invalidate ERAT on powersave wakeup for POWER9Benjamin Herrenschmidt1-0/+7
On POWER9 the ERAT may be incorrect on wakeup from some stop states that lose state. This causes random segvs and illegal instructions when these stop states are enabled. This patch invalidates the ERAT on wakeup on POWER9 to prevent this from causing a problem. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Merge comment change with upstream changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19powerpc/64s/idle: Predict HMI wakeup as unlikelyNicholas Piggin1-1/+1
In a busy system, idle wakeups can be expected from IPIs and device interrupts. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19powerpc/64s/idle: Avoid SRR usage in idle sleep/wake pathsNicholas Piggin1-30/+27
Idle code now always runs at the 0xc... effective address whether in real or virtual mode. This means rfid can be ditched, along with a lot of SRR manipulations. In the wakeup path, carry SRR1 around in r12. Use mtmsrd to change MSR states as required. This also balances the return prediction for the idle call, by doing blr rather than rfid to return to the idle caller. On POWER9, 2-process context switch on different cores, with snooze disabled, increases performance by 2%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Incorporate v2 fixes from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19powerpc/64s/idle: Move soft interrupt mask logic into C codeNicholas Piggin1-67/+15
This simplifies the asm and fixes irq-off tracing over sleep instructions. Also move powersave_nap check for POWER8 into C code, and move PSSCR register value calculation for POWER9 into C. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1Gautham R. Shenoy1-1/+12
On Power9 DD1 due to a hardware bug the Power-Saving Level Status field (PLS) of the PSSCR for a thread waking up from a deep state can under-report if some other thread in the core is in a shallow stop state. The scenario in which this can manifest is as follows: 1) All the threads of the core are in deep stop. 2) One of the threads is woken up. The PLS for this thread will correctly reflect that it is waking up from deep stop. 3) The thread that has woken up now executes a shallow stop. 4) When some other thread in the core is woken, its PLS will reflect the shallow stop state. Thus, the subsequent thread for which the PLS is under-reporting the wakeup state will not restore the hypervisor resources. Hence, on DD1 systems, use the Requested Level (RL) field as a workaround to restore the contents of the hypervisor resources on the wakeup from the stop state. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30powerpc/powernv/idle: Restore LPCR on wakeup from deep-stopGautham R. Shenoy1-3/+10
On wakeup from a deep stop state which is supposed to lose the hypervisor state, we don't restore the LPCR to the old value but set it to a "sane" value via cur_cpu_spec->cpu_restore(). The problem is that the "sane" value doesn't include UPRT and the HR bits which are required to run correctly in Radix mode. Fix this on POWER9 onwards by restoring the LPCR value whatever it was before executing the stop instruction. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30powerpc/powernv/idle: Decouple Timebase restore & Per-core SPRs restoreGautham R. Shenoy1-3/+4
On POWER8, in case of - nap: both timebase and hypervisor state is retained. - fast-sleep: timebase is lost. But the hypervisor state is retained. - winkle: timebase and hypervisor state is lost. Hence, the current code for handling exit from a idle state assumes that if the timebase value is retained, then so is the hypervisor state. Thus, the current code doesn't restore per-core hypervisor state in such cases. But that is no longer the case on POWER9 where we do have stop states in which timebase value is retained, but the hypervisor state is lost. So we have to ensure that the per-core hypervisor state gets restored in such cases. Fix this by ensuring that even in the case when timebase is retained, we explicitly check if we are waking up from a deep stop that loses per-core hypervisor state (indicated by cr4 being eq or gt), and if this is the case, we restore the per-core hypervisor state. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-16powerpc/powernv: Set NAPSTATELOST after recovering paca on P9 DD1Gautham R. Shenoy1-1/+1
Commit 17ed4c8f81da ("powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1") promises to set the NAPSTATELOST bit in paca after recovering the correct paca for the thread waking up from stop1 on DD1, so that the GPRs can be correctly restored on the stop exit path. However, it loads the value 1 into r3, but stores the value in r0 into NAPSTATELOST(r13). Fix this by correctly set the NAPSTATELOST bit in paca after recovering the paca on POWER9 DD1. Fixes: 17ed4c8f81da ("powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1") Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-05Merge tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-60/+225
Pull powerpc updates from Michael Ellerman: "Highlights include: - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB virtual address space, but a process can request access to the full 512TB by passing a hint to mmap(). - Support for the new Power9 "XIVE" interrupt controller. - TLB flushing optimisations for the radix MMU on Power9. - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface Architecture 2.0". - The ability to configure the mmap randomisation limits at build and runtime. - Several small fixes and cleanups to the kprobes code, as well as support for KPROBES_ON_FTRACE. - Major improvements to handling of system reset interrupts, correctly treating them as NMIs, giving them a dedicated stack and using a new hypervisor call to trigger them, all of which should aid debugging and robustness. - Many fixes and other minor enhancements. Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi" * tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits) powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it powerpc/powernv: Fix TCE kill on NVLink2 powerpc/mm/radix: Drop support for CPUs without lockless tlbie powerpc/book3s/mce: Move add_taint() later in virtual mode powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body powerpc/smp: Document irq enable/disable after migrating IRQs powerpc/mpc52xx: Don't select user-visible RTAS_PROC powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset() powerpc/eeh: Clean up and document event handling functions powerpc/eeh: Avoid use after free in eeh_handle_special_event() cxl: Mask slice error interrupts after first occurrence cxl: Route eeh events to all drivers in cxl_pci_error_detected() cxl: Force context lock during EEH flow powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST powerpc/xmon: Teach xmon oops about radix vectors powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids powerpc/pseries: Enable VFIO powerpc/powernv: Fix iommu table size calculation hook for small tables powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc powerpc: Add arch/powerpc/tools directory ...
2017-04-23powerpc/64s: Simplify POWER9 DD1 idle workaround codeNicholas Piggin1-11/+5
The idle workaround does not need to load PACATOC, and it does not need to be called within a nested function that requires LR to be saved. Load the PACATOC at entry to the idle wakeup. It does not matter which PACA this comes from, so it's okay to call before the workaround. Then apply the workaround to get the right PACA. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23powerpc/64s: Idle POWER8 avoid full state loss recovery where possibleNicholas Piggin1-6/+83
If not all threads were in winkle, full state loss recovery is not necessary and can be avoided. A previous patch removed this optimisation due to some complexity with the implementation. Re-implement it by counting the number of threads in winkle with the per-core idle state. Only restore full state loss if all threads were in winkle. This has a small window of false positives right before threads execute winkle and just after they wake up, when the winkle count does not reflect the true number of threads in winkle. This is not a significant problem in comparison with even the minimum winkle duration. For correctness, a false positive is not a problem (only false negatives would be). Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23powerpc/64s: Idle do not hold reservation longer than requiredNicholas Piggin1-9/+11
When taking the core idle state lock, grab it immediately like a regular lock, rather than adding more tests in there. Holding the lock keeps it stable, so there is no need to do it whole holding the reservation. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23powerpc/64s: Expand core idle state bitsNicholas Piggin1-16/+17
In preparation for adding more bits to the core idle state word, move the lock bit up, and unlock by flipping the lock bit rather than masking off all but the thread bits. Add branch hints for atomic operations while we're here. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23powerpc/64s: Fix POWER9 machine check handler from stop stateNicholas Piggin1-0/+25
The ISA specifies power save wakeup due to a machine check exception can cause a machine check interrupt (rather than the usual system reset interrupt). The machine check handler copes with this by doing low level machine check recovery without restoring full state from idle, then queues up a machine check event for logging, then directly executes the same idle instruction it woke from. This minimises the work done before recovery is performed. The problem is that it requires machine specific instructions and knowledge of the book3s idle code. Currently it only has code to handle POWER8 idle, so POWER9 crashes when trying to execute the P8 idle instructions which don't exist in ISAv3.0B. cpu 0x0: Vector: e40 (Emulation Assist) at [c0000000008f3810] pc: c000000000008380: machine_check_handle_early+0x130/0x2f0 lr: c00000000053a098: stop_loop+0x68/0xd0 sp: c0000000008f3a90 msr: 9000000000081001 current = 0xc0000000008a1080 paca = 0xc00000000ffd0000 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/0 Instead of going to sleep after recovery, do the usual idle wakeup and state restoration by calling into the normal idle wakeup path. This reuses the normal idle wakeup paths. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23powerpc/64s: Use alternative feature patchingNicholas Piggin1-9/+14
This reduces the number of nops for POWER8. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23powerpc/64s: Stop using bit in HSPRG0 to test winkleNicholas Piggin1-17/+10
The POWER8 idle code has a neat trick of programming the power on engine to restore a low bit into HSPRG0, so idle wakeup code can test and see if it has been programmed this way and therefore lost all state. Restore time can be reduced if winkle has not been reached. However this messes with our r13 PACA pointer, and requires HSPRG0 to be written to. It also optimizes the slowest and most uncommon case at the expense of another SPR write in the common nap state wakeup. Remove this complexity and assume winkle sleeps always require a state restore. This speedup could be made entirely contained within the winkle idle code by counting per-core winkles and setting a thread bitmap when all have gone to winkle. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23powerpc/64s: Move remaining system reset idle code into idle_book3s.SNicholas Piggin1-21/+43
No functional change. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1Gautham R. Shenoy1-1/+47
POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus the HSPRG0 of a thread waking up from can contain the paca pointer of its sibling. This patch implements a context recovery framework within threads of a core, by provisioning space in paca_struct for saving every sibling threads's paca pointers. Basically, we should be able to arrive at the right paca pointer from any of the thread's existing paca pointer. At bootup, during powernv idle-init, we save the paca address of every CPU in each one its siblings paca_struct in the slot corresponding to this CPU's index in the core. On wakeup from a stop, the thread will determine its index in the core from the TIR register and recover its PACA pointer by indexing into the correct slot in the provisioned space in the current PACA. Furthermore, ensure that the NVGPRs are restored from the stack on the way out by setting the NAPSTATELOST in paca. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Call it a bug] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-20powerpc/64s: Fix idle wakeup potential to clobber registersNicholas Piggin1-3/+17
We concluded there may be a window where the idle wakeup code could get to pnv_wakeup_tb_loss() (which clobbers non-volatile GPRs), but the hardware may set SRR1[46:47] to 01b (no state loss) which would result in the wakeup code failing to restore non-volatile GPRs. I was not able to trigger this condition with trivial tests on real hardware or simulator, but the ISA (at least 2.07) seems to allow for it, and Gautham says that it can happen if there is an exception pending when the sleep/winkle instruction is executed. Fixes: 1706567117ba ("powerpc/kvm: make hypervisor state restore a function") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-03powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stopGautham R. Shenoy1-4/+6
Commit 09206b600c76 ("powernv: Pass PSSCR value and mask to power9_idle_stop") added additional code in power_enter_stop() to distinguish between stop requests whose PSSCR had ESL=EC=1 from those which did not. When ESL=EC=1, we do a forward-jump to a location labelled by "1", which had the code to handle the ESL=EC=1 case. Unfortunately just a couple of instructions before this label, is the macro IDLE_STATE_ENTER_SEQ() which also has a label "1" in its expansion. As a result, the current code can result in directly executing stop instruction for deep stop requests with PSSCR ESL=EC=1, without saving the hypervisor state. Fix this BUG by labeling the location that handles ESL=EC=1 case with a more descriptive label ".Lhandle_esl_ec_set" (local label suggestion a la .Lxx from Anton Blanchard). While at it, rename the label "2" labelling the location of the code handling entry into deep stop states with ".Lhandle_deep_stop". For a good measure, change the label in IDLE_STATE_ENTER_SEQ() macro to an not-so commonly used value in order to avoid similar mishaps in the future. Fixes: 09206b600c76 ("powernv: Pass PSSCR value and mask to power9_idle_stop") Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-14Merge branch 'topic/ppc-kvm' into nextMichael Ellerman1-3/+3
Merge the topic branch we're sharing with the kvm-ppc tree.
2017-02-07powerpc/powernv: Remove separate entry for OPAL real mode callsBenjamin Herrenschmidt1-3/+3
All entry points already read the MSR so they can easily do the right thing. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powernv: Pass PSSCR value and mask to power9_idle_stopGautham R. Shenoy1-11/+19
The power9_idle_stop method currently takes only the requested stop level as a parameter and picks up the rest of the PSSCR bits from a hand-coded macro. This is not a very flexible design, especially when the firmware has the capability to communicate the psscr value and the mask associated with a particular stop state via device tree. This patch modifies the power9_idle_stop API to take as parameters the PSSCR value and the PSSCR mask corresponding to the stop state that needs to be set. These PSSCR value and mask are respectively obtained by parsing the "ibm,cpu-idle-state-psscr" and "ibm,cpu-idle-state-psscr-mask" fields from the device tree. In addition to this, the patch adds support for handling stop states for which ESL and EC bits in the PSSCR are zero. As per the architecture, a wakeup from these stop states resumes execution from the subsequent instruction as opposed to waking up at the System Vector. The older firmware sets only the Requested Level (RL) field in the psscr and psscr-mask exposed in the device tree. For older firmware where psscr-mask=0xf, this patch will set the default sane values that the set for for remaining PSSCR fields (i.e PSLL, MTL, ESL, EC, and TR). For the new firmware, the patch will validate that the invariants required by the ISA for the psscr values are maintained by the firmware. This skiboot patch that exports fully populated PSSCR values and the mask for all the stop states can be found here: https://lists.ozlabs.org/pipermail/skiboot/2016-September/004869.html [Optimize the number of instructions before entering STOP with ESL=EC=0, validate the PSSCR values provided by the firimware maintains the invariants required as per the ISA suggested by Balbir Singh] Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31powernv:idle: Add IDLE_STATE_ENTER_SEQ_NORET macroGautham R. Shenoy1-5/+5
Currently all the low-power idle states are expected to wake up at reset vector 0x100. Which is why the macro IDLE_STATE_ENTER_SEQ that puts the CPU to an idle state and never returns. On ISA v3.0, when the ESL and EC bits in the PSSCR are zero, the CPU is expected to wake up at the next instruction of the idle instruction. This patch adds a new macro named IDLE_STATE_ENTER_SEQ_NORET for the no-return variant and reuses the name IDLE_STATE_ENTER_SEQ for a variant that allows resuming operation at the instruction next to the idle-instruction. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-10-24powerpc/64: Fix race condition in setting lock bit in idle/wakeup codePaul Mackerras1-0/+3
This fixes a race condition where one thread that is entering or leaving a power-saving state can inadvertently ignore the lock bit that was set by another thread, and potentially also clear it. The core_idle_lock_held function is called when the lock bit is seen to be set. It polls the lock bit until it is clear, then does a lwarx to load the word containing the lock bit and thread idle bits so it can be updated. However, it is possible that the value loaded with the lwarx has the lock bit set, even though an immediately preceding lwz loaded a value with the lock bit clear. If this happens then we go ahead and update the word despite the lock bit being set, and when called from pnv_enter_arch207_idle_mode, we will subsequently clear the lock bit. No identifiable misbehaviour has been attributed to this race. This fixes it by checking the lock bit in the value loaded by the lwarx. If it is set then we just go back and keep on polling. Fixes: b32aadc1a8ed ("powerpc/powernv: Fix race in updating core_idle_state") Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-10-24powerpc/64: Re-fix race condition between going idle and entering guestPaul Mackerras1-6/+26
Commit 8117ac6a6c2f ("powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one thread entering a KVM guest could switch the MMU context to the guest while another thread was still in host kernel context with the MMU on. That commit moved the point where a thread entering a power-saving mode set its kvm_hstate.hwthread_state field in its PACA to KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the MMU had been switched off. That commit also added a comment explaining that we have to switch to real mode before setting hwthread_state to avoid this race. Nevertheless, commit 4eae2c9ae54a ("powerpc/powernv: Make pnv_powersave_common more generic", 2016-07-08) subsequently moved the setting of hwthread_state back to a point where the MMU is on, thus reintroducing the race, despite the comment saying that this should not be done being included in full in the context lines of the patch that did it. This fixes the race again and adds a bigger and shoutier comment explaining the potential race condition. Fixes: 4eae2c9ae54a ("powerpc/powernv: Make pnv_powersave_common more generic") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-09-12powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state lossGautham R. Shenoy1-5/+5
pnv_wakeup_tb_loss() currently expects cr4 to be "eq" if the CPU is waking up from a complete hypervisor state loss. Hence, it currently restores the SPR contents only if cr4 is "eq". However, after commit bcef83a00dc4 ("powerpc/powernv: Add platform support for stop instruction"), on ISA v3.0 CPUs, the function pnv_restore_hyp_resource() sets cr4 to contain the result of the comparison between the state the CPU has woken up from and the first deep stop state before calling pnv_wakeup_tb_loss(). Thus if the CPU woke up from a state that is deeper than the first deep stop state, cr4 will have "gt" set and hence, pnv_wakeup_tb_loss() will fail to restore the SPRs on waking up from such a state. Fix the code in pnv_wakeup_tb_loss() to restore the SPR states when cr4 is "eq" or "gt". Fixes: bcef83a00dc4 ("powerpc/powernv: Add platform support for stop instruction") Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.hMahesh Salgaonkar1-12/+0
Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h so that MCE handler changes in subsequent patch can use it. No functionality change. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09powerpc/powernv: Load correct TOC pointer while waking up from winkle.Mahesh Salgaonkar1-1/+4
The function pnv_restore_hyp_resource() loads the TOC into r2 from the invalid PACA pointer before fixing r13 value. This do not affect POWER ISA 3.0 but it does have an impact on POWER ISA 2.07 or less leading CPU to get stuck forever. login: [ 471.830433] Processor 120 is stuck. This can be easily reproducible using following steps: - Turn off SMT $ ppc64_cpu --smt=off - offline/online any online cpu (Thread 0 of any core which is online) $ echo 0 > /sys/devices/system/cpu/cpu<num>/online $ echo 1 > /sys/devices/system/cpu/cpu<num>/online For POWER ISA 2.07 or less, the last bit of HSPRG0 is set indicating that thread is waking up from winkle. Hence, the last bit of HSPRG0(r13) needs to be clear before accessing it as PACA to avoid loading invalid values from invalid PACA pointer. Fix this by loading TOC after r13 register is corrected. Fixes: bcef83a00dc4 ("powerpc/powernv: Add platform support for stop instruction") Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-05Merge tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-1/+1
Pull more powerpc updates from Michael Ellerman: "These were delayed for various reasons, so I let them sit in next a bit longer, rather than including them in my first pull request. Fixes: - Fix early access to cpu_spec relocation from Benjamin Herrenschmidt - Fix incorrect event codes in power9-event-list from Madhavan Srinivasan - Move register_process_table() out of ppc_md from Michael Ellerman Use jump_label use for [cpu|mmu]_has_feature(): - Add mmu_early_init_devtree() from Michael Ellerman - Move disable_radix handling into mmu_early_init_devtree() from Michael Ellerman - Do hash device tree scanning earlier from Michael Ellerman - Do radix device tree scanning earlier from Michael Ellerman - Do feature patching before MMU init from Michael Ellerman - Check features don't change after patching from Michael Ellerman - Make MMU_FTR_RADIX a MMU family feature from Aneesh Kumar K.V - Convert mmu_has_feature() to returning bool from Michael Ellerman - Convert cpu_has_feature() to returning bool from Michael Ellerman - Define radix_enabled() in one place & use static inline from Michael Ellerman - Add early_[cpu|mmu]_has_feature() from Michael Ellerman - Convert early cpu/mmu feature check to use the new helpers from Aneesh Kumar K.V - jump_label: Make it possible for arches to invoke jump_label_init() earlier from Kevin Hao - Call jump_label_init() in apply_feature_fixups() from Aneesh Kumar K.V - Remove mfvtb() from Kevin Hao - Move cpu_has_feature() to a separate file from Kevin Hao - Add kconfig option to use jump labels for cpu/mmu_has_feature() from Michael Ellerman - Add option to use jump label for cpu_has_feature() from Kevin Hao - Add option to use jump label for mmu_has_feature() from Kevin Hao - Catch usage of cpu/mmu_has_feature() before jump label init from Aneesh Kumar K.V - Annotate jump label assembly from Michael Ellerman TLB flush enhancements from Aneesh Kumar K.V: - radix: Implement tlb mmu gather flush efficiently - Add helper for finding SLBE LLP encoding - Use hugetlb flush functions - Drop multiple definition of mm_is_core_local - radix: Add tlb flush of THP ptes - radix: Rename function and drop unused arg - radix/hugetlb: Add helper for finding page size - hugetlb: Add flush_hugetlb_tlb_range - remove flush_tlb_page_nohash Add new ptrace regsets from Anshuman Khandual and Simon Guo: - elf: Add powerpc specific core note sections - Add the function flush_tmregs_to_thread - Enable in transaction NT_PRFPREG ptrace requests - Enable in transaction NT_PPC_VMX ptrace requests - Enable in transaction NT_PPC_VSX ptrace requests - Adapt gpr32_get, gpr32_set functions for transaction - Enable support for NT_PPC_CGPR - Enable support for NT_PPC_CFPR - Enable support for NT_PPC_CVMX - Enable support for NT_PPC_CVSX - Enable support for TM SPR state - Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR - Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR - Enable support for EBB registers - Enable support for Performance Monitor registers" * tag 'powerpc-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits) powerpc/mm: Move register_process_table() out of ppc_md powerpc/perf: Fix incorrect event codes in power9-event-list powerpc/32: Fix early access to cpu_spec relocation powerpc/ptrace: Enable support for Performance Monitor registers powerpc/ptrace: Enable support for EBB registers powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR powerpc/ptrace: Enable support for TM SPR state powerpc/ptrace: Enable support for NT_PPC_CVSX powerpc/ptrace: Enable support for NT_PPC_CVMX powerpc/ptrace: Enable support for NT_PPC_CFPR powerpc/ptrace: Enable support for NT_PPC_CGPR powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests powerpc/process: Add the function flush_tmregs_to_thread elf: Add powerpc specific core note sections powerpc/mm: remove flush_tlb_page_nohash powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range ...
2016-08-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+3
Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...
2016-08-01powerpc/mm: Make MMU_FTR_RADIX a MMU family featureAneesh Kumar K.V1-1/+1
MMU feature bits are defined such that we use the lower half to present MMU family features. Remove the strict split of half and also move Radix to a mmu family feature. Radix introduce a new MMU model and strictly speaking it is a new MMU family. This also free up bits which can be used for individual features later. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/opal: Add real mode call wrappersBenjamin Herrenschmidt1-11/+5
Replace the old generic opal_call_realmode() with proper per-call wrappers similar to the normal ones and convert callers. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Add platform support for stop instructionShreyas B. Prabhu1-34/+159
POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. This instruction replaces instructions like nap, sleep, rvwinkle. b) new per thread SPR named Processor Stop Status and Control Register (PSSCR) is added which controls the behavior of stop instruction. PSSCR layout: ---------------------------------------------------------- | PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL | ---------------------------------------------------------- 0 4 41 42 43 44 48 54 56 60 PSSCR key fields: Bits 0:3 - Power-Saving Level Status. This field indicates the lowest power-saving state the thread entered since stop instruction was last executed. Bit 42 - Enable State Loss 0 - No state is lost irrespective of other fields 1 - Allows state loss Bits 44:47 - Power-Saving Level Limit This limits the power-saving level that can be entered into. Bits 60:63 - Requested Level Used to specify which power-saving level must be entered on executing stop instruction This patch adds support for stop instruction and PSSCR handling. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: abstraction for saving SPRs before entering deep idle statesShreyas B. Prabhu1-22/+32
Create a function for saving SPRs before entering deep idle states. This function can be reused for POWER9 deep idle states. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Make pnv_powersave_common more genericShreyas B. Prabhu1-9/+14
pnv_powersave_common does common steps needed before entering idle state and eventually changes MSR to MSR_IDLE and does rfid to pnv_enter_arch207_idle_mode. Move the updation of HSTATE_HWTHREAD_STATE to pnv_powersave_common from pnv_enter_arch207_idle_mode and make it more generic by passing the rfid address as a function parameter. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Rename reusable idle functions to hardware agnostic namesShreyas B. Prabhu1-16/+17
Functions like power7_wakeup_loss, power7_wakeup_noloss, power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can also be used by POWER9. Hence rename these functions hardware agnostic names. Suggested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Rename idle_power7.S to idle_book3s.SShreyas B. Prabhu1-0/+527
idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next patch for POWER9. Rename the file to a non-hardware specific name. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>