aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2016-07-17powerpc/mm/hash: Update SDR1 size encoding as documented in ISA 3.0Aneesh Kumar K.V1-5/+4
ISA 3.0 document hash table size in bytes = 2^(HTABSIZE + 18) No functionality change by this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/mm: Print formation regarding the the MMU modeAneesh Kumar K.V2-2/+4
This helps in easily identifying the MMU mode with which the kernel is operating. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/mm: Clear top 16 bits of va only on older cpusAneesh Kumar K.V3-6/+13
As per ISA, we need to do this only for architecture version 2.02 and earlier. This continued to work even for 2.07. But let's not do this for anything after 2.02. ISA 3.0 requires these top bits to be not cleared. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/mm: Compile out radix related functions if RADIX_MMU is disabledAneesh Kumar K.V1-0/+5
Currently we depend on mmu_has_feature to evalute to zero based on MMU_FTRS_POSSIBLE mask. In a later patch, we want to update radix_enabled() to runtime update the conditional operation to a jump instruction. This implies we cannot depend on MMU_FTRS_POSSIBLE mask. Instead define radix_enabled to return 0 if RADIX_MMU is not enabled. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/mm: use _raw variant of page table accessorsAneesh Kumar K.V4-35/+91
This switch few of the page table accessor to use the __raw variant and does the cpu to big endian conversion of constants. This helps in generating better code. For ex: a pgd_none(pgd) check with and without fix is listed below Without fix: ------------ 2240: 20 00 61 eb ld r27,32(r1) /* PGD level */ typedef struct { __be64 pgd; } pgd_t; static inline unsigned long pgd_val(pgd_t x) { return be64_to_cpu(x.pgd); 2244: 22 00 66 78 rldicl r6,r3,32,32 2248: 3e 40 7d 54 rotlwi r29,r3,8 224c: 0e c0 7d 50 rlwimi r29,r3,24,0,7 2250: 3e 40 c5 54 rotlwi r5,r6,8 2254: 2e c4 7d 50 rlwimi r29,r3,24,16,23 2258: 0e c0 c5 50 rlwimi r5,r6,24,0,7 225c: 2e c4 c5 50 rlwimi r5,r6,24,16,23 2260: c6 07 bd 7b rldicr r29,r29,32,31 2264: 78 2b bd 7f or r29,r29,r5 if (pgd_none(pgd)) 2268: 00 00 bd 2f cmpdi cr7,r29,0 226c: 54 03 9e 41 beq cr7,25c0 <__get_user_pages_fast+0x500> With fix: --------- 2370: 20 00 61 eb ld r27,32(r1) if (pgd_none(pgd)) 2374: 00 00 bd 2f cmpdi cr7,r29,0 2378: a8 03 9e 41 beq cr7,2720 <__get_user_pages_fast+0x530> break; Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/mm/radix: Update LPCR HR bit as per ISAAneesh Kumar K.V2-2/+3
PowerISA 3.0 requires the MMU mode (radix vs. hash) of the hypervisor to be mirrored in the LPCR register, in addition to the partition table. This is done to avoid fetching from the table when deciding, among other things, how to perform transitions to HV mode on some interrupts. So let's set it up appropriately Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/mm: Fix .long's in tlb-radix.c to more meaningfulBalbir Singh2-8/+19
The .longs with the shifts are harder to read, use more meaningful names for the opcodes. PPC_TLBIE_5 is introduced for the 5 opcode variation of the instruction due to an existing op-code for the 2 opcode variant. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/pci: Don't try to allocate resources that will be reassignedBenjamin Herrenschmidt1-2/+4
When we know we will reassign all resources, trying (and failing) to allocate them initially is fairly pointless and leads to a lot of scary messages in the kernel log Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Check status of a PHB before using itBenjamin Herrenschmidt1-0/+3
If the firmware encounters an error (internal or HW) during initialization of a PHB, it might leave the device-node in the tree but mark it disabled using the "status" property. We should check it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Use the device-tree to get available range of M64'sBenjamin Herrenschmidt1-6/+45
M64's are the configurable 64-bit windows that cover the 64-bit MMIO space. We used to hard code 16 windows. Newer chips might have a variable number and might need to reserve some as well (for example on PHB4/POWER9, M32 and M64 are actually unified and we use M64#0 to map the 32-bit space). So newer OPALs will provide a property we can use to know what range of windows is available. The property is named so that it can eventually support multiple ranges but we only use the first one for now. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Fallback to OPAL for TCE invalidationsBenjamin Herrenschmidt1-4/+29
If we don't find registers for the PHB or don't know the model specific invalidation method, use OPAL calls instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Rework accessing the TCE invalidate registerBenjamin Herrenschmidt2-48/+28
It's architected, always in a known place, so there is no need to keep a separate pointer to it, we use the existing "regs", and we complement it with a real mode variant. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> # Conflicts: # arch/powerpc/platforms/powernv/pci-ioda.c # arch/powerpc/platforms/powernv/pci.h Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Remove SWINV constants and obsolete TCE codeBenjamin Herrenschmidt2-43/+10
We have some obsolete code in pnv_pci_p7ioc_tce_invalidate() to handle some internal lab tools that have stopped being useful a long time ago. Remove that along with the definition and test for the TCE_PCI_SWINV_* flags whose value is basically always the same. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv/pci: Rename TCE invalidation callsBenjamin Herrenschmidt3-25/+23
The TCE invalidation functions are fairly implementation specific, and while the IODA specs more/less describe the register, in practice various implementation workarounds may be required. So name the functions after the target PHB. Note today and for the foreseeable future, there's a 1:1 relationship between an IODA version and a PHB implementation. There exist another variant of IODA1 (Torrent) but we never supported in with OPAL and never will. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/opal: Add real mode call wrappersBenjamin Herrenschmidt4-44/+51
Replace the old generic opal_call_realmode() with proper per-call wrappers similar to the normal ones and convert callers. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/pseries/pci: Remove obsolete SW invalidateBenjamin Herrenschmidt1-52/+1
That was used by some old IBM internal bringup tools and is no longer relevant. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/powernv: Discover IODA3 PHBsBenjamin Herrenschmidt2-1/+6
We instanciate them as IODA2. We also change the MSI EOI hack to only kick on PHB3 since it will not be needed on any new implementation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/xics: Add ICP OPAL backendBenjamin Herrenschmidt4-2/+155
This adds a new XICS backend that uses OPAL calls, which can be used when we don't have native support for the platform interrupt controller. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/irq: Add mechanism to force a replay of interruptsBenjamin Herrenschmidt2-0/+17
Calling this function with interrupts soft-disabled will cause a replay of the external interrupt vector when they are re-enabled. This will be used by the OPAL XICS backend (and latter by the native XIVE code) to handle EOI signaling that there are more interrupts to fetch from the hardware since the hardware won't issue another HW interrupt in that case. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/irq: Add support for HV virtualization interruptsBenjamin Herrenschmidt4-0/+24
This will be delivering external interrupts from the XIVE to the Hypervisor. We treat it as a normal external interrupt for the lazy irq disable code (so it will be replayed as a 0x500) and route it to do_IRQ. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-17powerpc/book64s: Move a few exception common handlers to make roomBenjamin Herrenschmidt1-8/+9
This moves the CBE RAS and facility unavailable "common" handlers down to after the FWNMI page. This frees up some space in the very demanded spaces before the relocation-on vectors and before the FWNMI page. They are still within 64K of __start, so CONFIG_RELOCATABLE should still work. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Add XICS emulation APIsBenjamin Herrenschmidt3-1/+14
OPAL provides an emulated XICS interrupt controller to use as a fallback on newer processors that don't have a XICS. It's meant as a way to provide backward compatibility with future processors. Add the corresponding interfaces. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Use deepest stop state when cpu is offlinedShreyas B. Prabhu3-3/+17
If hardware supports stop state, use the deepest stop state when the cpu is offlined. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15cpuidle/powernv: Add support for POWER ISA v3 idle statesShreyas B. Prabhu1-0/+61
POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. b) new per thread SPR named PSSCR is added which controls the behavior of stop instruction. Supported idle states and value to be written to PSSCR register to enter any idle state is exposed via ibm,cpu-idle-state-names and ibm,cpu-idle-state-psscr respectively. To enter an idle state, platform provided power_stop() needs to be invoked with the appropriate PSSCR value. This patch adds support for this new mechanism in cpuidle powernv driver. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com> Cc: linux-pm@vger.kernel.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15cpuidle/powernv: cleanup cpuidle-powernv.cShreyas B. Prabhu1-18/+20
- Use stack instead of kzalloc'ed memory for variables while probing device tree for idle states. - Set cap for number of idle states that can be added to cpuidle_state_table - Minor change in way we check of_property_read_u32_array for error for sake of consistency - Drop unnecessary "&" while assigning function pointer Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-pm@vger.kernel.org Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATESShreyas B. Prabhu1-3/+1
Use cpuidle's CPUIDLE_STATE_MAX macro instead of powernv specific MAX_POWERNV_IDLE_STATES. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: linux-pm@vger.kernel.org Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Add platform support for stop instructionShreyas B. Prabhu8-66/+332
POWER ISA v3 defines a new idle processor core mechanism. In summary, a) new instruction named stop is added. This instruction replaces instructions like nap, sleep, rvwinkle. b) new per thread SPR named Processor Stop Status and Control Register (PSSCR) is added which controls the behavior of stop instruction. PSSCR layout: ---------------------------------------------------------- | PLS | /// | SD | ESL | EC | PSLL | /// | TR | MTL | RL | ---------------------------------------------------------- 0 4 41 42 43 44 48 54 56 60 PSSCR key fields: Bits 0:3 - Power-Saving Level Status. This field indicates the lowest power-saving state the thread entered since stop instruction was last executed. Bit 42 - Enable State Loss 0 - No state is lost irrespective of other fields 1 - Allows state loss Bits 44:47 - Power-Saving Level Limit This limits the power-saving level that can be entered into. Bits 60:63 - Requested Level Used to specify which power-saving level must be entered on executing stop instruction This patch adds support for stop instruction and PSSCR handling. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: abstraction for saving SPRs before entering deep idle statesShreyas B. Prabhu1-22/+32
Create a function for saving SPRs before entering deep idle states. This function can be reused for POWER9 deep idle states. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Make pnv_powersave_common more genericShreyas B. Prabhu1-9/+14
pnv_powersave_common does common steps needed before entering idle state and eventually changes MSR to MSR_IDLE and does rfid to pnv_enter_arch207_idle_mode. Move the updation of HSTATE_HWTHREAD_STATE to pnv_powersave_common from pnv_enter_arch207_idle_mode and make it more generic by passing the rfid address as a function parameter. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Rename reusable idle functions to hardware agnostic namesShreyas B. Prabhu3-22/+23
Functions like power7_wakeup_loss, power7_wakeup_noloss, power7_wakeup_tb_loss are used by POWER7 and POWER8 hardware. They can also be used by POWER9. Hence rename these functions hardware agnostic names. Suggested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Rename idle_power7.S to idle_book3s.SShreyas B. Prabhu2-1/+1
idle_power7.S handles idle entry/exit for POWER7, POWER8 and in next patch for POWER9. Rename the file to a non-hardware specific name. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/kvm: make hypervisor state restore a functionShreyas B. Prabhu2-54/+46
In the current code, when the thread wakes up in reset vector, some of the state restore code and check for whether a thread needs to branch to kvm is duplicated. Reorder the code such that this duplication is avoided. At a higher level this is what the change looks like- Before this patch - power7_wakeup_tb_loss: restore hypervisor state if (thread needed by kvm) goto kvm_start_guest restore nvgprs, cr, pc rfid to process context power7_wakeup_loss: restore nvgprs, cr, pc rfid to process context reset vector: if (waking from deep idle states) goto power7_wakeup_tb_loss else if (thread needed by kvm) goto kvm_start_guest goto power7_wakeup_loss After this patch - power7_wakeup_tb_loss: restore hypervisor state return power7_restore_hyp_resource(): if (waking from deep idle states) goto power7_wakeup_tb_loss return power7_wakeup_loss: restore nvgprs, cr, pc rfid to process context reset vector: power7_restore_hyp_resource() if (thread needed by kvm) goto kvm_start_guest goto power7_wakeup_loss Reviewed-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/powernv: Use PNV_THREAD_WINKLE macro while requesting for winkleShreyas B. Prabhu1-1/+1
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/lib: Clarify that adde is an instruction and we mean pluralStewart Smith1-6/+6
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/pseries: Remove call to memblock_add()Nathan Fontenot1-27/+10
The call to memblock_add is not needed, this is already done by memory_add(). This patch removes this call which shrinks dlpar_add_lmb_memory() enough that it can be merged into dlpar_add_lmb(). Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/pseries: Auto-online hotplugged memoryNathan Fontenot2-14/+1
A recent update (commit id 31bc3858ea3) allows for automatically onlining memory that is added. This patch sets the config option CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y for pseries and updates the pseries memory hotplug code so that DLPAR added memory can be automatically onlined instead of explicitly onlining the memory. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/pseries: Dynamic add entires to associativity lookup arrayNathan Fontenot1-27/+68
Dynamically add entries to the associativity lookup array The ibm,associativity-lookup-arrays property may only contain associativity arrays for LMBs present at boot time. When hotplug adding a LMB its associativity array may not be in the associativity lookup array, this patch adds the ability to add new entries to the associativity lookup array. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/pseries: Move property cloning into its own routineNathan Fontenot1-12/+26
Move property cloning code into its own routine Split the pieces of dlpar_clone_drconf_property() that create a copy of the property struct into its own routine. This allows for creating clones of more than just the ibm,dynamic-memory property used in memory hotplug. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/pseries: HVC early debug options should depend on HVC_CONSOLEMichael Ellerman1-2/+2
The pseries HVC early debug options, CONFIG_PPC_EARLY_DEBUG_LPAR and CONFIG_PPC_EARLY_DEBUG_LPAR_HVSI both require code that is part of the hvc driver. If we turn them on but not CONFIG_HVC_CONSOLE then we get: arch/powerpc/kernel/built-in.o: In function `.udbg_early_init': arch/powerpc/kernel/built-in.o:(.debug_addr+0x9a00): undefined reference to `udbg_init_debug_lpar' Similarly for HVSI. So make them both depend on CONFIG_HVC_CONSOLE. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-15powerpc/tm: Fix stack pointer corruption in __tm_recheckpoint()Michael Neuling1-2/+1
At the start of __tm_recheckpoint() we save the kernel stack pointer (r1) in SPRG SCRATCH0 (SPRG2) so that we can restore it after the trecheckpoint. Unfortunately, the same SPRG is used in the SLB miss handler. If an SLB miss is taken between the save and restore of r1 to the SPRG, the SPRG is changed and hence r1 is also corrupted. We can end up with the following crash when we start using r1 again after the restore from the SPRG: Oops: Bad kernel stack pointer, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries CPU: 658 PID: 143777 Comm: htm_demo Tainted: G EL X 4.4.13-0-default #1 task: c0000b56993a7810 ti: c00000000cfec000 task.ti: c0000b56993bc000 NIP: c00000000004f188 LR: 00000000100040b8 CTR: 0000000010002570 REGS: c00000000cfefd40 TRAP: 0300 Tainted: G EL X (4.4.13-0-default) MSR: 8000000300001033 <SF,ME,IR,DR,RI,LE> CR: 02000424 XER: 20000000 CFAR: c000000000008468 DAR: 00003ffd84e66880 DSISR: 40000000 SOFTE: 0 PACATMSCRATCH: 00003ffbc865e680 GPR00: fffffffcfabc4268 00003ffd84e667a0 00000000100d8c38 000000030544bb80 GPR04: 0000000000000002 00000000100cf200 0000000000000449 00000000100cf100 GPR08: 000000000000c350 0000000000002569 0000000000002569 00000000100d6c30 GPR12: 00000000100d6c28 c00000000e6a6b00 00003ffd84660000 0000000000000000 GPR16: 0000000000000003 0000000000000449 0000000010002570 0000010009684f20 GPR20: 0000000000800000 00003ffd84e5f110 00003ffd84e5f7a0 00000000100d0f40 GPR24: 0000000000000000 0000000000000000 0000000000000000 00003ffff0673f50 GPR28: 00003ffd84e5e960 00000000003d0f00 00003ffd84e667a0 00003ffd84e5e680 NIP [c00000000004f188] restore_gprs+0x110/0x17c LR [00000000100040b8] 0x100040b8 Call Trace: Instruction dump: f8a1fff0 e8e700a8 38a00000 7ca10164 e8a1fff8 e821fff0 7c0007dd 7c421378 7db142a6 7c3242a6 38800002 7c810164 <e9c100e0> e9e100e8 ea0100f0 ea2100f8 We hit this on large memory machines (> 2TB) but it can also be hit on smaller machines when 1TB segments are disabled. To hit this, you also need to be virtualised to ensure SLBs are periodically removed by the hypervisor. This patches moves the saving of r1 to the SPRG to the region where we are guaranteed not to take any further SLB misses. Fixes: 98ae22e15b43 ("powerpc: Add helper functions for transactional memory context switching") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc: Make ppc_md.{halt, restart} __noreturnDaniel Axtens33-55/+60
powernv marks it's halt and restart calls as __noreturn. However, ppc_md does not have this annotation. Add the annotation to ppc_md, and then to every halt/restart function that is missing it. Additionally, I have verified that all of these functions do not return. Occasionally I have added a spin loop to be sure. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/sparse: Pass endianness to sparseDaniel Axtens1-0/+5
Explicitly give sparse an endianness in the Makefile, so that it doesn't get confused. Normally we have #ifdef one and #else the other, so it doesn't usually matter, but we have been bitten by it before, and indeed this patch fixes a number of sparse errors. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/kvm: Clarify __user annotationsDaniel Axtens1-1/+2
kvmppc_h_put_tce_indirect labels a u64 pointer as __user. It also labelled the u64 where get_user puts the result as __user. This isn't a pointer and so doesn't need to be labelled __user. Split the u64 value definition onto a new line to make it clear that it doesn't get the annotation. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14powerpc/pmac/smp: Add missing FROZEN hotplug notifier transitionsAnna-Maria Gleixner1-2/+1
The FROZEN transitions are used when a CPU suspends/resumes. In case of a suspend/resume, only the up prepare (CPU_UP_PREPARE_FROZEN) is handled. The error handling transition CPU_UP_CANCELED_FROZEN as well as the CPU_ONLINE_FROZEN transition are not handled. Masking the switch case action argument with ~CPU_TASKS_FROZEN, to handle all FROZEN tasks the same way than the corresponding non frozen tasks. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add cxl_check_and_switch_mode() API to switch bi-modal cardsAndrew Donnellan3-18/+251
Add a new API, cxl_check_and_switch_mode() to allow for switching of bi-modal CAPI cards, such as the Mellanox CX-4 network card. When a driver requests to switch a card to CAPI mode, use PCI hotplug infrastructure to remove all PCI devices underneath the slot. We then write an updated mode control register to the CAPI VSEC, hot reset the card, and reprobe the card. As the card may present a different set of PCI devices after the mode switch, use the infrastructure provided by the pnv_php driver and the OPAL PCI slot management facilities to ensure that: * the old devices are removed from both the OPAL and Linux device trees * the new devices are probed by OPAL and added to the OPAL device tree * the new devices are added to the Linux device tree and probed through the regular PCI device probe path As such, introduce a new option, CONFIG_CXL_BIMODAL, with a dependency on the pnv_php driver. Refactor existing code that touches the mode control register in the regular single mode case into a new function, setup_cxl_protocol_area(). Co-authored-by: Ian Munsie <imunsie@au1.ibm.com> Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power stateAndrew Donnellan1-1/+1
When calling pnv_php_set_slot_power_state() with state == OPAL_PCI_SLOT_OFFLINE, remove devices from the device tree as if we're dealing with OPAL_PCI_SLOT_POWER_OFF. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14PCI/hotplug: pnv_php: export symbols and move struct types needed by cxlAndrew Donnellan2-27/+33
The cxl driver will use infrastructure from pnv_php to handle device tree updates when switching bi-modal CAPI cards into CAPI mode. To enable this, export pnv_php_find_slot() and pnv_php_set_slot_power_state(), and add corresponding declarations, as well as the definition of struct pnv_php_slot, to asm/pnv-pci.h. Cc: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: linux-pci@vger.kernel.org Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Workaround PE=0 hardware limitation in Mellanox CX4Ian Munsie3-1/+4
The CX4 card cannot cope with a context with PE=0 due to a hardware limitation, resulting in: [ 34.166577] command failed, status limits exceeded(0x8), syndrome 0x5a7939 [ 34.166580] mlx5_core 0000:01:00.1: Failed allocating uar, aborting Since the kernel API allocates a default context very early during device init that will almost certainly get Process Element ID 0 there is no easy way for us to extend the API to allow the Mellanox to inform us of this limitation ahead of time. Instead, work around the issue by extending the XSL structure to include a minimum PE to allocate. Although the bug is not in the XSL, it is the easiest place to work around this limitation given that the CX4 is currently the only card that uses an XSL. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add support for interrupts on the Mellanox CX4Ian Munsie8-0/+202
The Mellanox CX4 in cxl mode uses a hybrid interrupt model, where interrupts are routed from the networking hardware to the XSL using the MSIX table, and from there will be transformed back into an MSIX interrupt using the cxl style interrupts (i.e. using IVTE entries and ranges to map a PE and AFU interrupt number to an MSIX address). We want to hide the implementation details of cxl interrupts as much as possible. To this end, we use a special version of the MSI setup & teardown routines in the PHB while in cxl mode to allocate the cxl interrupts and configure the IVTE entries in the process element. This function does not configure the MSIX table - the CX4 card uses a custom format in that table and it would not be appropriate to fill that out in generic code. The rest of the functionality is similar to the "Full MSI-X mode" described in the CAIA, and this could be easily extended to support other adapters that use that mode in the future. The interrupts will be associated with the default context. If the maximum number of interrupts per context has been limited (e.g. by the mlx5 driver), it will automatically allocate additional kernel contexts to associate extra interrupts as required. These contexts will be started using the same WED that was used to start the default context. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-14cxl: Add preliminary workaround for CX4 interrupt limitationIan Munsie6-0/+53
The Mellanox CX4 has a hardware limitation where only 4 bits of the AFU interrupt number can be passed to the XSL when sending an interrupt, limiting it to only 15 interrupts per context (AFU interrupt number 0 is invalid). In order to overcome this, we will allocate additional contexts linked to the default context as extra address space for the extra interrupts - this will be implemented in the next patch. This patch adds the preliminary support to allow this, by way of adding a linked list in the context structure that we use to keep track of the contexts dedicated to interrupts, and an API to simultaneously iterate over the related context structures, AFU interrupt numbers and hardware interrupt numbers. The point of using a single API to iterate these is to hide some of the details of the iteration from external code, and to reduce the number of APIs that need to be exported via base.c to allow built in code to call. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>