aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-06-03powerpc/perf: Rearrange memory freeing in imc initAnju T Sudhakar2-18/+27
When any of the IMC (In-Memory Collection counter) devices fail to initialize, imc_common_mem_free() frees set of memory. In doing so, pmu_ptr pointer is also freed. But pmu_ptr pointer is used in subsequent function (imc_common_cpuhp_mem_free()) which is wrong. Patch here reorders the code to avoid such access. Also free the memory which is dynamically allocated during imc initialization, wherever required. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/xics: Add missing of_node_put() in error pathYueHaibing1-3/+4
The device node obtained with of_find_compatible_node() should be released by calling of_node_put(). But it was not released when of_get_property() failed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> [mpe: Invert the sense of the if so we only need one return path] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: cpm_gpio: Remove owner assignment from platform_driverFabio Estevam1-1/+0
Structure platform_driver does not need to set the owner field, as this will be populated by the driver core. Generated by scripts/coccinelle/api/platform_no_drv_owner.cocci. Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/xive: Remove (almost) unused macrosRussell Currey2-7/+1
The GETFIELD and SETFIELD macros in xive-regs.h aren't used except for a single instance of GETFIELD, so replace that and remove them. These macros are also defined in vas.h, so either those should be eventually replaced or the macros moved into bitops.h. Signed-off-by: Russell Currey <ruscur@russell.cc> [mpe: Rewrite the assignment to 'he' to avoid ffs() etc.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: remove unused to_tm() helperArnd Bergmann3-76/+0
to_tm() is now completely unused, the only reference being in the _dump_time() helper that is also unused. This removes both, leaving the rest of the powerpc RTC code y2038 safe to as far as the hardware supports. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: use time64_t in update_persistent_clockArnd Bergmann3-21/+9
update_persistent_clock() is deprecated because it suffers from overflow in 2038 on 32-bit architectures. This changes powerpc to use the update_persistent_clock64() replacement, and to pass down 64-bit timestamps consistently. This is now simpler, as we no longer have to worry about the offset numbers in tm_year and tm_mon that are different between the Linux conventions and RTAS. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: use time64_t in read_persistent_clockArnd Bergmann15-45/+31
Looking through the remaining users of the deprecated mktime() function, I found the powerpc rtc handlers, which use it in place of rtc_tm_to_time64(). To clean this up, I'm changing over the read_persistent_clock() function to the read_persistent_clock64() variant, and change all the platform specific handlers along with it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: rtas: clean up time handlingArnd Bergmann1-12/+12
The to_tm() helper function operates on a signed integer for the time, so it will suffer from overflow in 2038, even on 64-bit kernels. Rather than fix that function, this replaces its use in the rtas procfs implementation with the standard rtc_time64_to_tm() helper that is very similar but is not affected by the overflow. In order to actually support long times, the parser function gets changed to 64-bit user input and output as well. Note that the tm_mon and tm_year representation is slightly different, so we have to manually add an offset here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: always enable RTC_LIBArnd Bergmann1-0/+1
In order to use the rtc_tm_to_time64() and rtc_time64_to_tm() helper functions in later patches, we have to ensure that CONFIG_RTC_LIB is always built-in. Note that this symbol only controls a couple of helper functions, not the actual RTC subsystem, which remains optional and is enabled with CONFIG_RTC_CLASS. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/pasemi: Set PCI_SCAN_ALL_PCI_DEVSOlof Johansson1-0/+2
Needed on Amiga X1000 with SB600. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/mm/hash: hard disable irq in the SLB insert pathAneesh Kumar K.V1-0/+13
When inserting SLB entries for EA above 512TB, we need to hard disable irq. This will make sure we don't take a PMU interrupt that can possibly touch user space address via a stack dump. To prevent this, we need to hard disable the interrupt. Also add a comment explaining why we don't need context synchronizing isync with slbmte. Fixes: f384796c4 ("powerpc/mm: Add support for handling > 512TB address in SLB miss") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/mm/hugetlb: Update hugetlb related locksAneesh Kumar K.V2-15/+30
With split pmd page table lock enabled, we don't use mm->page_table_lock when updating pmd entries. This patch update hugetlb path to use the right lock when inserting huge page directory entries into page table. ex: if we are using hugepd and inserting hugepd entry at the pmd level, we use pmd_lockptr, which based on config can be split pmd lock. For update huge page directory entries itself we use mm->page_table_lock. We do have a helper huge_pte_lockptr() for that. Fixes: 675d99529 ("powerpc/book3s64: Enable split pmd ptlock") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/mm/hash: Add missing isync prior to kernel stack SLB switchAneesh Kumar K.V1-0/+1
Currently we do not have an isync, or any other context synchronizing instruction prior to the slbie/slbmte in _switch() that updates the SLB entry for the kernel stack. However that is not correct as outlined in the ISA. From Power ISA Version 3.0B, Book III, Chapter 11, page 1133: "Changing the contents of ... the contents of SLB entries ... can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. ... These side effects need not occur in program order, and therefore may require explicit synchronization by software. ... The synchronizing instruction before the context-altering instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alteration." And page 1136: "For data accesses, the context synchronizing instruction before the slbie, slbieg, slbia, slbmte, tlbie, or tlbiel instruction ensures that all preceding instructions that access data storage have completed to a point at which they have reported all exceptions they will cause." We're not aware of any bugs caused by this, but it should be fixed regardless. Add the missing isync when updating kernel stack SLB entry. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Flesh out change log with more ISA text & explanation] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s: Fix compiler store ordering to SLB shadow areaNicholas Piggin1-4/+4
The stores to update the SLB shadow area must be made as they appear in the C code, so that the hypervisor does not see an entry with mismatched vsid and esid. Use WRITE_ONCE for this. GCC has been observed to elide the first store to esid in the update, which means that if the hypervisor interrupts the guest after storing to vsid, it could see an entry with old esid and new vsid, which may possibly result in memory corruption. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumaskNicholas Piggin2-27/+134
When a single-threaded process has a non-local mm_cpumask, try to use that point to flush the TLBs out of other CPUs in the cpumask. An IPI is used for clearing remote CPUs for a few reasons: - An IPI can end lazy TLB use of the mm, which is required to prevent TLB entries being created on the remote CPU. The alternative is to drop lazy TLB switching completely, which costs 7.5% in a context switch ping-pong test betwee a process and kernel idle thread. - An IPI can have remote CPUs flush the entire PID, but the local CPU can flush a specific VA. tlbie would require over-flushing of the local CPU (where the process is running). - A single threaded process that is migrated to a different CPU is likely to have a relatively small mm_cpumask, so IPI is reasonable. No other thread can concurrently switch to this mm, because it must have been given a reference to mm_users by the current thread before it can use_mm. mm_users can be asynchronously incremented (by mm_activate or mmget_not_zero), but those users must use remote mm access and can't use_mm or access user address space. Existing code makes the this assumption already, for example sparc64 has reset mm_cpumask using this condition since the start of history, see arch/sparc/kernel/smp_64.c. This reduces tlbies for a kernel compile workload from 0.90M to 0.12M, tlbiels are increased significantly due to the PID flushing for the cleaning up remote CPUs, and increased local flushes (PID flushes take 128 tlbiels vs 1 tlbie). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s/radix: optimise pte_updateNicholas Piggin3-15/+27
Implementing pte_update with pte_xchg (which uses cmpxchg) is inefficient. A single larx/stcx. works fine, no need for the less efficient cmpxchg sequence. Then remove the memory barriers from the operation. There is a requirement for TLB flushing to load mm_cpumask after the store that reduces pte permissions, which is moved into the TLB flush code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flagsNicholas Piggin3-2/+32
The ISA suggests ptesync after setting a pte, to prevent a table walk initiated by a subsequent access from missing that store and causing a spurious fault. This is an architectual allowance that allows an implementation's page table walker to be incoherent with the store queue. However there is no correctness problem in taking a spurious fault in userspace -- the kernel copes with these at any time, so the updated pte will be found eventually. Spurious kernel faults on vmap memory must be avoided, so a ptesync is put into flush_cache_vmap. On POWER9 so far I have not found a measurable window where this can result in more minor faults, so as an optimisation, remove the costly ptesync from pte updates. If an implementation benefits from ptesync, it would be better to add it back in update_mmu_cache, so it's not done for things like fork(2). fork --fork --exec benchmark improved 5.2% (12400->13100). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s/radix: prefetch user address in update_mmu_cacheNicholas Piggin2-2/+5
Prefetch the faulting address in update_mmu_cache to give the page table walker perhaps 100 cycles head start as locks are dropped and the interrupt completed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full caseNicholas Piggin1-8/+2
This matches other architectures, when we know there will be no further accesses to the address (e.g., for teardown), page table entries can be cleared non-atomically. The comments about NMMU are bogus: all MMU notifiers (including NMMU) are released at this point, with their TLBs flushed. An NMMU access at this point would be a bug. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s/radix: do not flush TLB on spurious faultNicholas Piggin1-1/+11
In the case of a spurious fault (which can happen due to a race with another thread that changes the page table), the default Linux mm code calls flush_tlb_page for that address. This is not required because the pte will be re-fetched. Hash does not wire this up to a hardware TLB flush for this reason. This patch avoids the flush for radix. >From Power ISA v3.0B, p.1090: Setting a Reference or Change Bit or Upgrading Access Authority (PTE Subject to Atomic Hardware Updates) If the only change being made to a valid PTE that is subject to atomic hardware updates is to set the Refer- ence or Change bit to 1 or to add access authorities, a simpler sequence suffices because the translation hardware will refetch the PTE if an access is attempted for which the only problems were reference and/or change bits needing to be set or insufficient access authority. The nest MMU on POWER9 does not re-fetch the PTE after such an access attempt before faulting, so address spaces with a coprocessor attached will continue to flush in these cases. This reduces tlbies for a kernel compile workload from 0.95M to 0.90M. fork --fork --exec benchmark improved 0.5% (12300->12400). Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s/radix: do not flush TLB when relaxing accessNicholas Piggin1-1/+6
Radix flushes the TLB when updating ptes to increase permissiveness of protection (increase access authority). Book3S does not require TLB flushing in this case, and it is not done on hash. This patch avoids the flush for radix. >From Power ISA v3.0B, p.1090: Setting a Reference or Change Bit or Upgrading Access Authority (PTE Subject to Atomic Hardware Updates) If the only change being made to a valid PTE that is subject to atomic hardware updates is to set the Reference or Change bit to 1 or to add access authorities, a simpler sequence suffices because the translation hardware will refetch the PTE if an access is attempted for which the only problems were reference and/or change bits needing to be set or insufficient access authority. The nest MMU on POWER9 does not re-fetch the PTE after such an access attempt before faulting, so address spaces with a coprocessor attached will continue to flush in these cases. This reduces tlbies for a kernel compile workload from 1.28M to 0.95M, tlbiels from 20.17M 19.68M. fork --fork --exec benchmark improved 2.77% (12000->12300). Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/mm/radix: Change pte relax sequence to handle nest MMU hangAneesh Kumar K.V7-7/+17
When relaxing access (read -> read_write update), pte needs to be marked invalid to handle a nest MMU bug. We also need to do a tlb flush after the pte is marked invalid before updating the pte with new access bits. We also move tlb flush to platform specific __ptep_set_access_flags. This will help us to gerid of unnecessary tlb flush on BOOK3S 64 later. We don't do that in this patch. This also helps in avoiding multiple tlbies with coprocessor attached. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/mm: Change function prototypeAneesh Kumar K.V8-19/+41
In later patch, we use the vma and psize to do tlb flush. Do the prototype update in separate patch to make the review easy. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/mm/radix: Move function from radix.h to pgtable-radix.cAneesh Kumar K.V2-28/+25
In later patch we will update them which require them to be moved to pgtable-radix.c. Keeping the function in radix.h results in compile warning as below. ./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’: ./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’ struct mm_struct *mm = vma->vm_mm; ^~ ./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration] atomic_read(&mm->context.copros) > 0) { ^~~~~~~~~~~ __atomic_load ./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’ atomic_read(&mm->context.copros) > 0) { Instead of fixing header dependencies, we move the function to pgtable-radix.c Also the function is now large to be a static inline . Doing the move in separate patch helps in review. No functional change in this patch. Only code movement. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/mm/hugetlb: Update huge_ptep_set_access_flags to call __ptep_set_access_flags directlyAneesh Kumar K.V2-18/+34
In a later patch, we want to update __ptep_set_access_flags take page size arg. This makes ptep_set_access_flags only work with mmu_virtual_psize. To simplify the code make huge_ptep_set_access_flags directly call __ptep_set_access_flags so that we can compute the hugetlb page size in hugetlb function. Now that ptep_set_access_flags won't be called for hugetlb remove the is_vm_hugetlb_page() check and add the assert of pte lock unconditionally. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's actionAlastair D'Silva2-3/+3
The function removes the process element from NPU cache. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: use task_pid_nr() for TID allocationAlastair D'Silva2-95/+28
The current implementation of TID allocation, using a global IDR, may result in an errant process starving the system of available TIDs. Instead, use task_pid_nr(), as mentioned by the original author. The scenario described which prevented it's use is not applicable, as set_thread_tidr can only be called after the task struct has been populated. In the unlikely event that 2 threads share the TID and are waiting, all potential outcomes have been determined safe. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: Use TIDR CPU feature to control TIDR allocationAlastair D'Silva1-3/+3
Switch the use of TIDR on it's CPU feature, rather than assuming it is available based on architecture. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: Add TIDR CPU feature for POWER9Alastair D'Silva2-1/+3
This patch adds a CPU feature bit to show whether the CPU has the TIDR register available, enabling as_notify/wait in userspace. Signed-off-by: Alastair D'Silva <alastair@d-silva.org> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/powernv: process all OPAL event interrupts with kopaldNicholas Piggin3-61/+52
Using irq_work for processing OPAL event interrupts is not necessary. irq_work is typically used to schedule work from NMI context, a softirq may be more appropriate. However OPAL events are not particularly performance or latency critical, so they can all be invoked by kopald. This patch removes the irq_work queueing, and instead wakes up kopald when there is an event to be processed. kopald processes interrupts individually, enabling irqs and calling cond_resched between each one to minimise latencies. Event handlers themselves should still use threaded handlers, workqueues, etc. as necessary to avoid high interrupts-off latencies within any single interrupt. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESETNicholas Piggin4-1/+25
Although it is often possible to recover a CPU that was interrupted from OPAL with a system reset NMI, it's undesirable to interrupt them for a few reasons. Firstly because dump/debug code itself needs to call firmware, so it could hang on a lock or possibly corrupt a per-cpu data structure if it or another CPU was interrupted from OPAL. Secondly, the kexec crash dump code will not return from interrupt to unwind the OPAL call. Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to another CPU, which wait for it to leave firmware (or time out) to avoid this problem in normal conditions. Firmware bugs may still result in a timeout and interrupting OPAL, but that is the best option (stops the CPU, and possibly allows firmware to be debugged). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64: change softe to irqmask in show_regs and xmonNicholas Piggin2-2/+2
When the soft enabled flag was changed to a soft disable mask, xmon and register dump code was not updated to reflect that, which is confusing ('SOFTE: 1' previously meant interrupts were soft enabled, currently it means the opposite, the general interrupt type has been disabled). Fix this by using the name irqmask, and printing it in hex. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/pmu/fsl: fix is_nmi test for irq mask changeNicholas Piggin1-1/+1
When soft enabled was changed to irq disabled mask, this test missed being converted (although the equivalent book3s test was converted). The PMU drivers consider it an NMI when they take a PMI while general interrupts are disabled. This change restores that behaviour. Fixes: 01417c6cc7 ("powerpc/64: Change soft_enabled from flag to bitmask") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/time: account broadcast timer event interrupts separatelyNicholas Piggin3-4/+8
These are not local timer interrupts but IPIs. It's good to be able to see how timer offloading is behaving, so split these out into their own category. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: move a stray NMI IPI case under NMI_IPI ifdefNicholas Piggin1-0/+2
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: move timer broadcast code under GENERIC_CLOCKEVENTS_BROADCAST ifdefNicholas Piggin2-0/+10
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: allow soft-NMI watchdog to cover timer interrupts with large decrementersNicholas Piggin1-6/+13
Large decrementers (e.g., POWER9) can take a very long time to wrap, so when the timer iterrupt handler sets the decrementer to max so as to avoid taking another decrementer interrupt when hard enabling interrupts before running timers, it effectively disables the soft NMI coverage for timer interrupts. Fix this by using the traditional 31-bit value instead, which wraps after a few seconds. masked interrupt code does the same thing, and in normal operation neither of these paths would ever wrap even the 31 bit value. Note: the SMP watchdog should catch timer interrupt lockups, but it is preferable for the local soft-NMI to catch them, mainly to avoid the IPI. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: generic clockevents broadcast receiver call tick_receive_broadcastNicholas Piggin4-48/+42
The broadcast tick recipient can call tick_receive_broadcast rather than re-running the full timer interrupt. It does not have to check for the next event time, because the sender already determined the timer has expired. It does not have to test irq_work_pending, because that's a direct decrementer interrupt and does not go through the clock events subsystem. And it does not have to read PURR because that was removed with the previous patch. This results in no code size change, but both the decrementer and broadcast path lengths are reduced. Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/pseries: lparcfg calculate PURR on demandNicholas Piggin4-38/+10
For SPLPAR, lparcfg provides a sum of PURR registers for all CPUs. Currently this is done by reading PURR in context switch and timer interrupt, and storing that into a per-CPU variable. These are summed to provide the value. This does not work with all timer schemes (e.g., NO_HZ_FULL), and it is sub-optimal for performance because it reads the PURR register on every context switch, although that's been difficult to distinguish from noise in the contxt_switch microbenchmark. This patch implements the sum by calling a function on each CPU, to read and add PURR values of each CPU. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64: remove start_tb and accum_tb from thread_structNicholas Piggin2-9/+1
These fields are only written to. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64s: micro-optimise __hard_irq_enable() for mtmsrd L=1 supportNicholas Piggin1-2/+2
Book3S minimum supported ISA version now requires mtmsrd L=1. This instruction does not require bits other than RI and EE to be supplied, so __hard_irq_enable() and __hard_irq_disable() does not have to read the kernel_msr from paca. Interrupt entry code already relies on L=1 support. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/pseries: put cede MSR[EE] check under IRQ_SOFT_MASK_DEBUGNicholas Piggin1-4/+4
This check does not catch IRQ soft mask bugs, but this option is slightly more suitable than TRACE_IRQFLAGS. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64: irq_work avoid interrupt when called with hardware irqs enabledNicholas Piggin1-2/+31
irq_work_raise should not cause a decrementer exception unless it is called from NMI context. Doing so often just results in an immediate masked decrementer interrupt: <...>-550 90d... 4us : update_curr_rt <-dequeue_task_rt <...>-550 90d... 5us : dbs_update_util_handler <-update_curr_rt <...>-550 90d... 6us : arch_irq_work_raise <-irq_work_queue <...>-550 90d... 7us : soft_nmi_interrupt <-soft_nmi_common <...>-550 90d... 7us : printk_nmi_enter <-soft_nmi_interrupt <...>-550 90d.Z. 8us : rcu_nmi_enter <-soft_nmi_interrupt <...>-550 90d.Z. 9us : rcu_nmi_exit <-soft_nmi_interrupt <...>-550 90d... 9us : printk_nmi_exit <-soft_nmi_interrupt <...>-550 90d... 10us : cpuacct_charge <-update_curr_rt The soft_nmi_interrupt here is the call into the watchdog, due to the decrementer interrupt firing with irqs soft-disabled. This is harmless, but sub-optimal. When it's not called from NMI context or with interrupts enabled, mark the decrementer pending in the irq_happened mask directly, rather than having the masked decrementer interupt handler do it. This will be replayed at the next local_irq_enable. See the comment for details. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/powernv/ioda2: Remove redundant free of TCE pagesAlexey Kardashevskiy1-1/+0
When IODA2 creates a PE, it creates an IOMMU table with it_ops::free set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages(). Since iommu_tce_table_put() calls it_ops::free when the last reference to the table is released, explicit call to pnv_pci_ioda2_table_free_pages() is not needed so let's remove it. This should fix double free in the case of PCI hotuplug as pnv_pci_ioda2_table_free_pages() does not reset neither iommu_table::it_base nor ::it_size. This was not exposed by SRIOV as it uses different code path via pnv_pcibios_sriov_disable(). IODA1 does not inialize it_ops::free so it does not have this issue. Fixes: c5f7700bbd2e ("powerpc/powernv: Dynamically release PE") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/xmon: use match_string() helperYisheng Xie1-12/+11
match_string() returns the index of an array for a matching string, which can be used instead of open coded variant. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc: Fix build by disabling attribute-alias warning for SYSCALL_DEFINExChristophe Leroy6-0/+28
GCC 8.1 emits warnings such as the following. As arch/powerpc code is built with -Werror, this breaks the build with GCC 8.1. In file included from arch/powerpc/kernel/pci_64.c:23: ./include/linux/syscalls.h:233:18: error: 'sys_pciconfig_iobase' alias between functions of incompatible types 'long int(long int, long unsigned int, long unsigned int)' and 'long int(long int, long int, long int)' [-Werror=attribute-alias] asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ ^~~ ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) This patch inhibits those warnings. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Trim change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03powerpc/64: Fix strncpy() related build failures with GCC 8.1Christophe Leroy2-4/+4
GCC 8.1 warns about possible string truncation: arch/powerpc/kernel/nvram_64.c:1042:2: error: 'strncpy' specified bound 12 equals destination size [-Werror=stringop-truncation] strncpy(new_part->header.name, name, 12); arch/powerpc/platforms/ps3/repository.c:106:2: error: 'strncpy' output truncated before terminating nul copying 8 bytes from a string of the same length [-Werror=stringop-truncation] strncpy((char *)&n, text, 8); Fix it by using memcpy(). To make that safe we need to ensure the destination is pre-zeroed. Use kzalloc() in the nvram code and initialise the u64 to zero in the ps3 code. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Use kzalloc() in the nvram code, flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03Merge branch 'topic/pkey' into nextMichael Ellerman2-18/+0
This is a branch with a mixture of mm, x86 and powerpc commits all relating to some minor cross-arch pkeys consolidation. The x86/mm changes have been reviewed by Ingo & Dave Hansen and the tree has been in linux-next for some weeks without issue.
2018-06-03Merge branch 'fixes' into nextMichael Ellerman6-20/+48
We ended up with an ugly conflict between fixes and next in ftrace.h involving multiple nested ifdefs, and the automatic resolution is wrong. So merge fixes into next so we can fix it up.
2018-06-03Merge branch 'topic/kbuild' into nextMichael Ellerman4-27/+47
Merge in some commits we're sharing with the kbuild tree.