aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-10-06Merge tag 'powerpc-4.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds3-5/+15
Pull powerpc fixes from Michael Ellerman: "Nine small fixes, really nothing that stands out. A work-around for a spurious MCE on Power9. A CXL fault handling fix, some fixes to the new XIVE code, and a fix to the new 32-bit STRICT_KERNEL_RWX code. Fixes for old code/stable: an fix to an incorrect TLB flush on boot but not on any current machines, a compile error on 4xx and a fix to memory hotplug when using radix (Power9). Thanks to: Anton Blanchard, Cédric Le Goater, Christian Lamparter, Christophe Leroy, Christophe Lombard, Guenter Roeck, Jeremy Kerr, Michael Neuling, Nicholas Piggin" * tag 'powerpc-4.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Increase memory block size to 1GB on radix powerpc/mm: Call flush_tlb_kernel_range with interrupts enabled powerpc/xive: Clear XIVE internal structures when a CPU is removed powerpc/xive: Fix IPI reset powerpc/4xx: Fix compile error with 64K pages on 40x, 44x powerpc: Fix action argument for cpufeatures-based TLB flush cxl: Fix memory page not handled powerpc: Fix workaround for spurious MCE on POWER9 powerpc: Handle MCE on POWER9 with only DSISR bit 30 set
2017-10-06Merge branch 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds1-15/+15
Pull watchddog clean-up and fixes from Thomas Gleixner: "The watchdog (hard/softlockup detector) code is pretty much broken in its current state. The patch series addresses this by removing all duct tape and refactoring it into a workable state. The reasons why I ask for inclusion that late in the cycle are: 1) The code causes lockdep splats vs. hotplug locking which get reported over and over. Unfortunately there is no easy fix. 2) The risk of breakage is minimal because it's already broken 3) As 4.14 is a long term stable kernel, I prefer to have working watchdog code in that and the lockdep issues resolved. I wouldn't ask you to pull if 4.14 wouldn't be a LTS kernel or if the solution would be easy to backport. 4) The series was around before the merge window opened, but then got delayed due to the UP failure caused by the for_each_cpu() surprise which we discussed recently. Changes vs. V1: - Addressed your review points - Addressed the warning in the powerpc code which was discovered late - Changed two function names which made sense up to a certain point in the series. Now they match what they do in the end. - Fixed a 'unused variable' warning, which got not detected by the intel robot. I triggered it when trying all possible related config combinations manually. Randconfig testing seems not random enough. The changes have been tested by and reviewed by Don Zickus and tested and acked by Micheal Ellerman for powerpc" * 'core-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) watchdog/core: Put softlockup_threads_initialized under ifdef guard watchdog/core: Rename some softlockup_* functions powerpc/watchdog: Make use of watchdog_nmi_probe() watchdog/core, powerpc: Lock cpus across reconfiguration watchdog/core, powerpc: Replace watchdog_nmi_reconfigure() watchdog/hardlockup/perf: Fix spelling mistake: "permanetely" -> "permanently" watchdog/hardlockup/perf: Cure UP damage watchdog/hardlockup: Clean up hotplug locking mess watchdog/hardlockup/perf: Simplify deferred event destroy watchdog/hardlockup/perf: Use new perf CPU enable mechanism watchdog/hardlockup/perf: Implement CPU enable replacement watchdog/hardlockup/perf: Implement init time detection of perf watchdog/hardlockup/perf: Implement init time perf validation watchdog/core: Get rid of the racy update loop watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage watchdog/sysctl: Clean up sysctl variable name space watchdog/sysctl: Get rid of the #ifdeffery watchdog/core: Clean up header mess watchdog/core: Further simplify sysctl handling watchdog/core: Get rid of the thread teardown/setup dance ...
2017-10-04powerpc/watchdog: Make use of watchdog_nmi_probe()Thomas Gleixner1-9/+8
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu() on powerpc because it is called multiple times for the boot CPU. The first call is via: start_wd_on_cpu+0x80/0x2f0 watchdog_nmi_reconfigure+0x124/0x170 softlockup_reconfigure_threads+0x110/0x130 lockup_detector_init+0xbc/0xe0 kernel_init_freeable+0x18c/0x37c kernel_init+0x2c/0x160 ret_from_kernel_thread+0x5c/0xbc And then again via the CPU hotplug registration: start_wd_on_cpu+0x80/0x2f0 cpuhp_invoke_callback+0x194/0x620 cpuhp_thread_fun+0x7c/0x1b0 smpboot_thread_fn+0x290/0x2a0 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0xbc This can be avoided by setting up the cpu hotplug state with nocalls and move the initialization to the watchdog_nmi_probe() function. That initializes the hotplug callbacks without invoking the callback and the following core initialization function then configures the watchdog for the online CPUs (in this case CPU0) via softlockup_reconfigure_threads(). Reported-and-tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org
2017-10-04watchdog/core, powerpc: Lock cpus across reconfigurationThomas Gleixner1-4/+0
Instead of dropping the cpu hotplug lock after stopping NMI watchdog and threads and reaquiring for restart, the code and the protection rules become more obvious when holding cpu hotplug lock across the full reconfiguration. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710022105570.2114@nanos
2017-10-04watchdog/core, powerpc: Replace watchdog_nmi_reconfigure()Thomas Gleixner1-9/+14
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure() into two stages. One to stop the NMI and one to restart it after reconfiguration. That was done by adding a boolean 'run' argument to the code, which is functionally correct but not necessarily a piece of art. Replace it by two explicit functions: watchdog_nmi_stop() and watchdog_nmi_start(). Fixes: 6592ad2fcc8f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage") Requested-by: Linus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Thomas 'Mopping up garbage' Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
2017-10-03powerpc/4xx: Fix compile error with 64K pages on 40x, 44xChristian Lamparter1-3/+0
The mmu context on the 40x, 44x does not define pte_frag entry. This causes gcc abort the compilation due to: setup-common.c: In function ‘setup_arch’: setup-common.c:908: error: ‘mm_context_t’ has no ‘pte_frag’ This patch fixes the issue by removing the pte_frag initialization in setup-common.c. This is possible, because the compiler will do the initialization, since the mm_context is a sub struct of init_mm. init_mm is declared in mm_types.h as external linkage. According to C99 6.2.4.3: An object whose identifier is declared with external linkage [...] has static storage duration. C99 defines in 6.7.8.10 that: If an object that has static storage duration is not initialized explicitly, then: - if it has pointer type, it is initialized to a null pointer Fixes: b1923caa6e64 ("powerpc: Merge 32-bit and 64-bit setup_arch()") Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-03powerpc: Fix action argument for cpufeatures-based TLB flushJeremy Kerr1-2/+2
Commit 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9") introduced calls to __flush_tlb_power[89] from the cpufeatures code, specifying the number of sets to flush. However, these functions take an action argument, not a number of sets. This means we hit the BUG() in __flush_tlb_{206,300} when using cpufeatures-style configuration. This change passes TLB_INVAL_SCOPE_GLOBAL instead. Fixes: 41d0c2ecde19 ("powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-29powerpc: Fix workaround for spurious MCE on POWER9Michael Neuling1-2/+2
In the recent commit d8bd9f3f0925 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set") I screwed up the bit number. It should be bit 25 (IBM bit 38). Fixes: d8bd9f3f0925 ("powerpc: Handle MCE on POWER9 with only DSISR bit 30 set") Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-26powerpc: Handle MCE on POWER9 with only DSISR bit 30 setMichael Neuling1-0/+13
On POWER9 DD2.1 and below, it's possible for a paste instruction to cause a Machine Check Exception (MCE) where only DSISR bit 30 (IBM 33) is set. This will result in the MCE handler seeing an unknown event, which triggers linux to crash. We change this by detecting unknown events caused by load/stores in the MCE handler and marking them as handled so that we no longer crash. An MCE that occurs like this is spurious, so we don't need to do anything in terms of servicing it. If there is something that needs to be serviced, the CPU will raise the MCE again with the correct DSISR so that it can be serviced properly. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com Acked-by: Balbir Singh <bsingharora@gmail.com> [mpe: Expand comment with details from change log, use normal bit #s] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-21powerpc/eeh: Create PHB PEs after EEH is initializedBenjamin Herrenschmidt2-18/+4
Otherwise we end up not yet having computed the right diag data size on powernv where EEH initialization is delayed, thus causing memory corruption later on when calling OPAL. Fixes: 5cb1f8fdddb7 ("powerpc/powernv/pci: Dynamically allocate PHB diag data") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20powerpc/kprobes: Update optprobes to use emulate_update_regs()Naveen N. Rao1-1/+3
Optprobes depended on an updated regs->nip from analyse_instr() to identify the location to branch back from the optprobes trampoline. However, since commit 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs"), analyse_instr() doesn't update the registers anymore. Due to this, we end up branching back from the optprobes trampoline to the same branch into the trampoline resulting in a loop. Fix this by calling out to emulate_update_regs() before using the nip. Additionally, explicitly compare the return value from analyse_instr() to 1, rather than just checking for !0 so as to guard against any future changes to analyse_instr() that may result in -1 being returned in more scenarios. Fixes: 3cdfcbfd32b9d ("powerpc: Change analyse_instr so it doesn't modify *regs") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-20Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixesMichael Ellerman1-2/+10
Merge one commit from Scott which I missed while away.
2017-09-20powerpc/tm: Flush TM only if CPU has TM featureGustavo Romero1-1/+1
Commit cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") added code to access TM SPRs in flush_tmregs_to_thread(). However flush_tmregs_to_thread() does not check if TM feature is available on CPU before trying to access TM SPRs in order to copy live state to thread structures. flush_tmregs_to_thread() is indeed guarded by CONFIG_PPC_TRANSACTIONAL_MEM but it might be the case that kernel was compiled with CONFIG_PPC_TRANSACTIONAL_MEM enabled and ran on a CPU without TM feature available, thus rendering the execution of TM instructions that are treated by the CPU as illegal instructions. The fix is just to add proper checking in flush_tmregs_to_thread() if CPU has the TM feature before accessing any TM-specific resource, returning immediately if TM is no available on the CPU. Adding that checking in flush_tmregs_to_thread() instead of in places where it is called, like in vsr_get() and vsr_set(), is better because avoids the same problem cropping up elsewhere. Cc: stable@vger.kernel.org # v4.13+ Fixes: cd63f3c ("powerpc/tm: Fix saving of TM SPRs in core dump") Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-15Merge tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds1-1/+1
Pull powerpc fix from Michael Ellerman: "Just one fix, for the handling of alignment interrupts on dcbz instructions. Thanks to Paul Mackerras, Christian Zigotzky, Michal Sojka" * tag 'powerpc-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix handling of alignment interrupt on dcbz instruction
2017-09-15powerpc: Fix handling of alignment interrupt on dcbz instructionPaul Mackerras1-1/+1
This fixes the emulation of the dcbz instruction in the alignment interrupt handler. The error was that we were comparing just the instruction type field of op.type rather than the whole thing, and therefore the comparison "type != CACHEOP + DCBZ" was always true. Fixes: 31bfdb036f12 ("powerpc: Use instruction emulation infrastructure to handle alignment faults") Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Tested-by: Michal Sojka <sojkam1@fel.cvut.cz> Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-14watchdog/hardlockup: Clean up hotplug locking messThomas Gleixner1-0/+2
All watchdog thread related functions are delegated to the smpboot thread infrastructure, which handles serialization against CPU hotplug correctly. The sysctl interface is completely decoupled from anything which requires CPU hotplug protection. No need to protect the sysctl writes against cpu hotplug anymore. Remove it and add the now required protection to the powerpc arch_nmi_watchdog implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170912194148.418497420@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stageThomas Gleixner1-8/+9
Both the perf reconfiguration and the powerpc watchdog_nmi_reconfigure() need to be done in two steps. 1) Stop all NMIs 2) Read the new parameters and start NMIs Right now watchdog_nmi_reconfigure() is a combination of both. To allow a clean reconfiguration add a 'run' argument and split the functionality in powerpc. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20170912194147.862865570@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-14watchdog/core: Remove broken suspend/resume interfacesThomas Gleixner1-3/+0
This interface has several issues: - It's causing recursive locking of the hotplug lock. - It's complete overkill to teardown all threads and then recreate them The same can be achieved with the simple hardlockup_detector_perf_stop / restart() interfaces. The abuse from the busy looping poweroff() loop of PARISC has been solved as well. Remove the cruft. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Ulrich Obergfell <uobergfe@redhat.com> Link: http://lkml.kernel.org/r/20170912194146.487537732@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-13mm: treewide: remove GFP_TEMPORARY allocation flagMichal Hocko1-2/+2
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's primary motivation was to allow users to tell that an allocation is short lived and so the allocator can try to place such allocations close together and prevent long term fragmentation. As much as this sounds like a reasonable semantic it becomes much less clear when to use the highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the context holding that memory sleep? Can it take locks? It seems there is no good answer for those questions. The current implementation of GFP_TEMPORARY is basically GFP_KERNEL | __GFP_RECLAIMABLE which in itself is tricky because basically none of the existing caller provide a way to reclaim the allocated memory. So this is rather misleading and hard to evaluate for any benefits. I have checked some random users and none of them has added the flag with a specific justification. I suspect most of them just copied from other existing users and others just thought it might be a good idea to use without any measuring. This suggests that GFP_TEMPORARY just motivates for cargo cult usage without any reasoning. I believe that our gfp flags are quite complex already and especially those with highlevel semantic should be clearly defined to prevent from confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and replace all existing users to simply use GFP_KERNEL. Please note that SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and so they will be placed properly for memory fragmentation prevention. I can see reasons we might want some gfp flag to reflect shorterm allocations but I propose starting from a clear semantic definition and only then add users with proper justification. This was been brought up before LSF this year by Matthew [1] and it turned out that GFP_TEMPORARY really doesn't have a clear semantic. It seems to be a heuristic without any measured advantage for most (if not all) its current users. The follow up discussion has revealed that opinions on what might be temporary allocation differ a lot between developers. So rather than trying to tweak existing users into a semantic which they haven't expected I propose to simply remove the flag and start from scratch if we really need a semantic for short term allocations. [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org [akpm@linux-foundation.org: fix typo] [akpm@linux-foundation.org: coding-style fixes] [sfr@canb.auug.org.au: drm/i915: fix up] Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Neil Brown <neilb@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-11Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds1-11/+9
Pull namespace updates from Eric Biederman: "Life has been busy and I have not gotten half as much done this round as I would have liked. I delayed it so that a minor conflict resolution with the mips tree could spend a little time in linux-next before I sent this pull request. This includes two long delayed user namespace changes from Kirill Tkhai. It also includes a very useful change from Serge Hallyn that allows the security capability attribute to be used inside of user namespaces. The practical effect of this is people can now untar tarballs and install rpms in user namespaces. It had been suggested to generalize this and encode some of the namespace information information in the xattr name. Upon close inspection that makes the things that should be hard easy and the things that should be easy more expensive. Then there is my bugfix/cleanup for signal injection that removes the magic encoding of the siginfo union member from the kernel internal si_code. The mips folks reported the case where I had used FPE_FIXME me is impossible so I have remove FPE_FIXME from mips, while at the same time including a return statement in that case to keep gcc from complaining about unitialized variables. I almost finished the work to get make copy_siginfo_to_user a trivial copy to user. The code is available at: git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3 But I did not have time/energy to get the code posted and reviewed before the merge window opened. I was able to see that the security excuse for just copying fields that we know are initialized doesn't work in practice there are buggy initializations that don't initialize the proper fields in siginfo. So we still sometimes copy unitialized data to userspace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: Introduce v3 namespaced file capabilities mips/signal: In force_fcr31_sig return in the impossible case signal: Remove kernel interal si_code magic fcntl: Don't use ambiguous SIG_POLL si_codes prctl: Allow local CAP_SYS_ADMIN changing exe_file security: Use user_namespace::level to avoid redundant iterations in cap_capable() userns,pidns: Verify the userns for new pid namespaces signal/testing: Don't look for __SI_FAULT in userspace signal/mips: Document a conflict with SI_USER with SIGFPE signal/sparc: Document a conflict with SI_USER with SIGFPE signal/ia64: Document a conflict with SI_USER with SIGFPE signal/alpha: Document a conflict with SI_USER for SIGTRAP
2017-09-08treewide: make "nr_cpu_ids" unsignedAlexey Dobriyan2-2/+2
First, number of CPUs can't be negative number. Second, different signnnedness leads to suboptimal code in the following cases: 1) kmalloc(nr_cpu_ids * sizeof(X)); "int" has to be sign extended to size_t. 2) while (loff_t *pos < nr_cpu_ids) MOVSXD is 1 byte longed than the same MOV. Other cases exist as well. Basically compiler is told that nr_cpu_ids can't be negative which can't be deduced if it is "int". Code savings on allyesconfig kernel: -3KB add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370) function old new delta coretemp_cpu_online 450 512 +62 rcu_init_one 1234 1272 +38 pci_device_probe 374 399 +25 ... pgdat_reclaimable_pages 628 556 -72 select_fallback_rq 446 369 -77 task_numa_find_cpu 1923 1807 -116 Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-02powerpc/xive: add XIVE Exploitation Mode to CASCédric Le Goater1-1/+33
On POWER9, the Client Architecture Support (CAS) negotiation process determines whether the guest operates in XIVE Legacy compatibility or in XIVE exploitation mode. Now that we have initial guest support for the XIVE interrupt controller, let's inform the hypervisor what we can do. The platform advertises the XIVE Exploitation Mode support using the property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 : - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only - 0b10 XIVE legacy or exploitation mode The OS asks for XIVE Exploitation Mode support using the property "ibm,architecture-vec-5", byte 23 bits 0-1: - 0b00 XIVE legacy mode Only - 0b01 XIVE exploitation mode Only Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01powerpc/iommu: Use permission-specific DEVICE_ATTR variantsJulia Lawall1-2/+1
Use DEVICE_ATTR_RW for read-write attributes. This simplifies the source code, improves readbility, and reduces the chance of inconsistencies. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01powerpc/eeh: Delete an error out of memory message at init timeMarkus Elfring1-4/+1
Omit an extra message for a memory allocation failure in eeh_dev_init(). This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [mpe: Do not drop the message that can happen at runtime and lead to an event not being handled] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01powerpc/32: remove a NOP from memset()Christophe Leroy1-1/+6
memset() is patched after initialisation to activate the optimised part which uses cache instructions. Today we have a 'b 2f' to skip the optimised patch, which then gets replaced by a NOP, implying a useless cycle consumption. As we have a 'bne 2f' just before, we could use that instruction for the live patching, hence removing the need to have a dedicated 'b 2f' to be replaced by a NOP. This patch changes the 'bne 2f' by a 'b 2f'. During init, that 'b 2f' is then replaced by 'bne 2f' Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01powerpc: Use instruction emulation infrastructure to handle alignment faultsPaul Mackerras1-771/+32
This replaces almost all of the instruction emulation code in fix_alignment() with calls to analyse_instr(), emulate_loadstore() and emulate_dcbz(). The only emulation code left is the SPE emulation code; analyse_instr() etc. do not handle SPE instructions at present. One result of this is that we can now handle alignment faults on all the new VSX load and store instructions that were added in POWER9. VSX loads/stores will take alignment faults for unaligned accesses to cache-inhibited memory. Another effect is that we no longer rely on the DAR and DSISR values set by the processor. With this, we now need to include the instruction emulation code unconditionally. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc: Fix DAR reporting when alignment handler faultsMichael Ellerman1-45/+74
Anton noticed that if we fault part way through emulating an unaligned instruction, we don't update the DAR to reflect that. The DAR value is eventually reported back to userspace as the address in the SEGV signal, and if userspace is using that value to demand fault then it can be confused by us not setting the value correctly. This patch is ugly as hell, but is intended to be the minimal fix and back ports easily. Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
2017-08-31powerpc/smp: Add Power9 scheduler topologyOliver O'Halloran1-1/+49
In previous generations of Power processors each core had a private L2 cache. The Power 9 processor has a slightly different design where the L2 cache is shared among pairs of cores rather than being completely private. Making the scheduler aware of this cache sharing allows the scheduler to make better migration decisions. For example, if two CPU heavy tasks share a core then one task can be migrated to the paired core to improve throughput. Under the existing three level topology the task could be migrated to any core on the same chip, while with the new topology it would be preferentially migrated to the paired core so it remains cache-hot. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/smp: Add cpu_l2_cache_mapOliver O'Halloran1-6/+17
We want to add an extra level to the CPU scheduler topology to account for cores which share a cache. To do this we need to build a cpumask for each CPU that indicates which CPUs share this cache to use as an input to the scheduler. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/smp: Rework CPU topology constructionOliver O'Halloran1-66/+86
The CPU scheduler topology is constructed from a number of per-cpu cpumasks which describe which sets of logical CPUs are related in some fashion. Current code that handles constructing these masks when CPUs are hot(un)plugged can be simplified a bit by exploiting the fact that the scheduler requires higher levels of the toplogy (e.g package level groupings) to be supersets of the lower levels (e.g. threas in a core). This patch reworks the cpumask construction to be simpler and easier to extend with extra topology levels. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Fix CONFIG_HOTPLUG_CPU=n build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/smp: Use cpu_to_chip_id() to find core siblingsOliver O'Halloran1-26/+11
When building the CPU scheduler topology the kernel uses the ibm,chipid property from the devicetree to group logical CPUs. Currently the DT search for this property is open-coded in smp.c and this functionality is a duplication of what's in cpu_to_chip_id() already. This patch removes the existing search in favor of that. It's worth mentioning that the semantics of the search are different in cpu_to_chip_id(). When there is no ibm,chipid in the CPUs node it will also search /cpus and / for the property, but this should not effect the output topology. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/asm: Convert .llong directives to .8byteTobin C. Harding4-15/+15
.llong is an undocumented PPC specific directive. The generic equivalent is .quad, but even better (because it's self describing) is .8byte. Convert all .llong directives to .8byte. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc: Squash lines for simple wrapper functionsMasahiro Yamada1-30/+12
Remove unneeded variables and assignments. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/kernel: Change retrieval of pci_dnBryant G. Ly1-29/+4
For a PCI device it's pci_dn can be retrieved from pdev->dev.archdata.firmware_data, PCI_DN(devnode), or parent's list. Thus, we should just use the existing function pci_get_pdn_by_devfn to get the pci_dn. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Reviewed-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/pci: Remove OF node back pointer from pci_dnAlexey Kardashevskiy1-1/+0
The check_req() helper uses pci_get_pdn() to get an OF node pointer. pci_get_pdn() returns a pci_dn pointer which either: 1) from the OF node returned by pci_device_to_OF_node(); 2) from the parent child_list where entries don't have OF node pointers. Since check_req() does not care about 2), it can call pci_device_to_OF_node() directly, hence the change. The find_pe_dn() helper uses embedded pci_dn to get an OF node which is also stored in edev->pdev so let's take a shortcut and call pci_device_to_OF_node() directly. With these 2 changes, we can finally get rid of the OF node back pointer. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/eeh: Reduce use of pci_dn::nodeAlexey Kardashevskiy2-6/+10
The pci_dn struct caches a OF device node pointer in order to access the "ibm,loc-code" property when EEH is recovering. However, when this happens in eeh_dev_check_failure(), we also have a pci_dev pointer which should have a valid pointer to the device node when pci_dn has one (both pointers are not NULL for physical functions and are NULL for virtual functions). This changes pci_remove_device_node_info() to look for a parent of the node being removed, just like pci_add_device_node_info() does when it references the parent node. This is the first step to get rid of pci_dn::node. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/eeh: Remove unnecessary config_addr from eeh_devAlexey Kardashevskiy2-24/+21
The eeh_dev struct hold a config space address of an associated node and the very same address is also stored in the pci_dn struct which is always present during the eeh_dev lifetime. This uses bus:devfn directly from pci_dn instead of cached and packed config_addr. Since config_addr is made from device's bus:dev.fn, there is no point in keeping it in the debugfs either so remove that too. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/eeh: Remove unnecessary pointer to phb from eeh_devAlexey Kardashevskiy4-18/+17
The eeh_dev struct already holds a pointer to pci_dn which it does not exist without and pci_dn itself holds the very same pointer so just use it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/eeh: Reduce to one the number of places where edev is allocatedAlexey Kardashevskiy1-11/+21
arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev is allocated; other 2 places allocate it on stack and in the heap for a very short period of time to use eeh_pe_get() as takes edev. This changes eeh_pe_get() to receive required parameters explicitly. This removes unnecessary temporary allocation of edev. This uses the "pe_no" name instead of the "pe_config_addr" name as it actually is a PE number and not a config space address as it seemed. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/pci: Remove unused parameter from add_one_dev_pci_data()Alexey Kardashevskiy1-12/+2
pdev is always NULL, remove it. To make checkpatch.pl happy, this also removes the "out of memory" message. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc: Machine check interrupt is a non-maskable interruptNicholas Piggin1-3/+6
Use nmi_enter similarly to system reset interrupts. This uses NMI printk NMI buffers and turns off various debugging facilities that helps avoid tripping on ourselves or other CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/powernv: Use kernel crash path for machine checksNicholas Piggin2-1/+30
There are quite a few machine check exceptions that can be caused by kernel bugs. To make debugging easier, use the kernel crash path in cases of synchronous machine checks that occur in kernel mode, if that would not result in the machine going straight to panic or crash dump. There is a downside here that die()ing the process in kernel mode can still leave the system unstable. panic_on_oops will always force the system to fail-stop, so systems where that behaviour is important will still do the right thing. As a test, when triggering an i-side 0111b error (ifetch from foreign address) in kernel mode process context on POWER9, the kernel currently dies quickly like this: Severe Machine check interrupt [Not recovered] NIP [ffff000000000000]: 0xffff000000000000 Initiator: CPU Error type: Real address [Instruction fetch (foreign)] [ 127.426651616,0] OPAL: Reboot requested due to Platform error. Effective[ 127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000 opal: Reboot type 1 not supported Kernel panic - not syncing: PowerNV Unrecovered Machine Check CPU: 56 PID: 4425 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #35 Call Trace: [ 128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to buggy/mising code in OPAL for this BMC Rebooting in 10 seconds.. Trying to free IRQ 496 from IRQ context! After this patch, the process is killed and the kernel continues with this message, which gives enough information to identify the offending branch (i.e., with CFAR): Severe Machine check interrupt [Not recovered] NIP [ffff000000000000]: 0xffff000000000000 Initiator: CPU Error type: Real address [Instruction fetch (foreign)] Effective address: ffff000000000000 Oops: Machine check, sig: 7 [#1] SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ... CPU: 22 PID: 4436 Comm: syscall Tainted: G M 4.12.0-rc1-13857-ga4700a261072-dirty #36 task: c000000932300000 task.stack: c000000932380000 NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000 REGS: c00000000fc8fd80 TRAP: 0200 Tainted: G M (4.12.0-rc1-13857-ga4700a261072-dirty) MSR: 90000000001c1003 <SF,HV,ME,RI,LE> CR: 24000484 XER: 20000000 CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1 GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000 GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8 GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030 GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918 NIP [ffff000000000000] 0xffff000000000000 LR [00000000217706a4] 0x217706a4 Call Trace: Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc: Do not send system reset request through the oops pathNicholas Piggin1-16/+31
A system reset is a request to crash / debug the system rather than necessarily caused by encountering a BUG. So there is no need to serialize all CPUs behind the die lock, adding taints to all subsequent traces beyond the first, breaking console locks, etc. The system reset is NMI context which has its own printk buffers to prevent output being interleaved. Then it's better to have all secondaries print out their debug as quickly as possible and the primary will flush out all printk buffers during panic(). So remove the 0x100 path from die, and move it into system_reset. Name the crash/dump reasons "System Reset". This gives "not tained" traces when crashing an untainted kernel. It also gives the panic reason as "System Reset" as opposed to "Fatal exception in interrupt" (or "die oops" for fadump). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc: Do not call ppc_md.panic in fadump panic notifierNicholas Piggin2-27/+22
If fadump is not registered, and no other crash or debug handlers are registered, the powerpc panic handler stops the guest before the generic panic code can push out debug information to the console. Currently, system reset injection causes the guest to silently stop. Stop calling ppc_md.panic in the panic notifier. crash_fadump already does rtas_os_term() to terminate the guest if fadump is registered. Remove ppc_md.panic. Move fadump panic notifier into fadump code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/64: Fix watchdog configuration regressionsNicholas Piggin2-7/+28
This fixes a couple more bits of fallout from the new hard lockup watchdog patch. It restores the required hw_nmi_get_sample_period() function for the perf watchdog, and removes some function declarations on 64e that are only defined for 64s. This fixes the 64e build when the hardlockup detector is enabled. It restores the default behaviour of disabling the perf watchdog, and also fixes disabling the 64s watchdog when running as a guest. Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/64s/radix: Do not allocate SLB shadow structuresNicholas Piggin1-2/+11
These are unused in radix mode. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31powerpc/64s/radix: Remove bolted-SLB address limit for per-cpu stacksNicholas Piggin1-3/+8
Radix MMU does not take SLB or TLB interrupts when accessing kernel linear address. Remove this restriction for radix mode. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29powerpc/64s: idle POWER9 can execute stop in virtual modeNicholas Piggin1-0/+9
The hardware can execute stop in any context, and KVM does not require real mode because siblings do not share MMU state. This saves a switch to real-mode when going idle. Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29powerpc/64s: Drop no longer used IDLE_STATE_ENTER_SEQNicholas Piggin1-4/+1
There are no longer any callers of IDLE_STATE_ENTER_SEQ, all callers use IDLE_STATE_ENTER_SEQ_NORET. So drop the former. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch, write change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29powerpc/64s: POWER9 can execute stop without a sync sequenceNicholas Piggin1-3/+4
We don't need to use IDLE_STATE_ENTER_SEQ_NORET on Power9. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out of larger patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>