path: root/arch
2022-05-19  Merge branch 'topic/ppc-kvm' into next  [Michael Ellerman]  (32 files changed, -1729/+921)
Merge our KVM topic branch.
2022-05-19  KVM: PPC: Book3S HV: Fix vcore_blocked tracepoint  [Fabiano Rosas]  (2 files changed, -8/+8)
We removed most of the vcore logic from the P9 path, but there is still a tracepoint that tries to dereference vc->runner.

Fixes: ecb6a7207f92 ("KVM: PPC: Book3S HV P9: Remove most of the vcore logic")
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220328215831.320409-1-farosas@linux.ibm.com
2022-05-19  KVM: PPC: Book3s: Remove real mode interrupt controller hcalls handlers  [Alexey Kardashevskiy]  (9 files changed, -785/+632)
Currently we have two sets of interrupt controller hypercall handlers, for real and virtual modes; this dates from POWER8 times, when switching the MMU on was considered an expensive operation. POWER9, however, does not have dependent threads and the MMU is enabled for handling hcalls, so the XIVE native and XICS-on-XIVE real mode handlers never execute on real P9 and later CPUs.

This untemplates the handlers, keeps only the real mode handlers for native XICS (up to POWER8), and removes the rest of the dead code. The changes in the functions are mechanical, except for a few missing empty lines added to make checkpatch.pl happy. The default list of implemented hcalls already contains the XICS hcalls, so there is no change there. This should not cause any behavioural change.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220509071150.181250-1-aik@ozlabs.ru
2022-05-19  KVM: PPC: Book3s: PR: Enable default TCE hypercalls  [Alexey Kardashevskiy]  (1 file changed, -0/+6)
When KVM_CAP_PPC_ENABLE_HCALL was introduced, H_GET_TCE and H_PUT_TCE were already implemented and enabled by default; however, H_GET_TCE was missed on PR KVM (probably because the handler was in the real mode code at the time). This enables H_GET_TCE by default. While at it, the checks are wrapped in #ifdef CONFIG_SPAPR_TCE_IOMMU, just like in HV KVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506073737.3823347-1-aik@ozlabs.ru
2022-05-19  KVM: PPC: Book3s: Retire H_PUT_TCE/etc real mode handlers  [Alexey Kardashevskiy]  (14 files changed, -801/+75)
LoPAPR defines a guest-visible IOMMU with hypercalls to use it, H_PUT_TCE and friends. It was implemented first on POWER7, where hypercalls would trap into KVM in real mode (with the MMU off). The problem with real mode is that some memory is not available and some API usage crashed the host, but enabling the MMU was an expensive operation.

The problems with the real mode handlers are:
1. occasionally these cannot complete the request, so the code is copied+modified to work in virtual mode and very little is shared;
2. the real mode handlers have to be linked into vmlinux to work;
3. an exception in real mode immediately reboots the machine.

If the small DMA window is used, the real mode handlers bring better performance. However, since POWER8 there has always been a bigger DMA window, which VMs use to map the entire VM memory to avoid calling H_PUT_TCE. Such a 1:1 mapping happens once and uses H_PUT_TCE_INDIRECT (a bulk version of H_PUT_TCE), whose virtual mode handler is even closer to its real mode version. On POWER9 hypercalls trap straight to virtual mode, so the real mode handlers never execute on POWER9 and later CPUs.

So, with the current use of the DMA windows and the MMU improvements in POWER9 and later, there is no point in duplicating the code. 32bit passed-through devices may slow down, but we do not have many of these in practice; for example, with this applied, a 1Gbit ethernet adapter still demonstrates above 800Mbit/s of actual throughput.

This removes the real mode handlers from KVM and related code from the powernv platform, and updates the list of implemented hcalls in KVM-HV as the real mode handlers are removed. This changes the ABI: kvmppc_h_get_tce() moves to the KVM module and kvmppc_find_table() is now static.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220506053755.3820702-1-aik@ozlabs.ru
2022-05-19  Merge branch 'fixes' into topic/ppc-kvm  [Michael Ellerman]  (12 files changed, -95/+123)
Merge our fixes branch. In particular this brings in the KVM TCE handling fix, which is a prerequisite for a subsequent patch.
2022-05-19  KVM: PPC: Book3S HV: Initialize AMOR in nested entry  [Fabiano Rosas]  (1 file changed, -0/+1)
The hypervisor always sets AMOR to ~0, but let's ensure we're not passing stale values around.

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220425142151.1495142-1-farosas@linux.ibm.com
2022-05-19  Merge branch 'fixes' into next  [Michael Ellerman]  (6 files changed, -29/+57)
Merge our fixes branch from this cycle. In particular this brings in a papr_scm.c change which a subsequent patch has a dependency on.
2022-05-18  KVM: PPC: Book3S HV: Use consistent type for return value of kvm_age_rmapp()  [Bo Liu]  (1 file changed, -3/+3)
The function kvm_age_rmapp() is declared with a "bool" return type, but its implementation is defined to return "int". Change the implementation's return type to "bool" so the two match.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220401065252.36472-1-liubo03@inspur.com
2022-05-18  KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator  [Xiaomeng Tong]  (1 file changed, -3/+5)
The bug is here:

  if (!p)
          return ret;

The list iterator value 'p' will *always* be set and non-NULL by list_for_each_entry(), so it is incorrect to assume that the iterator value will be NULL if the list is empty or no element is found. To fix the bug, use a new variable 'iter' as the list iterator, and use the old variable 'p' as a dedicated pointer to the found element.

Fixes: dfaa973ae960 ("KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs")
Cc: stable@vger.kernel.org # v5.9+
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220414062103.8153-1-xiam0nd.tong@gmail.com
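The same pattern in miniature (a sketch with illustrative names, not the actual H_SVM_INIT_DONE code):

  /*
   * Hypothetical reduction of the bug class described above. After
   * list_for_each_entry() terminates without a break, 'iter' points at
   * an address computed from the list head, never NULL, so only a
   * dedicated 'found' variable can be tested safely.
   */
  struct item *iter, *found = NULL;

  list_for_each_entry(iter, &head, list) {
          if (iter->gfn == gfn) {
                  found = iter;           /* record the match explicitly */
                  break;
          }
  }
  if (!found)
          return ret;                     /* correct: not 'if (!iter)' */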
2022-05-18  KVM: PPC: Book3S HV: remove extraneous asterisk from rm_host_ipi_action() comment  [Bagas Sanjaya]  (1 file changed, -1/+1)
kernel test robot reported a kernel-doc warning for rm_host_ipi_action():

  arch/powerpc/kvm/book3s_hv_rm_xics.c:887: warning: This comment starts with '/**', but isn't a kernel-doc comment.
   * Host Operations poked by RM KVM

Since the function is static, remove the extraneous (second) asterisk at the head of the function comment.

Fixes: 0c2a66062470cd ("KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/linux-doc/202204252334.Cd2IsiII-lkp@intel.com/
Link: https://lore.kernel.org/r/20220506070747.16309-1-bagasdotme@gmail.com
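For reference, the distinction is only in the opening token; a minimal sketch:

  /**
   * my_exported_func() - '/**' marks this as kernel-doc, so
   * scripts/kernel-doc parses it and warns if it is malformed.
   */

  /*
   * A plain comment with a single asterisk is ignored by kernel-doc,
   * which is appropriate for a static helper like rm_host_ipi_action().
   */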
2022-05-13  KVM: PPC: Book3S HV Nested: L2 LPCR should inherit L1 LPES setting  [Nicholas Piggin]  (2 files changed, -2/+5)
The L1 should not be able to adjust LPES mode for the L2. Setting LPES if the L0 needs it clear would cause external interrupts to be sent to the L2 and missed by the L0. Clearing LPES when it may be set, as typically happens with XIVE enabled, could cause a performance issue despite the guest having no native XIVE support, because it would cause mediated interrupts for the L2 to be taken in HV mode, which then have to be injected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-7-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV Nested: L2 must not run with L1 xive context  [Nicholas Piggin]  (2 files changed, -7/+21)
The PowerNV L0 currently pushes the OS xive context when running a vCPU, regardless of whether it is running a nested guest. The problem is that xive OS ring interrupts will be delivered while the L2 is running.

At the moment, by default, the L2 guest runs with LPCR[LPES]=0, which actually makes external interrupts go to the L0. That causes the L2 to exit and the interrupt to be taken or injected into the L1, so in some respects this behaves like an escalation. It's not clear whether this was deliberate; there's no comment about it, and the L1 is actually allowed to clear LPES in the L2, so it's confusing at best.

When the L2 is running, the L1 is essentially in a ceded state with respect to external interrupts (it can't respond to them directly and won't get scheduled again absent some additional event). So the natural way to solve this is, when the L0 handles a H_ENTER_NESTED hypercall to run the L2, to have it arm the escalation interrupt and not push the L1 context while running the L2.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-6-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV P9: Split !nested case out from guest entry  [Nicholas Piggin]  (1 file changed, -4/+13)
The differences between nested and !nested will become larger in later changes, so split them out for readability.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-5-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV P9: Move cede logic out of XIVE escalation rearming  [Nicholas Piggin]  (3 files changed, -7/+16)
Move the cede abort logic out of xive escalation rearming and into the caller, to prepare for handling a similar case with nested guest entry.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-4-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV P9: Inject pending xive interrupts at guest entry  [Nicholas Piggin]  (1 file changed, -2/+7)
If there is a pending xive interrupt, inject it at guest entry (if MSR[EE] is enabled) rather than taking another interrupt once the guest is entered. If xive is enabled then LPCR[LPES] is set, so this behaviour should be expected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220303053315.1056880-3-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV: Remove KVMPPC_NR_LPIDS  [Nicholas Piggin]  (2 files changed, -6/+0)
KVMPPC_NR_LPIDS no longer represents any size restriction on the LPID space and can be removed. A CPU with more than 12 LPID bits implemented will now be able to create more than 4095 guests.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-7-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S Nested: Use explicit 4096 LPID maximum  [Nicholas Piggin]  (3 files changed, -15/+18)
Rather than tie this to KVMPPC_NR_LPIDS, which is becoming more dynamic, fix it to 4096 (12 bits) explicitly for now. kvmhv_get_nested() does not have to check against KVM_MAX_NESTED_GUESTS because the L1 partition table registration hcall already did that, and it checks against the partition table size. This patch also puts all the partition table size calculations into the same form, using 12 for the architected size field shift and 4 for the shift corresponding to the partition table entry size.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-6-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV Nested: Change nested guest lookup to use idr  [Nicholas Piggin]  (2 files changed, -54/+59)
This removes the fixed-size kvm->arch.nested_guests array.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-5-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV: Use IDA allocator for LPID allocator  [Nicholas Piggin]  (1 file changed, -12/+13)
This removes the fixed-size lpid_inuse array.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-4-npiggin@gmail.com
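A sketch of what an IDA-backed LPID allocator can look like (hypothetical and simplified; names are illustrative, not the exact kernel code):

  #include <linux/idr.h>

  static DEFINE_IDA(lpid_ida);

  /* LPID 0 belongs to the host, so hand out guest LPIDs from 1 up */
  long alloc_lpid(unsigned int nr_lpids)
  {
          int lpid = ida_alloc_range(&lpid_ida, 1, nr_lpids - 1, GFP_KERNEL);

          return lpid < 0 ? -ENOMEM : lpid;
  }

  void free_lpid(long lpid)
  {
          ida_free(&lpid_ida, lpid);
  }

Unlike a bitmap sized at compile time, the IDA grows on demand, which is what lets the fixed array go away.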
2022-05-13  KVM: PPC: Book3S HV: Update LPID allocator init for POWER9, Nested  [Nicholas Piggin]  (5 files changed, -11/+33)
The LPID allocator init is changed to:

- use mmu_lpid_bits rather than hard-coding;
- use KVM_MAX_NESTED_GUESTS for nested hypervisors;
- not reserve the top LPID on POWER9 and newer CPUs.

The reserved LPID is made a POWER7/8-specific detail.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-3-npiggin@gmail.com
2022-05-13  KVM: PPC: Remove kvmppc_claim_lpid  [Nicholas Piggin]  (4 files changed, -16/+7)
Removing kvmppc_claim_lpid makes the lpid allocator API a bit simpler, which will make it easier to change the underlying implementation in a future patch. The host LPID is always 0, so that can be a detail of the allocator. If the allocator range is restricted, the allocator can reserve LPIDs at the top of the range itself; this is what allows kvmppc_claim_lpid to be removed.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123120043.3586018-2-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV P9: Optimise loads around context switch  [Nicholas Piggin]  (1 file changed, -4/+11)
It is better to get all loads for the register values in flight before starting to switch LPID, PID, and LPCR, because those mtSPRs are expensive and serialising. This also just tidies up the code for a potential future change to the context switching sequence.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220123114725.3549202-1-npiggin@gmail.com
2022-05-13  KVM: PPC: Book3S HV: HFSCR[PREFIX] does not exist  [Nicholas Piggin]  (2 files changed, -2/+1)
This facility is controlled by FSCR only. Reserved bits should not be set in the HFSCR register (although it's likely harmless, as this position would not be re-used, and the L0 is forgiving here too).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220122105639.3477407-1-npiggin@gmail.com
2022-05-11  powerpc/rtas: Keep MSR[RI] set when calling RTAS  [Laurent Dufour]  (2 files changed, -12/+21)
RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32-bit big endian mode (MSR[SF,LE] unset). The change of MSR is done in enter_rtas() in a relatively complex way, even though the MSR value could simply be hardcoded.

Furthermore, a panic has been reported when hitting the watchdog interrupt while running in RTAS; it leads to the following stack trace:

  watchdog: CPU 24 Hard LOCKUP
  watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago)
  ...
  Supported: No, Unreleased kernel
  CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
  NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
  REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default)
  MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020
  CFAR: 000000000000011c IRQMASK: 1
  GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
  GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
  GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
  GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
  GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
  GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
  GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
  NIP [000000001fb41050] 0x1fb41050
  LR [000000001fb4104c] 0x1fb4104c
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  Oops: Unrecoverable System Reset, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  ...
  Supported: No, Unreleased kernel
  CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
  NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
  REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default)
  MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020
  CFAR: 000000000000011c IRQMASK: 1
  GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
  GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
  GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
  GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
  GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
  GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
  GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
  NIP [000000001fb41050] 0x1fb41050
  LR [000000001fb4104c] 0x1fb4104c
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  ---[ end trace 3ddec07f638c34a2 ]---

This happens because MSR[RI] is unset when entering RTAS, but there is no valid reason not to set it here. RTAS is expected to be called with MSR[RI] set, as specified in PAPR+ section "7.2.1 Machine State":

  R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect its own critical regions from recursion by setting the MSR[RI] bit to 0 when in the critical regions.

Fix this by reworking the way the MSR is computed before calling RTAS. Now a hardcoded value meaning real mode, 32-bit big endian mode and Recoverable Interrupt is loaded. If MSR[S] is set, it will remain set while entering RTAS, as only urfid can unset it (thanks Fabiano).

In addition, a check is added in do_enter_rtas() to detect calls made with MSR[RI] unset, as we are forcing it on later.

This patch has been tested on the following machines:
- Power KVM guest: P8 S822L (host Ubuntu kernel 5.11.0-49-generic)
- PowerVM LPAR: P8 9119-MME (FW860.A1)
- PowerVM LPAR: P9 9008-22L (FW950.00)
- PowerVM LPAR: P10 9080-HEX (FW1010.00)

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220504101244.12107-1-ldufour@linux.ibm.com
2022-05-11  powerpc/8xx: Use kmalloced data structure instead of global static  [Christophe Leroy]  (1 file changed, -18/+30)
Use a kmalloced data structure to store the interrupt controller's internal data instead of static global variables.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c8f0866ee013113d5e28948943cf0586e49f5353.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11  powerpc/8xx: Remove mpc8xx_pics_init()  [Christophe Leroy]  (9 files changed, -32/+15)
mpc8xx_pics_init() is now only a trampoline to mpc8xx_pic_init(). Remove mpc8xx_pics_init() and use mpc8xx_pic_init() directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9c55a698adb5ba3b7b77023170fcaf0acb5d2d81.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11  powerpc/8xx: Convert CPM1 interrupt controller to platform_device  [Christophe Leroy]  (3 files changed, -56/+50)
In the same logic as commit be7ecbd240b2 ("soc: fsl: qe: convert QE interrupt controller to platform_device"), convert the CPM1 interrupt controller to a platform_device.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fb80d0b2077312079c49da0296e25591578771cd.1649226186.git.christophe.leroy@csgroup.eu
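Such a conversion follows the usual platform driver shape; a sketch with made-up names and a hypothetical compatible string, not the actual CPM1 code:

  #include <linux/module.h>
  #include <linux/of.h>
  #include <linux/platform_device.h>

  static int cpm_pic_probe(struct platform_device *pdev)
  {
          /* map registers, create the irq domain, register handlers */
          return 0;
  }

  static const struct of_device_id cpm_pic_match[] = {
          { .compatible = "fsl,cpm1-pic" },       /* illustrative only */
          {}
  };

  static struct platform_driver cpm_pic_driver = {
          .driver = {
                  .name = "cpm-pic",
                  .of_match_table = cpm_pic_match,
          },
          .probe = cpm_pic_probe,
  };
  module_platform_driver(cpm_pic_driver);

The win is that probing, device-tree matching and resource handling come from the driver core instead of hand-rolled init code.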
2022-05-11  powerpc/8xx: Convert CPM1 error interrupt handler to platform driver  [Christophe Leroy]  (1 file changed, -29/+44)
Add the CPM error interrupt as a standalone platform driver, to simplify the init of the CPM interrupt handler.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/375a72df6e4a26c5959cc81a6c6d46152efa2306.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11  powerpc/8xx: Move CPM interrupt controller into a dedicated file  [Christophe Leroy]  (3 files changed, -140/+152)
The CPM interrupt controller is quite standalone. Move it into a dedicated file; this will help the next step, which changes it into a platform driver. This is a pure code move; the checkpatch report is ignored at this point, except for one parenthesis alignment fix that would otherwise remain at the end of the series. All other complaints go away with the following patches.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d3a7dc832d905bed14b35d83410cdb69a7ba20e8.1649226186.git.christophe.leroy@csgroup.eu
2022-05-11  powerpc/code-patching: Use jump_label to check if poking_init() is done  [Christophe Leroy]  (1 file changed, -1/+4)
It's only during early startup that poking_init() is not done yet, for instance when calling ftrace_init(). Once poking_init() has been called there must be a poking area, so there is no need to check for it every time patch_instruction() is called. ftrace activation time is reduced by 7% with this change on an 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8d6088aca7b63247377b6d9e4897d08d935fbe93.1647962456.git.christophe.leroy@csgroup.eu
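The jump_label pattern at work is roughly this (a sketch; helper names approximate, not quote, the real code-patching code):

  #include <linux/jump_label.h>

  static DEFINE_STATIC_KEY_FALSE(poking_init_done);

  void __init poking_init(void)
  {
          /* ... set up the text poking area ... */
          static_branch_enable(&poking_init_done);
  }

  int patch_instruction(u32 *addr, ppc_inst_t instr)
  {
          /* compiles to a patched nop/branch, not a load and compare */
          if (!static_branch_likely(&poking_init_done))
                  return raw_patch_instruction(addr, instr);

          return do_patch_instruction(addr, instr); /* uses the poking area */
  }

After poking_init() runs, the branch is patched once and the early-boot path costs nothing on the hot path.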
2022-05-11  powerpc/code-patching: Use jump_label for testing freed initmem  [Christophe Leroy]  (3 files changed, -1/+8)
Once init is done, initmem is freed forever, so there is no need to test system_state at every call to patch_instruction(). Use a jump_label instead. This reduces the time needed to activate ftrace on an 8xx by 2%.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0aee964721cab7316cffde21a2ca223cee14d373.1647962456.git.christophe.leroy@csgroup.eu
2022-05-11  KVM: PPC: Book3S PR: Enable MSR_DR for switch_mmu_context()  [Alexander Graf]  (1 file changed, -5/+21)
Commit 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C") moved switch_mmu_context() to C. While in principle a good idea, it meant that the function now uses the stack. The stack is not accessible from real mode though.

To keep calling the function, let's turn on MSR_DR while we call it. That way, all pointer references to the stack are handled virtually. In addition, make sure to save/restore r12 on the stack, as it may get clobbered by the C function.

Fixes: 863771a28e27 ("powerpc/32s: Convert switch_mmu_context() to C")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220510123717.24508-1-graf@amazon.com
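Conceptually, the fix does something like the following (a C-level sketch; the actual change is in the Book3S PR assembly):

  /*
   * Sketch: temporarily enable data relocation so a C function that
   * touches the (virtually mapped) stack can be called from real mode.
   */
  unsigned long msr = mfmsr();

  mtmsr(msr | MSR_DR);                    /* stack references now translate */
  switch_mmu_context(prev, next, tsk);
  mtmsr(msr);                             /* back to real mode */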
2022-05-08  powerpc/code-patching: Don't call is_vmalloc_or_module_addr() without CONFIG_MODULES  [Christophe Leroy]  (1 file changed, -1/+1)
If CONFIG_MODULES is not set, there is no point in checking whether text is in the module area. This reduces the time needed to activate/deactivate ftrace by more than 10% on an 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f3c701cce00a38620788c0fc43ff0b611a268c54.1647962456.git.christophe.leroy@csgroup.eu
2022-05-08  powerpc: align address to page boundary in change_page_attr()  [Christophe Leroy]  (1 file changed, -0/+1)
Aligning the address to a page boundary allows flush_tlb_kernel_range() to know it's a single-page flush and to use tlbie instead of tlbia. On 603, we now have the following code in the first leg of change_page_attr():

  2c: 55 29 00 3c  rlwinm r9,r9,0,0,30
  30: 91 23 00 00  stw r9,0(r3)
  34: 7c 00 22 64  tlbie r4,r0
  38: 7c 00 04 ac  hwsync
  3c: 38 60 00 00  li r3,0
  40: 4e 80 00 20  blr

Before, we had:

  28: 55 29 00 3c  rlwinm r9,r9,0,0,30
  2c: 91 23 00 00  stw r9,0(r3)
  30: 54 89 00 26  rlwinm r9,r4,0,0,19
  34: 38 84 10 00  addi r4,r4,4096
  38: 7c 89 20 50  subf r4,r9,r4
  3c: 28 04 10 00  cmplwi r4,4096
  40: 41 81 00 30  bgt 70 <change_page_attr+0x70>
  44: 7c 00 4a 64  tlbie r9,r0
  48: 7c 00 04 ac  hwsync
  4c: 38 60 00 00  li r3,0
  50: 4e 80 00 20  blr
  ...
  70: 94 21 ff f0  stwu r1,-16(r1)
  74: 7c 08 02 a6  mflr r0
  78: 90 01 00 14  stw r0,20(r1)
  7c: 48 00 00 01  bl 7c <change_page_attr+0x7c>
      7c: R_PPC_REL24 _tlbia
  80: 80 01 00 14  lwz r0,20(r1)
  84: 38 60 00 00  li r3,0
  88: 7c 08 03 a6  mtlr r0
  8c: 38 21 00 10  addi r1,r1,16
  90: 4e 80 00 20  blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6bb118fb2ee89fa3c1f9cf90ed19f88220002cb0.1647877467.git.christophe.leroy@csgroup.eu
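The change itself is essentially one masking step (a sketch, not the exact kernel code):

  /*
   * Aligning the start address lets flush_tlb_kernel_range() see an
   * exact one-page range and pick a single tlbie over a global tlbia.
   */
  unsigned long start = addr & PAGE_MASK;

  flush_tlb_kernel_range(start, start + PAGE_SIZE);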
2022-05-08  powerpc/8xx: Simplify flush_tlb_kernel_range()  [Christophe Leroy]  (2 files changed, -1/+13)
In the same spirit as commit 63f501e07a85 ("powerpc/8xx: Simplify TLB handling"), simplify flush_tlb_kernel_range() for the 8xx. The 8xx cannot be SMP, and has the 'tlbie' and 'tlbia' instructions, so an inline version of flush_tlb_kernel_range() for the 8xx is worth it. With this patch, the first leg of change_page_attr() is:

  2c: 55 29 00 3c  rlwinm r9,r9,0,0,30
  30: 91 23 00 00  stw r9,0(r3)
  34: 7c 00 22 64  tlbie r4,r0
  38: 7c 00 04 ac  hwsync
  3c: 38 60 00 00  li r3,0
  40: 4e 80 00 20  blr

Before the patch it was:

  30: 55 29 00 3c  rlwinm r9,r9,0,0,30
  34: 91 2a 00 00  stw r9,0(r10)
  38: 94 21 ff f0  stwu r1,-16(r1)
  3c: 7c 08 02 a6  mflr r0
  40: 38 83 10 00  addi r4,r3,4096
  44: 90 01 00 14  stw r0,20(r1)
  48: 48 00 00 01  bl 48 <change_page_attr+0x48>
      48: R_PPC_REL24 flush_tlb_kernel_range
  4c: 80 01 00 14  lwz r0,20(r1)
  50: 38 60 00 00  li r3,0
  54: 7c 08 03 a6  mtlr r0
  58: 38 21 00 10  addi r1,r1,16
  5c: 4e 80 00 20  blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d2610043419ce3e0e53a85386baf2c3625af5cfb.1647877442.git.christophe.leroy@csgroup.eu
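An inline version along these lines is what makes the single tlbie possible (a sketch under the 8xx assumptions stated above: non-SMP, with tlbie and tlbia available):

  static inline void flush_tlb_kernel_range(unsigned long start,
                                            unsigned long end)
  {
          if (end - start <= PAGE_SIZE)
                  /* single page: flush just that translation */
                  asm volatile ("tlbie %0; sync" : : "r" (start) : "memory");
          else
                  /* larger range: flush the whole TLB */
                  asm volatile ("sync; tlbia; isync" : : : "memory");
  }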
2022-05-08  powerpc: Use static call for get_irq()  [Christophe Leroy]  (1 file changed, -1/+7)
__do_irq() unconditionally calls ppc_md.get_irq(). That's definitely a hot path. At the moment, the address of ppc_md.get_irq is read from the ppc_md structure every time. Replace that call with a static call, and initialise the static call after ppc_md.init_IRQ() has set ppc_md.get_irq. Emit a warning and don't set the static call if ppc_md.get_irq is still NULL at that point; that way the kernel won't blow up if, for some reason, ppc_md.get_irq() doesn't get properly set.

With the patch:

  00000000 <__SCT__ppc_get_irq>:
   0: 48 00 00 20  b 20 <__static_call_return0>  <== replaced by 'b <ppc_md.get_irq>' at runtime
  ...
  00000020 <__static_call_return0>:
  20: 38 60 00 00  li r3,0
  24: 4e 80 00 20  blr
  ...
  00000058 <__do_irq>:
  ...
  64: 48 00 00 01  bl 64 <__do_irq+0xc>
      64: R_PPC_REL24 __SCT__ppc_get_irq
  68: 2c 03 00 00  cmpwi r3,0
  ...

Before the patch:

  00000038 <__do_irq>:
  ...
  3c: 3d 20 00 00  lis r9,0
      3e: R_PPC_ADDR16_HA ppc_md+0x1c
  ...
  44: 81 29 00 00  lwz r9,0(r9)
      46: R_PPC_ADDR16_LO ppc_md+0x1c
  ...
  4c: 7d 29 03 a6  mtctr r9
  50: 4e 80 04 21  bctrl
  54: 2c 03 00 00  cmpwi r3,0
  ...

On PPC64, which doesn't implement static calls yet, we get:

  00000000000000d0 <__do_irq>:
  ...
  dc: 00 00 22 3d  addis r9,r2,0
      dc: R_PPC64_TOC16_HA .data+0x8
  ...
  e4: 00 00 89 e9  ld r12,0(r9)
      e4: R_PPC64_TOC16_LO_DS .data+0x8
  ...
  f0: a6 03 89 7d  mtctr r12
  f4: 18 00 41 f8  std r2,24(r1)
  f8: 21 04 80 4e  bctrl
  fc: 18 00 41 e8  ld r2,24(r1)
  ...

So on PPC64 that's similar to what we get without static calls. But at least until ppc_md.get_irq is set, the call goes to __static_call_return0.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/afb92085f930651d8b1063e4d4bf0396c80ebc7d.1647002274.git.christophe.leroy@csgroup.eu
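The static_call pattern described above looks roughly like this in C (a sketch of the mechanism, not the exact patch):

  #include <linux/static_call.h>

  /* default target returns 0, like an unset handler would */
  DEFINE_STATIC_CALL_RET0(ppc_get_irq, *ppc_md.get_irq);

  void __init init_IRQ(void)
  {
          ppc_md.init_IRQ();
          if (ppc_md.get_irq)
                  static_call_update(ppc_get_irq, ppc_md.get_irq);
          else
                  WARN_ON(1);     /* keep the return-0 default target */
  }

  void __do_irq(struct pt_regs *regs)
  {
          unsigned int irq = static_call(ppc_get_irq)();
          /* ... dispatch irq ... */
  }

On 32-bit, static_call(ppc_get_irq)() compiles to a direct branch that is patched once at init, which is the __SCT__ppc_get_irq trampoline shown in the disassembly.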
2022-05-08  powerpc: Use rol32() instead of opencoding in csum_fold()  [Christophe Leroy]  (1 file changed, -8/+9)
rol32(x, 16) will do the rotate using rlwinm; there is no need to open-code it in inline assembly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/794337eff7bb803d2c4e67d9eee635390c4c48fe.1646812553.git.christophe.leroy@csgroup.eu
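The resulting helper is essentially the following (a sketch; rol32() comes from linux/bitops.h):

  static inline __sum16 csum_fold(__wsum sum)
  {
          u32 tmp = (__force u32)sum;

          /*
           * Swap the halves with rol32 and add: any carry out of the
           * low half lands in the high half, which is then inverted
           * and taken as the folded 16-bit result.
           */
          return (__force __sum16)(~(tmp + rol32(tmp, 16)) >> 16);
  }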
2022-05-08  powerpc: Add missing headers  [Christophe Leroy]  (146 files changed, -108/+195)
Don't inherit headers "by chance" from asm/prom.h, asm/mpc52xx.h, asm/pci.h, etc. Include the needed headers directly, and remove asm/prom.h where it was included exclusively to pull in other necessary headers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/be8bdc934d152a7d8ee8d1a840d5596e2f7d85e0.1646767214.git.christophe.leroy@csgroup.eu
2022-05-08  powerpc: Remove asm/prom.h from all files that don't need it  [Christophe Leroy]  (79 files changed, -80/+0)
Several files include asm/prom.h for no reason. Clean it up.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Drop change to prom_parse.c as reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7c9b8fda63dcf63e1b28f43e7ebdb95182cbc286.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06  powerpc/papr_scm: Fix buffer overflow issue with CONFIG_FORTIFY_SOURCE  [Kajol Jain]  (1 file changed, -5/+2)
With CONFIG_FORTIFY_SOURCE enabled, string functions also perform dynamic checks on string sizes, which can panic the kernel, for example when an overflow is detected. In papr_scm, the papr_scm_pmu_check_events() function uses stat->stat_id with string operations to populate the nvdimm_events_map array. Since the stat_id variable is not NUL-terminated, the kernel panics at boot time with CONFIG_FORTIFY_SOURCE enabled. Below are the logs of the kernel panic:

  detected buffer overflow in __fortify_strlen
  ------------[ cut here ]------------
  kernel BUG at lib/string_helpers.c:980!
  Oops: Exception in kernel mode, sig: 5 [#1]
  NIP [c00000000077dad0] fortify_panic+0x28/0x38
  LR [c00000000077dacc] fortify_panic+0x24/0x38
  Call Trace:
  [c0000022d77836e0] [c00000000077dacc] fortify_panic+0x24/0x38 (unreliable)
  [c00800000deb2660] papr_scm_pmu_check_events.constprop.0+0x118/0x220 [papr_scm]
  [c00800000deb2cb0] papr_scm_probe+0x288/0x62c [papr_scm]
  [c0000000009b46a8] platform_probe+0x98/0x150

Fix this issue by using kmemdup_nul() to copy the content of stat->stat_id directly into the nvdimm_events_map array.

mpe: stat->stat_id comes from the hypervisor, not userspace, so there is no security exposure.

Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220505153451.35503-1-kjain@linux.ibm.com
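The fix boils down to copying with an explicit length instead of trusting a terminator (a sketch, assuming stat_id is a fixed-size byte field supplied by the hypervisor):

  /* duplicate the fixed-size id with a forced trailing NUL */
  char *name = kmemdup_nul(stat->stat_id, sizeof(stat->stat_id),
                           GFP_KERNEL);
  if (!name)
          return -ENOMEM;
  nvdimm_events_map[i] = name;    /* now safe for str*() helpers */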
2022-05-06  powerpc: Add missing declaration in asm/drmem.h  [Christophe Leroy]  (1 file changed, -0/+3)
Don't rely on random inclusion of linux/of.h by users of asm/drmem.h. Add forward declarations of struct property and struct device_node instead.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5643ec410e51b749db0636471cb7979524f9ed0e.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06  powerpc: Include asm/reg.h in asm/svm.h  [Christophe Leroy]  (1 file changed, -0/+2)
is_secure_guest() uses mfmsr(). Don't rely on users to include asm/reg.h; include it in asm/svm.h directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/482c82c8a29d5fb3ea279b34f107e0e775001344.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06  powerpc: Don't include asm/prom.h in asm/parport.h  [Christophe Leroy]  (1 file changed, -1/+1)
parport.h needs only of_irq.h; there is no need to go via asm/prom.h.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ec796ee56cf61f16ba24e62a9d3525d11931538c.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06  powerpc/64: Move pci_device_from_OF_node() out of asm/pci-bridge.h  [Christophe Leroy]  (2 files changed, -12/+11)
Move pci_device_from_OF_node() into pci64.c because it needs the definition of struct device_node and is not worth inlining. PPC32 already has it in pci32.c. That way, pci-bridge.h doesn't need linux/of.h (brought in by asm/prom.h via asm/pci.h).

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3c88286b55413730d7784133993a46ef4a3607ce.1646767214.git.christophe.leroy@csgroup.eu
2022-05-06  powerpc: Reduce csum_add() complexity for PPC64  [Christophe Leroy]  (1 file changed, -5/+4)
PPC64 does everything in C, and gcc is able to skip the calculation when one of the operands is zero. Move the constant folding into the PPC32 part. This helps GCC and reduces ppc64_defconfig by 170 bytes.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a4ca63dd4c4b09e1906d08fb814af5a41d0f3fcb.1644651363.git.christophe.leroy@csgroup.eu
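A sketch of the resulting shape (PPC64 leg in plain C, constant folding kept on the PPC32 side; close to, but not necessarily exactly, the kernel code):

  static inline __wsum csum_add(__wsum csum, __wsum addend)
  {
  #ifdef __powerpc64__
          /* plain C: gcc folds the zero-operand cases by itself */
          u64 res = (__force u64)csum;

          res += (__force u64)addend;
          return (__force __wsum)((u32)res + (res >> 32));
  #else
          /* fold constants here; the asm below would hide them */
          if (__builtin_constant_p(csum) && csum == 0)
                  return addend;
          if (__builtin_constant_p(addend) && addend == 0)
                  return csum;

          asm("addc %0,%0,%1; addze %0,%0"
              : "+r" (csum) : "r" (addend) : "xer");
          return csum;
  #endif
  }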
2022-05-06  powerpc/64: remove system call instruction emulation  [Nicholas Piggin]  (2 files changed, -46/+10)
emulate_step() instruction emulation, including sc instruction emulation, initially appeared in xmon. It was then moved into sstep.c where kprobes could use it too, and later hw_breakpoint and uprobes started to use it. Until uprobes, the only instruction emulation users were for kernel mode instructions:

- xmon only steps / breaks on kernel addresses;
- kprobes is kernel only;
- hw_breakpoint only emulates kernel instructions and single steps user ones.

At one point, there was support for the kernel to execute sc instructions, although that is long removed and it's not clear whether there were any in-tree users. So system call emulation is not required by the above users.

uprobes uses emulate_step, and it appears possible to emulate the sc instruction in userspace. However, userspace system call emulation is broken and it's not clear it ever worked well. The big complication is that userspace takes an interrupt to the kernel to emulate the instruction. The user->kernel interrupt sets up registers and an interrupt stack frame expecting to return to userspace; system call instruction emulation then re-directs that stack frame to the kernel, early in the system call interrupt handler. This means the interrupt return code takes the kernel->kernel restore path, which does not restore everything the system call interrupt handler would expect coming from userspace. regs->iamr appears to get lost, for example, because the kernel->kernel return does not restore the user iamr. Accounting such as irqflags tracing and CPU accounting does not get flipped back to user mode as the system call handler expects, so those appear to enter the kernel twice without returning to userspace.

These things may be individually fixable with various complications, but it is a lot of complexity for unclear real benefit. Furthermore, it is not possible to single step a system call instruction anyway, since it causes an interrupt; as such, a separate patch disables probing on system call instructions.

This patch removes system call emulation and disables stepping system calls.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[minor commit log edit, and also get rid of '#ifdef CONFIG_PPC64']
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a412e3b3791ed83de18704c8d90f492e7a0049c0.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
2022-05-06  powerpc: Reject probes on instructions that can't be single stepped  [Naveen N. Rao]  (5 files changed, -8/+66)
Per the ISA, a Trace interrupt is not generated for:

- [h|u]rfi[d]
- rfscv
- sc, scv, and Trap instructions that trap
- Power-Saving Mode instructions
- other instructions that cause interrupts (other than Trace interrupts)
- the first instructions of any interrupt handler (applies to Branch and Single Step tracing; CIABR matches may still occur)
- instructions that are emulated by software

Add a helper to check for instructions belonging to the first four categories above, and reject kprobes, uprobes and xmon breakpoints on such instructions. We reject probing on instructions belonging to these categories across all ISA versions and across both BookS and BookE.

For trap instructions, we can't know in advance if they can cause a trap, and there is no good reason to allow probing on those. Also, uprobes already refuses to probe trap instructions and kprobes does not allow probes on trap instructions used for kernel warnings and bugs. As such, stop allowing any type of probes/breakpoints on trap instructions across uprobes, kprobes and xmon.

For some of the fp/altivec instructions that can generate an interrupt and which we emulate in the kernel (altivec assist, for example), we check and turn off single stepping in emulate_single_step(). Instructions generating a DSI are restarted, and single stepping normally completes once the instruction is completed.

In uprobes, if a single stepped instruction results in a non-fatal signal to be delivered to the task, such signals are "delayed" until after the instruction completes. For fatal signals, single stepping is cancelled and the instruction restarted in-place so that core dump captures proper addresses.

In kprobes, we do not allow probes on instructions having an extable entry, and we also do not allow probing interrupt vectors.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f56ee979d50b8711fae350fc97870f3ca34acd75.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
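A simplified sketch of such a helper (hypothetical and deliberately partial: it only covers sc/scv and the rfid family, and uses raw opcode arithmetic rather than the kernel's instruction accessors):

  static bool can_single_step(u32 inst)
  {
          switch (inst >> 26) {                   /* primary opcode */
          case 17:                                /* sc, scv */
                  return false;
          case 19:
                  switch ((inst >> 1) & 0x3ff) {  /* extended opcode */
                  case 18:                        /* rfid */
                  case 82:                        /* rfscv */
                  case 274:                       /* hrfid */
                          return false;
                  }
                  break;
          }
          /* power-saving and other interrupt-causing cases omitted */
          return true;
  }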
2022-05-06  powerpc: Sort and de-dup primary opcodes in ppc-opcode.h  [Naveen N. Rao]  (1 file changed, -38/+31)
Some of the primary opcodes are duplicated. Remove those, and sort the rest of the primary opcodes to make it easy to read.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a05edf638a2638d708fc2db0272f6317837b5eab.1648648712.git.naveen.n.rao@linux.vnet.ibm.com
2022-05-05  powerpc: fix typos in comments  [Julia Lawall]  (83 files changed, -104/+104)
Various spelling mistakes in comments. Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220430185654.5855-1-Julia.Lawall@inria.fr