aboutsummaryrefslogtreecommitdiffstats
path: root/arch/parisc/kernel/pacache.S (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-10-20parisc: Optimze cache flush algorithmsJohn David Anglin1-14/+202
The attached patch implements three optimizations: 1) Loops in flush_user_dcache_range_asm, flush_kernel_dcache_range_asm, purge_kernel_dcache_range_asm, flush_user_icache_range_asm, and flush_kernel_icache_range_asm are unrolled to reduce branch overhead. 2) The static branch prediction for cmpb instructions in pacache.S have been reviewed and the operand order adjusted where necessary. 3) For flush routines in cache.c, we purge rather flush when we have no context. The pdc instruction at level 0 is not required to write back dirty lines to memory. This provides a performance improvement over the fdc instruction if the feature is implemented. Version 2 adds alternative patching. The patch provides an average improvement of about 2%. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2018-10-17parisc: Add alternative coding infrastructureHelge Deller1-20/+44
This patch adds the necessary code to patch a running kernel at runtime to improve performance. The current implementation offers a few optimizations variants: - When running a SMP kernel on a single UP processor, unwanted assembler statements like locking functions are overwritten with NOPs. When multiple instructions shall be skipped, one branch instruction is used instead of multiple nop instructions. - In the UP case, some pdtlb and pitlb instructions are patched to become pdtlb,l and pitlb,l which only flushes the CPU-local tlb entries instead of broadcasting the flush to other CPUs in the system and thus may improve performance. - fic and fdc instructions are skipped if no I- or D-caches are installed. This should speed up qemu emulation and cacheless systems. - If no cache coherence is needed for IO operations, the relevant fdc and sync instructions in the sba and ccio drivers are replaced by nops. - On systems which share I- and D-TLBs and thus don't have a seperate instruction TLB, the pitlb instruction is replaced by a nop. Live-patching is done early in the boot process, just after having run the system inventory. No drivers are running and thus no external interrupts should arrive. So the hope is that no TLB exceptions will occur during the patching. If this turns out to be wrong we will probably need to do the patching in real-mode. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-13parisc: Fix and improve kernel stack unwindingHelge Deller1-125/+0
This patchset fixes and improves stack unwinding a lot: 1. Show backward stack traces with up to 30 callsites 2. Add callinfo to ENTRY_CFI() such that every assembler function will get an entry in the unwind table 3. Use constants instead of numbers in call_on_stack() 4. Do not depend on CONFIG_KALLSYMS to generate backtraces. 5. Speed up backtrace generation Make sure you have this patch to GNU as installed: https://sourceware.org/ml/binutils/2018-07/msg00474.html Without this patch, unwind info in the kernel is often wrong for various functions. Signed-off-by: Helge Deller <deller@gmx.de>
2018-08-08parisc: Define mb() and add memory barriers to assembler unlock sequencesJohn David Anglin1-0/+1
For years I thought all parisc machines executed loads and stores in order. However, Jeff Law recently indicated on gcc-patches that this is not correct. There are various degrees of out-of-order execution all the way back to the PA7xxx processor series (hit-under-miss). The PA8xxx series has full out-of-order execution for both integer operations, and loads and stores. This is described in the following article: http://web.archive.org/web/20040214092531/http://www.cpus.hp.com/technical_references/advperf.shtml For this reason, we need to define mb() and to insert a memory barrier before the store unlocking spinlocks. This ensures that all memory accesses are complete prior to unlocking. The ldcw instruction performs the same function on entry. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-11parisc: Move cache flush functions into .text.hot sectionHelge Deller1-4/+5
and move the disable_sr_hashing() C and assembly functions into the .init section. Signed-off-by: Helge Deller <deller@gmx.de>
2018-03-02parisc: Fix ordering of cache and TLB flushesJohn David Anglin1-0/+22
The change to flush_kernel_vmap_range() wasn't sufficient to avoid the SMP stalls.  The problem is some drivers call these routines with interrupts disabled.  Interrupts need to be enabled for flush_tlb_all() and flush_cache_all() to work.  This version adds checks to ensure interrupts are not disabled before calling routines that need IPI interrupts.  When interrupts are disabled, we now drop into slower code. The attached change fixes the ordering of cache and TLB flushes in several cases.  When we flush the cache using the existing PTE/TLB entries, we need to flush the TLB after doing the cache flush.  We don't need to do this when we flush the entire instruction and data caches as these flushes don't use the existing TLB entries.  The same is true for tmpalias region flushes. The flush_kernel_vmap_range() and invalidate_kernel_vmap_range() routines have been updated. Secondly, we added a new purge_kernel_dcache_range_asm() routine to pacache.S and use it in invalidate_kernel_vmap_range().  Nominally, purges are faster than flushes as the cache lines don't have to be written back to memory. Hopefully, this is sufficient to resolve the remaining problems due to cache speculation.  So far, testing indicates that this is the case.  I did work up a patch using tmpalias flushes, but there is a performance hit because we need the physical address for each page, and we also need to sequence access to the tmpalias flush code.  This increases the probability of stalls. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # 4.9+ Signed-off-by: Helge Deller <deller@gmx.de>
2018-01-02parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernelHelge Deller1-2/+7
Qemu for PARISC reported on a 32bit SMP parisc kernel strange failures about "Not-handled unaligned insn 0x0e8011d6 and 0x0c2011c9." Those opcodes evaluate to the ldcw() assembly instruction which requires (on 32bit) an alignment of 16 bytes to ensure atomicity. As it turns out, qemu is correct and in our assembly code in entry.S and pacache.S we don't pay attention to the required alignment. This patch fixes the problem by aligning the lock offset in assembly code in the same manner as we do in our C-code. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v4.0+
2016-12-07parisc: Remove unnecessary TLB purges from flush_dcache_page_asm and flush_icache_page_asmJohn David Anglin1-21/+1
We have four routines in pacache.S that use temporary alias pages: copy_user_page_asm(), clear_user_page_asm(), flush_dcache_page_asm() and flush_icache_page_asm(). copy_user_page_asm() and clear_user_page_asm() don't purge the TLB entry used for the operation. flush_dcache_page_asm() and flush_icache_page_asm do purge the entry. Presumably, this was thought to optimize TLB use. However, the operation is quite heavy weight on PA 1.X processors as we need to take the TLB lock and a TLB broadcast is sent to all processors. This patch removes the purges from flush_dcache_page_asm() and flush_icache_page_asm. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Helge Deller <deller@gmx.de>
2016-11-25parisc: Also flush data TLB in flush_icache_page_asmJohn David Anglin1-15/+22
This is the second issue I noticed in reviewing the parisc TLB code. The fic instruction may use either the instruction or data TLB in flushing the instruction cache. Thus, on machines with a split TLB, we should also flush the data TLB after setting up the temporary alias registers. Although this has no functional impact, I changed the pdtlb and pitlb instructions to consistently use the index register %r0. These instructions do not support integer displacements. Tested on rp3440 and c8000. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Helge Deller <deller@gmx.de>
2016-10-05parisc: Add cfi_startproc and cfi_endproc to assembly codeHelge Deller1-34/+34
Add ENTRY_CFI() and ENDPROC_CFI() macros for dwarf debug info and convert assembly users to new macros. Signed-off-by: Helge Deller <deller@gmx.de>
2016-09-20parisc: Update comment regarding implementation of copy_user_page_asmJohn David Anglin1-5/+11
The attached patch describes the current implementation of copy_user_page_asm(). It is possible to implement this routine using either the kernel page mappings or equivalent aliases. I tested both and decided the former was more efficient. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2013-06-18parisc: Use unshadowed index register for flush instructions in flush_dcache_page_asm and flush_icache_page_asmJohn David Anglin1-38/+38
The comment at the start of pacache.S states that the base and index registers used for fdc,fic, and pdc instructions should not use shadowed registers. Although this is probably unnecessary for tmpalias flushes, there is also no reason not to comply. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2013-05-24parisc: use PAGE_SHIFT instead of hardcoded value 12 in pacache.SHelge Deller1-6/+6
additionally clean up some whitespaces & tabs. Signed-off-by: Helge Deller <deller@gmx.de>
2013-05-06parisc: fix partly 16/64k PAGE_SIZE bootHelge Deller1-14/+19
This patch fixes partly PAGE_SIZEs of 16K or 64K by adjusting the assembler PTE lookup code and the assembler TEMPALIAS code. Furthermore some data alignments for PAGE_SIZE have been limited to 4K (or less) to not waste too much memory with greater page sizes. As a side note, the palo loader can (currently) only handle up to 10 ELF segments which is fixed with tighter aligning as well. My testings indicated that the ldci command in the sba iommu coding needed adjustment by the PAGE_SHIFT value and that the I/O PDIR Page size was only set to 4K for my machine (C3000). All this fixes partly the boot, but there are still quite some caching problems left. Examples are e.g. the symbios logic driver which is failing: sym0: <896> rev 0x7 at pci 0000:00:0f.0 irq 69 sym0: PA-RISC Firmware, ID 7, Fast-40, SE, parity checking CACHE TEST FAILED: DMA error (dstat=0x81).sym0: CACHE INCORRECTLY CONFIGURED. and the tulip network driver which doesn't seem to work correctly either: Sending BOOTP requests .net eth0: Setting full-duplex based on MII#1 link partner capability of 05e1 ..... timed out! Beside those kernel fixes glibc will need fixes too to be able to handle >4K page sizes. Signed-off-by: Helge Deller <deller@gmx.de>
2013-02-20parisc: fixes and cleanups in page cache flushing (2/4)John David Anglin1-45/+290
Implement clear_page_asm and copy_page_asm. These are optimized routines to clear and copy a page. I tested prefetch optimizations in clear_page_asm and copy_page_asm but didn't see any significant performance improvement on rp3440. I'm not sure if these are routines are significantly faster than memset and/or memcpy, but they are there for further performance evaluation. TLB purge operations on PA 1.X SMP machines are now serialized with the help of the new tlb_lock() and tlb_unlock() macros, since on some PA-RISC machines, TLB purges need to be serialized in software. Obviously, lock isn't needed in UP kernels. On PA 2.0 machines, there is a local TLB instruction which is much less disruptive to the memory subsystem. No lock is needed for local purge. Loops are also unrolled in flush_instruction_cache_local and flush_data_cache_local. The implementation of what used to be copy_user_page (now copy_user_page_asm) is now fixed. Additionally 64-bit support is now added. Read the preceding comment which I didn't change. I left the comment but it is now inaccurate. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
2012-05-16[PARISC] fix crash in flush_icache_page_asm on PA1.1John David Anglin1-18/+20
As pointed out by serveral people, PA1.1 only has a type 26 instruction meaning that the space register must be explicitly encoded. Not giving an explicit space means that the compiler uses the type 24 version which is PA2.0 only resulting in an illegal instruction crash. This regression was caused by commit f311847c2fcebd81912e2f0caf8a461dec28db41 Author: James Bottomley <James.Bottomley@HansenPartnership.com> Date: Wed Dec 22 10:22:11 2010 -0600 parisc: flush pages through tmpalias space Reported-by: Helge Deller <deller@gmx.de> Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org #2.6.39+ Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-04-15[PARISC] fix pacache .size with new binutilsMeelis Roos1-4/+2
Fix style of flush_user_dcache_range_asm procedure declaration in arch/parisc/kernel/pacache.s to be consistent with other assembly procedures. Signed-off-by: Meelis Roos <mroos@linux.ee> Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-01-15parisc: flush pages through tmpalias spaceJames Bottomley1-136/+109
The kernel has an 8M tmpailas space (originally designed for copying and clearing pages but now only used for clearing). The idea is to place zeros into the cache above a physical page rather than into the physical page and flush the cache, because often the zeros end up being replaced quickly anyway. We can also use the tmpalias space for flushing a page. The difference here is that we have to do tmpalias processing in the non access data and instruction traps. The principle is the same: as long as we know the physical address and have a virtual address congruent to the real one, the flush will be effective. In order to use the tmpalias space, the icache miss path has to be enhanced to check for the alias region to make the fic instruction effective. Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2008-06-13Revert "parisc: fix trivial section name warnings"Kyle McMartin1-2/+1
This reverts commit bd3bb8c15b9a80dbddfb7905b237a4a11a4725b4. Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2008-05-15parisc: use conditional macro for 64-bit wide opsKyle McMartin1-35/+35
This work enables us to remove -traditional from $AFLAGS on parisc. Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2008-05-15parisc: fix trivial section name warningsHelge Deller1-1/+2
This trivial patch fixes the following section warnings on PARISC: > WARNING: vmlinux.o (.text.1): unexpected section name. >The (.[number]+) following section name are ld generated and not expected. > Did you forget to use "ax"/"aw" in a .S file? > Note that for example <linux/init.h> contains > section definitions for use in .S files. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2007-10-18[PARISC] Clean up pointless ASM_PAGE_SIZE_DIV useKyle McMartin1-4/+4
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
2007-02-17[PARISC] add ASM_EXCEPTIONTABLE_ENTRY() macroHelge Deller1-9/+0
- this macro unifies the code to add exception table entries - additionally use ENTRY()/ENDPROC() at more places Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2007-02-17[PARISC] more ENTRY(), ENDPROC(), END() conversionsHelge Deller1-46/+34
Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-06-30Remove obsolete #include <linux/config.h>Jörn Engel1-1/+0
Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
2006-04-21[PARISC] Further work for multiple page sizesHelge Deller1-13/+12
More work towards supporing multiple page sizes on 64-bit. Convert some assumptions that 64bit uses 3 level page tables into testing PT_NLEVELS. Also some BUG() to BUG_ON() conversions and some cleanups to assembler. Signed-off-by: Helge Deller <deller@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2006-03-30[PARISC] Add parisc implementation of flush_kernel_dcache_page()James Bottomley1-2/+2
We need to do a little renaming of our original syntax because of the difference in arguments. Signed-off-by: James Bottomley <jejb@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21[PARISC] Explicitly specify sr4 when flushing kernel spaceMatthew Wilcox1-17/+17
Specify sr4 when flushing kernel space (we could equally well use sr5-7, but must not use sr0). Signed-off-by: Matthew Wilcox <willy@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21[PARISC] Properly specify index field to I/D cache flush opsGrant Grundler1-2/+2
replace use of "0" with "%r0" since PA 1.1 I/D flush ops only take a general register and not an immediate value for the index field. This just forces the code to always be PA 1.1 "clean". From: Joel Soete <soete.joel@tiscali.be> Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21[PARISC] Fix copy_user_page_asm to NOT access past end of pageGrant Grundler1-4/+9
2.6.12-rc2-pa3 fix copy_user_page_asm to NOT access past end of page. My bad. /o\ Lamont confirmed that instructions following a conditional branch are *alway* executed regardless if the branch is taken or not. Unless they are nullified (which was missing in this case). He also noted: Conditional branches nullify on forward taken branch, and on non-taken backward branch. Note that .+4 is a backwards branch. This makes alot more sense than the giberish in the PA20 arch book. Compiles and boots on both 64-bit (a500) and 32-bit (j6k). Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21[PARISC] Replace uses of __LP64__ with CONFIG_64BITGrant Grundler1-14/+16
2.6.12-rc4-pa3 s/__LP64__/CONFIG_64BIT/ and fixup config.h usage Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-10-21[PARISC] Make sure use of RFI conforms to PA 2.0 and 1.1 arch docsGrant Grundler1-52/+53
2.6.12-rc4-pa3 : first pass at making sure use of RFI conforms to PA 2.0 arch pages F-4 and F-5, PA 1.1 Arch page 3-19 and 3-20. The discussion revolves around all the rules for clearing PSW Q-bit. The hard part is meeting all the rules for "relied upon translation". .align directive is used to guarantee the critical sequence ends more than 8 instructions (32 bytes) from the end of page. Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
2005-04-16Linux-2.6.12-rc2Linus Torvalds1-0/+1086
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!