path: root/arch/powerpc/lib/checksum_32.S
2019-05-30  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152  (Thomas Gleixner, 1 file changed, -5/+1)
Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation either version 2 of the license or at
    your option any later version

extracted by the scancode license scanner, the SPDX license identifier

    GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-04  powerpc: Implement csum_ipv6_magic in assembly  (Christophe Leroy, 1 file changed, -0/+33)
The generic csum_ipv6_magic() generates a pretty bad result:

00000000 <csum_ipv6_magic>: (PPC32)
   0:  81 23 00 00    lwz     r9,0(r3)
   4:  81 03 00 04    lwz     r8,4(r3)
   8:  7c e7 4a 14    add     r7,r7,r9
   c:  7d 29 38 10    subfc   r9,r9,r7
  10:  7d 4a 51 10    subfe   r10,r10,r10
  14:  7d 27 42 14    add     r9,r7,r8
  18:  7d 2a 48 50    subf    r9,r10,r9
  1c:  80 e3 00 08    lwz     r7,8(r3)
  20:  7d 08 48 10    subfc   r8,r8,r9
  24:  7d 4a 51 10    subfe   r10,r10,r10
  28:  7d 29 3a 14    add     r9,r9,r7
  2c:  81 03 00 0c    lwz     r8,12(r3)
  30:  7d 2a 48 50    subf    r9,r10,r9
  34:  7c e7 48 10    subfc   r7,r7,r9
  38:  7d 4a 51 10    subfe   r10,r10,r10
  3c:  7d 29 42 14    add     r9,r9,r8
  40:  7d 2a 48 50    subf    r9,r10,r9
  44:  80 e4 00 00    lwz     r7,0(r4)
  48:  7d 08 48 10    subfc   r8,r8,r9
  4c:  7d 4a 51 10    subfe   r10,r10,r10
  50:  7d 29 3a 14    add     r9,r9,r7
  54:  7d 2a 48 50    subf    r9,r10,r9
  58:  81 04 00 04    lwz     r8,4(r4)
  5c:  7c e7 48 10    subfc   r7,r7,r9
  60:  7d 4a 51 10    subfe   r10,r10,r10
  64:  7d 29 42 14    add     r9,r9,r8
  68:  7d 2a 48 50    subf    r9,r10,r9
  6c:  80 e4 00 08    lwz     r7,8(r4)
  70:  7d 08 48 10    subfc   r8,r8,r9
  74:  7d 4a 51 10    subfe   r10,r10,r10
  78:  7d 29 3a 14    add     r9,r9,r7
  7c:  7d 2a 48 50    subf    r9,r10,r9
  80:  81 04 00 0c    lwz     r8,12(r4)
  84:  7c e7 48 10    subfc   r7,r7,r9
  88:  7d 4a 51 10    subfe   r10,r10,r10
  8c:  7d 29 42 14    add     r9,r9,r8
  90:  7d 2a 48 50    subf    r9,r10,r9
  94:  7d 08 48 10    subfc   r8,r8,r9
  98:  7d 4a 51 10    subfe   r10,r10,r10
  9c:  7d 29 2a 14    add     r9,r9,r5
  a0:  7d 2a 48 50    subf    r9,r10,r9
  a4:  7c a5 48 10    subfc   r5,r5,r9
  a8:  7c 63 19 10    subfe   r3,r3,r3
  ac:  7d 29 32 14    add     r9,r9,r6
  b0:  7d 23 48 50    subf    r9,r3,r9
  b4:  7c c6 48 10    subfc   r6,r6,r9
  b8:  7c 63 19 10    subfe   r3,r3,r3
  bc:  7c 63 48 50    subf    r3,r3,r9
  c0:  54 6a 80 3e    rotlwi  r10,r3,16
  c4:  7c 63 52 14    add     r3,r3,r10
  c8:  7c 63 18 f8    not     r3,r3
  cc:  54 63 84 3e    rlwinm  r3,r3,16,16,31
  d0:  4e 80 00 20    blr

0000000000000000 <.csum_ipv6_magic>: (PPC64)
   0:  81 23 00 00    lwz     r9,0(r3)
   4:  80 03 00 04    lwz     r0,4(r3)
   8:  81 63 00 08    lwz     r11,8(r3)
   c:  7c e7 4a 14    add     r7,r7,r9
  10:  7f 89 38 40    cmplw   cr7,r9,r7
  14:  7d 47 02 14    add     r10,r7,r0
  18:  7d 30 10 26    mfocrf  r9,1
  1c:  55 29 f7 fe    rlwinm  r9,r9,30,31,31
  20:  7d 4a 4a 14    add     r10,r10,r9
  24:  7f 80 50 40    cmplw   cr7,r0,r10
  28:  7d 2a 5a 14    add     r9,r10,r11
  2c:  80 03 00 0c    lwz     r0,12(r3)
  30:  81 44 00 00    lwz     r10,0(r4)
  34:  7d 10 10 26    mfocrf  r8,1
  38:  55 08 f7 fe    rlwinm  r8,r8,30,31,31
  3c:  7d 29 42 14    add     r9,r9,r8
  40:  81 04 00 04    lwz     r8,4(r4)
  44:  7f 8b 48 40    cmplw   cr7,r11,r9
  48:  7d 29 02 14    add     r9,r9,r0
  4c:  7d 70 10 26    mfocrf  r11,1
  50:  55 6b f7 fe    rlwinm  r11,r11,30,31,31
  54:  7d 29 5a 14    add     r9,r9,r11
  58:  7f 80 48 40    cmplw   cr7,r0,r9
  5c:  7d 29 52 14    add     r9,r9,r10
  60:  7c 10 10 26    mfocrf  r0,1
  64:  54 00 f7 fe    rlwinm  r0,r0,30,31,31
  68:  7d 69 02 14    add     r11,r9,r0
  6c:  7f 8a 58 40    cmplw   cr7,r10,r11
  70:  7c 0b 42 14    add     r0,r11,r8
  74:  81 44 00 08    lwz     r10,8(r4)
  78:  7c f0 10 26    mfocrf  r7,1
  7c:  54 e7 f7 fe    rlwinm  r7,r7,30,31,31
  80:  7c 00 3a 14    add     r0,r0,r7
  84:  7f 88 00 40    cmplw   cr7,r8,r0
  88:  7d 20 52 14    add     r9,r0,r10
  8c:  80 04 00 0c    lwz     r0,12(r4)
  90:  7d 70 10 26    mfocrf  r11,1
  94:  55 6b f7 fe    rlwinm  r11,r11,30,31,31
  98:  7d 29 5a 14    add     r9,r9,r11
  9c:  7f 8a 48 40    cmplw   cr7,r10,r9
  a0:  7d 29 02 14    add     r9,r9,r0
  a4:  7d 70 10 26    mfocrf  r11,1
  a8:  55 6b f7 fe    rlwinm  r11,r11,30,31,31
  ac:  7d 29 5a 14    add     r9,r9,r11
  b0:  7f 80 48 40    cmplw   cr7,r0,r9
  b4:  7d 29 2a 14    add     r9,r9,r5
  b8:  7c 10 10 26    mfocrf  r0,1
  bc:  54 00 f7 fe    rlwinm  r0,r0,30,31,31
  c0:  7d 29 02 14    add     r9,r9,r0
  c4:  7f 85 48 40    cmplw   cr7,r5,r9
  c8:  7c 09 32 14    add     r0,r9,r6
  cc:  7d 50 10 26    mfocrf  r10,1
  d0:  55 4a f7 fe    rlwinm  r10,r10,30,31,31
  d4:  7c 00 52 14    add     r0,r0,r10
  d8:  7f 80 30 40    cmplw   cr7,r0,r6
  dc:  7d 30 10 26    mfocrf  r9,1
  e0:  55 29 ef fe    rlwinm  r9,r9,29,31,31
  e4:  7c 09 02 14    add     r0,r9,r0
  e8:  54 03 80 3e    rotlwi  r3,r0,16
  ec:  7c 03 02 14    add     r0,r3,r0
  f0:  7c 03 00 f8    not     r3,r0
  f4:  78 63 84 22    rldicl  r3,r3,48,48
  f8:  4e 80 00 20    blr

This patch implements it in assembly for both PPC32 and PPC64.

Link: https://github.com/linuxppc/linux/issues/9
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
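[Editor's note] For orientation, a standalone C sketch of what csum_ipv6_magic() computes (a simplified stand-in, not the kernel's generic code; add32_carry() and the _sketch name are illustrative): the 16-bit one's complement checksum of the IPv6 pseudo-header. The carry recovery after every 32-bit add is what balloons into the subfc/subfe pairs (PPC32) and cmplw/mfocrf/rlwinm sequences (PPC64) above.

    #include <stdint.h>

    /* Add with end-around carry: if the 32-bit add wraps, feed the
     * carry back in, preserving the one's complement sum. */
    static inline uint32_t add32_carry(uint32_t sum, uint32_t v)
    {
        sum += v;
        return sum + (sum < v);
    }

    /* saddr/daddr are the 16-byte IPv6 addresses as four 32-bit words;
     * len and proto are assumed to be already in network byte order. */
    static uint16_t csum_ipv6_magic_sketch(const uint32_t saddr[4],
                                           const uint32_t daddr[4],
                                           uint32_t len, uint32_t proto,
                                           uint32_t csum)
    {
        int i;

        for (i = 0; i < 4; i++) {
            csum = add32_carry(csum, saddr[i]);
            csum = add32_carry(csum, daddr[i]);
        }
        csum = add32_carry(csum, len);
        csum = add32_carry(csum, proto);

        csum = (csum & 0xffff) + (csum >> 16);  /* fold 32 -> 16 */
        csum = (csum & 0xffff) + (csum >> 16);  /* absorb last carry */
        return (uint16_t)~csum;
    }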
2018-06-04  powerpc/32: Optimise __csum_partial()  (Christophe Leroy, 1 file changed, -2/+11)
Improve __csum_partial() by interleaving loads and adds.

On an 8xx, it brings neither improvement nor degradation. On an 83xx, it brings a 25% improvement.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-14  powerpc: EX_TABLE macro for exception tables  (Nicholas Piggin, 1 file changed, -27/+20)
This macro is taken from s390, and allows more flexibility in changing the exception table format.

mpe: Put it in ppc_asm.h and only define one version using stringify_in_c(). Add some empty definitions and headers to keep the selftests happy.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
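[Editor's note] For context, a hedged C sketch of the classic two-address entry such a macro emits into __ex_table (the exact layout is whatever EX_TABLE expands to, and later kernels switched to relative offsets):

    /* Each entry pairs an instruction that may fault (e.g. a user-memory
     * load in csum_partial_copy_generic) with the fixup code the fault
     * handler should branch to instead of oopsing. */
    struct exception_table_entry {
        unsigned long insn;     /* address of the potentially faulting instruction */
        unsigned long fixup;    /* address of the recovery code */
    };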
2016-10-14  Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild  (Linus Torvalds, 1 file changed, -0/+3)
Pull kbuild updates from Michal Marek:

 - EXPORT_SYMBOL for asm source by Al Viro.

   This does bring a regression, because genksyms no longer generates
   checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
   working on a patch to fix this. Plus, we are talking about functions
   like strcpy(), which rarely change prototypes.

 - Fixes for PPC fallout of the above by Stephen Rothwell and Nick Piggin

 - fixdep speedup by Alexey Dobriyan

 - preparatory work by Nick Piggin to allow architectures to build with
   -ffunction-sections, -fdata-sections and --gc-sections

 - CONFIG_THIN_ARCHIVES support by Stephen Rothwell

 - fix for filenames with colons in the initramfs source by me

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
  initramfs: Escape colons in depfile
  ppc: there is no clear_pages to export
  powerpc/64: whitelist unresolved modversions CRCs
  kbuild: -ffunction-sections fix for archs with conflicting sections
  kbuild: add arch specific post-link Makefile
  kbuild: allow archs to select link dead code/data elimination
  kbuild: allow architectures to use thin archives instead of ld -r
  kbuild: Regenerate genksyms lexer
  kbuild: genksyms fix for typeof handling
  fixdep: faster CONFIG_ search
  ia64: move exports to definitions
  sparc32: debride memcpy.S a bit
  [sparc] unify 32bit and 64bit string.h
  sparc: move exports to definitions
  ppc: move exports to definitions
  arm: move exports to definitions
  s390: move exports to definitions
  m68k: move exports to definitions
  alpha: move exports to actual definitions
  x86: move exports to actual definitions
  ...
2016-09-08  powerpc/32: Fix again csum_partial_copy_generic()  (Christophe Leroy, 1 file changed, -3/+4)
Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") introduced a bug when the destination address is odd and len is lower than the cacheline size.

In that case the resulting csum value doesn't have to be rotated one byte, because the cache-aligned copy part is skipped, so no alignment is performed.

Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Cc: stable@vger.kernel.org # v4.6+
Reported-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Alessio Igor Bogani <alessio.bogani@elettra.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-10  powerpc/32: Fix csum_partial_copy_generic()  (Christophe Leroy, 1 file changed, -3/+4)
Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") introduced a bug when the destination address is odd and the initial csum is not null.

In that (rare) case the initial csum value has to be rotated one byte, just as the resulting value is.

This patch also fixes related comments.

Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
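[Editor's note] Both this fix and the "Fix again" entry above concern the same subtlety, which can be sketched in C (the helper name is mine): with an odd destination, the copied bytes land in swapped 16-bit lanes, so the checksum accumulated over the copy is byte-swapped relative to the caller's initial csum. Rotating a value left by one byte, as the rotlwi instruction does, realigns the lanes.

    #include <stdint.h>

    /* Sketch only: rotate a 32-bit partial checksum left by 8 bits,
     * swapping the byte lanes of both 16-bit halves. */
    static inline uint32_t csum_rotl8(uint32_t sum)
    {
        return (sum << 8) | (sum >> 24);
    }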
2016-08-07  ppc: move exports to definitions  (Al Viro, 1 file changed, -0/+3)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-03-09  powerpc: optimise csum_partial() call when len is constant  (Christophe Leroy, 1 file changed, -2/+2)
csum_partial() is often called for small fixed-length packets for which it is suboptimal to use the generic csum_partial() function. For instance, in my configuration, I got:

* One place calling it with constant len 4
* Seven places calling it with constant len 8
* Three places calling it with constant len 14
* One place calling it with constant len 20
* One place calling it with constant len 24
* One place calling it with constant len 32

This patch renames csum_partial() to __csum_partial() and implements csum_partial() as a wrapper inline function which:

* uses csum_add() for small constant lengths that are multiples of 16 bits
* uses ip_fast_csum() for other constant lengths that are multiples of 32 bits
* uses __csum_partial() in all other cases

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
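[Editor's note] A standalone sketch of that dispatch idea, using plain C types and a local csum_add() stand-in rather than the kernel's __wsum machinery (the real wrapper lives in arch/powerpc/include/asm/checksum.h and also routes 32-bit-multiple lengths through an ip_fast_csum() variant, omitted here):

    #include <stdint.h>
    #include <string.h>

    /* End-around-carry add: the building block of one's complement
     * checksums, analogous to the kernel's csum_add(). */
    static inline uint32_t csum_add(uint32_t a, uint32_t b)
    {
        a += b;
        return a + (a < b);
    }

    uint32_t __csum_partial(const void *buff, int len, uint32_t sum); /* asm routine */

    static inline uint32_t csum_partial(const void *buff, int len, uint32_t sum)
    {
        /* With a literal len, __builtin_constant_p() is true and the
         * compiler resolves this branch at compile time, fully inlining
         * the small cases into a handful of adds. */
        if (__builtin_constant_p(len) && len <= 32 && (len & 1) == 0) {
            const uint8_t *p = buff;
            int i;

            for (i = 0; i + 4 <= len; i += 4) {
                uint32_t w;
                memcpy(&w, p + i, 4);   /* one 32-bit word */
                sum = csum_add(sum, w);
            }
            if (i < len) {              /* trailing 16-bit word */
                uint16_t h;
                memcpy(&h, p + i, 2);
                sum = csum_add(sum, h);
            }
            return sum;
        }
        return __csum_partial(buff, len, sum);  /* generic assembly path */
    }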
2016-03-04  powerpc32: optimise csum_partial() loop  (Christophe Leroy, 1 file changed, -1/+15)
On the 8xx, load latency is 2 cycles and taking branches also takes 2 cycles. So let's unroll the loop.

This patch improves csum_partial() speed by around 10% on both:
* 8xx (single issue processor with parallel execution)
* 83xx (superscalar 6xx processor with dual instruction fetch and parallel execution)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
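[Editor's note] The change itself is in checksum_32.S, but the idea can be sketched in C (names are illustrative): unrolling amortises the loop branch over several word adds and keeps independent loads in flight during their 2-cycle latency.

    #include <stdint.h>

    /* Illustrative only: 4-way unrolled word accumulation. The real
     * patch does the equivalent in assembly with adde/bdnz. */
    static uint32_t sum_words_unrolled(const uint32_t *p, int nwords, uint32_t sum)
    {
        uint64_t s = sum;
        int i;

        for (i = 0; i + 4 <= nwords; i += 4) {  /* one branch per 4 adds */
            s += p[i];
            s += p[i + 1];
            s += p[i + 2];
            s += p[i + 3];
        }
        for (; i < nwords; i++)                 /* remainder */
            s += p[i];
        while (s >> 32)                         /* fold carries back in */
            s = (s & 0xffffffff) + (s >> 32);
        return (uint32_t)s;
    }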
2016-03-04  powerpc32: optimise a few instructions in csum_partial()  (Christophe Leroy, 1 file changed, -20/+17)
r5 does contain the value to be updated, so let's use r5 all the way for that. It makes the code more readable.

To avoid confusion, it is better to use adde instead of addc.

The first addition is useless; its only purpose is to clear the carry. As r4 is a signed int that is always positive, this can be done by using srawi instead of srwi.

Let's also remove the comment about bdnz having no overhead, as it is not correct on all powerpc, at least on the MPC8xx.

In the last part, the remaining quantity of bytes to be processed is between 0 and 3. Therefore, we can base that part on the value of bits 31 and 30 of r4 instead of ANDing r4 with 3 and then proceeding with comparisons and subtractions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
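[Editor's note] The tail-handling trick reads more easily in C (a hedged sketch; helper name is mine, and the patch tests the corresponding bits of r4 directly in assembly). With 0-3 bytes left, one bit of len says whether a halfword remains and the other whether a final byte does; in PowerPC's big-endian bit numbering these are bits 30 and 31.

    #include <stdint.h>
    #include <string.h>

    /* Sketch only: fold the last 0-3 bytes into a running checksum using
     * the two low bits of len, instead of compares and subtractions. */
    static uint32_t csum_tail(const uint8_t *p, int len, uint32_t sum)
    {
        if (len & 2) {                  /* a 16-bit halfword remains */
            uint16_t h;
            memcpy(&h, p, 2);
            sum += h;
            p += 2;
        }
        if (len & 1)                    /* a final lone byte remains */
            sum += (uint32_t)*p << 8;   /* big-endian: lone byte fills the high lane */
        return sum;
    }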
2016-03-04  powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()  (Christophe Leroy, 1 file changed, -111/+209)
csum_partial_copy_generic() does the same as copy_tofrom_user() and also calculates the checksum during the copy. Unlike copy_tofrom_user(), the existing version of csum_partial_copy_generic() doesn't take benefit of the cache.

This patch is a rewrite of csum_partial_copy_generic() based on copy_tofrom_user().

The previous version of csum_partial_copy_generic() was handling errors. Now we have the checksum wrapper functions to handle the error case like in powerpc64, so we can make the error case simple: just return -EFAULT. copy_tofrom_user() only has r12 available => we use it for the checksum. r7 and r8, which contain pointers to error feedback, are used, so we stack them.

On a TCP benchmark using socklib on the loopback interface, on which checksum offload and scatter/gather have been deactivated, we get about 20% performance increase.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
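[Editor's note] A minimal sketch of the wrapper pattern the message refers to, with assumed names and plain C types (the real powerpc wrappers live in arch/powerpc/lib/checksum_wrappers.c and do more, e.g. validating the user range):

    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t wsum_t;        /* stand-in for the kernel's __wsum */

    /* The assembly routine: copies len bytes, checksums them, and
     * reports a fault through *src_err or *dst_err. */
    wsum_t csum_partial_copy_generic(const void *src, void *dst, int len,
                                     wsum_t sum, int *src_err, int *dst_err);

    /* Sketch only: the C side owns error reporting, so the assembly
     * fault handler can simply bail out instead of threading error
     * pointers through its hot loop. */
    static wsum_t csum_and_copy_from_user_sketch(const void *src, void *dst,
                                                 int len, wsum_t sum, int *err)
    {
        *err = 0;
        if (len <= 0)
            return sum;
        /* a real wrapper would check access_ok() on src here */
        return csum_partial_copy_generic(src, dst, len, sum, err, NULL);
    }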
2016-03-04  powerpc: inline ip_fast_csum()  (Christophe Leroy, 1 file changed, -21/+0)
In several architectures, ip_fast_csum() is inlined. There are functions like ip_send_check() which do nothing much more than calling ip_fast_csum(). Inlining ip_fast_csum() allows the compiler to optimise better.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[scottwood: whitespace and cast fixes]
Signed-off-by: Scott Wood <oss@buserror.net>
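[Editor's note] For reference, a standalone sketch of what ip_fast_csum() computes (a simplified stand-in, not the powerpc inline this patch adds): the 16-bit one's complement checksum of an IPv4 header that is ihl 32-bit words long.

    #include <stdint.h>

    /* Sketch: checksum an IPv4 header of ihl 32-bit words. Assumes
     * 4-byte alignment; a valid header sums to 0xffff before the final
     * complement, so the result is 0 for a correct header. */
    static inline uint16_t ip_fast_csum_sketch(const void *iph, unsigned int ihl)
    {
        const uint32_t *p = iph;
        uint64_t sum = 0;
        unsigned int i;

        for (i = 0; i < ihl; i++)
            sum += p[i];
        sum = (sum & 0xffffffff) + (sum >> 32); /* fold 64 -> 32 */
        sum = (sum & 0xffff) + (sum >> 16);     /* fold 32 -> 16 */
        sum = (sum & 0xffff) + (sum >> 16);     /* absorb last carry */
        return (uint16_t)~sum;
    }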
2015-08-07  powerpc: put csum_tcpudp_magic inline  (LEROY Christophe, 1 file changed, -16/+0)
csum_tcpudp_magic() is only a few instructions, and modifies very few registers. So it is not worth having it as a separate function and suffering function branching and saving of volatile registers.

This patch makes it inline, by use of the already existing csum_tcpudp_nofold() function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <scottwood@freescale.com>
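[Editor's note] The relationship the patch exploits can be sketched in standalone C (plain types and illustrative names; the kernel versions take __be32/__wsum, with lenproto here standing for the network-order combination of length and protocol): csum_tcpudp_magic() is just csum_tcpudp_nofold() followed by a 16-bit fold and complement, which is why inlining it is cheap.

    #include <stdint.h>

    /* Sketch: sum the TCP/UDP pseudo-header fields into a 32-bit
     * partial checksum, folding carries back in (end-around carry). */
    static inline uint32_t csum_tcpudp_nofold_sketch(uint32_t saddr, uint32_t daddr,
                                                     uint32_t lenproto, uint32_t sum)
    {
        uint64_t s = (uint64_t)sum + saddr + daddr + lenproto;
        s = (s & 0xffffffff) + (s >> 32);
        s = (s & 0xffffffff) + (s >> 32);
        return (uint32_t)s;
    }

    /* Fold to 16 bits and complement: all that csum_tcpudp_magic()
     * adds on top of the nofold variant. */
    static inline uint16_t csum_tcpudp_magic_sketch(uint32_t saddr, uint32_t daddr,
                                                    uint32_t lenproto, uint32_t sum)
    {
        uint32_t s = csum_tcpudp_nofold_sketch(saddr, daddr, lenproto, sum);
        s = (s & 0xffff) + (s >> 16);
        s = (s & 0xffff) + (s >> 16);
        return (uint16_t)~s;
    }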
2005-10-10  powerpc: Rename files to have consistent _32/_64 suffixes  (Paul Mackerras, 1 file changed, -0/+225)
This doesn't change any code, just renames things so we consistently have foo_32.c and foo_64.c where we have separate 32- and 64-bit versions.

Signed-off-by: Paul Mackerras <paulus@samba.org>