aboutsummaryrefslogtreecommitdiffstats
path: root/arch/arm/include/asm/div64.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-02-11ARM: 8504/1: __arch_xprod_64(): small optimizationNicolas Pitre1-7/+7
The tmp variable is used twice: first to pose as a register containing a value of zero, and then to provide a temporary register that initially is zero and get added some value. But somehow gcc decides to split those two usages in different registers. Example code: u64 div64const1000(u64 x) { u32 y = 1000; do_div(x, y); return x; } Result: div64const1000: push {r4, r5, r6, r7, lr} mov lr, #0 mov r6, r0 mov r7, r1 adr r5, .L8 ldrd r4, [r5] mov r1, lr umull r2, r3, r4, r6 cmn r2, r4 adcs r3, r3, r5 adc r2, lr, #0 umlal r3, r2, r5, r6 umlal r3, r1, r4, r7 mov r3, #0 adds r2, r1, r2 adc r3, r3, #0 umlal r2, r3, r5, r7 lsr r0, r2, #9 lsr r1, r3, #9 orr r0, r0, r3, lsl #23 pop {r4, r5, r6, r7, pc} .align 3 .L8: .word -1924145349 .word -2095944041 Full kernel build size: text data bss dec hex filename 13663814 1553940 351368 15569122 ed90e2 vmlinux Here the two instances of 'tmp' are assigned to r1 and lr. To avoid that, let's mark the first 'tmp' usage in __arch_xprod_64() with a "+r" constraint even if the register is not written to, so to create a dependency for the second usage with the effect of enforcing a single temporary register throughout. Result: div64const1000: push {r4, r5, r6, r7} movs r3, #0 adr r5, .L8 ldrd r4, [r5] umull r6, r7, r4, r0 cmn r6, r4 adcs r7, r7, r5 adc r6, r3, #0 umlal r7, r6, r5, r0 umlal r7, r3, r4, r1 mov r7, #0 adds r6, r3, r6 adc r7, r7, #0 umlal r6, r7, r5, r1 lsr r0, r6, #9 lsr r1, r7, #9 orr r0, r0, r7, lsl #23 pop {r4, r5, r6, r7} bx lr .align 3 .L8: .word -1924145349 .word -2095944041 text data bss dec hex filename 13663438 1553940 351368 15568746 ed8f6a vmlinux This time 'tmp' is assigned to r3 and used throughout. However, by being assigned to r3, that blocks usage of the r2-r3 double register slot for 64-bit values, forcing more registers to be spilled on the stack. Let's try to help it by forcing 'tmp' to the caller-saved ip register. Result: div64const1000: stmfd sp!, {r4, r5} mov ip, #0 adr r5, .L8 ldrd r4, [r5] umull r2, r3, r4, r0 cmn r2, r4 adcs r3, r3, r5 adc r2, ip, #0 umlal r3, r2, r5, r0 umlal r3, ip, r4, r1 mov r3, #0 adds r2, ip, r2 adc r3, r3, #0 umlal r2, r3, r5, r1 mov r0, r2, lsr #9 mov r1, r3, lsr #9 orr r0, r0, r3, asl #23 ldmfd sp!, {r4, r5} bx lr .align 3 .L8: .word -1924145349 .word -2095944041 text data bss dec hex filename 13662838 1553940 351368 15568146 ed8d12 vmlinux We could make the code marginally smaller yet by forcing 'tmp' to lr instead, but that would have a negative inpact on branch prediction for which "bx lr" is optimal. Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-11-19ARM: asm/div64.h: adjust to generic coddeNicolas Pitre1-190/+93
Now that the constant divisor optimization is made generic, adapt the ARM case to it. Signed-off-by: Nicolas Pitre <nico@linaro.org>
2014-04-22ARM: 8027/1: fix do_div() bug in big-endian systemsXiangyu Lu1-1/+1
In big-endian systems, "%1" get the most significant part of the value, cause the instruction to get the wrong result. When viewing ftrace record in big-endian ARM systems, we found that the timestamp errors: swapper-0 [001] 1325.970000: 0:120:R ==> [001] 16:120:R events/1 events/1-16 [001] 1325.970000: 16:120:S ==> [001] 0:120:R swapper swapper-0 [000] 1325.1000000: 0:120:R + [000] 15:120:R events/0 swapper-0 [000] 1325.1000000: 0:120:R ==> [000] 15:120:R events/0 swapper-0 [000] 1326.030000: 0:120:R + [000] 1150:120:R sshd swapper-0 [000] 1326.030000: 0:120:R ==> [000] 1150:120:R sshd When viewed ftrace records, it will call the do_div(n, base) function, which achieved arch/arm/include/asm/div64.h in. When n = 10000000, base = 1000000, in do_div(n, base) will execute "umull %Q0, %R0, %1, %Q2". Reviewed-by: Dave Martin <Dave.Martin@arm.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Cc: <stable@vger.kernel.org> # 2.6.20+ Signed-off-by: Alex Wu <wuquanming@huawei.com> Signed-off-by: Xiangyu Lu <luxiangyu@huawei.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2013-05-15ARM: 7705/1: use optimized do_div only for EABIArnd Bergmann1-1/+1
In OABI configurations, some uses of the do_div function cause gcc to run out of registers. To work around that, we can force the use of the out-of-line version for configurations that build a OABI kernel. Without this patch, building netx_defconfig results in: net/core/pktgen.c: In function 'pktgen_if_show': net/core/pktgen.c:682:2775: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm' net/core/pktgen.c:682:3153: error: can't find a register in class 'GENERAL_REGS' while reloading 'asm' net/core/pktgen.c:682:2775: error: 'asm' operand has impossible constraints net/core/pktgen.c:682:3153: error: 'asm' operand has impossible constraints Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-03-28Disintegrate asm/system.h for ARMDavid Howells1-1/+1
Disintegrate asm/system.h for ARM. Signed-off-by: David Howells <dhowells@redhat.com> cc: Russell King <linux@arm.linux.org.uk> cc: linux-arm-kernel@lists.infradead.org
2008-10-23[ARM] 5320/1: fix assembly constraints in implementation of do_div()Nicolas Pitre1-3/+3
Those inline assembly segments using the umlal instruction must have the & modifier so to be sure that a purely input register won't alias one of the registers used as input+output. In most cases, the inputs are still used after the outputs are touched, and most binutil versions insist on "rdhi, rdlo and rm must all be different" even for ARMv6+. Signed-off-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-08-02[ARM] move include/asm-arm to arch/arm/include/asmRussell King1-0/+227
Move platform independent header files to arch/arm/include/asm, leaving those in asm/arch* and asm/plat* alone. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>