linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2012-12-03	ARM: 7587/1: implement optimized percpu variable access	Rob Herring	1	-1/+0
	Use the previously unused TPIDRPRW register to store percpu offsets. TPIDRPRW is only accessible in PL1, so it can only be used in the kernel. This replaces 2 loads with a mrc instruction for each percpu variable access. With hackbench, the performance improvement is 1.4% on Cortex-A9 (highbank). Taking an average of 30 runs of "hackbench -l 1000" yields: Before: 6.2191 After: 6.1348 Will Deacon reported similar delta on v6 with 11MPCore. The asm "memory clobber" are needed here to ensure the percpu offset gets reloaded. Testing by Will found that this would not happen in __schedule() which is a bit of a special case as preemption is disabled but the execution can move cores. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-11-13	tracing,x86: Add a TSC trace_clock	David Sharp	1	-0/+1
	In order to promote interoperability between userspace tracers and ftrace, add a trace_clock that reports raw TSC values which will then be recorded in the ring buffer. Userspace tracers that also record TSCs are then on exactly the same time base as the kernel and events can be unambiguously interlaced. Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large timestamp values. v2: Move arch-specific bits out of generic code. v3: Rename "x86-tsc", cleanups v7: Generic arch bits in Kbuild. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-12	UAPI: (Scripted) Disintegrate arch/arm/include/asm	David Howells	1	-2/+0
	Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
2012-08-25	ARM: 7494/1: use generic termios.h	Rob Herring	1	-0/+1
	As pointed out by Arnd Bergmann, this fixes a couple of issues but will increase code size: The original macro user_termio_to_kernel_termios was not endian safe. It used an unsigned short ptr to access the low bits in a 32-bit word. Both user_termio_to_kernel_termios and kernel_termios_to_user_termio are missing error checking on put_user/get_user and copy_to/from_user. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-08-25	ARM: 7493/1: use generic unaligned.h	Rob Herring	1	-0/+1
	This moves ARM over to the asm-generic/unaligned.h header. This has the benefit of better code generated especially for ARMv7 on gcc 4.7+ compilers. As Arnd Bergmann, points out: The asm-generic version uses the "struct" version for native-endian unaligned access and the "byteshift" version for the opposite endianess. The current ARM version however uses the "byteshift" implementation for both. Thanks to Nicolas Pitre for the excellent analysis: Test case: int foo (int x) { return get_unaligned(x); } long long bar (long long x) { return get_unaligned(x); } With the current ARM version: foo: ldrb r3, [r0, #2] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 2B], MEM[(const u8 )x_1(D) + 2B] ldrb r1, [r0, #1] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 1B], MEM[(const u8 )x_1(D) + 1B] ldrb r2, [r0, #0] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D)], MEM[(const u8 )x_1(D)] mov r3, r3, asl #16 @ tmp154, MEM[(const u8 )x_1(D) + 2B], ldrb r0, [r0, #3] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 3B], MEM[(const u8 )x_1(D) + 3B] orr r3, r3, r1, asl #8 @, tmp155, tmp154, MEM[(const u8 )x_1(D) + 1B], orr r3, r3, r2 @ tmp157, tmp155, MEM[(const u8 )x_1(D)] orr r0, r3, r0, asl #24 @,, tmp157, MEM[(const u8 )x_1(D) + 3B], bx lr @ bar: stmfd sp!, {r4, r5, r6, r7} @, mov r2, #0 @ tmp184, ldrb r5, [r0, #6] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 6B], MEM[(const u8 )x_1(D) + 6B] ldrb r4, [r0, #5] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 5B], MEM[(const u8 )x_1(D) + 5B] ldrb ip, [r0, #2] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 2B], MEM[(const u8 )x_1(D) + 2B] ldrb r1, [r0, #4] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 4B], MEM[(const u8 )x_1(D) + 4B] mov r5, r5, asl #16 @ tmp175, MEM[(const u8 )x_1(D) + 6B], ldrb r7, [r0, #1] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 1B], MEM[(const u8 )x_1(D) + 1B] orr r5, r5, r4, asl #8 @, tmp176, tmp175, MEM[(const u8 )x_1(D) + 5B], ldrb r6, [r0, #7] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 7B], MEM[(const u8 )x_1(D) + 7B] orr r5, r5, r1 @ tmp178, tmp176, MEM[(const u8 )x_1(D) + 4B] ldrb r4, [r0, #0] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D)], MEM[(const u8 )x_1(D)] mov ip, ip, asl #16 @ tmp188, MEM[(const u8 )x_1(D) + 2B], ldrb r1, [r0, #3] @ zero_extendqisi2 @ MEM[(const u8 )x_1(D) + 3B], MEM[(const u8 )x_1(D) + 3B] orr ip, ip, r7, asl #8 @, tmp189, tmp188, MEM[(const u8 )x_1(D) + 1B], orr r3, r5, r6, asl #24 @,, tmp178, MEM[(const u8 )x_1(D) + 7B], orr ip, ip, r4 @ tmp191, tmp189, MEM[(const u8 )x_1(D)] orr ip, ip, r1, asl #24 @, tmp194, tmp191, MEM[(const u8 )x_1(D) + 3B], mov r1, r3 @, orr r0, r2, ip @ tmp171, tmp184, tmp194 ldmfd sp!, {r4, r5, r6, r7} bx lr In both cases the code is slightly suboptimal. One may wonder why wasting r2 with the constant 0 in the second case for example. And all the mov's could be folded in subsequent orr's, etc. Now with the asm-generic version: foo: ldr r0, [r0, #0] @ unaligned @,* x bx lr @ bar: mov r3, r0 @ x, x ldr r0, [r0, #0] @ unaligned @,* x ldr r1, [r3, #4] @ unaligned @, bx lr @ This is way better of course, but only because this was compiled for ARMv7. In this case the compiler knows that the hardware can do unaligned word access. This isn't that obvious for foo(), but if we remove the get_unaligned() from bar as follows: long long bar (long long x) {return x; } then the resulting code is: bar: ldmia r0, {r0, r1} @ x,, bx lr @ So this proves that the presumed aligned vs unaligned cases does have influence on the instructions the compiler may use and that the above unaligned code results are not just an accident. Still... this isn't fully conclusive without at least looking at the resulting assembly fron a pre ARMv6 compilation. Let's see with an ARMv5 target: foo: ldrb r3, [r0, #0] @ zero_extendqisi2 @ tmp139,* x ldrb r1, [r0, #1] @ zero_extendqisi2 @ tmp140, ldrb r2, [r0, #2] @ zero_extendqisi2 @ tmp143, ldrb r0, [r0, #3] @ zero_extendqisi2 @ tmp146, orr r3, r3, r1, asl #8 @, tmp142, tmp139, tmp140, orr r3, r3, r2, asl #16 @, tmp145, tmp142, tmp143, orr r0, r3, r0, asl #24 @,, tmp145, tmp146, bx lr @ bar: stmfd sp!, {r4, r5, r6, r7} @, ldrb r2, [r0, #0] @ zero_extendqisi2 @ tmp139,* x ldrb r7, [r0, #1] @ zero_extendqisi2 @ tmp140, ldrb r3, [r0, #4] @ zero_extendqisi2 @ tmp149, ldrb r6, [r0, #5] @ zero_extendqisi2 @ tmp150, ldrb r5, [r0, #2] @ zero_extendqisi2 @ tmp143, ldrb r4, [r0, #6] @ zero_extendqisi2 @ tmp153, ldrb r1, [r0, #7] @ zero_extendqisi2 @ tmp156, ldrb ip, [r0, #3] @ zero_extendqisi2 @ tmp146, orr r2, r2, r7, asl #8 @, tmp142, tmp139, tmp140, orr r3, r3, r6, asl #8 @, tmp152, tmp149, tmp150, orr r2, r2, r5, asl #16 @, tmp145, tmp142, tmp143, orr r3, r3, r4, asl #16 @, tmp155, tmp152, tmp153, orr r0, r2, ip, asl #24 @,, tmp145, tmp146, orr r1, r3, r1, asl #24 @,, tmp155, tmp156, ldmfd sp!, {r4, r5, r6, r7} bx lr Compared to the initial results, this is really nicely optimized and I couldn't do much better if I were to hand code it myself. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2012-08-25	ARM: 7491/1: use generic version of identical asm headers	Rob Herring	1	-0/+15
	Inspired by the AArgh64 claim that it should be separate from ARM and one reason was being able to use more asm-generic headers. Doing a diff of arch/arm/include/asm and include/asm-generic there are numerous asm headers which are functionally identical to their asm-generic counterparts. Delete the ARM version and use the generic ones. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Reviewed-by: Nicolas Pitre <nico@linaro.org> Tested-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2011-10-17	ARM: 7006/1: Migrate to asm-generic wrapper support	Stephen Boyd	1	-0/+17
	With d8ecc5c (kbuild: asm-generic support, 2011-04-27) we can remove a handful of asm-generic wrappers in ARM code. Since the generic version of sizes.h doesn't contain SZ_48M, we replace the 4 users of SZ_48M with the equivalent SZ_32M + SZ_16M. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Imre Kaloz <kaloz@openwrt.org> Acked-by: Krzysztof Halasa <khc@pm.waw.pl> Cc: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-08-14	archs: replace unifdef-y with header-y	Sam Ravnborg	1	-1/+1
	unifdef-y and header-y have same semantic, so drop unifdef-y Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-01-14	byteorder: make swab.h include asm/swab.h like a regular header	Harvey Harrison	1	-1/+0
	Add swab.h to kbuild.asm and remove the individual entries from each arch, mark as unifdef as some arches have some kernel-only bits inside. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06	arm: introduce asm/swab.h	Harvey Harrison	1	-0/+1
	Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-02	[ARM] move include/asm-arm to arch/arm/include/asm	Russell King	1	-0/+3
	Move platform independent header files to arch/arm/include/asm, leaving those in asm/arch* and asm/plat* alone. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>