linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2011-01-21	h8300: Convert to new irq_chip functions	Thomas Gleixner	1	-17/+16
	No functional change, just straight forward conversion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Paul Mundt <lethal@linux-sh.org>
2011-01-21	Merge branch 'fix/asoc' into for-linus	Takashi Iwai	7	-68/+55

2011-01-21	Merge branch 'fix/misc' into for-linus	Takashi Iwai	1	-0/+7

2011-01-21	powerpc/mpic: Fix mask/unmask timeout message	Scott Wood	1	-2/+4
	Don't say that enable timed out when it was disable, and show which IRQ had the problem. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/pseries: Add BNX2=m to defconfig	Nishanth Aravamudan	1	-0/+1
	Upcoming servers will include a Broadcom NIC, add to the defconfig to increase testing coverage and make sure mainline builds come up with networking. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Enable 64kB pages and 1024 threads in pseries config	Anton Blanchard	1	-1/+3
	- Enable 64kB pages so it gets some regular testing. - The largest POWER7 has 1024 threads so bump NR_CPUS it to match. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Disable mcount tracers in pseries defconfig	Anton Blanchard	1	-2/+0
	IRQSOFF_TRACER and STACK_TRACER force the kernel to be built with -pg which is a substantial overhead. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/boot/dts: Install dts from the right directory	Ben Hutchings	1	-1/+1
	The dts-installed variable is initialised using a wildcard path that will be expanded relative to the build directory. Use the existing variable dtstree to generate an absolute wildcard path that will work when building in a separate directory. Reported-by: Gerhard Pircher <gerhard_pircher@gmx.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Tested-by: Gerhard Pircher <gerhard_pircher@gmx.net> [against 2.6.32] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: machine_check_generic is wrong on 64bit	Anton Blanchard	1	-23/+0
	Decoding machine checks is CPU specific and so machine_check_generic doesn't do the right thing on 64bit chips. Luckily we never call into this code because we call ppc_md.machine_check_exception instead if available. Since we check cur_cpu_spec->machine_check before calling it, we may as well remove machine_check_generic from 64bit archs. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Check RTAS extended log flag before checking length	Anton Blanchard	1	-1/+1
	The spec suggests we should first check the extended log flag before checking the length field. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Fix corruption when grabbing FWNMI data	Anton Blanchard	1	-14/+38
	The FWNMI code uses a global buffer without any locks to read the RTAS error information. If two CPUs take a machine check at once then we will corrupt this buffer. Since most FWNMI rtas messages are not of the extended type, we can create a 64bit percpu buffer and use it where possible. If we do receive an extended RTAS log then we fall back to the old behaviour of using the global buffer. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Rework pseries machine check handler	Anton Blanchard	1	-18/+30
	Rework pseries machine check handler: - If MSR_RI isn't set, we cannot recover even if the machine check was fully recovered - Rename nonfatal to recovered - Handle RTAS_DISP_LIMITED_RECOVERY - Use BUS_MCEERR_AR instead of BUS_ADRERR - Don't check all the RTAS error log fields when receiving a synchronous machine check. Recent versions of the pseries firmware do not fill them in during a machine check and instead send a follow up error log with the detailed information. If we see a synchronous machine check, and we came from userspace then kill the task. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Don't silently handle machine checks from userspace	Anton Blanchard	1	-5/+0
	If a machine check comes from userspace we send a SIGBUS to the task and fail to printk anything. If we are taking machine checks due to bad hardware we want to know about it right away. Furthermore if we don't complain loudly then it will look a lot like a bug in the userspace application, potentially causing a lot of confusion. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Remove duplicate debugger hook in machine_check_exception	Anton Blanchard	1	-2/+0
	We are calling debugger_fault_handler twice in machine_check_exception. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Never halt RTAS error logging after receiving an unrecoverable machine check	Anton Blanchard	1	-1/+1
	Newer versions of the System p firwmare send a partial RTAS error log in the machine check handler with a more detailed response appearing sometime later via check event. This means at machine check time we do not have enough information to ascertain exactly what went on. Furthermore, I have found the RTAS error logs in the machine check handler contain no useful information, so halting on them makes little sense. If we want to halt it would make more sense to do it following the error log received sometime later via check event. In light of this, never halt the error log in the pseries machine check handler. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Don't force MSR_RI in machine_check_exception	Anton Blanchard	1	-4/+1
	We should never force MSR_RI on. If we take a machine check with MSR_RI off then we have no chance of recovering safely. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Print 32 bits of DSISR in show_regs	Anton Blanchard	1	-1/+1
	We were printing 64 bits of DSISR in show_regs even though it is 32 bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/kdump: Disable ftrace during kexec	Anton Blanchard	1	-0/+7
	We should disable ftrace during kexec, some of the tracers are very invasive and we do not want them going off while doing the low level work of swapping one kernel out for another. This mirrors what we do on x86. Even though we cannot return from a kexec on powerpc (since we do not implement CONFIG_KEXEC_JUMP), add the restore code in case we do one day. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/kdump: Move crash_kexec_stop_spus to kdump crash handler	Anton Blanchard	3	-78/+72
	Use the crash handler hooks to run the SPU stop code, just like we do for ehea and cell RAS code. While I'm here I noticed "CPUSs reliabally" so fix the spelling MISTAKESs reliabally. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/kexec: Remove empty ppc_md.machine_kexec_prepare	Anton Blanchard	2	-22/+0
	We check for a valid handler before calling ppc_md.machine_kexec_prepare so we can just remove these empty handlers. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/kexec: Don't initialise kexec hooks to default handlers	Anton Blanchard	2	-13/+0
	There's no need to initialise ppc_md.machine_kexec and ppc_md.machine_kexec_prepare to the default handlers. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/kdump: Remove ppc_md.machine_crash_shutdown	Anton Blanchard	4	-12/+1
	No one uses ppc_md.machine_crash_shutdown, so remove it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/kexec: Remove ppc_md.machine_kexec	Anton Blanchard	2	-10/+1
	No one uses ppc_md.machine_kexec, so remove it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/kexec: Remove ppc_md.machine_kexec_cleanup	Anton Blanchard	2	-5/+0
	No one uses ppc_md.machine_kexec_cleanup, so remove it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/kexec: Move all ppc_md kexec function pointers together	Anton Blanchard	1	-3/+2
	Move all the kexec handlers together. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/cell: Use system_wq in cpufreq_spudemand	Tejun Heo	2	-22/+23
	With cmwq, there's no reason to use a separate workqueue in cpufreq_spudemand. Use system_wq instead. The work items are already sync canceled on stop, so it's already guaranteed that no work is running when spu_gov_exit() is entered. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linuxppc-dev@lists.ozlabs.org Cc: Dave Jones <davej@redhat.com> Cc: cpufreq@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/macintosh: Fix wrong test in fan_{read,write}_reg()	roel kluin	1	-2/+2
	Fix error test in fan_{read,write}_reg() Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/rtas_flash: Use simple_read_from_buffer	Akinobu Mita	1	-47/+6
	Simplify read file operation for /proc/powerpc/rtas/* interface by using simple_read_from_buffer. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/spufs: Use simple_write_to_buffer	Akinobu Mita	1	-20/+7
	Simplify several write fileoperations for spufs by using simple_write_to_buffer(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/ppc32/tracing: Add stack frame to calls of trace_hardirqs_on/off	Steven Rostedt	1	-0/+11
	32-bit variant of the previous patch for 64-bit: << When an interrupt occurs in userspace, we can call trace_hardirqs_on/off() With one level stack. But if we have irqsoff tracing enabled, it checks both CALLER_ADDR0 and CALLER_ADDR1. The second call goes two stack frames up. If this is from user space, then there may not exist a second stack.... >> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc/ppc64/tracing: Add stack frame to calls of trace_hardirqs_on/off	Steven Rostedt	1	-10/+30
	When an interrupt occurs in userspace, we can call trace_hardirqs_on/off() With one level stack. But if we have irqsoff tracing enabled, it checks both CALLER_ADDR0 and CALLER_ADDR1. The second call goes two stack frames up. If this is from user space, then there may not exist a second stack. Add a second stack when calling trace_hardirqs_on/off() otherwise the following oops might occur: Oops: Kernel access of bad area, sig: 11 [#1] PREEMPT SMP NR_CPUS=2 PA Semi PWRficient last sysfs file: /sys/block/sda/size Modules linked in: ohci_hcd ehci_hcd usbcore NIP: c0000000000e1c00 LR: c0000000000034d4 CTR: 000000011012c440 REGS: c00000003e2f3af0 TRAP: 0300 Not tainted (2.6.37-rc6+) MSR: 9000000000001032 <ME,IR,DR> CR: 48044444 XER: 20000000 DAR: 00000001ffb9db50, DSISR: 0000000040000000 TASK = c00000003e1a00a0[2088] 'emacs' THREAD: c00000003e2f0000 CPU: 1 GPR00: 0000000000000001 c00000003e2f3d70 c00000000084e0d0 c0000000008816e8 GPR04: 000000001034c678 000000001032e8f9 0000000010336540 0000000040020000 GPR08: 0000000040020000 00000001ffb9db40 c00000003e2f3e30 0000000060000000 GPR12: 100000000000f032 c00000000fff0280 000000001032e8c9 0000000000000008 GPR16: 00000000105be9c0 00000000105be950 00000000105be9b0 00000000105be950 GPR20: 00000000ffb9dc50 00000000ffb9dbf0 00000000102f0000 00000000102f0000 GPR24: 00000000102e0000 00000000102f0000 0000000010336540 c0000000009ded38 GPR28: 00000000102e0000 c0000000000034d4 c0000000007ccb10 c00000003e2f3d70 NIP [c0000000000e1c00] .trace_hardirqs_off+0xb0/0x1d0 LR [c0000000000034d4] decrementer_common+0xd4/0x100 Call Trace: [c00000003e2f3d70] [c00000003e2f3e30] 0xc00000003e2f3e30 (unreliable) [c00000003e2f3e30] [c0000000000034d4] decrementer_common+0xd4/0x100 Instruction dump: 81690000 7f8b0000 419e0018 f84a0028 60000000 60000000 60000000 e95f0000 80030000 e92a0000 eb6301f8 2f800000 <eb890010> 41fe00dc a06d000a eb1e8050 ---[ end trace 4ec7fd2be9240928 ]--- Reported-by: Joerg Sommer <joerg@alea.gnuu.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-21	powerpc: Ensure the else case of feature sections will fit	Michael Ellerman	2	-12/+34
	When we create an alternative feature section, the else case must be the same size or smaller than the body. This is because when we patch the else case in we just overwrite the body, so there must be room. Up to now we just did this by inspection, but it's quite easy to enforce it in the assembler, so we should. The only change is to add the ifgt block, but that effects the alignment of the tabs and so the whole macro is modified. Also add a test, but #if 0 it because we don't want to break the build. Anyone who's modifying the feature macros should enable the test. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-01-20	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip	Linus Torvalds	7	-40/+22
	* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c lockdep: Move early boot local IRQ enable/disable status to init/main.c
2011-01-20	ACPI / PM: Call suspend_nvs_free() earlier during resume	Rafael J. Wysocki	1	-1/+1
	It turns out that some device drivers map pages from the ACPI NVS region during resume using ioremap(), which conflicts with ioremap_cache() used for mapping those pages by the NVS save/restore code in nvs.c. Make the NVS pages mapped by the code in nvs.c be unmapped before device drivers' resume routines run. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	ACPI: Introduce acpi_os_ioremap()	Rafael J. Wysocki	4	-11/+27
	Commit ca9b600be38c ("ACPI / PM: Make suspend_nvs_save() use acpi_os_map_memory()") attempted to prevent the code in osl.c and nvs.c from using different ioremap() variants by making the latter use acpi_os_map_memory() for mapping the NVS pages. However, that also requires acpi_os_unmap_memory() to be used for unmapping them, which causes synchronize_rcu() to be executed many times in a row unnecessarily and introduces substantial delays during resume on some systems. Instead of using acpi_os_map_memory() for mapping the NVS pages in nvs.c introduce acpi_os_ioremap() calling ioremap_cache() and make the code in both osl.c and nvs.c use it. Reported-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-21	cifs: fix up CIFSSMBEcho for unaligned access	Jeff Layton	1	-3/+3
	Make sure that CIFSSMBEcho can handle unaligned fields. Also fix a minor bug that causes this warning: fs/cifs/cifssmb.c: In function 'CIFSSMBEcho': fs/cifs/cifssmb.c:740: warning: large integer implicitly truncated to unsigned type ...WordCount is u8, not __le16, so no need to convert it. This patch should apply cleanly on top of the rest of the patchset to clean up unaligned access. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
2011-01-21	Merge branch 'for-next'	Steve French	14	-237/+402

2011-01-20	Merge branch 'akpm'	Linus Torvalds	310	-535/+642
	* akpm: kernel/smp.c: consolidate writes in smp_call_function_interrupt() kernel/smp.c: fix smp_call_function_many() SMP race memcg: correctly order reading PCG_USED and pc->mem_cgroup backlight: fix 88pm860x_bl macro collision drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking MAINTAINERS: update Atmel AT91 entry mm: fix truncate_setsize() comment memcg: fix rmdir, force_empty with THP memcg: fix LRU accounting with THP memcg: fix USED bit handling at uncharge in THP memcg: modify accounting function for supporting THP better fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio mm: compaction: prevent division-by-zero during user-requested compaction mm/vmscan.c: remove duplicate include of compaction.h memblock: fix memblock_is_region_memory() thp: keep highpte mapped until it is no longer needed kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
2011-01-20	kernel/smp.c: consolidate writes in smp_call_function_interrupt()	Milton Miller	1	-10/+19
	We have to test the cpu mask in the interrupt handler before checking the refs, otherwise we can start to follow an entry before its deleted and find it partially initailzed for the next trip. Presently we also clear the cpumask bit before executing the called function, which implies getting write access to the line. After the function is called we then decrement refs, and if they go to zero we then unlock the structure. However, this implies getting write access to the call function data before and after another the function is called. If we can assert that no smp_call_function execution function is allowed to enable interrupts, then we can move both writes to after the function is called, hopfully allowing both writes with one cache line bounce. On a 256 thread system with a kernel compiled for 1024 threads, the time to execute testcase in the "smp_call_function_many race" changelog was reduced by about 30-40ms out of about 545 ms. I decided to keep this as WARN because its now a buggy function, even though the stack trace is of no value -- a simple printk would give us the information needed. Raw data: Without patch: ipi_test startup took 1219366ns complete 539819014ns total 541038380ns ipi_test startup took 1695754ns complete 543439872ns total 545135626ns ipi_test startup took 7513568ns complete 539606362ns total 547119930ns ipi_test startup took 13304064ns complete 533898562ns total 547202626ns ipi_test startup took 8668192ns complete 544264074ns total 552932266ns ipi_test startup took 4977626ns complete 548862684ns total 553840310ns ipi_test startup took 2144486ns complete 541292318ns total 543436804ns ipi_test startup took 21245824ns complete 530280180ns total 551526004ns With patch: ipi_test startup took 5961748ns complete 500859628ns total 506821376ns ipi_test startup took 8975996ns complete 495098924ns total 504074920ns ipi_test startup took 19797750ns complete 492204740ns total 512002490ns ipi_test startup took 14824796ns complete 487495878ns total 502320674ns ipi_test startup took 11514882ns complete 494439372ns total 505954254ns ipi_test startup took 8288084ns complete 502570774ns total 510858858ns ipi_test startup took 6789954ns complete 493388112ns total 500178066ns #include <linux/module.h> #include <linux/init.h> #include <linux/sched.h> /* sched clock / #define ITERATIONS 100 static void do_nothing_ipi(void dummy) { } static void do_ipis(struct work_struct *dummy) { int i; for (i = 0; i < ITERATIONS; i++) smp_call_function(do_nothing_ipi, NULL, 1); printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id()); } static struct work_struct work[NR_CPUS]; static int __init testcase_init(void) { int cpu; u64 start, started, done; start = local_clock(); for_each_online_cpu(cpu) { INIT_WORK(&work[cpu], do_ipis); schedule_work_on(cpu, &work[cpu]); } started = local_clock(); for_each_online_cpu(cpu) flush_work(&work[cpu]); done = local_clock(); pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n", started-start, done-started, done-start); return 0; } static void __exit testcase_exit(void) { } module_init(testcase_init) module_exit(testcase_exit) MODULE_LICENSE("GPL"); MODULE_AUTHOR("Anton Blanchard"); Signed-off-by: Milton Miller <miltonm@bga.com> Cc: Anton Blanchard <anton@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	kernel/smp.c: fix smp_call_function_many() SMP race	Anton Blanchard	1	-0/+30
	I noticed a failure where we hit the following WARN_ON in generic_smp_call_function_interrupt: if (!cpumask_test_and_clear_cpu(cpu, data->cpumask)) continue; data->csd.func(data->csd.info); refs = atomic_dec_return(&data->refs); WARN_ON(refs < 0); <------------------------- We atomically tested and cleared our bit in the cpumask, and yet the number of cpus left (ie refs) was 0. How can this be? It turns out commit 54fdade1c3332391948ec43530c02c4794a38172 ("generic-ipi: make struct call_function_data lockless") is at fault. It removes locking from smp_call_function_many and in doing so creates a rather complicated race. The problem comes about because: - The smp_call_function_many interrupt handler walks call_function.queue without any locking. - We reuse a percpu data structure in smp_call_function_many. - We do not wait for any RCU grace period before starting the next smp_call_function_many. Imagine a scenario where CPU A does two smp_call_functions back to back, and CPU B does an smp_call_function in between. We concentrate on how CPU C handles the calls: CPU A CPU B CPU C CPU D smp_call_function smp_call_function_interrupt walks call_function.queue sees data from CPU A on list smp_call_function smp_call_function_interrupt walks call_function.queue sees (stale) CPU A on list smp_call_function int clears last ref on A list_del_rcu, unlock smp_call_function reuses percpu data A data->cpumask sees and clears bit in cpumask might be using old or new fn! decrements refs below 0 set data->refs (too late!) The important thing to note is since the interrupt handler walks a potentially stale call_function.queue without any locking, then another cpu can view the percpu data structure at any time, even when the owner is in the process of initialising it. The following test case hits the WARN_ON 100% of the time on my PowerPC box (having 128 threads does help :) #include <linux/module.h> #include <linux/init.h> #define ITERATIONS 100 static void do_nothing_ipi(void dummy) { } static void do_ipis(struct work_struct dummy) { int i; for (i = 0; i < ITERATIONS; i++) smp_call_function(do_nothing_ipi, NULL, 1); printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id()); } static struct work_struct work[NR_CPUS]; static int __init testcase_init(void) { int cpu; for_each_online_cpu(cpu) { INIT_WORK(&work[cpu], do_ipis); schedule_work_on(cpu, &work[cpu]); } return 0; } static void __exit testcase_exit(void) { } module_init(testcase_init) module_exit(testcase_exit) MODULE_LICENSE("GPL"); MODULE_AUTHOR("Anton Blanchard"); I tried to fix it by ordering the read and the write of ->cpumask and ->refs. In doing so I missed a critical case but Paul McKenney was able to spot my bug thankfully :) To ensure we arent viewing previous iterations the interrupt handler needs to read ->refs then ->cpumask then ->refs _again_. Thanks to Milton Miller and Paul McKenney for helping to debug this issue. [miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ] [miltonm@bga.com: remove excess tests] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Milton Miller <miltonm@bga.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: <stable@kernel.org> [2.6.32+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	memcg: correctly order reading PCG_USED and pc->mem_cgroup	Johannes Weiner	1	-18/+9
	The placement of the read-side barrier is confused: the writer first sets pc->mem_cgroup, then PCG_USED. The read-side barrier has to be between testing PCG_USED and reading pc->mem_cgroup. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	backlight: fix 88pm860x_bl macro collision	Randy Dunlap	1	-2/+2
	Fix collision with kernel-supplied #define: drivers/video/backlight/88pm860x_bl.c:24:1: warning: "CURRENT_MASK" redefined arch/x86/include/asm/page_64_types.h:6:1: warning: this is the location of the previous definition Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Haojian Zhuang <haojian.zhuang@marvell.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking	Janusz Krzysztofik	1	-7/+8
	Replicate changes made to drivers/leds/ledtrig-backlight.c. Cc: Paul Mundt <lethal@linux-sh.org> Cc: Richard Purdie <richard.purdie@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	MAINTAINERS: update Atmel AT91 entry	Nicolas Ferre	1	-2/+6
	Add two co-maintainers and update the entry with new information. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Andrew Victor <linux@maxim.org.za> Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	mm: fix truncate_setsize() comment	Jan Kara	1	-6/+5
	Contrary to what the comment says, truncate_setsize() should be called before filesystem truncated blocks. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	memcg: fix rmdir, force_empty with THP	KAMEZAWA Hiroyuki	1	-11/+26
	Now, when THP is enabled, memcg's rmdir() function is broken because move_account() for THP page is not supported. This will cause account leak or -EBUSY issue at rmdir(). This patch fixes the issue by supporting move_account() THP pages. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	memcg: fix LRU accounting with THP	KAMEZAWA Hiroyuki	1	-4/+18
	memory cgroup's LRU stat should take care of size of pages because Transparent Hugepage inserts hugepage into LRU. If this value is the number wrong, memory reclaim will not work well. Note: only head page of THP's huge page is linked into LRU. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	memcg: fix USED bit handling at uncharge in THP	KAMEZAWA Hiroyuki	3	-40/+62
	Now, under THP: at charge: - PageCgroupUsed bit is set to all page_cgroup on a hugepage. ....set to 512 pages. at uncharge - PageCgroupUsed bit is unset on the head page. So, some pages will remain with "Used" bit. This patch fixes that Used bit is set only to the head page. Used bits for tail pages will be set at splitting if necessary. This patch adds this lock order: compound_lock() -> page_cgroup_move_lock(). [akpm@linux-foundation.org: fix warning] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	memcg: modify accounting function for supporting THP better	KAMEZAWA Hiroyuki	1	-13/+12
	mem_cgroup_charge_statisics() was designed for charging a page but now, we have transparent hugepage. To fix problems (in following patch) it's required to change the function to get the number of pages as its arguments. The new function gets following as argument. - type of page rather than 'pc' - size of page which is accounted. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20	fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio	David Dillow	1	-3/+7
	When using devices that support max_segments > BIO_MAX_PAGES (256), direct IO tries to allocate a bio with more pages than allowed, which leads to an oops in dio_bio_alloc(). Clamp the request to the supported maximum, and change dio_bio_alloc() to reflect that bio_alloc() will always return a bio when called with __GFP_WAIT and a valid number of vectors. [akpm@linux-foundation.org: remove redundant BUG_ON()] Signed-off-by: David Dillow <dillowda@ornl.gov> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>