linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2009-09-05	Linux 2.6.31-rc9	Linus Torvalds	1	-1/+1

2009-09-05	powerpc: Fix i8259 interrupt driver kernel crash on ML510	Roderick Colenbrander	1	-1/+0
	This patch fixes a null pointer exception caused by removal of 'ack()' for level interrupts in the Xilinx interrupt driver. A recent change to the xilinx interrupt controller removed the ack hook for level irqs. Signed-off-by: Roderick Colenbrander <thunderbird2k@gmail.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05	ext2: fix unbalanced kmap()/kunmap()	Nicolas Pitre	1	-0/+4
	In ext2_rename(), dir_page is acquired through ext2_dotdot(). It is then released through ext2_set_link() but only if old_dir != new_dir. Failing that, the pkmap reference count is never decremented and the page remains pinned forever. Repeat that a couple times with highmem pages and all pkmap slots get exhausted, and every further kmap() calls end up stalling on the pkmap_map_wait queue at which point the whole system comes to a halt. Signed-off-by: Nicolas Pitre <nico@marvell.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05	pty: don't limit the writes to 'pty_space()' inside 'pty_write()'	Linus Torvalds	1	-9/+1
	The whole write-room thing is something that is up to the _caller_ to worry about, not the pty layer itself. The total buffer space will still be limited by the buffering routines themselves, so there is no advantage or need in having pty_write() artificially limit the size somehow. And what happened was that the caller (the n_tty line discipline, in this case) may have verified that there is room for 2 bytes to be written (for NL -> CRNL expansion), and it used to then do those writes as two single-byte writes. And if the first byte written (CR) then caused a new tty buffer to be allocated, pty_space() may have returned zero when trying to write the second byte (LF), and then incorrectly failed the write - leading to a lost newline character. This should finally fix http://bugzilla.kernel.org/show_bug.cgi?id=14015 Reported-by: Mikael Pettersson <mikpe@it.uu.se> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05	n_tty: do O_ONLCR translation as a single write	Linus Torvalds	1	-2/+1
	When translating CR to CRNL in the n_tty line discipline, we did it as two tty_put_char() calls. Which works, but is stupid, and has caused problems before too with bad interactions with the write_room() logic. The generic USB serial driver had that problem, for example. Now the pty layer had similar issues after being moved to the generic tty buffering code (in commit d945cb9cce20ac7143c2de8d88b187f62db99bdc: "pty: Rework the pty layer to use the normal buffering logic"). So stop doing the silly separate two writes, and do it as a single write instead. That's what the n_tty layer already does for the space expansion of tabs (XTABS), and it means that we'll now always have just a single write for the CRNL to match the single 'tty_write_room()' test, which hopefully means that the next time somebody screws up buffering, it won't cause weeks of debugging. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05	exec: do not sleep in TASK_TRACED under ->cred_guard_mutex	Oleg Nesterov	3	-38/+43
	Tom Horsley reports that his debugger hangs when it tries to read /proc/pid_of_tracee/maps, this happens since "mm_for_maps: take ->cred_guard_mutex to fix the race with exec" 04b836cbf19e885f8366bccb2e4b0474346c02d commit in 2.6.31. But the root of the problem lies in the fact that do_execve() path calls tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC. The tracee must not sleep in TASK_TRACED holding this mutex. Even if we remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(), another task doing PTRACE_ATTACH should not hang until it is killed or the tracee resumes. With this patch do_execve() does not use ->cred_guard_mutex directly and we do not hold it throughout, instead: - introduce prepare_bprm_creds() helper, it locks the mutex and calls prepare_exec_creds() to initialize bprm->cred. - install_exec_creds() drops the mutex after commit_creds(), and thus before tracehook_report_exec()->ptrace_stop(). or, if exec fails, free_bprm() drops this mutex when bprm->cred != NULL which indicates install_exec_creds() was not called. Reported-by: Tom Horsley <tom.horsley@att.net> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05	page-allocator: always change pageblock ownership when anti-fragmentation is disabled	Mel Gorman	1	-2/+4
	On low-memory systems, anti-fragmentation gets disabled as fragmentation cannot be avoided on a sufficiently large boundary to be worthwhile. Once disabled, there is a period of time when all the pageblocks are marked MOVABLE and the expectation is that they get marked UNMOVABLE at each call to __rmqueue_fallback(). However, when MAX_ORDER is large the pageblocks do not change ownership because the normal criteria are not met. This has the effect of prematurely breaking up too many large contiguous blocks. This is most serious on NOMMU systems which depend on high-order allocations to boot. This patch causes pageblocks to change ownership on every fallback when anti-fragmentation is disabled. This prevents the large blocks being prematurely broken up. This is a fix to commit 49255c619fbd482d704289b5eb2795f8e3b7ff2e [page allocator: move check for disabled anti-fragmentation out of fastpath] and the problem affects 2.6.31-rc8. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Tested-by: Paul Mundt <lethal@linux-sh.org> Cc: David Howells <dhowells@redhat.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05	nommu: fix error handling in do_mmap_pgoff()	David Howells	1	-2/+1
	Fix the error handling in do_mmap_pgoff(). If do_mmap_shared_file() or do_mmap_private() fail, we jump to the error_put_region label at which point we cann __put_nommu_region() on the region - but we haven't yet added the region to the tree, and so __put_nommu_region() may BUG because the region tree is empty or it may corrupt the region tree. To get around this, we can afford to add the region to the region tree before calling do_mmap_shared_file() or do_mmap_private() as we keep nommu_region_sem write-locked, so no-one can race with us by seeing a transient region. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Paul Mundt <lethal@linux-sh.org> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05	workqueues: introduce __cancel_delayed_work()	Oleg Nesterov	1	-0/+15
	cancel_delayed_work() has to use del_timer_sync() to guarantee the timer function is not running after return. But most users doesn't actually need this, and del_timer_sync() has problems: it is not useable from interrupt, and it depends on every lock which could be taken from irq. Introduce __cancel_delayed_work() which calls del_timer() instead. The immediate reason for this patch is http://bugzilla.kernel.org/show_bug.cgi?id=13757 but hopefully this helper makes sense anyway. As for 13757 bug, actually we need requeue_delayed_work(), but its semantics are not yet clear. Merge this patch early to resolves cross-tree interdependencies between input and infiniband. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-05	firewire: sbp2: fix freeing of unallocated memory	Stefan Richter	1	-4/+4
	If a target writes invalid status (typically status of a command that already timed out), firewire-sbp2 attempts to put away an ORB that doesn't exist. https://bugzilla.redhat.com/show_bug.cgi?id=519772 Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05	firewire: ohci: fix Ricoh R5C832, video reception	Stefan Richter	1	-0/+5
	In dual-buffer DMA mode, no video frames are ever received from R5C832 by libdc1394. Fallback to packet-per-buffer DMA works reliably. http://thread.gmane.org/gmane.linux.kernel.firewire.devel/13393/focus=13476 Reported-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05	firewire: ohci: fix Agere FW643 and multiple cameras	Stefan Richter	1	-0/+9
	An Agere FW643 OHCI 1.1 card works fine for video reception from one camera but fails early if receiving from two cameras. After a short while, no IR IRQ events occur and the context control register does not react anymore. This happens regardless whether both IR DMA contexts are dual-buffer or one is dual-buffer and the other packet-per-buffer. This can be worked around by disabling dual buffer DMA mode entirely. http://sourceforge.net/mailarchive/message.php?msg_name=4A7C0594.2020208%40gmail.com (Reported by Samuel Audet.) In another report (by Jonathan Cameron), an FW643 works OK with two cameras in dual buffer mode. Whether this is due to different chip revisions or different usage patterns (different video formats) is not yet clear. However, as far as the current capabilities of firewire-core's isochronous I/O interface are concerned, simply switching off dual-buffer on non-working and working FW643s alike is not a problem in practice. We only need to revisit this issue if we are going to enhance the interface, e.g. so that applications can explicitly choose modes. Reported-by: Samuel Audet <samuel.audet@gmail.com> Reported-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-05	firewire: core: fix crash in iso resource management	Stefan Richter	1	-2/+2
	This fixes a regression due to post 2.6.30 commit "firewire: core: do not DMA-map stack addresses" 6fdc03709433ccc2005f0f593ae9d9dd04f7b485. As David Moore noted, a previously correct sizeof() expression became wrong since the commit changed its argument from an array to a pointer. This resulted in an oops in ohci_cancel_packet in the shared workqueue thread's context when an isochronous resource was to be freed. Reported-by: Jonathan Cameron <jic23@cam.ac.uk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2009-09-04	ocfs2: ocfs2_write_begin_nolock() should handle len=0	Sunil Mushran	1	-2/+2
	Bug introduced by mainline commit e7432675f8ca868a4af365759a8d4c3779a3d922 The bug causes ocfs2_write_begin_nolock() to oops when len=0. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04	dm snapshot: fix on disk chunk size validation	Mikulas Patocka	2	-8/+19
	Fix some problems seen in the chunk size processing when activating a pre-existing snapshot. For a new snapshot, the chunk size can either be supplied by the creator or a default value can be used. For an existing snapshot, the chunk size in the snapshot header on disk should always be used. If someone attempts to load an existing snapshot and has the 'default chunk size' option set, the kernel uses its default value even when it is incorrect for the snapshot being loaded. This patch ensures the correct on-disk value is always used. Secondly, when the code does use the chunk size stored on the disk it is prudent to revalidate it, so the code can exit cleanly if it got corrupted as happened in https://bugzilla.redhat.com/show_bug.cgi?id=461506 . Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm exception store: split set_chunk_size	Mikulas Patocka	2	-0/+12
	Break the function set_chunk_size to two functions in preparation for the fix in the following patch. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm snapshot: fix header corruption race on invalidation	Mikulas Patocka	1	-10/+34
	If a persistent snapshot fills up, a race can corrupt the on-disk header which causes a crash on any future attempt to activate the snapshot (typically while booting). This patch fixes the race. When the snapshot overflows, __invalidate_snapshot is called, which calls snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that calls write_header. write_header constructs the new header in the "area" location. Concurrently, an existing kcopyd job may finish, call copy_callback and commit_exception method, that goes to persistent_commit_exception. persistent_commit_exception doesn't do locking, relying on the fact that callbacks are single-threaded, but it can race with snapshot invalidation and overwrite the header that is just being written while the snapshot is being invalidated. The result of this race is a corrupted header being written that can lead to a crash on further reactivation (if chunk_size is zero in the corrupted header). The fix is to use separate memory areas for each. See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506 Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm snapshot: refactor zero_disk_area to use chunk_io	Mikulas Patocka	1	-19/+7
	Refactor chunk_io to prepare for the fix in the following patch. Pass an area pointer to chunk_io and simplify zero_disk_area to use chunk_io. No functional change. Cc: stable@kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm log: userspace add luid to distinguish between concurrent log instances	Jonathan Brassow	4	-13/+31
	Device-mapper userspace logs (like the clustered log) are identified by a universally unique identifier (UUID). This identifier is used to associate requests from the kernel to a specific log in userspace. The UUID must be unique everywhere, since multiple machines may use this identifier when communicating about a particular log, as is the case for cluster logs. Sometimes, device-mapper/LVM may re-use a UUID. This is the case during pvmoves, when moving from one segment of an LV to another, or when resizing a mirror, etc. In these cases, a new log is created with the same UUID and loaded in the "inactive" slot. When a device-mapper "resume" is issued, the "live" table is deactivated and the new "inactive" table becomes "live". (The "inactive" table can also be removed via a device-mapper 'clear' command.) The above two issues were colliding. More than one log was being created with the same UUID, and there was no way to distinguish between them. So, sometimes the wrong log would be swapped out during the exchange. The solution is to create a locally unique identifier, 'luid', to go along with the UUID. This new identifier is used to determine exactly which log is being referenced by the kernel when the log exchange is made. The identifier is not universally safe, but it does not need to be, since create/destroy/suspend/resume operations are bound to a specific machine; and these are the operations that make up the exchange. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm raid1: do not allow log_failure variable to unset after being set	Jonathan Brassow	1	-1/+7
	This patch fixes a bug which was triggering a case where the primary leg could not be changed on failure even when the mirror was in-sync. The case involves the failure of the primary device along with the transient failure of the log device. The problem is that bios can be put on the 'failures' list (due to log failure) before 'fail_mirror' is called due to the primary device failure. Normally, this is fine, but if the log device failure is transient, a subsequent iteration of the work thread, 'do_mirror', will reset 'log_failure'. The 'do_failures' function then resets the 'in_sync' variable when processing bios on the failures list. The 'in_sync' variable is what is used to determine if the primary device can be switched in the event of a failure. Since this has been reset, the primary device is incorrectly assumed to be not switchable. The case has been seen in the cluster mirror context, where one machine realizes the log device is dead before the other machines. As the responsibilities of the server migrate from one node to another (because the mirror is being reconfigured due to the failure), the new server may think for a moment that the log device is fine - thus resetting the 'log_failure' variable. In any case, it is inappropiate for us to reset the 'log_failure' variable. The above bug simply illustrates that it can actually hurt us. Cc: stable@kernel.org Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm log: remove incorrect field from userspace table output	Jonathan Brassow	1	-6/+10
	The output of 'dmsetup table' includes an internal field that should not be there. This patch removes it. To make the fix simpler, we first reorder a constructor argument The 'device size' argument is generated internally. Currently it is placed as the last space-separated word of the constructor string. However, we need to use a version of the string without this word, so we move it to the beginning instead so it is trivial to skip past it. We keep a copy of the arguments passed to userspace for creating a log, just in case we need to resend them. These are the same arguments that are desired in the STATUSTYPE_TABLE request, except for one. When creating the userspace log, the userspace daemon must know the size of the mirror, so that is added to the arguments given in the constructor table. We were printing this extra argument out as well, which is a mistake. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm log: fix userspace status output	Jonathan Brassow	1	-1/+1
	Fix 'dmsetup table' output. There is a missing ' ' at the end of the string causing two words to run together. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm stripe: expose correct io hints	Mike Snitzer	3	-1/+20
	Set sensible I/O hints for striped DM devices in the topology infrastructure added for 2.6.31 for userspace tools to obtain via sysfs. Add .io_hints to 'struct target_type' to allow the I/O hints portion (io_min and io_opt) of the 'struct queue_limits' to be set by each target and implement this for dm-stripe. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm table: add more context to terse warning messages	Mike Snitzer	1	-7/+18
	A couple of recent warning messages make it difficult for the reader to determine exactly what is wrong. This patch adds more information to those messages. The messages were added by these commits: 5dea271b6d87bd1d79a59c1d5baac2596a841c37 ("dm table: pass correct dev area size to device_area_is_valid") ea9df47cc92573b159ef3b4fda516c32cba9c4fd ("dm table: fix blk_stack_limits arg to use bytes not sectors") The patch also corrects references to logical_block_size in printk format strings from %hu to %u. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm table: fix queue_limit checking device iterator	Mikulas Patocka	1	-11/+11
	The logic to check for valid device areas is inverted relative to proper use with iterate_devices. The iterate_devices method calls its callback for every underlying device in the target. If any callback returns non-zero, iterate_devices exits immediately. But the callback device_area_is_valid() returns 0 on error and 1 on success. The overall effect without is that an error is issued only if every device is invalid. This patch renames device_area_is_valid to device_area_is_invalid and inverts the logic so that one invalid device is sufficient to raise an error. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm snapshot: implement iterate devices	Mike Snitzer	1	-2/+21
	Implement the .iterate_devices for the origin and snapshot targets. dm-snapshot's lack of .iterate_devices resulted in the inability to properly establish queue_limits for both targets. With 4K sector drives: an unfortunate side-effect of not establishing proper limits in either targets' DM device was that IO to the devices would fail even though both had been created without error. Commit af4874e03ed82f050d5872d8c39ce64bf16b5c38 ("dm target:s introduce iterate devices fn") in 2.6.31-rc1 should have implemented .iterate_devices for dm-snap.c's origin and snapshot targets. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	dm multipath: fix oops when request based io fails when no paths	Kiyoshi Ueda	1	-5/+10
	The patch posted at http://marc.info/?l=dm-devel&m=124539787228784&w=2 which was merged into cec47e3d4a861e1d942b3a580d0bbef2700d2bb2 ("dm: prepare for request based option") introduced a regression in request-based dm. If map_request() calls dm_kill_unmapped_request() to complete a cloned bio without dispatching it, clone->bio is still set when dm_end_request() is called and the BUG_ON(clone->bio) is incorrect. The patch fixes this bug by freeing bio in dm_end_request() if the clone has bio. I've redone my tests to cover all I/O paths and confirmed there's no other regression. Here is the oops I hit in request-based dm when I do I/O to a multipath device which doesn't have any active path nor queue_if_no_path setting: ------------[ cut here ]------------ kernel BUG at /root/2.6.31-rc4.rqdm/drivers/md/dm.c:828! invalid opcode: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map CPU 1 Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_service_time dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac sg sr_mod e1000e button cdrom serio_raw rtc_cmos rtc_core rtc_lib piix lpfc scsi_transport_fc ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode] Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.31-rc4.rqdm #1 Express5800/120Lj [N8100-1417] RIP: 0010:[<ffffffffa023629d>] [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod] RSP: 0018:ffff8800280a1f08 EFLAGS: 00010282 RAX: ffffffffa02544e0 RBX: ffff8802aa1111d0 RCX: ffff8802aa1111e0 RDX: ffff8802ab913e70 RSI: 0000000000000000 RDI: ffff8802ab913e70 RBP: ffff8800280a1f28 R08: ffffc90005457040 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffb R13: ffff8802ab913e88 R14: ffff8802ab9c1438 R15: 0000000000000100 FS: 0000000000000000(0000) GS:ffff88002809e000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000003d54a98640 CR3: 000000029f0a1000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ksoftirqd/1 (pid: 7, threadinfo ffff8802ae50e000, task ffff8802ae4f8040) Stack: ffff8800280a1f38 0000000000000020 ffffffff814f30a0 0000000000000004 <0> ffff8800280a1f58 ffffffff8116b245 ffff8800280a1f38 ffff8800280a1f38 <0> ffff8800280a1f58 0000000000000001 ffff8800280a1fa8 ffffffff810477bc Call Trace: <IRQ> [<ffffffff8116b245>] blk_done_softirq+0x75/0x90 [<ffffffff810477bc>] __do_softirq+0xcc/0x210 [<ffffffff81047170>] ? ksoftirqd+0x0/0x110 [<ffffffff8100ce7c>] call_softirq+0x1c/0x50 <EOI> [<ffffffff8100e785>] do_softirq+0x65/0xa0 [<ffffffff81047170>] ? ksoftirqd+0x0/0x110 [<ffffffff810471e0>] ksoftirqd+0x70/0x110 [<ffffffff81059559>] kthread+0x99/0xb0 [<ffffffff8100cd7a>] child_rip+0xa/0x20 [<ffffffff8100c73c>] ? restore_args+0x0/0x30 [<ffffffff810594c0>] ? kthread+0x0/0xb0 [<ffffffff8100cd70>] ? child_rip+0x0/0x20 Code: 44 89 e6 48 89 df e8 23 fb f2 e0 be 01 00 00 00 4c 89 f7 e8 f6 fd ff ff 5b 41 5c 41 5d 41 5e c9 c3 4c 89 ef e8 85 fe ff ff eb ed <0f> 0b eb fe 41 8b 85 dc 00 00 00 48 83 bb 10 01 00 00 00 89 83 RIP [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod] RSP <ffff8800280a1f08> ---[ end trace 16af0a1d8542da55 ]--- Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2009-09-04	sparc64: Fix bootup with mcount in some configs.	David S. Miller	3	-6/+5
	Functions invoked early when booting up a cpu can't use tracing because mcount requires a valid 'current_thread_info()' and TLB mappings to be setup. The code path of sun4v_register_mondo_queues --> register_one_mondo is one such case. sun4v_register_mondo_queues already has the necessary 'notrace' annotation, but register_one_mondo does not. Normally register_one_mondo is inlined so the bug doesn't trigger, but with some config/compiler combinations, it won't be so we must properly mark it notrace. While we're here, add 'notrace' annoations to prom_printf and prom_halt so that early error handling won't have the same problem. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Reported-by: Leif Sawyer <lsawyer@gci.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03	Input: atkbd - add Compaq Presario R4000-series repeat quirk	Dave Andrews	1	-0/+35
	Compaq Presario R4000-series laptops are not sending a "volume up button release" and "volume down button release" signal in the PS/2 protocol for atkbd. The URL below has some of confirmed reports: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/385477 Signed-off-by: Dave Andrews <jetdog330@hotmail.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2009-09-03	slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU	Eric Dumazet	1	-2/+2
	kmem_cache_destroy() should call rcu_barrier() after kmem_cache_close() and before sysfs_slab_remove() or risk rcu_free_slab() being called after kmem_cache is deleted (kfreed). rmmod nf_conntrack can crash the machine because it has to kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache. Cc: <stable@kernel.org> Reported-by: Zdenek Kabelac <zdenek.kabelac@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2009-09-03	JFFS2: add missing verify buffer allocation/deallocation	Massimo Cirillo	1	-0/+10
	The function jffs2_nor_wbuf_flash_setup() doesn't allocate the verify buffer if CONFIG_JFFS2_FS_WBUF_VERIFY is defined, so causing a kernel panic when that macro is enabled and the verify function is called. Similarly the jffs2_nor_wbuf_flash_cleanup() must free the buffer if CONFIG_JFFS2_FS_WBUF_VERIFY is enabled. The following patch fixes the problem. The following patch applies to 2.6.30 kernel. Signed-off-by: Massimo Cirillo <maxcir@gmail.com> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@kernel.org
2009-09-03	mtd: nftl: fix offset alignments	Dimitri Gorokhovik	1	-6/+9
	Arithmetic conversion in the mask computation makes the upper word of the second argument passed down to mtd->read_oob(), be always 0 (assuming 'offs' being a 64-bit signed long long type, and 'mtd->writesize' being a 32-bit unsigned int type). This patch applies over the other one adding masking in nftl_write, "nftl: write support is broken". Signed-off-by: Dimitri Gorokhovik <dimitri.gorokhovik@free.fr> Cc: Tim Gardner <tim.gardner@canonical.com> Cc: Scott James Remnant <scott@canonical.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-09-03	mtd: nftl: write support is broken	Dimitri Gorokhovik	1	-1/+1
	Write support is broken in NFTL. Fix it. Signed-off-by: <dimitri.gorokhovik@free.fr> Cc: Tim Gardner <tim.gardner@canonical.com> Cc: Scott James Remnant <scott@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-09-03	mtd: m25p80: fix null pointer dereference bug	Anton Vorontsov	1	-1/+1
	This patch fixes the following oops, observed with MTD_PARTITIONS=n: m25p80 spi32766.0: m25p80 (1024 Kbytes) Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc03a54b0 Oops: Kernel access of bad area, sig: 11 [#1] Modules linked in: NIP: c03a54b0 LR: c03a5494 CTR: c01e98b8 REGS: ef82bb60 TRAP: 0300 Not tainted (2.6.31-rc4-00167-g4733fd3) MSR: 00029000 <EE,ME,CE> CR: 24022022 XER: 20000000 DEAR: 00000008, ESR: 00000000 TASK = ef82c000[1] 'swapper' THREAD: ef82a000 GPR00: 00000000 ef82bc10 ef82c000 0000002e 00001eb8 ffffffff c01e9824 00000036 GPR08: c054ed40 c0542a08 00001eb8 00004000 22022022 1001a1a0 3ff8fd00 00000000 GPR16: 00000000 00000001 00000000 00000000 ef82bddc c0530000 efbef500 ef8356d0 GPR24: 00000000 ef8356d0 00000000 efbf7a00 c0530ec4 ffffffed efbf5300 c0541f98 NIP [c03a54b0] m25p_probe+0x22c/0x354 LR [c03a5494] m25p_probe+0x210/0x354 Call Trace: [ef82bc10] [c03a5494] m25p_probe+0x210/0x354 (unreliable) [ef82bca0] [c024e37c] spi_drv_probe+0x2c/0x3c [ef82bcb0] [c01f1afc] driver_probe_device+0xa4/0x178 [ef82bcd0] [c01f06e8] bus_for_each_drv+0x6c/0xa8 [ef82bd00] [c01f1a34] device_attach+0x84/0xa8 ... Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-09-03	sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds.	David S. Miller	1	-1/+1
	This is a compromise and a temporary workaround for bootup NMI watchdog triggers some people see with qla2xxx devices present. This happens when, for example: CPU 0 is in the driver init and looping submitting mailbox commands to load the firmware, then waiting for completion. CPU 1 is receiving the device interrupts. CPU 1 is where the NMI watchdog triggers. CPU 0 is submitting mailbox commands fast enough that by the time CPU 1 returns from the device interrupt handler, a new one is pending. This sequence runs for more than 5 seconds. The problematic case is CPU 1's timer interrupt running when the barrage of device interrupts begin. Then we have: timer interrupt return for softirq checking pending, thus enable interrupts qla2xxx interrupt return qla2xxx interrupt return ... 5+ seconds pass final qla2xxx interrupt for fw load return run timer softirq return At some point in the multi-second qla2xxx interrupt storm we trigger the NMI watchdog on CPU 1 from the NMI interrupt handler. The timer softirq, once we get back to running it, is smart enough to run the timer work enough times to make up for the missed timer interrupts. However, the NMI watchdogs (both x86 and sparc) use the timer interrupt count to notice the cpu is wedged. But in the above scenerio we'll receive only one such timer interrupt even if we last all the way back to running the timer softirq. The default watchdog trigger point is only 5 seconds, which is pretty low (the softwatchdog triggers at 60 seconds). So increase it to 30 seconds for now. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03	perf_counter/powerpc: Fix cache event codes for POWER7	Paul Mackerras	1	-3/+3
	I had the codes for L1 D-cache load accesses and misses swapped around, and the wrong codes for LL-cache accesses and misses. This corrects them. Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> LKML-Reference: <19103.8514.709300.585484@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-02	tc: Fix unitialized kernel memory leak	Eric Dumazet	1	-0/+2
	Three bytes of uninitialized kernel memory are currently leaked to user Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-03	sound: oxygen: handle cards with missing EEPROM	Clemens Ladisch	1	-0/+3
	The card model detection code introduced in 2.6.30 that tries to work around partially broken EEPROM contents by reading the EEPROM directly does not handle cards where the EEPROM has been omitted. In this case, we have to use the default ID to allow the driver to load. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Reported-and-tested-by: Ozan Çağlayan <ozan@pardus.org.tr> Cc: <stable@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-09-02	[IA64] fix csum_ipv6_magic()	Jiri Bohac	1	-3/+5
	The 32-bit parameters (len and csum) of csum_ipv6_magic() are passed in 64-bit registers in2 and in4. The high order 32 bits of the registers were never cleared, and garbage was sometimes calculated into the checksum. Fix this by clearing the high order 32 bits of these registers. Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-09-02	[IA64] Fix warning in dma-mapping.c	Luck, Tony	1	-1/+3
	arch/ia64/kernel/dma-mapping.c:14: warning: control reaches end of non-void function arch/ia64/kernel/dma-mapping.c:14: warning: no return statement in function returning non-void This warning was introduced by commit: 390bd132b2831a2ad0268e84bffbfc0680debfe5 Add dma_debug_init() for ia64 Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-09-01	pkt_sched: Revert tasklet_hrtimer changes.	David S. Miller	3	-21/+18
	These are full of unresolved problems, mainly that conversions don't work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers tasklets can't be killed from softirq context. And when a qdisc gets reset, that's exactly what we need to do here. We'll work this out in the net-next-2.6 tree and if warranted we'll backport that work to -stable. This reverts the following 3 changesets: a2cb6a4dd470d7a64255a10b843b0d188416b78f ("pkt_sched: Fix bogon in tasklet_hrtimer changes.") 38acce2d7983632100a9ff3fd20295f6e34074a8 ("pkt_sched: Convert CBQ to tasklet_hrtimer.") ee5f9757ea17759e1ce5503bdae2b07e48e32af9 ("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer") Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01	net: sk_free() should be allowed right after sk_alloc()	Jarek Poplawski	1	-1/+1
	After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) sk_free() frees socks conditionally and depends on sk_wmem_alloc being set e.g. in sock_init_data(). But in some cases sk_free() is called earlier, usually after other alloc errors. Fix is to move sk_wmem_alloc initialization from sock_init_data() to sk_alloc() itself. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01	xfs: actually enable the swapext compat handler	Christoph Hellwig	1	-1/+1
	Fix a small typo in the compat ioctl handler that cause the swapext compat handler to never be called. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Torsten Kaiser <just.for.lkml@googlemail.com> Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>
2009-09-01	block: Allow changing max_sectors_kb above the default 512	Nikanth Karthikesan	1	-1/+1
	The patch "block: Use accessor functions for queue limits" (ae03bf639a5027d27270123f5f6e3ee6a412781d) changed queue_max_sectors_store() to use blk_queue_max_sectors() instead of directly assigning the value. But blk_queue_max_sectors() differs a bit 1. It sets both max_sectors_kb, and max_hw_sectors_kb 2. Never allows one to change max_sectors_kb above BLK_DEF_MAX_SECTORS. If one specifies a value greater then max_hw_sectors is set to that value but max_sectors is set to BLK_DEF_MAX_SECTORS I am not sure whether blk_queue_max_sectors() should be changed, as it seems to be that way for a long time. And there may be callers dependent on that behaviour. This patch simply reverts to the older way of directly assigning the value to max_sectors as it was before. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-01	[CPUFREQ] Re-enable cpufreq suspend and resume code	Dominik Brodowski	1	-88/+7
	Commit 4bc5d3413503 is broken and causes regressions: (1) cpufreq_driver->resume() and ->suspend() were only called on __powerpc__, but you could set them on all architectures. In fact, ->resume() was defined and used before the PPC-related commit 42d4dc3f4e1e complained about in 4bc5d3413503. (2) Therfore, the resume functions in acpi_cpufreq and speedstep-smi would never be called. (3) This means speedstep-smi would be unusuable after suspend or resume. The _real_ problem was calling cpufreq_driver->get() with interrupts off, but it re-enabling interrupts on some platforms. Why is ->get() necessary? Some systems like to change the CPU frequency behind our back, especially during BIOS-intensive operations like suspend or resume. If such systems also use a CPU frequency-dependant timing loop, delays might be off by large factors. Therefore, we need to ascertain as soon as possible that the CPU frequency is indeed at the speed we think it is. You can do this two ways: either setting it anew, or trying to get it. The latter is what was done, the former also has the same IRQ issue. So, let's try something different: defer the checking to after interrupts are re-enabled, by calling cpufreq_update_policy() (via schedule_work()). Timings may be off until this later stage, so let's watch out for resume regressions caused by the deferred handling of frequency changes behind the kernel's back. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Dave Jones <davej@redhat.com>
2009-09-01	percpu: don't assume existence of cpu0	Tejun Heo	1	-1/+14
	percpu incorrectly assumed that cpu0 was always there which led to the following warning and eventual oops on sparc machines w/o cpu0. WARNING: at mm/percpu.c:651 pcpu_map+0xdc/0x100() Modules linked in: Call Trace: [000000000045eb70] warn_slowpath_common+0x50/0xa0 [000000000045ebdc] warn_slowpath_null+0x1c/0x40 [00000000004d493c] pcpu_map+0xdc/0x100 [00000000004d59a4] pcpu_alloc+0x3e4/0x4e0 [00000000004d5af8] __alloc_percpu+0x18/0x40 [00000000005b112c] __percpu_counter_init+0x4c/0xc0 ... Unable to handle kernel NULL pointer dereference ... I7: <sysfs_new_dirent+0x30/0x120> Disabling lock debugging due to kernel taint Caller[000000000053c1b0]: sysfs_new_dirent+0x30/0x120 Caller[000000000053c7a4]: create_dir+0x24/0xc0 Caller[000000000053c870]: sysfs_create_dir+0x30/0x80 Caller[00000000005990e8]: kobject_add_internal+0xc8/0x200 ... Kernel panic - not syncing: Attempted to kill the idle task! This patch fixes the problem by backporting parts from devel branch to make percpu core not depend on the existence of cpu0. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Meelis Roos <mroos@linux.ee> Cc: David Miller <davem@davemloft.net>
2009-09-01	sound: oxygen: fix MCLK rate for 192 kHz playback	Clemens Ladisch	1	-0/+2
	Do not forget to program the MCLK ratio for the I2S output. Otherwise, the master clock frequency can be too high for the DACs at sample frequencies above 96 kHz. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Cc: <stable@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
2009-08-31	autofs4 - fix missed case when changing to use struct path	Ian Kent	1	-1/+1
	In the recent change by Al Viro that changes verious subsystems to use "struct path" one case was missed in the autofs4 module which causes mounts to no longer expire. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-31	lmb: Also remove __init from lmb_end_of_RAM() declaration in lmb.h	Benjamin Herrenschmidt	1	-1/+1
	My previous patch (commit 4f8ee2c9cc: "lmb: Remove __init from lmb_end_of_DRAM()") removed __init in lmb.c but missed the fact that it was also marked as such in the .h Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-31	ata_piix: parallel scanning on PATA needs an extra locking	Bartlomiej Zolnierkiewicz	1	-1/+13
	Commit log for commit 517d3cc15b36392e518abab6bacbb72089658313 ("[libata] ata_piix: Enable parallel scan") says: This patch turns on parallel scanning for the ata_piix driver. This driver is used on most netbooks (no AHCI for cheap storage it seems). The scan is the dominating time factor in the kernel boot for these devices; with this flag it gets cut in half for the device I used for testing (eeepc). Alan took a look at the driver source and concluded that it ought to be safe to do for this driver. Alan has also checked with the hardware team. and it is all true but once we put all things together additional constraints for PATA controllers show up (some hardware registers have per-host not per-port atomicity) and we risk misprogramming the controller. I used the following test to check whether the issue is real: @@ -736,8 +736,20 @@ static void piix_set_piomode(struct ata_ (timings[pio][1] << 8); } pci_write_config_word(dev, master_port, master_data); - if (is_slave) + if (is_slave) { + if (ap->port_no == 0) { + u8 tmp = slave_data; + + while (slave_data == tmp) { + pci_read_config_byte(dev, slave_port, &tmp); + msleep(50); + } + + dev_printk(KERN_ERR, &dev->dev, "PATA parallel scan " + "race detected\n"); + } pci_write_config_byte(dev, slave_port, slave_data); + } /* Ensure the UDMA bit is off - it will be turned back on if UDMA is selected */ and it indeed triggered the error message. Lets fix all such races by adding an extra locking to ->set_piomode and ->set_dmamode methods for PATA controllers. [ Alan: would be better to take the host lock in libata-core for these cases so that we fix all the adapters in one swoop. "Looks fine as a temproary quickfix tho" ] Cc: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Alan Cox <alan@linux.intel.com> Cc: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>