linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2009-05-19	Fix scripts/setlocalversion with tagged git commit	Nico Schottelius	1	-7/+23
	Produce correct output for three cases: a tagged commit (v2.6.30-rc6), a commit past a tag (v2.6.30-rc5-299-g7c7327d), and no tag. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
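As a rough illustration of the three cases listed above, the following Python sketch tells apart the strings that `git describe` can produce. The helper name `classify_describe` and the labels it returns are hypothetical. This is not the logic of scripts/setlocalversion itself, which is a shell script; it only shows how the three inputs differ.

```python
import re

# Matches the "<tag>-<commits since tag>-g<abbreviated hash>" form that
# `git describe` prints for a commit that lies past a tag.
_PAST_TAG = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def classify_describe(describe):
    """Classify a `git describe` result (hypothetical helper).

    `describe` is the command's output, or None when no tag is reachable.
    Returns (label, tag, commits_since_tag).
    """
    if describe is None:
        return ("no tag", None, 0)
    match = _PAST_TAG.match(describe)
    if match:
        return ("past tagged commit", match.group("tag"), int(match.group("count")))
    # Anything else is taken to be an exact tag name.
    return ("tagged commit", describe, 0)
```

Passing the two example strings from the commit message, plus None, exercises all three branches.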
2009-05-19	Avoid ICE in get_random_int() with gcc-3.4.5	Linus Torvalds	1	-1/+1
	Martin Knoblauch reports that trying to build 2.6.30-rc6-git3 with RHEL4.3 userspace (gcc (GCC) 3.4.5 20051201 (Red Hat 3.4.5-2)) causes an internal compiler error (ICE): drivers/char/random.c: In function `get_random_int': drivers/char/random.c:1672: error: unrecognizable insn: (insn 202 148 150 0 /scratch/build/linux-2.6.30-rc6-git3/arch/x86/include/asm/tsc.h:23 (set (reg:SI 0 ax [91]) (subreg:SI (plus:DI (plus:DI (reg:DI 0 ax [88]) (subreg:DI (reg:SI 6 bp) 0)) (const_int -4 [0xfffffffffffffffc])) 0)) -1 (nil) (nil)) drivers/char/random.c:1672: internal compiler error: in extract_insn, at recog.c:2083 and after some debugging it turns out that it's due to the code trying to figure out the rough value of the current stack pointer by taking an address of an uninitialized variable and casting that to an integer. This is clearly a compiler bug, but it's not worth fighting - while the current stack kernel pointer might be somewhat hard to predict in user space, it's also not generally going to change for a lot of the call chains for a particular process. So just drop it, and mumble some incoherent curses at the compiler. Tested-by: Martin Knoblauch <spamtrap@knobisoft.de> Cc: Matt Mackall <mpm@selenic.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-18	nfs: Fix NFS v4 client handling of MAY_EXEC in nfs_permission.	Frank Filz	1	-1/+2
	The problem is that permission checking is skipped if atomic open is possible, but when exec opens a file, it just opens it O_RDONLY, which means EXEC permission will not be checked at that time. This problem is observed by the following sequence (executed as root): mount -t nfs4 server:/ /mnt4; echo "ls" >/mnt4/foo; chmod 744 /mnt4/foo; su guest -c "mnt4/foo" Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org Tested-by: Eugene Teo <eugeneteo@kernel.sg> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-18	Fix oops on close of hot-unplugged FTDI serial converter	David Woodhouse	1	-8/+1
	Commit c45d6320 ("fix reference counting of ftdi_private") stopped ftdi_sio_port_remove() from directly freeing the port-private data, with the intention that, if the port was still open, it would be freed when ftdi_close() is eventually called and releases the last refcount on the structure. That's all very well, but ftdi_sio_port_remove() still contains a call to usb_set_serial_port_data(port, NULL) -- so by the time we get to ftdi_close() for the port which was unplugged, it _still_ oopses on dereferencing that NULL pointer, as it did before (and does in 2.6.29). The fix is just not to clear the private data in ftdi_sio_port_remove(). Then the refcount is properly reduced to zero when the final kref_put() happens in ftdi_close(). Remove a bogus comment too, while we're at it. And stop doing things inside "if (priv)" -- it must _always_ be there. Based loosely on an earlier patch by Daniel Mack, and suggestions by Alan Stern. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Tested-by: Daniel Mack <daniel@caiaq.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
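The ordering problem described above can be modeled with a small reference-counting sketch. It is written in Python, not kernel C, and the class and method names (`PortPrivate`, `Port`, `clear_private`) are hypothetical stand-ins for the driver's structures. It assumes one reference held by the driver and one taken on open.

```python
class PortPrivate:
    """Stand-in for the driver's per-port private data (hypothetical)."""

    def __init__(self):
        self.refcount = 1   # reference held by the driver itself
        self.freed = False

    def get(self):
        self.refcount += 1

    def put(self):
        # Free the structure only when the last reference is dropped.
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class Port:
    def __init__(self):
        self.private = PortPrivate()

    def open(self):
        self.private.get()

    def remove(self, clear_private):
        priv = self.private
        if clear_private:
            # Old behavior: the pointer is cleared while the port is still open.
            self.private = None
        priv.put()

    def close(self):
        # With the pointer cleared this raises AttributeError, the Python
        # analogue of the NULL dereference described in the commit message.
        self.private.put()
```

With `clear_private=False` the data survives the unplug and is released by the final `put()` in `close()`, which is the behavior the fix aims for.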
2009-05-18	mtd_dataflash: unbreak erase support	Peter Korsgaard	1	-1/+1
	Commit 5b7f3a50 (fix dataflash 64-bit divisions) unfortunately introduced a typo. Erase addr and len were swapped in the pageaddr calculation, causing the wrong sectors to get erased. Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
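To see why swapping the two values erases the wrong sectors, consider this minimal Python sketch. The function names are hypothetical and the driver's actual C code is not reproduced here; the page size used in the usage note is chosen only for illustration.

```python
def erase_page_address(addr, length, page_size):
    """Index of the first page to erase: derived from addr, not from length."""
    if addr % page_size or length % page_size:
        raise ValueError("erase request must be page aligned")
    return addr // page_size


def erase_page_address_swapped(addr, length, page_size):
    """The swapped variant: the length is used where the address belongs."""
    return length // page_size
```

For an erase of two 528-byte pages starting at page 8, the first function yields page 8 while the swapped one yields page 2, so a different region of the flash would be erased.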
2009-05-18	asm-generic: fix local_add_unless macro	Roel Kluin	1	-1/+1
	`local_add_unless(x, y, z)' will be expanded to `(&(x)->y, (y), (z))', but `&(x)->y' should be `&(x)->a' Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-18	microblaze: Fix kind-of-intr checking against number of interrupts	Michal Simek	1	-2/+2
	+ Fix typographic fault. Signed-off-by: Michal Simek <monstr@monstr.eu>
2009-05-18	microblaze: Update Microblaze defconfig	Michal Simek	1	-18/+32
	Signed-off-by: Michal Simek <monstr@monstr.eu>
2009-05-18	regulator: da903x: add missing __devexit_p()	Mike Frysinger	1	-1/+1
	The remove function uses __devexit, so the .remove assignment needs __devexit_p() to fix a build error with hotplug disabled. Signed-off-by: Mike Frysinger <vapier@gentoo.org> CC: Liam Girdwood <lrg@slimlogic.co.uk> CC: Mike Rapoport <mike@compulab.co.il> CC: Eric Miao <eric.miao@marvell.com> Acked-by: Eric Miao <eric.y.miao@gmail.com> Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
2009-05-18	powerpc: Explicit alignment for .data.cacheline_aligned	Benjamin Herrenschmidt	1	-0/+1
	I don't think anything guarantees that the objects in data.page_aligned are a multiple of PAGE_SIZE, thus the section may end on any boundary. So the following section, .data.cacheline_aligned needs an explicit alignment. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18	powerpc/ps3: Update ps3_defconfig	Geoff Levand	1	-43/+62
	Refresh and set these options: CONFIG_SYSFS_DEPRECATED_V2: y -> n CONFIG_INPUT_JOYSTICK: y -> n CONFIG_HID_SONY: n -> m CONFIG_RTC_DRV_PS3: - -> m Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18	powerpc/ftrace: Fix constraint to be early clobber	Steven Rostedt	1	-1/+1
	After upgrading my distcc boxes from gcc 4.2.2 to 4.4.0, the function graph tracer broke. This was discovered on my x86 boxes. The issue is that gcc used the same register for an output as it did for an input in an asm statement. I first thought this was a bug in gcc and reported it. I was notified that gcc was correct and that the output had to be flagged as an "early clobber". I noticed that powerpc had the same issue and this patch fixes it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-18	powerpc/ftrace: Use pr_devel() in ftrace.c	Michael Ellerman	1	-10/+10
	pr_debug() can now result in code being generated even when #DEBUG is not defined. That's not really desirable in the ftrace code which we want to be snappy. With CONFIG_DYNAMIC_DEBUG=y:
	size before:
	  text  data  bss   dec  hex  filename
	  3334   672    4  4010  faa  arch/powerpc/kernel/ftrace.o
	size after:
	  text  data  bss   dec  hex  filename
	  2616   360    4  2980  ba4  arch/powerpc/kernel/ftrace.o
	Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
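The size figures above can be checked with a few lines of arithmetic. The Python sketch below only recomputes the numbers quoted in the commit message; the helper name `total_size` is hypothetical.

```python
def total_size(text, data, bss):
    """The `dec` column of size(1) is the sum of the three section sizes."""
    return text + data + bss


before = total_size(3334, 672, 4)   # object size with pr_debug()
after = total_size(2616, 360, 4)    # object size with pr_devel()
saved = before - after
saved_percent = round(100.0 * saved / before, 1)
```

The totals match the `dec` and `hex` columns, and the change saves a little over a quarter of the object's size.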
2009-05-18	powerpc: Do not assert pte_locked for hugepage PTE entries	Mel Gorman	1	-1/+2
	With CONFIG_DEBUG_VM, an assertion is made when changing the protection flags of a PTE that the PTE is locked. Huge pages use a different pagetable format and the assertion is bogus and will always trigger with a bug looking something like Unable to handle kernel paging request for data at address 0xf1a00235800006f8 Faulting instruction address: 0xc000000000034a80 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=32 NUMA Maple Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor windfarm_cpufreq_clamp windfarm_core i2c_powermac NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003 REGS: c000000003037600 TRAP: 0300 Not tainted (2.6.30-rc3-autokern1) MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28002484 XER: 200fffff DAR: f1a00235800006f8, DSISR: 0000000040010000 TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2 GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500 GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001 GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8 GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000 GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20 GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000 GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8 GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880 NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0 LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4 Call Trace: [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable) [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4 [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674 
[c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8 [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828 [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584 [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c Instruction dump: 7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4 7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000 This patch fixes the problem by not asserting that the PTE is locked for VMAs backed by huge pages. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-17	page-writeback: fix the calculation of the oldest_jif in wb_kupdate()	Toshiyuki Okajima	1	-3/+3
	wb_kupdate() function has a bug on linux-2.6.30-rc5. This bug causes generic_sync_sb_inodes() to start to write inodes back much earlier than our expectations because it miscalculates oldest_jif in wb_kupdate(). This bug was introduced in 704503d836042d4a4c7685b7036e7de0418fbc0f ('mm: fix proc_dointvec_userhz_jiffies "breakage"'). Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-17	reiserfs: fixup perms when xattrs are disabled	Jeff Mahoney	2	-20/+20
	This adds CONFIG_REISERFS_FS_XATTR protection from reiserfs_permission. This is needed to avoid warnings during file deletions and chowns with xattrs disabled. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-17	reiserfs: deal with NULL xattr root w/ xattrs disabled	Jeff Mahoney	2	-3/+3
	This avoids an Oops in open_xa_root that can occur when deleting a file with xattrs disabled. It assumes that the xattr root will be there, and that is not guaranteed. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-17	reiserfs: clean up ifdefs	Jeff Mahoney	1	-23/+22
	With xattr cleanup even with xattrs disabled, much of the initial setup is still performed. Some #ifdefs are just not needed since the options they protect wouldn't be available anyway. This cleans those up. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-16	Fix caller information for warn_slowpath_null	Linus Torvalds	1	-15/+20
	Ian Campbell noticed that since "Eliminate thousands of warnings with gcc 3.2 build" (commit 57adc4d2dbf968fdbe516359688094eef4d46581) all WARN_ON()'s currently appear to come from warn_slowpath_null(), eg: WARNING: at kernel/softirq.c:143 warn_slowpath_null+0x1c/0x20() because now that warn_slowpath_null() is in the call path, the __builtin_return_address(0) returns that, rather than the place that caused the warning. Fix this by splitting up the warn_slowpath_null/fmt cases differently, using a common helper function, and getting the return address in the right place. This also happens to avoid the unnecessary stack usage for the non-stdargs case, and just generally cleans things up. Make the function name printout use %pS while at it. Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-16	piix: The Sony TZ90 needs the cable type hardcoding	Alan Cox	1	-0/+1
	The Sony TZ90 needs the cable type hardcoding. See bug #12734 Signed-off-by: Alan Cox <alan@linux.intel.com> Reported-by: Jonathan E. Snow <jesnow@uh.edu> [bart: port it from ata_piix to piix and give reporter the proper credit] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-05-16	icside: register second channel of version 6 PCB	Sergei Shtylyov	1	-1/+1
	The second IDE channel of version 6 PCB is not being registered anymore since the commit 48c3c1072651922ed153bcf0a33ea82cf20df390 (ide: add struct ide_host (take 3)). Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-05-16	ide-tape: remove back-to-back REQUEST_SENSE detection	Tejun Heo	1	-6/+0
	Impact: fix an oops which always triggers ide_tape_issue_pc() assumed drive->pc isn't NULL on invocation when checking for back-to-back request sense issues but drive->pc can be NULL and even when it's not NULL, it's not safe to dereference it once the previous command is complete because pc could have been freed or was on stack. Kill back-to-back REQUEST_SENSE detection. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2009-05-16	ACPI: Idle C-states disabled by max_cstate should not disable the TSC	Len Brown	1	-1/+1
	Processor idle power states C2 and C3 stop the TSC on many machines. Linux recognizes this situation and marks the TSC as unstable: Marking TSC unstable due to TSC halts in idle But if those same machines are booted with "processor.max_cstate=1", then there is no need to validate C2 and C3, and no need to disable the TSC, which can be reliably used as a clocksource. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
2009-05-16	ACPI: idle: fix init-time TSC check regression	Len Brown	1	-8/+9
	A previous 2.6.30 patch, a71e4917dc0ebbcb5a0ecb7ca3486643c1c9a6e2, (ACPI: idle: mark_tsc_unstable() at init-time, not run-time) erroneously disabled the TSC on systems that did not actually have valid deep C-states. Move the check after the deep-C-states are validated, via new helper, tsc_check_state(), which replaces tsc_halts_in_c(). Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Frans Pop <elendil@planet.nl>
2009-05-15	Linux 2.6.30-rc6	Linus Torvalds	1	-1/+1

2009-05-15	ACPI processor: reset the throttling state once it's invalid	Zhang Rui	1	-0/+8
	If the BIOS hands us an invalid throttling state, write a valid state. http://bugzilla.kernel.org/show_bug.cgi?id=13259 Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: James Ettle <theholyettlz@googlemail.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-05-15	ACPI processor: introduce module parameter processor.ignore_tpc	Zhang Rui	1	-0/+17
	Introduce module parameter processor.ignore_tpc. Some laptops are shipped with a buggy _TPC; this module parameter is used to disable the buggy support. http://bugzilla.kernel.org/show_bug.cgi?id=13259 Signed-off-by: Zhang Rui <rui.zhang@intel.com> Tested-by: James Ettle <theholyettlz@googlemail.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-05-15	ACPI, i915: build fix	Len Brown	1	-0/+6
	drivers/built-in.o: In function `intel_opregion_init': (.text+0x9d540): undefined reference to `acpi_video_register' http://bugzilla.kernel.org/show_bug.cgi?id=13165 Signed-off-by: Len Brown <len.brown@intel.com>
2009-05-15	ACPI: suspend: restore BM_RLD on resume	Len Brown	1	-0/+23
	In 2.6.29, 31878dd86b7df9a147f5e6cc6e07092b4308782b "ACPI: remove BM_RLD access from idle entry path" moved BM_RLD initialization to init-time from run time. But we discovered that some BIOS do not restore BM_RLD after suspend, causing device errors on C3 and C4 after resume. So now the kernel restores BM_RLD. http://bugzilla.kernel.org/show_bug.cgi?id=13032 Signed-off-by: Len Brown <len.brown@intel.com>
2009-05-15	ACPI: resume: re-enable SCI-enable workaround	Lin Ming	1	-1/+6
	The BIOS bug workaround mistakenly got disabled when we followed the ACPI specification more closely by ignoring OS updates to that bit. (The BIOS is supposed to update SCI_EN, not the OS) http://bugzilla.kernel.org/show_bug.cgi?id=13289 Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2009-05-15	PM: check sysdev_suspend(PMSG_FREEZE) return value	Bjorn Helgaas	1	-2/+2
	Check the return value of sysdev_suspend(). I think this was a typo. Without this change, the following "if" check is always false. I also changed the error message so it's distinguishable from the similar message a few lines above. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2009-05-15	libata: Media rotation rate and form factor heuristics	Martin K. Petersen	2	-5/+34
	This patch provides new heuristics for parsing both the form factor and media rotation rate ATA IDENTIFY words. The reported ATA version must be 7 or greater and the device must return values defined as valid in the standard. Only then are the characteristics reported to SCSI via the VPD B1 page. This seems like a reasonable compromise to me considering that we have been shipping several kernel releases that key off the rotation rate bit without any version checking whatsoever. With no complaints so far. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
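The two-step check described above (version gate first, then value validation) might look roughly like the Python sketch below. The function names are hypothetical, and the value ranges are assumptions based on a reading of the ATA8-ACS IDENTIFY DEVICE definitions, not something stated in the commit message: rotation rate 0x0001 meaning non-rotating media, 0x0401 to 0xFFFE meaning a rate in rpm, and form factor codes 1 to 5.

```python
def rotation_rate(ata_version, word):
    """Return the rotation rate to report, or 0 for "not reported"."""
    if ata_version < 7:
        return 0
    # Assumed valid values: 0x0001 (non-rotating) or 0x0401..0xFFFE (rpm).
    if word == 0x0001 or 0x0401 <= word <= 0xFFFE:
        return word
    return 0


def form_factor(ata_version, word):
    """Return the form factor code to report, or 0 for "not reported"."""
    if ata_version < 7:
        return 0
    code = word & 0xF
    # Assumed valid codes: 1 through 5.
    return code if 1 <= code <= 5 else 0
```

A device that claims an older ATA version, or returns a value outside the assumed ranges, is simply treated as not reporting the characteristic.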
2009-05-15	libata: Report disk alignment and physical block size	Martin K. Petersen	1	-1/+22
	For disks with 4KB sectors, report the correct block size and alignment when filling out the READ CAPACITY(16) response. This patch is based upon code from Matthew Wilcox' 4KB ATA tree. I fixed the bug I reported a while back caused by ATA and SCSI using different approaches to describing the alignment. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-15	sata_fsl: Fix the command description of FSL SATA controller	Dave Liu	1	-3/+5
	The bit 11 of command description is reserved bit in Freescale SATA controller and needs to be set to '1'. This is needed to make sure the last write from the controller to the buffer descriptor is seen before an interrupt is raised. Signed-off-by: Dave Liu <daveliu@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-15	sata_fsl: Fix compile warnings	Kumar Gala	1	-3/+4
	When we build with dma_addr_t as a 64-bit quantity we get: drivers/ata/sata_fsl.c: In function 'sata_fsl_fill_sg': drivers/ata/sata_fsl.c:340: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'dma_addr_t' Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-15	[libata] sata_sx4: fixup interrupt handling	David Milburn	1	-13/+9
	Issuing ATA_CMD_SET_FEATURES (0xef) times out because pdc20621_interrupt ignores command completion since ATA_TFLAG_POLLING flag is set. This has already been fixed for sata_promise: commit 51b94d2a5a90d4800e74d7348bcde098a28f4fb3 Author: Tejun Heo <htejun@gmail.com> Date: Fri Jun 8 13:46:55 2007 -0700 sata_promise: use TF interface for polling NODATA commands Also, this patch includes Mikael's original patches: http://marc.info/?l=linux-ide&m=121135828227724&w=2 http://marc.info/?l=linux-ide&m=121144512109826&w=2 Signed-off-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: David Milburn <dmilburn@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-15	x86: Fix performance regression caused by paravirt_ops on native kernels	Jeremy Fitzhardinge	7	-10/+38
	Xiaohui Xin and some other folks at Intel have been looking into what's behind the performance hit of paravirt_ops when running native. It appears that the hit is entirely due to the paravirtualized spinlocks introduced by:
	| commit 8efcbab674de2bee45a2e4cdf97de16b8e609ac8
	| Date: Mon Jul 7 12:07:51 2008 -0700
	|
	|     paravirt: introduce a "lock-byte" spinlock implementation
	The extra call/return in the spinlock path is somehow causing an increase in the cycles/instruction of somewhere around 2-7% (seems to vary quite a lot from test to test). The working theory is that the CPU's pipeline is getting upset about the call->call->locked-op->return->return, and seems to be failing to speculate (though I haven't seen anything definitive about the precise reasons). This doesn't entirely make sense, because the performance hit is also visible on unlock and other operations which don't involve locked instructions. But spinlock operations clearly swamp all the other pvops operations, even though I can't imagine that they're nearly as common (there's only a .05% increase in instructions executed).
	If I disable just the pv-spinlock calls, my tests show that pvops is identical to non-pvops performance on native (my measurements show that it is actually about .1% faster, but Xiaohui shows a .05% slowdown).
	Summary of results, averaging 10 runs of the "mmperf" test, using a no-pvops build as baseline:
	                   nopv      Pv-nospin  Pv-spin
	  CPU cycles       100.00%    99.89%    102.18%
	  instructions     100.00%   100.10%    100.15%
	  CPI              100.00%    99.79%    102.03%
	  cache ref        100.00%   100.84%    100.28%
	  cache miss       100.00%    90.47%     88.56%
	  cache miss rate  100.00%    89.72%     88.31%
	  branches         100.00%    99.93%    100.04%
	  branch miss      100.00%   103.66%    107.72%
	  branch miss rt   100.00%   103.73%    107.67%
	  wallclock        100.00%    99.90%    102.20%
	The clear effect here is that the 2% increase in CPI is directly reflected in the final wallclock time. (The other interesting effect is that the more ops are out of line calls via pvops, the lower the cache access and miss rates. Not too surprising, but it suggests that the non-pvops kernel is over-inlined. On the flipside, the branch misses go up correspondingly...)
	So, what's the fix? Paravirt patching turns all the pvops calls into direct calls, so _spin_lock etc do end up having direct calls. For example, the compiler generated code for paravirtualized _spin_lock is:
	  <_spin_lock+0>: mov %gs:0xb4c8,%rax
	  <_spin_lock+9>: incl 0xffffffffffffe044(%rax)
	  <_spin_lock+15>: callq *0xffffffff805a5b30
	  <_spin_lock+22>: retq
	The indirect call will get patched to:
	  <_spin_lock+0>: mov %gs:0xb4c8,%rax
	  <_spin_lock+9>: incl 0xffffffffffffe044(%rax)
	  <_spin_lock+15>: callq <__ticket_spin_lock>
	  <_spin_lock+20>: nop; nop /* or whatever 2-byte nop */
	  <_spin_lock+22>: retq
	One possibility is to inline _spin_lock, etc, when building an optimised kernel (ie, when there's no spinlock/preempt instrumentation/debugging enabled). That will remove the outer call/return pair, returning the instruction stream to a single call/return, which will presumably execute the same as the non-pvops case. The downsides are: 1) it will replicate the preempt_disable/enable code at each lock/unlock callsite; this code is fairly small, but not nothing; and 2) the spinlock definitions are already a very heavily tangled mass of #ifdefs and other preprocessor magic, and making any changes will be non-trivial.
	The other obvious answer is to disable pv-spinlocks. Making them a separate config option is fairly easy, and it would be trivial to enable them only when Xen is enabled (as the only non-default user). But it doesn't really address the common case of a distro build which is going to have Xen support enabled, and leaves the open question of whether the native performance cost of pv-spinlocks is worth the performance improvement on a loaded Xen system (10% saving of overall system CPU when guests block rather than spin). Still it is a reasonable short-term workaround.
	[ Impact: fix pvops performance regression when running native ] Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com> Analysed-by: "Li Xin" <xin.li@intel.com> Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <4A0B62F7.5030802@goop.org> [ fixed the help text ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
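The derived rows in the summary of results (CPI, cache miss rate, branch miss rate) appear to be ratios of the measured rows, each relative to the no-pvops baseline. The Python sketch below recomputes them from the quoted figures as a consistency check. The names `RESULTS`, `relative_ratio` and `derived_rows` are hypothetical, and that the rows were derived this way is an assumption.

```python
# Measured values relative to the no-pvops baseline (100.00), as quoted above.
RESULTS = {
    "Pv-nospin": {"cycles": 99.89, "instructions": 100.10,
                  "cache_ref": 100.84, "cache_miss": 90.47,
                  "branches": 99.93, "branch_miss": 103.66},
    "Pv-spin": {"cycles": 102.18, "instructions": 100.15,
                "cache_ref": 100.28, "cache_miss": 88.56,
                "branches": 100.04, "branch_miss": 107.72},
}


def relative_ratio(numerator, denominator):
    """Ratio of two baseline-relative percentages, itself as a percentage."""
    return round(100.0 * numerator / denominator, 2)


def derived_rows(row):
    """Recompute the three derived rows from the measured ones."""
    return {
        "CPI": relative_ratio(row["cycles"], row["instructions"]),
        "cache miss rate": relative_ratio(row["cache_miss"], row["cache_ref"]),
        "branch miss rate": relative_ratio(row["branch_miss"], row["branches"]),
    }
```

The recomputed values agree with the quoted ones to within rounding of the inputs, which supports reading the roughly 2% CPI increase for Pv-spin as the product of about 2.2% more cycles over nearly unchanged instruction counts.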
2009-05-15	[libata] sata_sx4: convert to new exception handling methods	Jeff Garzik	1	-45/+121
	Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-05-15	tracing: Append prompt in /debug/tracing/README file	GeunSik Lim	1	-1/+1
	Append a prompt in the /debug/tracing/README file. This is a trivial issue: fix a typo in the mini-HOWTO file (README) for ftrace. [ Impact: cleanup ] Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: williams <williams@redhat.com> LKML-Reference: <1242289418.31161.45.camel@centos51> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-05-15	devpts: correctly set default options	Sukadev Bhattiprolu	1	-6/+12
	devpts_get_sb() calls memset(0) to clear mount options and calls parse_mount_options() if user specified any mount options. The memset(0) is bogus since the 'mode' and 'ptmxmode' options are non-zero by default. parse_mount_options() restores options to default anyway and can properly deal with NULL mount options. So in devpts_get_sb() remove memset(0) and call parse_mount_options() even for NULL mount options. Bug reported by Eric Paris: http://lkml.org/lkml/2009/5/7/448. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Reported-by: Eric Paris <eparis@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
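The approach described above, where the parser itself establishes the defaults and tolerates missing options, can be sketched in Python as follows. The default values and the "key=value" parsing are hypothetical and chosen only for illustration; they are not taken from the devpts source.

```python
DEFAULT_MODE = 0o600       # hypothetical default, for illustration only
DEFAULT_PTMXMODE = 0o666   # hypothetical default, for illustration only


def parse_mount_options(data):
    """Start from the defaults, then apply "key=value" options from `data`.

    `data` may be None (no options given), in which case the defaults stand.
    """
    opts = {"mode": DEFAULT_MODE, "ptmxmode": DEFAULT_PTMXMODE}
    if not data:
        return opts
    for item in data.split(","):
        key, _, value = item.partition("=")
        if key in opts and value:
            # Mode values are given in octal, as with chmod.
            opts[key] = int(value, 8) & 0o777
    return opts
```

Because the parser is always called, a mount without options ends up with the defaults rather than with zeroed fields, which is the difference from the memset(0) path.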
2009-05-15	ext4: Fix race in ext4_inode_info.i_cached_extent	Theodore Ts'o	1	-5/+12
	If two CPUs call ext4_ext_get_blocks() at the same time, there is nothing protecting the i_cached_extent structure from being used and updated at the same time. This could potentially cause the wrong location on disk to be read or written to, including potentially causing the corruption of the block group descriptors and/or inode table. This bug has been in the ext4 code since almost the very beginning of ext4's development. Fortunately once the data is stored in the page cache, ext4_get_blocks() doesn't need to be called, so trying to replicate this problem to the point where we could identify its root cause was extremely difficult. Many thanks to Kevin Shanahan for working over several months to be able to reproduce this easily so we could finally nail down the cause of the corruption. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
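The general remedy for this kind of race is to guard the cached entry with a lock so that readers never see a half-updated value. The Python sketch below models that pattern with a one-entry cache. The class `CachedExtent` and its fields are hypothetical, and the commit message does not say which locking primitive the actual patch uses.

```python
import threading


class CachedExtent:
    """One-entry extent cache guarded by a lock (hypothetical model)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entry = None  # (first_logical_block, length, physical_start)

    def put(self, block, length, start):
        # Update all three fields as one unit.
        with self._lock:
            self._entry = (block, length, start)

    def lookup(self, block):
        # Take a consistent snapshot under the lock, then work on the copy.
        with self._lock:
            entry = self._entry
        if entry is None:
            return None
        first, length, start = entry
        if first <= block < first + length:
            return start + (block - first)
        return None
```

A lookup either sees the previous entry or the new one in full, never a mix of fields from both, so it cannot map a logical block to a physical location built from mismatched values.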
2009-05-15	kgdb: gdb documentation fix	Frank Rowand	1	-1/+1
	gdb command "set remote debug 1" is not valid, change to correct command. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2009-05-15	kgdb,i386: use address that SP register points to in the exception frame	Jason Wessel	1	-1/+2
	The treatment of the SP register is different on x86_64 and i386. This is a regression fix that lived outside the mainline kernel from 2.6.27 to now. The regression was a result of the original merge consolidation of the i386 and x86_64 archs to x86. The incorrectly reported SP on i386 prevented stack tracebacks from working correctly in gdb. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2009-05-15	sysrq, intel_fb: fix sysrq g collision	Jason Wessel	3	-6/+6
	Commit 79e539453b34e35f39299a899d263b0a1f1670bd introduced a regression where you cannot use sysrq 'g' to enter kgdb. The solution is to move the intel fb sysrq over to V for video instead of G for graphics. The SMP VOYAGER code to register for the sysrq-v is not anywhere to be found in the mainline kernel, so the comments in the code were cleaned up as well. This patch also cleans up the sysrq definitions for kgdb to make it generic for the kernel debugger, such that the sysrq 'g' can be used in the future to enter a gdbstub or another kernel debugger. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2009-05-15	Revert "mm: add /proc controls for pdflush threads"	Jens Axboe	4	-72/+12
	This reverts commit fafd688e4c0c34da0f3de909881117d374e4c7af. Work is progressing to switch away from pdflush as the process backing for flushing out dirty data. So it seems pointless to add more knobs to control pdflush threads. The original author of the patch did not have any specific use cases for adding the knobs, so we can easily revert this before 2.6.30 to avoid having to maintain this API forever. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-15	ASoC: DaVinci EVM board support buildfixes	David Brownell	3	-14/+81
	This is a build fix, resyncing the DaVinci EVM ASoC board code with the version in the DaVinci tree. That resync includes support for the DM355 EVM, although that board isn't yet in mainline. (NOTE: also includes a bugfix to the platform_add_resources call, recently sent by Chaithrika U S <chaithrika@ti.com> but not yet merged into the DaVinci tree.) Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-05-15	ASoC: DaVinci I2S updates	David Brownell	1	-3/+23
	This resyncs the DaVinci I2S code with the version in the DaVinci tree. The behavioral change uses updated clock interfaces which recently merged to mainline. Two other changes include adding a comment on the ASP/McBSP/McASP confusion, and dropping pdev->id in order to support more boards than just the DM644x EVM. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-05-15	ASoC: davinci-pcm buildfixes	David Brownell	1	-29/+42
	This is a buildfix for the DaVinci PCM code, resyncing it with the version in the DaVinci tree. The notable change is using current EDMA interfaces, which recently merged to mainline. (The older interfaces never made it into mainline.) NOTE: open issue, the DMA should be to/from SRAM; see chip errata for more info. The artifacts are extremely easy to hear on DM355 hardware (not yet supported in mainline), but don't seem as audible on DM6446 hardware (which does have mainline support). Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-05-15	powerpc: Fix PCI ROM access	Benjamin Herrenschmidt	2	-10/+26
	A couple of issues crept in since about 2.6.27 related to accessing PCI device ROMs on various powerpc machines. First, historically, we don't allocate the ROM resource in the resource tree. I'm not entirely certain of why, I suspect they often contained garbage on x86 but it's hard to tell. This causes the current generic code to always call pci_assign_resource() when trying to access the said ROM from sysfs, which will try to re-assign some new address regardless of what the ROM BAR was already set to at boot time. This can be a problem on hypervisor platforms like pSeries where we aren't supposed to move PCI devices around (and in fact probably can't). Second, our code that generates the PCI tree from the OF device-tree (instead of doing config space probing) which we mostly use on pseries at the moment, didn't set the (new) flag IORESOURCE_SIZEALIGN on any resource. That means that any attempt at re-assigning such a resource with pci_assign_resource() would fail due to resource_alignment() returning 0. This patch fixes both issues by doing two things: (1) The code that calculates resource flags based on the OF device-node is improved to set IORESOURCE_SIZEALIGN on any valid BAR, and while at it also set IORESOURCE_READONLY for ROMs since we were lacking that too. (2) We now allocate ROM resources as part of the resource tree. However to limit the chances of nasty conflicts due to busted firmwares, we only do it on the second pass of our two-pass allocation scheme, so that all valid and enabled BARs get precedence. This brings pSeries back the ability to access PCI ROMs via sysfs (and thus initialize various video cards from X etc...). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-05-15	powerpc/pseries: Really fix the oprofile CPU type on pseries	Benjamin Herrenschmidt	1	-1/+1
	My previous patch for fixing the oprofile CPU type got somewhat mismerged (my fault) when it collided with another related patch. This should finally (fingers crossed) fix the whole thing. We make sure we keep the -old- oprofile type and CPU type whenever one of them was specified in the first pass through the function. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>