linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2009-09-11	[S390] cio: failing set online/offline processing.	Michael Ernst	2	-35/+67
	When unit checks trigger sensing the device state is set to W4SENSE until sense completion; then the device state is set back to ONLINE. If a unit check occurs while set online or set offline requests are processed then it might happen that the device's temporary W4SENSE state causes these functions to terminate, leaving the device in an inconsistent state when the state is set back to ONLINE later on so that the device cannot be set online or offline any longer. To solve this, set online/offline and related rollback or error routines are processed only if the device is in a final or DISCONNECTED state. Signed-off-by: Michael Ernst <mernst@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11	[S390] cio: ensure to hold a reference for deferred deregistration	Sebastian Ott	1	-8/+6
	Ensure to always hold an extra device reference for scheduling a subchannel deregistration, by moving the get_device to ccw_device_schedule_sch_unregister. This fixes an use after free error in ccw_device_call_sch_unregister where put_device was called on an already freed device structure. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11	[S390] qdio: continue polling if the queue is not finished	Jan Glauber	1	-1/+3
	With commit c38f96080955854e54df9cb392bc674e1ae330e1 polling was stopped for the queue even if new data is available. Return immediately after scheduling the queue tasklet if the queue is not done. Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11	[S390] cio: increase trace level	Sebastian Ott	2	-31/+17
	Move debug traces for start I/O and interrupt events to exclusive trace levels. Also change tracing in hot-path from sprintf (costly) to hex. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11	[S390] cio: fix not oper handling after failed [on\|off]line processing	Sebastian Ott	1	-0/+7
	If online/offline processing of a ccw device fails, resulting in not operational state, notify the driver and unregister the device in case the driver dosn't want to keep it. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11	[S390] cio: consolidate subchannel intparm reset	Peter Oberparleiter	2	-10/+3
	Ensure that the hardware interruption parameter for a subchannel is reset when the associated subchannel data structure is freed. Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11	[S390] cio: move scsw helper functions to header file	Heiko Carstens	5	-363/+259
	All scsw helper functions are very short and usage of them shouldn't result in function calls. Therefore we move them to a separate header file. Also saves a lot of EXPORT_SYMBOLs. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11	[S390] cio: fix ineffective verify event	Peter Oberparleiter	1	-1/+12
	Path verification events occurring for offline devices are currently ignored. As a result, offline devices are not removed, even though they might no longer be accessible (for example because the last path to the device was varied offline). Fix this by scheduling a status evaluation for the affected subchannel when a path verification event occurs. Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-09-11	writeback: check for registered bdi in flusher add and inode dirty	Jens Axboe	3	-0/+16
	Also a debugging aid. We want to catch dirty inodes being added to backing devices that don't do writeback. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11	writeback: add name to backing_dev_info	Jens Axboe	17	-0/+18
	This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11	writeback: add some debug inode list counters to bdi stats	Jens Axboe	1	-4/+34
	Add some debug entries to be able to inspect the internal state of the writeback details. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11	writeback: get rid of pdflush completely	Jens Axboe	4	-282/+6
	It is now unused, so kill it off. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11	writeback: switch to per-bdi threads for flushing data	Jens Axboe	10	-472/+1120
	This gets rid of pdflush for bdi writeout and kupdated style cleaning. pdflush writeout suffers from lack of locality and also requires more threads to handle the same workload, since it has to work in a non-blocking fashion against each queue. This also introduces lumpy behaviour and potential request starvation, since pdflush can be starved for queue access if others are accessing it. A sample ffsb workload that does random writes to files is about 8% faster here on a simple SATA drive during the benchmark phase. File layout also seems a LOT more smooth in vmstat: r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42 0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44 1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37 0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58 0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34 0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37 0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44 0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38 0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41 0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45 where vanilla tends to fluctuate a lot in the creation phase: r b swpd free buff cache si so bi bo in cs us sy id wa 1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36 1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51 0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40 0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37 1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41 0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49 0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36 1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43 0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39 1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45 1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34 0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54 A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A SSD based writeback test on XFS performs over 20% better as well, with the throughput being very stable around 1GB/sec, where pdflush only manages 750MB/sec and fluctuates wildly while doing so. Random buffered writes to many files behave a lot better as well, as does random mmap'ed writes. A separate thread is added to sync the super blocks. In the long term, adding sync_supers_bdi() functionality could get rid of this thread again. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11	writeback: move dirty inodes from super_block to backing_dev_info	Jens Axboe	6	-84/+165
	This is a first step at introducing per-bdi flusher threads. We should have no change in behaviour, although sb_has_dirty_inodes() is now ridiculously expensive, as there's no easy way to answer that question. Not a huge problem, since it'll be deleted in subsequent patches. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11	writeback: get rid of generic_sync_sb_inodes() export	Jens Axboe	7	-68/+58
	This adds two new exported functions: - writeback_inodes_sb(), which only attempts to writeback dirty inodes on this super_block, for WB_SYNC_NONE writeout. - sync_inodes_sb(), which writes out all dirty inodes on this super_block and also waits for the IO to complete. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11	pata_rz1000: use printk_once	Marcin Slusarz	1	-3/+1
	Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-11	ahci: kill @force_restart and refine CLO for ahci_kick_engine()	Shane Huang	1	-12/+10
	This patch refines ahci_kick_engine() after discussion with Tejun about FBS(FIS-based switching) support preparation: a. Kill @force_restart and always kick the engine. The only case where @force_restart is zero is when it's called from ahci_p5wdh_hardreset() Actually at that point, BSY is pretty much guaranteed to be set. b. If PMP is attached, ignore busy and always do CLO. (AHCI-1.3 9.2) Signed-off-by: Shane Huang <shane.huang@amd.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-11	pata_cs5535: add pci id for AMD based CS5535 controllers	Otavio Salvador	2	-1/+3
	Signed-off-by: Otavio Salvador <otavio@ossystems.com.br> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-11	ahci: Add AMD SB900 SATA/IDE controller device IDs	Shane Huang	5	-1/+14
	Add AMD SB900 SATA/IDE controller device IDs. Signed-off-by: Shane Huang <shane.huang@amd.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-11	drivers/ata: use resource_size	Julia Lawall	4	-7/+7
	Use the function resource_size, which reduces the chance of introducing off-by-one errors in calculating the resource size. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ struct resource *res; @@ - (res->end - res->start) + 1 + resource_size(res) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-10	md: Fix "strchr" [drivers/md/dm-log-userspace.ko] undefined!	Geert Uytterhoeven	1	-1/+1
	Commit b8313b6da7e2e7c7f47d93d8561969a3ff9ba0ea ("dm log: remove incorrect field from userspace table output") added a call to strstr() with a single-character "needle" string parameter. Unfortunately some versions of gcc replace such calls to strstr() by calls to strchr() behind our back. This causes linking errors if strchr() is defined as an inline function in <asm/string.h> (e.g. on m68k): \| WARNING: "strchr" [drivers/md/dm-log-userspace.ko] undefined! Avoid this by explicitly calling strchr() instead. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-10	sched: Fix sched::sched_stat_wait tracepoint field	Ingo Molnar	1	-2/+1
	This weird perf trace output: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] Is caused by setting one component field of the delta to zero a bit too early. Move it to later. ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug, it's just a reporting bug in essence. ) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nikos Chantziaras <realnc@arcor.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-10	sched: Disable NEW_FAIR_SLEEPERS for now	Ingo Molnar	1	-1/+1
	Nikos Chantziaras and Jens Axboe reported that turning off NEW_FAIR_SLEEPERS improves desktop interactivity visibly. Nikos described his experiences the following way: " With this setting, I can do "nice -n 19 make -j20" and still have a very smooth desktop and watch a movie at the same time. Various other annoyances (like the "logout/shutdown/restart" dialog of KDE not appearing at all until the background fade-out effect has finished) are also gone. So this seems to be the single most important setting that vastly improves desktop behavior, at least here. " Jens described it the following way, referring to a 10-seconds xmodmap scheduling delay he was trying to debug: " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then I get: Performance counter stats for 'xmodmap .xmodmap-carl': 9.009137 task-clock-msecs # 0.447 CPUs 18 context-switches # 0.002 M/sec 1 CPU-migrations # 0.000 M/sec 315 page-faults # 0.035 M/sec 0.020167093 seconds time elapsed Woot! " So disable it for now. In perf trace output i can see weird delta timestamps: cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns] That nsec field is not supposed to be that large. More digging is needed - but lets turn it off while the real bug is found. Reported-by: Nikos Chantziaras <realnc@arcor.de> Tested-by: Nikos Chantziaras <realnc@arcor.de> Reported-by: Jens Axboe <jens.axboe@oracle.com> Tested-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <4AA93D34.8040500@arcor.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-10	binfmt_elf: fix PT_INTERP bss handling	Roland McGrath	1	-14/+14
	In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.) ----- ptinterp.S _start: .globl _start nop int3 ----- $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c $ ./hello Segmentation fault # during execve() itself After applying the patch: $ ./hello Trace trap # user-mode execution after execve() finishes If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-10	TPM: Fixup boot probe timeout for tpm_tis driver	Jason Gunthorpe	1	-6/+6
	When probing the device in tpm_tis_init the call request_locality uses timeout_a, which wasn't being initalized until after request_locality. This results in request_locality falsely timing out if the chip is still starting. Move the initialization to before request_locality. This probably only matters for embedded cases (ie mine), a BIOS likely gets the TPM into a state where this code path isn't necessary. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-09	binfmt_elf: fix PT_INTERP bss handling	Roland McGrath	1	-14/+14
	In fs/binfmt_elf.c, load_elf_interp() calls padzero() for .bss even if the PT_LOAD has no PROT_WRITE and no .bss. This generates EFAULT. Here is a small test case. (Yes, there are other, useful PT_INTERP which have only .text and no .data/.bss.) ----- ptinterp.S _start: .globl _start nop int3 ----- $ gcc -m32 -nostartfiles -nostdlib -o ptinterp ptinterp.S $ gcc -m32 -Wl,--dynamic-linker=ptinterp -o hello hello.c $ ./hello Segmentation fault # during execve() itself After applying the patch: $ ./hello Trace trap # user-mode execution after execve() finishes If the ELF headers are actually self-inconsistent, then dying is fine. But having no PROT_WRITE segment is perfectly normal and correct if there is no segment with p_memsz > p_filesz (i.e. bss). John Reiser suggested checking for PROT_WRITE in the bss logic. I think it makes most sense to simply apply the bss logic only when there is bss. This patch looks less trivial than it is due to some reindentation. It just moves the "if (last_bss > elf_bss) {" test up to include the partial-page bss logic as well as the more-pages bss logic. Reported-by: John Reiser <jreiser@bitwagon.com> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-10	sysfs: Add labeling support for sysfs	David P. Quigley	6	-37/+118
	This patch adds a setxattr handler to the file, directory, and symlink inode_operations structures for sysfs. The patch uses hooks introduced in the previous patch to handle the getting and setting of security information for the sysfs inodes. As was suggested by Eric Biederman the struct iattr in the sysfs_dirent structure has been replaced by a structure which contains the iattr, secdata and secdata length to allow the changes to persist in the event that the inode representing the sysfs_dirent is evicted. Because sysfs only stores this information when a change is made all the optional data is moved into one dynamically allocated field. This patch addresses an issue where SELinux was denying virtd access to the PCI configuration entries in sysfs. The lack of setxattr handlers for sysfs required that a single label be assigned to all entries in sysfs. Granting virtd access to every entry in sysfs is not an acceptable solution so fine grained labeling of sysfs is required such that individual entries can be labeled appropriately. [sds: Fixed compile-time warnings, coding style, and setting of inode security init flags.] Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-10	LSM/SELinux: inode_{get,set,notify}secctx hooks to access LSM security context information.	David P. Quigley	5	-0/+143
	This patch introduces three new hooks. The inode_getsecctx hook is used to get all relevant information from an LSM about an inode. The inode_setsecctx is used to set both the in-core and on-disk state for the inode based on a context derived from inode_getsecctx.The final hook inode_notifysecctx will notify the LSM of a change for the in-core state of the inode in question. These hooks are for use in the labeled NFS code and addresses concerns of how to set security on an inode in a multi-xattr LSM. For historical reasons Stephen Smalley's explanation of the reason for these hooks is pasted below. Quote Stephen Smalley inode_setsecctx: Change the security context of an inode. Updates the in core security context managed by the security module and invokes the fs code as needed (via __vfs_setxattr_noperm) to update any backing xattrs that represent the context. Example usage: NFS server invokes this hook to change the security context in its incore inode and on the backing file system to a value provided by the client on a SETATTR operation. inode_notifysecctx: Notify the security module of what the security context of an inode should be. Initializes the incore security context managed by the security module for this inode. Example usage: NFS client invokes this hook to initialize the security context in its incore inode to the value provided by the server for the file when the server returned the file's attributes to the client. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-10	VFS: Factor out part of vfs_setxattr so it can be called from the SELinux hook for inode_setsecctx.	David P. Quigley	2	-13/+43
	This factors out the part of the vfs_setxattr function that performs the setting of the xattr and its notification. This is needed so the SELinux implementation of inode_setsecctx can handle the setting of the xattr while maintaining the proper separation of layers. Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-09	Linux 2.6.31	Linus Torvalds	1	-1/+1

2009-09-09	RDMA/iwcm: Reject the connection when the cm_id is destroyed	Steve Wise	1	-0/+1
	If the cm_id of a connect request is destroyed prior to the ULP accepting or rejecting the connection, then the provider never cleans up the connection. The iwcm should explicitly reject these connections if the cm_id is destroyed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-09	RDMA/cxgb3: Clean up properly on FW mismatch failures	Steve Wise	2	-1/+5
	FW mismatches can cause a crash in the iw_cxgb3 event handler. - NULL the t3cdev->ulp pointer on failures in cxio_rdev_open() - Silently ignore events when the ulp ptr is NULL in iwch_err_handler() Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-09	RDMA/cxgb3: Don't ignore insert_handle() failures	Steve Wise	2	-22/+49
	Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2009-09-09	sched: Keep kthreads at default priority	Mike Galbraith	2	-6/+0
	Removes kthread/workqueue priority boost, they increase worst-case desktop latencies. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09	sched: Re-tune the scheduler latency defaults to decrease worst-case latencies	Mike Galbraith	1	-6/+6
	Reduce the latency target from 20 msecs to 5 msecs. Why? Larger latencies increase spread, which is good for scaling, but bad for worst case latency. We still have the ilog(nr_cpus) rule to scale up on bigger server boxes. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09	sched: Turn off child_runs_first	Mike Galbraith	3	-11/+11
	Set child_runs_first default to off. It hurts 'optimal' make -j<NR_CPUS> workloads as make jobs get preempted by child tasks, reducing parallelism. Note, this patch might make existing races in user applications more prominent than before - so breakages might be bisected to this commit. Child-runs-first is broken on SMP to begin with, and we already had it off briefly in v2.6.23 so most of the offenders ought to be fixed. Would be nice not to revert this commit but fix those apps finally ... Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1252486344.28645.18.camel@marge.simson.net> [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case people want to work around broken apps. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-09	aoe: allocate unused request_queue for sysfs	Ed Cashin	3	-4/+11
	Andy Whitcroft reported an oops in aoe triggered by use of an incorrectly initialised request_queue object: [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add an uninitialized object, something is seriously wrong. [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu [ 2645.959107] Call Trace: [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70 [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0 [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160 [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe] The request queue of an aoe device is not used but can be allocated in code that does not sleep. Bruno bisected this regression down to cd43e26f071524647e660706b784ebcbefbd2e44 block: Expose stacked device queues in sysfs "This seems to generate /sys/block/$device/queue and its contents for everyone who is using queues, not just for those queues that have a non-NULL queue->request_fn." Addresses http://bugs.launchpad.net/bugs/410198 Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942 Note that embedding a queue inside another object has always been an illegal construct, since the queues are reference counted and must persist until the last reference is dropped. So aoe was always buggy in this respect (Jens). Signed-off-by: Ed Cashin <ecashin@coraid.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Bruno Premont <bonbons@linux-vserver.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-09	KEYS: Add missing linux/tracehook.h #inclusions	David Howells	8	-0/+8
	Add #inclusions of linux/tracehook.h to those arch files that had the tracehook call for TIF_NOTIFY_RESUME added when support for that flag was added to that arch. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
2009-09-08	sata_fsl: Defer non-ncq commands when ncq commands active	Ashish Kalra	1	-0/+1
	Fix for non-ncq & ncq commands causing timeouts when both are issued simultaneously to the same device. Signed-off-by: Ashish Kalra <Ashish.Kalra@freescale.com> [fixed to be actual compileable C code -jg] Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	libata: add SATA PMP revision information for spec 1.2	Shane Huang	1	-0/+2
	This small patch is just adding the information for PMP spec 1.2 Signed-off-by: Shane Huang <shane.huang@amd.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	libata: fix off-by-one error in ata_tf_read_block()	Tejun Heo	1	-1/+7
	ata_tf_read_block() has off-by-one error when converting CHS address to LBA. The bug isn't very visible because ata_tf_read_block() is used only when generating sense data for a failed RW command and CHS addressing isn't used too often these days. This problem was spotted by Atsushi Nemoto. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	ahci: Gigabyte GA-MA69VM-S2 can't do 64bit DMA	Tejun Heo	1	-0/+16
	Gigabyte GA-MA69VM-S2 can't do 64bit DMA either. It's yet unknown whether recent BIOS fixes the problem. Blacklist regardless of BIOS revisions for now. Sandor Bodo-Merle reported and provided the initial patch for this issue. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sandor Bodo-Merle <sbodomerle@gmail.com> Cc: Shane Huang <shane.huang@amd.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	ahci: make ahci_asus_m2a_vm_32bit_only() quirk more generic	Tejun Heo	1	-24/+29
	It turns out ASUS M2A-VM isn't the only one with the 32bit DMA problem. Make ahci_asus_m2a_vm_32bit_only() more generic using the new dmi_get_date() and rename it to ahci_sb600_32bit_only(). Cut off date is now pointed to by dmi_system_id->driver_data in "yyyymmdd" format and it's now also allowed to be omitted. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sandor Bodo-Merle <sbodomerle@gmail.com> Cc: Shane Huang <shane.huang@amd.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	dmi: extend dmi_get_year() to dmi_get_date()	Tejun Heo	5	-25/+76
	There are cases where full date information is required instead of just the year. Add month and day parsing to dmi_get_year() and rename it to dmi_get_date(). As the original function only required '/' followed by any number of parseable characters at the end of the string, keep that behavior to avoid upsetting existing users. The new function takes dates of format [mm[/dd]]/yy[yy]. Year, month and date are checked to be in the ranges of [1-9999], [1-12] and [1-31] respectively and any invalid or out-of-range component is returned as zero. The dummy implementation is updated accordingly but the return value is updated to indicate field not found which is consistent with how other dummy functions behave. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	dmi: fix date handling in dmi_get_year()	Tejun Heo	1	-2/+3
	Year parsing in dmi_get_year() had the following two bugs. * "00" is treated as invalid instead of 2000 because zero return from simple_strtoul() is treated as error. * "0N" where N >= 8 is treated as invalid of 200N because the leading 0 is considered to specify octal. Fix the above two bugs by using endptr to detect invalid number and forcing decimal. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	libata: unbreak TPM filtering by reorganizing ata_scsi_pass_thru()	Tejun Heo	1	-53/+51
	ata_scsi_pass_thru() was checking for input sanity and disallowed commands while initializaing qc from scmd. TPM filtering was added right after protocol check at which point tf wasn't initialized properly. This means that TPM filtering has never really worked. This patch fixes the bug by reorganizing ata_scsi_pass_thru() such that qc is fully initialized before checking for invalid conditions which is way less error prone. Discovered while Thilo-Alexander Ginkel was trying debug patches for bko#13416. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Thilo-Alexander Ginkel <thilo@ginkel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	sata_sis: convert to slave_link	Tejun Heo	1	-50/+25
	During introduction of slave_link, sata_sis slipped through the crack and left with ad-hoc merged SCR access. As SCR status was shared for both the master and slave devices, when only one of the device is online, libata EH would think both are online but would only get valid device signature for the actually present one, which in turn trigger the probing safety net mechanism and make EH retry causing large delay during boot. This patch converts sata_sis to slave_link mechanism. This bug was reported by TAXI in bko#14075. http://bugzilla.kernel.org/show_bug.cgi?id=14075 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: TAXI <taxi@a-city.de> Cc: Uwe Koziolek <uwe.koziolek@gmx.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-09-08	i915: disable interrupts before tearing down GEM state	Linus Torvalds	1	-5/+1
	Reinette Chatre reports a frozen system (with blinking keyboard LEDs) when switching from graphics mode to the text console, or when suspending (which does the same thing). With netconsole, the oops turned out to be BUG: unable to handle kernel NULL pointer dereference at 0000000000000084 IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915] and it's due to the i915_gem.c code doing drm_irq_uninstall() after having done i915_gem_idle(). And the i915_gem_idle() path will do i915_gem_idle() -> i915_gem_cleanup_ringbuffer() -> i915_gem_cleanup_hws() -> dev_priv->hw_status_page = NULL; but if an i915 interrupt comes in after this stage, it may want to access that hw_status_page, and gets the above NULL pointer dereference. And since the NULL pointer dereference happens from within an interrupt, and with the screen still in graphics mode, the common end result is simply a silently hung machine. Fix it by simply uninstalling the irq handler before idling rather than after. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13819 Reported-and-tested-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-08	NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3'	Trond Myklebust	1	-0/+6
	Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08	NFS: Allow the "nfs" file system type to support NFSv4	Chuck Lever	3	-2/+48
	When mounting an "nfs" type file system, recognize "v4," "vers=4," or "nfsvers=4" mount options, and convert the file system to "nfs4" under the covers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [trondmy: fixed up binary mount code so it sets the 'version' field too] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>