aboutsummaryrefslogtreecommitdiffstats
path: root/fs (follow)
AgeCommit message (Collapse)AuthorFilesLines
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.hTejun Heo353-72/+300
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-29Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2Linus Torvalds11-74/+223
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2: Fix a race in o2dlm lockres mastery Ocfs2: Handle deletion of reflinked oprhan inodes correctly. Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir. ocfs2: Clear undo bits when local alloc is freed ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block. ocfs2: Fix the update of name_offset when removing xattrs ocfs2: Always try for maximum bits with new local alloc windows ocfs2: set i_mode on disk during acl operations ocfs2: Update i_blocks in reflink operations. ocfs2: Change bg_chain check for ocfs2_validate_gd_parent. [PATCH] Skip check for mandatory locks when unlocking
2010-03-29Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds12-91/+182
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits) ceph: update discussion list address in MAINTAINERS ceph: some documentations fixes ceph: fix use after free on mds __unregister_request ceph: avoid loaded term 'OSD' in documention ceph: fix possible double-free of mds request reference ceph: fix session check on mds reply ceph: handle kmalloc() failure ceph: propagate mds session allocation failures to caller ceph: make write_begin wait propagate ERESTARTSYS ceph: fix snap rebuild condition ceph: avoid reopening osd connections when address hasn't changed ceph: rename r_sent_stamp r_stamp ceph: fix connection fault con_work reentrancy problem ceph: prevent dup stale messages to console for restarting mds ceph: fix pg pool decoding from incremental osdmap update ceph: fix mds sync() race with completing requests ceph: only release unused caps with mds requests ceph: clean up handle_cap_grant, handle_caps wrt session mutex ceph: fix session locking in handle_caps, ceph_check_caps ceph: drop unnecessary WARN_ON in caps migration ...
2010-03-29ext3: fix broken handling of EXT3_STATE_NEWLinus Torvalds2-2/+4
In commit 9df93939b735 ("ext3: Use bitops to read/modify EXT3_I(inode)->i_state") ext3 changed its internal 'i_state' variable to use bitops for its state handling. However, unline the same ext4 change, it didn't actually change the name of the field when it changed the semantics of it. As a result, an old use of 'i_state' remained in fs/ext3/ialloc.c that initialized the field to EXT3_STATE_NEW. And that does not work _at_all_ when we're now working with individually named bits rather than values that get masked. So the code tried to mark the state to be new, but in actual fact set the field to EXT3_STATE_JDATA. Which makes no sense at all, and screws up all the code that checks whether the inode was newly allocated. In particular, it made the xattr code unhappy, and caused various random behavior, like apparently https://bugzilla.redhat.com/show_bug.cgi?id=577911 So fix the initialization, and rename the field to match ext4 so that we don't have this happen again. Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Daniel J Walsh <dwalsh@redhat.com> Cc: Eric Paris <eparis@redhat.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-29SLOW_WORK: CONFIG_SLOW_WORK_PROC should be CONFIG_SLOW_WORK_DEBUGDavid Howells2-5/+5
CONFIG_SLOW_WORK_PROC was changed to CONFIG_SLOW_WORK_DEBUG, but not in all instances. Change the remaining instances. This makes the debugfs file display the time mark and the owner's description again. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-28ceph: fix use after free on mds __unregister_requestSage Weil1-1/+2
There was a use after free in __unregister_request that would trigger whenever the request map held the last reference. This appears to have triggered an oops during 'umount -f' when requests are being torn down. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-26Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2Linus Torvalds2-12/+11
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix imperfect completion wait in nilfs_wait_on_logs nilfs2: fix hang-up of cleaner after log writer returned with error nilfs2: fix duplicate call to nilfs_segctor_cancel_freev
2010-03-26Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6Linus Torvalds1-8/+10
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: Restore LOOKUP_DIRECTORY hint handling in final lookup on open()
2010-03-26Restore LOOKUP_DIRECTORY hint handling in final lookup on open()Al Viro1-8/+10
Lose want_dir argument, while we are at it - since now nd->flags & LOOKUP_DIRECTORY is equivalent to it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-03-25Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4Linus Torvalds3-15/+22
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fixed inode allocator to correctly track a flex_bg's used_dirs ext4: Don't use delayed allocation by default when used instead of ext3 ext4: Fix spelling of CONTIG_FS_EXT3 to CONFIG_FS_EXT3 ext4: Fix estimate of # of blocks needed to write indirect-mapped files
2010-03-24Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2-1/+4
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: don't try to decode GETATTR if DELEGRETURN returned error sunrpc: handle allocation errors from __rpc_lookup_create() SUNRPC: Fix the return value of rpc_run_bc_task() SUNRPC: Fix a use after free bug with the NFSv4.1 backchannel SUNRPC: Fix a potential memory leak in auth_gss NFS: Prevent another deadlock in nfs_release_page()
2010-03-24fscache: add missing unlockDan Carpenter1-0/+1
Sparse complained about this missing spin_unlock() Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24do_sync_read/write() should set kiocb.ki_nbytes to be consistentDavid Howells1-0/+2
do_sync_read/write() should set kiocb.ki_nbytes to be consistent with do_sync_readv_writev(). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24FDPIC: For-loop in elf_core_vma_data_size() is incorrectDavid Howells1-1/+1
Fix an incorrect for-loop in elf_core_vma_data_size(). The advance-pointer statement lacks an assignment: CC fs/binfmt_elf_fdpic.o fs/binfmt_elf_fdpic.c: In function 'elf_core_vma_data_size': fs/binfmt_elf_fdpic.c:1593: warning: statement with no effect Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24fs/partition/msdos: fix unusable extended partition for > 512B sectorOGAWA Hirofumi1-3/+10
Smaller size than a minimum blocksize can't be used, after all it's handled like 0 size. For extended partition itself, this makes sure to use bigger size than one logical sector size at least. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Daniel Taylor <Daniel.Taylor@wdc.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24fs/partitions/msdos: add support for large disksDaniel Taylor1-34/+38
In order to use disks larger than 2TiB on Windows XP, it is necessary to use 4096-byte logical sectors in an MBR. Although the kernel storage and functions called from msdos.c used "sector_t" internally, msdos.c still used u32 variables, which results in the ability to handle XP-compatible large disks. This patch changes the internal variables to "sector_t". Daniel said: "In the near future, WD will be releasing products that need this patch". [hirofumi@mail.parknet.co.jp: tweaks and fix] Signed-off-by: Daniel Taylor <daniel.taylor@wdc.com> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24kcore: fix test for end of listDan Carpenter1-1/+1
"m" is never NULL here. We need a different test for the end of list condition. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24reiserfs: properly honor read-only devicesJeff Mahoney1-6/+9
The reiserfs journal behaves inconsistently when determining whether to allow a mount of a read-only device. This is due to the use of the continue_replay variable to short circuit the journal scanning. If it's set, it's assumed that there are transactions to replay, but there may not be. If it's unset, it's assumed that there aren't any, and that may not be the case either. I've observed two failure cases: 1) Where a clean file system on a read-only device refuses to mount 2) Where a clean file system on a read-only device passes the optimization and then tries writing the journal header to update the latest mount id. The former is easily observable by using a freshly created file system on a read-only loopback device. This patch moves the check into journal_read_transaction, where it can bail out before it's about to replay a transaction. That way it can go through and skip transactions where appropriate, yet still refuse to mount a file system with outstanding transactions. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24reiserfs: fix oops while creating privroot with selinux enabledJeff Mahoney1-1/+1
Commit 57fe60df ("reiserfs: add atomic addition of selinux attributes during inode creation") contains a bug that will cause it to oops when mounting a file system that didn't previously contain extended attributes on a system using security.* xattrs. The issue is that while creating the privroot during mount reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which dereferences the xattr root. The xattr root doesn't exist, so we get an oops. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=15309 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-24fs/binfmt_aout.c: fix pointer warningsBorislav Petkov1-6/+8
fs/binfmt_aout.c: In function `aout_core_dump': fs/binfmt_aout.c:125: warning: passing argument 2 of `dump_write' makes pointer from integer without a cast include/linux/coredump.h:12: note: expected `const void *' but argument is of type `long unsigned int' fs/binfmt_aout.c:132: warning: passing argument 2 of `dump_write' makes pointer from integer without a cast include/linux/coredump.h:12: note: expected `const void *' but argument is of type `long unsigned int' due to dump_write() expecting a user void *. Fold casts into the START_DATA/START_STACK macros and shut up the warnings. Signed-off-by: Borislav Petkov <petkovbb@gmail.com> Cc: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-23ext4: Fixed inode allocator to correctly track a flex_bg's used_dirsEric Sandeen1-2/+2
When used_dirs was introduced for the flex_groups struct, it looks like the accounting was not put into place properly, in some places manipulating free_inodes rather than used_dirs. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-24ext4: Don't use delayed allocation by default when used instead of ext3Jan Kara1-9/+16
When ext4 driver is used to mount a filesystem instead of the ext3 file system driver (through CONFIG_EXT4_USE_FOR_EXT23), do not enable delayed allocation by default since some ext3 users and application writers have developed unfortunate expectations about the safety of writing files on systems subject to sudden and violent death without using fsync(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-24ext4: Fix spelling of CONTIG_FS_EXT3 to CONFIG_FS_EXT3Theodore Ts'o1-2/+2
Oops. (Blush.) Thanks to Sedat Dilek for pointing this out. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2010-03-23ocfs2: Fix a race in o2dlm lockres masterySrinivas Eeda1-3/+1
In o2dlm, the master of a lock resource keeps a map of all interested nodes. This prevents the master from purging the resource before an interested node can create a lock. A race between the mastery thread and the mastery handler allowed an interested node to discover who the master is without informing the master directly. This is easily fixed by holding the dlm spinlock a little longer in the mastery handler. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-23Ocfs2: Handle deletion of reflinked oprhan inodes correctly.Tristan Ye1-0/+15
The rule is that all inodes in the orphan dir have ORPHANED_FL, otherwise we treated it as an ERROR. This rule works well except for some rare cases of reflink operation: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1215 The problem is caused by how reflink and our orphan_scan thread interact. * The orphan scan pulls the orphans into a queue first, then runs the queue at a later time. We only hold the orphan_dir's lock during scanning. * Reflink create a oprhaned target in orphan_dir as its first step. It removes the target and clears the flag as the final step. These two steps take the orphan_dir's lock, but it is not held for the duration. Based on the above semantics, a reflink inode can be moved out of the orphan dir and have its ORPHANED_FL cleared before the queue of orphans is run. This leads to a ERROR in ocfs2_query_wipde_inode(). This patch teaches ocfs2_query_wipe_inode() to detect previously orphaned reflink targets. If a reflink fails or a crash occurs during the relfink operation, the inode will retain ORPHANED_FL and will be properly wiped. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-23Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir.Tristan Ye1-5/+23
Currently, some callers were missing to journal the dirty inode after adding it to orphan dir. Now we're going to journal such modifications within the ocfs2_orphan_add() itself, It's safe to do so, though some existing caller may duplicate this, and it makes the logic look more straightforward anyway. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-23ocfs2: Clear undo bits when local alloc is freedMark Fasheh4-46/+95
When the local alloc file changes windows, unused bits are freed back to the global bitmap. By defnition, those bits can not be in use by any file. Also, the local alloc will never have been able to allocate those bits if they were part of a previous truncate. Therefore it makes sense that we should clear unused local alloc bits in the undo buffer so that they can be used immediatly. [ Modified to call it ocfs2_release_clusters() -- Joel ] Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-03-24nilfs2: fix imperfect completion wait in nilfs_wait_on_logsRyusuke Konishi1-4/+4
nilfs_wait_on_logs has a potential to slip out before completion of all bio requests when it met an error. This synchronization fault may cause unexpected results, for instance, violative access to freed segment buffers from an end-bio callback routine. This fixes the issue by ensuring that nilfs_wait_on_logs waits all given logs. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2010-03-24nilfs2: fix hang-up of cleaner after log writer returned with errorRyusuke Konishi1-2/+1
According to the report from Andreas Beckmann (Message-ID: <4BA54677.3090902@abeckmann.de>), nilfs in 2.6.33 kernel got stuck after a disk full error. This turned out to be a regression by log writer updates merged at kernel 2.6.33. nilfs_segctor_abort_construction, which is a cleanup function for erroneous cases, was skipping writeback completion for some logs. This fixes the bug and would resolve the hang issue. Reported-by: Andreas Beckmann <debian@abeckmann.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org> [2.6.33.x]
2010-03-23ceph: fix possible double-free of mds request referenceSage Weil1-1/+3
Clear pointer to mds request after dropping the reference to ensure we don't drop it again, as there is at least one error path through this function that does not reset fi->last_readdir to a new value. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: fix session check on mds replySage Weil1-1/+1
Fix a broken check that a reply came back from the same MDS we sent the request to. I don't think a case that actually triggers this would ever come up in practice, but it's clearly wrong and easy to fix. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: handle kmalloc() failureDan Carpenter1-0/+2
Return ERR_PTR(-ENOMEM) if kmalloc() fails. We handle allocation failures the same way later in the function. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: propagate mds session allocation failures to callerSage Weil1-1/+6
Return error to original caller if register_session() fails. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: make write_begin wait propagate ERESTARTSYSSage Weil1-2/+8
Currently, if the wait_event_interruptible is interrupted, we return EAGAIN unconditionally and loop, such that we aren't, in fact, interruptible. So, propagate ERESTARTSYS if we get it. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: fix snap rebuild conditionSage Weil1-2/+2
We were rebuilding the snap context when it was not necessary (i.e. when the realm seq hadn't changed _and_ the parent seq was still older), which caused page snapc pointers to not match the realm's snapc pointer (even though the snap context itself was identical). This confused begin_write and put it into an endless loop. The correct logic is: rebuild snapc if _my_ realm seq changed, or if my parent realm's seq is newer than mine (and thus mine needs to be rebuilt too). Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: avoid reopening osd connections when address hasn't changedSage Weil3-1/+23
We get a fault callback on _every_ tcp connection fault. Normally, we want to reopen the connection when that happens. If the address we have is bad, however, and connection attempts always result in a connection refused or similar error, explicitly closing and reopening the msgr connection just prevents the messenger's backoff logic from kicking in. The result can be a console full of [ 3974.417106] ceph: osd11 10.3.14.138:6800 connection failed [ 3974.423295] ceph: osd11 10.3.14.138:6800 connection failed [ 3974.429709] ceph: osd11 10.3.14.138:6800 connection failed Instead, if we get a fault, and have outstanding requests, but the osd address hasn't changed and the connection never successfully connected in the first place, do nothing to the osd connection. The messenger layer will back off and retry periodically, because we never connected and thus the lossy bit is not set. Instead, touch each request's r_stamp so that handle_timeout can tell the request is still alive and kicking. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: rename r_sent_stamp r_stampSage Weil2-7/+7
Make variable name slightly more generic, since it will (soon) reflect either the time the request was sent OR the time it was last determined to be still retrying. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: fix connection fault con_work reentrancy problemSage Weil1-2/+0
The messenger fault was clearing the BUSY bit, for reasons unclear. This made it possible for the con->ops->fault function to reopen the connection, and requeue work in the workqueue--even though the current thread was already in con_work. This avoids a problem where the client busy loops with connection failures on an unreachable OSD, but doesn't address the root cause of that problem. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: prevent dup stale messages to console for restarting mdsSage Weil1-1/+1
Prevent duplicate 'mds0 caps stale' message from spamming the console every few seconds while the MDS restarts. Set s_renew_requested earlier, so that we only print the message once, even if we don't send an actual request. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: fix pg pool decoding from incremental osdmap updateSage Weil1-7/+10
The incremental map decoding of pg pool updates wasn't skipping the snaps and removed_snaps vectors. This caused osd requests to stall when pool snapshots were created or fs snapshots were deleted. Use a common helper for full and incremental map decoders that decodes pools properly. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: fix mds sync() race with completing requestsSage Weil1-7/+20
The wait_unsafe_requests() helper dropped the mdsc mutex to wait for each request to complete, and then examined r_node to get the next request after retaking the lock. But the request completion removes the request from the tree, so r_node was always undefined at this point. Since it's a small race, it usually led to a valid request, but not always. The result was an occasional crash in rb_next() while dereferencing node->rb_left. Fix this by clearing the rb_node when removing the request from the request tree, and not walking off into the weeds when we are done waiting for a request. Since the request we waited on will _always_ be out of the request tree, take a ref on the next request, in the hopes that it won't be. But if it is, it's ok: we can start over from the beginning (and traverse over older read requests again). Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: only release unused caps with mds requestsSage Weil1-3/+10
We were releasing used caps (e.g. FILE_CACHE) from encode_inode_release with MDS requests (e.g. setattr). We don't carry refs on most caps, so this code worked most of the time, but for setattr (utimes) we try to drop Fscr. This causes cap state to get slightly out of sync with reality, and may result in subsequent mds revoke messages getting ignored. Fix by only releasing unused caps. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: clean up handle_cap_grant, handle_caps wrt session mutexSage Weil1-32/+25
Drop session mutex unconditionally in handle_cap_grant, and do the check_caps from the handle_cap_grant helper. This avoids using a magic return value. Also avoid using a flag variable in the IMPORT case and call check_caps at the appropriate point. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: fix session locking in handle_caps, ceph_check_capsSage Weil1-5/+9
Passing a session pointer to ceph_check_caps() used to mean it would leave the session mutex locked. That wasn't always possible if it wasn't passed CHECK_CAPS_AUTHONLY. If could unlock the passed session and lock a differet session mutex, which was clearly wrong, and also emitted a warning when it a racing CPU retook it and we did an unlock from the wrong context. This was only a problem when there was more than one MDS. First, make ceph_check_caps unconditionally drop the session mutex, so that it is free to lock other sessions as needed. Then adjust the one caller that passes in a session (handle_cap_grant) accordingly. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: drop unnecessary WARN_ON in caps migrationSage Weil1-2/+1
If we don't have the exported cap it's because we already released it. No need to WARN. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: fix null pointer deref of r_osd in debug outputSage Weil1-1/+1
This causes an oops when debug output is enabled and we kick an osd request with no current r_osd (sometime after an osd failure). Check the pointer before dereferencing. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23ceph: clean up service ticket decodingSage Weil1-8/+20
Previously we would decode state directly into our current ticket_handler. This is problematic if for some reason we fail to decode, because we end up with half new state and half old state. We are probably already in bad shape if we get an update we can't decode, but we may as well be tidy anyway. Decode into new_* temporaries and update the ticket_handler only on success. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-22AFS: Potential null dereferenceDan Carpenter1-2/+3
It seems clear from the surrounding code that xpermits is allowed to be NULL here. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-22NFS: don't try to decode GETATTR if DELEGRETURN returned errorJeff Layton1-0/+2
The reply parsing code attempts to decode the GETATTR response even if the DELEGRETURN portion of the compound returned an error. The GETATTR response won't actually exist if that's the case and we're asking the parser to read past the end of the response. This bug is fairly benign. The parser catches this without reading past the end of the response and decode_getfattr returns -EIO. Earlier kernels however had decode_op_hdr using the READ_BUF macro, and this bug would make this printk pop any time the client got an error from a delegreturn: kernel: decode_op_hdr: reply buffer overflowed in line XXXX More recent kernels seem to have replaced this printk with a dprintk. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-03-22nilfs2: fix duplicate call to nilfs_segctor_cancel_freevRyusuke Konishi1-6/+6
Andreas Beckmann gave me a report that nilfs logged the following warnings when it got a disk full: nilfs_sufile_do_cancel_free: segment 0 must be clean nilfs_sufile_do_cancel_free: segment 1 must be clean These arise from a duplicate call to nilfs_segctor_cancel_freev in an error path of log writer. This will fix the issue. Reported-by: Andreas Beckmann <debian@abeckmann.de> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>