aboutsummaryrefslogtreecommitdiffstats
path: root/fs/ceph (follow)
AgeCommit message (Collapse)AuthorFilesLines
2010-07-05ceph: fix crush device 'out' threshold to 1.0, not 0.1Sage Weil1-1/+1
Fix a typo that made any OSD weighted between 0.1 and 1.0 effectively weighted as 1.0 (fully in). Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-29ceph: fix caps usage accounting for import (non-reserved) caseSage Weil1-2/+8
We need to increase the total and used counters when allocating a new cap in the non-reserved (cap import) case. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-29ceph: only release clean, unused caps with mds requestsSage Weil1-5/+6
We can drop caps with an mds request. Ensure we only drop unused AND clean caps, since the MDS doesn't support cap writeback in that context, nor do we track it. If caps are dirty, and the MDS needs them back, we it will revoke and we will flush in the normal fashion. This fixes a possibly loss of metadata. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-24ceph: fix crush CHOOSE_LEAF when type is already a leafSage Weil1-13/+25
We may not recurse for CHOOSE_LEAF if we start with a leaf node. When that happens, the out2 vector needs to be filled in with the result. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-24ceph: fix crush recursionSage Weil1-0/+1
There was a longstanding problem with recursion through intervening bucket types on complex hierarchies. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-24ceph: fix caps debugfs entryYehuda Sadeh1-1/+1
The ceph client structure was not set correctly. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-21ceph: delay umount until all mds requests drop inode+dentry refsSage Weil1-0/+6
This fixes a race between handle_reply finishing an mds request, signalling completion, and then dropping the request structing and its dentry+inode refs, and pre_umount function waiting for requests to finish before letting the vfs tear down the dcache. If umount was delayed waiting for mds requests, we could race and BUG in shrink_dcache_for_umount_subtree because of a slow dput. This delays umount until the msgr queue flushes, which means handle_reply will exit and will have dropped the ceph_mds_request struct. I'm assuming the VFS has already ensured that its calls have all completed and those request refs have thus been dropped as well (I haven't seen that race, at least). Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-21ceph: handle splice_dentry/d_materialize_unique error in readdir_prepopulateSage Weil1-7/+12
Handle a splice_dentry failure (due to a d_materialize_unique error) without crashing. (Also, report the error code.) Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-17ceph: fix crush map update decodingSage Weil1-0/+1
If the incremental osdmap has a new crush map, advance the position after decoding so that we can parse the rest of the osdmap properly. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-13ceph: fix message memory leak, uninitialized variableSage Weil1-0/+2
We need to properly initialize skip, as not all alloc_msg op instances set it. Also, BUG if someone says skip but also allocates a message. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-13ceph: fix map handler error pathSage Weil1-1/+2
Don't leak message if we receive an unexpected message type. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-13ceph: some endianity fixesYehuda Sadeh3-3/+4
Fix some problems that came up with sparse. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10ceph: try to send partial cap release on cap message on missing inodeSage Weil3-5/+9
If we have enough memory to allocate a new cap release message, do so, so that we can send a partial release message immediately. This keeps us from making the MDS wait when the cap release it needs is in a partially full release message. If we fail because of ENOMEM, oh well, they'll just have to wait a bit longer. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10ceph: release cap on import if we don't have the inodeSage Weil3-38/+61
If we get an IMPORT that give us a cap, but we don't have the inode, queue a release (and try to send it immediately) so that the MDS doesn't get stuck waiting for us. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10ceph: fix misleading/incorrect debug messageSage Weil1-1/+1
Nothing is released here: the caps message is simply ignored in this case. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-10ceph: fix atomic64_t initialization on ia64Jeff Mahoney1-1/+1
bdi_seq is an atomic_long_t but we're using ATOMIC_INIT, which causes build failures on ia64. This patch fixes it to use ATOMIC_LONG_INIT. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-04ceph: fix lease revocation when seq doesn't matchSage Weil1-4/+8
If the client revokes a lease with a higher seq than what we have, keep the mds's seq, so that it honors our release. Otherwise, we can hang indefinitely. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-01ceph: fix f_namelen reported by statfsSage Weil1-1/+1
We were setting f_namelen in kstatfs to PATH_MAX instead of NAME_MAX. That disagrees with ceph_lookup behavior (which checks against NAME_MAX), and also makes the pjd posix test suite spit out ugly errors because with can't clean up its temporary files. Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-01ceph: fix memory leak in statfsYehuda Sadeh1-0/+2
Freeing the statfs request structure when required. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-01ceph: fix d_subdirs ordering problemHenry C Chang1-1/+1
We misused list_move_tail() to order the dentry in d_subdirs. This will screw up the d_subdirs order. This bug can be reliably reproduced by: 1. mount ceph fs. 2. on ceph fs, git clone git://ceph.newdream.net/git/ceph.git 3. Run autogen.sh in ceph directory. (Note: Errors only occur at the first time you run autogen.sh.) Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-30Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds17-32/+85
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: clean up on forwarded aborted mds request ceph: fix leak of osd authorizer ceph: close out mds, osd connections before stopping auth ceph: make lease code DN specific fs/ceph: Use ERR_CAST ceph: renew auth tickets before they expire ceph: do not resend mon requests on auth ticket renewal ceph: removed duplicated #includes ceph: avoid possible null dereference ceph: make mds requests killable, not interruptible sched: add wait_for_completion_killable_timeout
2010-05-29ceph: clean up on forwarded aborted mds requestSage Weil1-4/+9
If an mds request is aborted (timeout, SIGKILL), it is left registered to keep our state in sync with the mds. If we get a forward notification, though, we know the request didn't succeed and we can unregister it safely. We were trying to resend it, but then bailing out (and not unregistering) in __do_request. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29ceph: fix leak of osd authorizerSage Weil1-1/+6
Release the ceph_authorizer when releasing osd state. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29ceph: close out mds, osd connections before stopping authSage Weil3-1/+16
The auth module (part of the mon_client) is needed to free any ceph_authorizer(s) used by the mds and osd connections. Flush the msgr workqueue before stopping monc to ensure that the destroy_authorizer auth op is available when those connections are closed out. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29ceph: make lease code DN specificSage Weil2-12/+13
The lease code includes a mask in the CEPH_LOCK_* namespace, but that namespace is changing, and only one mask (formerly _DN == 1) is used, so hard code for that value for now. If we ever extend this code to handle leases over different data types we can extend it accordingly. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29fs/ceph: Use ERR_CASTJulia Lawall6-6/+6
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. In the case of fs/ceph/inode.c, ERR_CAST is not needed, because the type of the returned value is the same as the type of the enclosing function. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29ceph: renew auth tickets before they expireSage Weil4-1/+27
We were only requesting renewal after our tickets expire; do so before that. Most of the low-level logic for this was already there; just use it. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29ceph: do not resend mon requests on auth ticket renewalSage Weil1-1/+4
We only want to send pending mon requests when we successfully authenticate. If we are already authenticated, like when we renew our ticket, there is no need to resend pending requests. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29ceph: removed duplicated #includesAndrea Gelmini2-2/+0
fs/ceph/auth.c: linux/slab.h is included more than once. fs/ceph/super.h: linux/slab.h is included more than once. Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29ceph: avoid possible null dereferenceSage Weil1-2/+2
ac->ops may be null; use protocol id in error message instead. Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29ceph: make mds requests killable, not interruptibleSage Weil1-2/+2
The underlying problem is that many mds requests can't be restarted. For example, a restarted create() would return -EEXIST if the original request succeeds. However, we do not want a hung MDS to hang the client too. So, use the _killable wait_for_completion variants to abort on SIGKILL but nothing else. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-27drop unused dentry argument to ->fsyncChristoph Hellwig3-6/+5
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-24Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-clientLinus Torvalds30-759/+876
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (59 commits) ceph: reuse mon subscribe message instead of allocated anew ceph: avoid resending queued message to monitor ceph: Storage class should be before const qualifier ceph: all allocation functions should get gfp_mask ceph: specify max_bytes on readdir replies ceph: cleanup pool op strings ceph: Use kzalloc ceph: use common helper for aborted dir request invalidation ceph: cope with out of order (unsafe after safe) mds reply ceph: save peer feature bits in connection structure ceph: resync headers with userland ceph: use ceph. prefix for virtual xattrs ceph: throw out dirty caps metadata, data on session teardown ceph: attempt mds reconnect if mds closes our session ceph: clean up send_mds_reconnect interface ceph: wait for mds OPEN reply to indicate reconnect success ceph: only send cap releases when mds is OPEN|HUNG ceph: dicard cap releases on mds restart ceph: make mon client statfs handling more generic ceph: drop src address(es) from message header [new protocol feature] ...
2010-05-21ceph: reuse mon subscribe message instead of allocated anewSage Weil2-10/+14
Use the same message, allocated during startup. No need to reallocate a new one each time around (and potentially ENOMEM). Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-21sanitize vfs_fsync calling conventionsChristoph Hellwig1-2/+1
Now that the last user passing a NULL file pointer is gone we can remove the redundant dentry argument and associated hacks inside vfs_fsynmc_range. The next step will be removig the dentry argument from ->fsync, but given the luck with the last round of method prototype changes I'd rather defer this until after the main merge window. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21ceph: should use deactivate_locked_super() on failure exitsAl Viro1-2/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21ceph: avoid resending queued message to monitorSage Weil1-0/+2
The auth_reply handler will (re)send any pending requests. For the initial mon authenticate phase, that's correct, but when a auth ticket renewal races with an in-flight request, we may resend a request message that is already in flight. Avoid this by revoking the message before sending it. We should also avoid resending requests at all during ticket renewal; that will come soon. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-21ceph: Storage class should be before const qualifierTobias Klauser3-6/+6
The C99 specification states in section 6.11.5: The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: all allocation functions should get gfp_maskYehuda Sadeh8-30/+32
This is essential, as for the rados block device we'll need to run in different contexts that would need flags that are other than GFP_NOFS. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: specify max_bytes on readdir repliesSage Weil4-1/+14
Specify max bytes in request to bound size of reply. Add associated mount option with default value of 512 KB. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: cleanup pool op stringsSage Weil1-19/+12
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: Use kzallocJulia Lawall1-2/+1
Use kzalloc rather than the combination of kmalloc and memset. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x,size,flags; statement S; @@ -x = kmalloc(size,flags); +x = kzalloc(size,flags); if (x == NULL) S -memset(x, 0, size); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: use common helper for aborted dir request invalidationSage Weil3-31/+27
We invalidate I_COMPLETE and dentry leases in two places: on aborted mds request and on request replay. Use common helper to avoid duplicate code. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: cope with out of order (unsafe after safe) mds replySage Weil1-0/+6
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: save peer feature bits in connection structureSage Weil2-0/+2
These are used for adjusting behavior, such as conditionally encoding a newer message format. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: resync headers with userlandSage Weil6-22/+91
Notable changes include pool op defines and types, FLOCK feature bit, and new CMPXATTR osd ops. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: use ceph. prefix for virtual xattrsSage Weil1-10/+11
Drop the 'user.' prefix and use just 'ceph.' for fs virtual xattrs. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: throw out dirty caps metadata, data on session teardownSage Weil1-3/+41
The remove_session_caps() helper is called when an MDS closes out our session (either normally, or as a result of a failed reconnect), and when we tear down state for umount. If we remove the last cap, and there are no cap migrations in progress, then there is little hope of us flushing out that data to the mds (without heroic efforts to reconnect and flush). So, to avoid leaving inodes pinned (due to dirty state) and crashing after umount, throw out dirty caps state and unpin the inodes. Print a warning to the console so we know something was lost. NOTE: Although we drop wrbuffer refs, we don't actually mark pages clean; maybe a truncate should be queued? Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: attempt mds reconnect if mds closes our sessionSage Weil1-2/+3
Currently, if our session is closed (due to a timeout, or explicit close, or whatever), we just sit there doing nothing unless/until the MDS restarts, at which point we try to reconnect. Change client to attempt an immediate reconnect if our session is closed. Note that currently the MDS doesn't support this, and our attempt will fail. We'll get a session CLOSE, our caps and dirty cap state will be dropped, and the client will be free to attempt to reconnect. That's clearly not as nice as a successful reconnect, but it at least allows us to try to carry on, and in the future the MDS will support a reconnect and we will fare better. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17ceph: clean up send_mds_reconnect interfaceSage Weil1-31/+16
Pass a ceph_mds_session, since the caller has it. Remove the dead code for sending empty reconnects. It used to be used when the MDS contacted _us_ to solicit a reconnect, and we could reply saying "go away, I have no session." Now we only send reconnects based on the mds map, and only when we do in fact have an open session. Signed-off-by: Sage Weil <sage@newdream.net>