path: root/fs/ceph/osd_client.c
Age | Commit message | Author | Files | Lines
2010-07-27 | ceph: use complete_all and wake_up_all | Yehuda Sadeh | 1 | -3/+3
This fixes an issue triggered by running concurrent syncs. One of the syncs would go through while the other would just hang indefinitely. In any case, we never actually want to wake a single waiter, so the *_all functions should be used. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-06-13 | ceph: fix map handler error path | Sage Weil | 1 | -1/+2
Don't leak message if we receive an unexpected message type. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-29 | ceph: fix leak of osd authorizer | Sage Weil | 1 | -1/+6
Release the ceph_authorizer when releasing osd state. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-21 | ceph: Storage class should be before const qualifier | Tobias Klauser | 1 | -2/+2
The C99 specification states in section 6.11.5: The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Sage Weil <sage@newdream.net>
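The ordering rule quoted above can be illustrated with a minimal sketch. The table `example_timeouts` and the helper `example_timeout_count()` are hypothetical names chosen for illustration; they do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Obsolescent ordering (C99 6.11.5), qualifier before storage class:
 *
 *     const static int example_timeouts[] = { 5, 30, 60 };
 *
 * Preferred ordering, storage class first. The table is hypothetical.
 */
static const int example_timeouts[] = { 5, 30, 60 };

/* Return the number of entries in the example table. */
static size_t example_timeout_count(void)
{
	size_t n = sizeof(example_timeouts) / sizeof(example_timeouts[0]);

	assert(n > 0);
	return n;
}
```

Both orderings declare the same object; only the placement of `static` differs.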
2010-05-17 | ceph: all allocation functions should get gfp_mask | Yehuda Sadeh | 1 | -4/+4
This is essential, as for the rados block device we'll need to run in different contexts that would need flags that are other than GFP_NOFS. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 | ceph: name msgpools; useful error messages | Sage Weil | 1 | -2/+4
Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 | ceph: osdtimeout=0 for no timeout | Sage Weil | 1 | -1/+1
Allow the osd reset timeout to be disabled. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 | ceph: wake up mount thread when getting osdmap | Yehuda Sadeh | 1 | -0/+1
Now that the mount thread waits for the osdmap, it needs to be awakened. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-05-17 | ceph: simplify ceph_msg_new | Sage Weil | 1 | -4/+4
We only need to pass in front_len. Callers can attach any other payload pieces (middle, data) as they see fit. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 | ceph: make ceph_msg_new return NULL on failure; clean up, fix callers | Sage Weil | 1 | -11/+11
Returning ERR_PTR(-ENOMEM) is useless extra work. Return NULL on failure instead, and fix up the callers (about half of which were wrong anyway). Signed-off-by: Sage Weil <sage@newdream.net>
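The calling convention described above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: `struct example_msg`, `example_msg_new()`, `example_msg_put()` and `example_build_request()` are hypothetical stand-ins, and `calloc()` replaces the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a message; not the kernel's struct ceph_msg. */
struct example_msg {
	size_t front_len;
	void *front;
};

/* Allocate a message with a front buffer; return NULL on any failure. */
static struct example_msg *example_msg_new(size_t front_len)
{
	struct example_msg *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->front_len = front_len;
	if (front_len) {
		m->front = calloc(1, front_len);
		if (!m->front) {
			free(m);
			return NULL;
		}
	}
	return m;
}

/* Release a message; accepts NULL like free(). */
static void example_msg_put(struct example_msg *m)
{
	if (m) {
		free(m->front);
		free(m);
	}
}

/* A caller needs only a NULL test; there is no error pointer to unpack. */
static int example_build_request(size_t front_len)
{
	struct example_msg *m = example_msg_new(front_len);

	if (!m)
		return -ENOMEM;
	assert(m->front_len == front_len);
	example_msg_put(m);
	return 0;
}
```

Because allocation failure is the only error the constructor can report, a NULL return loses no information and the caller supplies `-ENOMEM` itself.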
2010-05-17 | ceph: fix theoretically possible double-put on connection | Sage Weil | 1 | -0/+1
This would only trigger if we bailed out before resetting r_con_filling_msg because the server reply was corrupt (oversized). Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-17 | ceph: simplify page setup for incoming data | Sage Weil | 1 | -44/+12
Drop largely useless helper __prepare_pages(), and simplify sanity checks. Signed-off-by: Sage Weil <sage@newdream.net>
2010-05-11 | ceph: resubmit requests on pg mapping change (not just primary change) | Sage Weil | 1 | -4/+15
OSD requests need to be resubmitted on any pg mapping change, not just when the pg primary changes. Resending only when the primary changes results in occasional 'hung' requests during osd cluster recovery or rebalancing. Signed-off-by: Sage Weil <sage@newdream.net>
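The difference between the two resend conditions can be sketched as below. The representation is an assumption made for illustration: `struct example_pg_map` holds an ordered list of osd ids with the primary first, and both helper names are hypothetical rather than taken from osd_client.c.

```c
#include <assert.h>
#include <string.h>

#define EXAMPLE_MAX_OSDS 8

/* Hypothetical pg mapping: ordered osd ids, osds[0] is the primary. */
struct example_pg_map {
	int num;
	int osds[EXAMPLE_MAX_OSDS];
};

/* Old condition: true only if the primary osd differs. */
static int example_primary_changed(const struct example_pg_map *old,
				   const struct example_pg_map *cur)
{
	int old_primary = old->num > 0 ? old->osds[0] : -1;
	int cur_primary = cur->num > 0 ? cur->osds[0] : -1;

	return old_primary != cur_primary;
}

/* New condition: true if the set or order of osds differs at all. */
static int example_mapping_changed(const struct example_pg_map *old,
				   const struct example_pg_map *cur)
{
	assert(old->num >= 0 && old->num <= EXAMPLE_MAX_OSDS);
	assert(cur->num >= 0 && cur->num <= EXAMPLE_MAX_OSDS);
	if (old->num != cur->num)
		return 1;
	return memcmp(old->osds, cur->osds,
		      (size_t)cur->num * sizeof(cur->osds[0])) != 0;
}
```

A replica change such as {3, 5, 7} becoming {3, 5, 9} leaves the primary untouched, so only the second check triggers a resend.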
2010-05-11 | ceph: unregister osd request on failure | Sage Weil | 1 | -2/+5
The osd request wasn't being unregistered when the osd returned a failure code, even though the result was returned to the caller. This would cause it to eventually time out, and then crash the kernel when it tried to resend the request using a stale page vector. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23 | ceph: avoid reopening osd connections when address hasn't changed | Sage Weil | 1 | -1/+14
We get a fault callback on _every_ tcp connection fault. Normally, we want to reopen the connection when that happens. If the address we have is bad, however, and connection attempts always result in a connection refused or similar error, explicitly closing and reopening the msgr connection just prevents the messenger's backoff logic from kicking in. The result can be a console full of

[ 3974.417106] ceph: osd11 10.3.14.138:6800 connection failed
[ 3974.423295] ceph: osd11 10.3.14.138:6800 connection failed
[ 3974.429709] ceph: osd11 10.3.14.138:6800 connection failed

Instead, if we get a fault, and have outstanding requests, but the osd address hasn't changed and the connection never successfully connected in the first place, do nothing to the osd connection. The messenger layer will back off and retry periodically, because we never connected and thus the lossy bit is not set. Instead, touch each request's r_stamp so that handle_timeout can tell the request is still alive and kicking.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23 | ceph: rename r_sent_stamp r_stamp | Sage Weil | 1 | -6/+6
Make variable name slightly more generic, since it will (soon) reflect either the time the request was sent OR the time it was last determined to be still retrying. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-23 | ceph: fix null pointer deref of r_osd in debug output | Sage Weil | 1 | -1/+1
This causes an oops when debug output is enabled and we kick an osd request with no current r_osd (sometime after an osd failure). Check the pointer before dereferencing. Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-04 | ceph: reset osd after relevant messages timed out | Yehuda Sadeh | 1 | -55/+98
This simplifies the process of timing out messages. We keep an lru of the messages currently in flight. If a timeout has passed, we reset the osd connection, so that messages will be retransmitted. This is a failsafe in case we hit some sort of problem sending out a message to the OSD. Normally, we'll get notification via an updated osdmap if there are problems. If a request is older than the keepalive timeout, send a keepalive to ensure we detect any breaks in the TCP connection. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
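The two thresholds described above can be sketched as a walk over request timestamps in LRU order. This is a simplified user-space illustration, not the kernel's handle_timeout: the enum, both function names, and the use of plain integer timestamps are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum example_action { EXAMPLE_NONE, EXAMPLE_KEEPALIVE, EXAMPLE_RESET };

/* Classify one request by its age against the two thresholds. */
static enum example_action example_check_age(unsigned long now,
					     unsigned long stamp,
					     unsigned long keepalive,
					     unsigned long timeout)
{
	unsigned long age = now - stamp;

	if (age > timeout)
		return EXAMPLE_RESET;      /* reset the osd connection */
	if (age > keepalive)
		return EXAMPLE_KEEPALIVE;  /* probe the TCP connection */
	return EXAMPLE_NONE;
}

/*
 * Walk stamps sorted oldest first (LRU order) and count the requests
 * that call for a reset. The walk stops at the first request that is
 * too young to act on, since everything after it is younger still.
 */
static size_t example_count_resets(const unsigned long *stamps, size_t n,
				   unsigned long now,
				   unsigned long keepalive,
				   unsigned long timeout)
{
	size_t i, resets = 0;

	assert(keepalive <= timeout);
	for (i = 0; i < n; i++) {
		enum example_action a =
			example_check_age(now, stamps[i], keepalive, timeout);

		if (a == EXAMPLE_NONE)
			break;
		if (a == EXAMPLE_RESET)
			resets++;
	}
	return resets;
}
```

Keeping the requests in LRU order is what lets the scan stop early instead of visiting every in-flight request.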
2010-03-01 | ceph: set osd request message front length correctly | Sage Weil | 1 | -0/+3
We didn't set the front length correctly. When messages used the message pool we ended up with the conservative max (4 KB), and the rest of the time the slightly less conservative estimate. Even though the OSD ignores the extra data, set it to the right value to avoid sending extra data over the network. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-03-01 | ceph: use single osd op reply msg | Sage Weil | 1 | -93/+45
Use a single ceph_msg for the osd reply, even when we are getting multiple replies. Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-26 | ceph: remove fragile __map_osds optimization | Sage Weil | 1 | -17/+4
We used to try to avoid freeing and then reallocating the osd struct. This is a bit fragile due to potential interactions with other references (beyond o_requests), and may be the cause of this crash:

[120633.442358] BUG: unable to handle kernel NULL pointer dereference at (null)
[120633.443292] IP: [<ffffffff812549b6>] rb_erase+0x11d/0x277
[120633.443292] PGD f7ff3067 PUD f7f53067 PMD 0
[120633.443292] Oops: 0000 [#1] PREEMPT SMP
[120633.443292] last sysfs file: /sys/kernel/uevent_seqnum
[120633.443292] CPU 1
[120633.443292] Modules linked in: ceph fan ac battery psmouse ehci_hcd ide_pci_generic ohci_hcd thermal processor button
[120633.443292] Pid: 3023, comm: ceph-msgr/1 Not tainted 2.6.32-rc2 #12 H8SSL
[120633.443292] RIP: 0010:[<ffffffff812549b6>] [<ffffffff812549b6>] rb_erase+0x11d/0x277
[120633.443292] RSP: 0018:ffff8800f7b13a50 EFLAGS: 00010246
[120633.443292] RAX: ffff880022907819 RBX: ffff880022907818 RCX: 0000000000000000
[120633.443292] RDX: ffff8800f7b13a80 RSI: ffff8800f587eb48 RDI: 0000000000000000
[120633.443292] RBP: ffff8800f7b13a60 R08: 0000000000000000 R09: 0000000000000004
[120633.443292] R10: 0000000000000000 R11: ffff8800c4441000 R12: ffff8800f587eb48
[120633.443292] R13: ffff8800f58eaa00 R14: ffff8800f413c000 R15: 0000000000000001
[120633.443292] FS: 00007fbef6e226e0(0000) GS:ffff880009200000(0000) knlGS:0000000000000000
[120633.443292] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[120633.443292] CR2: 0000000000000000 CR3: 00000000f7c53000 CR4: 00000000000006e0
[120633.443292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[120633.443292] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[120633.443292] Process ceph-msgr/1 (pid: 3023, threadinfo ffff8800f7b12000, task ffff8800f5858b40)
[120633.443292] Stack:
[120633.443292] ffff8800f413c000 ffff8800f587e9c0 ffff8800f7b13a80 ffffffffa0098a86
[120633.443292] <0> 00000000000006f1 0000000000000000 ffff8800f7b13af0 ffffffffa009959b
[120633.443292] <0> ffff8800f413c000 ffff880022a68400 ffff880022a68400 ffff8800f587e9c0
[120633.443292] Call Trace:
[120633.443292] [<ffffffffa0098a86>] __remove_osd+0x4d/0xbc [ceph]
[120633.443292] [<ffffffffa009959b>] __map_osds+0x199/0x4fa [ceph]
[120633.443292] [<ffffffffa00999f4>] ? __send_request+0xf8/0x186 [ceph]
[120633.443292] [<ffffffffa0099beb>] kick_requests+0x169/0x3cb [ceph]
[120633.443292] [<ffffffffa009a8c1>] ceph_osdc_handle_map+0x370/0x522 [ceph]

Since we're probably screwed anyway if a small kmalloc is failing, don't bother with trying to be clever here.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-23 | ceph: fix up unexpected message handling | Sage Weil | 1 | -10/+31
Fix skipping of unexpected message types from osd, mon. Clean up pr_info and debug output. Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-15 | ceph: reset osd connections after fault | Sage Weil | 1 | -2/+3
A single osd connection fault (e.g. tcp disconnect) wasn't reopening the connection, which causes all current and future requests for that osd to hang. Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-11 | ceph: put unused osd connections on lru | Yehuda Sadeh | 1 | -9/+67
Instead of removing osd connection immediately when the requests list is empty, put the osd connection on an lru. Only if that osd has not been used for more than a specified time, will it be removed. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-10 | ceph: allow renewal of auth credentials | Sage Weil | 1 | -0/+12
Add infrastructure to allow the mon_client to periodically renew its auth credentials. Also add a messenger callback that will force such a renewal if a peer rejects our authenticator. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-02 | ceph: always send truncation info with read and write osd ops | Yehuda Sadeh | 1 | -13/+3
This fixes a bug where the read/write ops arrive at the osd after a following truncation request. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-01-25 | ceph: keep reserved replies on the request structure | Yehuda Sadeh | 1 | -35/+83
This includes handling all data preallocation and revocation in the same place, with no special case needed for the reserved pages. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-01-25 | ceph: alloc message data pages and check if tid exists | Yehuda Sadeh | 1 | -24/+42
This is now done in the same callback that is also responsible for allocating the 'front' part of the message. If we get a message for which we have no corresponding tid, mark it for skipping. Also move the mutex unlock/lock from the osd alloc_msg callback to the calling function in the messenger. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-01-25 | ceph: allocate middle of message before starting to read | Yehuda Sadeh | 1 | -4/+13
Both the front and middle parts of the message are now allocated in ceph_alloc_msg(). Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2010-01-14 | ceph: display pgid in debugfs osd request dump | Sage Weil | 1 | -0/+2
Signed-off-by: Sage Weil <sage@newdream.net>
2010-01-14 | ceph: remove unused erank field | Sage Weil | 1 | -3/+4
The ceph_entity_addr erank field is obsolete; remove it. Get rid of trivial addr comparison helpers while we're at it. Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 | ceph: include transaction id in ceph_msg_header (protocol change) | Sage Weil | 1 | -6/+3
Many (most?) message types include a transaction id. By including it in the fixed size header, we always have it available even when we are unable to allocate memory for the (larger, variable sized) message body. This will allow us to error out the appropriate request instead of (silently) dropping the reply. Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-23 | ceph: control access to page vector for incoming data | Sage Weil | 1 | -9/+33
When we issue an OSD read, we specify a vector of pages that the data is to be read into. The request may be sent multiple times, to multiple OSDs, if the osdmap changes, which means we can get more than one reply. Only read data into the page vector if the reply is coming from the OSD we last sent the request to. Keep track of which connection is using the vector by taking a reference. If another connection was already using the vector before and a new reply comes in on the right connection, revoke the pages from the other connection. Signed-off-by: Sage Weil <sage@newdream.net>
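The ownership rule described above can be sketched as a small state machine. This is a user-space illustration under assumptions: `struct example_con`, `struct example_req`, and `example_claim_pages()` are hypothetical, and a plain integer counter stands in for the connection reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection: which osd it talks to, plus a ref count. */
struct example_con {
	int osd;
	int refs;
	int filling;   /* nonzero while this con may write into the pages */
};

/* Hypothetical request: last target osd and current page vector user. */
struct example_req {
	int last_sent_osd;
	struct example_con *con_filling;
};

/*
 * Decide whether a reply arriving on con may read data into the
 * request's page vector. Returns 1 if so, 0 if the data must be skipped.
 */
static int example_claim_pages(struct example_req *req,
			       struct example_con *con)
{
	if (con->osd != req->last_sent_osd)
		return 0;                    /* stale reply, skip its data */
	if (req->con_filling == con)
		return 1;                    /* already the owner */
	if (req->con_filling) {
		/* Revoke the pages from the previous connection. */
		req->con_filling->filling = 0;
		req->con_filling->refs--;
	}
	con->refs++;                         /* the request pins the owner */
	con->filling = 1;
	req->con_filling = con;
	return 1;
}
```

Holding a reference on the owning connection keeps it alive for as long as it may still be writing into the pages.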
2009-12-23 | ceph: unregister canceled/timed out osd requests | Sage Weil | 1 | -1/+2
Canceled or timed out osd requests were getting left in the request list and never deallocated (until umount). Unregister if they are canceled (control-c) or time out. Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 | ceph: fix error paths for corrupt osdmap messages | Sage Weil | 1 | -0/+2
Neither osdmap_decode() nor osdmap_apply_incremental() should ever return NULL. Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 | ceph: hex dump corrupt server data to KERN_DEBUG | Sage Weil | 1 | -0/+2
Also, print fsid using standard format, NOT hex dump. Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-21 | ceph: fix msgpool reservation leak | Yehuda Sadeh | 1 | -1/+4
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2009-12-07 | ceph: use kref for ceph_osd_request | Sage Weil | 1 | -19/+18
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-03 | ceph: whitespace cleanup | Sage Weil | 1 | -2/+2
Signed-off-by: Sage Weil <sage@newdream.net>
2009-12-01 | ceph: plug leak of request_mutex | Sage Weil | 1 | -0/+1
Fix leak of osd client request_mutex on receiving dup ack. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-21 | fs/ceph: Move a dereference below a NULL test | Julia Lawall | 1 | -1/+2
If the NULL test is necessary, then the dereference should be moved below the NULL test.

The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/).

    // <smpl>
    @@
    type T;
    expression E;
    identifier i,fld;
    statement S;
    @@

    - T i = E->fld;
    + T i;
      ... when != E
          when != i
      if (E == NULL) S
    + i = E->fld;
    // </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>
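The effect of the semantic patch can be seen on a small hand-written case. The struct and function below are hypothetical and only mirror the shape of the transformation; they are not the code touched by the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request type used only for this illustration. */
struct example_req {
	int flags;
};

/*
 * Before the patch the function would have started with
 *
 *     int flags = req->flags;
 *
 * which dereferences req ahead of the NULL test below.
 */
static int example_req_flags(const struct example_req *req)
{
	int flags;                 /* declaration only, no dereference */

	if (req == NULL)
		return -1;
	flags = req->flags;        /* dereference moved below the test */
	return flags;
}
```

If the pointer can never be NULL, the alternative fix is to drop the test; the patch assumes the test is there for a reason.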
2009-11-20 | ceph: fix debugfs entry, simplify fsid checks | Sage Weil | 1 | -10/+2
We may first learn our fsid from any of the mon, osd, or mds maps (whichever the monitor sends first). Consolidate checks in a single helper. Initialize the client debugfs entry then, since we need the fsid (and global_id) for the directory name. Also remove dead mount code. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-18 | ceph: negotiate authentication protocol; implement AUTH_NONE protocol | Sage Weil | 1 | -3/+60
When we open a monitor session, we send an initial AUTH message listing the auth protocols we support, our entity name, and (possibly) a previously assigned global_id. The monitor chooses a protocol and responds with an initial message. Initially implement AUTH_NONE, a dummy protocol that provides no security, but works within the new framework. It generates 'authorizers' that are used when connecting to (mds, osd) services that simply state our entity name and global_id. This is a wire protocol change. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-18 | ceph: handle errors during osd client init | Sage Weil | 1 | -4/+11
Unwind initializing if we get ENOMEM during client initialization. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-18 | ceph: remove bad calls to ceph_con_shutdown | Sage Weil | 1 | -3/+1
We want to ceph_con_close when we're done with the connection, before the ref count reaches 0. Once it does, do not call ceph_con_shutdown, as that takes the con mutex and may sleep, and besides that is unnecessary. Signed-off-by: Sage Weil <sage@newdream.net>
2009-11-04 | ceph: fix endian conversions for ceph_pg | Sage Weil | 1 | -4/+4
The endian conversions don't quite work with the old union ceph_pg. Just make it a regular struct, and make each field __le. This is simpler and it has the added bonus of actually working. Signed-off-by: Sage Weil <sage@newdream.net>
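A per-field little-endian layout of the kind described above can be sketched in portable user-space C. The field names and widths in `struct example_pg` (two 16-bit fields and one 32-bit field) are assumptions for illustration and may not match the real ceph_pg; the byte-wise helpers stand in for the kernel's __le types and conversion macros.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_PG_WIRE_LEN 8

/* Hypothetical host-order view of a pg id; the layout is assumed. */
struct example_pg {
	uint16_t preferred;
	uint16_t ps;
	uint32_t pool;
};

static void example_put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static void example_put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)(v >> 24);
}

static uint16_t example_get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t example_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Each field is converted on its own, so host byte order never leaks. */
static void example_pg_encode(uint8_t *buf, const struct example_pg *pg)
{
	example_put_le16(buf, pg->preferred);
	example_put_le16(buf + 2, pg->ps);
	example_put_le32(buf + 4, pg->pool);
}

static void example_pg_decode(struct example_pg *pg, const uint8_t *buf)
{
	pg->preferred = example_get_le16(buf);
	pg->ps = example_get_le16(buf + 2);
	pg->pool = example_get_le32(buf + 4);
}
```

Converting the value as one wide integer would swap whole fields on a big-endian host, which is the kind of problem per-field conversion avoids.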
2009-10-27 | ceph: allocate and parse mount args before client instance | Sage Weil | 1 | -3/+3
This simplifies much of the error handling during mount. It also means that we have the mount args before client creation, and we can initialize based on those options. Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-15 | ceph: warn on allocation from msgpool with larger front_len | Sage Weil | 1 | -2/+3
Pass the front_len we need when pulling a message off a msgpool, and WARN if it is greater than the pool's size. Then try to allocate a new message (to continue without failing). Signed-off-by: Sage Weil <sage@newdream.net>
2009-10-14 | ceph: convert encode/decode macros to inlines | Sage Weil | 1 | -4/+4
This avoids the fugly pass by reference and makes the code a bit easier to read. Signed-off-by: Sage Weil <sage@newdream.net>
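The readability difference can be sketched by placing a macro-style decoder next to an inline one. Both `EXAMPLE_DECODE_32` and `example_decode_32()` are hypothetical user-space stand-ins; the exact shape of the kernel helpers before and after this commit is assumed, not quoted.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit value without assuming alignment. */
static inline uint32_t example_read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Macro style (illustrative): both arguments are modified as a side
 * effect, which is not visible at the call site.
 */
#define EXAMPLE_DECODE_32(p, v)                          \
	do {                                             \
		(v) = example_read_le32(p);              \
		(p) += sizeof(uint32_t);                 \
	} while (0)

/*
 * Inline style: the value is returned, and the cursor is passed by
 * address so the update is explicit and type checked.
 */
static inline uint32_t example_decode_32(const uint8_t **p)
{
	uint32_t v = example_read_le32(*p);

	*p += sizeof(uint32_t);
	return v;
}
```

A call such as `len = example_decode_32(&p);` reads as an ordinary assignment, whereas `EXAMPLE_DECODE_32(p, len);` hides that `len` is written.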
2009-10-09 | ceph: cancel osd requests before resending them | Sage Weil | 1 | -1/+4
This ensures we don't submit the same request twice if we are kicking a specific osd (as with an osd_reset), or when we hit a transient error and resend. Signed-off-by: Sage Weil <sage@newdream.net>