aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/ceph (follow)
AgeCommit message (Collapse)AuthorFilesLines
2011-03-22libceph: add lingering request and watch/notify event frameworkYehuda Sadeh1-0/+52
Lingering requests are requests that are sent to the OSD normally but tracked also after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21ceph: move readahead default to fs/ceph from libcephSage Weil1-1/+0
Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-21ceph: update common header filesYehuda Sadeh2-13/+45
This updates the common header files used by the different ceph related modules. Specifically it adds definitions required by the rbd watch/notify feature. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
2011-03-21libceph: fix osd request queuing on osdmap updatesSage Weil1-2/+3
If we send a request to osd A, and the request's pg remaps to osd B and then back to A in quick succession, we need to resend the request to A. The old code was only calling kick_requests after processing all incremental maps in a message, so it was very possible to not resend a request that needed to be resent. This would make the osd eventually time out (at least with the current default of osd timeouts enabled). The correct approach is to scan requests on every map incremental. This patch refactors the kick code in a few ways: - all requests are either on req_lru (in flight), req_unsent (ready to send), or req_notarget (currently map to no up osd) - mapping always done by map_request (previous map_osds) - if the mapping changes, we requeue. requests are resent only after all map incrementals are processed. - some osd reset code is moved out of kick_requests into a separate function - the "kick this osd" functionality is moved to kick_osd_requests, as it is unrelated to scanning for request->pg->osd mapping changes Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04libceph: fix msgr keepalive flagSage Weil1-1/+0
There was some broken keepalive code using a dead variable. Shift to using the proper bit flag. Signed-off-by: Sage Weil <sage@newdream.net>
2011-03-04libceph: fix msgr backoffSage Weil1-0/+1
With commit f363e45f we replaced a bunch of hacky workqueue mutual exclusion logic with the WQ_NON_REENTRANT flag. One pieces of fallout is that the exponential backoff breaks in certain cases: * con_work attempts to connect. * we get an immediate failure, and the socket state change handler queues immediate work. * con_work calls con_fault, we decide to back off, but can't queue delayed work. In this case, we add a BACKOFF bit to make con_work reschedule delayed work next time it runs (which should be immediately). Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12net/ceph: make ceph_msgr_wq non-reentrantTejun Heo1-5/+0
ceph messenger code does a rather complex dancing around multithread workqueue to make sure the same work item isn't executed concurrently on different CPUs. This restriction can be provided by workqueue with WQ_NON_REENTRANT. Make ceph_msgr_wq non-reentrant workqueue with the default concurrency level and remove the QUEUED/BUSY logic. * This removes backoff handling in con_work() but it couldn't reliably block execution of con_work() to begin with - queue_con() can be called after the work started but before BUSY is set. It seems that it was an optimization for a rather cold path and can be safely removed. * The number of concurrent work items is bound by the number of connections and connetions are independent from each other. With the default concurrency level, different connections will be executed independently. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Sage Weil <sage@newdream.net> Cc: ceph-devel@vger.kernel.org Signed-off-by: Sage Weil <sage@newdream.net>
2011-01-12ceph: add dir_layout to inodeSage Weil1-3/+13
Add a ceph_dir_layout to the inode, and calculate dentry hash values based on the parent directory's specified dir_hash function. This is needed because the old default Linux dcache hash function is extremely week and leads to a poor distribution of files among dir fragments. Signed-off-by: Sage Weil <sage@newdream.net>
2010-12-17ceph: mark user pages dirty on direct-io readsHenry C Chang1-2/+4
For read operation, we have to set the argument _write_ of get_user_pages to 1 since we will write data to pages. Also, we need to SetPageDirty before releasing these pages. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-09ceph: explicitly specify page alignment in network messagesSage Weil1-0/+1
The alignment used for reading data into or out of pages used to be taken from the data_off field in the message header. This only worked as long as the page alignment matched the object offset, breaking direct io to non-page aligned offsets. Instead, explicitly specify the page alignment next to the page vector in the ceph_msg struct, and use that instead of the message header (which probably shouldn't be trusted). The alloc_msg callback is responsible for filling in this field properly when it sets up the page vector. Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-09ceph: make page alignment explicit in osd interfaceSage Weil1-2/+5
We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-09ceph: fix comment, remove extraneous argsSage Weil1-2/+1
The offset/length arguments aren't used. Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl.Greg Farnum1-0/+1
Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursorGreg Farnum1-0/+21
These facilitate preallocation of pages so that we can encode into the pagelist in an atomic context. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-20ceph: factor out libceph from Ceph file systemYehuda Sadeh19-0/+2999
This factors out protocol and low-level storage parts of ceph into a separate libceph module living in net/ceph and include/linux/ceph. This is mostly a matter of moving files around. However, a few key pieces of the interface change as well: - ceph_client becomes ceph_fs_client and ceph_client, where the latter captures the mon and osd clients, and the fs_client gets the mds client and file system specific pieces. - Mount option parsing and debugfs setup is correspondingly broken into two pieces. - The mon client gets a generic handler callback for otherwise unknown messages (mds map, in this case). - The basic supported/required feature bits can be expanded (and are by ceph_fs_client). No functional change, aside from some subtle error handling cases that got cleaned up in the refactoring process. Signed-off-by: Sage Weil <sage@newdream.net>