aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block/rbd.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-10-22libceph, rbd, ceph: move ceph_osdc_alloc_messages() callsIlya Dryomov1-7/+12
The current requirement is that ceph_osdc_alloc_messages() should be called after oid and oloc are known. In preparation for preallocating message data items, move ceph_osdc_alloc_messages() further down, so that it is called when OSD op codes are known. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-10-22libceph: osd_req_op_cls_init() doesn't need to take opcodeIlya Dryomov1-2/+1
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-10-22rbd: add __init/__exit annotationsChengguang Xu1-3/+3
Add __init/__exit annotation to init/cleanup helpers which are only called once in the module. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-09-06rbd: support cloning across namespacesIlya Dryomov1-14/+97
If parent_get class method is not supported by the OSDs, fall back to the legacy class method and assume that the parent is in the default (i.e. "") namespace. The "use the child's image namespace" workaround is no longer needed because creating images within namespaces will require parent_get aware OSDs. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-09-06rbd: factor out get_parent_info()Ilya Dryomov1-48/+86
In preparation for the new parent_get and parent_overlap_get class methods, factor out the fetching and decoding of parent data. As a side effect, we now decode all four fields in the "no parent" case. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2018-08-20Merge tag 'ceph-for-4.19-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-37/+88
Pull ceph updates from Ilya Dryomov: "The main things are support for cephx v2 authentication protocol and basic support for rbd images within namespaces (myself). Also included are y2038 conversion patches from Arnd, a pile of miscellaneous fixes from Chengguang and Zheng's feature bit infrastructure for the filesystem" * tag 'ceph-for-4.19-rc1' of git://github.com/ceph/ceph-client: (40 commits) ceph: don't drop message if it contains more data than expected ceph: support cephfs' own feature bits crush: fix using plain integer as NULL warning libceph: remove unnecessary non NULL check for request_key ceph: refactor error handling code in ceph_reserve_caps() ceph: refactor ceph_unreserve_caps() ceph: change to void return type for __do_request() ceph: compare fsc->max_file_size and inode->i_size for max file size limit ceph: add additional size check in ceph_setattr() ceph: add additional offset check in ceph_write_iter() ceph: add additional range check in ceph_fallocate() ceph: add new field max_file_size in ceph_fs_client libceph: weaken sizeof check in ceph_x_verify_authorizer_reply() libceph: check authorizer reply/challenge length before reading libceph: implement CEPHX_V2 calculation mode libceph: add authorizer challenge libceph: factor out encrypt_authorizer() libceph: factor out __ceph_x_decrypt() libceph: factor out __prepare_write_connect() libceph: store ceph_auth_handshake pointer in ceph_connection ...
2018-08-02libceph: use timespec64 for r_mtimeArnd Bergmann1-1/+1
The request mtime field is used all over ceph, and is currently represented as a 'timespec' structure in Linux. This changes it to timespec64 to allow times beyond 2038, modifying all users at the same time. [ Remove now redundant ts variable in writepage_nounlock(). ] Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02rbd: support for images within namespacesIlya Dryomov1-2/+49
Cloning across namespaces isn't supported yet -- for now both the parent and the clone have to live in the same namespace, whether the default (i.e. "") or a user-created namespace. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02rbd: pass rbd_spec into parse_rbd_opts_token()Ilya Dryomov1-33/+37
In preparation for _pool_ns client option, make rbd_spec available inside parse_rbd_opts_token(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-08-02libceph: amend "bad option arg" error messageIlya Dryomov1-1/+1
Don't mention "mount" -- in the rbd case it is "mapping". Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-21atomics/treewide: Rename __atomic_add_unless() => atomic_fetch_add_unless()Mark Rutland1-1/+1
While __atomic_add_unless() was originally intended as a building-block for atomic_add_unless(), it's now used in a number of places around the kernel. It's the only common atomic operation named __atomic*(), rather than atomic_*(), and for consistency it would be better named atomic_fetch_add_unless(). This lack of consistency is slightly confusing, and gets in the way of scripting atomics. Given that, let's clean things up and promote it to an official part of the atomics API, in the form of atomic_fetch_add_unless(). This patch converts definitions and invocations over to the new name, including the instrumented version, using the following script: ---- git grep -w __atomic_add_unless | while read line; do sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}"; done git grep -w __arch_atomic_add_unless | while read line; do sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}"; done ---- Note that we do not have atomic{64,_long}_fetch_add_unless(), which will be introduced by later patches. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Palmer Dabbelt <palmer@sifive.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-15Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-4/+7
Pull ceph updates from Ilya Dryomov: "The main piece is a set of libceph changes that revamps how OSD requests are aborted, improving CephFS ENOSPC handling and making "umount -f" actually work (Zheng and myself). The rest is mostly mount option handling cleanups from Chengguang and assorted fixes from Zheng, Luis and Dongsheng. * tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits) rbd: flush rbd_dev->watch_dwork after watch is unregistered ceph: update description of some mount options ceph: show ino32 if the value is different with default ceph: strengthen rsize/wsize/readdir_max_bytes validation ceph: fix alignment of rasize ceph: fix use-after-free in ceph_statfs() ceph: prevent i_version from going back ceph: fix wrong check for the case of updating link count libceph: allocate the locator string with GFP_NOFAIL libceph: make abort_on_full a per-osdc setting libceph: don't abort reads in ceph_osdc_abort_on_full() libceph: avoid a use-after-free during map check libceph: don't warn if req->r_abort_on_full is set libceph: use for_each_request() in ceph_osdc_abort_on_full() libceph: defer __complete_request() to a workqueue libceph: move more code into __complete_request() libceph: no need to call flush_workqueue() before destruction ceph: flush pending works before shutdown super ceph: abort osd requests on force umount libceph: introduce ceph_osdc_abort_requests() ...
2018-06-04rbd: flush rbd_dev->watch_dwork after watch is unregisteredDongsheng Yang1-1/+1
There is a problem if we are going to unmap a rbd device and the watch_dwork is going to queue delayed work for watch: unmap Thread watch Thread timer do_rbd_remove cancel_tasks_sync(rbd_dev) queue_delayed_work for watch destroy_workqueue(rbd_dev->task_wq) drain_workqueue(wq) destroy other resources in wq call_timer_fn __queue_work() Then the delayed work escape the cancel_tasks_sync() and destroy_workqueue() and we will get an user-after-free call trace: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI Modules linked in: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 4.17.0-rc6+ #13 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__queue_work+0x6a/0x3b0 RSP: 0018:ffff9427df1c3e90 EFLAGS: 00010086 RAX: ffff9427deca8400 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff9427deca8400 RSI: ffff9427df1c3e50 RDI: 0000000000000000 RBP: ffff942783e39e00 R08: ffff9427deca8400 R09: ffff9427df1c3f00 R10: 0000000000000004 R11: 0000000000000005 R12: ffff9427cfb85970 R13: 0000000000002000 R14: 000000000001eca0 R15: 0000000000000007 FS: 0000000000000000(0000) GS:ffff9427df1c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000004c900a005 CR4: 00000000000206e0 Call Trace: <IRQ> ? __queue_work+0x3b0/0x3b0 call_timer_fn+0x2d/0x130 run_timer_softirq+0x16e/0x430 ? tick_sched_timer+0x37/0x70 __do_softirq+0xd2/0x280 irq_exit+0xd5/0xe0 smp_apic_timer_interrupt+0x6c/0x130 apic_timer_interrupt+0xf/0x20 [ Move rbd_dev->watch_dwork cancellation so that rbd_reregister_watch() either bails out early because the watch is UNREGISTERED at that point or just gets cancelled. ] Cc: stable@vger.kernel.org Fixes: 99d1694310df ("rbd: retry watch re-registration periodically") Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04libceph, rbd: add error handling for osd_req_op_cls_init()Chengguang Xu1-3/+6
Add proper error handling for osd_req_op_cls_init() to replace BUG_ON statement when failing from memory allocation. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-06-04Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-blockLinus Torvalds1-22/+22
Pull block updates from Jens Axboe: - clean up how we pass around gfp_t and blk_mq_req_flags_t (Christoph) - prepare us to defer scheduler attach (Christoph) - clean up drivers handling of bounce buffers (Christoph) - fix timeout handling corner cases (Christoph/Bart/Keith) - bcache fixes (Coly) - prep work for bcachefs and some block layer optimizations (Kent). - convert users of bio_sets to using embedded structs (Kent). - fixes for the BFQ io scheduler (Paolo/Davide/Filippo) - lightnvm fixes and improvements (Matias, with contributions from Hans and Javier) - adding discard throttling to blk-wbt (me) - sbitmap blk-mq-tag handling (me/Omar/Ming). - remove the sparc jsflash block driver, acked by DaveM. - Kyber scheduler improvement from Jianchao, making it more friendly wrt merging. - conversion of symbolic proc permissions to octal, from Joe Perches. Previously the block parts were a mix of both. - nbd fixes (Josef and Kevin Vigor) - unify how we handle the various kinds of timestamps that the block core and utility code uses (Omar) - three NVMe pull requests from Keith and Christoph, bringing AEN to feature completeness, file backed namespaces, cq/sq lock split, and various fixes - various little fixes and improvements all over the map * tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits) blk-mq: update nr_requests when switching to 'none' scheduler block: don't use blocking queue entered for recursive bio submits dm-crypt: fix warning in shutdown path lightnvm: pblk: take bitmap alloc. out of critical section lightnvm: pblk: kick writer on new flush points lightnvm: pblk: only try to recover lines with written smeta lightnvm: pblk: remove unnecessary bio_get/put lightnvm: pblk: add possibility to set write buffer size manually lightnvm: fix partial read error path lightnvm: proper error handling for pblk_bio_add_pages lightnvm: pblk: fix smeta write error path lightnvm: pblk: garbage collect lines with failed writes lightnvm: pblk: rework write error recovery path lightnvm: pblk: remove dead function lightnvm: pass flag on graceful teardown to targets lightnvm: pblk: check for chunk size before allocating it lightnvm: pblk: remove unnecessary argument lightnvm: pblk: remove unnecessary indirection lightnvm: pblk: return NVM_ error on failed submission lightnvm: pblk: warn in case of corrupted write buffer ...
2018-05-24block drivers/block: Use octal not symbolic permissionsJoe Perches1-22/+22
Convert the S_<FOO> symbolic permissions to their octal equivalents as using octal and not symbolic permissions is preferred by many as more readable. see: https://lkml.org/lkml/2016/8/2/1945 Done with automated conversion via: $ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...> Miscellanea: o Wrapped modified multi-line calls to a single line where appropriate o Realign modified multi-line calls to open parenthesis Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-10libceph: add osd_req_op_extent_osd_data_bvecs()Ilya Dryomov1-1/+3
... and store num_bvecs for client code's convenience. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-04-16rbd: notrim map optionIlya Dryomov1-5/+14
Add an option to turn off discard and write zeroes offload support to avoid deprovisioning a fully provisioned image. When enabled, discard requests will fail with -EOPNOTSUPP, write zeroes requests will fall back to manually zeroing. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Hitoshi Kamei <hitoshi.kamei.xm@hitachi.com>
2018-04-16rbd: adjust queue limits for "fancy" stripingIlya Dryomov1-9/+8
In order to take full advantage of merging in ceph_file_to_extents(), allow object set sized I/Os. If the layout is not "fancy", an object set consists of just one object. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16rbd: avoid Wreturn-type warningsArnd Bergmann1-3/+3
In some configurations gcc cannot see that rbd_assert(0) leads to an unreachable code path: drivers/block/rbd.c: In function 'rbd_img_is_write': drivers/block/rbd.c:1397:1: error: control reaches end of non-void function [-Werror=return-type] drivers/block/rbd.c: In function '__rbd_obj_handle_request': drivers/block/rbd.c:2499:1: error: control reaches end of non-void function [-Werror=return-type] drivers/block/rbd.c: In function 'rbd_obj_handle_write': drivers/block/rbd.c:2471:1: error: control reaches end of non-void function [-Werror=return-type] As the rbd_assert() here shows has no extra information beyond the verbose BUG(), we can simply use BUG() directly in its place. This is reliably detected as not returning on any architecture, since it doesn't depend on the unlikely() comparison that confused gcc. Fixes: 3da691bf4366 ("rbd: new request handling code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16rbd: support timeout in rbd_wait_state_locked()Dongsheng Yang1-1/+21
currently, the rbd_wait_state_locked() will wait forever if we can't get our state locked. Example: rbd map --exclusive test1 --> /dev/rbd0 rbd map test1 --> /dev/rbd1 dd if=/dev/zero of=/dev/rbd1 bs=1M count=1 --> IO blocked To avoid this problem, this patch introduce a timeout design in rbd_wait_state_locked(). Then rbd_wait_state_locked() will return error when we reach a timeout. This patch allow user to set the lock_timeout in rbd mapping. Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16rbd: refactor rbd_wait_state_locked()Ilya Dryomov1-17/+26
In preparation for lock_timeout option, make rbd_wait_state_locked() return error codes. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-10Merge tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-1472/+980
Pull ceph updates from Ilya Dryomov: "The big ticket items are: - support for rbd "fancy" striping (myself). The striping feature bit is now fully implemented, allowing mapping v2 images with non-default striping patterns. This completes support for --image-format 2. - CephFS quota support (Luis Henriques and Zheng Yan). This set is based on the new SnapRealm code in the upcoming v13.y.z ("Mimic") release. Quota handling will be rejected on older filesystems. - memory usage improvements in CephFS (Chengguang Xu). Directory specific bits have been split out of ceph_file_info and some effort went into improving cap reservation code to avoid OOM crashes. Also included a bunch of assorted fixes all over the place from Chengguang and others" * tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits) ceph: quota: report root dir quota usage in statfs ceph: quota: add counter for snaprealms with quota ceph: quota: cache inode pointer in ceph_snap_realm ceph: fix root quota realm check ceph: don't check quota for snap inode ceph: quota: update MDS when max_bytes is approaching ceph: quota: support for ceph.quota.max_bytes ceph: quota: don't allow cross-quota renames ceph: quota: support for ceph.quota.max_files ceph: quota: add initial infrastructure to support cephfs quotas rbd: remove VLA usage rbd: fix spelling mistake: "reregisteration" -> "reregistration" ceph: rename function drop_leases() to a more descriptive name ceph: fix invalid point dereference for error case in mdsc destroy ceph: return proper bool type to caller instead of pointer ceph: optimize memory usage ceph: optimize mds session register libceph, ceph: add __init attribution to init funcitons ceph: filter out used flags when printing unused open flags ceph: don't wait on writeback when there is no more dirty pages ...
2018-04-02rbd: remove VLA usageKyle Spiers1-4/+4
As part of the effort to remove VLAs from the kernel[1], this moves the literal values into the stack array calculation instead of using a variable for the sizing. The resulting size can be found from sizeof(buf). [1] https://lkml.org/lkml/2018/3/7/621 Signed-off-by: Kyle Spiers <kyle@spiers.me> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: fix spelling mistake: "reregisteration" -> "reregistration"Colin Ian King1-1/+1
Trivial fix to spelling mistake in rdb_warn message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: get the latest osdmap when using an existing clientIlya Dryomov1-36/+33
Currently we request the latest osdmap only if ceph_pg_poolid_by_name() fails with -ENOENT. This is effective with newly created pools, but we also want to avoid attempting to map from pools that were recently deleted and report "pool does not exist" instead. (Such an attempt eventually fails in the OSD client after map check code kicks in, but the error message is confusing.) Request the latest osdmap unconditionally after bumping a ref on an existing client in rbd_client_find(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: move rbd_get_client() below rbd_put_client()Ilya Dryomov1-20/+20
... to avoid a forward declaration in the next commit. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove redundant declaration of rbd_spec_put()Ilya Dryomov1-1/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: allow "fancy" stripingIlya Dryomov1-27/+2
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Acked-by: Jason Dillaman <dillaman@redhat.com>
2018-04-02rbd: introduce OWN_BVECS data typeIlya Dryomov1-7/+149
If the layout is "fancy", we need to be able to rearrange the provided bio_vecs in stripe unit chunks to make it possible for the messenger to read/write directly from/to the provided data buffer, without employing a temporary data buffer for assembling the result. Higher level bio_vec arrays are generally immutable, so this requires copying into a private array. Only the bio_vecs themselves are shuffled around, not the actual data. OWN_BVECS doesn't own any pages. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove rbd_parent_request_{create,destroy}()Ilya Dryomov1-68/+6
rbd_parent_request_create() takes a ref on obj_req for child_img_req. There is no point in doing that because child_img_req is created on behalf of obj_req -- obj_req is the initiator and can't be completed before child_img_req. Open-code the rest of rbd_parent_request_create() and remove it along with rbd_parent_request_destroy(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: get rid of img_req->{offset,length}Ilya Dryomov1-18/+8
These are set, but no longer used. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove rbd_img_request_fill() and helpersIlya Dryomov1-98/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: switch to common striping frameworkIlya Dryomov1-23/+168
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: create+truncate for whole-object layered discardsIlya Dryomov1-1/+6
A whole-object layered discard is implemented as a truncate rather than a delete: a dummy object is needed to prevent the CoW machinery from kicking in. However, a truncate on a non-existent object is a no-op. If the object doesn't exist in HEAD, a discard request is effectively ignored, which violates our "discard zeroes data" promise and breaks REQ_OP_WRITE_ZEROES implementation. A non-exclusive create on an existing object is also a no-op, so the fix is to do a compound create+truncate instead. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: move to obj_req->img_extentsIlya Dryomov1-52/+98
In preparation for rbd "fancy" striping, replace obj_req->img_offset with obj_req->img_extents. A single starting offset isn't sufficient because we want only one OSD request per object and will merge adjacent object extents in ceph_file_to_extents(). The final object extent may map into multiple different byte ranges in the image. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: incorporate ceph_object_extentIlya Dryomov1-37/+34
obj_req->object_no -> obj_req->ex.oe_objno obj_req->offset -> obj_req->ex.oe_off obj_req->length -> obj_req->ex.oe_len ... and use ex for linking object requests to image requests. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: store data_type in img_req instead of obj_reqIlya Dryomov1-26/+8
All object requests are associated with an image request now -- avoid duplicating the same info in each object request. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove obj_req->flags fieldIlya Dryomov1-35/+0
There are no standalone (!IMG_DATA) object requests anymore. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove old request completion codeIlya Dryomov1-172/+3
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: new request completion codeIlya Dryomov1-13/+55
Do away with partial request completions and all the associated complexity. Individual object requests no longer need to be completed in order -- when the last one becomes ready, we complete the entire higher level request all at once. This also wraps up the conversion to a state machine model and eliminates the recursion described in commit 6d69bb536bac ("rbd: prevent kernel stack blow up on rbd map"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: update rbd_img_request_submit() signatureIlya Dryomov1-10/+3
It should be void now. Also, object requests are unlinked only in image request destructor, which can't run before rbd_img_request_put(), so no need for _safe. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: add img_req->op_type fieldIlya Dryomov1-63/+12
Store op_type in its own field instead of packing it into flags. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: simplify rbd_osd_req_create()Ilya Dryomov1-45/+14
No need to pass rbd_dev and op_type to rbd_osd_req_create(): there are no standalone (!IMG_DATA) object requests anymore and osd_req->r_flags can be set in rbd_osd_req_format_{read,write}(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: remove old request handling codeIlya Dryomov1-730/+4
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: new request handling codeIlya Dryomov1-77/+601
The notable changes are: - instead of explicitly stat'ing the object to see if it exists before issuing the write, send the write optimistically along with the stat in a single OSD request - zero copyup optimization - all object requests are associated with an image request and have a valid ->img_request pointer; there are no standalone (!IMG_DATA) object requests anymore - code is structured as a state machine (vs a bunch of callbacks with implicit state) Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-02rbd: move from raw pages to bvec data descriptorsIlya Dryomov1-78/+77
In preparation for rbd "fancy" striping which requires bio_vec arrays, wire up BVECS data type and kill off PAGES data type. There is nothing wrong with using page vectors for copyup requests -- it's just less iterator boilerplate code to write for the new striping framework. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: get rid of img_req->copyup_pagesIlya Dryomov1-34/+9
The initiating object request is the proper owner -- save a bit of space. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: don't (ab)use obj_req->pages for stat requestsIlya Dryomov1-10/+5
obj_req->pages is for provided data buffers. stat requests are internal and should be NODATA. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2018-04-02rbd: remove bio cloning helpersIlya Dryomov1-141/+0
Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>