aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block/drbd/drbd_actlog.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-08-23block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig1-1/+1
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-09block: switch bios to blk_status_tChristoph Hellwig1-1/+1
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01block: replace REQ_NOIDLE with REQ_IDLEChristoph Hellwig1-1/+1
Noidle should be the default for writes as seen by all the compounds definitions in fs.h using it. In fact only direct I/O really should be using NODILE, so turn the whole flag around to get the defaults right, which will make our life much easier especially onces the WRITE_* defines go away. This assumes all the existing "raw" users of REQ_SYNC for writes want noidle behavior, which seems to be spot on from a quick audit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-02dynamic_debug: only add header when usedLuis de Bethencourt1-1/+0
kernel.h header doesn't directly use dynamic debug, instead we can include it in module.c (which used it via kernel.h). printk.h only uses it if CONFIG_DYNAMIC_DEBUG is on, changing the inclusion to only happen in that case. Link: http://lkml.kernel.org/r/1468429793-16917-1-git-send-email-luisbg@osg.samsung.com [luisbg@osg.samsung.com: include dynamic_debug.h in drb_int.h] Link: http://lkml.kernel.org/r/1468447828-18558-2-git-send-email-luisbg@osg.samsung.com Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-06-13drbd: al_write_transaction: skip re-scanning of bitmap page pointer arrayLars Ellenberg1-0/+2
For larger devices, the array of bitmap page pointers can grow very large (8000 pointers per TB of storage). For each activity log transaction, we need to flush the associated bitmap pages to stable storage. Currently, we just "mark" the respective pages while setting up the transaction, then tell the bitmap code to write out all marked pages, but skip unchanged pages. But one such transaction can affect only a small number of bitmap pages, there is no need to scan the full array of several (ten-)thousand page pointers to find the few marked ones. Instead, remember the index numbers of the few affected pages, and later only re-check those to skip duplicates and unchanged ones. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: introduce WRITE_SAME supportLars Ellenberg1-1/+8
We will support WRITE_SAME, if * all peers support WRITE_SAME (both in kernel and DRBD version), * all peer devices support WRITE_SAME * logical_block_size is identical on all peers. We may at some point introduce a fallback on the receiving side for devices/kernels that do not support WRITE_SAME, by open-coding a submit loop. But not yet. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: finish resync on sync source only by notification from sync targetLars Ellenberg1-4/+12
If the replication link breaks exactly during "resync finished" detection, finishing too early on the sync source could again lead to UUIDs rotated too fast, and potentially a spurious full resync on next handshake. Always wait for explicit resync finished state change notification from the sync target. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13drbd: allow larger max_discard_sectorsLars Ellenberg1-1/+1
Make sure we have at least 67 (> AL_UPDATES_PER_TRANSACTION) al-extents available, and allow up to half of that to be discarded in one bio. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSHMike Christie1-2/+2
To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting the block layer perform a flush sequence along with possibly a WRITE, this patch renames REQ_FLUSH to REQ_PREFLUSH. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07drbd: use bio op accessorsMike Christie1-13/+15
Separate the op from the rq_flag_bits and have drbd set/get the bio using bio_set_op_attrs/bio_op. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07block/fs/drivers: remove rw argument from submit_bioMike Christie1-1/+1
This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: avoid potential deadlock during handshakeLars Ellenberg1-8/+11
During handshake communication, we also reconsider our device size, using drbd_determine_dev_size(). Just in case we need to change the offsets or layout of our on-disk metadata, we lock out application and other meta data IO, and wait for the activity log to be "idle" (no more referenced extents). If this handshake happens just after a connection loss, with a fencing policy of "resource-and-stonith", we have frozen IO. If, additionally, the activity log was "starving" (too many incoming random writes at that point in time), it won't become idle, ever, because of the frozen IO, and this would be a lockup of the receiver thread, and consquentially of DRBD. Previous logic (re-)initialized with a special "empty" transaction block, which required the activity log to fully drain first. Instead, write out some standard activity log transactions. Using lc_try_lock_for_transaction() instead of lc_try_lock() does not care about pending activity log references, avoiding the potential deadlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25drbd: separate out __al_write_transaction helper functionLars Ellenberg1-148/+156
To be able to "force out" an activity log transaction, even if there are no pending updates. This will be used to relocate the on-disk activity log, if the on-disk offsets have to be changed, without the need to empty the activity log first. While at it, move the definition, so we can drop the forward declaration of a static helper. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-29block: add a bi_error field to struct bioChristoph Hellwig1-2/+2
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: Minor cleanupsAndreas Gruenbacher1-2/+1
. Update comments . drbd_set_{in,out_of}_sync(): Remove unused parameters . Move common code into adm_del_resource() . Redefine ERR_MINOR_EXISTS -> ERR_MINOR_OR_VOLUME_EXISTS Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Use consistent names for all the bi_end_io callbacksAndreas Gruenbacher1-2/+2
Now they follow the _endio naming sheme. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-07-10drbd: drop spurious parameters from _drbd_md_sync_page_ioLars Ellenberg1-7/+5
size is always 4096, page is always device->md_io.page. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: resync should only lock out specific rangesLars Ellenberg1-2/+13
During resync, if we need to block some specific incoming write because of active resync requests to that same range, we potentially caused *all* new application writes (to "cold" activity log extents) to block until this one request has been processed. Improve the do_submit() logic to * grab all incoming requests to some "incoming" list * process this list - move aside requests that are blocked by resync - prepare activity log transactions, - commit transactions and submit corresponding requests - if there are remaining requests that only wait for activity log extents to become free, stop the fast path (mark activity log as "starving") - iterate until no more requests are waiting for the activity log, but all potentially remaining requests are only blocked by resync * only then grab new incoming requests That way, very busy IO on currently "hot" activity log extents cannot starve scattered IO to "cold" extents. And blocked-by-resync requests are processed once resync traffic on the affected region has ceased, without blocking anything else. The only blocking mode left is when we cannot start requests to "cold" extents because all currently "hot" extents are actually used. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: improve throttling decisions of background resynchronisationLars Ellenberg1-3/+26
Background resynchronisation does some "side-stepping", or throttles itself, if it detects application IO activity, and the current resync rate estimate is above the configured "cmin-rate". What was not detected: if there is no application IO, because it blocks on activity log transactions. Introduce a new atomic_t ap_actlog_cnt, tracking such blocked requests, and count non-zero as application IO activity. This counter is exposed at proc_details level 2 and above. Also make sure to release the currently locked resync extent if we side-step due to such voluntary throttling. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: track meta data IO intent, start and submit timeLars Ellenberg1-9/+17
For diagnostic purposes, track intent, start time and latest submit time of meta data IO. Move separate members from struct drbd_device into the embeded struct drbd_md_io. s/md_io_(page|in_use)/md_io.\1/ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: get rid of drbd_queue_work_frontLars Ellenberg1-63/+6
The last user was al_write_transaction, if called with "delegate", and the last user to call it with "delegate = true" was the receiver thread, which has no need to delegate, but can call it himself. Finally drop the delegate parameter, drop the extra w_al_write_transaction callback, and drop drbd_queue_work_front. Do not (yet) change dequeue_work_item to dequeue_work_batch, though. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: make sure disk cleanup happens in worker contextLars Ellenberg1-8/+1
The recent fix to put_ldev() (correct ordering of access to local_cnt and state.disk; memory barrier in __drbd_set_state) guarantees that the cleanup happens exactly once. However it does not yet guarantee that the cleanup happens from worker context, the last put_ldev() may still happen from atomic context, which must not happen: blkdev_put() may sleep. Fix this by scheduling the cleanup to the worker instead, using a couple more bits in device->flags and a new helper, drbd_device_post_work(). Generalized the "resync progress" work to cover these new work bits. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: explicitly submit meta data requests with REQ_NOIDLELars Ellenberg1-1/+1
For some reason we have assumed NOIDLE was implied by one of the other flags we set. It is not (anymore?). Explicitly set REQ_NOIDLE for synchronous meta data updates, or we can seriously starve random writes when using CFQ. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: fix resync finished detectionLars Ellenberg1-174/+141
This fixes one recent regresion, and one long existing bug. The bug: drbd_try_clear_on_disk_bm() assumed that all "count" bits have to be accounted in the resync extent corresponding to the start sector. Since we allow application requests to cross our "extent" boundaries, this assumption is no longer true, resulting in possible misaccounting, scary messages ("BAD! sector=12345s enr=6 rs_left=-7 rs_failed=0 count=58 cstate=..."), and potentially, if the last bit to be cleared during resync would reside in previously misaccounted resync extent, the resync would never be recognized as finished, but would be "stalled" forever, even though all blocks are in sync again and all bits have been cleared... The regression was introduced by drbd: get rid of atomic update on disk bitmap works For an "empty" resync (rs_total == 0), we must not "finish" the resync on the SyncSource before the SyncTarget knows all relevant information (sync uuid). We need to wait for the full round-trip, the SyncTarget will then explicitly notify us. Also for normal, non-empty resyncs (rs_total > 0), the resync-finished condition needs to be tested before the schedule() in wait_for_work, or it is likely to be missed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: get rid of atomic update on disk bitmap worksLars Ellenberg1-62/+1
Just trigger the occasional lazy bitmap write-out during resync from the central wait_for_work() helper. Previously, during resync, bitmap pages would be written out separately, synchronously, one at a time, at least 8 times each (every 512 bytes worth of bitmap cleared). Now we trigger "merge friendly" bulk write out of all cleared pages every two seconds during resync, and once the resync is finished. Most pages will be written out only once. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-04-30drbd: add back some fairness to AL transactionsLars Ellenberg1-2/+1
When batching more updates to the activity log into single transactions, we lost the ability for new requests to force themselves into the active set: all preparation steps became non-blocking, and if all currently hot extents keep busy, they could starve out new incoming requests to cold extents for quite a while. This can only happen if your IO backend accepts more IO operations per average DRBD replication round trip time than you have al-extents configured. If we have incoming requests to cold extents, at least do one blocking update per transaction. In an artificial worst-case workload on SSD with an asynchronous 600 ms replication link, with al-extents = 7 (the minimum we allow), and concurrent full resynch, without this patch, some write requests have been observed to be starved for 40 seconds. With this patch, application observed a worst case latency of twice the replication round trip time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: prepare receiving side for REQ_DISCARDLars Ellenberg1-3/+3
If the receiver needs to serve a discard request on a queue that does not announce to be discard cabable, it falls back to do synchronous blkdev_issue_zeroout(). We expect only "reasonably" large (up to one activity log extent?) discard requests. We do this to not to not block the receiver for too long in this fallback code path, and to not set/clear too many bits inside one spinlock_irq_save() in drbd_set_in_sync/drbd_set_out_of_sync, Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: don't let application IO pre-empt resync too oftenLars Ellenberg1-7/+6
Before, application IO could pre-empt resync activity for up to hardcoded 20 seconds per resync request. A very busy server could throttle the effective resync bandwidth down to one request per 20 seconds. Now, we only let application IO pre-empt resync traffic while the current resync rate estimate is above c-min-rate. If you disable the c-min-rate throttle feature (set c-min-rate = 0), application IO will no longer pre-empt resync traffic at all. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30drbd: Remove drbd_wrappers.hPhilipp Reisner1-1/+0
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-17drbd: Create a dedicated struct drbd_device_workAndreas Gruenbacher1-6/+10
drbd_device_work is a work item that has a reference to a device, while drbd_work is a more generic work item that does not carry a reference to a device. All callbacks get a pointer to a drbd_work instance, those callbacks that expect a drbd_device_work use the container_of macro to get it. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drbd: Add explicit device parameter to D_ASSERTAndreas Gruenbacher1-19/+19
The implicit dependency on a variable inside the macro is problematic. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drbd: Remove the terrible DEV hackAndreas Gruenbacher1-27/+27
DRBD was using dev_err() and similar all over the code; instead of having to write dev_err(disk_to_dev(device->vdisk), ...) to convert a drbd_device into a kernel device, a DEV macro was used which implicitly references the device variable. This is terrible; introduce separate drbd_err() and similar macros with an explicit device parameter instead. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drbd: Introduce "peer_device" object between "device" and "connection"Andreas Gruenbacher1-4/+4
In a setup where a device (aka volume) can replicate to multiple peers and one connection can be shared between multiple devices, we need separate objects to represent devices on peer nodes and network connections. As a first step to introduce multiple connections per device, give each drbd_device object a single drbd_peer_device object which connects it to a drbd_connection object. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drbd: Rename drbd_tconn -> drbd_connectionAndreas Gruenbacher1-4/+4
sed -i -e 's:all_tconn:connections:g' -e 's:tconn:connection:g' Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drbd: Rename "mdev" to "device"Andreas Gruenbacher1-272/+272
sed -i -e 's:mdev:device:g' Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drbd: Rename struct drbd_conf -> struct drbd_deviceAndreas Gruenbacher1-34/+34
sed -i -e 's:\<drbd_conf\>:drbd_device:g' Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17drivers: block: Mark function as static in drbd_actlog.cRashika Kheria1-0/+1
Mark the function drbd_al_begin_io_prepare() as static in drbd/drbd_actlog.c because it is not used outside this file. This eliminates the following warnings in drbd/drbd_actlog.c: drivers/block/drbd/drbd_actlog.c:277:6: warning: no previous prototype for ‘drbd_al_begin_io_prepare’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2013-11-23block: Abstract out bvec iteratorKent Overstreet1-1/+1
Immutable biovecs are going to require an explicit iterator. To implement immutable bvecs, a later patch is going to add a bi_bvec_done member to this struct; for now, this patch effectively just renames things. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Ed L. Cashin" <ecashin@coraid.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@inktank.com> Cc: ceph-devel@vger.kernel.org Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux390@de.ibm.com Cc: Boaz Harrosh <bharrosh@panasas.com> Cc: Benny Halevy <bhalevy@tonian.com> Cc: "James E.J. Bottomley" <JBottomley@parallels.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: xfs@oss.sgi.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Guo Chao <yan@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: Selvan Mani <smani@micron.com> Cc: Sam Bradshaw <sbradshaw@micron.com> Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchand@redhat.com> Cc: Joe Perches <joe@perches.com> Cc: Peng Tao <tao.peng@emc.com> Cc: Andy Adamson <andros@netapp.com> Cc: fanchaoting <fanchaoting@cn.fujitsu.com> Cc: Jie Liu <jeff.liu@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Namjae Jeon <namjae.jeon@samsung.com> Cc: Pankaj Kumar <pankaj.km@samsung.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Mel Gorman <mgorman@suse.de>6
2013-06-28drbd: Allow online change of al-stripes and al-stripe-sizePhilipp Reisner1-0/+21
Allow to change the AL layout with an resize operation. For that the reisze command gets two new fields: al_stripes and al_stripe_size. In order to make the operation crash save: 1) Lock out all IO and MD-IO 2) Write the super block with MDF_PRIMARY_IND clear 3) write the bitmap to the new location (all zeros, since we allow only while connected) 4) Initialize the new AL-area 5) Write the super block with the restored MDF_PRIMARY_IND. 6) Unfreeze all IO Since the AL-layout has no influence on the protocol, this operation needs to be beforemed on both sides of a resource (if intended). Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28drbd: fix if(); found by kbuild test robotLars Ellenberg1-1/+1
Recently introduced al_begin_io_nonblock() was returning -EBUSY, even when it should return -EWOULDBLOCK. Impact: A few spurious wake_up() calls in prepare_al_transaction_nonblock(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22drbd: consolidate as many updates as possible into one AL transactionLars Ellenberg1-0/+49
Depending on current IO depth, try to consolidate as many updates as possible into one activity log transaction. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22drbd: split out some helper functions to drbd_al_begin_ioLars Ellenberg1-21/+28
To make the code easier to follow, use an explicit find_active_resync_extent(), and add a "nonblock" parameter to _al_get(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22drbd: split drbd_al_begin_io into fastpath, prepare, and commitLars Ellenberg1-33/+71
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22drbd: drbd_al_being_io: short circuit to reduce latencyLars Ellenberg1-2/+12
A request hitting an already "hot" extent should proceed right away, even if some other requests need to wait for pending transactions. Without that short-circuit, several simultaneous make_request contexts race for committing the transaction, possibly penalizing the innocent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22drbd: Clarify when activity log I/O is delegated to the worker threadLars Ellenberg1-22/+27
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22drbd: read meta data early, base on-disk offsets on super blockLars Ellenberg1-3/+8
We used to calculate all on-disk meta data offsets, and then compare the stored offsets, basically treating them as magic numbers. Now with the activity log striping, the activity log size is no longer fixed. We need to first read the super block, then base the activity log and bitmap offsets on the stored offsets/al stripe settings. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22drbd: prepare for new striped layout of activity logLars Ellenberg1-3/+3
Introduce two new on-disk meta data fields: al_stripes and al_stripe_size_4k The intended use case is activity log on RAID 0 or similar. Logically consecutive transactions will advance their on-disk position by al_stripe_size_4k 4kB (transaction sized) blocks. Right now, these are still asserted to be the backward compatible values al_stripes = 1, al_stripe_size_4k = 8 (which amounts to 32kB). Also introduce a caching member for meta_dev_idx in the in-core structure: even though it is initially passed in in the rcu-protected disk_conf structure, it cannot change without a detach/attach cycle. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22drbd: cleanup ondisk meta data layout calculations and definesLars Ellenberg1-6/+22
Add a comment about our meta data layout variants, and rename a few defines (e.g. MD_RESERVED_SECT -> MD_128MB_SECT) to make it clear that they are short hand for fixed constants, and not arbitrarily to be redefined as one may see fit. Properly pad struct meta_data_on_disk to 4kB, and initialize to zero not only the first 512 Byte, but all of it in drbd_md_sync(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-11-09drbd: don't try to clear bits once the disk has failedPhilipp Reisner1-5/+10
If the disk has failed already, there is no point trying to change the bitmap. drbd_set_out_of_sync() already had this safeguard, time to add it to drbd_set_in_sync() as well. This also prevents some warning messages, like FIXME asender in bm_change_bits_to, bitmap locked for 'detach' by worker if our disk fails during resync, while there are some resync acks queued up. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09drbd: wait for meta data IO completion even with failed disk, unless force-detachedLars Ellenberg1-10/+7
The intention of force-detach is to be able to deal with a completely unresponsive lower level IO stack, which does not even deliver error completions anymore, but no completion at all. In all other cases, we must still wait for the meta data IO completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>