aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block/drbd/drbd_req.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2011-03-10drbd: drop code present under #ifdef which is relevant to 2.6.28 and belowOr Gerlitz1-5/+1
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Implemented real timeout checking for request processing timePhilipp Reisner1-0/+39
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Removed left over, now wrong commentsPhilipp Reisner1-7/+1
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: fix potential imbalance of ap_in_flightLars Ellenberg1-4/+5
When we receive a barrier ack, we walk the ring list of drbd requests in the transfer log of the respective epoch, do some housekeeping, and free those objects. We tried to keep epochs of mirrored and unmirrored drbd requests separate, and assert that no local-only requests are present in a barrier_acked epoch. It turns out that this has quite a number of corner cases and would add bloated code without functional benefit. We now revert the (insufficient) commits drbd: Fixed an issue with AHEAD -> SYNC_SOURCE transitions drbd: Ensure that an epoch contains only requests of one kind and instead fix the processing of barrier acks to cope with a mix of local-only and mirrored requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Ensure that an epoch contains only requests of one kindPhilipp Reisner1-25/+4
The assert in drbd_req.c:755 forces us to have only requests of one kind in an epoch. The two kinds we distinguish here are: local-only or mirrored. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Nothing should stop SyncSource -> Ahead transitionsPhilipp Reisner1-1/+1
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Be more careful with SyncSource -> Ahead transitionsPhilipp Reisner1-1/+7
We may not get from SyncSource to Ahead if we have sent some P_RS_DATA_REPLY packets to the peer and are waiting for P_WRITE_ACK. Again, this is not relevant for proper tuned systems, but makes sure that the not-tuned system does not get diverging bitmaps. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Documenting drbd_should_do_remote() and drbd_should_send_oos()Philipp Reisner1-4/+8
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Use the standard bool, true, and false keywordsAndreas Gruenbacher1-2/+2
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Get rid of unnecessary macros (2)Andreas Gruenbacher1-3/+3
The FAULT_ACTIVE macro just wraps the drbd_insert_fault macro for no apparent reason. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Rename drbd_make_request_26 to drbd_make_requestAndreas Gruenbacher1-3/+3
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: fix regression, we need to close drbd epochs during normal operationLars Ellenberg1-3/+8
commit e2041475e6ddb081734d161f6421977323f5a9b9 drbd: Starting with protocol 96 we can allow app-IO while receiving the bitmap Contained a bad chunk that tried to optimize away drbd barriers during bitmap exchange, but accidentally dropped them for normal mode as well. Impact: depending on activity log size and access pattern, activity log extents may not be recycled in time, causeing IO to block indefinetely. Fix: skip drbd barriers only if there is no connection to send them on, or the request being completed has not been on the network at all. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Starting with protocol 96 we can allow app-IO while receiving the bitmapPhilipp Reisner1-13/+24
* C_STARTING_SYNC_S, C_STARTING_SYNC_T In these states the bitmap gets written to disk. Locking out of app-IO is done by using the drbd_queue_bitmap_io() and drbd_bitmap_io() functions these days. It is no longer necessary to lock out app-IO based on the connection state. App-IO that may come in after the BITMAP_IO flag got cleared before the state transition to C_SYNC_(SOURCE|TARGET) does not get mirrored, sets a bit in the local bitmap, that is already set, therefore changes nothing. * C_WF_BITMAP_S In this state we send updates (P_OUT_OF_SYNC packets). With that we make sure they have the same number of bits when going into the C_SYNC_(SOURCE|TARGET) connection state. * C_UNCONNECTED: The receiver starts, no need to lock out IO. * C_DISCONNECTING: in drbd_disconnect() we had a wait_event() to wait until ap_bio_cnt reaches 0. Removed that. * C_TIMEOUT, C_BROKEN_PIPE, C_NETWORK_FAILURE C_PROTOCOL_ERROR, C_TEAR_DOWN: Same as C_DISCONNECTING * C_WF_REPORT_PARAMS: IO still possible since that is still like C_WF_CONNECTION. And we do not need to send barriers in C_WF_BITMAP_S connection state. Allow concurrent accesses to the bitmap when receiving the bitmap. Everything gets ORed anyways. A drbd_free_tl_hash() is in after_state_chg_work(). At that point all the work items of the last connections must have been processed. Introduced a call to drbd_free_tl_hash() into drbd_free_mdev() for paranoia reasons. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Begin to account BIO processing time before inc_ap_bio()Philipp Reisner1-4/+8
Since inc_ap_bio() might sleep already Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: New packet for Ahead/Behind mode: P_OUT_OF_SYNCPhilipp Reisner1-12/+32
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Implemented two new connection states Ahead/BehindPhilipp Reisner1-0/+23
In this connection mode, the ahead node no longer replicates application IO. The behind's disk becomes out dated. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Track the numbers of sectors in flightPhilipp Reisner1-1/+12
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: properly use max_hw_sectors to limit the our bio sizeLars Ellenberg1-4/+4
To ease tracking of bios in some hash tables, we want it to not cross certain boundaries (128k, used to be 32k). We limit the maximum bio size using queue parameters. Historically some defines and variables we use there have been named max_segment_size, which was misguided. Rename them to max_bio_size, and use [blk_]queue_max_hw_sectors where appropriate. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10block: remove per-queue pluggingJens Axboe1-4/+0
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-10-23drbd: Removed checks for REQ_HARDBARRIER on incomming BIOsPhilipp Reisner1-14/+0
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22drbd: Silenced an assertPhilipp Reisner1-1/+1
That assertion's condition needed adjustment for today's semantics Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22drbd: rate limit an error messageLars Ellenberg1-1/+2
If we don't rate limit it, and you happen to log err level messages via serial console, an IO error on a disconnected Primary may cause serious unresponsiveness. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22drbd: fix potential data divergence after multiple failuresLars Ellenberg1-5/+14
If we get an IO-error during an activity log transaction, if we failed to write the bitmap of the evicted extent, we must not write the transaction itself. If we failed to write the transaction, we must not even submit the corresponding bio, as its extent is not yet marked in the activity log. Otherwise, if this was a disconneted Primary (degraded cluster), which now lost its disk as well, and we later re-attach the same backend storage, we possibly "forget" to resync some parts of the disk that potentially have been changed. On the receiving side, when receiving from a peer with unhealthy disk, checking for pdsk == D_DISKLESS is not enough, we need to set out of sync and do AL transactions for everything pdsk < D_INCONSISTENT on the receiving side. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Track the reasons to suspend IO in dedicated state bitsPhilipp Reisner1-3/+3
There are three ways to get IO suspended: * Loss of any access to data * Fence-peer-handler running * User requested to suspend IO Track those in different bits, so that one condition clearing its state bit does not interfere with the other two conditions. Only when the user resumes IO he overrules all three bits. The fact is hidden from the user, he sees only a single suspend bit. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Disable activity log updates when the whole device is out of syncPhilipp Reisner1-2/+5
When the complete device is marked as out of sync, we can disable updates of the on disk AL. Currently AL updates are only disabled if one uses the "invalidate-remote" command on an unconnected, primary device, or when at attach time all bits in the bitmap are set. As of now, AL updated do not get disabled when a all bits becomes set due to application writes to an unconnected DRBD device. While this is a missing feature, it is not considered important, and might get added later. BTW, after initializing a "one legged" DRBD device drbdadm create-md resX drbdadm -- --force primary resX AL updates also get disabled, until the first connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: fix list corruption (recent regression)Lars Ellenberg1-25/+17
The commit 288f422ec13667de40b278535d2a5fb5c77352c4 drbd: Track all IO requests on the TL, not writes only moved a list_add_tail(req, ) into a region where req may have just been freed due to conflict detection. Fix this by adding a proper cleanup section for that code path. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Allow tl_restart() to do IO completion while IO is suspendedPhilipp Reisner1-14/+20
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Ensure that the peer was not rebootet in the meantime before resending TLPhilipp Reisner1-1/+1
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Do not allow a fencing-policy of resource-and-stonith with protocol APhilipp Reisner1-1/+1
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Finished the "on-no-data-accessible suspend-io;" functionalityPhilipp Reisner1-0/+24
When no data is accessible (no connection to the peer, nor a local disk) allow the user to select to freeze all IO operations instead of getting IO errors. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Removed redundant error checks in the request code pathPhilipp Reisner1-15/+0
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: factored tl_restart() out of tl_clear().Philipp Reisner1-0/+14
If IO was frozen for a temporal network outage, resend the content of the transfer-log into the newly established connection. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: mod_req has now a return valuePhilipp Reisner1-1/+4
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Track all IO requests on the TL, not writes onlyPhilipp Reisner1-9/+15
With that the drbd_fail_pending_reads() function becomes obsolete. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: renamed drbd_tl_epoch.n_req to drbd_tl_epoch.n_writesPhilipp Reisner1-2/+2
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-08-07block: unify flags for struct bio and struct requestChristoph Hellwig1-1/+1
Remove the current bio flags and reuse the request flags for the bio, too. This allows to more easily trace the type of I/O from the filesystem down to the block driver. There were two flags in the bio that were missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've renamed two request flags that had a superflous RW in them. Note that the flags are in bio.h despite having the REQ_ name - as blkdev.h includes bio.h that is the only way to go for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01drbd: Reduce verbosityPhilipp Reisner1-5/+0
The "Local READ/WRITE failed" messages are too verbose. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01drbd: fix hang on local read errors while disconnectedLars Ellenberg1-9/+13
"canceled" w_read_retry_remote never completed, if they have been canceled after drbd_disconnect connection teardown cleanup has already run (or we are currently not connected anyways). Fixed by not queueing a remote retry if we already know it won't work (pdsk not uptodate), and cleanup ourselves on "cancel", in case we hit a race with drbd_disconnect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-06-01drbd: Removed the now empty w_io_error() functionPhilipp Reisner1-26/+1
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-05-18drbd: If we detect late that IO got frozen, retry after we thawed.Philipp Reisner1-7/+26
If we detect late (= after grabing mdev->req_lock) that IO got frozen, we return 1 to generic_make_request(), which simply will retry to make a request for that bio. In the subsequent call of generic_make_request() into drbd_make_request_26() we sleep in inc_ap_bio(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18drbd: always use_bmbv, ignore settingLars Ellenberg1-1/+1
Now that the peer may handle multi-bio EEs, we can ignore the peer's limit, and concentrate on the limits of the local IO stack. This is safe accross drbd protocol versions, as our queue_max_sectors() will be adjusted accordingly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18drbd: fail_requests_early: remove incorrect and unnecessary optimizationLars Ellenberg1-5/+0
The condition does not fit the commend (I may well be Primary, even if I lost the disk earlier and now the connection). And this is catched below anyways, where it also gets logged. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-12-03drbd_req.c: use part_[inc|dec]_in_flight()Philipp Reisner1-2/+2
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-11-04drbd: performance - don't lose unplug eventsLars Ellenberg1-1/+6
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-10-28drbd: fix in_flight rw indexingJens Axboe1-2/+2
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05drbd: fixup for reverted dual in_flight patchJens Axboe1-2/+2
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01drbd: remove tracing bitsJens Axboe1-11/+0
They should be reimplemented in the current scheme. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01dropping unneeded include autoconf.hLars Ellenberg1-1/+0
It is force-included on the gcc command line since at least 2.6.15. Explicit include lines seem to break compilation now in certain configurations. Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Acked-by: Sam Ravnborg <sam@ravnborg.org>
2009-10-01The DRBD driverPhilipp Reisner1-0/+1132
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>