aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block/drbd (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-08-13block: kill merge_bvec_fn() completelyKent Overstreet3-37/+0
As generic_make_request() is now able to handle arbitrarily sized bios, it's no longer necessary for each individual block driver to define its own ->merge_bvec_fn() callback. Remove every invocation completely. Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Cc: Alex Elder <elder@kernel.org> Cc: ceph-devel@vger.kernel.org Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de> Cc: linux-raid@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: also remove ->merge_bvec_fn() in dm-thin as well as dm-era-target, and resolve merge conflicts] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-13block: make generic_make_request handle arbitrarily sized biosKent Overstreet1-0/+2
The way the block layer is currently written, it goes to great lengths to avoid having to split bios; upper layer code (such as bio_add_page()) checks what the underlying device can handle and tries to always create bios that don't need to be split. But this approach becomes unwieldy and eventually breaks down with stacked devices and devices with dynamic limits, and it adds a lot of complexity. If the block layer could split bios as needed, we could eliminate a lot of complexity elsewhere - particularly in stacked drivers. Code that creates bios can then create whatever size bios are convenient, and more importantly stacked drivers don't have to deal with both their own bio size limitations and the limitations of the (potentially multiple) devices underneath them. In the future this will let us delete merge_bvec_fn and a bunch of other code. We do this by adding calls to blk_queue_split() to the various make_request functions that need it - a few can already handle arbitrary size bios. Note that we add the call _after_ any call to blk_queue_bounce(); this means that blk_queue_split() and blk_recalc_rq_segments() don't need to be concerned with bouncing affecting segment merging. Some make_request_fn() callbacks were simple enough to audit and verify they don't need blk_queue_split() calls. The skipped ones are: * nfhd_make_request (arch/m68k/emu/nfblock.c) * axon_ram_make_request (arch/powerpc/sysdev/axonram.c) * simdisk_make_request (arch/xtensa/platforms/iss/simdisk.c) * brd_make_request (ramdisk - drivers/block/brd.c) * mtip_submit_request (drivers/block/mtip32xx/mtip32xx.c) * loop_make_request * null_queue_bio * bcache's make_request fns Some others are almost certainly safe to remove now, but will be left for future patches. Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: dm-devel@redhat.com Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: drbd-user@lists.linbit.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Geoff Levand <geoff@infradead.org> Cc: Jim Paris <jim@jtan.com> Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: Andreas Dilger <andreas.dilger@intel.com> Acked-by: NeilBrown <neilb@suse.de> (for the 'md/md.c' bits) Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> [dpark: skip more mq-based drivers, resolve merge conflicts, etc.] Signed-off-by: Dongsu Park <dpark@posteo.net> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-29block: add a bi_error field to struct bioChristoph Hellwig5-58/+30
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-17block: have drivers use blk_queue_max_discard_sectors()Jens Axboe1-2/+2
Some drivers use it now, others just set the limits field manually. But in preparation for splitting this into a hard and soft limit, ensure that they all call the proper function for setting the hw limit for discards. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-04Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-9/+1
Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...
2015-06-25Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-blockLinus Torvalds2-5/+6
Pull cgroup writeback support from Jens Axboe: "This is the big pull request for adding cgroup writeback support. This code has been in development for a long time, and it has been simmering in for-next for a good chunk of this cycle too. This is one of those problems that has been talked about for at least half a decade, finally there's a solution and code to go with it. Also see last weeks writeup on LWN: http://lwn.net/Articles/648292/" * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits) writeback, blkio: add documentation for cgroup writeback support vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB writeback: do foreign inode detection iff cgroup writeback is enabled v9fs: fix error handling in v9fs_session_init() bdi: fix wrong error return value in cgwb_create() buffer: remove unusued 'ret' variable writeback: disassociate inodes from dying bdi_writebacks writeback: implement foreign cgroup inode bdi_writeback switching writeback: add lockdep annotation to inode_to_wb() writeback: use unlocked_inode_to_wb transaction in inode_congested() writeback: implement unlocked_inode_to_wb transaction and use it for stat updates writeback: implement [locked_]inode_to_wb_and_lock_list() writeback: implement foreign cgroup inode detection writeback: make writeback_control track the inode being written back writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb() mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use writeback: implement memcg writeback domain based throttling writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes writeback: implement memcg wb_domain writeback: update wb_over_bg_thresh() to use wb_domain aware operations ...
2015-06-23make simple_positive() publicAl Viro1-9/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-02writeback: separate out include/linux/backing-dev-defs.hTejun Heo1-0/+1
With the planned cgroup writeback support, backing-dev related declarations will be more widely used across block and cgroup; unfortunately, including backing-dev.h from include/linux/blkdev.h makes cyclic include dependency quite likely. This patch separates out backing-dev-defs.h which only has the essential definitions and updates blkdev.h to include it. c files which need access to more backing-dev details now include backing-dev.h directly. This takes backing-dev.h off the common include dependency chain making it a lot easier to use it across block and cgroup. v2: fs/fat build failure fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-06-02writeback: move backing_dev_info->state into bdi_writebackTejun Heo1-5/+5
Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback) and the role of the separation is unclear. For cgroup support for writeback IOs, a bdi will be updated to host multiple wb's where each wb serves writeback IOs of a different cgroup on the bdi. To achieve that, a wb should carry all states necessary for servicing writeback IOs for a cgroup independently. This patch moves bdi->state into wb. * enum bdi_state is renamed to wb_state and the prefix of all enums is changed from BDI_ to WB_. * Explicit zeroing of bdi->state is removed without adding zeoring of wb->state as the whole data structure is zeroed on init anyway. * As there's still only one bdi_writeback per backing_dev_info, all uses of bdi->state are mechanically replaced with bdi->wb.state introducing no behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: drbd-dev@lists.linbit.com Cc: Neil Brown <neilb@suse.de> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-11net: Add a struct net parameter to sock_create_kernEric W. Biederman1-2/+2
This is long overdue, and is part of cleaning up how we allocate kernel sockets that don't reference count struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-26Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-4/+4
Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-15VFS: assorted weird filesystems: d_inode() annotationsDavid Howells1-4/+4
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24block, drbd: use mempool_create_slab_pool()David Rientjes1-4/+3
Mempools created for slab caches should use mempool_create_slab_pool(). Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-03-24block, drbd: fix drbd_req_new() initializationDavid Rientjes1-1/+2
mempool_alloc() does not support __GFP_ZERO since elements may come from memory that has already been released by mempool_free(). Remove __GFP_ZERO from mempool_alloc() in drbd_req_new() and properly initialize it to 0. Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-01-21block: Add discard flag to blkdev_issue_zeroout() functionMartin K. Petersen1-1/+1
blkdev_issue_discard() will zero a given block range. This is done by way of explicit writing, thus provisioning or allocating the blocks on disk. There are use cases where the desired behavior is to zero the blocks but unprovision them if possible. The blocks must deterministically contain zeroes when they are subsequently read back. This patch adds a flag to blkdev_issue_zeroout() that provides this variant. If the discard flag is set and a block device guarantees discard_zeroes_data we will use REQ_DISCARD to clear the block range. If the device does not support discard_zeroes_data or if the discard request fails we will fall back to first REQ_WRITE_SAME and then a regular REQ_WRITE. Also update the callers of blkdev_issue_zero() to reflect the new flag and make sb_issue_zeroout() prefer the discard approach. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-12-13Merge branch 'for-3.19/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds9-115/+93
Pull block layer driver updates from Jens Axboe: - NVMe updates: - The blk-mq conversion from Matias (and others) - A stack of NVMe bug fixes from the nvme tree, mostly from Keith. - Various bug fixes from me, fixing issues in both the blk-mq conversion and generic bugs. - Abort and CPU online fix from Sam. - Hot add/remove fix from Indraneel. - A couple of drbd fixes from the drbd team (Andreas, Lars, Philipp) - With the generic IO stat accounting from 3.19/core, converting md, bcache, and rsxx to use those. From Gu Zheng. - Boundary check for queue/irq mode for null_blk from Matias. Fixes cases where invalid values could be given, causing the device to hang. - The xen blkfront pull request, with two bug fixes from Vitaly. * 'for-3.19/drivers' of git://git.kernel.dk/linux-block: (56 commits) NVMe: fix race condition in nvme_submit_sync_cmd() NVMe: fix retry/error logic in nvme_queue_rq() NVMe: Fix FS mount issue (hot-remove followed by hot-add) NVMe: fix error return checking from blk_mq_alloc_request() NVMe: fix freeing of wrong request in abort path xen/blkfront: remove redundant flush_op xen/blkfront: improve protection against issuing unsupported REQ_FUA NVMe: Fix command setup on IO retry null_blk: boundary check queue_mode and irqmode block/rsxx: use generic io stats accounting functions to simplify io stat accounting md: use generic io stats accounting functions to simplify io stat accounting drbd: use generic io stats accounting functions to simplify io stat accounting md/bcache: use generic io stats accounting functions to simplify io stat accounting NVMe: Update module version major number NVMe: fail pci initialization if the device doesn't have any BARs NVMe: add ->exit_hctx() hook NVMe: make setup work for devices that don't do INTx NVMe: enable IO stats by default NVMe: nvme_submit_async_admin_req() must use atomic rq allocation NVMe: replace blk_put_request() with blk_mq_free_request() ...
2014-11-24drbd: use generic io stats accounting functions to simplify io stat accountingGu Zheng1-18/+4
Use generic io stats accounting help functions (generic_{start,end}_io_acct) to simplify io stat accounting. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-19kill f_dentry usesAl Viro1-3/+3
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-10drbd: Remove an useless copy of kernel_setsockopt()Philipp Reisner1-25/+4
Old backward-compat cruft Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: Fix state change in case of connection timeoutPhilipp Reisner1-1/+1
A connection timeout affects all volumes of a resource! Under the following conditions: A resource with multiple volumes AND ko-count >=1 AND a write request triggers the timeout (ko-count * timeout) DRBD's internal state gets confused. That in turn may lead to very miss leading follow up failures. E.g. "BUG: scheduling while atomic" CC: stable@kernel.org # v3.17 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: merge_bvec_fn: properly remap bvm->bi_bdevLars Ellenberg1-0/+1
This was not noticed for many years. Affects operation if md raid is used a backing device for DRBD. CC: stable@kernel.org # v3.2+ Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: fix resync throttling initializationLars Ellenberg3-3/+5
If for some reason DRBD resync was the only activity on a backend device, drbd_rs_c_min_rate_throttle() would mistakenly decide that it is still initialization time, and keep throttling the resync. This patch explicitly initializes ->rs_last_events to the current backend event counters, and drops the rs_last_events == 0 from the throttle condition. Reported-by: Mikhail Sugakov <msugakov@amazon.de> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: fix race between role change and handshakePhilipp Reisner3-8/+40
Symptoms: If DRBD was "cleanly shut down" (all in sync, both Secondary before disconnect, identical data generation uuids), and then one side was promoted *during* the next connection handshake, the role change could confuse the handshake. The Primary would get stuck in WFBitmapS, the Secondary would log unexpected cstate (Connected) in receive_bitmap and get stuck in WFBitmapT. Fix: The test in is_valid_soft_transition wrong. It works because the not allowed actions (promote/attach) do not touch the cstate. The previous condition failed to demand a cstate change in one clause. In order to avoid deadlocks give up the state_mutex while waiting for the transient state to go away. Conflicts: drbd/drbd_state.c drbd/drbd_state.h drbd/drbd_wrappers.h Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: Only use drbd_msg_put_info() in drbd_nl.cAndreas Gruenbacher3-16/+4
Avoid generic netlink calls in other parts of the code base. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-11-10drbd: Minor cleanupsAndreas Gruenbacher4-44/+34
. Update comments . drbd_set_{in,out_of}_sync(): Remove unused parameters . Move common code into adm_del_resource() . Redefine ERR_MINOR_EXISTS -> ERR_MINOR_OR_VOLUME_EXISTS Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-18drbd: use RB_DECLARE_CALLBACKS() to define augment callbacksLai Jiangshan1-34/+2
The original code are the same as RB_DECLARE_CALLBACKS(). CC: Michel Lespinasse <walken@google.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-18drbd: compute the end before rb_insert_augmented()Lai Jiangshan1-0/+4
Commit 98683650 "Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6" switches to the new augment API, but the new API requires that the tree is augmented before rb_insert_augmented() is called, which is missing. So we add the augment-code to drbd_insert_interval() when it travels the tree up to down before rb_insert_augmented(). See the example in include/linux/interval_tree_generic.h or Documentation/rbtree.txt. drbd_insert_interval() may cancel the insertion when traveling, in this case, the just added augment-code does nothing before cancel since the @this node is already in the subtrees in this case. CC: Michel Lespinasse <walken@google.com> CC: stable@kernel.org # v3.10+ Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Add missing newline in resync progress display in /proc/drbdPhilipp Reisner1-1/+3
Was broken in 2010 with commit 4b0715f096 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: reduce lock contention in drbd_workerLars Ellenberg1-6/+4
The worker may now dequeue work items in batches. This should reduce lock contention during busy periods. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Improve asender performanceLars Ellenberg2-10/+7
Shorten receive path in the asender thread. Reduces CPU utilisation of asender when receiving packets, and with that increases IOPs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Get rid of the WORK_PENDING macroAndreas Gruenbacher1-8/+7
This macro doesn't add any value; just use test_bit() instead. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Get rid of the __no_warn and __cond_lock macrosAndreas Gruenbacher2-8/+12
These macros can easily be replaced with its definition. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Avoid inconsistent locking warningAndreas Gruenbacher1-1/+1
request_timer_fn() takes resource->req_lock via the device and releases it via the connection. Avoid this as it is confusing static code checkers. Reported-by: "Dan Carpenter" <dan.carpenter@oracle.com> Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Remove superfluous newline from "resync_extents" debugfs entry.Philipp Marek1-1/+1
See "drbd/resources/*/volumes/*/resync_extents". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Use consistent names for all the bi_end_io callbacksAndreas Gruenbacher4-9/+9
Now they follow the _endio naming sheme. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-09-11drbd: Use better variable namesAndreas Gruenbacher4-49/+49
Rename local variable 'ds' to 'disk_state' or 'data_size'. 'dgs' to 'digest_size' Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2014-08-14Merge branch 'for-3.17/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds15-1198/+2674
Pull block driver changes from Jens Axboe: "Nothing out of the ordinary here, this pull request contains: - A big round of fixes for bcache from Kent Overstreet, Slava Pestov, and Surbhi Palande. No new features, just a lot of fixes. - The usual round of drbd updates from Andreas Gruenbacher, Lars Ellenberg, and Philipp Reisner. - virtio_blk was converted to blk-mq back in 3.13, but now Ming Lei has taken it one step further and added support for actually using more than one queue. - Addition of an explicit SG_FLAG_Q_AT_HEAD for block/bsg, to compliment the the default behavior of adding to the tail of the queue. From Douglas Gilbert" * 'for-3.17/drivers' of git://git.kernel.dk/linux-block: (86 commits) bcache: Drop unneeded blk_sync_queue() calls bcache: add mutex lock for bch_is_open bcache: Correct printing of btree_gc_max_duration_ms bcache: try to set b->parent properly bcache: fix memory corruption in init error path bcache: fix crash with incomplete cache set bcache: Fix more early shutdown bugs bcache: fix use-after-free in btree_gc_coalesce() bcache: Fix an infinite loop in journal replay bcache: fix crash in bcache_btree_node_alloc_fail tracepoint bcache: bcache_write tracepoint was crashing bcache: fix typo in bch_bkey_equal_header bcache: Allocate bounce buffers with GFP_NOWAIT bcache: Make sure to pass GFP_WAIT to mempool_alloc() bcache: fix uninterruptible sleep in writeback thread bcache: wait for buckets when allocating new btree root bcache: fix crash on shutdown in passthrough mode bcache: fix lockdep warnings on shutdown bcache allocator: send discards with correct size bcache: Fix to remove the rcu_sched stalls. ...
2014-07-10drbd: silence underflow warning in read_in_block()Dan Carpenter1-1/+1
My static checker warns that "data_size" could be negative and underflow the limit check. The code looks suspicious but I don't know if it is a real bug. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: implicitly truncate cpu-maskLars Ellenberg1-0/+14
Don't error out with misleading "out of memory" if the cpu-mask has more bits set than there are CPUs. Just truncate to nr_cpu_ids implicitly. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: drop spurious parameters from _drbd_md_sync_page_ioLars Ellenberg1-7/+5
size is always 4096, page is always device->md_io.page. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: resync should only lock out specific rangesLars Ellenberg2-53/+98
During resync, if we need to block some specific incoming write because of active resync requests to that same range, we potentially caused *all* new application writes (to "cold" activity log extents) to block until this one request has been processed. Improve the do_submit() logic to * grab all incoming requests to some "incoming" list * process this list - move aside requests that are blocked by resync - prepare activity log transactions, - commit transactions and submit corresponding requests - if there are remaining requests that only wait for activity log extents to become free, stop the fast path (mark activity log as "starving") - iterate until no more requests are waiting for the activity log, but all potentially remaining requests are only blocked by resync * only then grab new incoming requests That way, very busy IO on currently "hot" activity log extents cannot starve scattered IO to "cold" extents. And blocked-by-resync requests are processed once resync traffic on the affected region has ceased, without blocking anything else. The only blocking mode left is when we cannot start requests to "cold" extents because all currently "hot" extents are actually used. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: add per device data_gen_idLars Ellenberg1-0/+22
The data generation identifiers used to be exposed via sysfs at /sys/block/drbdX/drbd/meta_data/data_gen_id (out-of-tree), for advanced policy scripting. Bring that information over to debugfs. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: add per connection oldest requestsLars Ellenberg1-0/+7
Information of former /sys/block/drbdX/drbd/oldest_requests is already with higher detail in these files: debugfs/drbd/resource/$name/in_flight_summary, debugfs/drbd/resource/$name/volumes/$vnr/oldest_requests This patch adds debugfs/drbd/resource/$name/connections/peer/oldest_requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: add version tag to debugfs filesLars Ellenberg1-0/+52
Make the first line of debugfs files a version number, starting now with "v: 0". If we change content of presentation, we will bump that. Monitoring or diagnostic scritps that may parse these files can then easily know when they need to be reviewed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: add per volume oldest_requestsLars Ellenberg1-5/+148
Show oldest requests * pending master bio completion and, * if different, local disk bio completion. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: add callback_historyLars Ellenberg4-3/+144
Add a per-connection worker thread callback_history with timing details, call site and callback function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: Add in_flight_summaryLars Ellenberg4-20/+255
* Add details about pending meta data operations to in_flight_summary. * Report number of requests waiting for activity log transactions. * timing details of peer_requests to in_flight_summary. * FLUSH details DRBD devides the incoming request stream into "epochs", in which peers are allowed to re-order writes independendly. These epochs are separated by P_BARRIER on the replication link. Such barrier packets, depending on configuration, may cause the receiving side to drain the lower level device request queues and call blkdev_issue_flush(). This is known to be an other major source of latency in DRBD. Track timing details of calls to blkdev_issue_flush(), and add them to in_flight_summary. * data socket stats To be able to diagnose bottlenecks and root causes of "slow" IO on DRBD, it is useful to see network buffer stats along with the timing details of requests, peer requests, and meta data IO. * pending bitmap IO timing details to in_flight_summary. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: deal with destructor racing with open of debugfs fileLars Ellenberg1-4/+54
Try to close the race between open() and debugfs_remove_recursive() from inside an object destructor. Once open succeeds, the object should stay around. Open should not succeed if the object has already reached its destructor. This may be overkill, but to make that happen, we check for existence of a parent directory, "stale-ness" of "this" dentry, and serialize kref_get_unless_zero() on the outermost object relevant for this file with d_delete() on this dentry (using the parent's i_mutex). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: add in_flight_summary dataLars Ellenberg1-0/+196
To help diagnosing "high latency" or "hung" IO situations on DRBD, present per drbd resource group a summary of operations currently in progress. First item is a list of oldest drbd_request objects waiting for various things: * still being prepared * waiting for activity log transaction * waiting for local disk * waiting to be sent * waiting for peer acknowledgement ("receive ack", "write ack") * waiting for peer epoch acknowledgement ("barrier ack") Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10drbd: debugfs: add basic hierarchyLars Ellenberg5-5/+278
Add new debugfs hierarchy /sys/kernel/debug/ drbd/ resources/ $resource_name/connections/peer/$volume_number/ $resource_name/volumes/$volume_number/ minors/$minor_number -> ../resources/$resource_name/volumes/$volume_number/ Followup commits will populate this hierarchy with files containing statistics, diagnostic information and some attribute data. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>