aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/md/dm-mpath.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-01-31Merge tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds1-110/+187
Pull device mapper updates from Mike Snitzer: - DM core fixes to ensure that bio submission follows a depth-first tree walk; this is critical to allow forward progress without the need to use the bioset's BIOSET_NEED_RESCUER. - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure. - DM core cleanups and improvements to make bio-based DM more efficient (e.g. reduced memory footprint as well leveraging per-bio-data more). - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages the more direct IO submission path in the block layer; this mode is used by DM multipath and also optimizes targets like DM thin-pool that stack directly on NVMe data device. - DM multipath improvements to factor out legacy SCSI-only (e.g. scsi_dh) code paths to allow for more optimized support for NVMe multipath. - A fix for DM multipath path selectors (service-time and queue-length) to select paths in a more balanced way; largely academic but doesn't hurt. - Numerous DM raid target fixes and improvements. - Add a new DM "unstriped" target that enables Intel to workaround firmware limitations in some NVMe drives that are striped internally (this target also works when stacked above the DM "striped" target). - Various Documentation fixes and improvements. - Misc cleanups and fixes across various DM infrastructure and targets (e.g. bufio, flakey, log-writes, snapshot). * tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits) dm cache: Documentation: update default migration_throttling value dm mpath selector: more evenly distribute ties dm unstripe: fix target length versus number of stripes size check dm thin: fix trailing semicolon in __remap_and_issue_shared_cell dm table: fix NVMe bio-based dm_table_determine_type() validation dm: various cleanups to md->queue initialization code dm mpath: delay the retry of a request if the target responded as busy dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure dm log writes: fix max length used for kstrndup dm: backfill missing calls to mutex_destroy() dm snapshot: use mutex instead of rw_semaphore dm flakey: check for null arg_name in parse_features() dm thin: extend thinpool status format string with omitted fields dm thin: fixes in thin-provisioning.txt dm thin: document representation of <highest mapped sector> when there is none dm thin: fix documentation relative to low water mark threshold dm cache: be consistent in specifying sectors and SI units in cache.txt dm cache: delete obsoleted paragraph in cache.txt dm cache: fix grammar in cache-policies.txt ...
2018-01-29dm mpath: delay the retry of a request if the target responded as busyMike Snitzer1-1/+4
Add DM_ENDIO_DELAY_REQUEUE to allow request-based multipath's multipath_end_io() to instruct dm-rq.c:dm_done() to delay a requeue. This is beneficial to do if BLK_STS_RESOURCE is returned from the target (because target is busy). Relative to blk-mq: kick the hw queues via blk_mq_requeue_work(), indirectly from dm-rq.c:__dm_mq_kick_requeue_list(), after a delay. For old .request_fn: use blk_delay_queue(). bio-based multipath doesn't have feature parity with request-based for retryable error requeues; that is something that'll need fixing in the future. Suggested-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Bart Van Assche <bart.vanassche@wdc.com> [as interpreted from Bart's "... patch looks fine to me."]
2018-01-17dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIREDMing Lei1-3/+2
Avoid using DM_MAPIO_REQUEUE unless absolutely necessary because it results in dm-rq.c:dm_mq_queue_rq() returning BLK_STS_RESOURCE to blk-mq -- doing so should only ever be done if the underlying queue is out of resources. So switch to returning DM_MAPIO_DELAY_REQUEUE from multipath_clone_and_map() if either MPATHF_QUEUE_IO or MPATHF_PG_INIT_REQUIRED are set. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failureMing Lei1-1/+13
blk-mq will rerun queue via RESTART or dispatch wake after one request is completed, so not necessary to wait random time for requeuing, we should trust blk-mq to do it. More importantly, we need to return BLK_STS_RESOURCE to blk-mq so that dequeuing from the I/O scheduler can be stopped, this results in improved I/O merging. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17dm: backfill missing calls to mutex_destroy()Mike Snitzer1-0/+1
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-10dm mpath: Use blk_path_errorKeith Busch1-17/+2
Uses common code for determining if an error should be retried on alternate path. Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06dm mpath: factor out SCSI vs NVMe path selectionMike Snitzer1-13/+55
Trying to do both SCSI and NVMe bio-based handling with branching in the same common code has proven too tedious on a code maintenance level. In addition it slightly hurts IO performance. Fix this by factoring out __map_bio() and __map_bio_nvme(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-06dm mpath: optimize NVMe bio-based supportMike Snitzer1-76/+95
All code that deals with pg_init is not used with bio-based NVMe mode. This includes skipping initialization of pg_init related variables. Also, pg_init related members on 'struct multipath' have been grouped together. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-04dm mpath: implement NVMe bio-based supportMike Snitzer1-9/+21
This DM multipath NVMe bio-based support requires CONFIG_NVME_MULTIPATH to not be set. In the future hopefully NVMe multipath and DM multipath can co-exist more seemlessly. But as is, if CONFIG_NVME_MULTIPATH=Y then all the individal NVMe paths will remain hidden to upper layers and as such DM multipath will not be able to manage them. Though NVMe's native multipathing doesn't multipath namespaces across subsystems; so technically a user _could_ use CONFIG_NVME_MULTIPATH=Y and also use DM multipath to multipath across subsystems. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-03dm mpath: move dm_bio_restore out of endio methodMike Snitzer1-4/+3
Moving the dm_bio_restore() to process_queued_bios() avoids doing that work in multipath_end_io_bio(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-20dm mpath: optimize retrieval of bio_details from per-bio-dataMike Snitzer1-5/+3
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-20dm mpath: remove unnecessary memset() calls for per-io-dataMike Snitzer1-10/+6
All underlying members are initialized directly so the memset() calls are not needed. Also, initialize mpio->nr_bytes from the start since it never changes. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-20dm mpath: remove unused param from multipath_init_per_bio_data()Mike Snitzer1-6/+2
'struct dm_bio_details *' isn't ever needed. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-08dm mpath: fix bio-based multipath queue_if_no_path handlingMike Snitzer1-7/+42
Commit ca5beb76 ("dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH") caused bio-based DM-multipath to fail mptest's "test_02_sdev_delete". Restoring the logic that existed prior to commit ca5beb76 fixes this bio-based DM-multipath regression. Also verified all mptest tests pass with request-based DM-multipath. This commit effectively reverts commit ca5beb76 -- but it does so without reintroducing the need to take the m->lock spinlock in must_push_back_{rq,bio}. Fixes: ca5beb76 ("dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH") Cc: stable@vger.kernel.org # 4.12+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-04dm: fix various targets to dm_register_target after module __init resources createdmonty_pavel@sina.com1-9/+9
A NULL pointer is seen if two concurrent "vgchange -ay -K <vg name>" processes race to load the dm-thin-pool module: PID: 25992 TASK: ffff883cd7d23500 CPU: 4 COMMAND: "vgchange" #0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9 0000001 [ffff883cd743d660] crash_kexec at ffffffff810c5992 0000002 [ffff883cd743d730] oops_end at ffffffff81515c90 0000003 [ffff883cd743d760] no_context at ffffffff81049f1b 0000004 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5 0000005 [ffff883cd743d800] bad_area at ffffffff8104a2ce 0000006 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f 0000007 [ffff883cd743d950] do_page_fault at ffffffff81517bae 0000008 [ffff883cd743d980] page_fault at ffffffff81514f95 [exception RIP: kmem_cache_alloc+108] RIP: ffffffff8116ef3c RSP: ffff883cd743da38 RFLAGS: 00010046 RAX: 0000000000000004 RBX: ffffffff81121b90 RCX: ffff881bf1e78cc0 RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000 RBP: ffff883cd743da68 R8: ffff881bf1a4eb00 R9: 0000000080042000 R10: 0000000000002000 R11: 0000000000000000 R12: 00000000000000d0 R13: 0000000000000000 R14: 00000000000000d0 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 0000009 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5 0000010 [ffff883cd743da80] mempool_create_node at ffffffff81122083 0000011 [ffff883cd743dad0] mempool_create at ffffffff811220f4 0000012 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool] 0000013 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod] 0000014 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod] 0000015 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod] The race results in a NULL pointer because: Process A (vgchange -ay -K): a. send DM_LIST_VERSIONS_CMD ioctl; b. pool_target not registered; c. modprobe dm_thin_pool and wait until end. Process B (vgchange -ay -K): a. send DM_LIST_VERSIONS_CMD ioctl; b. pool_target registered; c. table_load->dm_table_add_target->pool_ctr; d. _new_mapping_cache is NULL and panic. Note: 1. process A and process B are two concurrent processes. 2. pool_target can be detected by process B but _new_mapping_cache initialization has not ended. To fix dm-thin-pool, and other targets (cache, multipath, and snapshot) with the same problem, simply dm_register_target() after all resources created during module init (as labelled with __init) are finished. Cc: stable@vger.kernel.org Signed-off-by: monty <monty_pavel@sina.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-17Merge tag 'for-4.15/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds1-2/+0
Pull more device mapper updates from Mike Snitzer: "Given your expected travel I figured I'd get these fixes to you sooner rather than later. - a DM multipath stable@ fix to silence an annoying error message that isn't _really_ an error - a DM core @stable fix for discard support that was enabled for an entire DM device despite only having partial support for discards due to a mix of discard capabilities across the underlying devices. - a couple other DM core discard fixes. - a DM bufio @stable fix that resolves a 32-bit overflow" * tag 'for-4.15/dm-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm bufio: fix integer overflow when limiting maximum cache size dm: clear all discard attributes in queue_limits when discards are disabled dm: do not set 'discards_supported' in targets that do not need it dm: discard support requires all targets in a table support discards dm mpath: remove annoying message of 'blk_get_request() returned -11'
2017-11-16dm mpath: remove annoying message of 'blk_get_request() returned -11'Ming Lei1-2/+0
It is very normal to see allocation failure, especially with blk-mq request_queues, so it's unnecessary to report this error and annoy people. In practice this 'blk_get_request() returned -11' error gets logged quite frequently when a blk-mq DM multipath device sees heavy IO. This change is marked for stable@ because the annoying message in question was included in stable@ commit 7083abbbf. Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop") Cc: stable@vger.kernel.org Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-14Merge tag 'gpio-v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpioLinus Torvalds1-15/+7
Pull GPIO updates from Linus Walleij: "This is the bulk of GPIO changes for the v4.15 kernel cycle: Core: - Fix the semantics of raw GPIO to actually be raw. No inversion semantics as before, but also no open draining, and allow the raw operations to affect lines used for interrupts as the caller supposedly knows what they are doing if they are getting the big hammer. - Rewrote the __inner_function() notation calls to names that make more sense. I just find this kind of code disturbing. - Drop the .irq_base() field from the gpiochip since now all IRQs are mapped dynamically. This is nice. - Support for .get_multiple() in the core driver API. This allows us to read several GPIO lines with a single register read. This has high value for some usecases: it can be used to create oscilloscopes and signal analyzers and other things that rely on reading several lines at exactly the same instant. Also a generally nice optimization. This uses the new assign_bit() macro from the bitops lib that was ACKed by Andrew Morton and is implemented for two drivers, one of them being the generic MMIO driver so everyone using that will be able to benefit from this. - Do not allow requests of Open Drain and Open Source setting of a GPIO line simultaneously. If the hardware actually supports enabling both at the same time the electrical result would be disastrous. - A new interrupt chip core helper. This will be helpful to deal with "banked" GPIOs, which means GPIO controllers with several logical blocks of GPIO inside them. This is several gpiochips per device in the device model, in contrast to the case when there is a 1-to-1 relationship between a device and a gpiochip. New drivers: - Maxim MAX3191x industrial serializer, a very interesting piece of professional I/O hardware. - Uniphier GPIO driver. This is the GPIO block from the recent Socionext (ex Fujitsu and Panasonic) platform. - Tegra 186 driver. This is based on the new banked GPIO infrastructure. Other improvements: - Some documentation improvements. - Wakeup support for the DesignWare DWAPB GPIO controller. - Reset line support on the DesignWare DWAPB GPIO controller. - Several non-critical bug fixes and improvements for the Broadcom BRCMSTB driver. - Misc non-critical bug fixes like exotic errorpaths, removal of dead code etc. - Explicit comments on fall-through switch() statements" * tag 'gpio-v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (65 commits) gpio: tegra186: Remove tegra186_gpio_lock_class gpio: rcar: Add r8a77995 (R-Car D3) support pinctrl: bcm2835: Fix some merge fallout gpio: Fix undefined lock_dep_class gpio: Automatically add lockdep keys gpio: Introduce struct gpio_irq_chip.first gpio: Disambiguate struct gpio_irq_chip.nested gpio: Add Tegra186 support gpio: Export gpiochip_irq_{map,unmap}() gpio: Implement tighter IRQ chip integration gpio: Move lock_key into struct gpio_irq_chip gpio: Move irq_valid_mask into struct gpio_irq_chip gpio: Move irq_nested into struct gpio_irq_chip gpio: Move irq_chained_parent to struct gpio_irq_chip gpio: Move irq_default_type to struct gpio_irq_chip gpio: Move irq_handler to struct gpio_irq_chip gpio: Move irqdomain into struct gpio_irq_chip gpio: Move irqchip into struct gpio_irq_chip gpio: Introduce struct gpio_irq_chip pinctrl: armada-37xx: remove unused variable ...
2017-10-24locking/barriers: Convert users of lockless_dereference() to READ_ONCE()Will Deacon1-10/+10
READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-19bitops: Introduce assign_bit()Lukas Wunner1-15/+7
A common idiom is to assign a value to a bit with: if (value) set_bit(nr, addr); else clear_bit(nr, addr); Likewise common is the one-line expression variant: value ? set_bit(nr, addr) : clear_bit(nr, addr); Commit 9a8ac3ae682e ("dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()") introduced assign_bit() to the md subsystem for brevity. Make it available to others, specifically gpiolib and the upcoming driver for Maxim MAX3191x industrial serializer chips. As requested by Peter Zijlstra, change the argument order to reflect traditional "dst = src" in C, hence "assign_bit(nr, addr, value)". Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Neil Brown <neilb@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-09-14Merge tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dmLinus Torvalds1-5/+10
Pull device mapper updates from Mike Snitzer: - Some request-based DM core and DM multipath fixes and cleanups - Constify a few variables in DM core and DM integrity - Add bufio optimization and checksum failure accounting to DM integrity - Fix DM integrity to avoid checking integrity of failed reads - Fix DM integrity to use init_completion - A couple DM log-writes target fixes - Simplify DAX flushing by eliminating the unnecessary flush abstraction that was stood up for DM's use. * tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dax: remove the pmem_dax_ops->flush abstraction dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK dm integrity: make blk_integrity_profile structure const dm integrity: do not check integrity for failed read operations dm log writes: fix >512b sectorsize support dm log writes: don't use all the cpu while waiting to log blocks dm ioctl: constify ioctl lookup table dm: constify argument arrays dm integrity: count and display checksum failures dm integrity: optimize writing dm-bufio buffers that are partially changed dm rq: do not update rq partially in each ending bio dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior dm mpath: complain about unsupported __multipath_map_bio() return values dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
2017-09-07Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+1
Pull block layer updates from Jens Axboe: "This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains: - Fix for a registration race in loop, from Anton Volkov. - Overflow complaint fix from Arnd for DAC960. - Series of drbd changes from the usual suspects. - Conversion of the stec/skd driver to blk-mq. From Bart. - A few BFQ improvements/fixes from Paolo. - CFQ improvement from Ritesh, allowing idling for group idle. - A few fixes found by Dan's smatch, courtesy of Dan. - A warning fixup for a race between changing the IO scheduler and device remova. From David Jeffery. - A few nbd fixes from Josef. - Support for cgroup info in blktrace, from Shaohua. - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things. - Various corner cases and error handling fixes from Weiping Zhang. - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines. - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code. - Series from Bart, including various cleanups and fixes for switch fall through case complaints" * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits) kernfs: checking for IS_ERR() instead of NULL drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set drbd: Fix allyesconfig build, fix recent commit drbd: switch from kmalloc() to kmalloc_array() drbd: abort drbd_start_resync if there is no connection drbd: move global variables to drbd namespace and make some static drbd: rename "usermode_helper" to "drbd_usermode_helper" drbd: fix race between handshake and admin disconnect/down drbd: fix potential deadlock when trying to detach during handshake drbd: A single dot should be put into a sequence. drbd: fix rmmod cleanup, remove _all_ debugfs entries drbd: Use setup_timer() instead of init_timer() to simplify the code. drbd: fix potential get_ldev/put_ldev refcount imbalance during attach drbd: new disk-option disable-write-same drbd: Fix resource role for newly created resources in events2 drbd: mark symbols static where possible drbd: Send P_NEG_ACK upon write error in protocol != C drbd: add explicit plugging when submitting batches drbd: change list_for_each_safe to while(list_first_entry_or_null) drbd: introduce drbd_recv_header_maybe_unplug ...
2017-08-28dm: constify argument arraysEric Biggers1-5/+5
The arrays of 'struct dm_arg' are never modified by the device-mapper core, so constify them so that they are placed in .rodata. (Exception: the args array in dm-raid cannot be constified because it is allocated on the stack and modified.) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28dm mpath: complain about unsupported __multipath_map_bio() return valuesBart Van Assche1-0/+4
WARN_ONCE() if __multipath_map_bio() returns an unsupported return value. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-throughBart Van Assche1-0/+1
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28dm mpath: do not lock up a CPU with requeuing activityBart Van Assche1-1/+0
When using the block layer in single queue mode, get_request() returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT flag has been passed to get_request(). Avoid that the kernel reports soft lockup complaints in this case due to continuous requeuing activity. Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop") Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Tested-by: Laurence Oberman <loberman@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28dm mpath: retry BLK_STS_RESOURCE errorsBart Van Assche1-1/+0
Retry requests instead of failing them if an out-of-memory error occurs or the block driver below dm-mpath is busy. This restores the v4.12 behavior of noretry_error(), namely that -ENOMEM results in a retry. Fixes: 2a842acab109 ("block: introduce new block status code type") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-23block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig1-1/+1
This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-14dm: missing break in process_queued_bios()Dan Carpenter1-0/+1
his used to be a fall through case, but we shifted code around and I think we want a break here now. Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-09block: switch bios to blk_status_tChristoph Hellwig1-7/+8
Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09block: introduce new block status code typeChristoph Hellwig1-17/+10
Currently we use nornal Linux errno values in the block layer, and while we accept any error a few have overloaded magic meanings. This patch instead introduces a new blk_status_t value that holds block layer specific status codes and explicitly explains their meaning. Helpers to convert from and to the previous special meanings are provided for now, but I suspect we want to get rid of them in the long run - those drivers that have a errno input (e.g. networking) usually get errnos that don't know about the special block layer overloads, and similarly returning them to userspace will usually return somethings that strictly speaking isn't correct for file system operations, but that's left as an exercise for later. For now the set of errors is a very limited set that closely corresponds to the previous overloaded errno values, but there is some low hanging fruite to improve it. blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse typechecking, so that we can easily catch places passing the wrong values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09dm: change ->end_io calling conventionChristoph Hellwig1-5/+6
Turn the error paramter into a pointer so that target drivers can change the value, and make sure only DM_ENDIO_* values are returned from the methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09dm: don't return errnos from ->mapChristoph Hellwig1-3/+10
Instead use the special DM_MAPIO_KILL return value to return -EIO just like we do for the request based path. Note that dm-log-writes returned -ENOMEM in a few places, which now becomes -EIO instead. No consumer treats -ENOMEM special so this shouldn't be an issue (and it should use a mempool to start with to make guaranteed progress). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09dm mpath: merge do_end_io_bio into multipath_end_io_bioChristoph Hellwig1-27/+15
This simplifies the code and especially the error passing a bit and will help with the next patch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-15dm mpath: multipath_clone_and_map must not return -EIOChristoph Hellwig1-1/+1
Since 412445ac ("dm: introduce a new DM_MAPIO_KILL return value"), the clone_and_map_rq methods must not return errno values, so fix it up to properly return DM_MAPIO_KILL, instead of the -EIO value that snuck in due to a conflict between two patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-15dm mpath: don't return -EIO from dm_report_EIOChristoph Hellwig1-8/+11
Instead just turn the macro into a helper for the warning message. This removes an unnecessary assignment and will allow the next commit to fix a place where -EIO is the wrong return value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-01dm rq: change ->rq_end_io calling conventionsChristoph Hellwig1-4/+9
Instead of returning either a DM_ENDIO_* constant or an error code, add a new DM_ENDIO_DONE value that means keep errno as is. This allows us to easily keep the existing error code in case where we can't push back, and it also preparares for the new block level status codes with strict type checking. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-01dm mpath: merge do_end_io into multipath_end_ioChristoph Hellwig1-34/+17
This simplifies the I/O completion path a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-05-01Merge branch 'dm-4.12' into dm-4.12-post-mergeMike Snitzer1-93/+78
2017-04-27dm mpath: make it easier to detect unintended I/O request flushesBart Van Assche1-4/+21
I/O errors triggered by multipathd incorrectly not enabling the no-flush flag for DM_DEVICE_SUSPEND or DM_DEVICE_RESUME are hard to debug. Add more logging to make it easier to debug this. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()Bart Van Assche1-21/+15
No functional change but makes the code easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATHBart Van Assche1-59/+11
Instead of checking MPATHF_QUEUE_IF_NO_PATH, MPATHF_SAVED_QUEUE_IF_NO_PATH and the no_flush flag to decide whether or not to push back a request (or bio) if there are no paths available, only clear MPATHF_QUEUE_IF_NO_PATH in queue_if_no_path() if no_flush has not been set. The result is that only a single bit has to be tested in the hot path to decide whether or not a request must be pushed back and also that m->lock does not have to be taken in the hot path. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm: introduce enum dm_queue_mode to cleanup related codeBart Van Assche1-1/+4
Introduce an enumeration type for the queue mode. This patch does not change any functionality but makes the DM code easier to read. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm mpath: verify __pg_init_all_paths locking assumptions at runtimeBart Van Assche1-0/+2
Verify at runtime that __pg_init_all_paths() is called with multipath.lock held if lockdep is enabled. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm mpath: delay requeuing while path initialization is in progressBart Van Assche1-3/+7
Requeuing a request immediately while path initialization is ongoing causes high CPU usage, something that is undesired. Hence delay requeuing while path initialization is in progress. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm mpath: avoid that path removal can trigger an infinite loopBart Van Assche1-4/+11
If blk_get_request() fails, check whether the failure is due to a path being removed. If that is the case, fail the path by triggering a call to fail_path(). This avoids that the following scenario can be encountered while removing paths: * CPU usage of a kworker thread jumps to 100%. * Removing the DM device becomes impossible. Delay requeueing if blk_get_request() returns -EBUSY or -EWOULDBLOCK, and the queue is not dying, because in these cases immediate requeuing is inappropriate. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27dm mpath: split and rename activate_path() to prepare for its expanded useBart Van Assche1-5/+12
activate_path() is renamed to activate_path_work() which now calls activate_or_offline_path(). activate_or_offline_path() will be used by the next commit. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24dm mpath: requeue after a small delay if blk_get_request() failsBart Van Assche1-3/+2
If blk_get_request() returns ENODEV then multipath_clone_and_map() causes a request to be requeued immediately. This can cause a kworker thread to spend 100% of the CPU time of a single core in __blk_mq_run_hw_queue() and also can cause device removal to never finish. Avoid this by only requeuing after a delay if blk_get_request() fails. Additionally, reduce the requeue delay. Cc: stable@vger.kernel.org # 4.9+ Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-20dm mpath: don't check for req->errorsChristoph Hellwig1-1/+1
We'll get all proper errors reported through ->end_io and ->errors will go away soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08dm: support REQ_OP_WRITE_ZEROESChristoph Hellwig1-0/+1
Copy & paste from the REQ_OP_WRITE_SAME code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>