aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers/block (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-05-02virtio-blk: handle block_device_operations callbacks after hot unplugStefan Hajnoczi1-8/+78
A userspace process holding a file descriptor to a virtio_blk device can still invoke block_device_operations after hot unplug. This leads to a use-after-free accessing vblk->vdev in virtblk_getgeo() when ioctl(HDIO_GETGEO) is invoked: BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 IP: [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio] PGD 800000003a92f067 PUD 3a930067 PMD 0 Oops: 0000 [#1] SMP CPU: 0 PID: 1310 Comm: hdio-getgeo Tainted: G OE ------------ 3.10.0-1062.el7.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 task: ffff9be5fbfb8000 ti: ffff9be5fa890000 task.ti: ffff9be5fa890000 RIP: 0010:[<ffffffffc00e5450>] [<ffffffffc00e5450>] virtio_check_driver_offered_feature+0x10/0x90 [virtio] RSP: 0018:ffff9be5fa893dc8 EFLAGS: 00010246 RAX: ffff9be5fc3f3400 RBX: ffff9be5fa893e30 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff9be5fbc10b40 RBP: ffff9be5fa893dc8 R08: 0000000000000301 R09: 0000000000000301 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9be5fdc24680 R13: ffff9be5fbc10b40 R14: ffff9be5fbc10480 R15: 0000000000000000 FS: 00007f1bfb968740(0000) GS:ffff9be5ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000090 CR3: 000000003a894000 CR4: 0000000000360ff0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffc016ac37>] virtblk_getgeo+0x47/0x110 [virtio_blk] [<ffffffff8d3f200d>] ? handle_mm_fault+0x39d/0x9b0 [<ffffffff8d561265>] blkdev_ioctl+0x1f5/0xa20 [<ffffffff8d488771>] block_ioctl+0x41/0x50 [<ffffffff8d45d9e0>] do_vfs_ioctl+0x3a0/0x5a0 [<ffffffff8d45dc81>] SyS_ioctl+0xa1/0xc0 A related problem is that virtblk_remove() leaks the vd_index_ida index when something still holds a reference to vblk->disk during hot unplug. This causes virtio-blk device names to be lost (vda, vdb, etc). Fix these issues by protecting vblk->vdev with a mutex and reference counting vblk so the vd_index_ida index can be removed in all cases. Fixes: 48e4043d4529 ("virtio: add virtio disk geometry feature") Reported-by: Lance Digby <ldigby@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200430140442.171016-1-stefanha@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-04-24Merge tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-blockLinus Torvalds3-53/+83
Pull block fixes from Jens Axboe: "A few fixes/changes that should go into this release: - null_blk zoned fixes (Damien) - blkdev_close() sync improvement (Douglas) - Fix regression in blk-iocost that impacted (at least) systemtap (Waiman) - Comment fix, header removal (Zhiqiang, Jianpeng)" * tag 'block-5.7-2020-04-24' of git://git.kernel.dk/linux-block: null_blk: Cleanup zoned device initialization null_blk: Fix zoned command handling block: remove unused header blk-iocost: Fix error on iocost_ioc_vrate_adj bdev: Reduce time holding bd_mutex in sync in blkdev_close() buffer: remove useless comment and WB_REASON_FREE_MORE_MEM, reason.
2020-04-23null_blk: Cleanup zoned device initializationDamien Le Moal3-26/+36
Move all zoned mode related code from null_blk_main.c to null_blk_zoned.c, avoiding an ugly #ifdef in the process. Rename null_zone_init() into null_init_zoned_dev(), null_zone_exit() into null_free_zoned_dev() and add the new function null_register_zoned_dev() to finalize the zoned dev setup before add_disk(). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-23null_blk: Fix zoned command handlingDamien Le Moal3-27/+47
For write operations issued to a null_blk device with zoned mode enabled, the state and write pointer position of the zone targeted by the command should be checked before badblocks and memory backing are handled as the write may be first failed due to, for instance, a sector position not aligned with the zone write pointer. This order of checking for errors reflects more accuratly the behavior of physical zoned devices. Furthermore, the write pointer position of the target zone should be incremented only and only if no errors are reported by badblocks and memory backing handling. To fix this, introduce the small helper function null_process_cmd() which execute null_handle_badblocks() and null_handle_memory_backed() and use this function in null_zone_write() to correctly handle write requests to zoned null devices depending on the type and state of the write target zone. Also call this function in null_handle_zoned() to process read requests to zoned null devices. null_process_cmd() is called directly from null_handle_cmd() for regular null devices, resulting in no functional change for these type of devices. To have symmetric names, the function null_handle_zoned() is renamed to null_process_zoned_cmd(). Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-21Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds1-0/+1
Pull virtio fixes and cleanups from Michael Tsirkin: - Some bug fixes - Cleanup a couple of issues that surfaced meanwhile - Disable vhost on ARM with OABI for now - to be fixed fully later in the cycle or in the next release. * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (24 commits) vhost: disable for OABI virtio: drop vringh.h dependency virtio_blk: add a missing include virtio-balloon: Avoid using the word 'report' when referring to free page hinting virtio-balloon: make virtballoon_free_page_report() static vdpa: fix comment of vdpa_register_device() vdpa: make vhost, virtio depend on menu vdpa: allow a 32 bit vq alignment drm/virtio: fix up for include file changes remoteproc: pull in slab.h rpmsg: pull in slab.h virtio_input: pull in slab.h remoteproc: pull in slab.h virtio-rng: pull in slab.h virtgpu: pull in uaccess.h tools/virtio: make asm/barrier.h self contained tools/virtio: define aligned attribute virtio/test: fix up after IOTLB changes vhost: Create accessors for virtqueues private_data vdpasim: Return status in vdpasim_get_status ...
2020-04-17virtio_blk: add a missing includeMichael S. Tsirkin1-0/+1
virtio_blk uses VIRTIO_RING_F_INDIRECT_DESC, pull in the header defining that value. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-04-13rbd: don't mess with a page vector in rbd_notify_op_lock()Ilya Dryomov1-5/+1
rbd_notify_op_lock() isn't interested in a notify reply. Instead of accepting that page vector just to free it, have watch-notify code take care of it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2020-04-13rbd: don't test rbd_dev->opts in rbd_dev_image_release()Ilya Dryomov1-1/+1
rbd_dev->opts is used to distinguish between the image that is being mapped and a parent. However, because we no longer establish watch for read-only mappings, this test is imprecise and results in unnecessary rbd_unregister_watch() calls. Make it consistent with need_watch in rbd_dev_image_probe(). Fixes: b9ef2b8858a0 ("rbd: don't establish watch for read-only mappings") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2020-04-13rbd: call rbd_dev_unprobe() after unwatching and flushing notifiesIlya Dryomov1-4/+4
rbd_dev_unprobe() is supposed to undo most of rbd_dev_image_probe(), including rbd_dev_header_info(), which means that rbd_dev_header_info() isn't supposed to be called after rbd_dev_unprobe(). However, rbd_dev_image_release() calls rbd_dev_unprobe() before rbd_unregister_watch(). This is racy because a header update notify can sneak in: "rbd unmap" thread ceph-watch-notify worker rbd_dev_image_release() rbd_dev_unprobe() free and zero out header rbd_watch_cb() rbd_dev_refresh() rbd_dev_header_info() read in header The same goes for "rbd map" because rbd_dev_image_probe() calls rbd_dev_unprobe() on errors. In both cases this results in a memory leak. Fixes: fd22aef8b47c ("rbd: move rbd_unregister_watch() call into rbd_dev_image_release()") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2020-04-13rbd: avoid a deadlock on header_rwsem when flushing notifiesIlya Dryomov1-4/+13
rbd_unregister_watch() flushes notifies and therefore cannot be called under header_rwsem because a header update notify takes header_rwsem to synchronize with "rbd map". If mapping an image fails after the watch is established and a header update notify sneaks in, we deadlock when erroring out from rbd_dev_image_probe(). Move watch registration and unregistration out of the critical section. The only reason they were put there was to make header_rwsem management slightly more obvious. Fixes: 811c66887746 ("rbd: fix rbd map vs notify races") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2020-04-10Merge tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-5/+12
Pull more xen updates from Juergen Gross: - two cleanups - fix a boot regression introduced in this merge window - fix wrong use of memory allocation flags * tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: fix booting 32-bit pv guest x86/xen: make xen_pvmmu_arch_setup() static xen/blkfront: fix memory allocation flags in blkfront_setup_indirect() xen: Use evtchn_type_t as a type for event channels
2020-04-10Merge tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-blockLinus Torvalds1-13/+36
Pull block fixes from Jens Axboe: "Here's a set of fixes that should go into this merge window. This contains: - NVMe pull request from Christoph with various fixes - Better discard support for loop (Evan) - Only call ->commit_rqs() if we have queued IO (Keith) - blkcg offlining fixes (Tejun) - fix (and fix the fix) for busy partitions" * tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block: block: fix busy device checking in blk_drop_partitions again block: fix busy device checking in blk_drop_partitions nvmet-rdma: fix double free of rdma queue blk-mq: don't commit_rqs() if none were queued nvme-fc: Revert "add module to ops template to allow module references" nvme: fix deadlock caused by ANA update wrong locking nvmet-rdma: fix bonding failover possible NULL deref loop: Better discard support for block devices loop: Report EOPNOTSUPP properly nvmet: fix NULL dereference when removing a referral nvme: inherit stable pages constraint in the mpath stack device blkcg: don't offline parent blkcg first blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it nvme-tcp: fix possible crash in recv error flow nvme-tcp: don't poll a non-live queue nvme-tcp: fix possible crash in write_zeroes processing nvmet-fc: fix typo in comment nvme-rdma: Replace comma with a semicolon nvme-fcloop: fix deallocation of working context nvme: fix compat address handling in several ioctls
2020-04-08Merge tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds1-136/+79
Pull ceph updates from Ilya Dryomov: "The main items are: - support for asynchronous create and unlink (Jeff Layton). Creates and unlinks are satisfied locally, without waiting for a reply from the MDS, provided the client has been granted appropriate caps (new in v15.y.z ("Octopus") release). This can be a big help for metadata heavy workloads such as tar and rsync. Opt-in with the new nowsync mount option. - multiple blk-mq queues for rbd (Hannes Reinecke and myself). When the driver was converted to blk-mq, we settled on a single blk-mq queue because of a global lock in libceph and some other technical debt. These have since been addressed, so allocate a queue per CPU to enhance parallelism. - don't hold onto caps that aren't actually needed (Zheng Yan). This has been our long-standing behavior, but it causes issues with some active/standby applications (synchronous I/O, stalls if the standby goes down, etc). - .snap directory timestamps consistent with ceph-fuse (Luis Henriques)" * tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits) ceph: fix snapshot directory timestamps ceph: wait for async creating inode before requesting new max size ceph: don't skip updating wanted caps when cap is stale ceph: request new max size only when there is auth cap ceph: cleanup return error of try_get_cap_refs() ceph: return ceph_mdsc_do_request() errors from __get_parent() ceph: check all mds' caps after page writeback ceph: update i_requested_max_size only when sending cap msg to auth mds ceph: simplify calling of ceph_get_fmode() ceph: remove delay check logic from ceph_check_caps() ceph: consider inode's last read/write when calculating wanted caps ceph: always renew caps if mds_wanted is insufficient ceph: update dentry lease for async create ceph: attempt to do async create when possible ceph: cache layout in parent dir on first sync create ceph: add new MDS req field to hold delegated inode number ceph: decode interval_sets for delegated inos ceph: make ceph_fill_inode non-static ceph: perform asynchronous unlink if we have sufficient caps ceph: don't take refs to want mask unless we have all bits ...
2020-04-07xen/blkfront: fix memory allocation flags in blkfront_setup_indirect()Juergen Gross1-5/+12
Commit 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") didn't fix the issue it was meant to, as the flags for allocating the memory are GFP_NOIO, which will lead the memory allocation falling back to kmalloc(). So instead of GFP_NOIO use GFP_KERNEL and do all the memory allocation in blkfront_setup_indirect() in a memalloc_noio_{save,restore} section. Fixes: 1d5c76e664333 ("xen-blkfront: switch kcalloc to kvcalloc for large array allocation") Cc: stable@vger.kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20200403090034.8753-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com>
2020-04-03loop: Better discard support for block devicesEvan Green1-11/+31
If the backing device for a loop device is itself a block device, then mirror the "write zeroes" capabilities of the underlying block device into the loop device. Copy this capability into both max_write_zeroes_sectors and max_discard_sectors of the loop device. The reason for this is that REQ_OP_DISCARD on a loop device translates into blkdev_issue_zeroout(), rather than blkdev_issue_discard(). This presents a consistent interface for loop devices (that discarded data is zeroed), regardless of the backing device type of the loop device. There should be no behavior change for loop devices backed by regular files. This change fixes blktest block/003, and removes an extraneous error print in block/013 when testing on a loop device backed by a block device that does not support discard. Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [used updated version of Evan's comment in loop_config_discard()] [moved backingq to local scope, removed redundant braces] Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-04-03loop: Report EOPNOTSUPP properlyEvan Green1-2/+5
Properly plumb out EOPNOTSUPP from loop driver operations, which may get returned when for instance a discard operation is attempted but not supported by the underlying block device. Before this change, everything was reported in the log as an I/O error, which is scary and not helpful in debugging. Signed-off-by: Evan Green <evgreen@chromium.org> Reviewed-by: Gwendal Grignou <gwendal@chromium.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-30Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds11-556/+726
Pull block driver updates from Jens Axboe: - floppy driver cleanup series from Willy - NVMe updates and fixes (Various) - null_blk trace improvements (Chaitanya) - bcache fixes (Coly) - md fixes (via Song) - loop block size change optimizations (Martijn) - scnprintf() use (Takashi) * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits) null_blk: add trace in null_blk_zoned.c null_blk: add tracepoint helpers for zoned mode block: add a zone condition debug helper nvme: cleanup namespace identifier reporting in nvme_init_ns_head nvme: rename __nvme_find_ns_head to nvme_find_ns_head nvme: refactor nvme_identify_ns_descs error handling nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl nvme: Fix controller creation races with teardown flow nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl nvme: Fix ctrl use-after-free during sysfs deletion nvme-pci: Re-order nvme_pci_free_ctrl nvme: Remove unused return code from nvme_delete_ctrl_sync nvme: Use nvme_state_terminal helper nvme: release ida resources nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO nvmet-tcp: optimize tcp stack TX when data digest is used nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow nvme-multipath: do not reset on unknown status nvmet-rdma: allocate RW ctxs according to mdts ...
2020-03-30Merge tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds12-73/+83
Pull block updates from Jens Axboe: - Online capacity resizing (Balbir) - Number of hardware queue change fixes (Bart) - null_blk fault injection addition (Bart) - Cleanup of queue allocation, unifying the node/no-node API (Christoph) - Cleanup of genhd, moving code to where it makes sense (Christoph) - Cleanup of the partition handling code (Christoph) - disk stat fixes/improvements (Konstantin) - BFQ improvements (Paolo) - Various fixes and improvements * tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits) block: return NULL in blk_alloc_queue() on error block: move bio_map_* to blk-map.c Revert "blkdev: check for valid request queue before issuing flush" block: simplify queue allocation bcache: pass the make_request methods to blk_queue_make_request null_blk: use blk_mq_init_queue_data block: add a blk_mq_init_queue_data helper block: move the ->devnode callback to struct block_device_operations block: move the part_stat* helpers from genhd.h to a new header block: move block layer internals out of include/linux/genhd.h block: move guard_bio_eod to bio.c block: unexport get_gendisk block: unexport disk_map_sector_rcu block: unexport disk_get_part block: mark part_in_flight and part_in_flight_rw static block: mark block_depr static block: factor out requeue handling from dispatch code block/diskstats: replace time_in_queue with sum of request times block/diskstats: accumulate all per-cpu counters in one pass block/diskstats: more accurate approximation of io_ticks for slow disks ...
2020-03-30rbd: enable multiple blk-mq queuesHannes Reinecke1-1/+1
Allocate one queue per CPU and get a performance boost from higher parallelism. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30rbd: embed image request in blk-mq pduIlya Dryomov1-87/+51
Avoid making allocations for !IMG_REQ_CHILD image requests. Only IMG_REQ_CHILD image requests need to be freed now. Move the initial request checks to rbd_queue_rq(). Unfortunately we can't fill the image request and kick the state machine directly from rbd_queue_rq() because ->queue_rq() isn't allowed to block. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30rbd: acquire header_rwsem just once in rbd_queue_workfn()Ilya Dryomov1-28/+31
Currently header_rwsem is acquired twice: once in rbd_dev_parent_get() when the image request is being created and then in rbd_queue_workfn() to capture mapping_size and snapc. Introduce rbd_img_capture_header() and move image request allocation so that header_rwsem can be acquired just once. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30rbd: get rid of img_request_layered_clear()Ilya Dryomov1-8/+1
No need to clear IMG_REQ_LAYERED before destroying the request. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30rbd: kill img_request krefHannes Reinecke1-19/+5
The reference counter is never increased, so we can as well call rbd_img_request_destroy() directly and drop the kref. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30rbd: remove barriers from img_request_layered_{set,clear,test}()Ilya Dryomov1-3/+0
IMG_REQ_LAYERED is set in rbd_img_request_create(), and tested and cleared in rbd_img_request_destroy() when the image request is about to be destroyed. The barriers are unnecessary. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-27null_blk: add trace in null_blk_zoned.cChaitanya Kulkarni1-1/+11
With the help of previously added tracepoints we can now trace report-zones, zone-write and zone-mgmt ops in null_blk_zoned.c. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-27null_blk: add tracepoint helpers for zoned modeChaitanya Kulkarni3-0/+106
This patch adds two new tracpoints for null_blk_zoned.c that allows us to trace report-zones, zone-mgmt-op and zone-write operations which has direct effect on the zone condition state machine. Also, we update drivers/block/Makefile so that new null_blk related tracefiles can be compiled. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-27block: simplify queue allocationChristoph Hellwig8-19/+8
Current make_request based drivers use either blk_alloc_queue_node or blk_alloc_queue to allocate a queue, and then set up the make_request_fn function pointer and a few parameters using the blk_queue_make_request helper. Simplify this by passing the make_request pointer to blk_alloc_queue, and while at it merge the _node variant into the main helper by always passing a node_id, and remove the superfluous gfp_mask parameter. A lower-level __blk_alloc_queue is kept for the blk-mq case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-27null_blk: use blk_mq_init_queue_dataChristoph Hellwig1-22/+1
Use the new blk_mq_init_queue_data instead of open coding the queue allocation and initialization. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-27block: move the ->devnode callback to struct block_device_operationsChristoph Hellwig1-6/+6
There really isn't any good reason to stash a method directly into struct gendisk. Move it together with the other block device operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-25block: move the part_stat* helpers from genhd.h to a new headerChristoph Hellwig3-0/+3
These macros are just used by a few files. Move them out of genhd.h, which is included everywhere into a new standalone header. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-19rsxx: Replace zero-length array with flexible-array memberGustavo A. R. Silva1-1/+1
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertenly introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-18xen-blkfront.c: Convert to use set_capacity_revalidate_and_notifyBalbir Singh1-5/+1
block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE notifications via uevents. Signed-off-by: Balbir Singh <sblbir@amazon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-18virtio_blk.c: Convert to use set_capacity_revalidate_and_notifyBalbir Singh1-4/+1
block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE notifications via uevents. Signed-off-by: Balbir Singh <sblbir@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: rename the global "fdc" variable to "current_fdc"Willy Tarreau1-130/+137
This is done in order to remove the confusion that arises at some places in the code where local variables or arguments shadow the global variable. It is already visible that some places are a bit awkward and iterate over the global variable, for the sole reason that they used to rely on it being named "fdc" in order to get the correct address when using FD_DOR. These ones are easy to spot by searching for "for (current_fdc...". Some more cleanup is definitely possible. For example "fdc_state[current_fdc].somefield" is used all over the code and would probably be better with "fdc_state->somefield" with fdc_state being set when current_fdc is assigned. This would require to pass the pointer to the current state instead of the current_fdc to the I/O functions. Link: https://lore.kernel.org/r/20200301195555.11154-7-w@1wt.eu Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: separate the FDC's base address from its registersWilly Tarreau1-5/+4
FDC registers FD_STATUS, FD_DATA, FD_DOR, FD_DIR and FD_DCR used to be defined relative to FD_IOPORT, which is the FDC's base address, itself a macro depending on the "fdc" local or global variable. This patch changes this so that the register macros above now only reference the address offset, and that the FDC's address is explicitly passed in each call to fd_inb() and fd_outb(), thus removing the macro. With this change there is no more implicit usage of the local/global "fdc" variable. One place in the ARM code used to check if the port was equal to FD_DOR, this was changed to testing the register by applying a mask to the port, as was already done in the sparc code. There are still occurrences of fd_inb() and fd_outb() in the PARISC code and these ones remain unaffected since they already used to work with a base address and a register offset. The sparc, m68k and parisc code could now be slightly cleaned up to benefit from the macro definitions above instead of the equivalent hard-coded values. Link: https://lore.kernel.org/r/20200301195555.11154-6-w@1wt.eu Cc: Ian Molton <spyro@f2s.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: introduce new functions fdc_inb() and fdc_outb()Willy Tarreau1-16/+26
These two functions replace fd_inb() and fd_outb() in that they take the FDC in argument. This will ease the separation of the base address and the port everywhere the code is used. Link: https://lore.kernel.org/r/20200301195555.11154-5-w@1wt.eu Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand the reply_buffer macrosWilly Tarreau1-39/+47
Several macros were used to access reply_buffer[] at discrete positions without making it obvious they were relying on this. These ones have been replaced by their offset in the reply buffer to make these accesses more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-11-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand the R/W / format command macrosWilly Tarreau1-96/+98
Various macros were used to access raw_cmd for R/W or format commands without making it obvious that raw_cmd->cmd[] was used. Let's expand the macros to make this more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-10-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand macro DRWEWilly Tarreau1-9/+7
This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-9-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand macro DRSWilly Tarreau1-54/+61
This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-8-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand macro DPWilly Tarreau1-37/+47
This macro doesn't bring much value and only slightly obfuscates the code by silently using global variable "current_drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-7-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand macro UDRWEWilly Tarreau1-6/+4
This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-6-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand macro UDRSWilly Tarreau1-79/+83
This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-5-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand macro UDPWilly Tarreau1-75/+77
This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-4-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand macro UFDCSWilly Tarreau1-5/+4
This macro doesn't bring much value and only slightly obfuscates the code by silently using local variable "drive", let's expand it. Link: https://lore.kernel.org/r/20200224212352.8640-3-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-16floppy: cleanup: expand macro FDCSWilly Tarreau1-92/+91
Macro FDCS silently uses identifier "fdc" which may be either the global one or a local one. Let's expand the macro to make this more obvious. Link: https://lore.kernel.org/r/20200224212352.8640-2-w@1wt.eu Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-12null_blk: describe the usage of fault injection paramDongli Zhang1-0/+7
As null_blk is a very good start point to test block layer, this patch adds description and comments to 'timeout', 'requeue' and 'init_hctx' to explain how to use fault injection with null_blk. The nvme has similar with nvme_core.fail_request in the form of comment. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-12null_blk: fix spurious IO errors after failed past-wp accessAlexey Dobriyan1-0/+2
Steps to reproduce: BLKRESETZONE zone 0 // force EIO pwrite(fd, buf, 4096, 4096); [issue more IO including zone ioctls] It will start failing randomly including IO to unrelated zones because of ->error "reuse". Trigger can be partition detection as well if test is not run immediately which is even more entertaining. The fix is of course to clear ->error where necessary. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-12nbd: requeue command if the soecket is changedHou Pu1-0/+10
In commit 2da22da5734 (nbd: fix zero cmd timeout handling v2), it is allowed to reset timer when it fires if tag_set.timeout is set to zero. If the server is shutdown and a new socket is reconfigured, the request should be requeued to be processed by new server instead of waiting for response from the old one. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-12nbd: enable replace socket if only one connection is configuredHou Pu1-8/+9
Nbd server with multiple connections could be upgraded since 560bc4b (nbd: handle dead connections). But if only one conncection is configured, after we take down nbd server, all inflight IO would finally timeout and return error. We could requeue them like what we do with multiple connections and wait for new socket in submit path. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>