aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/block/blk-zoned.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2024-02-22block: Do not include rbtree.h in blk-zoned.cDamien Le Moal1-1/+0
The block zone code does not use RB-tree. So remove the include of linux/rbtree.h as it is not needed. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240222131724.1803520-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-12block: remove gfp_flags from blkdev_zone_mgmtJohannes Thumshirn1-11/+8
Now that all callers pass in GFP_KERNEL to blkdev_zone_mgmt() and use memalloc_no{io,fs}_{save,restore}() to define the allocation scope, we can drop the gfp_mask parameter from blkdev_zone_mgmt() as well as blkdev_zone_reset_all() and blkdev_zone_reset_all_emulated(). Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20240128-zonefs_nofs-v3-5-ae3b7c8def61@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-01-08block: Treat sequential write preferred zone type as invalidDamien Le Moal1-1/+1
With the removal of the support for host-aware zoned devices, blk_revalidate_zone_cb() should never see the zone type BLK_ZONE_TYPE_SEQWRITE_PREF (sequential write preffered zones). Treat this zone type as being invalid. Fixes: 7437bb73f087 ("block: remove support for the host aware zone model") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240107072212.1071080-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-01-08block: remove disk_clear_zonedChristoph Hellwig1-21/+0
disk_clear_zoned is unused now that the last warts of the host-aware model support in sd are gone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20231228075141.362560-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-20block: export disk_clear_zoned()Jens Axboe1-0/+1
A previous commit split disk_set_zoned(..., bool) into not taking an argument for whether to set or clear, and instead added disk_clear_zoned() as the counterpart. However, that commit neglected to export the new symbol, causing failures for modular drivers that used it. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: d73e93b4dfab ("block: simplify disk_set_zoned") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-12-19block: simplify disk_set_zonedChristoph Hellwig1-1/+2
Only use disk_set_zoned to actually enable zoned device support. For clearing it, call disk_clear_zoned, which is renamed from disk_clear_zone_settings and now directly clears the zoned flag as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20231217165359.604246-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-11Merge branch '6.5/scsi-staging' into 6.5/scsi-fixesMartin K. Petersen1-36/+50
Pull in the currently staged SCSI fixes for 6.5. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-07-05scsi: block: Improve checks in blk_revalidate_disk_zones()Damien Le Moal1-36/+50
blk_revalidate_disk_zones() implements checks of the zones of a zoned block device, verifying that the zone size is a power of 2 number of sectors, that all zones (except possibly the last one) have the same size and that zones cover the entire addressing space of the device. While these checks are appropriate to verify that well tested hardware devices have an adequate zone configurations, they lack in certain areas which may result in issues with emulated devices implemented with user drivers such as ublk or tcmu. Specifically, this function does not check if the device driver indicated support for the mandatory zone append writes, that is, if the device max_zone_append_sectors queue limit is set to a non-zero value. Additionally, invalid zones such as a zero length zone with a start sector equal to the device capacity will not be detected and result in out of bounds use of the zone bitmaps prepared with the callback function blk_revalidate_zone_cb(). Improve blk_revalidate_disk_zones() to address these inadequate checks, relying on the fact that all device drivers supporting zoned block devices must set the device zone size (chunk_sectors queue limit) and the max_zone_append_sectors queue limit before executing this function. The check for a non-zero max_zone_append_sectors value is done in blk_revalidate_disk_zones() before executing the zone report. The zone report callback function blk_revalidate_zone_cb() is also modified to add a check that a zone start is below the device capacity. The check that the zone size is a power of 2 number of sectors is moved to blk_revalidate_disk_zones() as the zone size is already known. Similarly, the number of zones of the device can be calculated in blk_revalidate_disk_zones() before executing the zone report. The kdoc comment for blk_revalidate_disk_zones() is also updated to mention that device drivers must set the device zone size and the max_zone_append_sectors queue limit before calling this function. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230703024812.76778-6-dlemoal@kernel.org Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-06-12block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig1-4/+4
The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12block: remove unused fmode_t arguments from ioctl handlersChristoph Hellwig1-2/+2
A few ioctl handlers have fmode_t arguments that are entirely unused, remove them. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20230608110258.189493-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-18block: Introduce blk_rq_is_seq_zoned_write()Bart Van Assche1-4/+1
Introduce the function blk_rq_is_seq_zoned_write(). This function will be used in later patches to preserve the order of zoned writes that require write serialization. This patch includes an optimization: instead of using rq->q->disk->part0->bd_queue to check whether or not the queue is associated with a zoned block device, use rq->q->disk->queue. Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230517174230.897144-6-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-18block: Simplify blk_req_needs_zone_write_lock()Bart Van Assche1-3/+0
Remove the blk_rq_is_passthrough() check because it is redundant: blk_req_needs_zone_write_lock() also calls bdev_op_is_zoned_write() and the latter function returns false for pass-through requests. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230517174230.897144-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-21block: remove more NULL checks after bdev_get_queue()Juhyung Park1-10/+0
bdev_get_queue() never returns NULL. Several commits [1][2] have been made before to remove such superfluous checks, but some still remained. For places where bdev_get_queue() is called solely for NULL checks, it is removed entirely. [1] commit ec9fd2a13d74 ("blk-lib: don't check bdev_get_queue() NULL check") [2] commit fea127b36c93 ("block: remove superfluous check for request queue in bdev_is_zoned()") Signed-off-by: Juhyung Park <qkrwngud825@gmail.com> Reviewed-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230203024029.48260-1-qkrwngud825@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-01-29block: add a new helper bdev_{is_zone_start, offset_from_zone_start}Pankaj Raghav1-2/+2
Instead of open coding to check for zone start, add a helper to improve readability and store the logic in one place. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230110143635.77300-3-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-29block: adapt blk_mq_plug() to not plug for writes that require a zone lockPankaj Raghav1-6/+3
The current implementation of blk_mq_plug() disables plugging for all operations that involves a transfer to the device as we just check if the last bit in op_is_write() function. Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and REQ_OP_WRITE_ZEROS as they might require a zone lock. Suggested-by: Christoph Hellwig <hch@lst.de> Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220929074745.103073-2-p.raghav@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-14treewide: Rename enum req_opf into enum req_opBart Van Assche1-4/+3
The type name enum req_opf is misleading since it suggests that values of this type include both an operation type and flags. Since values of this type represent an operation only, change the type name into enum req_op. Convert the enum req_op documentation into kernel-doc format. Move a few definitions such that the enum req_op documentation occurs just above the enum req_op definition. The name "req_opf" was introduced by commit ef295ecf090d ("block: better op and flags encoding"). Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: move zone related fields to struct gendiskChristoph Hellwig1-24/+21
Move the zone related fields that are currently stored in struct request_queue to struct gendisk as these are part of the highlevel block layer API and are only used for non-passthrough I/O that requires the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: use bdev based helpers in blkdev_zone_mgmt{,all}Christoph Hellwig1-4/+4
Use the bdev based helpers instead of the queue based ones to clean up the code a bit and prepare for storing all zone related fields in struct gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: replace blkdev_nr_zones with bdev_nr_zonesChristoph Hellwig1-7/+8
Pass a block_device instead of a request_queue as that is what most callers have at hand. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: pass a gendisk to blk_queue_free_zone_bitmapsChristoph Hellwig1-3/+5
Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: pass a gendisk to blk_queue_clear_zone_settingsChristoph Hellwig1-1/+3
Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-06block: use bdev_is_zoned instead of open coding itChristoph Hellwig1-5/+4
Use bdev_is_zoned in all places where a block_device is available instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220706070350.1703384-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-24Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-1/+0
Pull SCSI updates from James Bottomley: "This series consists of the usual driver updates (qla2xxx, pm8001, libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates and bug fixes. The high blast radius core update is the removal of write same, which affects block and several non-SCSI devices. The other big change, which is more local, is the removal of the SCSI pointer" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits) scsi: scsi_ioctl: Drop needless assignment in sg_io() scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn() scsi: lpfc: Copyright updates for 14.2.0.0 patches scsi: lpfc: Update lpfc version to 14.2.0.0 scsi: lpfc: SLI path split: Refactor BSG paths scsi: lpfc: SLI path split: Refactor Abort paths scsi: lpfc: SLI path split: Refactor SCSI paths scsi: lpfc: SLI path split: Refactor CT paths scsi: lpfc: SLI path split: Refactor misc ELS paths scsi: lpfc: SLI path split: Refactor VMID paths scsi: lpfc: SLI path split: Refactor FDISC paths scsi: lpfc: SLI path split: Refactor LS_RJT paths scsi: lpfc: SLI path split: Refactor LS_ACC paths scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4 scsi: lpfc: SLI path split: Refactor lpfc_iocbq scsi: lpfc: Use kcalloc() ...
2022-02-22scsi: block: Remove REQ_OP_WRITE_SAME supportChristoph Hellwig1-1/+0
No more users of REQ_OP_WRITE_SAME or drivers implementing it are left, so remove the infrastructure. [mkp: fold in and tweak sysfs reporting fix] Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-02block: pass a block_device and opf to bio_initChristoph Hellwig1-4/+1
Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02block: pass a block_device and opf to blk_next_bioChaitanya Kulkarni1-6/+3
All callers need to set the block_device and operation, so lift that into the common code. Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220124091107.642561-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-11block: Hold invalidate_lock in BLKRESETZONE ioctlShin'ichiro Kawasaki1-10/+5
When BLKRESETZONE ioctl and data read race, the data read leaves stale page cache. The commit e5113505904e ("block: Discard page cache of zone reset target range") added page cache truncation to avoid stale page cache after the ioctl. However, the stale page cache still can be read during the reset zone operation for the ioctl. To avoid the stale page cache completely, hold invalidate_lock of the block device file mapping. Fixes: e5113505904e ("block: Discard page cache of zone reset target range") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-24blk-zoned: allow BLKREPORTZONE without CAP_SYS_ADMINNiklas Cassel1-3/+0
A user space process should not need the CAP_SYS_ADMIN capability set in order to perform a BLKREPORTZONE ioctl. Getting the zone report is required in order to get the write pointer. Neither read() nor write() requires CAP_SYS_ADMIN, so it is reasonable that a user space process that can read/write from/to the device, also can get the write pointer. (Since e.g. writes have to be at the write pointer.) Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: stable@vger.kernel.org # v4.10+ Link: https://lore.kernel.org/r/20210811110505.29649-3-Niklas.Cassel@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-08-24blk-zoned: allow zone management send operations without CAP_SYS_ADMINNiklas Cassel1-3/+0
Zone management send operations (BLKRESETZONE, BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE) should be allowed under the same permissions as write(). (write() does not require CAP_SYS_ADMIN). Additionally, other ioctls like BLKSECDISCARD and BLKZEROOUT only check if the fd was successfully opened with FMODE_WRITE. (They do not require CAP_SYS_ADMIN). Currently, zone management send operations require both CAP_SYS_ADMIN and that the fd was successfully opened with FMODE_WRITE. Remove the CAP_SYS_ADMIN requirement, so that zone management send operations match the access control requirement of write(), BLKSECDISCARD and BLKZEROOUT. Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Aravind Ramesh <aravind.ramesh@wdc.com> Reviewed-by: Adam Manzanares <a.manzanares@samsung.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: stable@vger.kernel.org # v4.10+ Link: https://lore.kernel.org/r/20210811110505.29649-2-Niklas.Cassel@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-04block: improve handling of all zones reset operationDamien Le Moal1-27/+92
SCSI, ZNS and null_blk zoned devices support resetting all zones using a single command (REQ_OP_ZONE_RESET_ALL), as indicated using the device request queue flag QUEUE_FLAG_ZONE_RESETALL. This flag is not set for device mapper targets creating zoned devices. In this case, a user request for resetting all zones of a device is processed in blkdev_zone_mgmt() by issuing a REQ_OP_ZONE_RESET operation for each zone of the device. This leads to different behaviors of the BLKRESETZONE ioctl() depending on the target device support for the reset all operation. E.g. blkzone reset /dev/sdX will reset all zones of a SCSI device using a single command that will ignore conventional, read-only or offline zones. But a dm-linear device including conventional, read-only or offline zones cannot be reset in the same manner as some of the single zone reset operations issued by blkdev_zone_mgmt() will fail. E.g.: blkzone reset /dev/dm-Y blkzone: /dev/dm-0: BLKRESETZONE ioctl failed: Remote I/O error To simplify applications and tools development, unify the behavior of the all-zone reset operation by modifying blkdev_zone_mgmt() to not issue a zone reset operation for conventional, read-only and offline zones, thus mimicking what an actual reset-all device command does on a device supporting REQ_OP_ZONE_RESET_ALL. This emulation is done using the new function blkdev_zone_reset_all_emulated(). The zones needing a reset are identified using a bitmap that is initialized using a zone report. Since empty zones do not need a reset, also ignore these zones. The function blkdev_zone_reset_all() is introduced for block devices natively supporting reset all operations. blkdev_zone_mgmt() is modified to call either function to execute an all zone reset request. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> [hch: split into multiple functions] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2021-04-07blk-zoned: Remove the definition of blk_zone_start()Bart Van Assche1-8/+0
Commit e76239a3748c ("block: add a report_zones method") removed the last blk_zone_start() call. Hence also remove the definition of this function. Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210406200820.15180-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-11block: Discard page cache of zone reset target rangeShin'ichiro Kawasaki1-2/+36
When zone reset ioctl and data read race for a same zone on zoned block devices, the data read leaves stale page cache even though the zone reset ioctl zero clears all the zone data on the device. To avoid non-zero data read from the stale page cache after zone reset, discard page cache of reset target zones in blkdev_zone_mgmt_ioctl(). Introduce the helper function blkdev_truncate_zone_range() to discard the page cache. Ensure the page cache discarded by calling the helper function before and after zone reset in same manner as fallocate does. This patch can be applied back to the stable kernel version v5.10.y. Rework is needed for older stable kernels. Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 3ed05a987e0f ("blk-zoned: implement ioctls") Cc: <stable@vger.kernel.org> # 5.10+ Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210311072546.678999-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-10block: Fix REQ_OP_ZONE_RESET_ALL handlingDamien Le Moal1-1/+1
Similarly to a single zone reset operation (REQ_OP_ZONE_RESET), execute REQ_OP_ZONE_RESET_ALL operations with REQ_SYNC set. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10block: introduce blk_queue_clear_zone_settings()Damien Le Moal1-0/+17
Introduce the internal function blk_queue_clear_zone_settings() to cleanup all limits and resources related to zoned block devices. This new function is called from blk_queue_set_zoned() when a disk zoned model is set to BLK_ZONED_NONE. This particular case can happens when a partition is created on a host-aware scsi disk. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07block: Improve blk_revalidate_disk_zones() checksDamien Le Moal1-1/+15
Improves the checks on the zones of a zoned block device done in blk_revalidate_disk_zones() by making sure that the device report_zones method did report at least one zone and that the zones reported exactly cover the entire disk capacity, that is, that there are no missing zones at the end of the disk sector range. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-08-03block: don't do revalidate zones on invalid devicesJohannes Thumshirn1-0/+3
When we loose a device for whatever reason while (re)scanning zones, we trip over a NULL pointer in blk_revalidate_zone_cb, like in the following log: sd 0:0:0:0: [sda] 3418095616 4096-byte logical blocks: (14.0 TB/12.7 TiB) sd 0:0:0:0: [sda] 52156 zones of 65536 logical blocks sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 37 00 00 08 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] REPORT ZONES start lba 1065287680 failed sd 0:0:0:0: [sda] REPORT ZONES: Result: hostbyte=0x00 driverbyte=0x08 sd 0:0:0:0: [sda] Sense Key : 0xb [current] sd 0:0:0:0: [sda] ASC=0x0 ASCQ=0x6 sda: failed to revalidate zones sd 0:0:0:0: [sda] 0 4096-byte logical blocks: (0 B/0 B) sda: detected capacity change from 14000519643136 to 0 ================================================================== BUG: KASAN: null-ptr-deref in blk_revalidate_zone_cb+0x1b7/0x550 Write of size 8 at addr 0000000000000010 by task kworker/u4:1/58 CPU: 1 PID: 58 Comm: kworker/u4:1 Not tainted 5.8.0-rc1 #692 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 Workqueue: events_unbound async_run_entry_fn Call Trace: dump_stack+0x7d/0xb0 ? blk_revalidate_zone_cb+0x1b7/0x550 kasan_report.cold+0x5/0x37 ? blk_revalidate_zone_cb+0x1b7/0x550 check_memory_region+0x145/0x1a0 blk_revalidate_zone_cb+0x1b7/0x550 sd_zbc_parse_report+0x1f1/0x370 ? blk_req_zone_write_trylock+0x200/0x200 ? sectors_to_logical+0x60/0x60 ? blk_req_zone_write_trylock+0x200/0x200 ? blk_req_zone_write_trylock+0x200/0x200 sd_zbc_report_zones+0x3c4/0x5e0 ? sd_dif_config_host+0x500/0x500 blk_revalidate_disk_zones+0x231/0x44d ? _raw_write_lock_irqsave+0xb0/0xb0 ? blk_queue_free_zone_bitmaps+0xd0/0xd0 sd_zbc_read_zones+0x8cf/0x11a0 sd_revalidate_disk+0x305c/0x64e0 ? __device_add_disk+0x776/0xf20 ? read_capacity_16.part.0+0x1080/0x1080 ? blk_alloc_devt+0x250/0x250 ? create_object.isra.0+0x595/0xa20 ? kasan_unpoison_shadow+0x33/0x40 sd_probe+0x8dc/0xcd2 really_probe+0x20e/0xaf0 __driver_attach_async_helper+0x249/0x2d0 async_run_entry_fn+0xbe/0x560 process_one_work+0x764/0x1290 ? _raw_read_unlock_irqrestore+0x30/0x30 worker_thread+0x598/0x12f0 ? __kthread_parkme+0xc6/0x1b0 ? schedule+0xed/0x2c0 ? process_one_work+0x1290/0x1290 kthread+0x36b/0x440 ? kthread_create_worker_on_cpu+0xa0/0xa0 ret_from_fork+0x22/0x30 ================================================================== When the device is already gone we end up with the following scenario: The device's capacity is 0 and thus the number of zones will be 0 as well. When allocating the bitmap for the conventional zones, we then trip over a NULL pointer. So if we encounter a zoned block device with a 0 capacity, don't dare to revalidate the zones sizes. Fixes: 6c6b35491422 ("block: set the zone size in blk_revalidate_disk_zones atomically") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-08block: add capacity field to zone descriptorsMatias Bjørling1-0/+1
In the zoned storage model, the sectors within a zone are typically all writeable. With the introduction of the Zoned Namespace (ZNS) Command Set in the NVM Express organization, the model was extended to have a specific writeable capacity. Extend the zone descriptor data structure with a zone capacity field to indicate to the user how many sectors in a zone are writeable. Introduce backward compatibility in the zone report ioctl by extending the zone report header data structure with a flags field to indicate if the capacity field is available. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Javier González <javier.gonz@samsung.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-05-12block: Modify revalidate zonesDamien Le Moal1-1/+8
Modify the interface of blk_revalidate_disk_zones() to add an optional driver callback function that a driver can use to extend processing done during zone revalidation. The callback, if defined, is executed with the device request queue frozen, after all zones have been inspected. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-12block: introduce blk_req_zone_write_trylockJohannes Thumshirn1-0/+14
Introduce blk_req_zone_write_trylock(), which either grabs the write-lock for a sequential zone or returns false, if the zone is already locked. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-30Merge tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+32
Pull block driver updates from Jens Axboe: - floppy driver cleanup series from Willy - NVMe updates and fixes (Various) - null_blk trace improvements (Chaitanya) - bcache fixes (Coly) - md fixes (via Song) - loop block size change optimizations (Martijn) - scnprintf() use (Takashi) * tag 'for-5.7/drivers-2020-03-29' of git://git.kernel.dk/linux-block: (81 commits) null_blk: add trace in null_blk_zoned.c null_blk: add tracepoint helpers for zoned mode block: add a zone condition debug helper nvme: cleanup namespace identifier reporting in nvme_init_ns_head nvme: rename __nvme_find_ns_head to nvme_find_ns_head nvme: refactor nvme_identify_ns_descs error handling nvme-tcp: Add warning on state change failure at nvme_tcp_setup_ctrl nvme-rdma: Add warning on state change failure at nvme_rdma_setup_ctrl nvme: Fix controller creation races with teardown flow nvme: Make nvme_uninit_ctrl symmetric to nvme_init_ctrl nvme: Fix ctrl use-after-free during sysfs deletion nvme-pci: Re-order nvme_pci_free_ctrl nvme: Remove unused return code from nvme_delete_ctrl_sync nvme: Use nvme_state_terminal helper nvme: release ida resources nvme: Add compat_ioctl handler for NVME_IOCTL_SUBMIT_IO nvmet-tcp: optimize tcp stack TX when data digest is used nvme-fabrics: Use scnprintf() for avoiding potential buffer overflow nvme-multipath: do not reset on unknown status nvmet-rdma: allocate RW ctxs according to mdts ...
2020-03-27block: add a zone condition debug helperChaitanya Kulkarni1-0/+32
Add a helper to stringify the zone conditions. We use this helper in the next patch to track zone conditions in tracepoints. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-12block, zoned: fix integer overflow with BLKRESETZONE et alAlexey Dobriyan1-1/+1
Check for overflow in addition before checking for end-of-block-device. Steps to reproduce: #define _GNU_SOURCE 1 #include <sys/ioctl.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> typedef unsigned long long __u64; struct blk_zone_range { __u64 sector; __u64 nr_sectors; }; #define BLKRESETZONE _IOW(0x12, 131, struct blk_zone_range) int main(void) { int fd = open("/dev/nullb0", O_RDWR|O_DIRECT); struct blk_zone_range zr = {4096, 0xfffffffffffff000ULL}; ioctl(fd, BLKRESETZONE, &zr); return 0; } BUG: KASAN: null-ptr-deref in submit_bio_wait+0x74/0xe0 Write of size 8 at addr 0000000000000040 by task a.out/1590 CPU: 8 PID: 1590 Comm: a.out Not tainted 5.6.0-rc1-00019-g359c92c02bfa #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 Call Trace: dump_stack+0x76/0xa0 __kasan_report.cold+0x5/0x3e kasan_report+0xe/0x20 submit_bio_wait+0x74/0xe0 blkdev_zone_mgmt+0x26f/0x2a0 blkdev_zone_mgmt_ioctl+0x14b/0x1b0 blkdev_ioctl+0xb28/0xe60 block_ioctl+0x69/0x80 ksys_ioctl+0x3af/0xa50 Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-01-09block: mark zone-mgmt bios with REQ_SYNCChaitanya Kulkarni1-1/+1
In the current implementation, final zone-mgmt request is issued with submit_bio_wait() which marks the bio REQ_SYNC. This is needed since immediate action is expected for zone-mgmt requests as these are blocking operations. This also bypasses the scheduler in the blk_mq_make_request() and dispatches the request directly into the hw ctx. This patch marks all the chained bios REQ_SYNC so that we can have above-mentioned behavior for non-final bios also. Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: set the zone size in blk_revalidate_disk_zones atomicallyChristoph Hellwig1-26/+33
The current zone revalidation code has a major problem in that it doesn't update the zone size and q->nr_zones atomically, leading to a short window where an out of bounds access to the zone arrays is possible. To fix this move the setting of the zone size into the crticial sections blk_revalidate_disk_zones so that it gets updated together with the zone bitmaps and q->nr_zones. This also slightly simplifies the caller as it deducts the zone size from the report_zones. This change also allows to check for a power of two zone size in generic code. Reported-by: Hans Holmberg <hans@owltronix.com> Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: don't handle bio based drivers in blk_revalidate_disk_zonesChristoph Hellwig1-11/+5
bio based drivers only need to update q->nr_zones. Do that manually instead of overloading blk_revalidate_disk_zones to keep that function simpler for the next round of changes that will rely even more on the request based functionality. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: allocate the zone bitmaps lazilyChristoph Hellwig1-33/+32
Allocate the conventional zone bitmap and the sequential zone locking bitmap only when we find a zone of the respective type. This avoids wasting memory on the conventional zone bitmap for devices that only have sequential zones, and will also prepare for other future changes. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: replace seq_zones_bitmap with conv_zones_bitmapChristoph Hellwig1-9/+9
Invert the meaning of seq_zones_bitmap by keeping a bitmap of conventional zones. This allows not having a bitmap for devices that do not have conventional zones. Reviewed-by: Javier González <javier@javigon.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: simplify blkdev_nr_zonesChristoph Hellwig1-18/+8
Simplify the arguments to blkdev_nr_zones by passing a gendisk instead of the block_device and capacity. This also removes the need for __blkdev_nr_zones as all callers are outside the fast path and can deal with the additional branch. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03block: remove the empty line at the end of blk-zoned.cChristoph Hellwig1-1/+0
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-12block: rework zone reportingChristoph Hellwig1-156/+97
Avoid the need to allocate a potentially large array of struct blk_zone in the block layer by switching the ->report_zones method interface to a callback model. Now the caller simply supplies a callback that is executed on each reported zone, and private data for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>