aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block/xen-blkfront.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-01-29xen/blkfront: realloc ring info in blkif_resumeBob Liu1-29/+45
Need to reallocate ring info in the resume path, because info->rinfo was freed in blkif_free(). And 'multi-queue-max-queues' backend reports may have been changed. Signed-off-by: Bob Liu <bob.liu@oracle.com> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: Fix crash if backend doesn't follow the right states.Konrad Rzeszutek Wilk1-4/+8
We have split the setting up of all the resources in two steps: 1) talk_to_blkback - which figures out the num_ring_pages (from the default value of zero), sets up shadow and so 2) blkfront_connect - does the real part of filling out the internal structures. The problem is if we bypass the 1) step and go straight to 2) and call blkfront_setup_indirect where we use the macro BLK_RING_SIZE - which returns an negative value (because sz is zero - since num_ring_pages is zero - since it has never been set). We can fix this by making sure that we always have called talk_to_blkback before going to blkfront_connect. Or we could set in blkfront_probe info->nr_ring_pages = 1 to have a default value. But that looks odd - as we haven't actually negotiated any ring size. This patch changes XenbusStateConnected state to detect if we haven't done the initial handshake - and if so continue on as if were in XenbusStateInitWait state. We also roll the error recovery (freeing the structure) into talk_to_blkback error path - which is safe since that function is only called from blkback_changed. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: Handle non-indirect grant with 64KB pagesJulien Grall1-16/+212
The minimal size of request in the block framework is always PAGE_SIZE. It means that when 64KB guest is support, the request will at least be 64KB. Although, if the backend doesn't support indirect descriptor (such as QDISK in QEMU), a ring request is only able to accommodate 11 segments of 4KB (i.e 44KB). The current frontend is assuming that an I/O request will always fit in a ring request. This is not true any more when using 64KB page granularity and will therefore crash during boot. On ARM64, the ABI is completely neutral to the page granularity used by the domU. The guest has the choice between different page granularity supported by the processors (for instance on ARM64: 4KB, 16KB, 64KB). This can't be enforced by the hypervisor and therefore it's possible to run guests using different page granularity. So we can't mandate the block backend to support indirect descriptor when the frontend is using 64KB page granularity and have to fix it properly in the frontend. The solution exposed below is based on modifying directly the frontend guest rather than asking the block framework to support smaller size (i.e < PAGE_SIZE). This is because the change is the block framework are not trivial as everything seems to relying on a struct *page (see [1]). Although, it may be possible that someone succeed to do it in the future and we would therefore be able to use it. Given that a block request may not fit in a single ring request, a second request is introduced for the data that cannot fit in the first one. This means that the second ring request should never be used on Linux if the page size is smaller than 44KB. To achieve the support of the extra ring request, the block queue size is divided by two. Therefore, the ring will always contain enough space to accommodate 2 ring requests. While this will reduce the overall performance, it will make the implementation more contained. The way forward to get better performance is to implement in the backend either indirect descriptor or multiple grants ring. Note that the parameters blk_queue_max_* helpers haven't been updated. The block code will set the mimimum size supported and we may be able to support directly any change in the block framework that lower down the minimal size of a request. [1] http://lists.xen.org/archives/html/xen-devel/2015-08/msg02200.html Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen-blkfront: Introduce blkif_ring_get_requestJulien Grall1-11/+19
The code to get a request is always the same. Therefore we can factorize it in a single function. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blocks: Return -EXX instead of -1Konrad Rzeszutek Wilk1-2/+2
Lets return sensible values instead of -1. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: correct setting for xen_blkif_max_ring_orderPeng Fan1-1/+1
According to this piece code: " pr_info("Invalid max_ring_order (%d), will use default max: %d.\n", xen_blkif_max_ring_order, XENBUS_MAX_RING_GRANT_ORDER); " if xen_blkif_max_ring_order is bigger that XENBUS_MAX_RING_GRANT_ORDER, need to set xen_blkif_max_ring_order using XENBUS_MAX_RING_GRANT_ORDER, but not 0. Signed-off-by: Peng Fan <van.freenix@gmail.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: "Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: make persistent grants pool per-queueBob Liu1-67/+43
Make persistent grants per-queue/ring instead of per-device, so that we can drop the 'dev_lock' and get better scalability. Test was done based on null_blk driver: dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk" domu: v4.2-rc8 16vcpus 10GB [test] rw=read direct=1 ioengine=libaio bs=4k time_based runtime=30 filename=/dev/xvdb numjobs=16 iodepth=64 iodepth_batch=64 iodepth_batch_complete=64 group_reporting Queues: 1 4 8 16 Iops orig(k): 810 1064 780 700 Iops patched(k): 810 1230(~20%) 1024(~20%) 850(~20%) Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: Remove duplicate setting of ->xbdev.Bob Liu1-1/+0
We do the same exact operations a bit earlier in the function. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors.Konrad Rzeszutek Wilk1-7/+6
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: negotiate number of queues/rings to be used with backendBob Liu1-41/+119
The max number of hardware queues for xen/blkfront is set by parameter 'max_queues'(default 4), while it is also capped by the max value that the xen/blkback exposes through XenStore key 'multi-queue-max-queues'. The negotiated number is the smaller one and would be written back to xenstore as "multi-queue-num-queues", blkback needs to read this negotiated number. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: split per device io_lockBob Liu1-26/+47
After patch "xen/blkfront: separate per ring information out of device info", per-ring data is protected by a per-device lock ('io_lock'). This is not a good way and will effect the scalability, so introduce a per-ring lock ('ring_lock'). The old 'io_lock' is renamed to 'dev_lock' which protects the ->grants list and ->persistent_gnts_c which are shared by all rings. Note that in 'blkfront_probe' the 'blkfront_info' is setup via kzalloc so setting ->persistent_gnts_c to zero is not needed. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: pseudo support for multi hardware queues/ringsBob Liu1-145/+198
Preparatory patch for multiple hardware queues (rings). The number of rings is unconditionally set to 1, larger number will be enabled in patch "xen/blkfront: negotiate number of queues/rings to be used with backend" so as to make review easier. Note that blkfront_gather_backend_features does not call blkfront_setup_indirect anymore (as that needs to be done per ring). That means that in blkif_recover/blkif_connect we have to do it in a loop (bounded by nr_rings). Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-01-04xen/blkfront: separate per ring information out of device infoBob Liu1-162/+197
Split per ring information to a new structure "blkfront_ring_info". A ring is the representation of a hardware queue, every vbd device can associate with one or more rings depending on how many hardware queues/rings to be used. This patch is a preparation for supporting real multi hardware queues/rings. We also add a backpointer to 'struct blkfront_info' (dev_info) which is not needed (we could use containers_of) but further patch ("xen/blkfront: pseudo support for multi hardware queues/rings") will make allocation of 'blkfront_ring_info' dynamic. Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-11-04Merge tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-200/+360
Pull xen updates from David Vrabel: - Improve balloon driver memory hotplug placement. - Use unpopulated hotplugged memory for foreign pages (if supported/enabled). - Support 64 KiB guest pages on arm64. - CPU hotplug support on arm/arm64. * tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (44 commits) xen: fix the check of e_pfn in xen_find_pfn_range x86/xen: add reschedule point when mapping foreign GFNs xen/arm: don't try to re-register vcpu_info on cpu_hotplug. xen, cpu_hotplug: call device_offline instead of cpu_down xen/arm: Enable cpu_hotplug.c xenbus: Support multiple grants ring with 64KB xen/grant-table: Add an helper to iterate over a specific number of grants xen/xenbus: Rename *RING_PAGE* to *RING_GRANT* xen/arm: correct comment in enlighten.c xen/gntdev: use types from linux/types.h in userspace headers xen/gntalloc: use types from linux/types.h in userspace headers xen/balloon: Use the correct sizeof when declaring frame_list xen/swiotlb: Add support for 64KB page granularity xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb arm/xen: Add support for 64KB page granularity xen/privcmd: Add support for Linux 64KB page granularity net/xen-netback: Make it running on 64KB page granularity net/xen-netfront: Make it running on 64KB page granularity block/xen-blkback: Make it running on 64KB page granularity block/xen-blkfront: Make it running on 64KB page granularity ...
2015-10-23xen/xenbus: Rename *RING_PAGE* to *RING_GRANT*Julien Grall1-5/+5
Linux may use a different page size than the size of grant. So make clear that the order is actually in number of grant. Signed-off-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23block/xen-blkfront: Make it running on 64KB page granularityJulien Grall1-111/+213
The PV block protocol is using 4KB page granularity. The goal of this patch is to allow a Linux using 64KB page granularity using block device on a non-modified Xen. The block API is using segment which should at least be the size of a Linux page. Therefore, the driver will have to break the page in chunk of 4K before giving the page to the backend. When breaking a 64KB segment in 4KB chunks, it is possible that some chunks are empty. As the PV protocol always require to have data in the chunk, we have to count the number of Xen page which will be in use and avoid sending empty chunks. Note that, a pre-defined number of grants are reserved before preparing the request. This pre-defined number is based on the number and the maximum size of the segments. If each segment contains a very small amount of data, the driver may reserve too many grants (16 grants is reserved per segment with 64KB page granularity). Furthermore, in the case of persistent grants we allocate one Linux page per grant although only the first 4KB of the page will be effectively in use. This could be improved by sharing the page with multiple grants. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23block/xen-blkfront: split get_grant in 2Julien Grall1-29/+59
Prepare the code to support 64KB page granularity. The first implementation will use a full Linux page per indirect and persistent grant. When non-persistent grant is used, each page of a bio request may be split in multiple grant. Furthermore, the field page of the grant structure is only used to copy data from persistent grant or indirect grant. Avoid to set it for other use case as it will have no meaning given the page will be split in multiple grant. Provide 2 functions, to setup indirect grant, the other for bio page. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23block/xen-blkfront: Store a page rather a pfn in the grant structureJulien Grall1-20/+19
All the usage of the field pfn are done using the same idiom: pfn_to_page(grant->pfn) This will return always the same page. Store directly the page in the grant to clean up the code. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23block/xen-blkfront: Split blkif_queue_request in 2Julien Grall1-124/+153
Currently, blkif_queue_request has 2 distinct execution path: - Send a discard request - Send a read/write request The function is also allocating grants to use for generating the request. Although, this is only used for read/write request. Rather than having a function with 2 distinct execution path, separate the function in 2. This will also remove one level of tabulation. Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-07Merge branch 'stable/for-jens-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linusJens Axboe1-1/+2
Konrad writes: Please git pull an update branch to your 'for-4.3/drivers' branch (which oddly I don't see does not have the previous pull?) git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-4.3 which has two fixes - one where we use the Xen blockfront EFI driver and don't release all the requests, the other if the allocation of resources for a particular state failed - we would go back 'Closing' and assume that an structure would be allocated while in fact it may not be - and crash.
2015-10-07xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)Cathy Avery1-1/+2
xen-blkfront will crash if the check to talk_to_blkback() in blkback_changed()(XenbusStateInitWait) returns an error. The driver data is freed and info is set to NULL. Later during the close process via talk_to_blkback's call to xenbus_dev_fatal() the null pointer is passed to and dereference in blkfront_closing. CC: stable@vger.kernel.org Signed-off-by: Cathy Avery <cathy.avery@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-10-01blk-mq: fix racy updates of rq->errorsChristoph Hellwig1-9/+10
blk_mq_complete_request may be a no-op if the request has already been completed by others means (e.g. a timeout or cancellation), but currently drivers have to set rq->errors before calling blk_mq_complete_request, which might leave us with the wrong error value. Add an error parameter to blk_mq_complete_request so that we can defer setting rq->errors until we known we won the race to complete the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-09-10Merge tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-3/+3
Pull xen terminology fixes from David Vrabel: "Use the correct GFN/BFN terms more consistently" * tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn xen/privcmd: Further s/MFN/GFN/ clean-up hvc/xen: Further s/MFN/GFN clean-up video/xen-fbfront: Further s/MFN/GFN clean-up xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn xen: Use correctly the Xen memory terminologies arm/xen: implement correctly pfn_to_mfn xen: Make clear that swiotlb and biomerge are dealing with DMA address
2015-09-08Merge tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-86/+60
Pull xen updates from David Vrabel: "Xen features and fixes for 4.3: - Convert xen-blkfront to the multiqueue API - [arm] Support binding event channels to different VCPUs. - [x86] Support > 512 GiB in a PV guests (off by default as such a guest cannot be migrated with the current toolstack). - [x86] PMU support for PV dom0 (limited support for using perf with Xen and other guests)" * tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits) xen: switch extra memory accounting to use pfns xen: limit memory to architectural maximum xen: avoid another early crash of memory limited dom0 xen: avoid early crash of memory limited dom0 arm/xen: Remove helpers which are PV specific xen/x86: Don't try to set PCE bit in CR4 xen/PMU: PMU emulation code xen/PMU: Intercept PMU-related MSR and APIC accesses xen/PMU: Describe vendor-specific PMU registers xen/PMU: Initialization code for Xen PMU xen/PMU: Sysfs interface for setting Xen PMU mode xen: xensyms support xen: remove no longer needed p2m.h xen: allow more than 512 GB of RAM for 64 bit pv-domains xen: move p2m list if conflicting with e820 map xen: add explicit memblock_reserve() calls for special pages mm: provide early_memremap_ro to establish read-only mapping xen: check for initrd conflicting with e820 map xen: check pre-allocated page tables for conflict with memory map xen: check for kernel memory conflicting with memory layout ...
2015-09-08xen: Use correctly the Xen memory terminologiesJulien Grall1-3/+3
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN is meant, I suspect this is because the first support for Xen was for PV. This resulted in some misimplementation of helpers on ARM and confused developers about the expected behavior. For instance, with pfn_to_mfn, we expect to get an MFN based on the name. Although, if we look at the implementation on x86, it's returning a GFN. For clarity and avoid new confusion, replace any reference to mfn with gfn in any helpers used by PV drivers. The x86 code will still keep some reference of pfn_to_mfn which may be used by all kind of guests No changes as been made in the hypercall field, even though they may be invalid, in order to keep the same as the defintion in xen repo. Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a name to close to the KVM function gfn_to_page. Take also the opportunity to simplify simple construction such as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up will come in follow-up patches. [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-02Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-blockLinus Torvalds1-6/+3
Pull core block updates from Jens Axboe: "This first core part of the block IO changes contains: - Cleanup of the bio IO error signaling from Christoph. We used to rely on the uptodate bit and passing around of an error, now we store the error in the bio itself. - Improvement of the above from myself, by shrinking the bio size down again to fit in two cachelines on x86-64. - Revert of the max_hw_sectors cap removal from a revision again, from Jeff Moyer. This caused performance regressions in various tests. Reinstate the limit, bump it to a more reasonable size instead. - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me. Most devices have huge trim limits, which can cause nasty latencies when deleting files. Enable the admin to configure the size down. We will look into having a more sane default instead of UINT_MAX sectors. - Improvement of the SGP gaps logic from Keith Busch. - Enable the block core to handle arbitrarily sized bios, which enables a nice simplification of bio_add_page() (which is an IO hot path). From Kent. - Improvements to the partition io stats accounting, making it faster. From Ming Lei. - Also from Ming Lei, a basic fixup for overflow of the sysfs pending file in blk-mq, as well as a fix for a blk-mq timeout race condition. - Ming Lin has been carrying Kents above mentioned patches forward for a while, and testing them. Ming also did a few fixes around that. - Sasha Levin found and fixed a use-after-free problem introduced by the bio->bi_error changes from Christoph. - Small blk cgroup cleanup from Viresh Kumar" * 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits) blk: Fix bio_io_vec index when checking bvec gaps block: Replace SG_GAPS with new queue limits mask block: bump BLK_DEF_MAX_SECTORS to 2560 Revert "block: remove artifical max_hw_sectors cap" blk-mq: fix race between timeout and freeing request blk-mq: fix buffer overflow when reading sysfs file of 'pending' Documentation: update notes in biovecs about arbitrarily sized bios block: remove bio_get_nr_vecs() fs: use helper bio_add_page() instead of open coding on bi_io_vec block: kill merge_bvec_fn() completely md/raid5: get rid of bio_fits_rdev() md/raid5: split bio for chunk_aligned_read block: remove split code in blkdev_issue_{discard,write_same} btrfs: remove bio splitting and merge_bvec_fn() calls bcache: remove driver private bio splitting code block: simplify bio_add_page() block: make generic_make_request handle arbitrarily sized bios blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL) block: don't access bio->bi_error after bio_put() block: shrink struct bio down to 2 cache lines again ...
2015-08-20xen-blkfront: convert to blk-mq APIsBob Liu1-86/+60
Note: This patch is based on original work of Arianna's internship for GNOME's Outreach Program for Women. Only one hardware queue is used now, so there is no significant performance change The legacy non-mq code is deleted completely which is the same as other drivers like virtio, mtip, and nvme. Also dropped one unnecessary holding of info->io_lock when calling blk_mq_stop_hw_queues(). Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@fb.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-07-29block: add a bi_error field to struct bioChristoph Hellwig1-6/+3
Currently we have two different ways to signal an I/O error on a BIO: (1) by clearing the BIO_UPTODATE flag (2) by returning a Linux errno value to the bi_end_io callback The first one has the drawback of only communicating a single possible error (-EIO), and the second one has the drawback of not beeing persistent when bios are queued up, and are not passed along from child to parent bio in the ever more popular chaining scenario. Having both mechanisms available has the additional drawback of utterly confusing driver authors and introducing bugs where various I/O submitters only deal with one of them, and the others have to add boilerplate code to deal with both kinds of error returns. So add a new bi_error field to store an errno value directly in struct bio and remove the existing mechanisms to clean all this up. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-27Merge branch 'stable/for-jens-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linusJens Axboe1-56/+72
Konrad writes: "There are three bugs that have been found in the xen-blkfront (and backend). Two of them have the stable tree CC-ed. They have been found where an guest is migrating to a host that is missing 'feature-persistent' support (from one that has it enabled). We end up hitting an BUG() in the driver code."
2015-07-24xen-blkfront: don't add indirect pages to list when !feature_persistentBob Liu1-2/+4
We should consider info->feature_persistent when adding indirect page to list info->indirect_pages, else the BUG_ON() in blkif_free() would be triggered. When we are using persistent grants the indirect_pages list should always be empty because blkfront has pre-allocated enough persistent pages to fill all requests on the ring. CC: stable@vger.kernel.org Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-07-24xen-blkfront: introduce blkfront_gather_backend_features()Bob Liu1-54/+68
There is a bug when migrate from !feature-persistent host to feature-persistent host, because domU still thinks new host/backend doesn't support persistent. Dmesg like: backed has not unmapped grant: 839 backed has not unmapped grant: 773 backed has not unmapped grant: 773 backed has not unmapped grant: 773 backed has not unmapped grant: 839 The fix is to recheck feature-persistent of new backend in blkif_recover(). See: https://lkml.org/lkml/2015/5/25/469 As Roger suggested, we can split the part of blkfront_connect that checks for optional features, like persistent grants, indirect descriptors and flush/barrier features to a separate function and call it from both blkfront_connect and blkif_recover Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-07-01Merge tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tipLinus Torvalds1-6/+0
Pull xen updates from David Vrabel: "Xen features and cleanups for 4.2-rc0: - add "make xenconfig" to assist in generating configs for Xen guests - preparatory cleanups necessary for supporting 64 KiB pages in ARM guests - automatically use hvc0 as the default console in ARM guests" * tag 'for-linus-4.2-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: block/xen-blkback: s/nr_pages/nr_segs/ block/xen-blkfront: Remove invalid comment block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQS arm/xen: Drop duplicate define mfn_to_virt xen/grant-table: Remove unused macro SPP xen/xenbus: client: Fix call of virt_to_mfn in xenbus_grant_ring xen: Include xen/page.h rather than asm/xen/page.h kconfig: add xenconfig defconfig helper kconfig: clarify kvmconfig is for kvm xen/pcifront: Remove usage of struct timeval xen/tmem: use BUILD_BUG_ON() in favor of BUG_ON() hvc_xen: avoid uninitialized variable warning xenbus: avoid uninitialized variable warning xen/arm: allow console=hvc0 to be omitted for guests arm,arm64/xen: move Xen initialization earlier arm/xen: Correctly check if the event channel interrupt is present
2015-06-22drivers: xen-blkfront: only talk_to_blkback() when in XenbusStateInitialisingBob Liu1-0/+2
Patch 69b91ede5cab843dcf345c28bd1f4b5a99dacd9b "drivers: xen-blkback: delay pending_req allocation to connect_ring" exposed an problem that Xen blkfront has. There is a race with XenStored and the drivers such that we can see two: vbd vbd-268440320: blkfront:blkback_changed to state 2. vbd vbd-268440320: blkfront:blkback_changed to state 2. vbd vbd-268440320: blkfront:blkback_changed to state 4. state changes to XenbusStateInitWait ('2'). The end result is that blkback_changed() receives two notify and calls twice setup_blkring(). While the backend driver may only get the first setup_blkring() which is wrong and reads out-dated (or reads them as they are being updated with new ring-ref values). The end result is that the ring ends up being incorrectly set. The other drivers in the tree have such checks already in. Reported-and-Tested-by: Robert Butera <robert.butera@oracle.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-17block/xen-blkfront: Remove invalid commentJulien Grall1-6/+0
Since commit b764915 "xen-blkfront: use a different scatterlist for each request", biovec has been replaced by scatterlist when copying back the data during a completion request. Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-06-17block/xen-blkfront: Remove unused macro MAXIMUM_OUTSTANDING_BLOCK_REQSJulien Grall1-2/+0
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-06-05xen/block: add multi-page ring supportBob Liu1-39/+96
Extend xen/block to support multi-page ring, so that more requests can be issued by using more than one pages as the request ring between blkfront and backend. As a result, the performance can get improved significantly. We got some impressive improvements on our highend iscsi storage cluster backend. If using 64 pages as the ring, the IOPS increased about 15 times for the throughput testing and above doubled for the latency testing. The reason was the limit on outstanding requests is 32 if use only one-page ring, but in our case the iscsi lun was spread across about 100 physical drives, 32 was really not enough to keep them busy. Changes in v2: - Rebased to 4.0-rc6. - Document on how multi-page ring feature working to linux io/blkif.h. Changes in v3: - Remove changes to linux io/blkif.h and follow the protocol defined in io/blkif.h of XEN tree. - Rebased to 4.1-rc3 Changes in v4: - Turn to use 'ring-page-order' and 'max-ring-page-order'. - A few comments from Roger. Changes in v5: - Clarify with 4k granularity to comment - Address more comments from Roger Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-06-05driver: xen-blkfront: move talk_to_blkback to a more suitable placeBob Liu1-8/+6
The major responsibility of talk_to_blkback() is allocate and initialize the request ring and write the ring info to xenstore. But this work should be done after backend entered 'XenbusStateInitWait' as defined in the protocol file. See xen/include/public/io/blkif.h in XEN git tree: Front Back ================================= ===================================== XenbusStateInitialising XenbusStateInitialising o Query virtual device o Query backend device identification properties. data. o Setup OS device instance. o Open and validate backend device. o Publish backend features and transport parameters. | | V XenbusStateInitWait o Query backend features and transport parameters. o Allocate and initialize the request ring. There is no problem with this yet, but it is an violation of the design and furthermore it would not allow frontend/backend to negotiate 'multi-page' and 'multi-queue' features. Changes in v2: - Re-write the commit message to be more clear. Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2015-04-15xenbus_client: Extend interface to support multi-page ringWei Liu1-2/+3
Originally Xen PV drivers only use single-page ring to pass along information. This might limit the throughput between frontend and backend. The patch extends Xenbus driver to support multi-page ring, which in general should improve throughput if ring is the bottleneck. Changes to various frontend / backend to adapt to the new interface are also included. Affected Xen drivers: * blkfront/back * netfront/back * pcifront/back * scsifront/back * vtpmfront The interface is documented, as before, in xenbus_client.c. Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Cc: Konrad Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-02-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge third set of updates from Andrew Morton: - the rest of MM [ This includes getting rid of the numa hinting bits, in favor of just generic protnone logic. Yay. - Linus ] - core kernel - procfs - some of lib/ (lots of lib/ material this time) * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits) lib/lcm.c: replace include lib/percpu_ida.c: remove redundant includes lib/strncpy_from_user.c: replace module.h include lib/stmp_device.c: replace module.h include lib/sort.c: move include inside #if 0 lib/show_mem.c: remove redundant include lib/radix-tree.c: change to simpler include lib/plist.c: remove redundant include lib/nlattr.c: remove redundant include lib/kobject_uevent.c: remove redundant include lib/llist.c: remove redundant include lib/md5.c: simplify include lib/list_sort.c: rearrange includes lib/genalloc.c: remove redundant include lib/idr.c: remove redundant include lib/halfmd4.c: simplify includes lib/dynamic_queue_limits.c: simplify includes lib/sort.c: use simpler includes lib/interval_tree.c: simplify includes hexdump: make it return number of bytes placed in buffer ...
2015-02-12kernel.h: remove ancient __FUNCTION__ hackRasmus Villemoes1-1/+1
__FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so this only helps people who only test-compile using 3.3 (compiler-gcc3.h barks at anything older than that). Besides, there are almost no occurrences of __FUNCTION__ left in the tree. [akpm@linux-foundation.org: convert remaining __FUNCTION__ references] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10xen-blkfront: fix accounting of reqs when migratingRoger Pau Monne1-2/+2
Current migration code uses blk_put_request in order to finish a request before requeuing it. This function doesn't update the statistics of the queue, which completely screws accounting. Use blk_end_request_all instead which properly updates the statistics of the queue. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-and-Tested-by: Ouyang Zhaowei (Charles) <ouyangzhaowei@huawei.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xenproject.org
2014-12-10xen/blkfront: remove redundant flush_opVitaly Kuznetsov1-20/+31
flush_op is unambiguously defined by feature_flush: REQ_FUA | REQ_FLUSH -> BLKIF_OP_WRITE_BARRIER REQ_FLUSH -> BLKIF_OP_FLUSH_DISKCACHE 0 -> 0 and thus can be removed. This is just a cleanup. The patch was suggested by Boris Ostrovsky. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-12-10xen/blkfront: improve protection against issuing unsupported REQ_FUAVitaly Kuznetsov1-6/+8
Guard against issuing unsupported REQ_FUA and REQ_FLUSH was introduced in d11e61583 and was factored out into blkif_request_flush_valid() in 0f1ca65ee. However: 1) This check in incomplete. In case we negotiated to feature_flush = REQ_FLUSH and flush_op = BLKIF_OP_FLUSH_DISKCACHE (so FUA is unsupported) FUA request will still pass the check. 2) blkif_request_flush_valid() is misnamed. It is bool but returns true when the request is invalid. 3) When blkif_request_flush_valid() fails -EIO is being returned. It seems that -EOPNOTSUPP is more appropriate here. Fix all of the above issues. This patch is based on the original patch by Laszlo Ersek and a comment by Jeff Moyer. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-10-18Merge branch 'for-3.18/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds1-3/+9
Pull block layer driver update from Jens Axboe: "This is the block driver pull request for 3.18. Not a lot in there this round, and nothing earth shattering. - A round of drbd fixes from the linbit team, and an improvement in asender performance. - Removal of deprecated (and unused) IRQF_DISABLED flag in rsxx and hd from Michael Opdenacker. - Disable entropy collection from flash devices by default, from Mike Snitzer. - A small collection of xen blkfront/back fixes from Roger Pau Monné and Vitaly Kuznetsov" * 'for-3.18/drivers' of git://git.kernel.dk/linux-block: block: disable entropy contributions for nonrot devices xen, blkfront: factor out flush-related checks from do_blkif_request() xen-blkback: fix leak on grant map error path xen/blkback: unmap all persistent grants when frontend gets disconnected rsxx: Remove deprecated IRQF_DISABLED block: hd: remove deprecated IRQF_DISABLED drbd: use RB_DECLARE_CALLBACKS() to define augment callbacks drbd: compute the end before rb_insert_augmented() drbd: Add missing newline in resync progress display in /proc/drbd drbd: reduce lock contention in drbd_worker drbd: Improve asender performance drbd: Get rid of the WORK_PENDING macro drbd: Get rid of the __no_warn and __cond_lock macros drbd: Avoid inconsistent locking warning drbd: Remove superfluous newline from "resync_extents" debugfs entry. drbd: Use consistent names for all the bi_end_io callbacks drbd: Use better variable names
2014-10-06xen: remove DEFINE_XENBUS_DRIVER() macroDavid Vrabel1-2/+3
The DEFINE_XENBUS_DRIVER() macro looks a bit weird and causes sparse errors. Replace the uses with standard structure definitions instead. This is similar to pci and usb device registration. Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2014-10-01xen, blkfront: factor out flush-related checks from do_blkif_request()Arianna Avanzini1-3/+9
This commit factors out some checks related to the request insertion path, which can be done in an function instead of by itself. Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-05-28Merge branch 'stable/for-jens-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-3.16/driversJens Axboe1-26/+14
Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git stable/for-jens-3.16 which has a bunch of fixes to the Xen block frontend and backend driver and a new parameter for Xen backend driver - an override (set by the toolstack) whether to expose the discard support (if disk of course supports it) or not.
2014-05-28xen-blkfront: remove type check from blkfront_setup_discardOlaf Hering1-26/+14
In its initial implementation a check for "type" was added, but only phy and file are handled. This breaks advertised discard support for other type values such as qdisk. Fix and simplify this function: If the backend advertises discard support it is supposed to implement it properly, so enable feature_discard unconditionally. If the backend advertises the need for a certain granularity and alignment then propagate both properties to the blocklayer. The discard-secure property is a boolean, update the code to reflect that. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-04-15block: remove struct request buffer memberJens Axboe1-2/+2
This was used in the olden days, back when onions were proper yellow. Basically it mapped to the current buffer to be transferred. With highmem being added more than a decade ago, most drivers map pages out of a bio, and rq->buffer isn't pointing at anything valid. Convert old style drivers to just use bio_data(). For the discard payload use case, just reference the page in the bio. Signed-off-by: Jens Axboe <axboe@fb.com>
2014-02-10Merge branch 'stable/for-jens-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-linusJens Axboe1-4/+7
Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git stable/for-jens-3.14 which is based off v3.13-rc6. If you would like me to rebase it on a different branch/tag I would be more than happy to do so. The patches are all bug-fixes and hopefully can go in 3.14. They deal with xen-blkback shutdown and cause memory leaks as well as shutdown races. They should go to stable tree and if you are OK with I will ask them to backport those fixes. There is also a fix to xen-blkfront to deal with unexpected state transition. And lastly a fix to the header where it was using the __aligned__ unnecessarily.