path: root/include/linux/nvme.h
Age | Commit message | Author | Files | Lines
2017-06-28 | nvme: use a single NVME_AQ_DEPTH and relax it to 32 | Sagi Grimberg | 1 | -1/+1
No need to differentiate fabrics from pci/loop, also lower it to 32 as we don't really need 256 inflight admin commands. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-27 | nvme: add support for streams and directives | Jens Axboe | 1 | -0/+48
This adds support for Directives in NVMe, in particular for the Streams directive. Support for Directives is a new feature in NVMe 1.3. It allows a user to pass in information about where to store the data, so that the device can do so most efficiently. If an application is managing and writing data with different lifetimes, mixing data with different retention onto the same locations on flash can cause write amplification to grow. This, in turn, will reduce performance and lifetime of the device. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-16 | nvme: implement NS Optimal IO Boundary from 1.3 Spec | Scott Bauer | 1 | -1/+1
The NVMe 1.3 spec introduces Namespace Optimal IO Boundaries (NOIOB), which standardizes the stripe mechanism we currently have quirks for. This patch implements the necessary logic to handle this new feature. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
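To illustrate what such a boundary means for an I/O, here is a minimal sketch. It assumes the NOIOB value is expressed in logical blocks and that 0 means no boundary is reported; the helper name `crosses_boundary` is made up for this example and is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper: true when an I/O of nlb blocks starting at slba
 * spans more than one NOIOB-sized stripe. noiob == 0 is treated as
 * "no boundary reported". */
static bool crosses_boundary(uint64_t slba, uint32_t nlb, uint16_t noiob)
{
	assert(nlb > 0);	/* nlb is a plain block count in this sketch */
	if (noiob == 0)
		return false;
	/* Compare the stripe index of the first and the last block. */
	return slba / noiob != (slba + nlb - 1) / noiob;
}
```

An I/O for which this returns true would ideally be split at the boundary before it reaches the device.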
2017-06-15 | nvme: add fields into identify controller data structure | Guan Junxiong | 1 | -2/+8
Add the fields new in NVMe 1.3 (EDSTT, DSTO, FWUG, HCTMA, MNTMT, MXTMT, and SANICAP) to the identify controller data structure. Signed-off-by: Guan Junxiong <guanjunxiong@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-15 | nvmet: allow overriding the NVMe VS via configfs | Johannes Thumshirn | 1 | -0/+4
Allow overriding the announced NVMe version of a target via configfs. This is particularly helpful when debugging new features for the host or target side without bumping the hard coded version (as the target might not be fully compliant to the announced version yet). Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-15 | nvme: introduce NVMe Namespace Identification Descriptor structures | Johannes Thumshirn | 1 | -0/+17
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-15 | nvmet: use NVME_IDENTIFY_DATA_SIZE | Johannes Thumshirn | 1 | -0/+2
Use NVME_IDENTIFY_DATA_SIZE define instead of hard coding the magic 4096 value. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.com> [hch: converted three more users] Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-15 | nvme-pci: remap BAR0 to cover admin CQ doorbell for large stride | Xu Yu | 1 | -0/+1
The existing driver initially maps 8192 bytes of BAR0 which is intended to cover doorbells of admin SQ and CQ. However, if a large stride, e.g. 10, is used, the doorbell of admin CQ will be out of 8192 bytes. Consequently, a page fault will be raised when the admin CQ doorbell is accessed in nvme_configure_admin_queue(). This patch fixes this issue by remapping BAR0 before accessing admin CQ doorbell if the initial mapping is not enough. Signed-off-by: Xu Yu <yu.a.xu@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
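The arithmetic behind this can be sketched as follows. The doorbell layout (base offset 0x1000, each doorbell spaced `4 << CAP.DSTRD` bytes apart) follows the NVMe register map; the helper names are hypothetical and this is not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Doorbell registers start at offset 0x1000 in BAR0. */
#define DB_BASE 0x1000u

/* Byte offset of a doorbell. is_cq = 0 selects the submission queue tail
 * doorbell, is_cq = 1 the completion queue head doorbell; dstrd is the
 * 4-bit CAP.DSTRD field. */
static uint64_t doorbell_offset(unsigned qid, unsigned is_cq, unsigned dstrd)
{
	assert(is_cq <= 1 && dstrd <= 15);
	return DB_BASE + (uint64_t)(2 * qid + is_cq) * (4u << dstrd);
}

/* True when the 4-byte doorbell at off lies inside the first
 * mapped bytes of BAR0. */
static bool doorbell_is_mapped(uint64_t off, uint64_t mapped)
{
	return off + 4 <= mapped;
}
```

With a stride of 10, the admin CQ doorbell (queue 0, `is_cq = 1`) lands at offset 0x1000 + 4096 = 8192, which is the first byte past an 8192-byte mapping.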
2017-06-13 | nvme.h: add dword 12 - 15 fields to struct nvme_features | Arnav Dawn | 1 | -1/+4
Signed-off-by: Arnav Dawn <a.dawn@samsung.com> [hch: split from a larger patch, new changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-06-13 | nvme.h: add struct nvme_host_mem_buf_desc and HMB flags | Christoph Hellwig | 1 | -0/+11
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
2017-06-05 | nvme: switch to uuid_t | Christoph Hellwig | 1 | -1/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
2017-04-21 | nvme: improve performance for virtual NVMe devices | Helen Koike | 1 | -0/+13
This change provides a mechanism to reduce the number of MMIO doorbell writes for the NVMe driver. When running in a virtualized environment like QEMU, the cost of an MMIO is quite hefty here. The main idea for the patch is to provide the device with two memory locations: 1) one to store the doorbell values so they can be looked up without the doorbell MMIO write, and 2) one to store an event index. I believe the doorbell value is obvious, the event index not so much. Similar to the virtio specification, the virtual device can tell the driver (guest OS) not to write MMIO unless you are writing past this value. FYI: doorbell values are written by the nvme driver (guest OS) and the event index is written by the virtual device (host OS). The patch implements a new admin command that will communicate where these two memory locations reside. If the command fails, the nvme driver will work as before without any optimizations. Contributions: Eric Northup <digitaleric@google.com> Frank Swiderski <fes@google.com> Ted Tso <tytso@mit.edu> Keith Busch <keith.busch@intel.com> Just to give an idea of the performance boost with the vendor extension: running fio [1], with a stock NVMe driver I get about 200K read IOPS; with my vendor patch I get about 1000K read IOPS. This was running with a null device, i.e. the backing device simply returned success on every read IO request. [1] Running on a 4 core machine: fio --time_based --name=benchmark --runtime=30 --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k --randrepeat=false Signed-off-by: Rob Nelson <rlnelson@google.com> [mlin: port for upstream] Signed-off-by: Ming Lin <mlin@kernel.org> [koike: updated for upstream] Signed-off-by: Helen Koike <helen.koike@collabora.co.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>
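The "writing past this value" test can be sketched with the same wraparound comparison virtio uses for its event index. The function and parameter names below are illustrative assumptions, not the names used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* old_idx:   doorbell value before this update
 * new_idx:   doorbell value being published to the shadow buffer
 * event_idx: index written by the device; an MMIO write is only needed
 *            when the update moves the doorbell past it
 * All arithmetic is modulo 2^16 so that index wraparound is handled. */
static bool need_mmio_write(uint16_t event_idx, uint16_t new_idx,
			    uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old_idx);
}

/* A few cases that show the intended behaviour. */
static void event_idx_examples(void)
{
	assert(need_mmio_write(5, 6, 5));	/* moved past the event index */
	assert(!need_mmio_write(10, 6, 5));	/* event index not reached yet */
	assert(need_mmio_write(65535, 0, 65535)); /* works across wraparound */
}
```

When the check returns false the driver only updates the shadow doorbell in memory and skips the expensive MMIO write.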
2017-04-02 | nvme: Correct NVMF enum values to match NVMe-oF rev 1.0 | Roland Dreier | 1 | -8/+8
The enum values for QPTYPE, PRTYPE and CMS are off by 1 from the values defined in figure 42 of the NVM Express over Fabrics 1.0: http://www.nvmexpress.org/wp-content/uploads/NVMe_over_Fabrics_1_0_Gold_20160605-1.pdf Fix our enums to match the final spec. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2017-02-22 | nvme: Enable autonomous power state transitions | Andy Lutomirski | 1 | -0/+6
NVMe devices can advertise multiple power states. These states can be either "operational" (the device is fully functional but possibly slow) or "non-operational" (the device is asleep until woken up). Some devices can automatically enter a non-operational state when idle for a specified amount of time and then automatically wake back up when needed. The hardware configuration is a table. For each state, an entry in the table indicates the next deeper non-operational state, if any, to autonomously transition to and the idle time required before transitioning. This patch teaches the driver to program APST so that each successive non-operational state will be entered after an idle time equal to 100% of the total latency (entry plus exit) associated with that state. The maximum acceptable latency is controlled using dev_pm_qos (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational states with total latency greater than this value will not be used. As a special case, setting the latency tolerance to 0 will disable APST entirely. On hardware without APST support, the sysfs file will not be exposed. The latency tolerance for newly-probed devices is set by the module parameter nvme_core.default_ps_max_latency_us. In theory, the device can expose a "default" APST table, but this doesn't seem to function correctly on my device (Samsung 950), nor does it seem particularly useful. There is also an optional mechanism by which a configuration can be "saved" so it will be automatically loaded on reset. This can be configured from userspace, but it doesn't seem useful to support in the driver. On my laptop, enabling APST seems to save nearly 1W. The hardware tables can be decoded in userspace with nvme-cli. 'nvme id-ctrl /dev/nvmeN' will show the power state table and 'nvme get-feature -f 0x0c -H /dev/nvme0' will show the current APST configuration. This feature is quirked off on a known-buggy Samsung device.
Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
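A rough sketch of how such a table could be filled in, following the description above. The entry layout (target state in bits 7:3, idle time in milliseconds in bits 31:8) is taken from the NVMe specification; `struct ps_info`, `apst_entry` and `build_apst` are hypothetical names, and the idle time is simply the total latency rounded up to milliseconds, as the message describes. The real driver may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical view of one power state descriptor. */
struct ps_info {
	int non_operational;	/* 1 if the state is non-operational */
	uint32_t entry_lat_us;	/* entry latency in microseconds */
	uint32_t exit_lat_us;	/* exit latency in microseconds */
};

/* Encode one APST entry: idle transition power state in bits 7:3,
 * idle time prior to transition (ms) in bits 31:8. */
static uint64_t apst_entry(unsigned target_state, uint32_t idle_ms)
{
	assert(target_state < 32 && idle_ms < (1u << 24));
	return ((uint64_t)target_state << 3) | ((uint64_t)idle_ms << 8);
}

/* Fill table[0..n-1]. Walking from the deepest state upwards, each entry
 * points at the nearest deeper non-operational state whose total latency
 * is within max_lat_us. max_lat_us == 0 leaves every entry at 0, which
 * means no autonomous transition. */
static void build_apst(const struct ps_info *ps, int n, uint32_t max_lat_us,
		       uint64_t *table)
{
	uint64_t target = 0;

	for (int state = n - 1; state >= 0; state--) {
		uint64_t total = (uint64_t)ps[state].entry_lat_us +
				 ps[state].exit_lat_us;

		table[state] = target;	/* only deeper states are candidates */
		if (max_lat_us == 0 || !ps[state].non_operational ||
		    total > max_lat_us)
			continue;
		/* Idle time equal to the total latency, rounded up to ms. */
		target = apst_entry((unsigned)state,
				    (uint32_t)((total + 999) / 1000));
	}
}
```

Raising the latency tolerance makes deeper states eligible; lowering it to 0 produces an all-zero table.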
2017-02-22 | nvme: Use CNS as 8-bit field and avoid endianness conversion | Parav Pandit | 1 | -1/+3
This patch defines CNS field as 8-bit field and avoids cpu_to/from_le conversions. Also initialize nvme_command cns value explicitly to NVME_ID_CNS_NS for readability (don't rely on the fact that NVME_ID_CNS_NS = 0). Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17 | Merge branch 'for-4.11/next' into for-4.11/linus-merge | Jens Axboe | 1 | -0/+2
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17 | nvme: Check for Security send/recv support before issuing commands. | Scott Bauer | 1 | -0/+1
We need to verify that the controller supports the security commands before actually trying to issue them. Signed-off-by: Scott Bauer <scott.bauer@intel.com> [hch: moved the check so that we don't call into the OPAL code if not supported] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-08 | nvme: support ranged discard requests | Christoph Hellwig | 1 | -0/+2
NVMe supports up to 256 ranges per DSM command, so wire up support for ranged discards up to that limit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
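As a sketch of what that limit implies: each Dataset Management range descriptor is 16 bytes, so a full command carries a 4096-byte payload, and the range count in the command is zero-based. The struct and helper names here are illustrative assumptions; on the wire the fields are little-endian.

```c
#include <assert.h>
#include <stdint.h>

#define DSM_MAX_RANGES 256	/* limit stated in the commit message */

/* One range descriptor as carried in the DSM data buffer. */
struct dsm_range {
	uint32_t cattr;	/* context attributes */
	uint32_t nlb;	/* length in logical blocks */
	uint64_t slba;	/* starting LBA */
};

static_assert(sizeof(struct dsm_range) == 16, "descriptor must be 16 bytes");

/* Payload size in bytes for a command carrying nr ranges. */
static unsigned dsm_payload_bytes(unsigned nr)
{
	assert(nr >= 1 && nr <= DSM_MAX_RANGES);
	return nr * (unsigned)sizeof(struct dsm_range);
}

/* Value of the zero-based "number of ranges" field for nr ranges. */
static uint8_t dsm_nr_field(unsigned nr)
{
	assert(nr >= 1 && nr <= DSM_MAX_RANGES);
	return (uint8_t)(nr - 1);
}
```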
2016-12-13 | Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block | Linus Torvalds | 1 | -5/+38
Pull block layer updates from Jens Axboe:
"This is the main block pull request for this series. Contrary to previous releases, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so it makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one.
The major parts of this pull request are:
- Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph.
- Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code.
- Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me.
- Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me.
- Support for SMR drives in the block layer and for SD. From Hannes and Shaun.
- Multi-connection support for nbd. From Josef.
- Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph.
- A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq.
- Support for WRITE_ZEROES from Chaitanya.
- Lightnvm updates from Javier/Matias.
- Support for FC for the nvme-over-fabrics code. From James Smart.
- A bunch of fixes from a whole slew of people, too many to name here"
* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
  blk-stat: fix a few cases of missing batch flushing
  blk-flush: run the queue when inserting blk-mq flush
  elevator: make the rqhash helpers exported
  blk-mq: abstract out blk_mq_dispatch_rq_list() helper
  blk-mq: add blk_mq_start_stopped_hw_queue()
  block: improve handling of the magic discard payload
  blk-wbt: don't throttle discard or write zeroes
  nbd: use dev_err_ratelimited in io path
  nbd: reset the setup task for NBD_CLEAR_SOCK
  nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
  nvme-fabrics: Add target support for FC transport
  nvme-fabrics: Add host support for FC transport
  nvme-fabrics: Add FC transport LLDD api definitions
  nvme-fabrics: Add FC transport FC-NVME definitions
  nvme-fabrics: Add FC transport error codes to nvme.h
  Add type 0x28 NVME type code to scsi fc headers
  nvme-fabrics: patch target code in prep for FC transport support
  nvme-fabrics: set sqe.command_id in core not transports
  parser: add u64 number parser
  nvme-rdma: align to generic ib_event logging helper
  ...
2016-12-06 | nvme-fabrics: Add FC transport error codes to nvme.h | James Smart | 1 | -0/+13
Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2016-12-01 | nvme.h: add Write Zeroes definitions | Chaitanya Kulkarni | 1 | -0/+20
Add the command structure, optional command set support (ONCS) bit and a new error code for the Write Zeroes command. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-10 | nvme: introduce struct nvme_request | Christoph Hellwig | 1 | -5/+5
This adds a shared per-request structure for all NVMe I/O. This structure is embedded as the first member in the request private data of all NVMe transport drivers and allows implementing common functionality between the drivers. The first use is to replace the current abuse of the SCSI command passthrough fields in struct request for the NVMe command passthrough, but it will grow a few more fields to allow implementing things like common abort handlers in the future. The passthrough commands are handled by having a pointer to the SQE (struct nvme_command) in struct nvme_request, and the union of the possible result fields, which had to be turned from an anonymous into a named union for that purpose. This avoids having to pass a reference to a full CQE around and thus makes checking the result a lot more lightweight. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19 | nvme.h: add an enum for cns values | Christoph Hellwig | 1 | -0/+10
Ported over from nvme-cli. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19 | nvme.h: don't use uuid_be | Christoph Hellwig | 1 | -2/+1
This makes life easier for nvme-cli and we don't really need the uuid type anyway to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19 | nvme.h: resync with nvme-cli | Christoph Hellwig | 1 | -6/+27
Import a few updates to nvme.h from nvme-cli. This mostly includes a few new fields and error codes, but also a few renames that so far are only used in user space. Also one field is moved from an array of two le64 values to one of 16 u8 values so that we can more easily access it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19 | nvme: Add tertiary number to NVME_VS | Gabriel Krisman Bertazi | 1 | -1/+2
NVMe 1.2.1 specification adds a tertiary element to the version number. This updates the macro and its callers to include the final number and fixup a single place in nvmet where the version was generated manually. Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
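A minimal sketch of a three-part version macro: the Version register holds the major number in bits 31:16, the minor number in bits 15:8 and the tertiary number in bits 7:0. The decoding macros `VS_MAJOR`, `VS_MINOR` and `VS_TERTIARY` are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack major.minor.tertiary into the Version register layout. */
#define NVME_VS(major, minor, tertiary) \
	(((major) << 16) | ((minor) << 8) | (tertiary))

/* Hypothetical helpers that take a packed version apart again. */
#define VS_MAJOR(vs)	(((vs) >> 16) & 0xffff)
#define VS_MINOR(vs)	(((vs) >> 8) & 0xff)
#define VS_TERTIARY(vs)	((vs) & 0xff)

static_assert(NVME_VS(1, 2, 1) == 0x010201, "1.2.1 packs to 0x010201");
```

Because the fields are ordered from most to least significant, packed versions can be compared directly, for example `vs >= NVME_VS(1, 2, 1)`.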
2016-08-19 | nvme-fabrics: change NQN UUID to big-endian format | Daniel Verkamp | 1 | -1/+1
NVM Express 1.2.1 section 7.9, NVMe Qualified Names, specifies that the UUID format of NQN uses a UUID based on RFC 4122. RFC 4122 specifies that the UUID is encoded in big-endian byte order. Switch the NVMe over Fabrics host ID field from little-endian UUID to big-endian UUID to match the specification. Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-07-05 | nvme.h: Add keep-alive opcode and identify controller attribute | Sagi Grimberg | 1 | -1/+5
KAS: keep-alive support and granularity of kato in units of 100 ms. nvme_admin_keep_alive opcode: 0x18. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-05 | nvme.h: add NVMe over Fabrics definitions | Christoph Hellwig | 1 | -19/+318
The NVMe over Fabrics specification defines a protocol interface and related extensions to NVMe that enable operation over network protocols. The NVMe over Fabrics specification has an NVMe Transport binding for each NVMe Transport. This patch adds the fabrics related definitions:
- fabric specific command set and error codes
- transport addressing and binding definitions
- fabrics sgl extensions
- controller identification fabrics enhancements
- discovery log page definition
Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12 | nvme: factor out a add nvme_is_write helper | Christoph Hellwig | 1 | -0/+5
Centralize the check if a given NVMe command reads or writes data. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
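The check can be sketched from the opcode encoding alone: in NVMe, the low two bits of an opcode describe the data transfer direction, and bit 0 is set when data flows from host to controller. The sketch below works on a bare opcode and uses made-up names; the actual helper in the header operates on a command structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A few NVM command set opcodes, for illustration. */
enum {
	CMD_FLUSH = 0x00,	/* no data transfer */
	CMD_WRITE = 0x01,	/* host to controller */
	CMD_READ  = 0x02,	/* controller to host */
};

static_assert((CMD_WRITE & 1) == 1 && (CMD_READ & 1) == 0,
	      "bit 0 marks host-to-controller transfers");

/* True when the command transfers data from the host to the controller. */
static bool is_write(uint8_t opcode)
{
	return (opcode & 1) != 0;
}
```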
2016-06-12 | nvme.h: add constants for PSDT and FUSE values | James Smart | 1 | -0/+23
Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12 | nvme.h: add AER constants | Christoph Hellwig | 1 | -0/+6
Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12 | nvme.h: add NVM command set SQE/CQE size defines | Christoph Hellwig | 1 | -2/+9
Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12 | nvme.h: Add get_log_page command strucure | Armen Baloyan | 1 | -0/+19
Add get_log_page command structure and a corresponding entry in nvme_command union. Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-12 | nvme.h: add RTD3R, RTD3E and OAES fields | Christoph Hellwig | 1 | -1/+4
These have been added in NVMe 1.2 and we'll need at least oaes for the NVMe target driver. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-05-02 | NVMe: correct comment for offset enum of controller registers in nvme.h | Wang Sheng-Hui | 1 | -2/+2
Section 3.1 of the 1.2a specification documents the offsets of the controller registers. Some of the corresponding comments in the header file nvme.h were copied incorrectly. Correct them. Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-01 | nvme: use offset instead of a struct for registers | Christoph Hellwig | 1 | -14/+13
This makes life easier for future non-PCI drivers where access to the registers might be more complicated. Note that Linux drivers are pretty evenly split between the two versions, and in fact the NVMe driver already uses offsets for the doorbells. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> [Fixed CMBSZ offset] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
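A small sketch of why offsets help: with plain offsets, each transport can decide how a register is actually read. The offsets below follow the NVMe register map; `struct reg_ops` and `buf_read32` are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A few controller register offsets from the NVMe register map. */
enum {
	REG_CAP  = 0x00,	/* Controller Capabilities (64-bit) */
	REG_VS   = 0x08,	/* Version */
	REG_CC   = 0x14,	/* Controller Configuration */
	REG_CSTS = 0x1c,	/* Controller Status */
	REG_AQA  = 0x24,	/* Admin Queue Attributes */
};

/* Hypothetical transport hook: how a 32-bit register is fetched is left
 * to the transport (MMIO, a network message, ...). */
struct reg_ops {
	uint32_t (*read32)(void *ctx, unsigned off);
};

/* Example back end: the registers live in an ordinary memory buffer. */
static uint32_t buf_read32(void *ctx, unsigned off)
{
	uint32_t v;

	assert(off % 4 == 0);	/* 32-bit registers are 4-byte aligned */
	memcpy(&v, (const uint8_t *)ctx + off, sizeof(v));
	return v;
}
```

Common code can then call `ops->read32(ctx, REG_CSTS)` without knowing whether the registers sit behind a PCI BAR or somewhere else.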
2015-10-09 | nvme: include <linux/types.h> in <linux/nvme.h> | Christoph Hellwig | 1 | -0/+2
The buildbot complains about this even if it doesn't generate a build warning. But it's an easy fix, so here we go: Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-09 | nvme.h: add missing nvme_id_ctrl endianess annotations | Christoph Hellwig | 1 | -2/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-09 | nvme: move hardware structures out of the uapi version of nvme.h | Christoph Hellwig | 1 | -2/+524
Currently all NVMe command and completion structures are exposed to userspace through the uapi version of nvme.h. They are not an ABI between the kernel and userspace, and will change in C-incompatible ways for future versions of the spec. Move them to the kernel version of the file and rename the uapi header to nvme_ioctl.h so that userspace can easily detect the presence of the new clean header. Nvme-cli already carries a local copy of the header, so it won't be affected by this move. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-09 | nvme: add a local nvme.h header | Christoph Hellwig | 1 | -114/+0
Add a new drivers/block/nvme.h which contains all the driver-internal interfaces. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-09 | NVMe: Simplify device resume on io queue failure | Keith Busch | 1 | -1/+0
Releasing IO queues and disks was done in a work queue outside the controller resume context to delete namespaces if the controller failed after a resume from suspend. This is unnecessary since we can resume a device asynchronously. This patch makes resume use probe_work so it can directly remove namespaces if the device is manageable but not IO capable. Since deleting disks was the only reason we had the convoluted "reset_workfn", this patch removes that unnecessary indirection. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-10-09 | NVMe: Reference count open namespaces | Keith Busch | 1 | -0/+1
Dynamic namespace attachment means the namespace may be removed at any time, so the namespace reference count can not be tied to the device reference count. This fixes a NULL dereference if an opened namespace is detached from a controller. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-18 | NVMe: Add nvme subsystem reset IOCTL | Jon Derrick | 1 | -1/+1
Controllers can perform optional subsystem resets as introduced in NVMe 1.1. This patch adds an IOCTL to trigger the subsystem reset by writing "NVMe" to the NSSR register. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
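The value written is just the four ASCII characters packed into a 32-bit word, as the sketch below shows. It assumes the NSSR register sits at offset 0x20 and models the register space as an array of 32-bit words; `subsystem_reset` is a made-up helper name, not the driver's function.

```c
#include <assert.h>
#include <stdint.h>

#define REG_NSSR	0x20		/* NVM Subsystem Reset register offset */
#define NSSR_MAGIC	0x4E564D65u	/* "NVMe" in ASCII */

static_assert(NSSR_MAGIC == (('N' << 24) | ('V' << 16) | ('M' << 8) | 'e'),
	      "magic value spells NVMe");

/* Trigger a subsystem reset by writing the magic value to NSSR.
 * regs points at the start of the controller register space. */
static void subsystem_reset(volatile uint32_t *regs)
{
	regs[REG_NSSR / 4] = NSSR_MAGIC;
}
```

Any other value written to the register has no effect, which makes an accidental reset unlikely.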
2015-08-18 | NVMe: Add nvme subsystem reset support | Keith Busch | 1 | -0/+3
Controllers that are part of an NVMe subsystem may be reset by any other controller in the subsystem. If the device is capable of subsystem resets, this patch adds detection for such events and performs appropriate controller initialization upon subsystem reset detection. The register bit is a RW1C type, so the driver needs to write a 1 to the status bit to clear the subsystem reset occurred bit during initialization. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-21 | NVMe: Use CMB for the IO SQes if available | Jon Derrick | 1 | -0/+17
Some controllers have a controller-side memory buffer available for use for submissions, completions, lists, or data. If a CMB is available, the entire CMB will be ioremapped and it will attempt to map the IO SQes onto the CMB. The queues will be shrunk as needed. The CMB will not be used if the queue depth is shrunk below some threshold where it may have reduced performance over a larger queue in system memory. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
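A sketch of the sizing logic described above. Decoding the CMBSZ register (size units in bits 11:8, size in bits 31:12) follows the NVMe 1.2 register definition; the queue-depth helper, its parameters and the way the threshold is applied are assumptions made for illustration, and the sketch ignores page alignment.

```c
#include <assert.h>
#include <stdint.h>

#define SQE_SIZE 64	/* bytes per submission queue entry */

/* Size of the controller memory buffer in bytes, decoded from CMBSZ. */
static uint64_t cmb_size_bytes(uint32_t cmbsz)
{
	unsigned szu = (cmbsz >> 8) & 0xf;	/* size units */
	uint64_t sz = (cmbsz >> 12) & 0xfffff;	/* size, in those units */

	assert(szu <= 6);			/* larger codes are reserved */
	return sz << (12 + 4 * szu);		/* unit is 4 KiB * 16^szu */
}

/* Depth each of nr_queues submission queues can have if all of them must
 * fit in the CMB. Returns 0 when the depth would drop below min_depth,
 * meaning the queues are better placed in host memory. */
static unsigned cmb_queue_depth(uint64_t cmb_bytes, unsigned nr_queues,
				unsigned want_depth, unsigned min_depth)
{
	assert(nr_queues > 0);
	uint64_t fit = cmb_bytes / nr_queues / SQE_SIZE;
	unsigned depth = fit < want_depth ? (unsigned)fit : want_depth;

	return depth >= min_depth ? depth : 0;
}
```

For example, a 64 KiB buffer shared by 4 queues allows 256 entries per queue, while sharing it among 32 queues leaves only 32 entries each, which a threshold of 64 would reject.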
2015-06-05 | NVMe: Automatic namespace rescan | Keith Busch | 1 | -0/+1
Namespaces may be dynamically allocated and deleted or attached and detached. This has the driver rescan the device for namespace changes after each device reset or namespace change asynchronous event. There could potentially be many detached namespaces that we don't want polluting /dev/ with unusable block handles, so this will delete disks if the namespace is not active as indicated by the response from identify namespace. This also skips adding the disk if no capacity is provisioned to the namespace in the first place. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 | nvme: submit internal commands through the block layer | Christoph Hellwig | 1 | -15/+9
Use block layer queues with an internal cmd_type to submit internally generated NVMe commands. This both simplifies the code a lot and allows for a better structure. For example, the LightNVM code can now construct commands without knowing the details of the underlying I/O descriptors. A future NVMe over network target could also inject commands, and the SCSI translation and ioctl code could be reused for such a beast. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 | nvme: store a struct device pointer in struct nvme_dev | Christoph Hellwig | 1 | -1/+1
Most users want the generic device, so store that in struct nvme_dev instead of the pci_dev. This also happens to be a nice step towards making some code reusable for non-PCI transports. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-22 | nvme: consolidate synchronous command submission helpers | Christoph Hellwig | 1 | -5/+1
Note that we keep the unused timeout argument, but allow callers to pass 0 instead of a timeout if they want the default. This will allow adding a timeout to the pass through path later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>