aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/nvme (follow)
AgeCommit message (Collapse)AuthorFilesLines
2016-11-14nvmet-rdma: drain the queue-pair just before freeing itSagi Grimberg1-1/+1
draining the qp right after disconnect might not suffice because the nvmet sq is not fully drained (in nvmet_sq_destroy) and we might see completions after the drain. Instead, drain right before the qp destroy which comes after the sq destruction and we can be sure that no posts come after the drain. Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-11-14nvme-rdma: stop and free io queues on connect failureSteve Wise1-2/+9
While testing nvme-rdma with the spdk nvmf target over iw_cxgb4, I configured the target (mistakenly) to generate an error creating the NVMF IO queues. This resulted a "Invalid SQE Parameter" error sent back to the host on the first IO queue connect: [ 9610.928182] nvme nvme1: queue_size 128 > ctrl maxcmd 120, clamping down [ 9610.938745] nvme nvme1: creating 32 I/O queues. So nvmf_connect_io_queue() returns an error to nvmf_connect_io_queue() / nvmf_connect_io_queues(), and that is returned to nvme_rdma_create_io_queues(). In the error path, nvmf_rdma_create_io_queues() frees the queue tagset memory _before_ stopping and freeing the IB queues, which causes yet another touch-after-free crash due to SQ CQEs being flushed after the ib_cqe structs pointed-to by the flushed WRs have been freed (since they are part of the nvme_rdma_request struct). The fix is to stop and free the queues in nvmf_connect_io_queues() if there is an error connecting any of the queues. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-11-14nvmet-rdma: don't forget to delete a queue from the list of connection failedSagi Grimberg1-1/+7
In case we accepted a queue connection and it failed, we might not remove the queue from the list until we unload and clean it up. We should delete it from the queue list on the relevant handler. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-11-14nvmet: Don't queue fatal error work if csts.cfs is setSagi Grimberg1-3/+7
In the transport, in case of an interal queue error like error completion in rdma we trigger a fatal error. However, multiple queues in the same controller can serr error completions and we don't want to trigger fatal error work more than once. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-11-14nvme-rdma: reject non-connect commands before the queue is liveChristoph Hellwig1-1/+30
If we reconncect we might have command queue up that get resent as soon as the queue is restarted. But until the connect command succeeded we can't send other command. Add a new flag that marks a queue as live when connect finishes, and delay any non-connect command until the queue is live based on it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> [sagig: fixes admin queue LIVE setting] Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-11-14nvmet-rdma: Fix possible NULL deref when handling rdma cm eventsBart Van Assche1-1/+7
When we initiate queue teardown sequence we call rdma_destroy_qp which clears cm_id->qp, afterwards we call rdma_destroy_id, but we might see a rdma_cm event in between with a cleared cm_id->qp so watch out for that and silently ignore the event because this means that the queue teardown sequence is in progress. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-11-11lightnvm: invalid offset calculation for lba_shiftMatias Bjørling1-1/+1
The ns->lba_shift assumes its value to be the logarithmic of the LA size. A previous patch duplicated the lba_shift calculation into lightnvm. It prematurely also subtracted a 512byte shift, which commonly is applied per-command. The 512byte shift being subtracted twice led to data loss when restoring the logical to physical mapping table from device and when issuing I/O commands using rrpc. Fix offset by removing the 512byte shift subtraction when calculating lba_shift. Fixes: b0b4e09c1ae7 "lightnvm: control life of nvm_dev in driver" Reported-by: Javier González <javier@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-11block: move poll code to blk-mqJens Axboe1-1/+1
The poll code is blk-mq specific, let's move it to blk-mq.c. This is a prep patch for improving the polling code. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-11-10nvme: don't pass the full CQE to nvme_complete_async_eventChristoph Hellwig5-11/+21
We only need the status and result fields, and passing them explicitly makes life a lot easier for the Fibre Channel transport which doesn't have a full CQE for the fast path case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-10nvme: introduce struct nvme_requestChristoph Hellwig10-81/+71
This adds a shared per-request structure for all NVMe I/O. This structure is embedded as the first member in all NVMe transport drivers request private data and allows to implement common functionality between the drivers. The first use is to replace the current abuse of the SCSI command passthrough fields in struct request for the NVMe command passthrough, but it will grow a field more fields to allow implementing things like common abort handlers in the future. The passthrough commands are handled by having a pointer to the SQE (struct nvme_command) in struct nvme_request, and the union of the possible result fields, which had to be turned from an anonymous into a named union for that purpose. This avoids having to pass a reference to a full CQE around and thus makes checking the result a lot more lightweight. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-02nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq codeBart Van Assche1-14/+2
Make nvme_requeue_req() check BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED. Remove the QUEUE_FLAG_STOPPED manipulations that became superfluous because of this change. Change blk_queue_stopped() tests into blk_mq_queue_stopped(). This patch fixes a race condition: using queue_flag_clear_unlocked() is not safe if any other function that manipulates the queue flags can be called concurrently, e.g. blk_cleanup_queue(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-02nvme: Fix a race condition related to stopping queuesBart Van Assche1-1/+1
Avoid that nvme_queue_rq() is still running when nvme_stop_queues() returns. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-02blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()Bart Van Assche1-1/+1
Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls are followed by kicking the requeue list. Hence add an argument to these two functions that allows to kick the requeue list. This was proposed by Christoph Hellwig. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-02blk-mq: Remove blk_mq_cancel_requeue_work()Bart Van Assche1-1/+0
Since blk_mq_requeue_work() no longer restarts stopped queues canceling requeue work is no longer needed to prevent that a stopped queue would be restarted. Hence remove this function. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Hannes Reinecke <hare@suse.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01block,fs: use REQ_* flags directlyChristoph Hellwig1-2/+2
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-28block: split out request-only flags into a new namespaceChristoph Hellwig1-2/+2
A lot of the REQ_* flags are only used on struct requests, and only of use to the block layer and a few drivers that dig into struct request internals. This patch adds a new req_flags_t rq_flags field to struct request for them, and thus dramatically shrinks the number of common requests. It also removes the unfortunate situation where we have to fit the fields from the same enum into 32 bits for struct bio and 64 bits for struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-21Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds6-38/+71
Pull block fixes from Jens Axboe: "A set of fixes that missed the merge window, mostly due to me being away around that time. Nothing major here, a mix of nvme cleanups and fixes, and one fix for the badblocks handling" * 'for-linus' of git://git.kernel.dk/linux-block: nvmet: use symbolic constants for CNS values nvme: use symbolic constants for CNS values nvme.h: add an enum for cns values nvme.h: don't use uuid_be nvme.h: resync with nvme-cli nvme: Add tertiary number to NVME_VS nvme : Add sysfs entry for NVMe CMBs when appropriate nvme: don't schedule multiple resets nvme: Delete created IO queues on reset nvme: Stop probing a removed device badblocks: fix overlapping check for clearing
2016-10-19nvmet: use symbolic constants for CNS valuesChristoph Hellwig2-4/+4
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19nvme: use symbolic constants for CNS valuesChristoph Hellwig1-2/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19nvme.h: resync with nvme-cliChristoph Hellwig2-2/+2
Import a few updates to nvme.h from nvme-cli. This mostly includes a few new fields and error codes, but also a few renames that so far are only used in user space. Also one field is moved from an array of two le64 values to one of 16 u8 values so that we can more easily access it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-19nvme: Add tertiary number to NVME_VSGabriel Krisman Bertazi4-9/+9
NVMe 1.2.1 specification adds a tertiary element to the version number. This updates the macro and its callers to include the final number and fixup a single place in nvmet where the version was generated manually. Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-12nvme : Add sysfs entry for NVMe CMBs when appropriateStephen Bates1-9/+35
Add a sysfs attribute that contains salient information about the NVMe Controller Memory Buffer when one is present. For now, just display the information about the CMB available from the control registers. We attach the CMB attribute file to the existing nvme_ctrl sysfs group so it can handle the sysfs teardown. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Stephen Bates <sbates@raithlin.com> Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-12nvme: don't schedule multiple resetsKeith Busch1-9/+13
The queue_work only fails if the work is pending, but not yet running. If the work is running, the work item would get requeued, triggering a double reset. If the first reset fails for any reason, the second reset triggers: WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING) Hitting that schedules controller deletion for a second time, which potentially takes a reference on the device that is being deleted. If the reset occurs at the same time as a hot removal event, this causes a double-free. This patch has the reset helper function check if the work is busy prior to queueing, and changes all places that schedule resets to use this function. Since most users don't want to sync with that work, the "flush_work" is moved to the only caller that wants to sync. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg<sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-12nvme: Delete created IO queues on resetKeith Busch1-4/+5
The driver was decrementing the online_queues prior to attempting to delete those IO queues, so the driver ended up not requesting the controller delete any. This patch saves the online_queues prior to suspending them, and adds that parameter for deleting io queues. Fixes: c21377f8 ("nvme: Suspend all queues before deletion") Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-12nvme: Stop probing a removed deviceKeith Busch1-0/+2
There is no reason the nvme controller can ever return all 1's from reading the CSTS register. This patch returns an error if we observe that status. Without this, we may incorrectly proceed with controller initialization and unnecessarilly rely on error handling to clean this. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-11nvme: use the DMA_ATTR_NO_WARN attributeMauricio Faria de Oliveira1-1/+2
Use the DMA_ATTR_NO_WARN attribute for the dma_map_sg() call of the nvme driver that returns BLK_MQ_RQ_QUEUE_BUSY (not for BLK_MQ_RQ_QUEUE_ERROR). Link: http://lkml.kernel.org/r/1470092390-25451-4-git-send-email-mauricfo@linux.vnet.ibm.com Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-09Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-blockLinus Torvalds5-81/+36
Pull blk-mq irq/cpu mapping updates from Jens Axboe: "This is the block-irq topic branch for 4.9-rc. It's mostly from Christoph, and it allows drivers to specify their own mappings, and more importantly, to share the blk-mq mappings with the IRQ affinity mappings. It's a good step towards making this work better out of the box" * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block: blk_mq: linux/blk-mq.h does not include all the headers it depends on blk-mq: kill unused blk_mq_create_mq_map() blk-mq: get rid of the cpumask in struct blk_mq_tags nvme: remove the post_scan callout nvme: switch to use pci_alloc_irq_vectors blk-mq: provide a default queue mapping for PCI device blk-mq: allow the driver to pass in a queue mapping blk-mq: remove ->map_queue blk-mq: only allocate a single mq_map per tag_set blk-mq: don't redistribute hardware queues on a CPU hotplug event
2016-10-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds2-21/+6
Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
2016-10-07Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-blockLinus Torvalds8-157/+268
Pull block layer updates from Jens Axboe: "This is the main pull request for block layer changes in 4.9. As mentioned at the last merge window, I've changed things up and now do just one branch for core block layer changes, and driver changes. This avoids dependencies between the two branches. Outside of this main pull request, there are two topical branches coming as well. This pull request contains: - A set of fixes, and a conversion to blk-mq, of nbd. From Josef. - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd. Followup dependency fix from Geert. - General fixes from Bart, Baoyou, Guoqing, and Linus W. - CFQ async write starvation fix from Glauber. - Add supprot for delayed kick of the requeue list, from Mike. - Pull out the scalable bitmap code from blk-mq-tag.c and make it generally available under the name of sbitmap. Only blk-mq-tag uses it for now, but the blk-mq scheduling bits will use it as well. From Omar. - bdev thaw error progagation from Pierre. - Improve the blk polling statistics, and allow the user to clear them. From Stephen. - Set of minor cleanups from Christoph in block/blk-mq. - Set of cleanups and optimizations from me for block/blk-mq. - Various nvme/nvmet/nvmeof fixes from the various folks" * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits) fs/block_dev.c: return the right error in thaw_bdev() nvme: Pass pointers, not dma addresses, to nvme_get/set_features() nvme/scsi: Remove power management support nvmet: Make dsm number of ranges zero based nvmet: Use direct IO for writes admin-cmd: Added smart-log command support. nvme-fabrics: Add host_traddr options field to host infrastructure nvme-fabrics: revise host transport option descriptions nvme-fabrics: rework nvmf_get_address() for variable options nbd: use BLK_MQ_F_BLOCKING blkcg: Annotate blkg_hint correctly cfq: fix starvation of asynchronous writes blk-mq: add flag for drivers wanting blocking ->queue_rq() blk-mq: remove non-blocking pass in blk_mq_map_request blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue() block: export bio_free_pages to other modules lightnvm: propagate device_add() error code lightnvm: expose device geometry through sysfs lightnvm: control life of nvm_dev in driver blk-mq: register device instead of disk ...
2016-09-24nvme: Pass pointers, not dma addresses, to nvme_get/set_features()Andy Lutomirski3-13/+11
Any user I can imagine that needs a buffer at all will want to pass a pointer directly. There are no currently callers that use buffers, so this change is painless, and it will make it much easier to start using features that use buffers (e.g. APST). Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Tested-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-24nvme/scsi: Remove power management supportAndy Lutomirski1-71/+3
As far as I can tell, there is basically nothing correct about this code. It misinterprets npss (off-by-one). It hardcodes a bunch of power states, which is nonsense, because they're all just indices into a table that software needs to parse. It completely ignores the distinction between operational and non-operational states. And, until 4.8, if all of the above magically succeeded, it would dereference a NULL pointer and OOPS. Since this code appears to be useless, just delete it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Tested-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-23nvmet: Make dsm number of ranges zero basedAlexander Solganik1-1/+1
This caused the nvmet request data length to be incorrect. Signed-off-by: Alexander Solganik <sashas@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-09-23nvmet: Use direct IO for writesSagi Grimberg1-0/+1
We're designed to work with high-end devices where direct IO makes perfect sense. We noticed that we context switch by scheduling kblockd instead of going directly to the device without REQ_SYNC for writes. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jens Axboe <axboe@kernel.dk>
2016-09-23admin-cmd: Added smart-log command support.Chaitanya Kulkarni1-0/+88
This patch implements the support for smart-log command (NVM Express 1.2.1-section 5.10.1.2 SMART / Health Information (Log Identifier 02h)) on the target for NVMe over Fabric. In current implementation host can retrieve following statistics:- 1. Data Units Read. 2. Data Units Written. 3. Host Read Commands. 4. Host Write Commands. Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23nvme-fabrics: Add host_traddr options field to host infrastructureJames Smart2-0/+17
Add the host_traddr field to allow specification of the host-port connection info for the transport. Will be used by FC transport. Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23nvme-fabrics: revise host transport option descriptionsJames Smart1-3/+4
Revise some of the comments so not so ethernet-network centric Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23nvme-fabrics: rework nvmf_get_address() for variable optionsJames Smart1-2/+10
Revise nvmf_get_address() string to account for not all options being present. Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Johannes Thumshirn <jth@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-23nvme-rdma: use IB_PD_UNSAFE_GLOBAL_RKEYChristoph Hellwig1-20/+5
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-23IB/core: add support to create a unsafe global rkey to ib_create_pdChristoph Hellwig2-2/+2
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning everytime this feature is used as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-09-22nvme-rdma: only clear queue flags after successful connectSagi Grimberg1-1/+1
Otherwise, nvme_rdma_stop_and_clear_queue() will incorrectly try to stop/free rdma qps/cm_ids that are already freed. Fixes: e89ca58f9c90 ("nvme-rdma: add DELETING queue flag") Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21lightnvm: expose device geometry through sysfsSimon A. F. Lund3-10/+30
For a host to access an Open-Channel SSD, it has to know its geometry, so that it writes and reads at the appropriate device bounds. Currently, the geometry information is kept within the kernel, and not exported to user-space for consumption. This patch exposes the configuration through sysfs and enables user-space libraries, such as liblightnvm, to use the sysfs implementation to get the geometry of an Open-Channel SSD. The sysfs entries are stored within the device hierarchy, and can be found using the "lightnvm" device type. An example configuration looks like this: /sys/class/nvme/ └── nvme0n1 ├── capabilities: 3 ├── device_mode: 1 ├── erase_max: 1000000 ├── erase_typ: 1000000 ├── flash_media_type: 0 ├── media_capabilities: 0x00000001 ├── media_type: 0 ├── multiplane: 0x00010101 ├── num_blocks: 1022 ├── num_channels: 1 ├── num_luns: 4 ├── num_pages: 64 ├── num_planes: 1 ├── page_size: 4096 ├── prog_max: 100000 ├── prog_typ: 100000 ├── read_max: 10000 ├── read_typ: 10000 ├── sector_oob_size: 0 ├── sector_size: 4096 ├── media_manager: gennvm ├── ppa_format: 0x380830082808001010102008 ├── vendor_opcode: 0 ├── max_phys_secs: 64 └── version: 1 Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21lightnvm: control life of nvm_dev in driverMatias Bjørling3-33/+46
LightNVM compatible device drivers does not have a method to expose LightNVM specific sysfs entries. To enable LightNVM sysfs entries to be exposed, lightnvm device drivers require a struct device to attach it to. To allow both the actual device driver and lightnvm sysfs entries to coexist, the device driver tracks the lifetime of the nvm_dev structure. This patch refactors NVMe and null_blk to handle the lifetime of struct nvm_dev, which eliminates the need for struct gendisk when a lightnvm compatible device is provided. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-21nvme: refactor namespaces to support non-gendisk devicesMatias Bjørling2-43/+76
With LightNVM enabled namespaces, the gendisk structure is not exposed to the user. This prevents LightNVM users from accessing the NVMe device driver specific sysfs entries, and LightNVM namespace geometry. Refactor the revalidation process, so that a namespace, instead of a gendisk, is revalidated. This later allows patches to wire up the sysfs entries up to a non-gendisk namespace. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2-74/+79
Pull block fixes from Jens Axboe: "A set of fixes for the current series in the realm of block. Like the previous pull request, the meat of it are fixes for the nvme fabrics/target code. Outside of that, just one fix from Gabriel for not doing a queue suspend if we didn't get the admin queue setup in the first place" * 'for-linus' of git://git.kernel.dk/linux-block: nvme-rdma: add back dependency on CONFIG_BLOCK nvme-rdma: fix null pointer dereference on req->mr nvme-rdma: use ib_client API to detect device removal nvme-rdma: add DELETING queue flag nvme/quirk: Add a delay before checking device ready for memblaze device nvme: Don't suspend admin queue that wasn't created nvme-rdma: destroy nvme queue rdma resources on connect failure nvme_rdma: keep a ref on the ctrl during delete/flush iw_cxgb4: block module unload until all ep resources are released iw_cxgb4: call dev_put() on l2t allocation failure
2016-09-15nvme: remove the post_scan calloutChristoph Hellwig2-4/+0
No need now that we don't have to reverse engineer the irq affinity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15nvme: switch to use pci_alloc_irq_vectorsChristoph Hellwig1-71/+36
Use the new helper to automatically select the right interrupt type, as well as to use the automatic interupt affinity assignment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-15blk-mq: remove ->map_queueChristoph Hellwig3-6/+0
All drivers use the default, so provide an inline version of it. If we ever need other queue mapping we can add an optional method back, although supporting will also require major changes to the queue setup code. This provides better code generation, and better debugability as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-13Merge branch 'nvmf-4.8-rc' of git://git.infradead.org/nvme-fabrics into for-linusJens Axboe2-73/+72
Sagi writes: Here we have: - Kconfig dependencies fix from Arnd - nvme-rdma device removal fixes from Steve - possible bad deref fix from Colin
2016-09-12nvme-rdma: add back dependency on CONFIG_BLOCKArnd Bergmann1-0/+1
A recent change removed the dependency on BLK_DEV_NVME, which implies the dependency on PCI and BLOCK. We don't need CONFIG_PCI, but without CONFIG_BLOCK we get tons of build errors, e.g. In file included from drivers/nvme/host/core.c:16:0: linux/blk-mq.h:182:33: error: 'struct gendisk' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] drivers/nvme/host/core.c: In function 'nvme_setup_rw': drivers/nvme/host/core.c:295:21: error: implicit declaration of function 'rq_data_dir' [-Werror=implicit-function-declaration] drivers/nvme/host/nvme.h: In function 'nvme_map_len': drivers/nvme/host/nvme.h:217:6: error: implicit declaration of function 'req_op' [-Werror=implicit-function-declaration] drivers/nvme/host/scsi.c: In function 'nvme_trans_bdev_limits_page': drivers/nvme/host/scsi.c:768:85: error: implicit declaration of function 'queue_max_hw_sectors' [-Werror=implicit-function-declaration] This adds back the specific CONFIG_BLOCK dependency to avoid broken configurations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: aa71987472a9 ("nvme: fabrics drivers don't need the nvme-pci driver") Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
2016-09-12nvme-rdma: fix null pointer dereference on req->mrColin Ian King1-0/+1
If there is an error on req->mr, req->mr is set to null, however the following statement sets req->mr->need_inval causing a null pointer dereference. Fix this by bailing out to label 'out' to immediately return and hence skip over the offending null pointer dereference. Fixes: f5b7b559e1488 ("nvme-rdma: Get rid of duplicate variable") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>