aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/target/tcmu-design.rst
diff options
context:
space:
mode:
Diffstat (limited to '')
-rw-r--r--Documentation/target/tcmu-design.rst (renamed from Documentation/target/tcmu-design.txt)272
1 files changed, 148 insertions, 124 deletions
diff --git a/Documentation/target/tcmu-design.txt b/Documentation/target/tcmu-design.rst
index 4cebc1ebf99a..a7b426707bf6 100644
--- a/Documentation/target/tcmu-design.txt
+++ b/Documentation/target/tcmu-design.rst
@@ -1,25 +1,30 @@
-Contents:
-
-1) TCM Userspace Design
- a) Background
- b) Benefits
- c) Design constraints
- d) Implementation overview
- i. Mailbox
- ii. Command ring
- iii. Data Area
- e) Device discovery
- f) Device events
- g) Other contingencies
-2) Writing a user pass-through handler
- a) Discovering and configuring TCMU uio devices
- b) Waiting for events on the device(s)
- c) Managing the command ring
-3) A final note
+====================
+TCM Userspace Design
+====================
+
+
+.. Contents:
+
+ 1) TCM Userspace Design
+ a) Background
+ b) Benefits
+ c) Design constraints
+ d) Implementation overview
+ i. Mailbox
+ ii. Command ring
+ iii. Data Area
+ e) Device discovery
+ f) Device events
+ g) Other contingencies
+ 2) Writing a user pass-through handler
+ a) Discovering and configuring TCMU uio devices
+ b) Waiting for events on the device(s)
+ c) Managing the command ring
+ 3) A final note
TCM Userspace Design
---------------------
+====================
TCM is another name for LIO, an in-kernel iSCSI target (server).
Existing TCM targets run in the kernel. TCMU (TCM in Userspace)
@@ -32,7 +37,8 @@ modules for file, block device, RAM or using another SCSI device as
storage. These are called "backstores" or "storage engines". These
built-in modules are implemented entirely as kernel code.
-Background:
+Background
+----------
In addition to modularizing the transport protocol used for carrying
SCSI commands ("fabrics"), the Linux kernel target, LIO, also modularizes
@@ -60,7 +66,8 @@ kernel, another approach is to create a userspace pass-through
backstore for LIO, "TCMU".
-Benefits:
+Benefits
+--------
In addition to allowing relatively easy support for RBD and GLFS, TCMU
will also allow easier development of new backstores. TCMU combines
@@ -72,21 +79,25 @@ The disadvantage is there are more distinct components to configure, and
potentially to malfunction. This is unavoidable, but hopefully not
fatal if we're careful to keep things as simple as possible.
-Design constraints:
+Design constraints
+------------------
- Good performance: high throughput, low latency
- Cleanly handle if userspace:
+
1) never attaches
2) hangs
3) dies
4) misbehaves
+
- Allow future flexibility in user & kernel implementations
- Be reasonably memory-efficient
- Simple to configure & run
- Simple to write a userspace backend
-Implementation overview:
+Implementation overview
+-----------------------
The core of the TCMU interface is a memory region that is shared
between kernel and userspace. Within this region is: a control area
@@ -108,7 +119,8 @@ the region mapped at a different virtual address.
See target_core_user.h for the struct definitions.
-The Mailbox:
+The Mailbox
+-----------
The mailbox is always at the start of the shared memory region, and
contains a version, details about the starting offset and size of the
@@ -117,19 +129,27 @@ userspace (respectively) to put commands on the ring, and indicate
when the commands are completed.
version - 1 (userspace should abort if otherwise)
+
flags:
-- TCMU_MAILBOX_FLAG_CAP_OOOC: indicates out-of-order completion is
- supported. See "The Command Ring" for details.
-cmdr_off - The offset of the start of the command ring from the start
-of the memory region, to account for the mailbox size.
-cmdr_size - The size of the command ring. This does *not* need to be a
-power of two.
-cmd_head - Modified by the kernel to indicate when a command has been
-placed on the ring.
-cmd_tail - Modified by userspace to indicate when it has completed
-processing of a command.
-
-The Command Ring:
+ - TCMU_MAILBOX_FLAG_CAP_OOOC:
+ indicates out-of-order completion is supported.
+ See "The Command Ring" for details.
+
+cmdr_off
+ The offset of the start of the command ring from the start
+ of the memory region, to account for the mailbox size.
+cmdr_size
+ The size of the command ring. This does *not* need to be a
+ power of two.
+cmd_head
+ Modified by the kernel to indicate when a command has been
+ placed on the ring.
+cmd_tail
+ Modified by userspace to indicate when it has completed
+ processing of a command.
+
+The Command Ring
+----------------
Commands are placed on the ring by the kernel incrementing
mailbox.cmd_head by the size of the command, modulo cmdr_size, and
@@ -180,29 +200,31 @@ opcode it does not handle, it must set UNKNOWN_OP bit (bit 0) in
hdr.uflags, update cmd_tail, and proceed with processing additional
commands, if any.
-The Data Area:
+The Data Area
+-------------
This is shared-memory space after the command ring. The organization
of this area is not defined in the TCMU interface, and userspace
should access only the parts referenced by pending iovs.
-Device Discovery:
+Device Discovery
+----------------
Other devices may be using UIO besides TCMU. Unrelated user processes
may also be handling different sets of TCMU devices. TCMU userspace
processes must find their devices by scanning sysfs
class/uio/uio*/name. For TCMU devices, these names will be of the
-format:
+format::
-tcm-user/<hba_num>/<device_name>/<subtype>/<path>
+ tcm-user/<hba_num>/<device_name>/<subtype>/<path>
where "tcm-user" is common for all TCMU-backed UIO devices. <hba_num>
and <device_name> allow userspace to find the device's path in the
kernel target's configfs tree. Assuming the usual mount point, it is
-found at:
+found at::
-/sys/kernel/config/target/core/user_<hba_num>/<device_name>
+ /sys/kernel/config/target/core/user_<hba_num>/<device_name>
This location contains attributes such as "hw_block_size", that
userspace needs to know for correct operation.
@@ -214,15 +236,16 @@ configure the device, if needed. The name cannot contain ':', due to
LIO limitations.
For all devices so discovered, the user handler opens /dev/uioX and
-calls mmap():
+calls mmap()::
-mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)
+ mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0)
where size must be equal to the value read from
/sys/class/uio/uioX/maps/map0/size.
-Device Events:
+Device Events
+-------------
If a new device is added or removed, a notification will be broadcast
over netlink, using a generic netlink family name of "TCM-USER" and a
@@ -233,7 +256,8 @@ the LIO device, so that after determining the device is supported
(based on subtype) it can take the appropriate action.
-Other contingencies:
+Other contingencies
+-------------------
Userspace handler process never attaches:
@@ -258,7 +282,7 @@ Userspace handler process is malicious:
Writing a user pass-through handler (with example code)
--------------------------------------------------------
+=======================================================
A user process handing a TCMU device must support the following:
@@ -277,103 +301,103 @@ TCMU is designed so that multiple unrelated processes can manage TCMU
devices separately. All handlers should make sure to only open their
devices, based opon a known subtype string.
-a) Discovering and configuring TCMU UIO devices:
+a) Discovering and configuring TCMU UIO devices::
-(error checking omitted for brevity)
+ /* error checking omitted for brevity */
-int fd, dev_fd;
-char buf[256];
-unsigned long long map_len;
-void *map;
+ int fd, dev_fd;
+ char buf[256];
+ unsigned long long map_len;
+ void *map;
-fd = open("/sys/class/uio/uio0/name", O_RDONLY);
-ret = read(fd, buf, sizeof(buf));
-close(fd);
-buf[ret-1] = '\0'; /* null-terminate and chop off the \n */
+ fd = open("/sys/class/uio/uio0/name", O_RDONLY);
+ ret = read(fd, buf, sizeof(buf));
+ close(fd);
+ buf[ret-1] = '\0'; /* null-terminate and chop off the \n */
-/* we only want uio devices whose name is a format we expect */
-if (strncmp(buf, "tcm-user", 8))
+ /* we only want uio devices whose name is a format we expect */
+ if (strncmp(buf, "tcm-user", 8))
exit(-1);
-/* Further checking for subtype also needed here */
-
-fd = open(/sys/class/uio/%s/maps/map0/size, O_RDONLY);
-ret = read(fd, buf, sizeof(buf));
-close(fd);
-str_buf[ret-1] = '\0'; /* null-terminate and chop off the \n */
+ /* Further checking for subtype also needed here */
-map_len = strtoull(buf, NULL, 0);
+ fd = open(/sys/class/uio/%s/maps/map0/size, O_RDONLY);
+ ret = read(fd, buf, sizeof(buf));
+ close(fd);
+ str_buf[ret-1] = '\0'; /* null-terminate and chop off the \n */
-dev_fd = open("/dev/uio0", O_RDWR);
-map = mmap(NULL, map_len, PROT_READ|PROT_WRITE, MAP_SHARED, dev_fd, 0);
+ map_len = strtoull(buf, NULL, 0);
+ dev_fd = open("/dev/uio0", O_RDWR);
+ map = mmap(NULL, map_len, PROT_READ|PROT_WRITE, MAP_SHARED, dev_fd, 0);
-b) Waiting for events on the device(s)
-
-while (1) {
- char buf[4];
- int ret = read(dev_fd, buf, 4); /* will block */
+ b) Waiting for events on the device(s)
- handle_device_events(dev_fd, map);
-}
+ while (1) {
+ char buf[4];
+ int ret = read(dev_fd, buf, 4); /* will block */
-c) Managing the command ring
-
-#include <linux/target_core_user.h>
-
-int handle_device_events(int fd, void *map)
-{
- struct tcmu_mailbox *mb = map;
- struct tcmu_cmd_entry *ent = (void *) mb + mb->cmdr_off + mb->cmd_tail;
- int did_some_work = 0;
-
- /* Process events from cmd ring until we catch up with cmd_head */
- while (ent != (void *)mb + mb->cmdr_off + mb->cmd_head) {
-
- if (tcmu_hdr_get_op(ent->hdr.len_op) == TCMU_OP_CMD) {
- uint8_t *cdb = (void *)mb + ent->req.cdb_off;
- bool success = true;
-
- /* Handle command here. */
- printf("SCSI opcode: 0x%x\n", cdb[0]);
-
- /* Set response fields */
- if (success)
- ent->rsp.scsi_status = SCSI_NO_SENSE;
- else {
- /* Also fill in rsp->sense_buffer here */
- ent->rsp.scsi_status = SCSI_CHECK_CONDITION;
+ handle_device_events(dev_fd, map);
}
- }
- else if (tcmu_hdr_get_op(ent->hdr.len_op) != TCMU_OP_PAD) {
- /* Tell the kernel we didn't handle unknown opcodes */
- ent->hdr.uflags |= TCMU_UFLAG_UNKNOWN_OP;
- }
- else {
- /* Do nothing for PAD entries except update cmd_tail */
- }
-
- /* update cmd_tail */
- mb->cmd_tail = (mb->cmd_tail + tcmu_hdr_get_len(&ent->hdr)) % mb->cmdr_size;
- ent = (void *) mb + mb->cmdr_off + mb->cmd_tail;
- did_some_work = 1;
- }
- /* Notify the kernel that work has been finished */
- if (did_some_work) {
- uint32_t buf = 0;
- write(fd, &buf, 4);
- }
-
- return 0;
-}
+c) Managing the command ring::
+
+ #include <linux/target_core_user.h>
+
+ int handle_device_events(int fd, void *map)
+ {
+ struct tcmu_mailbox *mb = map;
+ struct tcmu_cmd_entry *ent = (void *) mb + mb->cmdr_off + mb->cmd_tail;
+ int did_some_work = 0;
+
+ /* Process events from cmd ring until we catch up with cmd_head */
+ while (ent != (void *)mb + mb->cmdr_off + mb->cmd_head) {
+
+ if (tcmu_hdr_get_op(ent->hdr.len_op) == TCMU_OP_CMD) {
+ uint8_t *cdb = (void *)mb + ent->req.cdb_off;
+ bool success = true;
+
+ /* Handle command here. */
+ printf("SCSI opcode: 0x%x\n", cdb[0]);
+
+ /* Set response fields */
+ if (success)
+ ent->rsp.scsi_status = SCSI_NO_SENSE;
+ else {
+ /* Also fill in rsp->sense_buffer here */
+ ent->rsp.scsi_status = SCSI_CHECK_CONDITION;
+ }
+ }
+ else if (tcmu_hdr_get_op(ent->hdr.len_op) != TCMU_OP_PAD) {
+ /* Tell the kernel we didn't handle unknown opcodes */
+ ent->hdr.uflags |= TCMU_UFLAG_UNKNOWN_OP;
+ }
+ else {
+ /* Do nothing for PAD entries except update cmd_tail */
+ }
+
+ /* update cmd_tail */
+ mb->cmd_tail = (mb->cmd_tail + tcmu_hdr_get_len(&ent->hdr)) % mb->cmdr_size;
+ ent = (void *) mb + mb->cmdr_off + mb->cmd_tail;
+ did_some_work = 1;
+ }
+
+ /* Notify the kernel that work has been finished */
+ if (did_some_work) {
+ uint32_t buf = 0;
+
+ write(fd, &buf, 4);
+ }
+
+ return 0;
+ }
A final note
-------------
+============
Please be careful to return codes as defined by the SCSI
specifications. These are different than some values defined in the