aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/ulp (follow)
AgeCommit message (Collapse)AuthorFilesLines
2013-11-18Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infinibandLinus Torvalds9-146/+472
Pull infiniband/rdma updates from Roland Dreier: - Re-enable flow steering verbs with new improved userspace ABI - Fixes for slow connection due to GID lookup scalability - IPoIB fixes - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib - Further improvements to SRP error handling - Add new transport type for Cisco usNIC * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) IB/core: Re-enable create_flow/destroy_flow uverbs IB/core: extended command: an improved infrastructure for uverbs commands IB/core: Remove ib_uverbs_flow_spec structure from userspace IB/core: Use a common header for uverbs flow_specs IB/core: Make uverbs flow structure use names like verbs ones IB/core: Rename 'flow' structs to match other uverbs structs IB/core: clarify overflow/underflow checks on ib_create/destroy_flow IB/ucma: Convert use of typedef ctl_table to struct ctl_table IB/cm: Convert to using idr_alloc_cyclic() IB/mlx5: Fix page shift in create CQ for userspace IB/mlx4: Fix device max capabilities check IB/mlx5: Fix list_del of empty list IB/mlx5: Remove dead code IB/core: Encorce MR access rights rules on kernel consumers IB/mlx4: Fix endless loop in resize CQ RDMA/cma: Remove unused argument and minor dead code RDMA/ucma: Discard events for IDs not yet claimed by user space IB/core: Add Cisco usNIC rdma node and transport types RDMA/nes: Remove self-assignment from nes_query_qp() IB/srp: Report receive errors correctly ...
2013-11-17Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-nextRoland Dreier9-146/+472
2013-11-15Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivialLinus Torvalds1-2/+2
Pull trivial tree updates from Jiri Kosina: "Usual earth-shaking, news-breaking, rocket science pile from trivial.git" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits) doc: usb: Fix typo in Documentation/usb/gadget_configs.txt doc: add missing files to timers/00-INDEX timekeeping: Fix some trivial typos in comments mm: Fix some trivial typos in comments irq: Fix some trivial typos in comments NUMA: fix typos in Kconfig help text mm: update 00-INDEX doc: Documentation/DMA-attributes.txt fix typo DRM: comment: `halve' -> `half' Docs: Kconfig: `devlopers' -> `developers' doc: typo on word accounting in kprobes.c in mutliple architectures treewide: fix "usefull" typo treewide: fix "distingush" typo mm/Kconfig: Grammar s/an/a/ kexec: Typo s/the/then/ Documentation/kvm: Update cpuid documentation for steal time and pv eoi treewide: Fix common typo in "identify" __page_to_pfn: Fix typo in comment Correct some typos for word frequency clk: fixed-factor: Fix a trivial typo ...
2013-11-08IB/srp: Report receive errors correctlyBart Van Assche1-5/+4
The IB spec does not guarantee that the opcode is available in error completions. Hence do not rely on it. See also commit 948d1e889e5b ("IB/srp: Introduce srp_handle_qp_err()"). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> # v3.8 Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Avoid offlining operational SCSI devicesBart Van Assche1-1/+1
If SCSI commands are submitted with a SCSI request timeout that is lower than the the IB RC timeout, it can happen that the SCSI error handler has already started device recovery before transport layer error handling starts. So it can happen that the SCSI error handler tries to abort a SCSI command after it has been reset by srp_rport_reconnect(). Tell the SCSI error handler that such commands have finished and that it is not necessary to continue its recovery strategy for commands that have been reset by srp_rport_reconnect(). Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Remove target from list before freeing Scsi_Host structureVu Pham1-4/+5
Remove an SRP target from the SRP target list before invoking the last scsi_host_put() call. This change is necessary because that last put frees the memory that holds the srp_target_port structure. This patch prevents the following kernel oops: RIP: 0010:[<ffffffff810b00d0>] __lock_acquire+0x500/0x1570 Call Trace: [<ffffffff810b11e4>] lock_acquire+0xa4/0x120 [<ffffffff81531206>] _spin_lock+0x36/0x70 [<ffffffffa01b6d8f>] srp_remove_work+0xef/0x180 [ib_srp] [<ffffffff8109125c>] worker_thread+0x21c/0x3d0 [<ffffffff81096e86>] kthread+0x96/0xa0 [<ffffffff8100c20a>] child_rip+0xa/0x20 Signed-off-by: Vu Pham <vuhuong@mellanox.com> [ bvanassche - Modified path description and CC'ed stable. ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Add change_queue_depth and change_queue_type supportJack Wang1-0/+54
Currently, it's not possible to change queue depth for a device behind SRP host. Sometimes, we need to adjust queue_depth for performance reason (eg storage busy, we need lower queue_depth to avoid running into SCSI error handler), so this patch add support for SRP driver. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Tested-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Make queue size configurableBart Van Assche2-39/+103
Certain storage configurations, e.g. a sufficiently large array of hard disks in a RAID configuration, need a queue depth above 64 to achieve optimal performance. Hence make the queue depth configurable. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Tested-by: Jack Wang <xjtuwjp@gmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Introduce srp_alloc_req_data()Bart Van Assche1-24/+40
This patch does not change any functionality. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Cc: Roland Dreier <roland@purestorage.com> Cc: Vu Pham <vu@mellanox.com> Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Export sgid to sysfsBart Van Assche1-0/+10
On an initiator system with multiple IB ports it is not yet possible to figure out what the originating port of an SRP connection is. Hence make the source GID available in sysfs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Add periodic reconnect functionalityBart Van Assche1-6/+46
After a transport layer occurred, periodically try to reconnect to the target until the dev_loss timer expires. Protect the callback functions that can be invoked from inside the SCSI EH against concurrent invocation with srp_reconnect_rport() via the rport mutex. Change the default dev_loss_tmo from 60s into 600s to give the reconnect mechanism a chance to kick in. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08scsi_transport_srp: Add periodic reconnect supportBart Van Assche1-2/+2
Add support for periodically reconnecting to an SRP target until the dev_loss timer expires. After the tenth reconnection attempt, gradually slow down subsequent reconnect attempts. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Start timers if a transport layer error occursBart Van Assche2-0/+20
Start the reconnect timer, fast_io_fail timer and dev_loss timers if a transport layer error occurs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Use SRP transport layer error recoveryBart Van Assche2-41/+101
Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP initiator. Add kernel module parameters that allow to specify default values for these parameters. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Keep rport as long as the IB transport layerBart Van Assche2-0/+4
Keep the rport data structure around after srp_remove_host() has finished until cleanup of the IB transport layer has finished completely. This is necessary because later patches use the rport pointer inside the queuecommand callback. Without this patch accessing the rport from inside a queuecommand callback is racy because srp_remove_host() must be invoked before scsi_remove_host() and because the queuecommand callback could get invoked after srp_remove_host() has finished. In other words, without this patch the queuecommand callback can get invoked after the rport data structure has been freed. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IB/srp: Make transport layer retry count configurableVu Pham2-1/+24
Allow the InfiniBand RC retry count to be configured by the user as an option in the target login string. Reducing this retry count allows to reduce the path failover time. Signed-off-by: Vu Pham <vu@mellanox.com> [ bvanassche: Rewrote patch description / changed default retry count ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IPoIB: lower NAPI weightMichal Schmidt1-1/+1
Since commit 82dc3c63c692 ("net: introduce NAPI_POLL_WEIGHT") netif_napi_add() produces an error message if a NAPI poll weight greater than 64 is requested. Use the standard NAPI weight. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IPoIB: Start multicast join process only on active portsErez Shitrit1-0/+8
The driver starts the mcast_join task whenever the netdev interface is UP without relation to the underlying IB port state. Until the port state is ACTIVE all the join requests are irrelevant, and the IB core returns -EINVAL. So the user will see errors such as: "multicast join failed for ff12:401b:... , status -22". Instead, have ipoib_mcast_join_task() return when the port is not active. It will be called again when the port state is changed and the low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the IB_EVENT_CLIENT_REREGISTER event. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IPoIB: Add path query flushing in ipoib_ib_dev_cleanupErez Shitrit1-0/+5
The path_rec_completion() callback may be invoked asynchronously even at the middle of "driver uninit" process. This can lead to scheduling a task that tries to touch members of the priv object that are no longer valid. For example the function cm_create_tx_qp can attempt to create qp with no valid priv->pd object. The following crash is one of the results: RIP: 0010:[<ffffffffa021bb47>] [<ffffffffa021bb47>] ipoib_cm_create_tx_qp+0x57/0x90 [ib_ipoib] Process ipoib (pid: 5916, threadinfo ffff8803786e4000, task ffff8804150e1500) Stack: Call Trace: [<ffffffff81309ef0>] ? get_random_bytes+0x20/0x30 [<ffffffffa021be2a>] ipoib_cm_tx_init+0xca/0x340 [ib_ipoib] [<ffffffffa021f765>] ipoib_cm_tx_start+0x215/0x3f0 [ib_ipoib] [<ffffffffa021f550>] ? ipoib_cm_tx_start+0x0/0x3f0 [ib_ipoib] [<ffffffff8108b2b0>] worker_thread+0x170/0x2a0 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40 [<ffffffff8108b140>] ? worker_thread+0x0/0x2a0 [<ffffffff81090886>] kthread+0x96/0xa0 [<ffffffff8100c14a>] child_rip+0xa/0x20 [<ffffffff810907f0>] ? kthread+0x0/0xa0 [<ffffffff8100c140>] ? child_rip+0x0/0x20 Fix that by flushing all pending path queries at this point. Signed-off-by: Alex Markuze <markuze@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IPoIB: Fix usage of uninitialized multicast objectsErez Shitrit2-4/+19
The driver should avoid calling ib_sa_free_multicast on the mcast->mc object until it finishes its initialization state. Otherwise we can crash when ipoib_mcast_dev_flush() attempts to use the uninitialized multicast object. Instead, only call wait_for_completion() for multicast entries that started the join process, meaning that ib_sa_join_multicast() finished. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IPoIB: Avoid flushing the driver workqueue on dev_downErez Shitrit1-3/+1
The driver should not flush the whole workqueue when only one work (the pkey poll one) needs to be cancelled. Use cancel_delayed_work_sync() instead. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush()Erez Shitrit5-14/+16
When ipoib interface is going down it takes all of its children with it, under mutex. For each child, dev_change_flags() is called. That function calls ipoib_stop() via the ndo, and causes flush of the workqueue. Sometimes in the workqueue an __ipoib_dev_flush work() is waiting and when invoked tries to get the same mutex, which leads to a deadlock, as seen below. The solution is to switch to rw-sem instead of mutex. The deadlock: [11028.165303] [<ffffffff812b0977>] ? vgacon_scroll+0x107/0x2e0 [11028.171844] [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0 [11028.178465] [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80 [11028.185962] [<ffffffff814ea743>] wait_for_common+0x123/0x180 [11028.192491] [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20 [11028.199504] [<ffffffff814ea85d>] wait_for_completion+0x1d/0x20 [11028.206224] [<ffffffff8108b4f1>] flush_cpu_workqueue+0x61/0x90 [11028.212948] [<ffffffff8108b5a0>] ? wq_barrier_func+0x0/0x20 [11028.219375] [<ffffffff8108bfc4>] flush_workqueue+0x54/0x80 [11028.225712] [<ffffffffa05a0576>] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib] [11028.233988] [<ffffffffa059ccea>] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib] [11028.241678] [<ffffffffa059849a>] ipoib_stop+0x8a/0x140 [ib_ipoib] [11028.248692] [<ffffffff8142adf1>] dev_close+0x71/0xc0 [11028.254447] [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0 [11028.261062] [<ffffffffa059851b>] ipoib_stop+0x10b/0x140 [ib_ipoib] [11028.268172] [<ffffffff8142adf1>] dev_close+0x71/0xc0 [11028.273922] [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0 [11028.280452] [<ffffffff8148f20b>] devinet_ioctl+0x5eb/0x6a0 [11028.286786] [<ffffffff814903b8>] inet_ioctl+0x88/0xa0 [11028.292633] [<ffffffff8141591a>] sock_ioctl+0x7a/0x280 [11028.298576] [<ffffffff81189012>] vfs_ioctl+0x22/0xa0 [11028.304326] [<ffffffff81140540>] ? unmap_region+0x110/0x130 [11028.310756] [<ffffffff811891b4>] do_vfs_ioctl+0x84/0x580 [11028.316897] [<ffffffff81189731>] sys_ioctl+0x81/0xa0 and 11028.017533] [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80 [11028.025030] [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20 [11028.031945] [<ffffffff814eb2ae>] __mutex_lock_slowpath+0x13e/0x180 [11028.039053] [<ffffffff814eb14b>] mutex_lock+0x2b/0x50 [11028.044910] [<ffffffffa059f7e7>] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib] [11028.052894] [<ffffffffa059fa00>] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib] [11028.061363] [<ffffffffa059fa17>] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib] [11028.069738] [<ffffffff8108b120>] worker_thread+0x170/0x2a0 [11028.076068] [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40 [11028.083374] [<ffffffff8108afb0>] ? worker_thread+0x0/0x2a0 [11028.089709] [<ffffffff81090626>] kthread+0x96/0xa0 [11028.095266] [<ffffffff8100c0ca>] child_rip+0xa/0x20 [11028.100921] [<ffffffff81090590>] ? kthread+0x0/0xa0 [11028.106573] [<ffffffff8100c0c0>] ? child_rip+0x0/0x20 [11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IPoIB: Change CM skb memory allocation to be non-atomic during initTal Alon1-5/+9
Change CM skb memory allocation to use GFP_KERNEL when possible. During device init there's no need to use GFP_ATOMIC when allocating memory for the CM skbs -- use GFP_KERNEL instead. Signed-off-by: Tal Alon <talal@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08IPoIB: Fix crash in dev_open error flowErez Shitrit1-4/+7
If napi has never been enabled when calling ipoib_ib_dev_stop, a kernel crash occurs, because the verbs layer completion handler (ipoib_ib_completion) calls napi_schedule unconditionally. If the napi structure passed in the napi_schedule call has not been initialized, napi will crash. The cleanest solution is to simply enable napi before calling ipoib_ib_dev_stop in the dev_open error flow. (dev_stop then immediately disables napi). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-10-23iser-target: check device before dereferencing its variableVu Pham1-1/+1
This patch changes isert_connect_release() to correctly check for the existence struct isert_device *device before checking for isert_device->use_frwr. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-14treewide: Fix typo in KconfigMasanari Iida1-2/+2
Correct spelling typo in Kconfig. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-03ib_srpt: always set response for task managementJack Wang1-6/+4
The SRP specification requires: "Response data shall be provided in any SRP_RSP response that is sent in response to an SRP_TSK_MGMT request (see 6.7). The information in the RSP_CODE field (see table 24) shall indicate the completion status of the task management function." So fix this to avoid the SRP initiator interprets task management functions that succeeded as failed. Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com> Cc: stable@vger.kernel.org # 3.3+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-01ib_srpt: Destroy cm_id before destroying QP.Nicholas Bellinger1-2/+2
This patch fixes a bug where ib_destroy_cm_id() was incorrectly being called after srpt_destroy_ch_ib() had destroyed the active QP. This would result in the following failed SRP_LOGIN_REQ messages: Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1762bd, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c903009f8f41) Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1758f9, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f8f42) Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff175941, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c90300a3cfb2) Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1) mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9 rejected SRP_LOGIN_REQ because creating a new RDMA channel failed. Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1) mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9 rejected SRP_LOGIN_REQ because creating a new RDMA channel failed. Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1) Reported-by: Navin Ahuja <navin.ahuja@saratoga-speed.com> Cc: stable@vger.kernel.org # 3.3+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-12Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds2-228/+545
Pull SCSI target updates from Nicholas Bellinger: "Lots of activity again this round for I/O performance optimizations (per-cpu IDA pre-allocation for vhost + iscsi/target), and the addition of new fabric independent features to target-core (COMPARE_AND_WRITE + EXTENDED_COPY). The main highlights include: - Support for iscsi-target login multiplexing across individual network portals - Generic Per-cpu IDA logic (kent + akpm + clameter) - Conversion of vhost to use per-cpu IDA pre-allocation for descriptors, SGLs and userspace page pointer list - Conversion of iscsi-target + iser-target to use per-cpu IDA pre-allocation for descriptors - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet) emulation for virtual backend drivers - Add support for generic EXTENDED_COPY (CopyOffload) emulation for virtual backend drivers. - Add support for fast memory registration mode to iser-target (Vu) The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of particular significance, which make us the first and only open source target to support the full set of VAAI primitives. Currently Linux clients are lacking upstream support to actually utilize these primitives. However, with server side support now in place for folks like MKP + ZAB working on the client, this logic once reserved for the highest end of storage arrays, can now be run in VMs on their laptops" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits) target/iscsi: Bump versions to v4.1.0 target: Update copyright ownership/year information to 2013 iscsi-target: Bump default TCP listen backlog to 256 target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out iscsi-target; Bump default CmdSN Depth to 64 iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set iscsi-target: Add thread_set->ts_activate_sem + use common deallocate iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE target: remove unused including <linux/version.h> iser-target: introduce fast memory registration mode (FRWR) iser-target: generalize rdma memory registration and cleanup iser-target: move rdma wr processing to a shared function target: Enable global EXTENDED_COPY setup/release target: Add Third Party Copy (3PC) bit in INQUIRY response target: Enable EXTENDED_COPY setup in spc_parse_cdb target: Add support for EXTENDED_COPY copy offload emulation target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check target: Add global device list for EXTENDED_COPY target: Make helpers non static for EXTENDED_COPY command setup target: Make spc_parse_naa_6h_vendor_specific non static ...
2013-09-10target: Update copyright ownership/year information to 2013Nicholas Bellinger1-1/+1
Update copyright ownership/year information for target-core, loopback, iscsi-target, tcm_qla2xx, vhost and iser-target. Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10iser-target: introduce fast memory registration mode (FRWR)Vu Pham2-13/+371
This model was introduced in 00f7ec36c "RDMA/core: Add memory management extensions support" and works when the IB device supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability. Upon creating the isert device, ib_isert will test whether the HCA supports FRWR. If supported then set the flag and assign function pointers that handle fast registration and deregistration of appropriate resources (fast_reg descriptors). When new connection coming in, ib_isert will check frwr flag and create frwr resouces, if fail to do it will switch back to old model of using global dma key and turn off the frwr support. Registration is done using posting IB_WR_FAST_REG_MR to the QP and invalidations using posting IB_WR_LOCAL_INV. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10iser-target: generalize rdma memory registration and cleanupVu Pham2-4/+23
Current driver uses global dma key to register the memory pointed by sg list provided by the target core. This is the preparation step for adding more methods like fast path memory registration, make the reg/unreg calls be function pointers. Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10iser-target: move rdma wr processing to a shared functionVu Pham2-152/+114
isert_put_datain() and isert_get_dataout() share a lot of code in rdma wr processing, move this common code to a shared function. Use isert_unmap_cmd to cleanup for RDMA_READ completion. Remove duplicate field in isert_cmd and isert_rdma_wr structs Change misc debug messages to track isert_cmd Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09iscsi/iser-target: Convert to command priv_size usageNicholas Bellinger2-75/+41
This command converts iscsi/isert-target to use allocations based on iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of using an embedded isert_cmd->iscsi_cmd. This includes removing iscsit_transport->alloc_cmd() usage, along with updating isert-target code to use iscsit_priv_cmd(). Also, remove left-over iscsit_transport->release_cmd() usage for direct calls to iscsit_release_cmd(), and drop the now unused lio_cmd_cache and isert_cmd_cache. Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09iser-target: Updates for login negotiation multi-plexing supportNicholas Bellinger1-1/+13
This patch updates iser-target code to support login negotiation multi-plexing. This includes only using isert_conn->conn_login_comp for the first login request PDU, pushing the subsequent processing to iscsi_conn->login_work -> iscsi_target_do_login_rx(), and turning isert_get_login_rx() into a NOP. v3 changes: - Drop unnecessary LOGIN_FLAGS_READ_ACTIVE bit set in isert_rx_login_req() Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-05Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infinibandLinus Torvalds7-174/+592
Pull main batch of InfiniBand/RDMA changes from Roland Dreier: - Large ocrdma HW driver update: add "fast register" work requests, fixes, cleanups - Add receive flow steering support for raw QPs - Fix IPoIB neighbour race that leads to crash - iSER updates including support for using "fast register" memory registration - IPv6 support for iWARP - XRC transport fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (54 commits) RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch IB/iser: Fix redundant pointer check in dealloc flow IB/iser: Fix possible memory leak in iser_create_frwr_pool() IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards RDMA/ocrdma: Fix passing wrong opcode to modify_srq RDMA/ocrdma: Fill PVID in UMC case RDMA/ocrdma: Add ABI versioning support RDMA/ocrdma: Consider multiple SGES in case of DPP RDMA/ocrdma: Fix for displaying proper link speed RDMA/ocrdma: Increase STAG array size RDMA/ocrdma: Dont use PD 0 for userpace CQ DB RDMA/ocrdma: FRMA code cleanup RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24 RDMA/ocrdma: Fix to work with even a single MSI-X vector RDMA/ocrdma: Remove the MTU check based on Ethernet MTU RDMA/ocrdma: Add support for fast register work requests (FRWR) RDMA/ocrdma: Create IRD queue fix IB/core: Better checking of userspace values for receive flow steering IB/mlx4: Add receive flow steering support IB/core: Export ib_create/destroy_flow through uverbs ...
2013-09-03Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-nextRoland Dreier7-174/+592
2013-09-02IB/iser: Fix redundant pointer check in dealloc flowSagi Grimberg1-1/+1
This bug was discovered by Smatch static checker run by Dan Carpenter. If in free_rx_descriptors(), rx_descs are not NULL then the iser device is definately not NULL, so no need to check it before dereferencing it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02IB/iser: Fix possible memory leak in iser_create_frwr_pool()Roi Dayan1-3/+7
Fix leak where desc is not being freed in error flows. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-26[SCSI] IB/iser: Add Discovery supportOr Gerlitz2-2/+12
To run discovery over iSER we need to advertize the CAP_TEXT_NEGO capability towards user space. Also need to make sure the login RX buffer is posted when SendTargets TEXT PDUs are sent. For that end, we use a setting of the ISCSI_PARAM_DISCOVERY_SESS iscsi param as an indication that this is discovery session. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-13IPoIB: Fix race in deleting ipoib_neigh entriesJim Foraker2-9/+3
In several places, this snippet is used when removing neigh entries: list_del(&neigh->list); ipoib_neigh_free(neigh); The list_del() removes neigh from the associated struct ipoib_path, while ipoib_neigh_free() removes neigh from the device's neigh entry lookup table. Both of these operations are protected by the priv->lock spinlock. The table however is also protected via RCU, and so naturally the lock is not held when doing reads. This leads to a race condition, in which a thread may successfully look up a neigh entry that has already been deleted from neigh->list. Since the previous deletion will have marked the entry with poison, a second list_del() on the object will cause a panic: #5 [ffff8802338c3c70] general_protection at ffffffff815108c5 [exception RIP: list_del+16] RIP: ffffffff81289020 RSP: ffff8802338c3d20 RFLAGS: 00010082 RAX: dead000000200200 RBX: ffff880433e60c88 RCX: 0000000000009e6c RDX: 0000000000000246 RSI: ffff8806012ca298 RDI: ffff880433e60c88 RBP: ffff8802338c3d30 R8: ffff8806012ca2e8 R9: 00000000ffffffff R10: 0000000000000001 R11: 0000000000000000 R12: ffff8804346b2020 R13: ffff88032a3e7540 R14: ffff8804346b26e0 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib] #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm] #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm] #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca We move the list_del() into ipoib_neigh_free(), so that deletion happens only once, after the entry has been successfully removed from the lookup table. This same behavior is already used in ipoib_del_neighs_by_gid() and __ipoib_reap_neigh(). Signed-off-by: Jim Foraker <foraker1@llnl.gov> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09IB/iser: Introduce fast memory registration model (FRWR)Sagi Grimberg3-12/+287
Newer HCAs and Virtual functions may not support FMRs but rather a fast registration model, which we call FRWR - "Fast Registration Work Requests". This model was introduced in 00f7ec36c ("RDMA/core: Add memory management extensions support") and works when the IB device supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability. Upon creating the iser device iser will test whether the HCA supports FMRs. If no support for FMRs, check if IB_DEVICE_MEM_MGT_EXTENSIONS is supported and assign function pointers that handle fast registration and allocation of appropriate resources (fast_reg descriptors). Registration is done using posting IB_WR_FAST_REG_MR to the QP and invalidations using posting IB_WR_LOCAL_INV. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09IB/iser: Place the fmr pool into a union in iser's IB conn structSagi Grimberg3-41/+48
This is preparation step for other memory registration methods to be added. In addition, change reg/unreg routines signature to indicate they use FMRs. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09IB/iser: Handle unaligned SG in separate functionSagi Grimberg1-23/+43
This routine will be shared with other rdma management schemes. The bounce buffer solution for unaligned SG-lists and the sg_to_page_vec routine are likely to be used for other registration schemes and not just FMR. Move them out of the FMR specific code, and call them from there. Later they will be called also from other reg methods code. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09IB/iser: Generalize rdma memory registrationSagi Grimberg3-20/+40
Currently the driver uses FMRs as the only means to register the memory pointed by SG provided by the SCSI mid-layer with the RDMA device. As preparation step for adding more methods for fast path memory registration, make the alloc/free and reg/unreg calls function pointers, which are for now just set to the existing FMR ones. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09IB/iser: Accept session->cmds_max from user spaceShlomo Pongratz4-27/+46
Use cmds_max passed from user space to be the number of PDUs to be supported for the session instead of hard-coded ISCSI_DEF_XMIT_CMDS_MAX. This allow controlling the max number of SCSI commands for the session. Also don't ignore the qdepth passed from user space. Derive from session->cmds_max the actual number of RX buffers and FMR pool size to allocate during the connection bind phase. Since the iser transport connection is established before the iscsi session/connection are created and bound, we still use one hard-coded quantity ISER_DEF_XMIT_CMDS_MAX to compute the maximum number of work-requests to be supported by the RC QP used for the connection. The above quantity is made to be a power of two between ISCSI_TOTAL_CMDS_MIN (16) and ISER_DEF_XMIT_CMDS_MAX (512) inclusive. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09IB/iser: Restructure allocation/deallocation of connection resourcesShlomo Pongratz3-71/+151
This is a preparation step to a patch that accepts the number of max SCSI commands to be supported a session from user space iSCSI tools. Move the allocation of the login buffer, FMR pool and its associated page vector from iser_create_ib_conn_res() (which is called prior when we actually know how many commands should be supported) to iser_alloc_rx_descriptors() (which is called during the iscsi connection bind step where this quantity is known). Also do small refactoring around the deallocation to make that path similar to the allocation one. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09IB/iser: Use proper debug level value for info printsOr Gerlitz2-9/+8
Commit 4f363882612 ("IB/iser: Move informational messages from error to info level") set info prints to be emitted at a lower debug level than warning prints, which is a bit odd. Fix that. Also move the prints on unaligned SG from warning to debug level. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31IPoIB: Fix pkey change flow for virtualization environmentsErez Shitrit1-13/+63
IPoIB's required behaviour w.r.t to the pkey used by the device is the following: - For "parent" interfaces (e.g ib0, ib1, etc) who are created automatically as a result of hot-plug events from the IB core, the driver needs to take whatever pkey vlaue it finds in index 0, and stick to that index. - For child interfaces (e.g ib0.8001, etc) created by admin directive, the driver needs to use and stick to the value provided during its creation. In SR-IOV environment its possible for the VF probe to take place before the cloud management software provisions the suitable pkey for the VF in the paravirtualed PKEY table index 0. When this is the case, the VF IB stack will find in index 0 an invalide pkey, which is all zeros. Moreover, the cloud managment can assign the pkey value at index 0 at any time of the guest life cycle. The correct behavior for IPoIB to address these requirements for parent interfaces is to use PKEY_CHANGE event as trigger to optionally re-init the device pkey value and re-create all the relevant resources accordingly, if the value of the pkey in index 0 has changed (from invalid to valid or from valid value X to invalid value Y). This patch enhances the heavy flushing code which is triggered by pkey change event, to behave correctly for parent devices. For child devices, the code remains the same, namely chases pkey value and not index. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31IPoIB: Make sure child devices use valid/proper pkeysOr Gerlitz2-1/+10
Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for child devices. Also, make sure to always set the full membership bit for the pkey of devices created by rtnl link ops. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>