aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/ulp (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-12-23IB/IPoIB: Move multicast specific code out of ipoib_main.cChristoph Lameter3-14/+23
Code cleanup to move multicast specific code that checks for a sendonly join to ipoib_multicast.c. This allows the removal of the export of __ipoib_mcast_find(). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB/IPoIB: factor out common multicast list removal codeChristoph Lameter3-17/+17
Code cleanup to remove multicast specific code from ipoib_main.c The removal of a list of multicast groups occurs in three places. Create a new function ipoib_mcast_remove_list(). Use this new function in ipoib_main.c too. That in turn allows the dropping of two functions that were exported from ipoib_multicast.c for expiration of mc groups. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-22Merge branches '4.5/Or-cleanup' and '4.5/rdma-cq' into k.o/for-4.5Doug Ledford9-797/+512
Signed-off-by: Doug Ledford <dledford@redhat.com> Conflicts: drivers/infiniband/ulp/iser/iser_verbs.c
2015-12-22IB/ulps: Avoid calling ib_query_deviceOr Gerlitz12-142/+63
Instead, use the cached copy of the attributes present on the device. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-15Merge branch 'rdma-cq.2' of git://git.infradead.org/users/hch/rdma into 4.5/rdma-cqDoug Ledford9-795/+510
Signed-off-by: Doug Ledford <dledford@redhat.com> Conflicts: drivers/infiniband/ulp/srp/ib_srp.c - Conflicts with changes in ib_srp.c introduced during 4.4-rc updates
2015-12-11IB/iser: Convert to CQ abstractionChristoph Hellwig4-283/+210
Use the new CQ abstraction to simplify completions in the iSER initiator. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-12-11IB/iser: Use helper for container_ofSagi Grimberg3-6/+9
Nicer this way. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-12-11IB/iser: Use a dedicated descriptor for loginSagi Grimberg3-83/+89
We'll need it later with the new CQ abstraction. also switch login bufs to void pointers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-12-11IB/srp: use the new CQ APIChristoph Hellwig2-94/+86
This also moves recv completion handling from hardirq context into softirq context. Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-12-11IB/srpt: chain RDMA READ/WRITE requestsChristoph Hellwig2-345/+130
Remove struct rdma_iu and instead allocate the struct ib_rdma_wr array early and fill out directly. This allows us to chain the WRs, and thus archives both less lock contention on the HCA workqueue as well as much simpler error handling. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
2015-12-11IB: add a proper completion queue abstractionChristoph Hellwig2-3/+5
This adds an abstraction that allows ULPs to simply pass a completion object and completion callback with each submitted WR and let the RDMA core handle the nitty gritty details of how to handle completion interrupts and poll the CQ. In detail there is a new ib_cqe structure which just contains the completion callback, and which can be used to get at the containing object using container_of. It is pointed to by the WR and WC as an alternative to the wr_id field, similar to how many ULPs already use the field to store a pointer using casts. A driver using the new completion callbacks allocates it's CQs using the new ib_create_cq API, which in addition to the number of CQEs and the completion vectors also takes a mode on how we poll for CQEs. Three modes are available: direct for drivers that never take CQ interrupts and just poll for them, softirq to poll from softirq context using the to be renamed blk-iopoll infrastructure which takes care of rearming and budgeting, or a workqueue for consumer who want to be called from user context. Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote the current version of the workqueue code because my two previous attempts sucked too much and converted the iSER initiator to the new API. Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-12-08iser-target: Remove explicit mlx4 work-aroundSagi Grimberg1-10/+3
The driver now exposes sufficient limits so we can avoid having mlx4 specific work-around. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07IB/srp: Fix srp_map_sg_fr()Bart Van Assche2-15/+9
After dma_map_sg() has been called the return value of that function must be used as the number of elements in the scatterlist instead of scsi_sg_count(). Fixes: commit f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: stable <stable@vger.kernel.org> # v4.4+ Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07IB/srp: Fix indirect data buffer rkey endiannessBart Van Assche1-1/+1
Detected by sparse. Fixes: commit 330179f2fa93 ("IB/srp: Register the indirect data buffer descriptor") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: stable <stable@vger.kernel.org> # v4.3+ Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07IB/srp: Initialize dma_length in srp_map_idbChristoph Hellwig1-0/+3
Without this sg_dma_len will return 0 on architectures tha have the dma_length field. Fixes: commit f7f7aab1a5c0 ("IB/srp: Convert to new registration API") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07IB/srp: Fix possible send queue overflowSagi Grimberg1-1/+1
When using work request based memory registration (fast_reg) we must reserve SQ entries for registration and invalidation in addition to send operations. Each IO consumes 3 SQ entries (registration, send, invalidation) so we need to allocate 3x larger send-queue instead of 2x. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> CC: Stable <stable@vger.kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07IB/srp: Fix a memory leakBart Van Assche1-9/+13
If srp_connect_ch() returns a positive value then that is considered by its caller as a connection failure but this does not result in a scsi_host_put() call and additionally causes the srp_create_target() function to return a positive value while it should return a negative value. Avoid all this confusion and additionally fix a memory leak by ensuring that srp_connect_ch() always returns a value that is <= 0. This patch avoids that a rejected login triggers the following memory leak: unreferenced object 0xffff88021b24a220 (size 8): comm "srp_daemon", pid 56421, jiffies 4295006762 (age 4240.750s) hex dump (first 8 bytes): 68 6f 73 74 35 38 00 a5 host58.. backtrace: [<ffffffff8151014a>] kmemleak_alloc+0x7a/0xc0 [<ffffffff81165c1e>] __kmalloc_track_caller+0xfe/0x160 [<ffffffff81260d2b>] kvasprintf+0x5b/0x90 [<ffffffff81260e2d>] kvasprintf_const+0x8d/0xb0 [<ffffffff81254b0c>] kobject_set_name_vargs+0x3c/0xa0 [<ffffffff81337e3c>] dev_set_name+0x3c/0x40 [<ffffffff81355757>] scsi_host_alloc+0x327/0x4b0 [<ffffffffa03edc8e>] srp_create_target+0x4e/0x8a0 [ib_srp] [<ffffffff8133778b>] dev_attr_store+0x1b/0x20 [<ffffffff811f27fa>] sysfs_kf_write+0x4a/0x60 [<ffffffff811f1e8e>] kernfs_fop_write+0x14e/0x180 [<ffffffff81176eef>] __vfs_write+0x2f/0xf0 [<ffffffff811771e4>] vfs_write+0xa4/0x100 [<ffffffff81177c64>] SyS_write+0x54/0xc0 [<ffffffff8151b257>] entry_SYSCALL_64_fastpath+0x12/0x6f Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-07IB/iser: use sector_div instead of do_divArnd Bergmann1-1/+1
do_div is the wrong way to divide a sector_t, as it is less efficient when sector_t is 32-bit wide. With the upcoming do_div optimizations, the kernel starts warning about this: drivers/infiniband/ulp/iser/iser_verbs.c:1296:4: note: in expansion of macro 'do_div' include/asm-generic/div64.h:224:22: warning: passing argument 1 of '__div64_32' from incompatible pointer type This changes the code to use sector_div instead, which always produces optimal code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-11-13Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-5/+0
Pull final round of SCSI updates from James Bottomley: "Sorry for the delay in this patch which was mostly caused by getting the merger of the mpt2/mpt3sas driver, which was seen as an essential item of maintenance work to do before the drivers diverge too much. Unfortunately, this caused a compile failure (detected by linux-next), which then had to be fixed up and incubated. In addition to the mpt2/3sas rework, there are updates from pm80xx, lpfc, bnx2fc, hpsa, ipr, aacraid, megaraid_sas, storvsc and ufs plus an assortment of changes including some year 2038 issues, a fix for a remove before detach issue in some drivers and a couple of other minor issues" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits) mpt3sas: fix inline markers on non inline function declarations sd: Clear PS bit before Mode Select. ibmvscsi: set max_lun to 32 ibmvscsi: display default value for max_id, max_lun and max_channel. mptfusion: don't allow negative bytes in kbuf_alloc_2_sgl() scsi: pmcraid: replace struct timeval with ktime_get_real_seconds() mvumi: 64bit value for seconds_since1970 be2iscsi: Fix bogus WARN_ON length check scsi_scan: don't dump trace when scsi_prep_async_scan() is called twice mpt3sas: Bump mpt3sas driver version to 09.102.00.00 mpt3sas: Single driver module which supports both SAS 2.0 & SAS 3.0 HBAs mpt2sas, mpt3sas: Update the driver versions mpt3sas: setpci reset kernel oops fix mpt3sas: Added OEM Gen2 PnP ID branding names mpt3sas: Refcount fw_events and fix unsafe list usage mpt3sas: Refcount sas_device objects and fix unsafe list usage mpt3sas: sysfs attribute to report Backup Rail Monitor Status mpt3sas: Ported WarpDrive product SSS6200 support mpt3sas: fix for driver fails EEH, recovery from injected pci bus error mpt3sas: Manage MSI-X vectors according to HBA device type ...
2015-11-13Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pendingLinus Torvalds1-44/+34
Pull SCSI target updates from Nicholas Bellinger: "This series contains HCH's changes to absorb configfs attribute ->show() + ->store() function pointer usage from it's original tree-wide consumers, into common configfs code. It includes usb-gadget, target w/ drivers, netconsole and ocfs2 changes to realize the improved simplicity, that now renders the original include/target/configfs_macros.h CPP magic for fabric drivers and others, unnecessary and obsolete. And with common code in place, new configfs attributes can be added easier than ever before. Note, there are further improvements in-flight from other folks for v4.5 code in configfs land, plus number of target fixes for post -rc1 code" In the meantime, a new user of the now-removed old configfs API came in through the char/misc tree in commit 7bd1d4093c2f ("stm class: Introduce an abstraction for System Trace Module devices"). This merge resolution comes from Alexander Shishkin, who updated his stm class tracing abstraction to account for the removal of the old show_attribute and store_attribute methods in commit 517982229f78 ("configfs: remove old API") from this pull. As Alexander says about that patch: "There's no need to keep an extra wrapper structure per item and the awkward show_attribute/store_attribute item ops are no longer needed. This patch converts policy code to the new api, all the while making the code quite a bit smaller and easier on the eyes. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>" That patch was folded into the merge so that the tree should be fully bisectable. * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (23 commits) configfs: remove old API ocfs2/cluster: use per-attribute show and store methods ocfs2/cluster: move locking into attribute store methods netconsole: use per-attribute show and store methods target: use per-attribute show and store methods spear13xx_pcie_gadget: use per-attribute show and store methods dlm: use per-attribute show and store methods usb-gadget/f_serial: use per-attribute show and store methods usb-gadget/f_phonet: use per-attribute show and store methods usb-gadget/f_obex: use per-attribute show and store methods usb-gadget/f_uac2: use per-attribute show and store methods usb-gadget/f_uac1: use per-attribute show and store methods usb-gadget/f_mass_storage: use per-attribute show and store methods usb-gadget/f_sourcesink: use per-attribute show and store methods usb-gadget/f_printer: use per-attribute show and store methods usb-gadget/f_midi: use per-attribute show and store methods usb-gadget/f_loopback: use per-attribute show and store methods usb-gadget/ether: use per-attribute show and store methods usb-gadget/f_acm: use per-attribute show and store methods usb-gadget/f_hid: use per-attribute show and store methods ...
2015-11-09scsi: use host wide tags by defaultChristoph Hellwig1-5/+0
This patch changes the !blk-mq path to the same defaults as the blk-mq I/O path by always enabling block tagging, and always using host wide tags. We've had blk-mq available for a few releases so bugs with this mode should have been ironed out, and this ensures we get better coverage of over tagging setup over different configs. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Odin.com>
2015-10-28IB/srp: Dont allocate a page vector when using fast_regSagi Grimberg1-9/+11
The new fast registration API does not reuqire a page vector so we can't avoid allocating it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/srp: Remove srp_finish_mappingSagi Grimberg1-10/+0
No callers left, remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/srp: Convert to new registration APISagi Grimberg2-65/+71
Instead of constructing a page list, call ib_map_mr_sg and post a new ib_reg_wr. srp_map_finish_fr now returns the number of sg elements registered. Remove srp_finish_mapping since no one is calling it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/srp: Split srp_map_sgSagi Grimberg1-51/+106
This is a preparation patch for the new registration API conversion. It splits srp_map_sg per registration strategy (srp_map_sg[fmr|fr|dma]. On its own it adds some code duplication, but it makes the API switch easier to comprehend. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28iser-target: Port to new memory registration APISagi Grimberg2-104/+28
Remove fastreg page list allocation as the page vector is now private to the provider. Instead of constructing the page list and fast_req work request, call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/iser: Port to new fast registration APISagi Grimberg3-52/+28
Remove fastreg page list allocation as the page vector is now private to the provider. Instead of constructing the page list and fast_req work request, call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28Merge branch 'wr-cleanup' into k.o/for-4.4Doug Ledford13-172/+172
2015-10-28Merge branch 'wr-cleanup' of git://git.infradead.org/users/hch/rdma into wr-cleanupDoug Ledford14-199/+238
Signed-off-by: Doug Ledford <dledford@redhat.com> Conflicts: drivers/infiniband/ulp/isert/ib_isert.c - Commit 4366b19ca5eb (iser-target: Change the recv buffers posting logic) changed the logic in isert_put_datain() and had to be hand merged
2015-10-28IB/cma: Add support for network namespacesGuy Shapiro2-2/+2
Add support for network namespaces in the ib_cma module. This is accomplished by: 1. Adding network namespace parameter for rdma_create_id. This parameter is used to populate the network namespace field in rdma_id_private. rdma_create_id keeps a reference on the network namespace. 2. Using the network namespace from the rdma_id instead of init_net inside of ib_cma, when listening on an ID and when looking for an ID for an incoming request. 3. Decrementing the reference count for the appropriate network namespace when calling rdma_destroy_id. In order to preserve the current behavior init_net is passed when calling from other modules. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/iser: Enable SG clusteringSagi Grimberg1-1/+1
iser is perfectly capable supporting SG clustering as it translates the SG list to a page vector. Enabling SG clustering can dramatically reduce the number of SG elements, which doesn't make much of a difference at this point, but with arbitrary SG list support, reducing the number of SG elements can benefit greatly as as it would reduce the length of the HW descriptors array. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28IB/iser: set block queue_virt_boundarySagi Grimberg4-326/+18
The block layer can reliably guarantee that SG lists won't contain gaps (page unaligned) if a driver set the queue virt_boundary. With this setting the block layer will: - refuse merges if bios are not aligned to the virtual boundary - split bios/requests that are not aligned to the virtual boundary - or, bounce buffer SG_IOs that are not aligned to the virtual boundary Since iser is working in 4K page size, set the virt_boundary to 4K pages. With this setting, we can now safely remove the bounce buffering logic in iser. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22iser-target: Remove an unused variableBart Van Assche1-3/+2
Detected this by compiling with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-22IB/iser: Remove an unused variableBart Van Assche1-4/+0
Detected this by compiling with W=1. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21IB/core: Add netdev and gid attributes paramteres to cacheMatan Barak4-4/+5
Adding an ability to query the IB cache by a netdev and get the attributes of a GID. These parameters are necessary in order to successfully resolve the required GID (when the netdevice is known) and get the Ethernet L2 attributes from a GID. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21IB/iser: fix a comment typoGeliang Tang1-1/+1
Just fix a typo in the code comment. Signed-off-by: Geliang Tang <geliangtang@163.com> Acked-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21Merge branch 'k.o/for-4.3-v1' into k.o/for-4.4Doug Ledford7-28/+70
Pick up the late fixes from the 4.3 cycle so we have them in our next branch.
2015-10-13target: use per-attribute show and store methodsChristoph Hellwig1-44/+34
This also allows to remove the target-specific old configfs macros, and gets rid of the target_core_fabric_configfs.h header which only had one function declaration left that could be moved to a better place. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-10-13IB/ipoib: For sendonly join free the multicast group on leaveChristoph Lameter3-2/+5
When we leave the multicast group on expiration of a neighbor we do not free the mcast structure. This results in a memory leak that causes ib_dealloc_pd to fail and print a WARN_ON message and backtrace. Fixes: bd99b2e05c4d (IB/ipoib: Expire sendonly multicast joins) Signed-off-by: Christoph Lameter <cl@linux.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-08IB: split struct ib_send_wrChristoph Hellwig13-172/+172
This patch split up struct ib_send_wr so that all non-trivial verbs use their own structure which embedds struct ib_send_wr. This dramaticly shrinks the size of a WR for most common operations: sizeof(struct ib_send_wr) (old): 96 sizeof(struct ib_send_wr): 48 sizeof(struct ib_rdma_wr): 64 sizeof(struct ib_atomic_wr): 96 sizeof(struct ib_ud_wr): 88 sizeof(struct ib_fast_reg_wr): 88 sizeof(struct ib_bind_mw_wr): 96 sizeof(struct ib_sig_handover_wr): 80 And with Sagi's pending MR rework the fast registration WR will also be down to a reasonable size: sizeof(struct ib_fastreg_wr): 64 Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt] Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc] Tested-by: Haggai Eran <haggaie@mellanox.com> Tested-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com>
2015-09-25IB/ipoib: increase the max mcast backlog queueDoug Ledford1-1/+1
When performing sendonly joins, we queue the packets that trigger the join until the join completes. This may take on the order of hundreds of milliseconds. It is easy to have many more than three packets come in during that time. Expand the maximum queue depth in order to try and prevent dropped packets during the time it takes to join the multicast group. Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25IB/ipoib: Make sendonly multicast joins create the mcast groupDoug Ledford1-10/+12
Since IPoIB should, as much as possible, emulate how multicast sends work on Ethernet for regular TCP/IP apps, there should be no requirement to subscribe to a multicast group before your sends are properly sent. However, due to the difference in how multicast is handled on InfiniBand, we must join the appropriate multicast group before we can send to it. Previously we tried not to trigger the auto-create feature of the subnet manager when doing this because we didn't have tracking of these sendonly groups and the auto-creation might never get undone. The previous patch added timing to these sendonly joins and allows us to leave them after a reasonable idle expiration time. So supply all of the information needed to auto-create group. Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25IB/ipoib: Expire sendonly multicast joinsChristoph Lameter3-2/+22
On neighbor expiration, check to see if the neighbor was actually a sendonly multicast join, and if so, leave the multicast group as we expire the neighbor. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-25IB/iser: Add module parameter for always register memorySagi Grimberg4-14/+31
This module parameter forces memory registration even for a continuous memory region. It is true by default as sending an all-physical rkey with remote permissions might be insecure. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-09-15iser-target: Skip data copy if all the command data comes as immediateJenny Derzhavetz2-8/+19
Given that supporting zcopy immediate data for all IOs requires iser driver to use its own buffer allocations, we settle with avoiding data copy for IOs with data length of up to 8K (which is more latency sensitive anyway). This trims IO write latency by up to 3us and increase IOPs by up to 40% by saving CPU time doing sg_copy_from_buffer (8K IO size is the obvious winner here). Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15iser-target: Change the recv buffers posting logicJenny Derzhavetz2-48/+67
iser target batches post recv operations to avoid the overhead of acquiring the recv queue lock and posting a HW doorbell for each command. We change it to be per command in order to support zcopy immediate data for IOs that fits in the 8K transfer boundary (in the next patch). (Fix minor patch fuzz due to ib_mr removal - nab) Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15iser-target: Fix pending connections handling in target stack shutdown sequnceJenny Derzhavetz2-31/+40
Instead of handing a connection to the iscsi stack for processing right after accepting (rdma_accept) we only hand the connection to the iscsi core after we reached to a connected state (ESTABLISHED CM event). This will prevent two error scenrios: 1. race between rdma connection teardown and iscsi login sequence reported by Nic in: (ce9a9fc20a78a "iser-target: Fix REJECT CM event use-after-free OOPs") 2. target stack shutdown sequence race with constant login attempts by multiple initiators. We address this by maintaining two queues at the isert_np level: - accepted: connections that were accepted but have not reached connected state (might get rejected, unreachable or error). - pending: connections in connected state, but have yet to handed to the iscsi core for login processing. iser connections are promoted to the pending queue only from the accepted queue. This way the iscsi core now will only handle functional iser connections and once we shutdown the target stack, we look for any stales that got left behind so we can safely release them. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15iser-target: Remove np_ prefix from isert_np membersJenny Derzhavetz2-33/+33
These are always referenced from np-> so no need for the prefix. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15iser-target: Remove unused variablesJenny Derzhavetz2-6/+0
Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2015-09-15iser-target: Put the reference on commands waiting for unsol dataJenny Derzhavetz1-1/+37
The iscsi target core teardown sequence calls wait_conn for all active commands to finish gracefully by: - move the queue-pair to error state - drain all the completions - wait for the core to finish handling all session commands However, when tearing down a session while there are sequenced commands that are still waiting for unsolicited data outs, we can block forever as these are missing an extra reference put. We basically need the equivalent of iscsit_free_queue_reqs_for_conn() which is called after wait_conn has returned. Address this by an explicit walk on conn_cmd_list and put the extra reference. Signed-off-by: Jenny Derzhavetz <jennyf@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>