aboutsummaryrefslogtreecommitdiffstats
path: root/net/rds/ib_cm.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-03-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-11/+36
Conflicts: drivers/net/ethernet/broadcom/genet/bcmgenet.c net/core/sock.c Conflicts were overlapping changes in bcmgenet and the lockdep handling of sockets. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13rds: ib: drop unnecessary rdma_rejectZhu Yanjun1-3/+2
When rdma_accept fails, rdma_reject is called in it. As such, it is not necessary to execute rdma_reject again. Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09rds: ib: add error handleZhu Yanjun1-11/+36
In the function rds_ib_setup_qp, the error handle is missing. When some error occurs, it is possible that memory leak occurs. As such, error handle is added. Cc: Joe Jin <joe.jin@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Guanglei Li <guanglei.li@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02RDS: IB: fix panic due to handlers running post teardownSantosh Shilimkar1-0/+12
Shutdown code reaping loop takes care of emptying the CQ's before they being destroyed. And once tasklets are killed, the hanlders are not expected to run. But because of core tasklet code issues, tasklet handler could still run even after tasklet_kill, RDS IB shutdown code already reaps the CQs before freeing cq/qp resources so as such the handlers have nothing left to do post shutdown. On other hand any handler running after teardown and trying to access already freed qp/cq resources causes issues Patch fixes this race by makes sure that handlers returns without any action post teardown. Reviewed-by: Wengang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: Add vector spreading for cqsSantosh Shilimkar1-3/+37
Based on available device vectors, allocate cqs accordingly to get better spread of completion vectors which helps performace great deal.. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: track and log active side endpoint in connectionSantosh Shilimkar1-4/+7
Useful to know the active and passive end points in a RDS IB connection. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: split the mr registration and invalidation pathSantosh Shilimkar1-2/+7
MR invalidation in RDS is done in background thread and not in data path like registration. So break the dependency between them which helps to remove the performance bottleneck. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: include faddr in connection logSantosh Shilimkar1-10/+9
Also use pr_* for it. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2016-07-01RDS: TCP: Hooks to set up a single connection pathSowmini Varadhan1-1/+2
This patch adds ->conn_path_connect callbacks in the rds_transport that are used to set up a single connection path. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: Rework path specific indirectionsSowmini Varadhan1-1/+2
Refactor code to avoid separate indirections for single-path and multipath transports. All transports (both single and mp-capable) will get a pointer to the rds_conn_path, and can trivially derive the rds_connection from the ->cp_conn. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-18net: rds: fix coding style issuesJoshua Houghton1-1/+1
Fix coding style issues in the following files: ib_cm.c: add space loop.c: convert spaces to tabs sysctl.c: add space tcp.h: convert spaces to tabs tcp_connect.c:remove extra indentation in switch statement tcp_recv.c: convert spaces to tabs tcp_send.c: convert spaces to tabs transport.c: move brace up one line on for statement Signed-off-by: Joshua Houghton <josh@awful.name> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14RDS: Pass rds_conn_path to rds_send_xmit()Sowmini Varadhan1-1/+1
Pass a struct rds_conn_path to rds_send_xmit so that MP capable transports can transmit packets on something other than c_path[0]. The eventual goal for MP capable transports is to hash the rds socket to a path based on the bound local address/port, and use this path as the argument to rds_send_xmit() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14RDS: split out connection specific state from rds_connection to rds_conn_pathSowmini Varadhan1-0/+1
In preparation for multipath RDS, split the rds_connection structure into a base structure, and a per-path struct rds_conn_path. The base structure tracks information and locks common to all paths. The workqs for send/recv/shutdown etc are tracked per rds_conn_path. Thus the workq callbacks now work with rds_conn_path. This commit allows for one rds_conn_path per rds_connection, and will be extended into multiple conn_paths in subsequent commits. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16RDS: fix endianness for dp_ack_seqQing Huang1-1/+1
dp->dp_ack_seq is used in big endian format. We need to do the big endianness conversion when we assign a value in host format to it. Signed-off-by: Qing Huang <qing.huang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02RDS: IB: Support Fastreg MR (FRMR) memory registration modeAvinash Repaka1-1/+6
Fastreg MR(FRMR) is another method with which one can register memory to HCA. Some of the newer HCAs supports only fastreg mr mode, so we need to add support for it to have RDS functional on them. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02RDS: IB: allocate extra space on queues for FRMR supportsantosh.shilimkar@oracle.com1-4/+12
Fastreg MR(FRMR) memory registration and invalidation makes use of work request and completion queues for its operation. Patch allocates extra queue space towards these operation(s). Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02RDS: IB: Remove the RDS_IB_SEND_OP dependencysantosh.shilimkar@oracle.com1-15/+27
This helps to combine asynchronous fastreg MR completion handler with send completion handler. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-1/+1
Pull rdma updates from Doug Ledford: "This is my initial round of 4.4 merge window patches. There are a few other things I wish to get in for 4.4 that aren't in this pull, as this represents what has gone through merge/build/run testing and not what is the last few items for which testing is not yet complete. - "Checksum offload support in user space" enablement - Misc cxgb4 fixes, add T6 support - Misc usnic fixes - 32 bit build warning fixes - Misc ocrdma fixes - Multicast loopback prevention extension - Extend the GID cache to store and return attributes of GIDs - Misc iSER updates - iSER clustering update - Network NameSpace support for rdma CM - Work Request cleanup series - New Memory Registration API" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits) IB/core, cma: Make __attribute_const__ declarations sparse-friendly IB/core: Remove old fast registration API IB/ipath: Remove fast registration from the code IB/hfi1: Remove fast registration from the code RDMA/nes: Remove old FRWR API IB/qib: Remove old FRWR API iw_cxgb4: Remove old FRWR API RDMA/cxgb3: Remove old FRWR API RDMA/ocrdma: Remove old FRWR API IB/mlx4: Remove old FRWR API support IB/mlx5: Remove old FRWR API support IB/srp: Dont allocate a page vector when using fast_reg IB/srp: Remove srp_finish_mapping IB/srp: Convert to new registration API IB/srp: Split srp_map_sg RDS/IW: Convert to new memory registration API svcrdma: Port to new memory registration API xprtrdma: Port to new memory registration API iser-target: Port to new memory registration API IB/iser: Port to new fast registration API ...
2015-10-28IB/cma: Add support for network namespacesGuy Shapiro1-1/+1
Add support for network namespaces in the ib_cma module. This is accomplished by: 1. Adding network namespace parameter for rdma_create_id. This parameter is used to populate the network namespace field in rdma_id_private. rdma_create_id keeps a reference on the network namespace. 2. Using the network namespace from the rdma_id instead of init_net inside of ib_cma, when listening on an ID and when looking for an ID for an incoming request. 3. Decrementing the reference count for the appropriate network namespace when calling rdma_destroy_id. In order to preserve the current behavior init_net is passed when calling from other modules. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-05RDS: IB: handle rds_ibdev release case instead of crashing the kernelSantosh Shilimkar1-1/+2
Just in case we are still handling the QP receive completion while the rds_ibdev is released, drop the connection instead of crashing the kernel. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: split send completion handling and do batch ackSantosh Shilimkar1-3/+42
Similar to what we did with receive CQ completion handling, we split the transmit completion handler so that it lets us implement batched work completion handling. We re-use the cq_poll routine and makes use of RDS_IB_SEND_OP to identify the send vs receive completion event handler invocation. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: ack more receive completions to improve performanceSantosh Shilimkar1-2/+68
For better performance, we split the receive completion IRQ handler. That lets us acknowledge several WCE events in one call. We also limit the WC to max 32 to avoid latency. Acknowledging several completions in one call instead of several calls each time will provide better performance since less mutual exclusion locks are being performed. In next patch, send completion is also split which re-uses the poll_cq() and hence the code is moved to ib_cm.c Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-09-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-3/+1
Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...
2015-08-30rds/ib: Remove ib_get_dma_mr callsJason Gunthorpe1-3/+1
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-25RDS: Don't destroy the rdma id until after we're done using itSantosh Shilimkar1-1/+2
During connection resets, we are destroying the rdma id too soon. We can't destroy it when it is still in use. So lets move rdma_destroy_id() after we clear the rings. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: make sure we post recv bufferssantosh.shilimkar@oracle.com1-1/+1
If we get an ENOMEM during rds_ib_recv_refill, we might never come back and refill again later. Patch makes sure to kick krdsd into helping out. To achieve this we add RDS_RECV_REFILL flag and update in the refill path based on that so that at least some therad will keep posting receive buffers. Since krdsd and softirq both might race for refill, we decide to schedule on work queue based on ring_low instead of ring_empty. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: destroy the ib state earlier during shutdownsantosh.shilimkar@oracle.com1-8/+10
Destroy ib state early during shutdown. Otherwise we can get callbacks after the QP isn't really able to handle them. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_netSowmini Varadhan1-2/+3
Open the sockets calling sock_create_kern() with the correct struct net pointer, and use that struct net pointer when verifying the address passed to rds_bind(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12IB/core: Change ib_create_cq to use struct ib_cq_init_attrMatan Barak1-2/+5
Currently, ib_create_cq uses cqe and comp_vecotr instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18RDS: Switch to generic logging helpersSagi Grimberg1-33/+3
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-04net/rds: Fix new sparse warningDavid Ahern1-1/+1
c0adf54a109 introduced new sparse warnings: CHECK /home/dahern/kernels/linux.git/net/rds/ib_cm.c net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types) net/rds/ib_cm.c:191:34: expected unsigned long long [unsigned] [usertype] dp_ack_seq net/rds/ib_cm.c:191:34: got restricted __be64 <noident> net/rds/ib_cm.c:194:51: warning: cast to restricted __be64 The temporary variable for sequence number should have been declared as __be64 rather than u64. Make it so. Signed-off-by: David Ahern <david.ahern@oracle.com> Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03net/rds: fix unaligned memory accessshamir rabinovitch1-2/+11
rdma_conn_param private data is copied using memcpy after headers such as cma_hdr (see cma_resolve_ib_udp as example). so the start of the private data is aligned to the end of the structure that come before. if this structure end with u32 the meaning is that the start of the private data will be 4 bytes aligned. structures that use u8/u16/u32/u64 are naturally aligned but in case the structure start is not 8 bytes aligned, all u64 members of this structure will not be aligned. to solve this issue we must use special macros that allow unaligned access to those unaligned members. Addresses the following kernel log seen when attempting to use RDMA: Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma] Acked-by: Chien Yen <chien.yen@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> [Minor tweaks for top of tree by:] Signed-off-by: David Ahern <david.ahern@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26IB/rds: suppress incompatible protocol when version is knownMarciniszyn, Mike1-6/+5
Add an else to only print the incompatible protocol message when version hasn't been established. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22RDS: use gfp flags from caller in conn_alloc()Dan Carpenter1-1/+1
We should be using the gfp flags the caller specified here, instead of GFP_KERNEL. I think this might be a bugfix, depending on the value of "sock->sk->sk_allocation" when we call rds_conn_create_outgoing() in rds_sendmsg(). Otherwise, it's just a cleanup. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15net: Convert vmalloc/memset to vzallocJoe Perches1-4/+2
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-06-17net/rds: use prink_ratelimited() instead of printk_ratelimit()Manuel Zerpies1-3/+3
Since printk_ratelimit() shouldn't be used anymore (see comment in include/linux/printk.h), replace it with printk_ratelimited() Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
2011-05-25RDMA/cma: Pass QP type into rdma_create_id()Sean Hefty1-1/+1
The RDMA CM currently infers the QP type from the port space selected by the user. In the future (eg with RDMA_PS_IB or XRC), there may not be a 1-1 correspondence between port space and QP type. For netlink export of RDMA CM state, we want to export the QP type to userspace, so it is cleaner to explicitly associate a QP type to an ID. Modify rdma_create_id() to allow the user to specify the QP type, and use it to make our selections of datagram versus connected mode. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2010-09-19rds: double unlock in rds_ib_cm_handle_connect()Dan Carpenter1-1/+0
We unlock after we goto out. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08RDS/IB: print string constants in more placesZach Brown1-6/+4
This prints the constant identifier for work completion status and rdma cm event types, like we already do for IB event types. A core string array helper is added that each string type uses. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: print IB event strings as well as their numberZach Brown1-4/+39
It's nice to not have to go digging in the code to see which event occurred. It's easy to throw together a quick array that maps the ib event enums to their strings. I didn't see anything in the stack that does this translation for us, but I also didn't look very hard. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: track signaled sendsZach Brown1-3/+11
We're seeing bugs today where IB connection shutdown clears the send ring while the tasklet is processing completed sends. Implementation details cause this to dereference a null pointer. Shutdown needs to wait for send completion to stop before tearing down the connection. We can't simply wait for the ring to empty because it may contain unsignaled sends that will never be processed. This patch tracks the number of signaled sends that we've posted and waits for them to complete. It also makes sure that the tasklet has finished executing. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: Add caching of frags and incsChris Mason1-0/+9
This patch is based heavily on an initial patch by Chris Mason. Instead of freeing slab memory and pages, it keeps them, and funnels them back to be reused. The lock minimization strategy uses xchg and cmpxchg atomic ops for manipulation of pointers to list heads. We anchor the lists with a pointer to a list_head struct instead of a static list_head struct. We just have to carefully use the existing primitives with the difference between a pointer and a static head struct. For example, 'list_empty()' means that our anchor pointer points to a list with a single item instead of meaning that our static head element doesn't point to any list items. Original patch by Chris, with significant mods and fixes by Andy and Zach. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: add refcount tracking to struct rds_ib_deviceZach Brown1-19/+14
The RDS IB client .remove callback used to free the rds_ibdev for the given device unconditionally. This could race other users of the struct. This patch adds refcounting so that we only free the rds_ibdev once all of its users are done. Many rds_ibdev users are tied to connections. We give the connection a reference and change these users to reference the device in the connection instead of looking it up in the IB client data. The only user of the IB client data remaining is the first lookup of the device as connections are built up. Incrementing the reference count of a device found in the IB client data could race with final freeing so we use an RCU grace period to make sure that freeing won't happen until those lookups are done. MRs need the rds_ibdev to get at the pool that they're freed in to. They exist outside a connection and many MRs can reference different devices from one socket, so it was natural to have each MR hold a reference. MR refs can be dropped from interrupt handlers and final device teardown can block so we push it off to a work struct. Pool teardown had to be fixed to cancel its pending work instead of deadlocking waiting for all queued work, including itself, to finish. MRs get their reference from the global device list, which gets a reference. It is left unprotected by locks and remains racy. A simple global lock would be a significant bottleneck. More scalable (complicated) locking should be done carefully in a later patch. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: rds_ib_cm_handle_connect() forgot to unlock c_cm_lockZach Brown1-9/+6
rds_ib_cm_handle_connect() could return without unlocking the c_conn_lock if rds_setup_qp() failed. Rather than adding another imbalanced mutex_unlock() to this error path we only unlock the mutex once as we exit the function, reducing the likelyhood of making this same mistake in the future. We remove the previous mulitple return sites, leaving one unambigious return path. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: add _to_node() macros for numa and use {k,v}malloc_node()Andy Grover1-2/+4
Allocate send/recv rings in memory that is node-local to the HCA. This significantly helps performance. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Refill recv ring directly from taskletAndy Grover1-1/+1
Performance is better if we use allocations that don't block to refill the receive ring. Since the whole reason we were kicking out to the worker thread was so we could do blocking allocs, we no longer need to do this. Remove gfp params from rds_ib_recv_refill(); we always use GFP_NOWAIT. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: Do not wait for send ring to be empty on conn shutdownAndy Grover1-1/+4
Now that we are signaling send completions much less, we are likely to have dirty entries in the send queue when the connection is shut down (on rmmod, for example.) These are cleaned up a little further down in conn_shutdown, but if we wait on the ring_empty_wait for them, it'll never happen, and we hand on unload. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS: Perform unmapping ops in stagesAndy Grover1-3/+6
Previously, RDS would wait until the final send WR had completed and then handle cleanup. With silent ops, we do not know if an atomic, rdma, or data op will be last. This patch handles any of these cases by keeping a pointer to the last op in the message in m_last_op. When the TX completion event fires, rds dispatches to per-op-type cleanup functions, and then does whole-message cleanup, if the last op equalled m_last_op. This patch also moves towards having op-specific functions take the op struct, instead of the overall rm struct. rds_ib_connection has a pointer to keep track of a a partially- completed data send operation. This patch changes it from an rds_message pointer to the narrower rm_data_op pointer, and modifies places that use this pointer as needed. Signed-off-by: Andy Grover <andy.grover@oracle.com>
2010-09-08RDS/IB: Disallow connections less than RDS 3.1Andy Grover1-5/+15
RDS 3.0 connections (in OFED 1.3 and earlier) put the header at the end. 3.1 connections put it at the head. The code has significant added complexity in order to handle both configurations. In OFED 1.6 we can drop this and simplify the code by only supporting "header-first" configuration. This patch checks the protocol version, and if prior to 3.1, does not complete the connection. Signed-off-by: Andy Grover <andy.grover@oracle.com>