aboutsummaryrefslogtreecommitdiffstats
path: root/net/rds/ib_cm.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-08-01rds: Remove IPv6 dependencyKa-Cheong Poon1-0/+9
This patch removes the IPv6 dependency from RDS. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-01rds: rds_ib_recv_alloc_cache() should call alloc_percpu_gfp() insteadKa-Cheong Poon1-1/+1
Currently, rds_ib_conn_alloc() calls rds_ib_recv_alloc_caches() without passing along the gfp_t flag. But rds_ib_recv_alloc_caches() and rds_ib_recv_alloc_cache() should take a gfp_t parameter so that rds_ib_recv_alloc_cache() can call alloc_percpu_gfp() using the correct flag instead of calling alloc_percpu(). Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23rds: Enable RDS IPv6 supportKa-Cheong Poon1-5/+15
This patch enables RDS to use IPv6 addresses. For RDS/TCP, the listener is now an IPv6 endpoint which accepts both IPv4 and IPv6 connection requests. RDS/RDMA/IB uses a private data (struct rds_ib_connect_private) exchange between endpoints at RDS connection establishment time to support RDMA. This private data exchange uses a 32 bit integer to represent an IP address. This needs to be changed in order to support IPv6. A new private data struct rds6_ib_connect_private is introduced to handle this. To ensure backward compatibility, an IPv6 capable RDS stack uses another RDMA listener port (RDS_CM_PORT) to accept IPv6 connection. And it continues to use the original RDS_PORT for IPv4 RDS connections. When it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to send the connection set up request. v5: Fixed syntax problem (David Miller). v4: Changed port history comments in rds.h (Sowmini Varadhan). v3: Added support to set up IPv4 connection using mapped address (David Miller). Added support to set up connection between link local and non-link addresses. Various review comments from Santosh Shilimkar and Sowmini Varadhan. v2: Fixed bound and peer address scope mismatched issue. Added back rds_connect() IPv6 changes. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23rds: Changing IP address internal representation to struct in6_addrKa-Cheong Poon1-78/+221
This patch changes the internal representation of an IP address to use struct in6_addr. IPv4 address is stored as an IPv4 mapped address. All the functions which take an IP address as argument are also changed to use struct in6_addr. But RDS socket layer is not modified such that it still does not accept IPv6 address from an application. And RDS layer does not accept nor initiate IPv6 connections. v2: Fixed sparse warnings. Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12treewide: Use array_size() in vzalloc_node()Kees Cook1-2/+4
The vzalloc_node() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc_node(a * b, node) with: vzalloc_node(array_size(a, b), node) as well as handling cases of: vzalloc_node(a * b * c, node) with: vzalloc_node(array3_size(a, b, c), node) This does, however, attempt to ignore constant size factors like: vzalloc_node(4 * 1024, node) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc_node( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vzalloc_node( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc_node( - sizeof(u8) * (COUNT) + COUNT , ...) | vzalloc_node( - sizeof(__u8) * (COUNT) + COUNT , ...) | vzalloc_node( - sizeof(char) * (COUNT) + COUNT , ...) | vzalloc_node( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vzalloc_node( - sizeof(u8) * COUNT + COUNT , ...) | vzalloc_node( - sizeof(__u8) * COUNT + COUNT , ...) | vzalloc_node( - sizeof(char) * COUNT + COUNT , ...) | vzalloc_node( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc_node( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc_node( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc_node( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc_node( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc_node( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc_node( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc_node( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vzalloc_node( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc_node( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc_node( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc_node( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc_node( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc_node( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc_node( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc_node( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc_node( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc_node( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc_node( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc_node( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc_node( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vzalloc_node( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc_node( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc_node( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc_node( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc_node( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc_node( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc_node( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc_node( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc_node( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc_node(C1 * C2 * C3, ...) | vzalloc_node( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc_node(C1 * C2, ...) | vzalloc_node( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-04-25rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qpDag Moxnes1-1/+2
The function rds_ib_setup_qp is calling rds_ib_get_client_data and should correspondingly call rds_ib_dev_put. This call was lost in the non-error path with the introduction of error handling done in commit 3b12f73a5c29 ("rds: ib: add error handle") Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-08rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq managementSowmini Varadhan1-0/+1
An rds_connection can get added during netns deletion between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net *net) : /* code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(&rds_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. A similar race-window exists for the module unload path in rds_tcp_exit -> rds_tcp_destroy_conns Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled by checking check_net() before enqueuing new work or adding new connections. Concurrency with module-unload is handled by maintaining a module specific flag that is set at the start of the module exit function, and must be checked before enqueuing new work or adding new connections. This commit refactors existing RDS_DESTROY_PENDING checks added by commit 3db6e0d172c9 ("rds: use RCU to synchronize work-enqueue with connection teardown") and consolidates all the concurrency checks listed above into the function rds_destroy_pending(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-11/+36
Conflicts: drivers/net/ethernet/broadcom/genet/bcmgenet.c net/core/sock.c Conflicts were overlapping changes in bcmgenet and the lockdep handling of sockets. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-13rds: ib: drop unnecessary rdma_rejectZhu Yanjun1-3/+2
When rdma_accept fails, rdma_reject is called in it. As such, it is not necessary to execute rdma_reject again. Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09rds: ib: add error handleZhu Yanjun1-11/+36
In the function rds_ib_setup_qp, the error handle is missing. When some error occurs, it is possible that memory leak occurs. As such, error handle is added. Cc: Joe Jin <joe.jin@oracle.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Guanglei Li <guanglei.li@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02RDS: IB: fix panic due to handlers running post teardownSantosh Shilimkar1-0/+12
Shutdown code reaping loop takes care of emptying the CQ's before they being destroyed. And once tasklets are killed, the hanlders are not expected to run. But because of core tasklet code issues, tasklet handler could still run even after tasklet_kill, RDS IB shutdown code already reaps the CQs before freeing cq/qp resources so as such the handlers have nothing left to do post shutdown. On other hand any handler running after teardown and trying to access already freed qp/cq resources causes issues Patch fixes this race by makes sure that handlers returns without any action post teardown. Reviewed-by: Wengang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: Add vector spreading for cqsSantosh Shilimkar1-3/+37
Based on available device vectors, allocate cqs accordingly to get better spread of completion vectors which helps performace great deal.. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: track and log active side endpoint in connectionSantosh Shilimkar1-4/+7
Useful to know the active and passive end points in a RDS IB connection. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: split the mr registration and invalidation pathSantosh Shilimkar1-2/+7
MR invalidation in RDS is done in background thread and not in data path like registration. So break the dependency between them which helps to remove the performance bottleneck. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: include faddr in connection logSantosh Shilimkar1-10/+9
Also use pr_* for it. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2016-07-01RDS: TCP: Hooks to set up a single connection pathSowmini Varadhan1-1/+2
This patch adds ->conn_path_connect callbacks in the rds_transport that are used to set up a single connection path. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: Rework path specific indirectionsSowmini Varadhan1-1/+2
Refactor code to avoid separate indirections for single-path and multipath transports. All transports (both single and mp-capable) will get a pointer to the rds_conn_path, and can trivially derive the rds_connection from the ->cp_conn. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Several cases of overlapping changes, except the packet scheduler conflicts which deal with the addition of the free list parameter to qdisc_enqueue(). Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-18net: rds: fix coding style issuesJoshua Houghton1-1/+1
Fix coding style issues in the following files: ib_cm.c: add space loop.c: convert spaces to tabs sysctl.c: add space tcp.h: convert spaces to tabs tcp_connect.c:remove extra indentation in switch statement tcp_recv.c: convert spaces to tabs tcp_send.c: convert spaces to tabs transport.c: move brace up one line on for statement Signed-off-by: Joshua Houghton <josh@awful.name> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14RDS: Pass rds_conn_path to rds_send_xmit()Sowmini Varadhan1-1/+1
Pass a struct rds_conn_path to rds_send_xmit so that MP capable transports can transmit packets on something other than c_path[0]. The eventual goal for MP capable transports is to hash the rds socket to a path based on the bound local address/port, and use this path as the argument to rds_send_xmit() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-14RDS: split out connection specific state from rds_connection to rds_conn_pathSowmini Varadhan1-0/+1
In preparation for multipath RDS, split the rds_connection structure into a base structure, and a per-path struct rds_conn_path. The base structure tracks information and locks common to all paths. The workqs for send/recv/shutdown etc are tracked per rds_conn_path. Thus the workq callbacks now work with rds_conn_path. This commit allows for one rds_conn_path per rds_connection, and will be extended into multiple conn_paths in subsequent commits. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-16RDS: fix endianness for dp_ack_seqQing Huang1-1/+1
dp->dp_ack_seq is used in big endian format. We need to do the big endianness conversion when we assign a value in host format to it. Signed-off-by: Qing Huang <qing.huang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02RDS: IB: Support Fastreg MR (FRMR) memory registration modeAvinash Repaka1-1/+6
Fastreg MR(FRMR) is another method with which one can register memory to HCA. Some of the newer HCAs supports only fastreg mr mode, so we need to add support for it to have RDS functional on them. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02RDS: IB: allocate extra space on queues for FRMR supportsantosh.shilimkar@oracle.com1-4/+12
Fastreg MR(FRMR) memory registration and invalidation makes use of work request and completion queues for its operation. Patch allocates extra queue space towards these operation(s). Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02RDS: IB: Remove the RDS_IB_SEND_OP dependencysantosh.shilimkar@oracle.com1-15/+27
This helps to combine asynchronous fastreg MR completion handler with send completion handler. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-07Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-1/+1
Pull rdma updates from Doug Ledford: "This is my initial round of 4.4 merge window patches. There are a few other things I wish to get in for 4.4 that aren't in this pull, as this represents what has gone through merge/build/run testing and not what is the last few items for which testing is not yet complete. - "Checksum offload support in user space" enablement - Misc cxgb4 fixes, add T6 support - Misc usnic fixes - 32 bit build warning fixes - Misc ocrdma fixes - Multicast loopback prevention extension - Extend the GID cache to store and return attributes of GIDs - Misc iSER updates - iSER clustering update - Network NameSpace support for rdma CM - Work Request cleanup series - New Memory Registration API" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits) IB/core, cma: Make __attribute_const__ declarations sparse-friendly IB/core: Remove old fast registration API IB/ipath: Remove fast registration from the code IB/hfi1: Remove fast registration from the code RDMA/nes: Remove old FRWR API IB/qib: Remove old FRWR API iw_cxgb4: Remove old FRWR API RDMA/cxgb3: Remove old FRWR API RDMA/ocrdma: Remove old FRWR API IB/mlx4: Remove old FRWR API support IB/mlx5: Remove old FRWR API support IB/srp: Dont allocate a page vector when using fast_reg IB/srp: Remove srp_finish_mapping IB/srp: Convert to new registration API IB/srp: Split srp_map_sg RDS/IW: Convert to new memory registration API svcrdma: Port to new memory registration API xprtrdma: Port to new memory registration API iser-target: Port to new memory registration API IB/iser: Port to new fast registration API ...
2015-10-28IB/cma: Add support for network namespacesGuy Shapiro1-1/+1
Add support for network namespaces in the ib_cma module. This is accomplished by: 1. Adding network namespace parameter for rdma_create_id. This parameter is used to populate the network namespace field in rdma_id_private. rdma_create_id keeps a reference on the network namespace. 2. Using the network namespace from the rdma_id instead of init_net inside of ib_cma, when listening on an ID and when looking for an ID for an incoming request. 3. Decrementing the reference count for the appropriate network namespace when calling rdma_destroy_id. In order to preserve the current behavior init_net is passed when calling from other modules. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-05RDS: IB: handle rds_ibdev release case instead of crashing the kernelSantosh Shilimkar1-1/+2
Just in case we are still handling the QP receive completion while the rds_ibdev is released, drop the connection instead of crashing the kernel. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: split send completion handling and do batch ackSantosh Shilimkar1-3/+42
Similar to what we did with receive CQ completion handling, we split the transmit completion handler so that it lets us implement batched work completion handling. We re-use the cq_poll routine and makes use of RDS_IB_SEND_OP to identify the send vs receive completion event handler invocation. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: ack more receive completions to improve performanceSantosh Shilimkar1-2/+68
For better performance, we split the receive completion IRQ handler. That lets us acknowledge several WCE events in one call. We also limit the WC to max 32 to avoid latency. Acknowledging several completions in one call instead of several calls each time will provide better performance since less mutual exclusion locks are being performed. In next patch, send completion is also split which re-uses the poll_cq() and hence the code is moved to ib_cm.c Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-09-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-3/+1
Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...
2015-08-30rds/ib: Remove ib_get_dma_mr callsJason Gunthorpe1-3/+1
The pd now has a local_dma_lkey member which completely replaces ib_get_dma_mr, use it instead. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-25RDS: Don't destroy the rdma id until after we're done using itSantosh Shilimkar1-1/+2
During connection resets, we are destroying the rdma id too soon. We can't destroy it when it is still in use. So lets move rdma_destroy_id() after we clear the rings. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: make sure we post recv bufferssantosh.shilimkar@oracle.com1-1/+1
If we get an ENOMEM during rds_ib_recv_refill, we might never come back and refill again later. Patch makes sure to kick krdsd into helping out. To achieve this we add RDS_RECV_REFILL flag and update in the refill path based on that so that at least some therad will keep posting receive buffers. Since krdsd and softirq both might race for refill, we decide to schedule on work queue based on ring_low instead of ring_empty. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-25RDS: destroy the ib state earlier during shutdownsantosh.shilimkar@oracle.com1-8/+10
Destroy ib state early during shutdown. Otherwise we can get callbacks after the QP isn't really able to handle them. Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-07RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_netSowmini Varadhan1-2/+3
Open the sockets calling sock_create_kern() with the correct struct net pointer, and use that struct net pointer when verifying the address passed to rds_bind(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-12IB/core: Change ib_create_cq to use struct ib_cq_init_attrMatan Barak1-2/+5
Currently, ib_create_cq uses cqe and comp_vecotr instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18RDS: Switch to generic logging helpersSagi Grimberg1-33/+3
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-04net/rds: Fix new sparse warningDavid Ahern1-1/+1
c0adf54a109 introduced new sparse warnings: CHECK /home/dahern/kernels/linux.git/net/rds/ib_cm.c net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types) net/rds/ib_cm.c:191:34: expected unsigned long long [unsigned] [usertype] dp_ack_seq net/rds/ib_cm.c:191:34: got restricted __be64 <noident> net/rds/ib_cm.c:194:51: warning: cast to restricted __be64 The temporary variable for sequence number should have been declared as __be64 rather than u64. Make it so. Signed-off-by: David Ahern <david.ahern@oracle.com> Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-03net/rds: fix unaligned memory accessshamir rabinovitch1-2/+11
rdma_conn_param private data is copied using memcpy after headers such as cma_hdr (see cma_resolve_ib_udp as example). so the start of the private data is aligned to the end of the structure that come before. if this structure end with u32 the meaning is that the start of the private data will be 4 bytes aligned. structures that use u8/u16/u32/u64 are naturally aligned but in case the structure start is not 8 bytes aligned, all u64 members of this structure will not be aligned. to solve this issue we must use special macros that allow unaligned access to those unaligned members. Addresses the following kernel log seen when attempting to use RDMA: Kernel unaligned access at TPC[10507a88] rds_ib_cm_connect_complete+0x1bc/0x1e0 [rds_rdma] Acked-by: Chien Yen <chien.yen@oracle.com> Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com> [Minor tweaks for top of tree by:] Signed-off-by: David Ahern <david.ahern@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-26IB/rds: suppress incompatible protocol when version is knownMarciniszyn, Mike1-6/+5
Add an else to only print the incompatible protocol message when version hasn't been established. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-22RDS: use gfp flags from caller in conn_alloc()Dan Carpenter1-1/+1
We should be using the gfp flags the caller specified here, instead of GFP_KERNEL. I think this might be a bugfix, depending on the value of "sock->sk->sk_allocation" when we call rds_conn_create_outgoing() in rds_sendmsg(). Otherwise, it's just a cleanup. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-15net: Convert vmalloc/memset to vzallocJoe Perches1-4/+2
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-06-17net/rds: use prink_ratelimited() instead of printk_ratelimit()Manuel Zerpies1-3/+3
Since printk_ratelimit() shouldn't be used anymore (see comment in include/linux/printk.h), replace it with printk_ratelimited() Signed-off-by: Manuel Zerpies <manuel.f.zerpies@ww.stud.uni-erlangen.de> Signed-off-by: David S. Miller <davem@conan.davemloft.net>
2011-05-25RDMA/cma: Pass QP type into rdma_create_id()Sean Hefty1-1/+1
The RDMA CM currently infers the QP type from the port space selected by the user. In the future (eg with RDMA_PS_IB or XRC), there may not be a 1-1 correspondence between port space and QP type. For netlink export of RDMA CM state, we want to export the QP type to userspace, so it is cleaner to explicitly associate a QP type to an ID. Modify rdma_create_id() to allow the user to specify the QP type, and use it to make our selections of datagram versus connected mode. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2010-09-19rds: double unlock in rds_ib_cm_handle_connect()Dan Carpenter1-1/+0
We unlock after we goto out. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-09-08RDS/IB: print string constants in more placesZach Brown1-6/+4
This prints the constant identifier for work completion status and rdma cm event types, like we already do for IB event types. A core string array helper is added that each string type uses. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: print IB event strings as well as their numberZach Brown1-4/+39
It's nice to not have to go digging in the code to see which event occurred. It's easy to throw together a quick array that maps the ib event enums to their strings. I didn't see anything in the stack that does this translation for us, but I also didn't look very hard. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: track signaled sendsZach Brown1-3/+11
We're seeing bugs today where IB connection shutdown clears the send ring while the tasklet is processing completed sends. Implementation details cause this to dereference a null pointer. Shutdown needs to wait for send completion to stop before tearing down the connection. We can't simply wait for the ring to empty because it may contain unsignaled sends that will never be processed. This patch tracks the number of signaled sends that we've posted and waits for them to complete. It also makes sure that the tasklet has finished executing. Signed-off-by: Zach Brown <zach.brown@oracle.com>
2010-09-08RDS/IB: Add caching of frags and incsChris Mason1-0/+9
This patch is based heavily on an initial patch by Chris Mason. Instead of freeing slab memory and pages, it keeps them, and funnels them back to be reused. The lock minimization strategy uses xchg and cmpxchg atomic ops for manipulation of pointers to list heads. We anchor the lists with a pointer to a list_head struct instead of a static list_head struct. We just have to carefully use the existing primitives with the difference between a pointer and a static head struct. For example, 'list_empty()' means that our anchor pointer points to a list with a single item instead of meaning that our static head element doesn't point to any list items. Original patch by Chris, with significant mods and fixes by Andy and Zach. Signed-off-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andy Grover <andy.grover@oracle.com> Signed-off-by: Zach Brown <zach.brown@oracle.com>