aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx4/mlx4_ib.h
diff options
context:
space:
mode:
authorHåkon Bugge <haakon.bugge@oracle.com>2020-08-03 08:19:40 +0200
committerJason Gunthorpe <jgg@nvidia.com>2020-08-24 11:31:22 -0300
commit227a0e142e375909959a74b7782403e14331f6f3 (patch)
tree7c71e35c4589f776178eff69f2996d4c16b9c59e /drivers/infiniband/hw/mlx4/mlx4_ib.h
parentIB/mlx4: Fix starvation in paravirt mux/demux (diff)
downloadlinux-dev-227a0e142e375909959a74b7782403e14331f6f3.tar.xz
linux-dev-227a0e142e375909959a74b7782403e14331f6f3.zip
IB/mlx4: Add support for REJ due to timeout
A CM REJ packet with its reason equal to timeout is a special beast in the sense that it doesn't have a Remote Communication ID nor does it have a Remote Port GID. Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. This proxying doesn't not handle said REJ packets. If the active side abandons its connection attempt after having sent a REQ, it will send a REJ with the reason being timeout. This example can be provoked by a simple user-verbs program, which ends up doing: rdma_connect(cm_id, &conn_param); rdma_destroy_id(cm_id); using the async librdmacm API. Having dynamic debug prints enabled in the mlx4_ib driver, we will then see: mlx4_ib_demux_cm_handler: Couldn't find an entry for pv_cm_id 0x0, attr_id 0x12 The solution is to introduce a radix-tree. When a REQ packet is received and handled in mlx4_ib_demux_cm_handler(), we know the connecting peer's para-virtual cm_id and the destination slave. We then insert an entry into the tree with said information. We also schedule work to remove this entry from the tree and free it, in order to avoid memory leak. When a REJ packet with reason timeout is received, we can look up the slave in the tree, and deliver the packet to the correct slave. When a duplicate REQ packet is received, the entry is in the tree. In this case, we adjust the delayed work in order to avoid a too premature eviction of the entry. When cleaning up, we simply traverse the tree and modify any delayed work to use a zero delay. A subsequent flush of the system_wq will ensure all entries being wiped out. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200803061941.1139994-6-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Diffstat (limited to 'drivers/infiniband/hw/mlx4/mlx4_ib.h')
-rw-r--r--drivers/infiniband/hw/mlx4/mlx4_ib.h3
1 files changed, 3 insertions, 0 deletions
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 27bb23756663..bcac8fc50317 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -495,6 +495,9 @@ struct mlx4_ib_sriov {
spinlock_t id_map_lock;
struct rb_root sl_id_map;
struct list_head cm_list;
+ /* Protects the radix-tree */
+ struct mutex rej_tmout_lock;
+ struct radix_tree_root rej_tmout_root;
};
struct gid_cache_context {