aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx4 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-07-30IB/mlx4: Use 4K pages for kernel QP's WQE bufferJack Morgenstein2-176/+34
In the current implementation, the driver tries to allocate contiguous memory, and if it fails, it falls back to 4K fragmented allocation. Once the memory is fragmented, the first allocation might take a lot of time, and even fail, which can cause connection failures. This patch changes the logic to always allocate with 4K granularity, since it's more robust and more likely to succeed. This patch was tested with Lustre and no performance degradation was observed. Note: This commit eliminates the "shrinking WQE" feature. This feature depended on using vmap to create a virtually contiguous send WQ. vmap use was abandoned due to problems with several processors (see the commit cited below). As a result, shrinking WQE was available only with physically contiguous send WQs. Allocating such send WQs caused the problems described above. Therefore, as a side effect of eliminating the use of large physically contiguous send WQs, the shrinking WQE feature became unavailable. Warning example: worker/20:1: page allocation failure: order:8, mode:0x80d0 CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------ Workqueue: ib_cm cm_work_handler [ib_cm] Call Trace: [<ffffffff81686d81>] dump_stack+0x19/0x1b [<ffffffff81186160>] warn_alloc_failed+0x110/0x180 [<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0 [<ffffffff811ce868>] alloc_pages_current+0x98/0x110 [<ffffffff81184fae>] __get_free_pages+0xe/0x50 [<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150 [<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50 [<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core] [<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core] [<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib] [<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0 [<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib] [<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib] [<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib] [<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core] [<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm] [<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd] [<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs] [<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd] Fixes: 73898db04301 ("net/mlx4: Avoid wrong virtual mappings") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments constBart Van Assche4-21/+23
Since neither ib_post_send() nor ib_post_recv() modify the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send|recv|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduce that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-30RDMA: Constify the argument of the work request conversion functionsBart Van Assche1-12/+13
When posting a send work request, the work request that is posted is not modified by any of the RDMA drivers. Make this explicit by constifying most ib_send_wr pointers in RDMA transport drivers. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10RDMA: Fix storage of PortInfo CapabilityMask in the kernelJason Gunthorpe1-1/+2
The internal flag IP_BASED_GIDS was added to a field that was being used to hold the port Info CapabilityMask without considering the effects this will have. Since most drivers just use the value from the HW MAD it means IP_BASED_GIDS will also become set on any HW that sets the IBA flag IsOtherLocalChangesNoticeSupported - which is not intended. Fix this by keeping port_cap_flags only for the IBA CapabilityMask value and store unrelated flags externally. Move the bit definitions for this to ib_mad.h to make it clear what is happening. To keep the uAPI unchanged define a new set of flags in the uapi header that are only used by ib_uverbs_query_port_resp.port_cap_flags which match the current flags supported in rdma-core, and the values exposed by the current kernel. Fixes: b4a26a27287a ("IB: Report using RoCE IP based gids in port caps") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04IB/mlx4: Test port number before querying type.Tarick Bedeir1-1/+1
rdma_ah_find_type() can reach into ib_device->port_immutable with a potentially out-of-bounds port number, so check that the port number is valid first. Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Tarick Bedeir <tarick@google.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-26IB/mlx4: Create slave AH's directlyJason Gunthorpe3-25/+59
Since slave GID's do not exist in the core gid table we can no longer use the core code to help do this without creating inconsistencies. Directly create the AH using mlx4 internal APIs. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25IB/mlx4: Add support for drain SQ & RQYishai Hadas3-6/+152
This patch follows the logic from ib_core but considers the internal device state upon executing the involved commands. Specifically, Upon internal error state modify QP to an error state can be assumed to be success as each in-progress WR going to be flushed in error in any case as expected by that modify command. In addition, As the drain should never fail the driver makes sure that post_send/recv will succeed even if the device is already in an internal error state. As such once the driver will supply the simulated/SW CQEs the CQE for the drain WR will be handled as well. In case of an internal error state the CQE for the drain WR may be completed as part of the main task that handled the error state or by the task that issued the drain WR. As the above depends on scheduling the code takes the relevant locks and actions to make sure that the completion handler for that WR will always be called after that the post_send/recv were issued but not in parallel to the other task that handles the error flow. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18IB/core: add max_send_sge and max_recv_sge attributesSteve Wise1-2/+2
This patch replaces the ib_device_attr.max_sge with max_send_sge and max_recv_sge. It allows ulps to take advantage of devices that have very different send and recv sge depths. For example cxgb4 has a max_recv_sge of 4, yet a max_send_sge of 16. Splitting out these attributes allows much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW API. Consider a large RDMA WRITE that has 16 scattergather entries. With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of 16, it can be done with 1 WRITE WR. Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18RDMA: Convert drivers to use the AH's sgid_attr in post_wr pathsParav Pandit1-6/+2
For UD the drivers were doing a sgid_index lookup into the cache to get the attrs, however we can now directly access the same attrs stores in the ib_ah instead and remove the lookup. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18IB/mlx4: Use GID attribute from ah attributeParav Pandit4-20/+9
While converting GID index from attribute to that of the HCA, GID attribute is available from the ah_attr. Make use of GID attribute to simplify the code and also avoid avoid GID query. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18RDMA: Convert drivers to use sgid_attr instead of sgid_indexParav Pandit2-30/+17
The core code now ensures that all driver callbacks that receive an rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present. Drivers can use this pointer instead of calling a query function with sgid_index. This simplifies the drivers and also avoids races where a gid_index lookup may return different data if it is changed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18RDMA: Use GID from the ib_gid_attr during the add_gid() callbackParav Pandit1-6/+6
Now that ib_gid_attr contains the GID, make use of that in the add_gid() callback functions for the provider drivers to simplify the add_gid() implementations. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook1-1/+2
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: kmalloc() -> kmalloc_array()Kees Cook2-7/+9
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-04Merge tag 'verbs_flow_counters' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git into for-nextJason Gunthorpe1-1/+5
Pull verbs counters series from Leon Romanovsky: ==================== Verbs flow counters support This series comes to allow user space applications to monitor real time traffic activity and events of the verbs objects it manages, e.g.: ibv_qp, ibv_wq, ibv_flow. The API enables generic counters creation and define mapping to association with a verbs object, the current mlx5 driver is using this API for flow counters. With this API, an application can monitor the entire life cycle of object activity, defined here as a static counters attachment. This API also allows dynamic counters monitoring of measurement points for a partial period in the verbs object life cycle. In addition it presents the implementation of the generic counters interface. This will be achieved by extending flow creation by adding a new flow count specification type which allows the user to associate a previously created flow counters using the generic verbs counters interface to the created flow, once associated the user could read statistics by using the read function of the generic counters interface. The API includes: 1. create and destroyed API of a new counters objects 2. read the counters values from HW Note: Attaching API to allow application to define the measurement points per objects is a user space only API and this data is passed to kernel when the counted object (e.g. flow) is created with the counters object. =================== * tag 'verbs_flow_counters': IB/mlx5: Add counters read support IB/mlx5: Add flow counters read support IB/mlx5: Add flow counters binding support IB/mlx5: Add counters create and destroy support IB/uverbs: Add support for flow counters IB/core: Add support for flow counters IB/core: Support passing uhw for create_flow IB/uverbs: Add read counters support IB/core: Introduce counters read verb IB/uverbs: Add create/destroy counters support IB/core: Introduce counters object and its create/destroy IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure net/mlx5: Export flow counter related API net/mlx5: Use flow counter pointer as input to the query function
2018-06-02IB/core: Support passing uhw for create_flowMatan Barak1-1/+5
This is required when user-space drivers need to pass extra information regarding how to handle this flow steering specification. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-01RDMA/mlx4: Don't crash machine if zap_vma_ptes() failsLeon Romanovsky1-8/+2
The failure reported by zap_vma_ptes() means that wrong VMA pages were supplied, however it is impossible for this type of address. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01RDMA/mlx4: Discard unknown SQP work requestsLeon Romanovsky1-1/+0
There is no need to crash the machine if unknown work request was received in SQP MAD. Cc: <stable@vger.kernel.org> # 3.6 Fixes: 37bfc7c1e83f ("IB/mlx4: SR-IOV multiplex and demultiplex MADs") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-06-01RDMA/mlx4: Catch FW<->SW misalignment without machine crashLeon Romanovsky1-1/+4
Any steering QP is supposed be above steering_qp_base, see function mlx4_ib_steer_qp_alloc() for it, however in case of misalignment between SW and FW, this qp_base can be wrong. Use WARN() to catch such situation without killing the machine. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-30RDMA/uverbs: Hoist the common process of disassociate_ucontext into ib coreWei Hu(Xavier)1-34/+0
This patch hoisted the common process of disassociate_ucontext callback function into ib core code, and these code are common to ervery ib_device driver. Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-28Merge branch 'mr_fix' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-nextJason Gunthorpe1-8/+42
Update mlx4 to support user MR creation against read-only memory, previously it required the memory to be writable. Based on rdma for-rc due to dependencies. * mr_fix: (2 commits) IB/mlx4: Mark user MR as writable if actual virtual memory is writable IB/core: Make testing MR flags for writability a static inline function
2018-05-28IB/mlx4: Mark user MR as writable if actual virtual memory is writableJack Morgenstein1-8/+42
To allow rereg_user_mr to modify the MR from read-only to writable without using get_user_pages again, we needed to define the initial MR as writable. However, this was originally done unconditionally, without taking into account the writability of the underlying virtual memory. As a result, any attempt to register a read-only MR over read-only virtual memory failed. To fix this, do not add the writable flag bit when the user virtual memory is not writable (e.g. const memory). However, when the underlying memory is NOT writable (and we therefore do not define the initial MR as writable), the IB core adds a "force writable" flag to its user-pages request. If this succeeds, the reg_user_mr caller gets a writable copy of the original pages. If the user-space caller then does a rereg_user_mr operation to enable writability, this will succeed. This should not be allowed, since the original virtual memory was not writable. Cc: <stable@vger.kernel.org> Fixes: 9376932d0c26 ("IB/mlx4_ib: Add support for user MR re-registration") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-05-24IB/core: Reduce the places that use zgidParav Pandit2-3/+4
Instead of open coding memcmp() to check whether a given GID is zero or not, use a helper function to do so, and replace instances of memcpy(z,&zgid) with memset. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-03IB/mlx4: Fix integer overflow when calculating optimal MTT sizeJack Morgenstein1-1/+1
When the kernel was compiled using the UBSAN option, we saw the following stack trace: [ 1184.827917] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx4/mr.c:349:27 [ 1184.828114] signed integer overflow: [ 1184.828247] -2147483648 - 1 cannot be represented in type 'int' The problem was caused by calling round_up in procedure mlx4_ib_umem_calc_optimal_mtt_size (on line 349, as noted in the stack trace) with the second parameter (1 << block_shift) (which is an int). The second parameter should have been (1ULL << block_shift) (which is an unsigned long long). (1 << block_shift) is treated by the compiler as an int (because 1 is an integer). Now, local variable block_shift is initialized to 31. If block_shift is 31, 1 << block_shift is 1 << 31 = 0x80000000=-214748368. This is the most negative int value. Inside the round_up macro, there is a cast applied to ((1 << 31) - 1). However, this cast is applied AFTER ((1 << 31) - 1) is calculated. Since (1 << 31) is treated as an int, we get the negative overflow identified by UBSAN in the process of calculating ((1 << 31) - 1). The fix is to change (1 << block_shift) to (1ULL << block_shift) on line 349. Fixes: 9901abf58368 ("IB/mlx4: Use optimal numbers of MTT entries") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27RDMA/mlx4: Add missed RSS hash inner header flagLeon Romanovsky1-1/+2
Despite being advertised to user space application, the RSS inner header flag was filtered by checks at the beginning of QP creation routine. Cc: <stable@vger.kernel.org> # 4.15 Fixes: 4d02ebd9bbbd ("IB/mlx4: Fix RSS hash fields restrictions") Fixes: 07d84f7b6adf ("IB/mlx4: Add support to RSS hash for inner headers") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-04IB/mlx4: Check for egress flow steeringBoris Pismenny1-0/+3
ConnectX3 doesn't support egress flow steering. Return an EOPNOTSUPP error when such a flow is being created. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA: Use ib_gid_attr during GID modificationParav Pandit1-18/+12
Now that ib_gid_attr contains device, port and index, simplify the provider APIs add_gid() and del_gid() to use device, port and index fields from the ib_gid_attr attributes structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/providers: Avoid null netdev check for RoCEParav Pandit2-7/+5
Now that IB core GID cache ensures that all RoCE entries have an associated netdev remove null checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/providers: Avoid zero GID check for RoCEParav Pandit2-5/+0
Now that the IB core GID cache ensures that a zero GID doesn't exist in the GID table remove zero GID checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit1-16/+1
ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21IB/mlx4: Eliminate duplicate barriers on weakly-ordered archsSinan Kaya1-2/+2
Code includes wmb() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Since code already has an explicit barrier call, changing writel() to writel_relaxed(). Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/uverbs: Extend uverbs_ioctl header with driver_idMatan Barak1-0/+1
Extending uverbs_ioctl header with driver_id and another reserved field. driver_id should be used in order to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB support and thus we add some reserved fields for future usage. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/mlx4: Add Scatter FCS support over WQ creationGuy Levi3-9/+32
As a default, for Ethernet packets, the device scatters only the payload of ingress packets. The scatter FCS feature lets the user to get the FCS (Ethernet's frame check sequence) in the received WR's buffer as a 4 Bytes trailer following the packet's payload. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/mlx4: Report TSO capabilitiesYishai Hadas1-2/+20
Report to the user area the TSO device capabilities, it includes the max_tso size and the QP types that support it. The TSO is applicable only when when of the ports is ETH and the device supports it. uresp logic around rss_caps is updated to fix a till-now harmless bug computing the length of the structure to copy. The code did not handle the implicit padding before rss_caps correctly. This is necessay to copy tss_caps successfully. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15RDMA/mlx4: Move flag constants to uapi headerJason Gunthorpe2-5/+1
MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps) and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to userspace and form part of the uAPI. Move them to the uapi header where they belong. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-14Merge branch 'k.o/wip/dl-for-rc' into k.o/wip/dl-for-nextDoug Ledford2-5/+10
Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08mlx4_ib: zero out struct ib_pd when allocatingSteve Wise1-2/+1
Zero out the fields of the struct ib_pd for user mode pds so that users querying pds via nldev will not get garbage. For simplicity, use kzalloc() to allocate the mlx4_ib_pd struct. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08mlx4_ib: set user mr attributes in struct ib_mrSteve Wise1-0/+3
Setting iova, length, and page_size allows this information to be seen via NLDEV netlink queries, which can aid in user rdma debugging. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-07IB/mlx4: Move mlx4_uverbs_ex_query_device_resp to include/uapi/Yishai Hadas1-14/+0
This struct is involved in the user API for mlx4 and should not be hidden inside a driver header file. Fixes: 09d208b258a2 ("IB/mlx4: Add report for RSS capabilities by vendor channel") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06IB/mlx4: Include GID type when deleting GIDs from HW table under RoCEJack M1-2/+7
The commit cited below added a gid_type field (RoCEv1 or RoCEv2) to GID properties. When adding GIDs, this gid_type field was copied over to the hardware gid table. However, when deleting GIDs, the gid_type field was not copied over to the hardware gid table. As a result, when running RoCEv2, all RoCEv2 gids in the hardware gid table were set to type RoCEv1 when any gid was deleted. This problem would persist until the next gid was added (which would again restore the gid_type field for all the gids in the hardware gid table). Fix this by copying over the gid_type field to the hardware gid table when deleting gids, so that the gid_type of all remaining gids is preserved when a gid is deleted. Fixes: b699a859d17b ("IB/mlx4: Add gid_type to GID properties") Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-06IB/mlx4: Fix corruption of RoCEv2 IPv4 GIDsJack Morgenstein1-2/+0
When using IPv4 addresses in RoCEv2, the GID format for the mapped IPv4 address should be: ::ffff:<4-byte IPv4 address>. In the cited commit, IPv4 mapped IPV6 addresses had the 3 upper dwords zeroed out by memset, which resulted in deleting the ffff field. However, since procedure ipv6_addr_v4mapped() already verifies that the gid has format ::ffff:<ipv4 address>, no change is needed for the gid, and the memset can simply be removed. Fixes: 7e57b85c444c ("IB/mlx4: Add support for setting RoCEv2 gids in hardware") Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28IB/mlx: Set slid to zero in Ethernet completion structMoni Shoua1-1/+3
IB spec says that a lid should be ignored when link layer is Ethernet, for example when building or parsing a CM request message (CA17-34). However, since ib_lid_be16() and ib_lid_cpu16() validates the slid, not only when link layer is IB, we set the slid to zero to prevent false warnings in the kernel log. Fixes: 62ede7779904 ("Add OPA extended LID support") Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29RDMA: Move enum ib_cq_creation_flags to uapi headersJason Gunthorpe1-2/+2
The flags field the enum is used with comes directly from the uapi so it belongs in the uapi headers for clarity and so userspace can use it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH portsJack Morgenstein1-8/+5
Allocating steerable UD QPs depends on having at least one IB port, while releasing those QPs does not. As a result, when there are only ETH ports, the IB (RoCE) driver requests releasing a qp range whose base qp is zero, with qp count zero. When SR-IOV is enabled, and the VF driver is running on a VM over a hypervisor which treats such qp release calls as errors (rather than NOPs), we see lines in the VM message log like: mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0 Fix this by adding a check for a zero count in mlx4_release_qp_range() (which thus treats releasing 0 qps as a nop), and eliminating the check for device managed flow steering when releasing steerable UD QPs. (Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it remains NULL when steerable UD QPs are not allocated). Cc: <stable@vger.kernel.org> Fixes: 4196670be786 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08Merge branch 'bart-srpt-for-next' into k.o/wip/dl-for-nextDoug Ledford1-1/+1
Merging in 12 patch series from Bart that required changes in the current for-rc branch in order to apply cleanly. Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-02IB/mlx4: Fix mlx4_ib_alloc_mr error flowLeon Romanovsky1-1/+1
ibmr.device is being set only after ib_alloc_mr() is successfully complete. Therefore, in case imlx4_mr_enable() returns with error, the error flow unwinder calls to mlx4_free_priv_pages(), which uses ibmr.device. Such usage causes to NULL dereference oops and to fix it, the IB device should be set in the mr struct earlier stage (e.g. prior to calling mlx4_free_priv_pages()). Fixes: 1b2cd0fc673c ("IB/mlx4: Support the new memory registration API") Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-28IB/mlx4: Add support to RSS hash for inner headersGuy Levi2-0/+20
Support RSS hash for inner headers according to a new flag, MLX4_IB_RX_HASH_INNER provided by the vendor channel. In case the flag is set, RSS hash will be done on the inner headers of VXLAN packets (which are encapsulated). Non-encapsulated packets will be hashed according to the outer headers. Signed-off-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-27Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.gitJason Gunthorpe1-7/+19
Patches for 4.16 that are dependent on patches sent to 4.15-rc. These are small clean ups for the vmw_pvrdma and i40iw drivers. * 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git: RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file i40iw: Change accelerated flag to bool
2017-12-18IB/mlx4: Remove unused ibpd parameterErez Alfasi1-2/+2
Remove unused ibpd parameter from create_qp_rss() function. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-13IB/mlx4: Potential buffer overflow in _mlx4_set_path()Dan Carpenter1-0/+2
Smatch complains about this code: drivers/infiniband/hw/mlx4/qp.c:1827 _mlx4_set_path() error: buffer overflow 'dev->dev->caps.gid_table_len' 3 <= 255 The mlx4_ib_gid_index_to_real_index() does check that "port" is within bounds, but we don't check the return value for errors. It seems simple enough to add a check for that. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>