aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/sw/rxe (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-21Merge tag 'rds-odp-for-5.5' into rdma.git for-nextJason Gunthorpe1-1/+1
From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== * tag 'rds-odp-for-5.5': net/rds: Use prefetch for On-Demand-Paging MR net/rds: Handle ODP mr registration/unregistration net/rds: Detect need of On-Demand-Paging memory registration RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs RDMA/mlx5: Don't fake udata for kernel path IB/mlx5: Add ODP WQE handlers for kernel QPs IB/core: Add interface to advise_mr for kernel users IB/core: Introduce ib_reg_user_mr IB: Allow calls to ib_umem_get from kernel ULPs Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB: Allow calls to ib_umem_get from kernel ULPsMoni Shoua1-1/+1
So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-15RDMA/rxe: Compute the maximum sges and inline size based on the WQE sizeRao Shoaib1-10/+8
The SGE buffer size and max_inline data should be derived from the size of the WQE. Each value individually sets the WQE size, so compute the actual sizes based on the actual WQE size and configure the QP with the maximums. Also fix the missing return of the actual maximum capability to the caller. Link: https://lore.kernel.org/r/1578962480-17814-3-git-send-email-rao.shoaib@oracle.com Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-15Introduce maximum WQE size to check limitsRao Shoaib1-1/+6
Introduce maximum WQE size to impose limits on max SGE's and inline data Link: https://lore.kernel.org/r/1578962480-17814-2-git-send-email-rao.shoaib@oracle.com Signed-off-by: Rao Shoaib <rao.shoaib@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/rxe: Fix error type of mmap_offsetJiewei Ke1-1/+1
The type of mmap_offset should be u64 instead of int to match the type of mminfo.offset. If otherwise, after we create several thousands of CQs, it will run into overflow issues. Link: https://lore.kernel.org/r/20191227113613.5020-1-kejiewei.cn@gmail.com Signed-off-by: Jiewei Ke <kejiewei.cn@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-12-09rxe: correctly calculate iCRC for unaligned payloadsSteve Wise3-1/+14
If RoCE PDUs being sent or received contain pad bytes, then the iCRC is miscalculated, resulting in PDUs being emitted by RXE with an incorrect iCRC, as well as ingress PDUs being dropped due to erroneously detecting a bad iCRC in the PDU. The fix is to include the pad bytes, if any, in iCRC computations. Note: This bug has caused broken on-the-wire compatibility with actual hardware RoCE devices since the soft-RoCE driver was first put into the mainstream kernel. Fixing it will create an incompatibility with the original soft-RoCE devices, but is necessary to be compatible with real hardware devices. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Steve Wise <larrystevenwise@gmail.com> Link: https://lore.kernel.org/r/20191203020319.15036-2-larrystevenwise@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-04net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca1-3/+5
ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-17IB/umem: remove the dmasync argument to ib_umem_getChristoph Hellwig1-1/+1
The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/rxe: Increase DMA max_segment_size parameterBart Van Assche2-0/+4
Increase the DMA max_segment_size parameter from 64 KB to 2 GB. Link: https://lore.kernel.org/r/20191025225830.257535-3-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24RDMA/rxe: Remove useless rxe_init_device_param assignmentsLeon Romanovsky2-26/+0
IB devices are allocated with kzalloc and don't need explicit zero assignments for their parameters. It can be removed safely. Link: https://lore.kernel.org/r/20191020055724.7410-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/rxe: Verify modify_device maskKamal Heib1-0/+4
Verify that the passed mask to rxe_modify_device() is supported. Link: https://lore.kernel.org/r/20190923104158.5331-4-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-12RDMA: Introduce ib_port_phys_state enumKamal Heib3-8/+4
In order to improve readability, add ib_port_phys_state enum to replace the use of magic numbers. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Andrew Boyer <aboyer@tobark.org> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-08RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMMKonstantin Taranov2-1/+5
Calculate the correct byte_len on the receiving side when a work completion is generated with IB_WC_RECV_RDMA_WITH_IMM opcode. According to the IBA byte_len must indicate the number of written bytes, whereas it was always equal to zero for the IB_WC_RECV_RDMA_WITH_IMM opcode, even though data was transferred. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Konstantin Taranov <konstantin.taranov@inf.ethz.ch> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-07ibverbs/rxe: Remove variable self-initializationMaksym Planeta1-1/+1
In some cases (not in this particular one) variable self-initialization can lead to undefined behavior. In this case, it is just obscure code. Signed-off-by: Maksym Planeta <mplaneta@os.inf.tu-dresden.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe1-0/+1
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-20RDMA: Check umem pointer validity prior to releaseLeon Romanovsky1-2/+1
Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11RDMA: Convert CQ allocations to be under core responsibilityLeon Romanovsky3-21/+12
Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11RDMA: Clean destroy CQ in drivers do not return errorsLeon Romanovsky1-2/+1
Like all other destroy commands, .destroy_cq() call is not supposed to fail. In all flows, the attempt to return earlier caused to memory leaks. This patch converts .destroy_cq() to do not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-10RDMA: Move owner into struct ib_device_opsJason Gunthorpe1-1/+1
This more closely follows how other subsytems work, with owner being a member of the structure containing the function pointers. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10RDMA: Move uverbs_abi_ver into struct ib_device_opsJason Gunthorpe1-1/+1
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10RDMA: Move driver_id into struct ib_device_opsJason Gunthorpe1-1/+2
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner1-0/+1
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds13-146/+119
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle than normal. One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. Summary: - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits) RDMA/ipoib: Allow user space differentiate between valid dev_port IB/core, ipoib: Do not overreact to SM LID change event RDMA/device: Don't fire uevent before device is fully initialized lib/scatterlist: Remove leftover from sg_page_iter comment RDMA/efa: Add driver to Kconfig/Makefile RDMA/efa: Add the efa module RDMA/efa: Add EFA verbs implementation RDMA/efa: Add common command handlers RDMA/efa: Implement functions that submit and complete admin commands RDMA/efa: Add the ABI definitions RDMA/efa: Add the com service API definitions RDMA/efa: Add the efa_com.h file RDMA/efa: Add the efa.h header file RDMA/efa: Add EFA device definitions RDMA: Add EFA related definitions RDMA/umem: Remove hugetlb flag RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/umem: Add API to find best driver supported page size in an MR ...
2019-05-03RDMA/rxe: Use rdma_read_gid_attr_ndev_rcu to access netdevParav Pandit1-3/+12
Use rdma_read_gid_attr_ndev_rcu() to access netdevice attached to GID entry under rcu lock. This ensures that while working on the netdevice of the GID, it doesn't get freed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-03RDMA/rxe: Consider skb reserve space based on netdev of GIDParav Pandit1-1/+2
Always consider the skb reserve space based on netdevice of the GID attribute, regardless of vlan or non vlan netdevice. Fixes: 43c9fc509fa5 ("rdma_rxe: make rxe work over 802.1q VLAN devices") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-25crypto: shash - remove shash_desc::flagsEric Biggers1-1/+0
The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algorithm ever sleeps, making this flag a no-op. With this being the case, inevitably some users who can't sleep wrongly pass MAY_SLEEP. These would all need to be fixed if any shash algorithm actually started sleeping. For example, the shash_ahash_*() functions, which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP from the ahash API to the shash API. However, the shash functions are called under kmap_atomic(), so actually they're assumed to never sleep. Even if it turns out that some users do need preemption points while hashing large buffers, we could easily provide a helper function crypto_shash_update_large() which divides the data into smaller chunks and calls crypto_shash_update() and cond_resched() for each chunk. It's not necessary to have a flag in 'struct shash_desc', nor is it necessary to make individual shash algorithms aware of this at all. Therefore, remove shash_desc::flags, and document that the crypto_shash_*() functions can be called from any context. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-08RDMA: Handle SRQ allocations by IB/coreLeon Romanovsky3-21/+14
Convert SRQ allocation from drivers to be in the IB/core Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA: Handle AH allocations by IB/coreLeon Romanovsky3-20/+14
Simplify drivers by ensuring lifetime of ib_ah object. The changes in .create_ah() go hand in hand with relevant update in .destroy_ah(). We will use this opportunity and convert .destroy_ah() to don't fail, as it was suggested a long time ago, because there is nothing to do in case of failure during destroy. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass only ib_udata in function prototypesShamir Rabinovitch8-71/+51
Now when ib_udata is passed to all the driver's object create/destroy APIs the ib_udata will carry the ib_ucontext for every user command. There is no need to also pass the ib_ucontext via the functions prototypes. Make ib_udata the only argument psssed. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass uverbs_attr_bundle down ib_x destroy pathShamir Rabinovitch1-9/+8
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-28RDMA/rxe: Use correct sizing on buffers holding page DMA addressesShiraz Saleem1-1/+1
The buffer that holds the page DMA addresses is sized off umem->nmap. This can potentially cause out of bound accesses on the PBL array when iterating the umem DMA-mapped SGL. This is because if umem pages are combined, umem->nmap can be much lower than the number of system pages in umem. Use ib_umem_num_pages() to size this buffer. Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-27IB/rxe: Replace av->network_type with skb->protocolZhu Yanjun1-16/+12
In the function rxe_init_packet, based on av->network_type, skb->protocol is set to ipv4 or ipv6. The functions rxe_prepare and rxe_send are called after the functin rxe_init_packet. So in these functions, av->network_type can be replaced with skb->protocol. The functions are in the xmit fast path. So with skb->protocol, the performance will be better. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26RDMA/rxe: Fix slab-out-bounds access which lead to kernel crash laterLeon Romanovsky1-5/+6
BUG: KASAN: slab-out-of-bounds in rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] Read of size 8 at addr ffff88805c01a608 by task ib_send_bw/573 CPU: 24 PID: 573 Comm: ib_send_bw Not tainted 5.0.0-rc5+ #189 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call Trace: rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs] ib_uverbs_ioctl+0x202/0x310 [ib_uverbs] do_vfs_ioctl+0x193/0x1440 ksys_ioctl+0x3a/0x70 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x13f/0x570 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 573: __kasan_kmalloc.constprop.5+0xc1/0xd0 __kmalloc+0x161/0x310 rxe_mem_alloc+0x52/0x470 [rdma_rxe] rxe_mem_init_user+0x113/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs] ib_uverbs_ioctl+0x202/0x310 [ib_uverbs] do_vfs_ioctl+0x193/0x1440 ksys_ioctl+0x3a/0x70 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x13f/0x570 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 0: __kasan_slab_free+0x12e/0x180 kfree+0x10a/0x2c0 rcu_process_callbacks+0xa77/0x1260 __do_softirq+0x2ad/0xacb Test scenario: ib_send_bw -x 1 -d rxe0 -a & ib_send_bw -x 1 -d rxe0 -a localhost Fixes: 8700e3e7c485 ("Soft RoCE driver") Reported-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Tested-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-25RDMA: Use __packed annotation instead of __attribute__ ((packed))Erez Alfasi1-1/+1
"__attribute__" set of macros has been standardized, have became more potentially portable and consistent code back in v2.6.21 by commit 82ddcb040 ("[PATCH] extend the set of "__attribute__" shortcut macros"). Moreover, nowadays checkpatch.pl warns about using __attribute__((packed)) instead of __packed. This patch converts all the "__attribute__ ((packed))" annotations to "__packed" within the RDMA subsystem. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22RDMA: Handle ucontext allocations by IB/coreLeon Romanovsky3-9/+8
Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19rdma_rxe: Use netlink messages to add/delete linksSteve Wise8-13/+46
Add support for the RDMA_NLDEV_CMD_NEWLINK/DELLINK messages which allow dynamically adding new RXE links. Deprecate the old module options for now. Cc: Moni Shoua <monis@mellanox.com> Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Close a race after ib_register_deviceJason Gunthorpe4-12/+21
Since rxe allows unregistration from other threads the rxe pointer can become invalid any moment after ib_register_driver returns. This could cause a user triggered use after free. Add another driver callback to be called right after the device becomes registered to complete any device setup required post-registration. This callback has enough core locking to prevent the device from becoming unregistered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Add ib_device_get_by_name() and use it in rxeJason Gunthorpe5-33/+4
rxe has an open coded version of this that is not as safe as the core version. This lets us eliminate the internal device list entirely from rxe. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Use driver_unregister and new unregistration APIJason Gunthorpe8-87/+30
rxe does not have correct locking for its registration/unregistration paths, use the core code to handle it instead. In this mode ib_unregister_device will also do the dealloc, so rxe is required to do clean up from a callback. The core code ensures that unregistration is done only once, and generally takes care of locking and concurrency problems for rxe. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Use ib_device_get_by_netdev() instead of open codingJason Gunthorpe4-47/+38
The core API handles the locking correctly and is faster if there are multiple devices. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIsShamir Rabinovitch2-5/+9
Now when we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-11RDMA/rxe: Use for_each_sg_page iterator on umem SGLShiraz, Saleem1-7/+6
The driver walks the umem SGL assuming a 1:1 mapping between SGE and system page. Update to use the for_each_sg_page iterator to get individual pages contained in the SGEs. This is a pre-requisite before adding page combining into SGEs while building the scatter table in IB core. Additionally, purge umem->page_shift usage in the driver as its only relevant for ODP MRs. Use system page size and shift instead. Signed-off-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08RDMA: Handle PD allocations by IB/coreLeon Romanovsky4-21/+61
The PD allocations in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in had with relevant update in .dealloc_pd(). We will use this opportunity and convert .dealloc_pd() to don't fail, as it was suggested a long time ago, failures are not happening as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04RDMA/rxe: Improve loopback markingKamal Heib2-6/+4
Currently a packet is marked for loopback only if the source and destination addresses equals. This is not enough when multiple gids are present in rxe device's gid table and the traffic is from one gid to another. Fix it by marking the packet for loopback if the destination MAC address is equal to the source MAC address. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Tested-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-04RDMA/rxe: Move rxe_init_av() to rxe_av.cKamal Heib4-14/+11
Move the function rxe_init_av() to rxe_av.c file and use it instead of calling rxe_av_from_attr() and rxe_av_fill_ip_info(), also remove the unused rxe_dev parameter from rxe_init_av(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30RDMA: Provide safe ib_alloc_device() functionLeon Romanovsky1-1/+1
All callers to ib_alloc_device() provide a larger size than struct ib_device and rely on the fact that struct ib_device is embedded in their driver specific structure as the first member. Provide a safer variant of ib_alloc_device() that checks and enforces this approach to make sure the drivers are using it right. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21IB/rxe: Remove unnecessary rxe variableZhu Yanjun4-16/+11
The variable rxe in the function is not used. So it is removed. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA: Introduce and use rdma_device_to_ibdev()Parav Pandit1-2/+2
Introduce and use rdma_device_to_ibdev() API for those drivers which are registering one sysfs group and also use in ib_core. In subsequent patch, device->provider_ibdev one-to-one mapping is no longer holds true during accessing sysfs entries. Therefore, introduce an API rdma_device_to_ibdev() that provides such information. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA: Rename port_callback to init_portParav Pandit1-1/+1
Most provider routines are callback routines which ib core invokes. _callback suffix doesn't convey information about when such callback is invoked. Therefore, rename port_callback to init_port. Additionally, store the init_port function pointer in ib_device_ops, so that it can be accessed in subsequent patches when binding rdma device to net namespace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udataJason Gunthorpe1-1/+1
ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>