aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx4 (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-27IB/mlx4: Fix leak in id_map_find_delHåkon Bugge1-26/+3
Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted from the cache. When a DREP is sent from mlx4_ib_multiplex_cm_handler(), id_map_find_del() is called. Similar when a REJ is received by the mlx4_ib_demux_cm_handler(), id_map_find_del() is called. This function wipes out the TID in use from the IDR or XArray and removes the id_map_entry from the table. In short, it does everything except the topping of the cake, which is to remove the entry from the list and free it. In other words, for the REJ case enumerated above, one id_map_entry will be leaked. For the other case above, a DREQ has been received first. The reception of the DREQ will trigger queuing of a delayed work to delete the id_map_entry, for the case where the VM doesn't send back a DREP. In the normal case, the VM _will_ send back a DREP, and id_map_find_del() will be called. But this scenario introduces a secondary leak. First, when the DREQ is received, a delayed work is queued. The VM will then return a DREP, which will call id_map_find_del(). As stated above, this will free the TID used from the XArray or IDR. Now, there is window where that particular TID can be re-allocated, lets say by an outgoing REQ. This TID will later be wiped out by the delayed work, when the function id_map_ent_timeout() is called. But the id_map_entry allocated by the outgoing REQ will not be de-allocated, and we have a leak. Both leaks are fixed by removing the id_map_find_del() function and only using schedule_delayed(). Of course, a check in schedule_delayed() to see if the work already has been queued, has been added. Another benefit of always using the delayed version for deleting entries, is that we do get a TimeWait effect; a TID no longer in use, will occupy the XArray or IDR for CM_CLEANUP_CACHE_TIMEOUT time, without any ability of being re-used for that time period. Fixes: 3cf69cc8dbeb ("IB/mlx4: Add CM paravirtualization") Link: https://lore.kernel.org/r/20200123155521.1212288-1-haakon.bugge@oracle.com Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Manjunath Patil <manjunath.b.patil@oracle.com> Reviewed-by: Rama Nichanamatlu <rama.nichanamatlu@oracle.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB/mlx4: Fix memory leak in add_gid error flowJack Morgenstein1-4/+16
In procedure mlx4_ib_add_gid(), if the driver is unable to update the FW gid table, there is a memory leak in the driver's copy of the gid table: the gid entry's context buffer is not freed. If such an error occurs, free the entry's context buffer, and mark the entry as available (by setting its context pointer to NULL). Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks") Link: https://lore.kernel.org/r/20200115085050.73746-1-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB: Allow calls to ib_umem_get from kernel ULPsMoni Shoua5-9/+12
So far the assumption was that ib_umem_get() and ib_umem_odp_get() are called from flows that start in UVERBS and therefore has a user context. This assumption restricts flows that are initiated by ULPs and need the service that ib_umem_get() provides. This patch changes ib_umem_get() and ib_umem_odp_get() to get IB device directly by relying on the fact that both UVERBS and ULPs sets that field correctly. Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-03RDMA/mlx4: use true,false for bool variablezhengbin1-2/+2
Fixes coccicheck warning: drivers/infiniband/hw/mlx4/qp.c:852:2-14: WARNING: Assignment of 0/1 to bool variable drivers/infiniband/hw/mlx4/qp.c:3087:3-10: WARNING: Assignment of 0/1 to bool variable Link: https://lore.kernel.org/r/1577176812-2238-5-git-send-email-zhengbin13@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/mlx4: Redo TX checksum offload in line with docsEugene Crosser1-11/+7
Ingress checksum offload was not working for IPv6 frames because the conditional expression that checks validation status passed from the hardware was not matching the algorithm described in the documentation. This patch defines L4_CSUM flag (which falls inside the badfcs_enc field in the existing definition of the CQE layout) and replaces the conditional expression with the one defined in the "ConnectX(r) Family Programmer's Manual" document. Link: https://lore.kernel.org/r/20191219134847.413582-1-leon@kernel.org Signed-off-by: Eugene Crosser <evgenii.cherkashin@profitbricks.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-12-12IB/mlx4: Follow mirror sequence of device add during device removalParav Pandit1-4/+5
Current code device add sequence is: ib_register_device() ib_mad_init() init_sriov_init() register_netdev_notifier() Therefore, the remove sequence should be, unregister_netdev_notifier() close_sriov() mad_cleanup() ib_unregister_device() However it is not above. Hence, make do above remove sequence. Fixes: fa417f7b520ee ("IB/mlx4: Add support for IBoE") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-11-19IB/mlx4: Update HW GID table while adding vlan GIDDanit Goldberg2-1/+9
When adding a new GID compare the vlan along with the GID and type. This allows vlan's to have GIDs that alias each other, such as the default GID. Otherwise they the GID cache view can become inconsistent with the HW view. Link: https://lore.kernel.org/r/20191115154457.247763-1-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17IB/umem: remove the dmasync argument to ib_umem_getChristoph Hellwig5-7/+6
The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-12RDMA: Change MAD processing function to remove extra casting and parameterLeon Romanovsky2-18/+14
All users of process_mad() converts input pointers from ib_mad_hdr to be ib_mad, update the function declaration to use ib_mad directly. Also remove not used input MAD size parameter. Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/mad: Do not check MAD sizes in roce and ib driversLeon Romanovsky1-4/+0
All callers for process_mad allocate MAD structures with proper sizes, there is no need to recheck it. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/mad: Allocate zeroed MAD bufferLeon Romanovsky1-1/+0
Ensure that MAD output buffer is zero-based allocated in all the callers of process_mad and remove the various memset()'s from the drivers. Link: https://lore.kernel.org/r/20191029062745.7932-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA: Connect between the mmap entry and the umap_priv structureMichal Kalderon1-3/+6
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-20IB/mlx4: Fix memory leaksWenwen Wang1-2/+2
In mlx4_ib_alloc_pv_bufs(), 'tun_qp->tx_ring' is allocated through kcalloc(). However, it is not always deallocated in the following execution if an error occurs, leading to memory leaks. To fix this issue, free 'tun_qp->tx_ring' whenever an error occurs. Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Acked-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/1566159781-4642-1-git-send-email-wenwen@cs.uga.edu Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-12RDMA: Introduce ib_port_phys_state enumKamal Heib1-1/+2
In order to improve readability, add ib_port_phys_state enum to replace the use of magic numbers. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Andrew Boyer <aboyer@tobark.org> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20190807103138.17219-2-kamalheib1@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-07-25RDMA/mlx4: Annotate boolean arguments as bool and not intLeon Romanovsky1-4/+4
Information provided by qp_has_rq() and used latter is boolean, so update callers to proper type. Link: https://lore.kernel.org/r/20190704130936.8705-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-25RDMA/mlx4: Separate creation of RWQ and QPLeon Romanovsky1-82/+154
The mlx4 WQ is implemented with HW QP without special HW object. Current implementation which tried to reuse the code did it with common QP creation flows. Such decision caused to the absence of mlx4_ib_wq struct, which is needed to ensure proper allocation of ib_wq inside of IB/core. Separate create_qp_common() to pure QP flow and to create_rq() for RWQ. Link: https://lore.kernel.org/r/20190704130936.8705-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-24RDMA/mlx4: Untag user pointers in mlx4_get_umem_mrAndrey Konovalov1-3/+4
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. mlx4_get_umem_mr() uses provided user pointers for vma lookups, which can only by done with untagged pointers. Untag user pointers in this function. Link: https://lore.kernel.org/r/7969018013a67ddbbf784ac7afeea5a57b1e2bcb.1563904656.git.andreyknvl@google.com Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-22timekeeping: Use proper clock specifier names in functionsJason A. Donenfeld1-3/+3
This makes boot uniformly boottime and tai uniformly clocktai, to address the remaining oversights. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/20190621203249.3909-2-Jason@zx2c4.com
2019-06-20RDMA: Check umem pointer validity prior to releaseLeon Romanovsky3-17/+11
Update ib_umem_release() to behave similarly to kfree() and allow submitting NULL pointer as safe input to this function. Fixes: a52c8e2469c3 ("RDMA: Clean destroy CQ in drivers do not return errors") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-20RDMA: Convert destroy_wq to be voidLeon Romanovsky2-4/+2
All callers of destroy WQ are always success and there is no need to check their return value, so convert destroy_wq to be void. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11RDMA: Convert CQ allocations to be under core responsibilityLeon Romanovsky3-20/+11
Ensure that CQ is allocated and freed by IB/core and not by drivers. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Tested-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-11RDMA: Clean destroy CQ in drivers do not return errorsLeon Romanovsky2-4/+2
Like all other destroy commands, .destroy_cq() call is not supposed to fail. In all flows, the attempt to return earlier caused to memory leaks. This patch converts .destroy_cq() to do not return any errors. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Gal Pressman <galpress@amazon.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-06-10RDMA: Move owner into struct ib_device_opsJason Gunthorpe1-1/+1
This more closely follows how other subsytems work, with owner being a member of the structure containing the function pointers. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10RDMA: Move uverbs_abi_ver into struct ib_device_opsJason Gunthorpe1-7/+8
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-10RDMA: Move driver_id into struct ib_device_opsJason Gunthorpe1-1/+2
No reason for every driver to emit code to set this, just make it part of the driver's existing static const ops structure. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-21IB/mlx4: Delete unused func argYuval Shaia1-5/+3
The function argument virt_addr is not in use - delete it. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-05-21RDMA/umem: Move page_shift from ib_umem to ib_odp_umemJason Gunthorpe2-5/+5
This value has always been set to PAGE_SHIFT in the core code, the only thing that does differently was the ODP path. Move the value into the ODP struct and still use it for ODP, but change all the non-ODP things to just use PAGE_SHIFT/PAGE_SIZE/PAGE_MASK directly. Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner2-0/+2
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-03RDMA: Introduce and use GID attr helper to read RoCE L2 fieldsParav Pandit2-5/+9
Instead of RoCE drivers figuring out vlan, smac fields while working on QP/AH, provide a helper routine to read the L2 fields such as vlan_id and source mac address. This moves logic from mlx5 driver to core for wider usage for RoCE ports. This is a preparation patch to allow detaching netdev in subsequent patch. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA: Handle SRQ allocations by IB/coreLeon Romanovsky3-35/+20
Convert SRQ allocation from drivers to be in the IB/core Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08RDMA: Handle AH allocations by IB/coreLeon Romanovsky4-81/+63
Simplify drivers by ensuring lifetime of ib_ah object. The changes in .create_ah() go hand in hand with relevant update in .destroy_ah(). We will use this opportunity and convert .destroy_ah() to don't fail, as it was suggested a long time ago, because there is nothing to do in case of failure during destroy. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-08drivers: Remove explicit invocations of mmiowb()Will Deacon1-6/+0
mmiowb() is now implied by spin_unlock() on architectures that require it, so there is no reason to call it from driver code. This patch was generated using coccinelle: @mmiowb@ @@ - mmiowb(); and invoked as: $ for d in drivers include/linux/qed sound; do \ spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done NOTE: mmiowb() has only ever guaranteed ordering in conjunction with spin_unlock(). However, pairing each mmiowb() removal in this patch with the corresponding call to spin_unlock() is not at all trivial, so there is a small chance that this change may regress any drivers incorrectly relying on mmiowb() to order MMIO writes between CPUs using lock-free synchronisation. If you've ended up bisecting to this commit, you can reintroduce the mmiowb() calls using wmb() instead, which should restore the old behaviour on all architectures other than some esoteric ia64 systems. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-01IB: Pass only ib_udata in function prototypesShamir Rabinovitch6-29/+26
Now when ib_udata is passed to all the driver's object create/destroy APIs the ib_udata will carry the ib_ucontext for every user command. There is no need to also pass the ib_ucontext via the functions prototypes. Make ib_udata the only argument psssed. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Remove 'uobject->context' dependency in object destroy APIsShamir Rabinovitch3-16/+35
Now that we have the udata passed to all the ib_xxx object destroy APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-04-01IB: Pass uverbs_attr_bundle down ib_x destroy pathShamir Rabinovitch8-24/+24
The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-26mlx4: Convert pv_id_table to XArrayMatthew Wilcox2-25/+16
Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-03-17IB/mlx4: Fix race condition between catas error reset and aliasguid flowsJack Morgenstein1-1/+1
Code review revealed a race condition which could allow the catas error flow to interrupt the alias guid query post mechanism at random points. Thiis is fixed by doing cancel_delayed_work_sync() instead of cancel_delayed_work() during the alias guid mechanism destroy flow. Fixes: a0c64a17aba8 ("mlx4: Add alias_guid mechanism") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-26net: devlink: turn devlink into a built-inJakub Kicinski1-1/+0
Being able to build devlink as a module causes growing pains. First all drivers had to add a meta dependency to make sure they are not built in when devlink is built as a module. Now we are struggling to invoke ethtool compat code reliably. Make devlink code built-in, users can still not build it at all but the dynamically loadable module option is removed. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22RDMA: Handle ucontext allocations by IB/coreLeon Romanovsky1-19/+11
Following the PD conversion patch, do the same for ucontext allocations. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22IB/mlx4: Increase the timeout for CM cacheHåkon Bugge1-1/+1
Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted form the cache. But life is not perfect, remote peers may die or be rebooted. Hence, it's a timeout to wipe out a cache entry, when the PF driver assumes the remote peer has gone. During workloads where a high number of QPs are destroyed concurrently, excessive amount of CM DREQ retries has been observed The problem can be demonstrated in a bare-metal environment, where two nodes have instantiated 8 VFs each. This using dual ported HCAs, so we have 16 vPorts per physical server. 64 processes are associated with each vPort and creates and destroys one QP for each of the remote 64 processes. That is, 1024 QPs per vPort, all in all 16K QPs. The QPs are created/destroyed using the CM. When tearing down these 16K QPs, excessive CM DREQ retries (and duplicates) are observed. With some cat/paste/awk wizardry on the infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the nodes: cm_rx_duplicates: dreq 2102 cm_rx_msgs: drep 1989 dreq 6195 rep 3968 req 4224 rtu 4224 cm_tx_msgs: drep 4093 dreq 27568 rep 4224 req 3968 rtu 3968 cm_tx_retries: dreq 23469 Note that the active/passive side is equally distributed between the two nodes. Enabling pr_debug in cm.c gives tons of: [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL! By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the tear-down phase of the application is reduced from approximately 90 to 50 seconds. Retries/duplicates are also significantly reduced: cm_rx_duplicates: dreq 2460 [] cm_tx_retries: dreq 3010 req 47 Increasing the timeout further didn't help, as these duplicates and retries stems from a too short CMA timeout, which was 20 (~4 seconds) on the systems. By increasing the CMA timeout to 22 (~17 seconds), the numbers fell down to about 10 for both of them. Adjustment of the CMA timeout is not part of this commit. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIsShamir Rabinovitch3-42/+51
Now when we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-08RDMA: Handle PD allocations by IB/coreLeon Romanovsky1-23/+13
The PD allocations in IB/core allows us to simplify drivers and their error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand in had with relevant update in .dealloc_pd(). We will use this opportunity and convert .dealloc_pd() to don't fail, as it was suggested a long time ago, failures are not happening as we have never seen a WARN_ON print. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-30RDMA: Provide safe ib_alloc_device() functionLeon Romanovsky1-1/+1
All callers to ib_alloc_device() provide a larger size than struct ib_device and rely on the fact that struct ib_device is embedded in their driver specific structure as the first member. Provide a safer variant of ib_alloc_device() that checks and enforces this approach to make sure the drivers are using it right. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-21IB/mlx4: Fix using wrong function to destroy sqp AHs under SRIOVJack Morgenstein1-3/+3
The commit cited below replaced rdma_create_ah with mlx4_ib_create_slave_ah when creating AHs for the paravirtualized special QPs. However, this change also required replacing rdma_destroy_ah with mlx4_ib_destroy_ah in the affected flows. The commit missed 3 places where rdma_destroy_ah should have been replaced with mlx4_ib_destroy_ah. As a result, the pd usecount was decremented when the ah was destroyed -- although the usecount was NOT incremented when the ah was created. This caused the pd usecount to become negative, and resulted in the WARN_ON stack trace below when the mlx4_ib.ko module was unloaded: WARNING: CPU: 3 PID: 25303 at drivers/infiniband/core/verbs.c:329 ib_dealloc_pd+0x6d/0x80 [ib_core] Modules linked in: rdma_ucm rdma_cm iw_cm ib_cm ib_umad mlx4_ib(-) ib_uverbs ib_core mlx4_en mlx4_core nfsv3 nfs fscache configfs xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc dm_mirror dm_region_hash dm_log dm_mod dax rndis_wlan rndis_host coretemp kvm_intel cdc_ether kvm usbnet iTCO_wdt iTCO_vendor_support cfg80211 irqbypass lpc_ich ipmi_si i2c_i801 mii pcspkr i2c_core mfd_core ipmi_devintf i7core_edac ipmi_msghandler ioatdma pcc_cpufreq dca acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi mptsas scsi_transport_sas mptscsih crc32c_intel ata_piix bnx2 mptbase ipv6 crc_ccitt autofs4 [last unloaded: mlx4_core] CPU: 3 PID: 25303 Comm: modprobe Tainted: G W I 5.0.0-rc1-net-mlx4+ #1 Hardware name: IBM -[7148ZV6]-/Node 1, System Card, BIOS -[MLE170CUS-1.70]- 09/23/2011 RIP: 0010:ib_dealloc_pd+0x6d/0x80 [ib_core] Code: 00 00 85 c0 75 02 5b c3 80 3d aa 87 03 00 00 75 f5 48 c7 c7 88 d7 8f a0 31 c0 c6 05 98 87 03 00 01 e8 07 4c 79 e0 0f 0b 5b c3 <0f> 0b eb be 0f 0b eb ab 90 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 RSP: 0018:ffffc90005347e30 EFLAGS: 00010282 RAX: 00000000ffffffea RBX: ffff8888589e9540 RCX: 0000000000000006 RDX: 0000000000000006 RSI: ffff88885d57ad40 RDI: 0000000000000000 RBP: ffff88885b029c00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000004 R12: ffff8887f06c0000 R13: ffff8887f06c13e8 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fd6743c6740(0000) GS:ffff88887fcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000ed1038 CR3: 00000007e3156000 CR4: 00000000000006e0 Call Trace: mlx4_ib_close_sriov+0x125/0x180 [mlx4_ib] mlx4_ib_remove+0x57/0x1f0 [mlx4_ib] mlx4_remove_device+0x92/0xa0 [mlx4_core] mlx4_unregister_interface+0x39/0x90 [mlx4_core] mlx4_ib_cleanup+0xc/0xd7 [mlx4_ib] __x64_sys_delete_module+0x17d/0x290 ? trace_hardirqs_off_thunk+0x1a/0x1c ? do_syscall_64+0x12/0x180 do_syscall_64+0x4a/0x180 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 5e62d5ff1b9a ("IB/mlx4: Create slave AH's directly") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA: Introduce and use rdma_device_to_ibdev()Parav Pandit1-3/+4
Introduce and use rdma_device_to_ibdev() API for those drivers which are registering one sysfs group and also use in ib_core. In subsequent patch, device->provider_ibdev one-to-one mapping is no longer holds true during accessing sysfs entries. Therefore, introduce an API rdma_device_to_ibdev() that provides such information. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-14RDMA: Rename port_callback to init_portParav Pandit1-1/+1
Most provider routines are callback routines which ib core invokes. _callback suffix doesn't convey information about when such callback is invoked. Therefore, rename port_callback to init_port. Additionally, store the init_port function pointer in ib_device_ops, so that it can be accessed in subsequent patches when binding rdma device to net namespace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10RDMA: Clear CQ objects during their allocationLeon Romanovsky1-1/+1
As part of an audit process to update drivers to use rdma_restrack_add() ensure that CQ objects is cleared before access. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-10IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udataJason Gunthorpe6-26/+30
ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
2019-01-10IB/{core,uverbs}: Move ib_umem_xxx functions from ib_core to ib_uverbsShamir Rabinovitch1-1/+0
The next patch will add dependency from ib_umem_get in to ib_uverbs so move the required ib_umem_xxx functionality to it's correct module - ib_uverbs - and avoid circular dependecy from the form of ib_core -> ib_uverbs -> ib_core in depmod. Since this now requires all drivers to be build modular if uverbs is modular, hoist the test a couple drivers had into the main kconfig and apply it to all drivers uniformly. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-20IB/mlx4: Remove set but not used variable 'pd'YueHaibing1-3/+0
Fixes gcc '-Wunused-but-set-variable' warning: drivers/infiniband/hw/mlx4/qp.c: In function '_mlx4_ib_destroy_qp': drivers/infiniband/hw/mlx4/qp.c:1612:22: warning: variable 'pd' set but not used [-Wunused-but-set-variable] Fixes: e00b64f7c54c ("RDMA: Cleanup undesired pd->uobject usage") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>