aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-07-18RDMA/hns: Fix the wrong type of return value of the interrupt handlerHaoyue Xu1-13/+14
The type of return value of the interrupt handler should be irqreturn_t. Link: https://lore.kernel.org/r/20220714134353.16700-3-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Remove unused abnormal interrupt of type RASHaoyue Xu2-11/+0
The HNS NIC driver receives and handles the abnormal interrupt of the RAS type generated by ROCEE, and the HNS RDMA driver does not need to handle this type of interrupt. Therefore, delete unused codes in the HNS RDMA driver. Link: https://lore.kernel.org/r/20220714134353.16700-2-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/qedr: Fix potential memory leak in __qedr_alloc_mr()Jianglei Nie1-3/+5
__qedr_alloc_mr() allocates a memory chunk for "mr->info.pbl_table" with init_mr_info(). When rdma_alloc_tid() and rdma_register_tid() fail, "mr" is released while "mr->info.pbl_table" is not released, which will lead to a memory leak. We should release the "mr->info.pbl_table" with qedr_free_pbl() when error occurs to fix the memory leak. Fixes: e0290cce6ac0 ("qedr: Add support for memory registeration verbs") Link: https://lore.kernel.org/r/20220714061505.2342759-1-niejianglei2021@163.com Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Acked-by: Michal KalderonĀ <michal.kalderon@marvell.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hfi1: Depend on !UMLEhab Ababneh1-1/+1
Both hfi1 and UML depend on x86_64, this can trigger build errors. This driver must depends on !UML because it accesses x86_64 features that are not supported by UML. Link: https://lore.kernel.org/r/165755127879.2996325.5668395672492732376.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Use the bitmap API to allocate bitmapsChristophe JAILLET1-5/+4
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Link: https://lore.kernel.org/r/1f671b1af5881723ee265a0a12809c92950e58aa.1657567269.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/qib: Use the bitmap API to allocate bitmapsChristophe JAILLET1-3/+2
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Link: https://lore.kernel.org/r/f7a8588447679e80a438b6188b0603c1a11ad877.1657300671.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix setting of QP context err_rq_idx_valid fieldMustafa Ismail1-11/+4
Setting err_rq_idx_valid field in QP context when the AE source of the AEQE is not associated with an RQ causes the firmware flush to fail. Set err_rq_idx_valid field in QP context only if it is associated with an RQ. Additionally, cleanup the redundant setting of this field in irdma_process_aeq. Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20220705230815.265-8-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix VLAN connection with wildcard addressMustafa Ismail1-5/+6
When an application listens on a wildcard address, and there are VLAN and non-VLAN IP addresses, iWARP connection establishemnt can fail if the listen node VLAN ID does not match. Fix this by checking the vlan_id only if not a wildcard listen node. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://lore.kernel.org/r/20220705230815.265-7-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix a window for use-after-freeMustafa Ismail1-1/+1
During a destroy CQ an interrupt may cause processing of a CQE after CQ resources are freed by irdma_cq_free_rsrc(). Fix this by moving the call to irdma_cq_free_rsrc() after the irdma_sc_cleanup_ceqes(), which is called under the cq_lock. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20220705230815.265-6-shiraz.saleem@intel.com Signed-off-by: Bartosz Sobczak <bartosz.sobczak@intel.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Make resource distribution algorithm more QP orientedNayan Kumar2-7/+6
Adapt the resource distribution algorithm in irdma_cfg_fpm_val to be more QP oriented. If the configuration is too big for the available memory, trim the MR and PBLE's first before trimming the QPs. This also avoids having to double QPs requested as input to algorithm for GEN1 devices. Link: https://lore.kernel.org/r/20220705230815.265-5-shiraz.saleem@intel.com Signed-off-by: Nayan Kumar <nayan.kumar@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Make CQP invalid state error non-criticalMustafa Ismail1-0/+1
The invalid state error returned by the Control Queue-Pair (CQP) is not a critical error. Add it to the irdma_noncrit_err_list and drop reporting it as device error message. Link: https://lore.kernel.org/r/20220705230815.265-4-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Add AE source to error logMustafa Ismail1-2/+2
To assist with debugging add the Asynchronous Event (AE) source when logging the abnormal AE error log message. Link: https://lore.kernel.org/r/20220705230815.265-3-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Add 2 level PBLE support for FMRMustafa Ismail2-4/+12
Level 2 Physical Buffer List Entry (PBLE) is currently not supported for Fast MRs which limits memory registrations to 256K pages. Adapt irdma_set_page and irdma_alloc_mr to allow for 2 level PBLEs. Link: https://lore.kernel.org/r/20220705230815.265-2-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-17IB/hfi1: switch to netif_napi_add_weight()Jakub Kicinski1-1/+1
Since we'll remove the last argument from netif_napi_add() soon switch this RDMA driver to netif_napi_add_weight() for now to avoid cross-tree patches. Link: https://lore.kernel.org/r/20220705230208.924408-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-17IB/hfi1: switch to netif_napi_add_tx()Jakub Kicinski1-3/+1
Switch to the new API not requiring the NAPI_POLL_WEIGHT argument. Link: https://lore.kernel.org/r/20220705230208.924408-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-17RDMA/qib: Use the bitmap API when applicableChristophe JAILLET1-15/+8
Using the bitmap API is less verbose than hand writing them. It also improves the semantic. While at it, initialize the bitmaps. It can't hurt. Link: https://lore.kernel.org/r/33d8992586d382bec8b8efd83e4729fb7feaf89e.1656834106.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-15tracing/IB/hfi1: Use the new __vstring() helperSteven Rostedt (Google)1-6/+2
Instead of open coding a __dynamic_array() with a fixed length (which defeats the purpose of the dynamic array in the first place). Use the new __vstring() helper that will use a va_list and only write enough of the string into the ring buffer that is needed. Link: https://lkml.kernel.org/r/20220705224749.239494531@goodmis.org Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-11RDMA/irdma: Fix sleep from invalid context BUGMustafa Ismail1-50/+0
Taking the qos_mutex to process RoCEv2 QP's on netdev events causes a kernel splat. Fix this by removing the handling for RoCEv2 in irdma_cm_teardown_connections that uses the mutex. This handling is only needed for iWARP to avoid having connections established while the link is down or having connections remain functional after the IP address is removed. BUG: sleeping function called from invalid context at kernel/locking/mutex. Call Trace: kernel: dump_stack+0x66/0x90 kernel: ___might_sleep.cold.92+0x8d/0x9a kernel: mutex_lock+0x1c/0x40 kernel: irdma_cm_teardown_connections+0x28e/0x4d0 [irdma] kernel: ? check_preempt_curr+0x7a/0x90 kernel: ? select_idle_sibling+0x22/0x3c0 kernel: ? select_task_rq_fair+0x94c/0xc90 kernel: ? irdma_exec_cqp_cmd+0xc27/0x17c0 [irdma] kernel: ? __wake_up_common+0x7a/0x190 kernel: irdma_if_notify+0x3cc/0x450 [irdma] kernel: ? sched_clock_cpu+0xc/0xb0 kernel: irdma_inet6addr_event+0xc6/0x150 [irdma] Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-11RDMA/irdma: Do not advertise 1GB page size for x722Mustafa Ismail4-2/+5
x722 does not support 1GB page size but the irdma driver incorrectly advertises 1GB page size support for x722 device to ib_core to compute the best page size to use on this MR. This could lead to incorrect start offsets computed by hardware on the MR. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-04IB: Fix spelling of 'writable'Zhang Jiaming2-3/+3
There is a typo (writeable) in qib_file_ops.c, qib_sd7220.c's comments, and in rxe_check_bind_mw() Link: https://lore.kernel.org/r/20220701074812.12615-1-jiaming@nfschina.com Link: https://lore.kernel.org/r/20220701080019.13329-1-jiaming@nfschina.com Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-1/+4
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c 9c5de246c1db ("net: sparx5: mdb add/del handle non-sparx5 devices") fbb89d02e33a ("net: sparx5: Allow mdb entries to both CPU and ports") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-24RDMA: Correct duplicated words in commentsJiang Jian1-1/+1
There is a duplicated word 'is' and 'for' in a comment that needs to be dropped. Link: https://lore.kernel.org/r/20220622170853.3644-1-jiangjian@cdjrlc.com Link: https://lore.kernel.org/r/20220623103708.43104-1-jiangjian@cdjrlc.com Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-16Merge branch 'mlx5-next' into wip/leon-for-nextLeon Romanovsky2-18/+36
This merge commit includes shared updates to net-next and rdma-next for upcoming mlx5 features. 1) Updated HW bits and definitions for upcoming features 1.1) vport debug counters 1.2) flow meter 1.3) Execute ASO action for flow entry 1.4) enhanced CQE compression 2) Add ICM header-modify-pattern RDMA API Leon Says ========= SW steering manipulates packet's header using "modifying header" actions. Many of these actions do the same operation, but use different data each time. Currently we create and keep every one of these actions, which use expensive and limited resources. Now we introduce a new mechanism - pattern and argument, which splits a modifying action into two parts: 1. action pattern: contains the operations to be applied on packet's header, mainly set/add/copy of fields in the packet 2. action data/argument: contains the data to be used by each operation in the pattern. This way we reuse same patterns with different arguments to create new modifying actions, and since many actions share the same operations, we end up creating a small number of patterns that we keep in a dedicated cache. These modify header patterns are implemented as new type of ICM memory, so the following kernel patch series add the support for this new ICM type. ========== Link: https://lore.kernel.org/all/20220614184028.51548-1-saeed@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-06-16RDMA/usnic: Use device_iommu_capable()Robin Murphy1-1/+1
Use the new interface to check the capability for our device specifically. Link: https://lore.kernel.org/r/96ffe7050da0aa0ad6bce4705c3532f3ecaf32e3.1654688682.git.robin.murphy@arm.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-06-13RDMA/mlx5: Support handling of modify-header pattern ICM areaYevgeny Kliteynik2-18/+36
Add support for allocate/deallocate and registering MR of the new type of ICM area. Support exists only for devices that support sw_owner_v2. Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-07RDMA/mlx5: Add a umr recovery flowAharon Landau3-11/+83
When a UMR fails, the UMR QP state changes to an error state. Therefore, all the further UMR operations will fail too. Add a recovery flow to the UMR QP, and repost the flushed WQEs. Link: https://lore.kernel.org/r/6cc24816cca049bd8541317f5e41d3ac659445d3.1652588303.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-06-07RDMA/hfi1: Fix typo in commentXiang wangx1-1/+1
Delete the redundant word 'and'. Link: https://lore.kernel.org/r/20220606123419.29109-1-wangxiang@cdjrlc.com Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-06-07RDMA/qedr: Fix reporting QP timeout attributeKamal Heib2-1/+4
Make sure to save the passed QP timeout attribute when the QP gets modified, so when calling query QP the right value is reported and not the converted value that is required by the firmware. This issue was found while running the pyverbs tests. Fixes: cecbcddf6461 ("qedr: Add support for QP verbs") Link: https://lore.kernel.org/r/20220525132029.84813-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Michal KalderonĀ <michal.kalderon@marvell.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds45-1696/+1578
Pull rdma updates from Jason Gunthorpe: "Small collection of incremental improvement patches: - Minor code cleanup patches, comment improvements, etc from static tools - Clean the some of the kernel caps, reducing the historical stealth uAPI leftovers - Bug fixes and minor changes for rdmavt, hns, rxe, irdma - Remove unimplemented cruft from rxe - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer - flush_workqueue(system_unbound_wq) removal - Ensure rxe waits for objects to be unused before allowing the core to free them - Several rc quality bug fixes for hfi1" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits) RDMA/rtrs-clt: Fix one kernel-doc comment RDMA/hfi1: Remove all traces of diagpkt support RDMA/hfi1: Consolidate software versions RDMA/hfi1: Remove pointless driver version RDMA/hfi1: Fix potential integer multiplication overflow errors RDMA/hfi1: Prevent panic when SDMA is disabled RDMA/hfi1: Prevent use of lock before it is initialized RDMA/rxe: Fix an error handling path in rxe_get_mcg() IB/core: Fix typo in comment RDMA/core: Fix typo in comment IB/hf1: Fix typo in comment IB/qib: Fix typo in comment IB/iser: Fix typo in comment RDMA/mlx4: Avoid flush_scheduled_work() usage IB/isert: Avoid flush_scheduled_work() usage RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr() RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq() RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx() RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx() RDMA/irdma: Add SW mechanism to generate completions on error ...
2022-05-24RDMA/hfi1: Remove all traces of diagpkt supportDennis Dalessandro1-23/+0
One of the concessions we made to get our driver upstream was to remove the diagnostic packet support. There is however still some cruft that was left over. Remove it. Link: https://lore.kernel.org/r/20220520183727.48973.93587.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Consolidate software versionsDennis Dalessandro2-18/+1
There is no need to have separate user and kernel software versions. There is a single software that the kernel is compatible with. Also remove the notion of a "kernel type" that is long since deprecated. Link: https://lore.kernel.org/r/20220520183722.48973.60262.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Remove pointless driver versionDennis Dalessandro2-21/+0
Driver versions have long been forbidden in the RDMA subsystem. We removed most of the code relating to them and have been very strict about not allowing. However there is some leftover versioning that we do not need. Get rid of that. Link: https://lore.kernel.org/r/20220520183717.48973.17418.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Fix potential integer multiplication overflow errorsDennis Dalessandro1-1/+1
When multiplying of different types, an overflow is possible even when storing the result in a larger type. This is because the conversion is done after the multiplication. So arithmetic overflow and thus in incorrect value is possible. Correct an instance of this in the inter packet delay calculation. Fix by ensuring one of the operands is u64 which will promote the other to u64 as well ensuring no overflow. Cc: stable@vger.kernel.org Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20220520183712.48973.29855.stgit@awfm-01.cornelisnetworks.com Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Prevent panic when SDMA is disabledDouglas Miller1-0/+2
If the hfi1 module is loaded with HFI1_CAP_SDMA off, a call to hfi1_write_iter() will dereference a NULL pointer and panic. A typical stack frame is: sdma_select_user_engine [hfi1] hfi1_user_sdma_process_request [hfi1] hfi1_write_iter [hfi1] do_iter_readv_writev do_iter_write vfs_writev do_writev do_syscall_64 The fix is to test for SDMA in hfi1_write_iter() and fail the I/O with EINVAL. Link: https://lore.kernel.org/r/20220520183706.48973.79803.stgit@awfm-01.cornelisnetworks.com Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24RDMA/hfi1: Prevent use of lock before it is initializedDouglas Miller1-5/+7
If there is a failure during probe of hfi1 before the sdma_map_lock is initialized, the call to hfi1_free_devdata() will attempt to use a lock that has not been initialized. If the locking correctness validator is on then an INFO message and stack trace resembling the following may be seen: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. Call Trace: register_lock_class+0x11b/0x880 __lock_acquire+0xf3/0x7930 lock_acquire+0xff/0x2d0 _raw_spin_lock_irq+0x46/0x60 sdma_clean+0x42a/0x660 [hfi1] hfi1_free_devdata+0x3a7/0x420 [hfi1] init_one+0x867/0x11a0 [hfi1] pci_device_probe+0x40e/0x8d0 The use of sdma_map_lock in sdma_clean() is for freeing the sdma_map memory, and sdma_map is not allocated/initialized until after sdma_map_lock has been initialized. This code only needs to be run if sdma_map is not NULL, and so checking for that condition will avoid trying to use the lock before it is initialized. Fixes: 473291b3ea0e ("IB/hfi1: Fix for early release of sdma context") Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/20220520183701.48973.72434.stgit@awfm-01.cornelisnetworks.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24Merge tag 'v5.18' into rdma.git for-nextJason Gunthorpe5-38/+31
Following patches have dependencies. Resolve the merge conflict in drivers/net/ethernet/mellanox/mlx5/core/main.c by keeping the new names for the fs functions following linux-next: https://lore.kernel.org/r/20220519113529.226bc3e2@canb.auug.org.au/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24IB/hf1: Fix typo in commentJulia Lawall1-1/+1
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lore.kernel.org/r/20220521111145.81697-83-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-24IB/qib: Fix typo in commentJulia Lawall1-1/+1
Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Link: https://lore.kernel.org/r/20220521111145.81697-56-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-20RDMA/mlx4: Avoid flush_scheduled_work() usageTetsuo Handa3-8/+34
Flushing system-wide workqueues is dangerous and will be forbidden. Replace system_wq with local cm_wq. Link: https://lore.kernel.org/r/22f7183b-cc16-5a34-e879-7605f5efc6e6@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-19RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()Daisuke Matsuda1-1/+0
The pointer imr->umem is assigned twice. Fix this by removing the redundant one. Link: https://lore.kernel.org/r/20220518044914.1903125-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-17RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()Minghao Chi1-1/+0
Calling synchronize_irq() right before free_irq() is quite useless. On one hand the IRQ can easily fire again before free_irq() is entered, on the other hand free_irq() itself calls synchronize_irq() internally (in a race condition free way), before any state associated with the IRQ is freed. Link: https://lore.kernel.org/r/20220513081647.1631141-1-chi.minghao@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Acked-by: Michal KalderonĀ <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-37/+21
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()Wenpeng Liang4-254/+104
To reduce the code size and make the code clearer, replace all roce_get_xxx() with hr_reg_read() to read the data fields. Link: https://lore.kernel.org/r/20220512080012.38728-3-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-12RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()Wenpeng Liang2-270/+157
To reduce the code size and make the code clearer, replace all roce_set_xxx() with hr_reg_xxx() to write the data fields. Link: https://lore.kernel.org/r/20220512080012.38728-2-liangwenpeng@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-11RDMA/irdma: Add SW mechanism to generate completions on errorMustafa Ismail4-37/+210
HW flushes after QP in error state is not reliable. This can lead to application hang waiting on a completion for outstanding WRs. Implement a SW mechanism to generate completions for any outstanding WR's after the QP is modified to error. This is accomplished by starting a delayed worker after the QP is modified to error and the HW flush is performed. The worker will generate completions that will be returned to the application when it polls the CQ. This mechanism only applies to Kernel applications. Link: https://lore.kernel.org/r/20220425181624.1617-1-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09net/mlx5: Lag, expose number of lag portsMark Bloch4-2/+4
Downstream patches will add support for hardware lag with more than 2 ports. Add a way for users to query the number of lag ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-04RDMA/hns: Remove the num_cqc_timer variableYixing Liu4-5/+3
The bt number of cqc_timer of HIP09 increases compared with that of HIP08. Therefore, cqc_timer_bt_num and num_cqc_timer do not match. As a result, the driver may fail to allocate cqc_timer. So the driver needs to uniquely uses cqc_timer_bt_num to represent the bt number of cqc_timer. Fixes: 0e40dc2f70cd ("RDMA/hns: Add timer allocation support for hip08") Link: https://lore.kernel.org/r/20220429093545.58070-1-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04RDMA/hns: Add the detection for CMDQ status in the device initialization processYangyang Li2-0/+27
CMDQ may fail during HNS ROCEE initialization. The following is the log when the execution fails: hns3 0000:bd:00.2: In reset process RoCE client reinit. hns3 0000:bd:00.2: CMDQ move tail from 840 to 839 hns3 0000:bd:00.2 hns_2: failed to set gid, ret = -11! hns3 0000:bd:00.2: CMDQ move tail from 840 to 839 <...> hns3 0000:bd:00.2: CMDQ move tail from 840 to 839 hns3 0000:bd:00.2: CMDQ move tail from 840 to 0 hns3 0000:bd:00.2: [cmd]token 14e mailbox 20 timeout. hns3 0000:bd:00.2 hns_2: set HEM step 0 failed! hns3 0000:bd:00.2 hns_2: set HEM address to HW failed! hns3 0000:bd:00.2 hns_2: failed to alloc mtpt, ret = -16. infiniband hns_2: Couldn't create ib_mad PD infiniband hns_2: Couldn't open port 1 hns3 0000:bd:00.2: Reset done, RoCE client reinit finished. However, even if ib_mad client registration failed, ib_register_device() still returns success to the driver. In the device initialization process, CMDQ execution fails because HW/FW is abnormal. Therefore, if CMDQ fails, the initialization function should set CMDQ to a fatal error state and return a failure to the caller. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/20220429093104.26687-1-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-04RDMA/hns: Remove unnecessary ret variable from hns_roce_dereg_mr()Guo Zhengkui1-2/+1
Fix the following coccicheck warning: drivers/infiniband/hw/hns/hns_roce_mr.c:343:5-8: Unneeded variable: "ret". Return 0 directly instead. Link: https://lore.kernel.org/r/20220426070858.9098-1-guozhengkui@vivo.com Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-02RDMA/irdma: Fix possible crash due to NULL netdev in notifierMustafa Ismail1-12/+9
For some net events in irdma_net_event notifier, the netdev can be NULL which will cause a crash in rdma_vlan_dev_real_dev. Fix this by moving all processing to the NETEVENT_NEIGH_UPDATE case where the netdev is guaranteed to not be NULL. Fixes: 6702bc147448 ("RDMA/irdma: Fix netdev notifications for vlan's") Link: https://lore.kernel.org/r/20220425181703.1634-4-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>