aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx4/alias_GUID.c (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-22RDMA/core: Fix a WARN() messageDan Carpenter1-3/+1
The first parameter of WARN_ONCE() is a condition, then following parameters are the message. In this case, we left out the condition so it will just print the ops->type string. Fixes: 3856ec4b93c9 ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22bnxt_re: fix the regression due to changes in alloc_pblDevesh Sharma3-22/+14
While adding the use of for_each_sg_dma_page iterator for Brodcom's rdma driver, there was a regression added in the __alloc_pbl path. The change left bnxt_re in DOA state in for-next branch. Fixing the regression to avoid the host crash when a user space object is created. Restricting the unconditional access to hwq.pg_arr when hwq is initialized for user space objects. Fixes: 161ebe2498d4 ("RDMA/bnxt_re: Use for_each_sg_dma_page iterator on umem SGL") Reported-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-22IB/mlx4: Increase the timeout for CM cacheHåkon Bugge1-1/+1
Using CX-3 virtual functions, either from a bare-metal machine or pass-through from a VM, MAD packets are proxied through the PF driver. Since the VF drivers have separate name spaces for MAD Transaction Ids (TIDs), the PF driver has to re-map the TIDs and keep the book keeping in a cache. Following the RDMA Connection Manager (CM) protocol, it is clear when an entry has to evicted form the cache. But life is not perfect, remote peers may die or be rebooted. Hence, it's a timeout to wipe out a cache entry, when the PF driver assumes the remote peer has gone. During workloads where a high number of QPs are destroyed concurrently, excessive amount of CM DREQ retries has been observed The problem can be demonstrated in a bare-metal environment, where two nodes have instantiated 8 VFs each. This using dual ported HCAs, so we have 16 vPorts per physical server. 64 processes are associated with each vPort and creates and destroys one QP for each of the remote 64 processes. That is, 1024 QPs per vPort, all in all 16K QPs. The QPs are created/destroyed using the CM. When tearing down these 16K QPs, excessive CM DREQ retries (and duplicates) are observed. With some cat/paste/awk wizardry on the infiniband_cm sysfs, we observe as sum of the 16 vPorts on one of the nodes: cm_rx_duplicates: dreq 2102 cm_rx_msgs: drep 1989 dreq 6195 rep 3968 req 4224 rtu 4224 cm_tx_msgs: drep 4093 dreq 27568 rep 4224 req 3968 rtu 3968 cm_tx_retries: dreq 23469 Note that the active/passive side is equally distributed between the two nodes. Enabling pr_debug in cm.c gives tons of: [171778.814239] <mlx4_ib> mlx4_ib_multiplex_cm_handler: id{slave: 1,sl_cm_id: 0xd393089f} is NULL! By increasing the CM_CLEANUP_CACHE_TIMEOUT from 5 to 30 seconds, the tear-down phase of the application is reduced from approximately 90 to 50 seconds. Retries/duplicates are also significantly reduced: cm_rx_duplicates: dreq 2460 [] cm_tx_retries: dreq 3010 req 47 Increasing the timeout further didn't help, as these duplicates and retries stems from a too short CMA timeout, which was 20 (~4 seconds) on the systems. By increasing the CMA timeout to 22 (~17 seconds), the numbers fell down to about 10 for both of them. Adjustment of the CMA timeout is not part of this commit. Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21IB/core: Abort page fault handler silently during owning process exitMoni Shoua1-1/+1
It is possible that during a page fault handling, the process that owns the MR is terminating. The indication for it is failure to get the task_struct or take reference on the mm_struct. In this case just abort the page-fault handler with error but without a warning to the kernel log. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21IB/mlx5: Validate correct PD before prefetch MRMoni Shoua1-16/+30
When prefetching odp mr it is required to verify that pd of the mr is identical to the pd for which the advise_mr request arrived with. This check was missing from synchronous flow and is added now. Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support") Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21IB/mlx5: Protect against prefetch of invalid MRMoni Shoua4-23/+102
When deferring a prefetch request we need to protect against MR or PD being destroyed while the request is still enqueued. The first step is to validate that PD owns the lkey that describes the MR and that the MR that the lkey refers to is owned by that PD. The second step is to dequeue all requests when MR is destroyed. Since PD can't be destroyed while it owns MRs it is guaranteed that when a worker wakes up the request it refers to is still valid. Now, it is possible to refrain from taking a reference on the device since it is assured to be present as pd. While that, replace the dedicated ordered workqueue with the system unbound workqueue to reuse an existing resource and improve performance. This will also fix a bug of queueing to the wrong workqueue. Fixes: 813e90b1aeaa ("IB/mlx5: Add advise_mr() support") Reported-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21RDMA/uverbs: Store PR pointer before it is overwrittenLeon Romanovsky1-1/+1
The IB_MR_REREG_PD command rewrites mr->pd after successful rereg_user_mr(), such change causes to lost usecnt information and produces the following warning: WARNING: CPU: 1 PID: 1771 at drivers/infiniband/core/verbs.c:336 ib_dealloc_pd+0x4e/0x60 [ib_core] CPU: 1 PID: 1771 Comm: rereg_mr Tainted: G W OE 5.0.0-rc7-for-upstream-perf-2019-02-20_14-03-40-34 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:ib_dealloc_pd+0x4e/0x60 [ib_core] RSP: 0018:ffffc90003923dc0 EFLAGS: 00010286 RAX: 00000000ffffffff RBX: ffff88821f7f0400 RCX: ffff888236a40c00 RDX: ffff88821f7f0400 RSI: 0000000000000001 RDI: 0000000000000000 RBP: 0000000000000001 R08: ffff88835f665d80 R09: ffff8882209c90d8 R10: ffff88835ec003e0 R11: 0000000000000000 R12: ffff888221680ba0 R13: ffff888221680b00 R14: 00000000ffffffea R15: ffff88821f53c318 FS: 00007f70db11e740(0000) GS:ffff88835f640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000001dfd030 CR3: 000000029d9d8000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: uverbs_free_pd+0x2d/0x30 [ib_uverbs] destroy_hw_idr_uobject+0x16/0x40 [ib_uverbs] uverbs_destroy_uobject+0x28/0x170 [ib_uverbs] __uverbs_cleanup_ufile+0x6b/0x90 [ib_uverbs] uverbs_destroy_ufile_hw+0x8b/0x110 [ib_uverbs] ib_uverbs_close+0x1f/0x80 [ib_uverbs] __fput+0xb1/0x220 task_work_run+0x7f/0xa0 exit_to_usermode_loop+0x6b/0xb2 do_syscall_64+0xc5/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f70dad00664 Fixes: e278173fd19e ("RDMA/core: Cosmetic change - move member initialization to correct block") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-21IB/hfi1: Add missing break in switch statementGustavo A. R. Silva1-0/+1
Fix the following warning by adding a missing break: drivers/infiniband/hw/hfi1/tid_rdma.c: In function ‘hfi1_tid_rdma_wqe_interlock’: drivers/infiniband/hw/hfi1/tid_rdma.c:3251:3: warning: this statement may fall through [-Wimplicit-fallthrough=] switch (prev->wr.opcode) { ^~~~~~ drivers/infiniband/hw/hfi1/tid_rdma.c:3259:2: note: here case IB_WR_RDMA_READ: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Fixes: c6c231175ccd ("IB/hfi1: Add interlock between TID RDMA WRITE and other requests") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Kaike Wan <Kaike.wan@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-20drivers/IB,qib: Fix pinned/locked limit check in qib_get_user_pages()Davidlohr Bueso1-1/+1
The current check does not take into account the previous value of pinned_vm; thus it is quite bogus as is. Fix this by checking the new value after the (optimistic) atomic inc. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/core: Verify that memory window type is legalNoa Osherovich1-0/+5
Before calling the provider's alloc_mw function, verify that the given memory type is either IB_MW_TYPE_1 or IB_MW_TYPE_2. Signed-off-by: Noa Osherovich <noaos@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/iwcm: Fix string truncation errorLeon Romanovsky1-3/+3
The strlen() check at the beginning of iw_cm_map() ensures that devname and ifname strings are less than destinations to which they are supposed to be copied. Change strncpy() call to be strcpy(), because we are protected from overflow. Zero the entire string buffer to avoid copying uninitialized kernel stack memory to userspace. This fixes the compilation warning below: In file included from ./include/linux/dma-mapping.h:6, from drivers/infiniband/core/iwcm.c:38: In function _strncpy_, inlined from _iw_cm_map_ at drivers/infiniband/core/iwcm.c:519:2: ./include/linux/string.h:253:9: warning: ___builtin_strncpy_ specified bound 32 equals destination size [-Wstringop-truncation] return __builtin_strncpy(p, q, size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: d53ec8af56d5 ("RDMA/iwcm: Don't copy past the end of dev_name() string") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/core: Cosmetic change - move member initialization to correct blockYuval Shaia1-8/+7
old_pd is used only if IB_MR_REREG_PD flags is set. For readability move it's initialization to where it is used. While there rewrite the whole 'if-else' block so on error jump directly to label and no need for 'else' Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19iw_cxgb4: Make function read_tcb() staticWei Yongjun1-1/+1
Fixes the following sparse warning: drivers/infiniband/hw/cxgb4/cm.c:658:6: warning: symbol 'read_tcb' was not declared. Should it be static? Fixes: 11a27e2121a5 ("iw_cxgb4: complete the cached SRQ buffers") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/hns: Bugfix for set hem of SCCYangyang Li1-0/+3
The method of set hem for scc context is different from other contexts. It should notify the hardware with the detailed idx in bt0 for scc, while for other contexts, it only need to notify the bt step and the hardware will calculate the idx. Here fixes the following error when unloading the hip08 driver: [ 123.570768] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 123.579023] {1}[Hardware Error]: event severity: recoverable [ 123.584670] {1}[Hardware Error]: Error 0, type: recoverable [ 123.590317] {1}[Hardware Error]: section_type: PCIe error [ 123.595877] {1}[Hardware Error]: version: 4.0 [ 123.600395] {1}[Hardware Error]: command: 0x0006, status: 0x0010 [ 123.606562] {1}[Hardware Error]: device_id: 0000:7d:00.0 [ 123.612034] {1}[Hardware Error]: slot: 0 [ 123.616120] {1}[Hardware Error]: secondary_bus: 0x00 [ 123.621245] {1}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa222 [ 123.627847] {1}[Hardware Error]: class_code: 000002 [ 123.632977] hns3 0000:7d:00.0: aer_status: 0x00000000, aer_mask: 0x00000000 [ 123.639928] hns3 0000:7d:00.0: aer_layer=Transaction Layer, aer_agent=Receiver ID [ 123.647400] hns3 0000:7d:00.0: aer_uncor_severity: 0x00000000 [ 123.653136] hns3 0000:7d:00.0: PCI error detected, state(=1)!! [ 123.658959] hns3 0000:7d:00.0: ROCEE uncorrected RAS error identified [ 123.665395] hns3 0000:7d:00.0: ROCEE RAS AXI rresp error [ 123.670713] hns3 0000:7d:00.0: requesting reset due to PCI error [ 123.676715] hns3 0000:7d:00.0: received reset event , reset type is 5 [ 123.683147] hns3 0000:7d:00.0: AER: Device recovery successful [ 123.688978] hns3 0000:7d:00.0: PF Reset requested [ 123.693684] hns3 0000:7d:00.0: PF failed(=-5) to send mailbox message to VF [ 123.700633] hns3 0000:7d:00.0: inform reset to vf(1) failded -5! Fixes: 6a157f7d1b14 ("RDMA/hns: Add SCC context allocation support for hip08") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Reviewed-by: Yixian Liu <liuyixian@huawei.com> Reviewed-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/hns: Modify qp&cq&pd specification according to UMLijun Ou1-5/+5
Accroding to hip08's limitation, qp&cq specification is 1M, mtpt specification 1M in kernel space. Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19lib/irq_poll: Support schedules in non-interrupt contextsSteve Wise1-1/+1
Do not assume irq_poll_sched() is called from an interrupt context only. So use raise_softirq_irqoff() instead of __raise_softirq_irqoff() so it will kick the ksoftirqd if the schedule is from a non-interrupt context. This is required for RDMA drivers, like soft iwarp, that generate cq completion notifications in a workqueue or kthread context. Without this change, siw completion notifications to the ULP can take several hundred usecs, depending on the system load. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19rdma_rxe: Use netlink messages to add/delete linksSteve Wise8-13/+46
Add support for the RDMA_NLDEV_CMD_NEWLINK/DELLINK messages which allow dynamically adding new RXE links. Deprecate the old module options for now. Cc: Moni Shoua <monis@mellanox.com> Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK supportSteve Wise4-2/+144
Add support for new LINK messages to allow adding and deleting rdma interfaces. This will be used initially for soft rdma drivers which instantiate device instances dynamically by the admin specifying a netdev device to use. The rdma_rxe module will be the first user of these messages. The design is modeled after RTNL_NEWLINK/DELLINK: rdma drivers register with the rdma core if they provide link add/delete functions. Each driver registers with a unique "type" string, that is used to dispatch messages coming from user space. A new RDMA_NLDEV_ATTR is defined for the "type" string. User mode will pass 3 attributes in a NEWLINK message: RDMA_NLDEV_ATTR_DEV_NAME for the desired rdma device name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device interface to use for this link. The DELLINK message will contain the RDMA_NLDEV_ATTR_DEV_INDEX of the device to delete. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19IB/usnic: Fix deadlockParvi Kaustubhi3-28/+19
There is a dead lock in usnic ib_register and netdev_notify path. usnic_ib_discover_pf() | mutex_lock(&usnic_ib_ibdev_list_lock); | usnic_ib_device_add(); | ib_register_device() | usnic_ib_query_port() | mutex_lock(&us_ibdev->usdev_lock); | ib_get_eth_speed() | rtnl_lock() order of lock: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock rtnl_lock() | usnic_ib_netdevice_event() | mutex_lock(&usnic_ib_ibdev_list_lock); order of lock: rtnl_lock -> &usnic_ib_ibdev_list_lock Solution is to use the core's lock-free ib_device_get_by_netdev() scheme to lookup ib_dev while handling netdev & inet events. Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com> Reviewed-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Reviewed-by: Tanmay Inamdar <tinamdar@cisco.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Close a race after ib_register_deviceJason Gunthorpe6-12/+35
Since rxe allows unregistration from other threads the rxe pointer can become invalid any moment after ib_register_driver returns. This could cause a user triggered use after free. Add another driver callback to be called right after the device becomes registered to complete any device setup required post-registration. This callback has enough core locking to prevent the device from becoming unregistered. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Add ib_device_get_by_name() and use it in rxeJason Gunthorpe6-33/+32
rxe has an open coded version of this that is not as safe as the core version. This lets us eliminate the internal device list entirely from rxe. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Use driver_unregister and new unregistration APIJason Gunthorpe8-87/+30
rxe does not have correct locking for its registration/unregistration paths, use the core code to handle it instead. In this mode ib_unregister_device will also do the dealloc, so rxe is required to do clean up from a callback. The core code ensures that unregistration is done only once, and generally takes care of locking and concurrency problems for rxe. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/device: Provide APIs from the core code to help unregistrationJason Gunthorpe2-36/+217
These APIs are intended to support drivers that exist outside the usual driver core probe()/remove() callbacks. Normally the driver core will prevent remove() from running concurrently with probe(), once this safety is lost drivers need more support to get the locking and lifetimes right. ib_unregister_driver() is intended to be used during module_exit of a driver using these APIs. It unregisters all the associated ib_devices. ib_unregister_device_and_put() is to be used by a driver-specific removal function (ie removal by name, removal from a netdev notifier, removal from netlink) ib_unregister_queued() is to be used from netdev notifier chains where RTNL is held. The locking is tricky here since once things become async it is possible to race unregister with registration. This is largely solved by relying on the registration refcount, unregistration will only ever work on something that has a positive registration refcount - and then an unregistration mutex serializes all competing unregistrations of the same device. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/rxe: Use ib_device_get_by_netdev() instead of open codingJason Gunthorpe4-47/+38
The core API handles the locking correctly and is faster if there are multiple devices. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/device: Add ib_device_get_by_netdev()Jason Gunthorpe2-13/+116
Several drivers need to find the ib_device from a given netdev. rxe needs this at speed in an unsleepable context, so choose to implement the translation using a RCU safe hash table. The hash table can have a many to one mapping. This is intended to support some future case where multiple IB drivers (ie iWarp and RoCE) connect to the same netdevs. driver_ids will need to be different to support this. In the process this makes the struct ib_device and ib_port_data RCU safe by deferring their kfrees. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdevJason Gunthorpe6-38/+171
The associated netdev should not actually be very dynamic, so for most drivers there is no reason for a callback like this. Provide an API to inform the core code about the net dev affiliation and use a core maintained data structure instead. This allows the core code to be more aware of the ndev relationship which will allow some new APIs based around this. This also uses locking that makes some kind of sense, many drivers had a confusing RCU lock, or missing locking which isn't right. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/cache: Move the cache per-port data into the main ib_port_dataJason Gunthorpe2-59/+33
Like the other cases there no real reason to have another array just for the cache. This larger conversion gets its own patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/device: Consolidate ib_device per_port data into one placeJason Gunthorpe4-94/+78
There is no reason to have three allocations of per-port data. Combine them together and make the lifetime for all the per-port data match the struct ib_device. Following patches will require more port-specific data, now there is a good place to put it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA: Add and use rdma_for_each_portJason Gunthorpe12-46/+53
We have many loops iterating over all of the end port numbers on a struct ib_device, simplify them with a for_each helper. Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/nldev: Don't expose number of not-visible entriesLeon Romanovsky1-5/+2
Netlink dumpit handshake exchanges the index from which kernel should start to return its value, in current code, this index included not-visible in this PID items too and indirectly revealed the number of entries. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/nldev: Connect QP number to .doit callbackLeon Romanovsky2-1/+11
This patch adds ability to query specific QP based on its LQPN (local QPN), which is assigned by HW and needs special treatment while inserting into restrack DB. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/nldev: Provide parent IDs for PD, MR and QP objectsLeon Romanovsky2-0/+19
PD, MR and QP objects have parents objects: contexts and PDs. The exposed parent IDs allow to correlate various objects and simplify debug investigation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/nldev: Share with user-space object IDsLeon Romanovsky2-0/+29
Give to the user space tools unique identifier for PD, MR, CQ and CM_ID objects, so they can be able to query on them with .doit callbacks. QP .doit is not supported yet, till all drivers will be updated to provide their LQPN to be equal to their restrack ID. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19RDMA/restrack: Prepare restrack_root to addition of extra fields per-typeLeon Romanovsky3-78/+43
As a preparation to extension of rdma_restrack_root to provide software IDs, which will be per-type too. We convert the rdma_restrack_root from struct with arrays to array of structs. Such conversion allows us to drop rwsem lock in favour of internal XArray lock. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-19net/mlx5: Factor out HCA capabilities functionsLeon Romanovsky1-15/+31
Combine all HCA capabilities setters under one function and compile out the ODP related function in case kernel was compiled without ODP support. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-18RDMA/restrack: Hide restrack DB from IB/coreLeon Romanovsky6-66/+114
There is no need to expose internals of restrack DB to IB/core. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-18RDMA/restrack: Reduce scope of synchronization lock while updating DBLeon Romanovsky2-3/+2
XArray uses internal lock for updates to XArray. This means that our external RW lock is needed to ensure that entry is not deleted while we are performing iteration over list. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-18RDMA/nldev: Add resource tracker doit callbackLeon Romanovsky1-61/+129
Implement doit callbacks and ensure that users won't provide port values on resource entry allocated in per-device mode needed for .doit callback. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-18RDMA/restrack: Translate from ID to restrack objectLeon Romanovsky2-0/+28
Add new general helper to get restrack entry given by ID and their respective type. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-18RDMA/restrack: Convert internal DB from hash to XArrayLeon Romanovsky3-45/+93
The additions of .doit callbacks posses new access pattern to the resource entries by some user visible index. Back then, the legacy DB was implemented as hash because per-index access wasn't needed and XArray wasn't accepted yet. Acceptance of XArray together with per-index access requires the refresh of DB implementation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15RDMA/core: Move device addition deletion to device.cParav Pandit3-13/+14
Move core device addition and removal from sysfs.c to device.c as device.c is more appropriate place for device management. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15RDMA/core: Introduce and use ib_setup_port_attrs()Parav Pandit1-29/+35
Refactor code for device and port sysfs attributes for reuse. While at it, rename counter part free function to ib_free_port_attrs. Also attribute setup sequence is: (a) port specific init. (b) device stats alloc/init. So for cleanup, follow reverse sequence: (a) device stats dealloc (b) port specific cleanup Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15RDMA/core: Use simpler device_del() instead of device_unregister()Parav Pandit2-5/+3
Instead of holding extra reference using get_device() that device_unregister() releases, simplify it as below. device_add() balances with device_del(). device_initialize() balances with put_device(), always via ib_dealloc_device(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15RDMA/cxgb4: Remove kref accounting for sync operationLeon Romanovsky3-29/+3
Ucontext allocation and release aren't async events and don't need kref accounting. The common layer of RDMA subsystem ensures that dealloc ucontext will be called after all other objects are released. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15RDMA/nes: Remove useless usecnt variable and redundant memsetLeon Romanovsky2-9/+1
The internal design of RDMA/core ensures that there dealloc ucontext will be called only if alloc_ucontext succeeded, hence there is no need to manage internal variable to mark validity of ucontext. As part of this change, remove redundant memeset too. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/hfi1: Fix a build warning for TID RDMA READKaike Wan1-0/+1
The following build warning was produced for the TID RDMA READ patch ("IB/hfi1: Enable TID RDMA READ protocol"): drivers/infiniband/hw/hfi1/qp.c: In function 'hfi1_setup_wqe': drivers/infiniband/hw/hfi1/qp.c:328:3: warning: this statement may fall through [-Wimplicit-fallthrough=] hfi1_setup_tid_rdma_wqe(qp, wqe); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/hfi1/qp.c:329:2: note: here case IB_QPT_UC: ^~~~ This patch will fix the issue by adding the "fall through" comment. Fixes: f1ab4efa6d32 ("IB/hfi1: Enable TID RDMA READ protocol") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15RDMA/uverbs: Fix an error flow in ib_uverbs_poll_cqJason Gunthorpe1-2/+1
The new output_written block was wrongly placed before the ret=0, causing the error code to be lost. uverbs_output_written is not expected to fail, and even if it does fail it has no significant impact on the userspace flow. Reported-by: Bart Van Assche <bvanassche@acm.org> Fixes: d6f4a21f309d ("RDMA/uverbs: Mark ioctl responses with UVERBS_ATTR_F_VALID_OUTPUT") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2019-02-15IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIsShamir Rabinovitch23-155/+194
Now when we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/verbs: Add helper function rdma_udata_to_drv_contextShamir Rabinovitch1-0/+17
Helper function to get driver's context out of ib_udata wrapped in uverbs_attr_bundle for user objects or NULL for kernel objects. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-02-15IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flowsShamir Rabinovitch8-35/+58
Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows as soon as the flow has ib_uobject. In addition, remove rdma_get_ucontext helper function that is only used by ib_umem_get. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>