aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-10RDMA/core: Remove err in iw_query_portGuoqing Jiang1-6/+1
Since we can return device->ops.query_port directly, so no need to keep those lines. Link: https://lore.kernel.org/r/20200109134043.15568-1-guoqing.jiang@cloud.ionos.com Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10RDMA/core: Fix locking in ib_uverbs_event_readJason Gunthorpe1-18/+14
This should not be using ib_dev to test for disassociation, during disassociation is_closed is set under lock and the waitq is triggered. Instead check is_closed and be sure to re-obtain the lock to test the value after the wait_event returns. Fixes: 036b10635739 ("IB/uverbs: Enable device removal when there are active user space applications") Link: https://lore.kernel.org/r/1578504126-9400-12-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-09IB/core: Fix build failure without hugepagesArnd Bergmann1-2/+3
HPAGE_SHIFT is only defined on architectures that support hugepages: drivers/infiniband/core/umem_odp.c: In function 'ib_umem_odp_get': drivers/infiniband/core/umem_odp.c:245:26: error: 'HPAGE_SHIFT' undeclared (first use in this function); did you mean 'PAGE_SHIFT'? Enclose this in an #ifdef. Fixes: 9ff1b6466a29 ("IB/core: Fix ODP with IB_ACCESS_HUGETLB handling") Link: https://lore.kernel.org/r/20200109084740.2872079-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07IB/core: Rename event_handler_lock to qp_open_list_lockParav Pandit2-7/+7
This lock is used to protect the qp->open_list linked list. As a side effect it seems to also globally serialize the qp event_handler, but it isn't clear if that is a deliberate design. Link: https://lore.kernel.org/r/20191212113024.336702-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07IB/core: Cut down single member ib_cache structureParav Pandit1-15/+15
Given that ib_cache structure has only single member now, merge the cache lock directly in the ib_device. Link: https://lore.kernel.org/r/20191212113024.336702-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07IB/core: Let IB core distribute cache update eventsParav Pandit3-69/+86
Currently when the low level driver notifies Pkey, GID, and port change events they are notified to the registered handlers in the order they are registered. IB core and other ULPs such as IPoIB are interested in GID, LID, Pkey change events. Since all GID queries done by ULPs are serviced by IB core, and the IB core deferes cache updates to a work queue, it is possible for other clients to see stale cache data when they handle their own events. For example, the below call tree shows how ipoib will call rdma_query_gid() concurrently with the update to the cache sitting in the WQ. mlx5_ib_handle_event() ib_dispatch_event() ib_cache_event() queue_work() -> slow cache update [..] ipoib_event() queue_work() [..] work handler ipoib_ib_dev_flush_light() __ipoib_ib_dev_flush() ipoib_dev_addr_changed_valid() rdma_query_gid() <- Returns old GID, cache not updated. Move all the event dispatch to a work queue so that the cache update is always done before any clients are notified. Fixes: f35faa4ba956 ("IB/core: Simplify ib_query_gid to always refer to cache") Link: https://lore.kernel.org/r/20191212113024.336702-3-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07RDMA/core: Add trace points to follow MR allocationChuck Lever1-11/+28
Track the lifetime of ib_mr objects. Here's sample output from a test run with NFS/RDMA: <...>-361 [009] 79238.772782: mr_alloc: pd.id=3 mr.id=11 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772812: mr_alloc: pd.id=3 mr.id=12 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772839: mr_alloc: pd.id=3 mr.id=13 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772866: mr_alloc: pd.id=3 mr.id=14 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772893: mr_alloc: pd.id=3 mr.id=15 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772921: mr_alloc: pd.id=3 mr.id=16 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772947: mr_alloc: pd.id=3 mr.id=17 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.772974: mr_alloc: pd.id=3 mr.id=18 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.773001: mr_alloc: pd.id=3 mr.id=19 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.773028: mr_alloc: pd.id=3 mr.id=20 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79238.773055: mr_alloc: pd.id=3 mr.id=21 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.270942: mr_alloc: pd.id=3 mr.id=22 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.270975: mr_alloc: pd.id=3 mr.id=23 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271007: mr_alloc: pd.id=3 mr.id=24 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271036: mr_alloc: pd.id=3 mr.id=25 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271067: mr_alloc: pd.id=3 mr.id=26 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271095: mr_alloc: pd.id=3 mr.id=27 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271121: mr_alloc: pd.id=3 mr.id=28 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271153: mr_alloc: pd.id=3 mr.id=29 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271181: mr_alloc: pd.id=3 mr.id=30 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271208: mr_alloc: pd.id=3 mr.id=31 type=MEM_REG max_num_sg=30 rc=0 <...>-361 [009] 79240.271236: mr_alloc: pd.id=3 mr.id=32 type=MEM_REG max_num_sg=30 rc=0 <...>-4351 [001] 79242.299400: mr_dereg: mr.id=32 <...>-4351 [001] 79242.299467: mr_dereg: mr.id=31 <...>-4351 [001] 79242.299554: mr_dereg: mr.id=30 <...>-4351 [001] 79242.299615: mr_dereg: mr.id=29 <...>-4351 [001] 79242.299684: mr_dereg: mr.id=28 <...>-4351 [001] 79242.299748: mr_dereg: mr.id=27 <...>-4351 [001] 79242.299812: mr_dereg: mr.id=26 <...>-4351 [001] 79242.299874: mr_dereg: mr.id=25 <...>-4351 [001] 79242.299944: mr_dereg: mr.id=24 <...>-4351 [001] 79242.300009: mr_dereg: mr.id=23 <...>-4351 [001] 79242.300190: mr_dereg: mr.id=22 <...>-4351 [001] 79242.300263: mr_dereg: mr.id=21 <...>-4351 [001] 79242.300326: mr_dereg: mr.id=20 <...>-4351 [001] 79242.300388: mr_dereg: mr.id=19 <...>-4351 [001] 79242.300450: mr_dereg: mr.id=18 <...>-4351 [001] 79242.300516: mr_dereg: mr.id=17 <...>-4351 [001] 79242.300629: mr_dereg: mr.id=16 <...>-4351 [001] 79242.300718: mr_dereg: mr.id=15 <...>-4351 [001] 79242.300784: mr_dereg: mr.id=14 <...>-4351 [001] 79242.300879: mr_dereg: mr.id=13 <...>-4351 [001] 79242.300945: mr_dereg: mr.id=12 <...>-4351 [001] 79242.301012: mr_dereg: mr.id=11 Some features of the output: - The lifetime and owner PD of each MR is clearly visible. - The type of MR is captured, as is the SGE array size. - Failing MR allocation can be recorded. Link: https://lore.kernel.org/r/20191218201820.30584.34636.stgit@manet.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07RDMA/core: Trace points for diagnosing completion queue issuesChuck Lever4-4/+44
Sample trace events: kworker/u29:0-300 [007] 120.042217: cq_alloc: cq.id=4 nr_cqe=161 comp_vector=2 poll_ctx=WORKQUEUE <idle>-0 [002] 120.056292: cq_schedule: cq.id=4 kworker/2:1H-482 [002] 120.056402: cq_process: cq.id=4 wake-up took 109 [us] from interrupt kworker/2:1H-482 [002] 120.056407: cq_poll: cq.id=4 requested 16, returned 1 <idle>-0 [002] 120.067503: cq_schedule: cq.id=4 kworker/2:1H-482 [002] 120.067537: cq_process: cq.id=4 wake-up took 34 [us] from interrupt kworker/2:1H-482 [002] 120.067541: cq_poll: cq.id=4 requested 16, returned 1 <idle>-0 [002] 120.067657: cq_schedule: cq.id=4 kworker/2:1H-482 [002] 120.067672: cq_process: cq.id=4 wake-up took 15 [us] from interrupt kworker/2:1H-482 [002] 120.067674: cq_poll: cq.id=4 requested 16, returned 1 ... systemd-1 [002] 122.392653: cq_schedule: cq.id=4 kworker/2:1H-482 [002] 122.392688: cq_process: cq.id=4 wake-up took 35 [us] from interrupt kworker/2:1H-482 [002] 122.392693: cq_poll: cq.id=4 requested 16, returned 16 kworker/2:1H-482 [002] 122.392836: cq_poll: cq.id=4 requested 16, returned 16 kworker/2:1H-482 [002] 122.392970: cq_poll: cq.id=4 requested 16, returned 16 kworker/2:1H-482 [002] 122.393083: cq_poll: cq.id=4 requested 16, returned 16 kworker/2:1H-482 [002] 122.393195: cq_poll: cq.id=4 requested 16, returned 3 Several features to note in this output: - The WCE count and context type are reported at allocation time - The CPU and kworker for each CQ is evident - The CQ's restracker ID is tagged on each trace event - CQ poll scheduling latency is measured - Details about how often single completions occur versus multiple completions are evident - The cost of the ULP's completion handler is recorded Link: https://lore.kernel.org/r/20191218201815.30584.3481.stgit@manet.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-07RDMA/cma: Add trace points in RDMA Connection ManagerChuck Lever4-23/+475
Record state transitions as each connection is established. The IP address of both peers and the Type of Service is reported. These trace points are not in performance hot paths. Also, record each cm_event_handler call to ULPs. This eliminates the need for each ULP to add its own similar trace point in its CM event handler function. These new trace points appear in a new trace subsystem called "rdma_cma". Sample events: <...>-220 [004] 121.430733: cm_id_create: cm.id=0 <...>-472 [003] 121.430991: cm_event_handler: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ADDR_RESOLVED (0/0) <...>-472 [003] 121.430995: cm_event_done: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0 <...>-472 [003] 121.431172: cm_event_handler: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ROUTE_RESOLVED (2/0) <...>-472 [003] 121.431174: cm_event_done: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0 <...>-220 [004] 121.433480: cm_qp_create: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 pd.id=2 qp_type=RC send_wr=4091 recv_wr=256 qp_num=521 rc=0 <...>-220 [004] 121.433577: cm_send_req: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 qp_num=521 kworker/1:2-973 [001] 121.436190: cm_send_mra: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 kworker/1:2-973 [001] 121.436340: cm_send_rtu: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 kworker/1:2-973 [001] 121.436359: cm_event_handler: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 ESTABLISHED (9/0) kworker/1:2-973 [001] 121.436365: cm_event_done: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0 <...>-1975 [005] 123.161954: cm_disconnect: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 <...>-1975 [005] 123.161974: cm_sent_dreq: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 <...>-220 [004] 123.162102: cm_disconnect: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 kworker/0:1-13 [000] 123.162391: cm_event_handler: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 DISCONNECTED (10/0) kworker/0:1-13 [000] 123.162393: cm_event_done: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 result=0 <...>-220 [004] 123.164456: cm_qp_destroy: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 qp_num=521 <...>-220 [004] 123.165290: cm_id_destroy: cm.id=0 src=192.168.2.51:35090 dst=192.168.2.55:20049 tos=0 Some features to note: - restracker ID of the rdma_cm_id is tagged on each trace event - The source and destination IP addresses and TOS are reported - CM event upcalls are shown with decoded event and status - CM state transitions are reported - rdma_cm_id lifetime events are captured - The latency of ULP CM event handlers is reported - Lifetime events of associated QPs are reported - Device removal and insertion is reported This patch is based on previous work by: Saeed Mahameed <saeedm@mellanox.com> Mukesh Kacker <mukesh.kacker@oracle.com> Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Aron Silverton <aron.silverton@oracle.com> Avinash Repaka <avinash.repaka@oracle.com> Somasundaram Krishnasamy <somasundaram.krishnasamy@oracle.com> Link: https://lore.kernel.org/r/20191218201810.30584.3052.stgit@manet.1015granger.net Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/cm: Delete unused CM ARP functionsLeon Romanovsky1-66/+0
Clean the code by deleting ARP functions, which are not called anyway. Link: https://lore.kernel.org/r/20191212093830.316934-46-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/cm: Delete unused CM LAP functionsLeon Romanovsky2-168/+0
Clean the code by deleting LAP functions, which are not called anyway. Link: https://lore.kernel.org/r/20191212093830.316934-43-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03IB/core: Fix ODP with IB_ACCESS_HUGETLB handlingYishai Hadas1-17/+4
As VMAs for a given range might not be available as part of the registration phase in ODP. ib_init_umem_odp() considered the expected page shift value that was previously set and initializes its internals accordingly. If memory isn't backed by physical contiguous pages aligned to a hugepage boundary an error will be set as part of the page fault flow and come back to the user as some failed RDMA operation. Fixes: 0008b84ea9af ("IB/umem: Add support to huge ODP") Link: https://lore.kernel.org/r/20191222124649.52300-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03IB/core: Fix ODP get user pages flowYishai Hadas1-1/+1
The nr_pages argument of get_user_pages_remote() should always be in terms of the system page size, not the MR page size. Use PAGE_SIZE instead of umem_odp->page_shift. Fixes: 403cd12e2cf7 ("IB/umem: Add contiguous ODP support") Link: https://lore.kernel.org/r/20191222124649.52300-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/cm: Use RCU synchronization mechanism to protect cm_id_private xa_load()Danit Goldberg1-30/+13
The RCU mechanism is optimized for read-mostly scenarios and therefore more suitable to protect the cm_id_private to decrease "cm.lock" congestion. This patch replaces the existing spinlock locking mechanism and kfree with RCU mechanism in places where spinlock(cm.lock) protected xa_load returning the cm_id_priv In addition, delete the cm_get_id() function as there is no longer a distinction if the caller already holds the cm_lock. Remove an open coded version of cm_get_id(). Link: https://lore.kernel.org/r/20191219134750.413429-1-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03RDMA/netlink: Do not always generate an ACK for some netlink operationsHåkon Bugge2-3/+3
In rdma_nl_rcv_skb(), the local variable err is assigned the return value of the supplied callback function, which could be one of ib_nl_handle_resolve_resp(), ib_nl_handle_set_timeout(), or ib_nl_handle_ip_res_resp(). These three functions all return skb->len on success. rdma_nl_rcv_skb() is merely a copy of netlink_rcv_skb(). The callback functions used by the latter have the convention: "Returns 0 on success or a negative error code". In particular, the statement (equal for both functions): if (nlh->nlmsg_flags & NLM_F_ACK || err) implies that rdma_nl_rcv_skb() always will ack a message, independent of the NLM_F_ACK being set in nlmsg_flags or not. The fix could be to change the above statement, but it is better to keep the two *_rcv_skb() functions equal in this respect and instead change the three callback functions in the rdma subsystem to the correct convention. Fixes: 2ca546b92a02 ("IB/sa: Route SA pathrecord query through netlink") Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload") Link: https://lore.kernel.org/r/20191216120436.3204814-1-haakon.bugge@oracle.com Suggested-by: Mark Haywood <mark.haywood@oracle.com> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com> Tested-by: Mark Haywood <mark.haywood@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-12-12IB/core: Introduce rdma_user_mmap_entry_insert_range() APIYishai Hadas1-9/+39
Introduce rdma_user_mmap_entry_insert_range() API to be used once the required key for the given entry should be in a given range. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212100237.330654-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12RDMA/counter: Prevent auto-binding a QP which are not tracked with resMark Zhang1-0/+3
Some QPs (e.g. XRC QP) are not tracked in kernel, in this case they have an invalid res and should not be bound to any dynamically-allocated counter in auto mode. This fixes below call trace: BUG: kernel NULL pointer dereference, address: 0000000000000390 PGD 80000001a7233067 P4D 80000001a7233067 PUD 1a7215067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 2 PID: 24822 Comm: ibv_xsrq_pingpo Not tainted 5.4.0-rc5+ #21 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 RIP: 0010:rdma_counter_bind_qp_auto+0x142/0x270 [ib_core] Code: e1 48 85 c0 48 89 c2 0f 84 bc 00 00 00 49 8b 06 48 39 42 48 75 d6 40 3a aa 90 00 00 00 75 cd 49 8b 86 00 01 00 00 48 8b 4a 28 <8b> 80 90 03 00 00 39 81 90 03 00 00 75 b4 85 c0 74 b0 48 8b 04 24 RSP: 0018:ffffc900003f39c0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 RDX: ffff88820020ec00 RSI: 0000000000000004 RDI: ffffffffffffffc0 RBP: 0000000000000001 R08: ffff888224149ff0 R09: ffffc900003f3968 R10: ffffffffffffffff R11: ffff8882249c5848 R12: ffffffffffffffff R13: ffff88821d5aca50 R14: ffff8881f7690800 R15: ffff8881ff890000 FS: 00007fe53a3e1740(0000) GS:ffff888237b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000390 CR3: 00000001a7292006 CR4: 00000000003606a0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: _ib_modify_qp+0x3a4/0x3f0 [ib_core] ? lookup_get_idr_uobject.part.8+0x23/0x40 [ib_uverbs] modify_qp+0x322/0x3e0 [ib_uverbs] ib_uverbs_modify_qp+0x43/0x70 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs] ib_uverbs_run_method+0x6be/0x760 [ib_uverbs] ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs] ib_uverbs_cmd_verbs+0x18d/0x3a0 [ib_uverbs] ? get_acl+0x1a/0x120 ? __alloc_pages_nodemask+0x15d/0x2c0 ib_uverbs_ioctl+0xa7/0x110 [ib_uverbs] do_vfs_ioctl+0xa5/0x610 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x48/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 99fa331dc862 ("RDMA/counter: Add "auto" configuration mode support") Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Ido Kalir <idok@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-2-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-09RDMA/cma: add missed unregister_pernet_subsys in init failureChuhong Yuan1-0/+1
The driver forgets to call unregister_pernet_subsys() in the error path of cma_init(). Add the missed call to fix it. Fixes: 4be74b42a6d0 ("IB/cma: Separate port allocation to network namespaces") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Link: https://lore.kernel.org/r/20191206012426.12744-1-hslester96@gmail.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-04net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookupSabrina Dubroca1-4/+3
ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: 5f81bd2e5d80 ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-01Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playgroundLinus Torvalds1-2/+2
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-30Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-263/+41
Pull hmm updates from Jason Gunthorpe: "This is another round of bug fixing and cleanup. This time the focus is on the driver pattern to use mmu notifiers to monitor a VA range. This code is lifted out of many drivers and hmm_mirror directly into the mmu_notifier core and written using the best ideas from all the driver implementations. This removes many bugs from the drivers and has a very pleasing diffstat. More drivers can still be converted, but that is for another cycle. - A shared branch with RDMA reworking the RDMA ODP implementation - New mmu_interval_notifier API. This is focused on the use case of monitoring a VA and simplifies the process for drivers - A common seq-count locking scheme built into the mmu_interval_notifier API usable by drivers that call get_user_pages() or hmm_range_fault() with the VA range - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev drivers to the new API. This deletes a lot of wonky driver code. - Two improvements for hmm_range_fault(), from testing done by Ralph" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap mm/hmm: make full use of walk_page_range() xen/gntdev: use mmu_interval_notifier_insert mm/hmm: remove hmm_mirror and related drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror drm/amdgpu: Call find_vma under mmap_sem nouveau: use mmu_interval_notifier instead of hmm_mirror nouveau: use mmu_notifier directly for invalidate_range_start drm/radeon: use mmu_interval_notifier_insert RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv RDMA/odp: Use mmu_interval_notifier_insert() mm/hmm: define the pre-processor related parts of hmm.h even if disabled mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror mm/mmu_notifier: add an interval tree notifier mm/mmu_notifier: define the header pre-processor parts even if disabled mm/hmm: allow snapshot of the special zero page
2019-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds24-369/+671
Pull rdma updates from Jason Gunthorpe: "Again another fairly quiet cycle with few notable core code changes and the usual variety of driver bug fixes and small improvements. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits) RDMA/hns: Delete unnecessary callback functions for cq RDMA/hns: Rename the functions used inside creating cq RDMA/hns: Redefine the member of hns_roce_cq struct RDMA/hns: Redefine interfaces used in creating cq RDMA/efa: Expose RDMA read related attributes RDMA/efa: Support remote read access in MR registration RDMA/efa: Store network attributes in device attributes IB/hfi1: remove redundant assignment to variable ret RDMA/bnxt_re: Fix missing le16_to_cpu RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series RDMA/bnxt_re: Fix Kconfig indentation IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset RDMA/cm: Use refcount_t type for refcount variable IB/mlx5: Support extended number of strides for Striding RQ IB/mlx4: Update HW GID table while adding vlan GID ...
2019-11-25Merge branch 'ib-guids' into rdma.git for-nextJason Gunthorpe2-0/+11
Danit Goldberg says: ==================== This series extends RTNETLINK to provide IB port and node GUIDs, which were configured for Infiniband VFs. The functionality to set VF GUIDs already existed for a long time, and here we are adding the missing "get" so that netlink will be symmetric and various cloud orchestration tools will be able to manage such VFs more naturally. The iproute2 was extended too to present those GUIDs. - ip link show <device> For example: - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33 - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10 - ip link show ib4 ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'ib-guids': (35 commits) IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs net/mlx5: Add new chain for netfilter flow table offload net/mlx5: Refactor creating fast path prio chains net/mlx5: Accumulate levels for chains prio namespaces net/mlx5: Define fdb tc levels per prio net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines net/mlx5: Simplify fdb chain and prio eswitch defines IB/mlx5: Load profile according to RoCE enablement state IB/mlx5: Rename profile and init methods net/mlx5: Handle "enable_roce" devlink param net/mlx5: Document flow_steering_mode devlink param devlink: Add new "enable_roce" generic device param net/mlx5: fix spelling mistake "metdata" -> "metadata" net/mlx5: fix kvfree of uninitialized pointer spec IB/mlx5: Introduce and use mlx5_core_is_vf() net/mlx5: E-switch, Enable metadata on own vport net/mlx5: Refactor ingress acl configuration ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23RDMA/odp: Use mmu_interval_notifier_insert()Jason Gunthorpe2-263/+41
Replace the internal interval tree based mmu notifier with the new common mmu_interval_notifier_insert() API. This removes a lot of code and fixes a deadlock that can be triggered in ODP: zap_page_range() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) unmap_single_vma() [..] __split_huge_page_pmd() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) // DEADLOCK mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) The umem_rwsem is held across the range_start/end as the ODP algorithm for invalidate_range_end cannot tolerate changes to the interval tree. However, due to the nested invalidation regions the second down_read() can deadlock if there are competing writers. The new core code provides an alternative scheme to solve this problem. Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count") Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca Tested-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-22IB/core: Add interfaces to get VF node and port GUIDsDanit Goldberg2-0/+11
Provide ability to get node and port GUIDs of VFs to be symmetrical to already existing set option. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-19RDMA/cm: Use refcount_t type for refcount variableDanit Goldberg1-12/+12
This atomic in struct cm_id_private is being used as a refcount, change it to refcount_t for better clarity and to get the refcount protections. Link: https://lore.kernel.org/r/1573997601-4502-1-git-send-email-danitg@mellanox.com Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17RDMA/cma: Use ACK timeout for RoCE packetLifeTimeDag Moxnes1-2/+13
The cma is currently using a hard-coded value, CMA_IBOE_PACKET_LIFETIME, for the PacketLifeTime, as it can not be determined from the network. This value might not be optimal for all networks. The cma module supports the function rdma_set_ack_timeout to set the ACK timeout for a QP associated with a connection. As per IBTA 12.7.34 local ACK timeout = (2 * PacketLifeTime + Local CA’s ACK delay). Assuming a negligible local ACK delay, we can use PacketLifeTime = local ACK timeout/2 as a reasonable approximation for RoCE networks. Link: https://lore.kernel.org/r/1572439440-17416-1-git-send-email-dag.moxnes@oracle.com Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17IB/umem: remove the dmasync argument to ib_umem_getChristoph Hellwig1-2/+1
The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14dma-mapping: remove the DMA_ATTR_WRITE_BARRIER flagChristoph Hellwig1-7/+2
This flag is not implemented by any backend and only set by the ib_umem module in a single instance. Link: https://lore.kernel.org/r/20191113073214.9514-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-12RDMA: Change MAD processing function to remove extra casting and parameterLeon Romanovsky2-10/+8
All users of process_mad() converts input pointers from ib_mad_hdr to be ib_mad, update the function declaration to use ib_mad directly. Also remove not used input MAD size parameter. Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/mad: Allocate zeroed MAD bufferLeon Romanovsky1-2/+2
Ensure that MAD output buffer is zero-based allocated in all the callers of process_mad and remove the various memset()'s from the drivers. Link: https://lore.kernel.org/r/20191029062745.7932-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/mad: Delete never implemented functionsLeon Romanovsky1-19/+0
Delete never implemented and used MAD functions. Link: https://lore.kernel.org/r/20191029062745.7932-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/core: Fix return code when modify_port isn't supportedKamal Heib1-1/+5
Improve return code from ib_modify_port() by doing the following: - Use "-EOPNOTSUPP" instead "-ENOSYS" which is the proper return code - Allow only fake IB_PORT_CM_SUP manipulation for RoCE providers that didn't implement the modify_port callback, otherwise return "-EOPNOTSUPP" Fixes: 61e0962d5221 ("IB: Avoid ib_modify_port() failure for RoCE devices") Link: https://lore.kernel.org/r/20191028155931.1114-2-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA: Connect between the mmap entry and the umap_priv structureMichal Kalderon3-13/+49
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/core: Create mmap database and cookie helper functionsMichal Kalderon4-0/+240
Create some common API's for adding entries to a xa_mmap. Searching for an entry and freeing one. The general approach is copied from the EFA driver and improved to be more general and do more to help the drivers. Integration with the core allows a reference counted scheme with a free function so that the driver can know when its mmaps are all gone. This significant new functionality will be helpful for drivers to have the correct lifetime model for mmap objects. Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-05RDMA/core: Move core content from ib_uverbs to ib_coreMichal Kalderon4-72/+86
Move functionality that is called by the driver, which is related to umap, to a new file that will be linked in ib_core. This is a first step in later enabling ib_uverbs to be optional. vm_ops is now initialized in ib_uverbs_mmap instead of priv_init to avoid having to move all the rdma_umap functions as well. Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28Merge branch 'odp_rework' into rdma.git for-nextJason Gunthorpe1-19/+19
Jason Gunthorpe says: ==================== In order to hoist the interval tree code out of the drivers and into the mmu_notifiers it is necessary for the drivers to not use the interval tree for other things. This series replaces the interval tree with an xarray and along the way re-aligns all the locking to use a sensible SRCU model where the 'update' step is done by modifying an xarray. The result is overall much simpler and with less locking in the critical path. Many functions were reworked for clarity and small details like using 'imr' to refer to the implicit MR make the entire code flow here more readable. This also squashes at least two race bugs on its own, and quite possibily more that haven't been identified. ==================== Merge conflicts with the odp statistics patch resolved. * branch 'odp_rework': RDMA/odp: Remove broken debugging call to invalidate_range RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray RDMA/mlx5: Rework implicit ODP destroy RDMA/mlx5: Avoid double lookups on the pagefault path RDMA/mlx5: Reduce locking in implicit_mr_get_data() RDMA/mlx5: Use an xarray for the children of an implicit ODP RDMA/mlx5: Split implicit handling from pagefault_mr RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it RDMA/mlx5: Rework implicit_mr_get_data RDMA/mlx5: Delete struct mlx5_priv->mkey_table RDMA/mlx5: Use a dedicated mkey xarray for ODP RDMA/mlx5: Split sig_err MR data into its own xarray RDMA/mlx5: Use SRCU properly in ODP prefetch Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/odp: Remove broken debugging call to invalidate_rangeJason Gunthorpe1-19/+19
invalidate_range() also obtains the umem_mutex which is being held at this point, so if this path were was ever called it would deadlock. Thus conclude the debugging never triggers and rework it into a simple WARN_ON and leave things as they are. While here add a note to explain how we could possibly get inconsistent page pointers. Link: https://lore.kernel.org/r/20191009160934.3143-16-jgg@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28Merge tag 'v5.4-rc5' into rdma.git for-nextJason Gunthorpe6-14/+19
Linux 5.4-rc5 For dependencies in the next patches Conflict resolved by keeping the delete of the unlock. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/core: Set DMA parameters correctlyBart Van Assche1-2/+14
The dma_set_max_seg_size() call in setup_dma_device() does not have any effect since device->dev.dma_parms is NULL. Fix this by initializing device->dev.dma_parms first. Link: https://lore.kernel.org/r/20191025225830.257535-5-bvanassche@acm.org Fixes: d10bcf947a3e ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/cm: Update copyright together with SPDX tagLeon Romanovsky3-85/+6
Add Mellanox to lust of copyright holders and replace copyright boilerplate with relevant SPDX tag. Link: https://lore.kernel.org/r/20191020071559.9743-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/cm: Use specific keyword to check defineLeon Romanovsky1-1/+1
There is a specific define keyword to check if define exists or not, let's use it instead of open-coded variant. Link: https://lore.kernel.org/r/20191020071559.9743-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/cm: Delete unused cm_is_active_peer functionLeon Romanovsky1-8/+0
Function cm_is_active_peer is not used, delete it. Link: https://lore.kernel.org/r/20191020071559.9743-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24IB/core: Avoid deadlock during netlink message handlingParav Pandit3-54/+56
When rdmacm module is not loaded, and when netlink message is received to get char device info, it results into a deadlock due to recursive locking of rdma_nl_mutex with the below call sequence. [..] rdma_nl_rcv() mutex_lock() [..] rdma_nl_rcv_msg() ib_get_client_nl_info() request_module() iw_cm_init() rdma_nl_register() mutex_lock(); <- Deadlock, acquiring mutex again Due to above call sequence, following call trace and deadlock is observed. kernel: __mutex_lock+0x35e/0x860 kernel: ? __mutex_lock+0x129/0x860 kernel: ? rdma_nl_register+0x1a/0x90 [ib_core] kernel: rdma_nl_register+0x1a/0x90 [ib_core] kernel: ? 0xffffffffc029b000 kernel: iw_cm_init+0x34/0x1000 [iw_cm] kernel: do_one_initcall+0x67/0x2d4 kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0 kernel: do_init_module+0x5a/0x223 kernel: load_module+0x1998/0x1e10 kernel: ? __symbol_put+0x60/0x60 kernel: __do_sys_finit_module+0x94/0xe0 kernel: do_syscall_64+0x5a/0x270 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe process stack trace: [<0>] __request_module+0x1c9/0x460 [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core] [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core] [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core] [<0>] netlink_unicast+0x179/0x220 [<0>] netlink_sendmsg+0x2f6/0x3f0 [<0>] sock_sendmsg+0x30/0x40 [<0>] ___sys_sendmsg+0x27a/0x290 [<0>] __sys_sendmsg+0x58/0xa0 [<0>] do_syscall_64+0x5a/0x270 [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe To overcome this deadlock and to allow multiple netlink messages to progress in parallel, following scheme is implemented. 1. Split the lock protecting the cb_table into a per-index lock, and make it a rwlock. This lock is used to ensure no callbacks are running after unregistration returns. Since a module will not be registered once it is already running callbacks, this avoids the deadlock. 2. Use smp_store_release() to update the cb_table during registration so that no lock is required. This avoids lockdep problems with thinking all the rwsems are the same lock class. Fixes: 0e2d00eb6fd45 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload") Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24RDMA/nldev: Skip counter if port doesn't matchMark Zhang1-1/+1
The counter resource should return -EAGAIN if it was requested for a different port, this is similar to how QP works if the users provides a port filter. Otherwise port filtering in netlink will return broken counter nests. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/20191020062800.8065-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23RDMA/core: Check that process is still alive before sending it to the usersLeon Romanovsky2-9/+33
The PID information can disappear asynchronously because the task can be killed and moved to zombie state. In this case, PID will be zero in similar way to the kernel tasks. Recognize such situation where we are asking to return orphaned object and simply skip filling PID attribute. As part of this change, document the same scenario in counter.c code. Link: https://lore.kernel.org/r/20191010071105.25538-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23RDMA/restrack: Remove PID namespace supportLeon Romanovsky4-46/+2
IB resources are bounded to IB device and file descriptors, both entities are unaware to PID namespaces and to task lifetime. The difference in model caused to unpredictable behavior for the following scenario: 1. Create FD and context 2. Share it with ephemeral child 3. Create any object and exit that child The end result of this flow, that those newly created objects will be tracked by restrack, but won't be visible for users because task_struct associated with them already exited. The right thing is to rely on net namespace only for any filtering purposes and drop PID namespace. Link: https://lore.kernel.org/r/20191010071105.25538-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23compat_ioctl: move more drivers to compat_ptr_ioctlArnd Bergmann1-2/+2
The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible. One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run in s390, but since we now have a generic helper for it, it's easy enough to use it consistently. I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values. Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-22IB/core: Do not notify GID change event of an unregistered deviceParav Pandit1-7/+1
When IB device is undergoing unregistration, the GID cache is always cleaned up after all clients are unregistered with the below flow. __ib_unregister_device() disable_device() ib_cache_cleanup_one() gid_table_cleanup_one() cleanup_gid_table_port() There is no use in generating a GID change event at this stage, where there is no active client of the device and device is nearly unregistered. Link: https://lore.kernel.org/r/20191020065427.8772-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/uverbs: Add a check for uverbs_attr_get to uverbs_copy_to_struct_or_zeroChuhong Yuan1-0/+3
All current callers for uverbs_copy_to_struct_or_zero() already check that the attribute exists, but it make sense to verify the result like the other functions do. Link: https://lore.kernel.org/r/20191018081533.8544-1-hslester96@gmail.com Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>