path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-12-01Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playgroundLinus Torvalds1-2/+2
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann: "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t. In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead. After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers. This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite" * tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits) scsi: sd: enable compat ioctls for sed-opal pktcdvd: add compat_ioctl handler compat_ioctl: move SG_GET_REQUEST_TABLE handling compat_ioctl: ppp: move simple commands into ppp_generic.c compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic compat_ioctl: unify copy-in of ppp filters tty: handle compat PPP ioctls compat_ioctl: move SIOCOUTQ out of compat_ioctl.c compat_ioctl: handle SIOCOUTQNSD af_unix: add compat_ioctl support compat_ioctl: reimplement SG_IO handling compat_ioctl: move WDIOC handling into wdt drivers fs: compat_ioctl: move FITRIM emulation into file systems gfs2: add compat_ioctl support compat_ioctl: remove unused convert_in_user macro compat_ioctl: remove last RAID handling code compat_ioctl: remove /dev/raw ioctl translation compat_ioctl: remove PCI ioctl translation compat_ioctl: remove joystick ioctl translation ...
2019-11-30Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2-263/+41
Pull hmm updates from Jason Gunthorpe: "This is another round of bug fixing and cleanup. This time the focus is on the driver pattern to use mmu notifiers to monitor a VA range. This code is lifted out of many drivers and hmm_mirror directly into the mmu_notifier core and written using the best ideas from all the driver implementations. This removes many bugs from the drivers and has a very pleasing diffstat. More drivers can still be converted, but that is for another cycle. - A shared branch with RDMA reworking the RDMA ODP implementation - New mmu_interval_notifier API. This is focused on the use case of monitoring a VA and simplifies the process for drivers - A common seq-count locking scheme built into the mmu_interval_notifier API usable by drivers that call get_user_pages() or hmm_range_fault() with the VA range - Conversion of mlx5 ODP, hfi1, radeon, nouveau, AMD GPU, and Xen GntDev drivers to the new API. This deletes a lot of wonky driver code. - Two improvements for hmm_range_fault(), from testing done by Ralph" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: mm/hmm: remove hmm_range_dma_map and hmm_range_dma_unmap mm/hmm: make full use of walk_page_range() xen/gntdev: use mmu_interval_notifier_insert mm/hmm: remove hmm_mirror and related drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirror drm/amdgpu: Use mmu_interval_insert instead of hmm_mirror drm/amdgpu: Call find_vma under mmap_sem nouveau: use mmu_interval_notifier instead of hmm_mirror nouveau: use mmu_notifier directly for invalidate_range_start drm/radeon: use mmu_interval_notifier_insert RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcv RDMA/odp: Use mmu_interval_notifier_insert() mm/hmm: define the pre-processor related parts of hmm.h even if disabled mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirror mm/mmu_notifier: add an interval tree notifier mm/mmu_notifier: define the header pre-processor parts even if disabled mm/hmm: allow snapshot of the special zero page
2019-11-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds24-369/+671
Pull rdma updates from Jason Gunthorpe: "Again another fairly quiet cycle with few notable core code changes and the usual variety of driver bug fixes and small improvements. - Various driver updates and bug fixes for siw, bnxt_re, hns, qedr, iw_cxgb4, vmw_pvrdma, mlx5 - Improvements in SRPT from working with iWarp - SRIOV VF support for bnxt_re - Skeleton kernel-doc files for drivers/infiniband - User visible counters for events related to ODP - Common code for tracking of mmap lifetimes so that drivers can link HW object liftime to a VMA - ODP bug fixes and rework - RDMA READ support for efa - Removal of the very old cxgb3 driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (168 commits) RDMA/hns: Delete unnecessary callback functions for cq RDMA/hns: Rename the functions used inside creating cq RDMA/hns: Redefine the member of hns_roce_cq struct RDMA/hns: Redefine interfaces used in creating cq RDMA/efa: Expose RDMA read related attributes RDMA/efa: Support remote read access in MR registration RDMA/efa: Store network attributes in device attributes IB/hfi1: remove redundant assignment to variable ret RDMA/bnxt_re: Fix missing le16_to_cpu RDMA/bnxt_re: Fix stat push into dma buffer on gen p5 devices RDMA/bnxt_re: Fix chip number validation Broadcom's Gen P5 series RDMA/bnxt_re: Fix Kconfig indentation IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs RDMA/qedr: Fix null-pointer dereference when calling rdma_user_mmap_get_offset RDMA/cm: Use refcount_t type for refcount variable IB/mlx5: Support extended number of strides for Striding RQ IB/mlx4: Update HW GID table while adding vlan GID ...
2019-11-25Merge branch 'ib-guids' into rdma.git for-nextJason Gunthorpe2-0/+11
Danit Goldberg says: ==================== This series extends RTNETLINK to provide IB port and node GUIDs, which were configured for Infiniband VFs. The functionality to set VF GUIDs already existed for a long time, and here we are adding the missing "get" so that netlink will be symmetric and various cloud orchestration tools will be able to manage such VFs more naturally. The iproute2 was extended too to present those GUIDs. - ip link show <device> For example: - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33 - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10 - ip link show ib4 ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'ib-guids': (35 commits) IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs net/mlx5: Add new chain for netfilter flow table offload net/mlx5: Refactor creating fast path prio chains net/mlx5: Accumulate levels for chains prio namespaces net/mlx5: Define fdb tc levels per prio net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines net/mlx5: Simplify fdb chain and prio eswitch defines IB/mlx5: Load profile according to RoCE enablement state IB/mlx5: Rename profile and init methods net/mlx5: Handle "enable_roce" devlink param net/mlx5: Document flow_steering_mode devlink param devlink: Add new "enable_roce" generic device param net/mlx5: fix spelling mistake "metdata" -> "metadata" net/mlx5: fix kvfree of uninitialized pointer spec IB/mlx5: Introduce and use mlx5_core_is_vf() net/mlx5: E-switch, Enable metadata on own vport net/mlx5: Refactor ingress acl configuration ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-23RDMA/odp: Use mmu_interval_notifier_insert()Jason Gunthorpe2-263/+41
Replace the internal interval tree based mmu notifier with the new common mmu_interval_notifier_insert() API. This removes a lot of code and fixes a deadlock that can be triggered in ODP: zap_page_range() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) unmap_single_vma() [..] __split_huge_page_pmd() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) // DEADLOCK mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) The umem_rwsem is held across the range_start/end as the ODP algorithm for invalidate_range_end cannot tolerate changes to the interval tree. However, due to the nested invalidation regions the second down_read() can deadlock if there are competing writers. The new core code provides an alternative scheme to solve this problem. Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count") Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca Tested-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-22IB/core: Add interfaces to get VF node and port GUIDsDanit Goldberg2-0/+11
Provide ability to get node and port GUIDs of VFs to be symmetrical to already existing set option. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-19RDMA/cm: Use refcount_t type for refcount variableDanit Goldberg1-12/+12
This atomic in struct cm_id_private is being used as a refcount, change it to refcount_t for better clarity and to get the refcount protections. Link: https://lore.kernel.org/r/1573997601-4502-1-git-send-email-danitg@mellanox.com Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17RDMA/cma: Use ACK timeout for RoCE packetLifeTimeDag Moxnes1-2/+13
The cma is currently using a hard-coded value, CMA_IBOE_PACKET_LIFETIME, for the PacketLifeTime, as it can not be determined from the network. This value might not be optimal for all networks. The cma module supports the function rdma_set_ack_timeout to set the ACK timeout for a QP associated with a connection. As per IBTA 12.7.34 local ACK timeout = (2 * PacketLifeTime + Local CA’s ACK delay). Assuming a negligible local ACK delay, we can use PacketLifeTime = local ACK timeout/2 as a reasonable approximation for RoCE networks. Link: https://lore.kernel.org/r/1572439440-17416-1-git-send-email-dag.moxnes@oracle.com Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-17IB/umem: remove the dmasync argument to ib_umem_getChristoph Hellwig1-2/+1
The argument is always ignored, so remove it. Link: https://lore.kernel.org/r/20191113073214.9514-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-14dma-mapping: remove the DMA_ATTR_WRITE_BARRIER flagChristoph Hellwig1-7/+2
This flag is not implemented by any backend and only set by the ib_umem module in a single instance. Link: https://lore.kernel.org/r/20191113073214.9514-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-12RDMA: Change MAD processing function to remove extra casting and parameterLeon Romanovsky2-10/+8
All users of process_mad() converts input pointers from ib_mad_hdr to be ib_mad, update the function declaration to use ib_mad directly. Also remove not used input MAD size parameter. Link: https://lore.kernel.org/r/20191029062745.7932-17-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Tested-By: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/mad: Allocate zeroed MAD bufferLeon Romanovsky1-2/+2
Ensure that MAD output buffer is zero-based allocated in all the callers of process_mad and remove the various memset()'s from the drivers. Link: https://lore.kernel.org/r/20191029062745.7932-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/mad: Delete never implemented functionsLeon Romanovsky1-19/+0
Delete never implemented and used MAD functions. Link: https://lore.kernel.org/r/20191029062745.7932-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/core: Fix return code when modify_port isn't supportedKamal Heib1-1/+5
Improve return code from ib_modify_port() by doing the following: - Use "-EOPNOTSUPP" instead "-ENOSYS" which is the proper return code - Allow only fake IB_PORT_CM_SUP manipulation for RoCE providers that didn't implement the modify_port callback, otherwise return "-EOPNOTSUPP" Fixes: 61e0962d5221 ("IB: Avoid ib_modify_port() failure for RoCE devices") Link: https://lore.kernel.org/r/20191028155931.1114-2-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA: Connect between the mmap entry and the umap_priv structureMichal Kalderon3-13/+49
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-06RDMA/core: Create mmap database and cookie helper functionsMichal Kalderon4-0/+240
Create some common API's for adding entries to a xa_mmap. Searching for an entry and freeing one. The general approach is copied from the EFA driver and improved to be more general and do more to help the drivers. Integration with the core allows a reference counted scheme with a free function so that the driver can know when its mmaps are all gone. This significant new functionality will be helpful for drivers to have the correct lifetime model for mmap objects. Link: https://lore.kernel.org/r/20191030094417.16866-3-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-05RDMA/core: Move core content from ib_uverbs to ib_coreMichal Kalderon4-72/+86
Move functionality that is called by the driver, which is related to umap, to a new file that will be linked in ib_core. This is a first step in later enabling ib_uverbs to be optional. vm_ops is now initialized in ib_uverbs_mmap instead of priv_init to avoid having to move all the rdma_umap functions as well. Link: https://lore.kernel.org/r/20191030094417.16866-2-michal.kalderon@marvell.com Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28Merge branch 'odp_rework' into rdma.git for-nextJason Gunthorpe1-19/+19
Jason Gunthorpe says: ==================== In order to hoist the interval tree code out of the drivers and into the mmu_notifiers it is necessary for the drivers to not use the interval tree for other things. This series replaces the interval tree with an xarray and along the way re-aligns all the locking to use a sensible SRCU model where the 'update' step is done by modifying an xarray. The result is overall much simpler and with less locking in the critical path. Many functions were reworked for clarity and small details like using 'imr' to refer to the implicit MR make the entire code flow here more readable. This also squashes at least two race bugs on its own, and quite possibily more that haven't been identified. ==================== Merge conflicts with the odp statistics patch resolved. * branch 'odp_rework': RDMA/odp: Remove broken debugging call to invalidate_range RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray RDMA/mlx5: Rework implicit ODP destroy RDMA/mlx5: Avoid double lookups on the pagefault path RDMA/mlx5: Reduce locking in implicit_mr_get_data() RDMA/mlx5: Use an xarray for the children of an implicit ODP RDMA/mlx5: Split implicit handling from pagefault_mr RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it RDMA/mlx5: Rework implicit_mr_get_data RDMA/mlx5: Delete struct mlx5_priv->mkey_table RDMA/mlx5: Use a dedicated mkey xarray for ODP RDMA/mlx5: Split sig_err MR data into its own xarray RDMA/mlx5: Use SRCU properly in ODP prefetch Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/odp: Remove broken debugging call to invalidate_rangeJason Gunthorpe1-19/+19
invalidate_range() also obtains the umem_mutex which is being held at this point, so if this path were was ever called it would deadlock. Thus conclude the debugging never triggers and rework it into a simple WARN_ON and leave things as they are. While here add a note to explain how we could possibly get inconsistent page pointers. Link: https://lore.kernel.org/r/20191009160934.3143-16-jgg@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28Merge tag 'v5.4-rc5' into rdma.git for-nextJason Gunthorpe6-14/+19
Linux 5.4-rc5 For dependencies in the next patches Conflict resolved by keeping the delete of the unlock. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/core: Set DMA parameters correctlyBart Van Assche1-2/+14
The dma_set_max_seg_size() call in setup_dma_device() does not have any effect since device->dev.dma_parms is NULL. Fix this by initializing device->dev.dma_parms first. Link: https://lore.kernel.org/r/20191025225830.257535-5-bvanassche@acm.org Fixes: d10bcf947a3e ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/cm: Update copyright together with SPDX tagLeon Romanovsky3-85/+6
Add Mellanox to lust of copyright holders and replace copyright boilerplate with relevant SPDX tag. Link: https://lore.kernel.org/r/20191020071559.9743-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/cm: Use specific keyword to check defineLeon Romanovsky1-1/+1
There is a specific define keyword to check if define exists or not, let's use it instead of open-coded variant. Link: https://lore.kernel.org/r/20191020071559.9743-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/cm: Delete unused cm_is_active_peer functionLeon Romanovsky1-8/+0
Function cm_is_active_peer is not used, delete it. Link: https://lore.kernel.org/r/20191020071559.9743-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24IB/core: Avoid deadlock during netlink message handlingParav Pandit3-54/+56
When rdmacm module is not loaded, and when netlink message is received to get char device info, it results into a deadlock due to recursive locking of rdma_nl_mutex with the below call sequence. [..] rdma_nl_rcv() mutex_lock() [..] rdma_nl_rcv_msg() ib_get_client_nl_info() request_module() iw_cm_init() rdma_nl_register() mutex_lock(); <- Deadlock, acquiring mutex again Due to above call sequence, following call trace and deadlock is observed. kernel: __mutex_lock+0x35e/0x860 kernel: ? __mutex_lock+0x129/0x860 kernel: ? rdma_nl_register+0x1a/0x90 [ib_core] kernel: rdma_nl_register+0x1a/0x90 [ib_core] kernel: ? 0xffffffffc029b000 kernel: iw_cm_init+0x34/0x1000 [iw_cm] kernel: do_one_initcall+0x67/0x2d4 kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0 kernel: do_init_module+0x5a/0x223 kernel: load_module+0x1998/0x1e10 kernel: ? __symbol_put+0x60/0x60 kernel: __do_sys_finit_module+0x94/0xe0 kernel: do_syscall_64+0x5a/0x270 kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe process stack trace: [<0>] __request_module+0x1c9/0x460 [<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core] [<0>] nldev_get_chardev+0x1ac/0x320 [ib_core] [<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core] [<0>] rdma_nl_rcv+0xcd/0x120 [ib_core] [<0>] netlink_unicast+0x179/0x220 [<0>] netlink_sendmsg+0x2f6/0x3f0 [<0>] sock_sendmsg+0x30/0x40 [<0>] ___sys_sendmsg+0x27a/0x290 [<0>] __sys_sendmsg+0x58/0xa0 [<0>] do_syscall_64+0x5a/0x270 [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe To overcome this deadlock and to allow multiple netlink messages to progress in parallel, following scheme is implemented. 1. Split the lock protecting the cb_table into a per-index lock, and make it a rwlock. This lock is used to ensure no callbacks are running after unregistration returns. Since a module will not be registered once it is already running callbacks, this avoids the deadlock. 2. Use smp_store_release() to update the cb_table during registration so that no lock is required. This avoids lockdep problems with thinking all the rwsems are the same lock class. Fixes: 0e2d00eb6fd45 ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload") Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-24RDMA/nldev: Skip counter if port doesn't matchMark Zhang1-1/+1
The counter resource should return -EAGAIN if it was requested for a different port, this is similar to how QP works if the users provides a port filter. Otherwise port filtering in netlink will return broken counter nests. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/20191020062800.8065-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23RDMA/core: Check that process is still alive before sending it to the usersLeon Romanovsky2-9/+33
The PID information can disappear asynchronously because the task can be killed and moved to zombie state. In this case, PID will be zero in similar way to the kernel tasks. Recognize such situation where we are asking to return orphaned object and simply skip filling PID attribute. As part of this change, document the same scenario in counter.c code. Link: https://lore.kernel.org/r/20191010071105.25538-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23RDMA/restrack: Remove PID namespace supportLeon Romanovsky4-46/+2
IB resources are bounded to IB device and file descriptors, both entities are unaware to PID namespaces and to task lifetime. The difference in model caused to unpredictable behavior for the following scenario: 1. Create FD and context 2. Share it with ephemeral child 3. Create any object and exit that child The end result of this flow, that those newly created objects will be tracked by restrack, but won't be visible for users because task_struct associated with them already exited. The right thing is to rely on net namespace only for any filtering purposes and drop PID namespace. Link: https://lore.kernel.org/r/20191010071105.25538-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-23compat_ioctl: move more drivers to compat_ptr_ioctlArnd Bergmann1-2/+2
The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible. One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run in s390, but since we now have a generic helper for it, it's easy enough to use it consistently. I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values. Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David Sterba <dsterba@suse.com> Acked-by: Darren Hart (VMware) <dvhart@infradead.org> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-10-22IB/core: Do not notify GID change event of an unregistered deviceParav Pandit1-7/+1
When IB device is undergoing unregistration, the GID cache is always cleaned up after all clients are unregistered with the below flow. __ib_unregister_device() disable_device() ib_cache_cleanup_one() gid_table_cleanup_one() cleanup_gid_table_port() There is no use in generating a GID change event at this stage, where there is no active client of the device and device is nearly unregistered. Link: https://lore.kernel.org/r/20191020065427.8772-4-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/uverbs: Add a check for uverbs_attr_get to uverbs_copy_to_struct_or_zeroChuhong Yuan1-0/+3
All current callers for uverbs_copy_to_struct_or_zero() already check that the attribute exists, but it make sense to verify the result like the other functions do. Link: https://lore.kernel.org/r/20191018081533.8544-1-hslester96@gmail.com Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22IB/cma: Honor traffic class from lower netdevice for RoCEParav Pandit1-9/+52
When a macvlan netdevice is used for RoCE, consider the tos->prio->tc mapping as SL using its lower netdevice. 1. If the lower netdevice is a VLAN netdevice, consider the VLAN netdevice and it's parent netdevice for mapping 2. If the lower netdevice is not a VLAN netdevice, consider tc mapping directly from the lower netdevice Link: https://lore.kernel.org/r/20191015072058.17347-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/nldev: Provide MR statisticsErez Alfasi2-6/+40
Add RDMA nldev netlink interface for dumping MR statistics information. Output example: $ ./ibv_rc_pingpong -o -P -s 500000000 local address: LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID :: $ rdma stat show mr dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0 Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/mlx5: Return ODP type per MRErez Alfasi1-0/+13
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show mr' dump. For example: $ rdma -dd resource show mr dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000 pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit For non-ODP MRs, we won't print "drv_odp ..." at all. Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/nldev: Allow different fill function per resourceErez Alfasi1-21/+23
So far res_get_common_{dumpit, doit} was using the default resource fill function which was defined as part of the nldev_fill_res_entry fill_entries. Add a fill function pointer as an argument allows us to use different fill function in case we want to dump different values then 'rdma resource' flow do, but still use the same existing general resources dumping flow. Link: https://lore.kernel.org/r/20191016062308.11886-3-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/uverbs: Prevent potential underflowDan Carpenter1-1/+1
The issue is in drivers/infiniband/core/uverbs_std_types_cq.c in the UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE) function. We check that: if (attr.comp_vector >= attrs->ufile->device->num_comp_vectors) { But we don't check if "attr.comp_vector" is negative. It could potentially lead to an array underflow. My concern would be where cq->vector is used in the create_cq() function from the cxgb4 driver. And really "attr.comp_vector" is appears as a u32 to user space so that's the right type to use. Fixes: 9ee79fce3642 ("IB/core: Add completion queue (cq) object actions") Link: https://lore.kernel.org/r/20191011133419.GA22905@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22infiniband: fix core/ kernel-doc notationrd.dunlab@gmail.com2-13/+12
Correct function parameter names (typos or renames). Add kernel-doc notation for missing function parameters. ../drivers/infiniband/core/sa_query.c:1263: warning: Function parameter or member 'gid_attr' not described in 'ib_init_ah_attr_from_path' ../drivers/infiniband/core/sa_query.c:1263: warning: Excess function parameter 'sgid_attr' description in 'ib_init_ah_attr_from_path' ../drivers/infiniband/core/device.c:145: warning: Function parameter or member 'dev' not described in 'rdma_dev_access_netns' ../drivers/infiniband/core/device.c:145: warning: Excess function parameter 'device' description in 'rdma_dev_access_netns' ../drivers/infiniband/core/device.c:1333: warning: Function parameter or member 'name' not described in 'ib_register_device' ../drivers/infiniband/core/device.c:1461: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device' ../drivers/infiniband/core/device.c:1461: warning: Excess function parameter 'device' description in 'ib_unregister_device' ../drivers/infiniband/core/device.c:1483: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device_and_put' ../drivers/infiniband/core/device.c:1550: warning: Function parameter or member 'ib_dev' not described in 'ib_unregister_device_queued' Link: https://lore.kernel.org/r/20191010035240.191542461@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22infiniband: fix core/verbs.c kernel-doc notationrd.dunlab@gmail.com1-0/+2
Add missing function parameter descriptions: ../drivers/infiniband/core/verbs.c:257: warning: Function parameter or member 'flags' not described in '__ib_alloc_pd' ../drivers/infiniband/core/verbs.c:257: warning: Function parameter or member 'caller' not described in '__ib_alloc_pd' Link: https://lore.kernel.org/r/20191010035240.011497492@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22infiniband: fix core/ipwm_util.h kernel-doc warningsrd.dunlab@gmail.com1-1/+4
Fix kernel-doc warnings and expected formatting. ../drivers/infiniband/core/iwpm_util.h:219: warning: Function parameter or member 'a_sockaddr' not described in 'iwpm_compare_sockaddr' ../drivers/infiniband/core/iwpm_util.h:219: warning: Function parameter or member 'b_sockaddr' not described in 'iwpm_compare_sockaddr' ../drivers/infiniband/core/iwpm_util.h:280: warning: Function parameter or member 'iwpm_pid' not described in 'iwpm_send_hello' Link: https://lore.kernel.org/r/20191010035239.695604406@gmail.com Signed-off-by: Randy Dunlap <rd.dunlab@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22Merge branch 'mlx5-rd-sgl' into rdma.git for-nextJason Gunthorpe1-10/+15
From Yamin Friedman: ==================== This series from Yamin implements long standing "TODO" existed in rw.c. It allows the driver to specify a cut-over point where it is faster to build a lkey MR rather than do a large SGL for RDMA READ operations. mlx5 HW gets a notable performane boost by switching to MRs. ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'mlx5-rd-sgl': (3 commits) RDMA/mlx5: Add capability for max sge to get optimized performance RDMA/rw: Support threshold for registration vs scattering to local pages net/mlx5: Expose optimal performance scatter entries capability
2019-10-22RDMA/rw: Support threshold for registration vs scattering to local pagesYamin Friedman1-10/+15
If there are more scatter entries than the recommended limit provided by the ib device, UMR registration is used. This will provide optimal performance when performing large RDMA READs over devices that advertise the threshold capability. With ConnectX-5 running NVMeoF RDMA with FIO single QP 128KB writes: Without use of cap: 70Gb/sec With use of cap: 84Gb/sec Link: https://lore.kernel.org/r/20191007135933.12483-3-leon@kernel.org Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-18IB/core: Use rdma_read_gid_l2_fields to compare GID L2 fieldsParav Pandit1-4/+5
Current code tries to derive VLAN ID and compares it with GID attribute for matching entry. This raw search fails on macvlan netdevice as its not a VLAN device, but its an upper device of a VLAN netdevice. Due to this limitation, incoming QP1 packets fail to match in the GID table. Such packets are dropped. Hence, to support it, use the existing rdma_read_gid_l2_fields() that takes care of diffferent device types. Fixes: dbf727de7440 ("IB/core: Use GID table in AH creation and dmac resolution") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191002121750.17313-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-18RDMA/iwcm: move iw_rem_ref() calls out of spinlockKrishnamraju Eraparaju1-23/+29
kref release routines usually perform memory release operations, hence, they should not be called with spinlocks held. one such case is: SIW kref release routine siw_free_qp(), which can sleep via vfree() while freeing queue memory. Hence, all iw_rem_ref() calls in IWCM are moved out of spinlocks. Fixes: 922a8e9fb2e0 ("RDMA: iWARP Connection Manager.") Signed-off-by: Krishnamraju Eraparaju <krishna2@chelsio.com> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Link: https://lore.kernel.org/r/20191007102627.12568-1-krishna2@chelsio.com Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-10-04RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages()Jason Gunthorpe1-2/+4
This fixes a race of the form: CPU0 CPU1 mlx5_ib_invalidate_range() mlx5_ib_invalidate_range() // This one actually makes npages == 0 ib_umem_odp_unmap_dma_pages() if (npages == 0 && !dying) // This one does nothing ib_umem_odp_unmap_dma_pages() if (npages == 0 && !dying) dying = 1; dying = 1; schedule_work(&umem_odp->work); // Double schedule of the same work schedule_work(&umem_odp->work); // BOOM npages and dying must be read and written under the umem_mutex lock. Since whenever ib_umem_odp_unmap_dma_pages() is called mlx5 must also call mlx5_ib_update_xlt, and both need to be done in the same locking region, hoist the lock out of unmap. This avoids an expensive double critical section in mlx5_ib_invalidate_range(). Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support") Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04IB/core: Fix wrong iterating on portsMohamad Heib1-1/+1
rdma_for_each_port is already incrementing the iterator's value it receives therefore, after the first iteration the iterator is increased by 2 which eventually causing wrong queries and possible traces. Fix the above by removing the old redundant incrementation that was used before rdma_for_each_port() macro. Cc: <stable@vger.kernel.org> Fixes: ea1075edcbab ("RDMA: Add and use rdma_for_each_port") Link: https://lore.kernel.org/r/20191002122127.17571-1-leon@kernel.org Signed-off-by: Mohamad Heib <mohamadh@mellanox.com> Reviewed-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04IB/cm: Use container_of() instead of typecastParav Pandit1-2/+2
Use container_of() macro to get to timewait info structure instead of typecasting. Link: https://lore.kernel.org/r/20191002122517.17721-5-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error pathLeon Romanovsky1-6/+4
Properly unwind QP counter rebinding in case of failure. Trying to rebind the counter after unbiding it is not going to work reliably, move the unbind to the end so it doesn't have to be unwound. Fixes: b389327df905 ("RDMA/nldev: Allow counter manual mode configration through RDMA netlink") Link: https://lore.kernel.org/r/20191002115627.16740-1-leon@kernel.org Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/cm: Fix memory leak in cm_add/remove_oneJack Morgenstein1-0/+3
In the process of moving the debug counters sysfs entries, the commit mentioned below eliminated the cm_infiniband sysfs directory. This sysfs directory was tied to the cm_port object allocated in procedure cm_add_one(). Before the commit below, this cm_port object was freed via a call to kobject_put(port->kobj) in procedure cm_remove_port_fs(). Since port no longer uses its kobj, kobject_put(port->kobj) was eliminated. This, however, meant that kfree was never called for the cm_port buffers. Fix this by adding explicit kfree(port) calls to functions cm_add_one() and cm_remove_one(). Note: the kfree call in the first chunk below (in the cm_add_one error flow) fixes an old, undetected memory leak. Fixes: c87e65cfb97c ("RDMA/cm: Move debug counters to be under relevant IB device") Link: https://lore.kernel.org/r/20190916071154.20383-2-leon@kernel.org Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04RDMA/core: Fix an error handling path in 'res_get_common_doit()'Christophe JAILLET1-1/+1
According to surrounding error paths, it is likely that 'goto err_get;' is expected here. Otherwise, a call to 'rdma_restrack_put(res);' would be missing. Fixes: c5dfe0ea6ffa ("RDMA/nldev: Add resource tracker doit callback") Link: https://lore.kernel.org/r/20190818091044.8845-1-christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-01RDMA/core: Fix return code when modify_device isn't supportedKamal Heib2-2/+2
The proper return code is "-EOPNOTSUPP" when modify_device callback is not supported. Link: https://lore.kernel.org/r/20190923104158.5331-2-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>