aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/mlx5/main.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-30RDMA/core: Make the entire API tree staticJason Gunthorpe1-2/+0
Compilation of mlx5 driver without CONFIG_INFINIBAND_USER_ACCESS generates the following error. on x86_64: ld: drivers/infiniband/hw/mlx5/main.o: in function `mlx5_ib_handler_MLX5_IB_METHOD_VAR_OBJ_ALLOC': main.c:(.text+0x186d): undefined reference to `ib_uverbs_get_ucontext_file' ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x2480): undefined reference to `uverbs_idr_class' ld: drivers/infiniband/hw/mlx5/main.o:(.rodata+0x24d8): undefined reference to `uverbs_destroy_def_handler' This is happening because some parts of the UAPI description are not static. This is a hold over from earlier code that relied on struct pointers to refer to object types, now object types are referenced by number. Remove the unused globals and add statics to the remaining UAPI description elements. Remove the redundent #ifdefs around mlx5_ib_*defs and obsolete mlx5_ib_get_devx_tree(). The compiler now trims alot more unused code, including the above problematic definitions when !CONFIG_INFINIBAND_USER_ACCESS. Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods") Reported-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds1-22/+39
Pull networking updates from David Miller: 1) Add WireGuard 2) Add HE and TWT support to ath11k driver, from John Crispin. 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca. 4) Add variable window congestion control to TIPC, from Jon Maloy. 5) Add BCM84881 PHY driver, from Russell King. 6) Start adding netlink support for ethtool operations, from Michal Kubecek. 7) Add XDP drop and TX action support to ena driver, from Sameeh Jubran. 8) Add new ipv4 route notifications so that mlxsw driver does not have to handle identical routes itself. From Ido Schimmel. 9) Add BPF dynamic program extensions, from Alexei Starovoitov. 10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes. 11) Add support for macsec HW offloading, from Antoine Tenart. 12) Add initial support for MPTCP protocol, from Christoph Paasch, Matthieu Baerts, Florian Westphal, Peter Krystad, and many others. 13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu Cherian, and others. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits) net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC udp: segment looped gso packets correctly netem: change mailing list qed: FW 8.42.2.0 debug features qed: rt init valid initialization changed qed: Debug feature: ilt and mdump qed: FW 8.42.2.0 Add fw overlay feature qed: FW 8.42.2.0 HSI changes qed: FW 8.42.2.0 iscsi/fcoe changes qed: Add abstraction for different hsi values per chip qed: FW 8.42.2.0 Additional ll2 type qed: Use dmae to write to widebus registers in fw_funcs qed: FW 8.42.2.0 Parser offsets modified qed: FW 8.42.2.0 Queue Manager changes qed: FW 8.42.2.0 Expose new registers and change windows qed: FW 8.42.2.0 Internal ram offsets modifications MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver Documentation: net: octeontx2: Add RVU HW and drivers overview octeontx2-pf: ethtool RSS config support octeontx2-pf: Add basic ethtool support ...
2020-01-21Merge tag 'rds-odp-for-5.5' into rdma.git for-nextJason Gunthorpe1-18/+33
From https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== * tag 'rds-odp-for-5.5': net/rds: Use prefetch for On-Demand-Paging MR net/rds: Handle ODP mr registration/unregistration net/rds: Detect need of On-Demand-Paging memory registration RDMA/mlx5: Fix handling of IOVA != user_va in ODP paths IB/mlx5: Mask out unsupported ODP capabilities for kernel QPs RDMA/mlx5: Don't fake udata for kernel path IB/mlx5: Add ODP WQE handlers for kernel QPs IB/core: Add interface to advise_mr for kernel users IB/core: Introduce ib_reg_user_mr IB: Allow calls to ib_umem_get from kernel ULPs Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-21Merge tag 'rds-odp-for-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdmaDavid S. Miller1-18/+33
Leon Romanovsky says: ==================== Use ODP MRs for kernel ULPs The following series extends MR creation routines to allow creation of user MRs through kernel ULPs as a proxy. The immediate use case is to allow RDS to work over FS-DAX, which requires ODP (on-demand-paging) MRs to be created and such MRs were not possible to create prior this series. The first part of this patchset extends RDMA to have special verb ib_reg_user_mr(). The common use case that uses this function is a userspace application that allocates memory for HCA access but the responsibility to register the memory at the HCA is on an kernel ULP. This ULP acts as an agent for the userspace application. The second part provides advise MR functionality for ULPs. This is integral part of ODP flows and used to trigger pagefaults in advance to prepare memory before running working set. The third part is actual user of those in-kernel APIs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-16Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linuxSaeed Mahameed1-4/+6
This merge syncs with mlx5-next latest HW bits and layout updates for next features, in addition one patch that improves mlx5_create_auto_grouped_flow_table() API across all mlx5 users. * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Refactor mlx5_create_auto_grouped_flow_table net/mlx5e: Add discard counters per priority net/mlx5e: Expose FEC feilds and related capability bit net/mlx5: Add mlx5_ifc definitions for connection tracking support net/mlx5: Add copy header action struct layout net/mlx5: Expose resource dump register mapping net/mlx5: Add structures and defines for MIRC register net/mlx5: Read MCAM register groups 1 and 2 net/mlx5: Add structures layout for new MCAM access reg groups net/mlx5: Expose vDPA emulation device capabilities net/mlx5: Add Virtio Emulation related device capabilities Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16net/mlx5: Refactor mlx5_create_auto_grouped_flow_tablePaul Blakey1-4/+6
Refactor mlx5_create_auto_grouped_flow_table() to use ft_attr param which already carries the max_fte, prio and flags memebers, and is used the same in similar mlx5_create_flow_table() function. Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-01-16IB/mlx5: Expose RoCE accelerator countersAvihai Horon1-0/+18
Introduce the following RoCE accelerator counters: * roce_adp_retrans - number of adaptive retransmission for RoCE traffic. * roce_adp_retrans_to - number of times RoCE traffic reached time out due to adaptive retransmission. * roce_slow_restart - number of times RoCE slow restart was used. * roce_slow_restart_cnps - number of times RoCE slow restart generate CNP packets. * roce_slow_restart_trans - number of times RoCE slow restart change state to slow restart. Link: https://lore.kernel.org/r/20200115145459.83280-3-leon@kernel.org Signed-off-by: Avihai Horon <avihaih@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-16IB/mlx5: Mask out unsupported ODP capabilities for kernel QPsMoni Shoua1-0/+17
The ODP handler for WQEs in RQ or SRQ is not implented for kernel QPs. Therefore don't report support in these if query comes from a kernel user. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-16RDMA/mlx5: Don't fake udata for kernel pathLeon Romanovsky1-18/+16
Kernel paths must not set udata and provide NULL pointer, instead of faking zeroed udata struct. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-01-12IB/mlx5: Add mmap support for VARYishai Hadas1-1/+4
Add mmap support for VAR, it uses the 'offset' command mode with involvement of IB core APIs to find the previously allocated mmap entry. Link: https://lore.kernel.org/r/20191212110928.334995-6-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12IB/mlx5: Introduce VAR object and its alloc/destroy methodsYishai Hadas1-0/+157
Introduce VAR object and its alloc/destroy KABI methods. The internal implementation uses the IB core API to manage mmap/munamp calls. Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12IB/mlx5: Extend caps stage to handle VAR capabilitiesYishai Hadas1-2/+38
Extend caps stage to handle VAR capabilities. Link: https://lore.kernel.org/r/20191212110928.334995-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-10Merge branch 'x86/mm' into efi/core, to pick up dependenciesIngo Molnar1-1/+1
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-07IB/mlx5: Do reverse sequence during device removalParav Pandit1-0/+2
When IB device profile initialization completes, device is marked as active. However, IB device is not marked inactive, during device removal flow. It should be the mirror of the add flow. Hence, mark it inactive during remove sequence. Link: https://lore.kernel.org/r/20191212113024.336702-2-leon@kernel.org Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-12-12IB/mlx5: Fix device memory flowsYishai Hadas1-41/+79
Fix device memory flows so that only once there will be no live mmaped VA to a given allocation the matching object will be destroyed. This prevents a potential scenario that existing VA that was mmaped by one process might still be used post its deallocation despite that it's owned now by other process. The above is achieved by integrating with IB core APIs to manage mmap/munmap. Only once the refcount will become 0 the DM object and its underlay area will be freed. Fixes: 3b113a1ec3d4 ("IB/mlx5: Support device memory type attribute") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212100237.330654-3-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-12IB/mlx5: Fix steering rule of drop and countMaor Gottlieb1-7/+6
There are two flow rule destinations: QP and packet. While users are setting DROP packet rule, the QP should not be set as a destination. Fixes: 3b3233fbf02e ("IB/mlx5: Add flow counters binding support") Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20191212091214.315005-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-12-10x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h>Ingo Molnar1-1/+1
pat.h is a file whose main purpose is to provide the memtype_*() APIs. PAT is the low level hardware mechanism - but the high level abstraction is memtype. So name the header <memtype.h> as well - this goes hand in hand with memtype.c and memtype_interval.c. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-27Merge tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-coreLinus Torvalds1-47/+15
Pull driver core updates from Greg KH: "Here is the "big" set of driver core patches for 5.5-rc1 There's a few minor cleanups and fixes in here, but the majority of the patches in here fall into two buckets: - debugfs api cleanups and fixes - driver core device link support for boot dependancy issues The debugfs api cleanups are working to slowly refactor the debugfs apis so that it is even harder to use incorrectly. That work has been happening for the past few kernel releases and will continue over time, it's a long-term project/goal The driver core device link support missed 5.4 by just a bit, so it's been sitting and baking for many months now. It's from Saravana Kannan to help resolve the problems that DT-based systems have at boot time with dependancy graphs and kernel modules. Turns out that no one has actually tried to build a generic arm64 kernel with loads of modules and have it "just work" for a variety of platforms (like a distro kernel). The big problem turned out to be a lack of dependency information between different areas of DT entries, and the work here resolves that problem and now allows devices to boot properly, and quicker than a monolith kernel. All of these patches have been in linux-next for a long time with no reported issues" * tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits) tracing: Remove unnecessary DEBUG_FS dependency of: property: Add device link support for interrupt-parent, dmas and -gpio(s) debugfs: Fix !DEBUG_FS debugfs_create_automount of: property: Add device link support for "iommu-map" of: property: Fix the semantics of of_is_ancestor_of() i2c: of: Populate fwnode in of_i2c_get_board_info() drivers: base: Fix Kconfig indentation firmware_loader: Fix labels with comma for builtin firmware driver core: Allow device link operations inside sync_state() driver core: platform: Declare ret variable only once cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h> crypto: hisilicon: no need to check return value of debugfs_create functions driver core: platform: use the correct callback type for bus_find_device firmware_class: make firmware caching configurable driver core: Clarify documentation for fwnode_operations.add_links() mailbox: tegra: Fix superfluous IRQ error message net: caif: Fix debugfs on 64-bit platforms mac80211: Use debugfs_create_xul() helper media: c8sectpfe: no need to check return value of debugfs_create functions of: property: Add device link support for iommus, mboxes and io-channels ...
2019-11-25Merge branch 'ib-guids' into rdma.git for-nextJason Gunthorpe1-20/+22
Danit Goldberg says: ==================== This series extends RTNETLINK to provide IB port and node GUIDs, which were configured for Infiniband VFs. The functionality to set VF GUIDs already existed for a long time, and here we are adding the missing "get" so that netlink will be symmetric and various cloud orchestration tools will be able to manage such VFs more naturally. The iproute2 was extended too to present those GUIDs. - ip link show <device> For example: - ip link set ib4 vf 0 node_guid 22:44:33:00:33:11:00:33 - ip link set ib4 vf 0 port_guid 10:21:33:12:00:11:22:10 - ip link show ib4 ib4: <BROADCAST,MULTICAST> mtu 4092 qdisc noop state DOWN mode DEFAULT group default qlen 256 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff vf 0 link/infiniband 00:00:0a:2d:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:44:36:8d brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff, spoof checking off, NODE_GUID 22:44:33:00:33:11:00:33, PORT_GUID 10:21:33:12:00:11:22:10, link-state disable, trust off, query_rss off ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'ib-guids': (35 commits) IB/mlx5: Implement callbacks for getting VFs GUID attributes IB/ipoib: Add ndo operation for getting VFs GUID attributes IB/core: Add interfaces to get VF node and port GUIDs net/core: Add support for getting VF GUIDs net/mlx5: Add new chain for netfilter flow table offload net/mlx5: Refactor creating fast path prio chains net/mlx5: Accumulate levels for chains prio namespaces net/mlx5: Define fdb tc levels per prio net/mlx5: Rename FDB_* tc related defines to FDB_TC_* defines net/mlx5: Simplify fdb chain and prio eswitch defines IB/mlx5: Load profile according to RoCE enablement state IB/mlx5: Rename profile and init methods net/mlx5: Handle "enable_roce" devlink param net/mlx5: Document flow_steering_mode devlink param devlink: Add new "enable_roce" generic device param net/mlx5: fix spelling mistake "metdata" -> "metadata" net/mlx5: fix kvfree of uninitialized pointer spec IB/mlx5: Introduce and use mlx5_core_is_vf() net/mlx5: E-switch, Enable metadata on own vport net/mlx5: Refactor ingress acl configuration ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-22IB/mlx5: Implement callbacks for getting VFs GUID attributesDanit Goldberg1-0/+1
Implement the IB defined callback mlx5_ib_get_vf_guid used to query FW for VFs attributes and return node and port GUIDs. Signed-off-by: Danit Goldberg <danitg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-11-19IB/mlx5: Support extended number of strides for Striding RQMark Zhang1-2/+8
Extends the minimum single WQE strides from 64 to 8, which is exposed by the "min_single_wqe_log_num_of_strides" field of striding_rq_caps. Choose right number of strides based on FW capability. Link: https://lore.kernel.org/r/20191115154555.247856-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-11IB/mlx5: Load profile according to RoCE enablement stateMichael Guralnik1-12/+13
When RoCE is disabled load mlx5_ib in raw_eth profile. Clean pf_profile roce capability checks as it will not be used without roce capability. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-11IB/mlx5: Rename profile and init methodsMichael Guralnik1-7/+7
Rename uplink_rep_profile and its unique init and cleanup stages to suit its upcoming use as the profile when RoCE is disabled. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-06RDMA: Connect between the mmap entry and the umap_priv structureMichal Kalderon1-3/+5
The rdma_user_mmap_io interface created a common interface for drivers to correctly map hw resources and zap them once the ucontext is destroyed enabling the drivers to safely free the hw resources. However, this meant the drivers need to delay freeing the resource to the ucontext destroy phase to ensure they were no longer mapped. The new mechanism for a common way of handling user/driver address mapping enabled notifying the driver if all umap_priv mappings were removed, and enabled freeing the hw resources when they are done with and not delay it until ucontext destroy. Since not all drivers use the mechanism, NULL can be sent to the rdma_user_mmap_io interface to continue working as before. Drivers that use the mmap_xa interface can pass the entry being mapped to the rdma_user_mmap_io function to be linked together. Link: https://lore.kernel.org/r/20191030094417.16866-4-michal.kalderon@marvell.com Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-11-04IB: mlx5: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-47/+15
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Leon Romanovsky <leon@kernel.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-11-01IB/mlx5: Introduce and use mlx5_core_is_vf()Parav Pandit1-1/+1
Instead of deciding a given device is virtual function or not based on a device is PF or not, use already defined MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf(). This enables to clearly identify PF, VF and non virtual functions. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-10-31IB/mlx5: Test write combining supportMichael Guralnik1-1/+14
Linux can run in all sorts of physical machines and VMs where write combining may or may not be supported. Currently there is no way to reliably tell if the system supports WC, or not. The driver uses WC to optimize posting work to the HCA, and getting this wrong in either direction can cause a significant performance loss. Add a test in mlx5_ib initialization process to test whether write-combining is supported on the machine. The test will run as part of the enable_driver callback to ensure that the test runs after the device is setup and can create and modify the QP needed, but runs before the device is exposed to the users. The test opens UD QP and posts NOP WQEs, the WQE written to the BlueFlame is different from the WQE in memory, requesting CQE only on the BlueFlame WQE. By checking whether we received a completion on one of these WQEs we can know if BlueFlame succeeded and this write-combining must be supported. Change reporting of BlueFlame support to be dependent on write-combining support instead of the FW's guess as to what the machine can do. Link: https://lore.kernel.org/r/20191027062234.10993-1-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28Merge branch 'odp_rework' into rdma.git for-nextJason Gunthorpe1-9/+8
Jason Gunthorpe says: ==================== In order to hoist the interval tree code out of the drivers and into the mmu_notifiers it is necessary for the drivers to not use the interval tree for other things. This series replaces the interval tree with an xarray and along the way re-aligns all the locking to use a sensible SRCU model where the 'update' step is done by modifying an xarray. The result is overall much simpler and with less locking in the critical path. Many functions were reworked for clarity and small details like using 'imr' to refer to the implicit MR make the entire code flow here more readable. This also squashes at least two race bugs on its own, and quite possibily more that haven't been identified. ==================== Merge conflicts with the odp statistics patch resolved. * branch 'odp_rework': RDMA/odp: Remove broken debugging call to invalidate_range RDMA/mlx5: Do not race with mlx5_ib_invalidate_range during create and destroy RDMA/mlx5: Do not store implicit children in the odp_mkeys xarray RDMA/mlx5: Rework implicit ODP destroy RDMA/mlx5: Avoid double lookups on the pagefault path RDMA/mlx5: Reduce locking in implicit_mr_get_data() RDMA/mlx5: Use an xarray for the children of an implicit ODP RDMA/mlx5: Split implicit handling from pagefault_mr RDMA/mlx5: Set the HW IOVA of the child MRs to their place in the tree RDMA/mlx5: Lift implicit_mr_alloc() into the two routines that call it RDMA/mlx5: Rework implicit_mr_get_data RDMA/mlx5: Delete struct mlx5_priv->mkey_table RDMA/mlx5: Use a dedicated mkey xarray for ODP RDMA/mlx5: Split sig_err MR data into its own xarray RDMA/mlx5: Use SRCU properly in ODP prefetch Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/mlx5: Rework implicit ODP destroyJason Gunthorpe1-2/+0
Use SRCU in a sensible way by removing all MRs in the implicit tree from the two xarrays (the update operation), then a synchronize, followed by a normal single threaded teardown. This is only a little unusual from the normal pattern as there can still be some work pending in the unbound wq that may also require a workqueue flush. This is tracked with a single atomic, consolidating the redundant existing atomics and wait queue. For understand-ability the entire ODP implicit create/destroy flow now largely exists in a single pair of functions within odp.c, with a few support functions for tearing down an unused child. Link: https://lore.kernel.org/r/20191009160934.3143-13-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/mlx5: Use a dedicated mkey xarray for ODPJason Gunthorpe1-9/+8
There is a per device xarray storing mkeys that is used to store every mkey in the system. However, this xarray is now only read by ODP for certain ODP designated MRs (ODP, implicit ODP, MW, DEVX_INDIRECT). Create an xarray only for use by ODP, that only contains ODP related MKeys. This xarray is protected by SRCU and all erases are protected by a synchronize. This improves performance: - All MRs in the odp_mkeys xarray are ODP MRs, so some tests for is_odp() can be deleted. The xarray will also consume fewer nodes. - normal MR's are never mixed with ODP MRs in a SRCU data structure so performance sucking synchronize_srcu() on every MR destruction is not needed. - No smp_load_acquire(live) and xa_load() double barrier on read Due to the SRCU locking scheme care must be taken with the placement of the xa_store(). Once it completes the MR is immediately visible to other threads and only through a xa_erase() & synchronize_srcu() cycle could it be destroyed. Link: https://lore.kernel.org/r/20191009160934.3143-4-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-28RDMA/mlx5: Split sig_err MR data into its own xarrayJason Gunthorpe1-0/+2
The locking model for signature is completely different than ODP, do not share the same xarray that relies on SRCU locking to support ODP. Simply store the active mlx5_core_sig_ctx's in an xarray when signature MRs are created and rely on trivial xarray locking to serialize everything. The overhead of storing only a handful of SIG related MRs is going to be much less than an xarray full of every mkey. Link: https://lore.kernel.org/r/20191009160934.3143-3-jgg@ziepe.ca Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22IB/mlx5: Remove dead codeRan Rozenstein1-15/+0
mlx5_ib_dc_atomic_is_supported function is not used anywhere. Remove the dead code. Fixes: a60109dc9a95 ("IB/mlx5: Add support for extended atomic operations") Link: https://lore.kernel.org/r/20191020064454.8551-1-leon@kernel.org Signed-off-by: Ran Rozenstein <ranro@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/nldev: Provide MR statisticsErez Alfasi1-0/+2
Add RDMA nldev netlink interface for dumping MR statistics information. Output example: $ ./ibv_rc_pingpong -o -P -s 500000000 local address: LID 0x0001, QPN 0x00008a, PSN 0xf81096, GID :: $ rdma stat show mr dev mlx5_0 mrn 2 page_faults 122071 page_invalidations 0 Link: https://lore.kernel.org/r/20191016062308.11886-5-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22RDMA/mlx5: Return ODP type per MRErez Alfasi1-0/+1
Provide an ODP explicit/implicit type as part of 'rdma -dd resource show mr' dump. For example: $ rdma -dd resource show mr dev mlx5_0 mrn 1 rkey 0xa99a lkey 0xa99a mrlen 50000000 pdn 9 pid 7372 comm ibv_rc_pingpong drv_odp explicit For non-ODP MRs, we won't print "drv_odp ..." at all. Link: https://lore.kernel.org/r/20191016062308.11886-4-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-22Merge branch 'mlx5-rd-sgl' into rdma.git for-nextJason Gunthorpe1-0/+2
From Yamin Friedman: ==================== This series from Yamin implements long standing "TODO" existed in rw.c. It allows the driver to specify a cut-over point where it is faster to build a lkey MR rather than do a large SGL for RDMA READ operations. mlx5 HW gets a notable performane boost by switching to MRs. ==================== Based on the mlx5-next branch from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for dependencies * branch 'mlx5-rd-sgl': (3 commits) RDMA/mlx5: Add capability for max sge to get optimized performance RDMA/rw: Support threshold for registration vs scattering to local pages net/mlx5: Expose optimal performance scatter entries capability
2019-10-22RDMA/mlx5: Add capability for max sge to get optimized performanceYamin Friedman1-0/+2
Allows the IB device to provide a value of maximum scatter gather entries per RDMA READ. In certain cases it may be preferable for a device to perform UMR memory registration rather than have many scatter entries in a single RDMA READ. This provides a significant performance increase in devices capable of using different memory registration schemes based on the number of scatter gather entries. This general capability allows each device vendor to fine tune when it is better to use memory registration. Link: https://lore.kernel.org/r/20191007135933.12483-4-leon@kernel.org Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-10-04IB/mlx5: Remove unnecessary else statementErez Alfasi1-2/+2
'else' is not generally useful after a break or return. Remove this unnecessary statement. Link: https://lore.kernel.org/r/20191002122517.17721-4-leon@kernel.org Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-41/+86
Pull RDMA subsystem updates from Jason Gunthorpe: "This cycle mainly saw lots of bug fixes and clean up code across the core code and several drivers, few new functional changes were made. - Many cleanup and bug fixes for hns - Various small bug fixes and cleanups in hfi1, mlx5, usnic, qed, bnxt_re, efa - Share the query_port code between all the iWarp drivers - General rework and cleanup of the ODP MR umem code to fit better with the mmu notifier get/put scheme - Support rdma netlink in non init_net name spaces - mlx5 support for XRC devx and DC ODP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA: Fix double-free in srq creation error flow RDMA/efa: Fix incorrect error print IB/mlx5: Free mpi in mp_slave mode IB/mlx5: Use the original address for the page during free_pages RDMA/bnxt_re: Fix spelling mistake "missin_resp" -> "missing_resp" RDMA/hns: Package operations of rq inline buffer into separate functions RDMA/hns: Optimize cmd init and mode selection for hip08 IB/hfi1: Define variables as unsigned long to fix KASAN warning IB/{rdmavt, hfi1, qib}: Add a counter for credit waits IB/hfi1: Add traces for TID RDMA READ RDMA/siw: Relax from kmap_atomic() use in TX path IB/iser: Support up to 16MB data transfer in a single command RDMA/siw: Fix page address mapping in TX path RDMA: Fix goto target to release the allocated memory RDMA/usnic: Avoid overly large buffers on stack RDMA/odp: Add missing cast for 32 bit RDMA/hns: Use devm_platform_ioremap_resource() to simplify code Documentation/infiniband: update name of some functions RDMA/cma: Fix false error message RDMA/hns: Fix wrong assignment of qp_access_flags ...
2019-09-21Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-9/+0
Pull hmm updates from Jason Gunthorpe: "This is more cleanup and consolidation of the hmm APIs and the very strongly related mmu_notifier interfaces. Many places across the tree using these interfaces are touched in the process. Beyond that a cleanup to the page walker API and a few memremap related changes round out the series: - General improvement of hmm_range_fault() and related APIs, more documentation, bug fixes from testing, API simplification & consolidation, and unused API removal - Simplify the hmm related kconfigs to HMM_MIRROR and DEVICE_PRIVATE, and make them internal kconfig selects - Hoist a lot of code related to mmu notifier attachment out of drivers by using a refcount get/put attachment idiom and remove the convoluted mmu_notifier_unregister_no_release() and related APIs. - General API improvement for the migrate_vma API and revision of its only user in nouveau - Annotate mmu_notifiers with lockdep and sleeping region debugging Two series unrelated to HMM or mmu_notifiers came along due to dependencies: - Allow pagemap's memremap_pages family of APIs to work without providing a struct device - Make walk_page_range() and related use a constant structure for function pointers" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (75 commits) libnvdimm: Enable unit test infrastructure compile checks mm, notifier: Catch sleeping/blocking for !blockable kernel.h: Add non_block_start/end() drm/radeon: guard against calling an unpaired radeon_mn_unregister() csky: add missing brackets in a macro for tlb.h pagewalk: use lockdep_assert_held for locking validation pagewalk: separate function pointers from iterator data mm: split out a new pagewalk.h header from mm.h mm/mmu_notifiers: annotate with might_sleep() mm/mmu_notifiers: prime lockdep mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end mm/mmu_notifiers: remove the __mmu_notifier_invalidate_range_start/end exports mm/hmm: hmm_range_fault() infinite loop mm/hmm: hmm_range_fault() NULL pointer bug mm/hmm: fix hmm_range_fault()'s handling of swapped out pages mm/mmu_notifiers: remove unregister_no_release RDMA/odp: remove ib_ucontext from ib_umem RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm' RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address ...
2019-09-16IB/mlx5: Free mpi in mp_slave modeDanit Goldberg1-0/+1
ib_add_slave_port() allocates a multiport struct but never frees it. Don't leak memory, free the allocated mpi struct during driver unload. Cc: <stable@vger.kernel.org> Fixes: 32f69e4be269 ("{net, IB}/mlx5: Manage port association for multiport RoCE") Link: https://lore.kernel.org/r/20190916064818.19823-3-leon@kernel.org Signed-off-by: Danit Goldberg <danitg@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13Merge tag 'v5.3-rc8' into rdma.git for-nextJason Gunthorpe1-3/+3
To resolve dependencies in following patches mlx5_ib.h conflict resolved by keeing both hunks Linux 5.3-rc8 Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-03net/mlx5: Add flow steering actions to fs_cmd shim layerMaor Gottlieb1-3/+4
Add flow steering actions: modify header and packet reformat to the fs_cmd shim layer. This allows each namespace to define possibly different functionality for alloc/dealloc action commands. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-02Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linuxSaeed Mahameed1-72/+30
Merge mlx5-next patches needed for upcoming mlx5 software steering. 1) Alex adds HW bits and definitions required for SW steering 2) Ariel moves device memory management to mlx5_core (From mlx5_ib) 3) Maor, Cleanups and fixups for eswitch mode and RoCE 4) Mark, Set only stag for match untagged packets Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-09-01net/mlx5: Move device memory management to mlx5_coreAriel Levkovich1-72/+30
Move the device memory allocation and deallocation commands SW ICM memory to mlx5_core to expose this API for all mlx5_core users. This comes as preparation for supporting SW steering in kernel where it will be required to allocate and register device memory for direct rule insertion. In addition, an API to register this device memory for future remote access operations is introduced using the create_mkey commands. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-08-26RDMA/mlx5: RDMA_RX flow type support for user applicationsMark Zhang1-0/+7
Currently user applications can only steer TCP/IP(NIC RX/RX) traffic. This patch adds RDMA_RX as a new flow type to allow the user to insert steering rules to control RDMA traffic. Two destinations are supported(but not set at the same time): devx flow table object and QP. Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190819113626.20284-4-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-21RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'Jason Gunthorpe1-5/+0
This is a significant simplification, no extra list is kept per FD, and the interval tree is now shared between all the ucontexts, reducing overhead if there are multiple ucontexts active. Link: https://lore.kernel.org/r/20190806231548.25242-7-jgg@ziepe.ca Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21Merge branch 'odp_fixes' into rdma.git for-nextJason Gunthorpe1-8/+3
Jason Gunthorpe says: ==================== This is a collection of general cleanups for ODP to clarify some of the flows around umem creation and use of the interval tree. ==================== The branch is based on v5.3-rc5 due to dependencies * odp_fixes: RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr RDMA/mlx5: Use ib_umem_start instead of umem.address RDMA/core: Make invalidate_range a device operation RDMA/odp: Use kvcalloc for the dma_list and page_list RDMA/odp: Check for overflow when computing the umem_odp end RDMA/odp: Provide ib_umem_odp_release() to undo the allocs RDMA/odp: Split creating a umem_odp from ib_umem_get RDMA/odp: Make the three ways to create a umem_odp clear RMDA/odp: Consolidate umem_odp initialization RDMA/odp: Make it clearer when a umem is an implicit ODP umem RDMA/odp: Iterate over the whole rbtree directly RDMA/odp: Use the common interval tree library instead of generic RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-21RDMA/core: Make invalidate_range a device operationMoni Shoua1-4/+0
The callback function 'invalidate_range' is implemented in a driver so the place for it is in the ib_device_ops structure and not in ib_ucontext. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Link: https://lore.kernel.org/r/20190819111710.18440-11-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-08-20IB/mlx5: Report and handle ODP support properlyMoni Shoua1-3/+3
ODP depends on the several device capabilities, among them is the ability to send UMR WQEs with that modify atomic and entity size of the MR. Therefore, only if all conditions to send such a UMR WQE are met then driver can report that ODP is supported. Use this check of conditions in all places where driver needs to know about ODP support. Also, implicit ODP support depends on ability of driver to send UMR WQEs for an indirect mkey. Therefore, verify that all conditions to do so are met when reporting support. Fixes: c8d75a980fab ("IB/mlx5: Respect new UMR capabilities") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190815083834.9245-7-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2019-08-13RDMA/mlx5: Annotate lock dependency in bind/unbind slave portLeon Romanovsky1-2/+4
NULL-ing notifier_call is performed under protection of mlx5_ib_multiport_mutex lock. Such protection is not easily spotted and better to be guarded by lockdep annotation. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Link: https://lore.kernel.org/r/20190813102814.22350-1-leon@kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>