aboutsummaryrefslogtreecommitdiffstats
path: root/include (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-07-05RDMA/nldev: Allow get default counter statistics through RDMA netlinkMark Zhang1-0/+1
This patch adds the ability to return the hwstats of per-port default counters (which can also be queried through sysfs nodes). Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05RDMA/nldev: Allow counter manual mode configration through RDMA netlinkMark Zhang2-0/+5
Provide an option to allow users to manually bind a qp with a counter through RDMA netlink. Limit it to users with ADMIN capability only. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05RDMA/counter: Allow manual mode configuration supportMark Zhang2-0/+13
In manual mode a QP is bound to a counter manually. If counter is not specified then a new one will be allocated. Manual mode is enabled when user binds a QP, and disabled when the last manually bound QP is unbound. When auto-mode is turned off and there are counters left, manual mode is enabled so that the user is able to access these counters. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05RDMA/core: Get sum value of all counters when perform a sysfs stat readMark Zhang1-0/+2
Since a QP can only be bound to one counter, then if it is bound to a separate counter, for backward compatibility purpose, the statistic value must be: * stat of default counter + stat of all running allocated counters + stat of all deallocated counters (history stats) Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05RDMA/netlink: Implement counter dumpit calbackMark Zhang3-7/+28
This patch adds the ability to return all available counters together with their properties and hwstats. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05RDMA/nldev: Allow counter auto mode configration through RDMA netlinkMark Zhang1-0/+8
Provide an option to enable/disable per-port counter auto mode through RDMA netlink. Limit it to users with ADMIN capability only. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05IB/mlx5: Support set qp counterMark Zhang1-0/+1
Support bind a qp with counter. If counter is null then bind the qp to the default counter. Different QP state has different operation: - RESET: Set the counter field so that it will take effective during RST2INIT change; - RTS: Issue an RTS2RTS change to update the QP counter; - Other: Set the counter field and mark the counter_pending flag, when QP is moved to RTS state and this flag is set, then issue an RTS2RTS modification to update the counter. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05RDMA/counter: Add "auto" configuration mode supportMark Zhang2-0/+26
In auto mode all QPs belong to one category are bind automatically to a single counter set. Currently only "qp type" is supported. In this mode the qp counter is set in RST2INIT modification, and when a qp is destroyed the counter is unbound. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05RDMA/counter: Add set/clear per-port auto mode supportMark Zhang3-0/+52
Add an API to support set/clear per-port auto mode. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05RDMA/restrack: Introduce statistic counterMark Zhang2-0/+22
Introduce statistic counter as a new resource. It allows a user to monitor specific objects (e.g., QPs) by binding to a counter. In some cases a user counter resource is created with task other then "current", because its creation is done as part of rdmatool call. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-05Merge mlx5-next into rdma for-nextJason Gunthorpe6-13/+131
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in the next patches. * mlx5-next: net/mlx5: Add rts2rts_qp_counters_set_id field in hca cap net/mlx5: Properly name the generic WQE control field net/mlx5: Introduce TLS TX offload hardware bits and structures net/mlx5: Refactor mlx5_esw_query_functions for modularity net/mlx5: E-Switch prepare functions change handler to be modular net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()
2019-07-04net/mlx5: Add rts2rts_qp_counters_set_id field in hca capMark Zhang1-1/+3
Add rts2rts_qp_counters_set_id field in hca cap so that RTS2RTS qp modification can be used to change the counter of a QP. Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Majd Dibbiny <majd@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03IB/mlx5: Implement DEVX dispatching eventYishai Hadas1-0/+5
Implement DEVX dispatching event by looking up for the applicable subscriptions for the reported event and using their target fd to signal/set the event. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03IB/mlx5: Enable subscription for device events over DEVXYishai Hadas1-0/+9
Enable subscription for device events over DEVX. Each subscription is added to the two level xarray data structure according to its event number and the DEVX object information in case was given with the given target fd. Those events will be reported over the given fd once will occur. Downstream patches will mange the dispatching to any subscription. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FDYishai Hadas2-0/+14
Introduce MLX5_IB_OBJECT_DEVX_ASYNC_EVENT_FD and its initial implementation. This object is from type class FD and will be used to read DEVX async events. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03net/mlx5: Properly name the generic WQE control fieldTariq Toukan1-1/+6
A generic WQE control field is used for different purposes in different cases. Use union to allow using the proper name in each case. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03net/mlx5: Introduce TLS TX offload hardware bits and structuresEran Ben Elisha2-4/+114
Add TLS offload related IFC structs, layouts and enumerations. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03net/mlx5: Introduce and use mlx5_eswitch_get_total_vports()Parav Pandit3-7/+8
Instead MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports(). mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF vports as well. Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to more generic vport.h header file. Such exposure is not desired. Hence a mlx5_eswitch_get_total_vports() is introduced. Given that mlx5_eswitch_get_total_vports() API wants to work on const mlx5_core_dev*, change its helper functions also to accept const *dev. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-03Merge mlx5-next into rdma for-nextJason Gunthorpe9-49/+294
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in the next patches. Resolved the conflicts: - esw_destroy_offloads_acl_tables() use the newer mlx5_esw_for_all_vports() version - esw_offloads_steering_init() drop the cap test - esw_offloads_init() drop the extra function arguments * branch 'mlx5-next': (39 commits) net/mlx5: Expose device definitions for object events net/mlx5: Report EQE data upon CQ completion net/mlx5: Report a CQ error event only when a handler was set net/mlx5: mlx5_core_create_cq() enhancements net/mlx5: Expose the API to register for ANY event net/mlx5: Use event mask based on device capabilities net/mlx5: Fix mlx5_core_destroy_cq() error flow net/mlx5: E-Switch, Handle UC address change in switchdev mode net/mlx5: E-Switch, Consider host PF for inline mode and vlan pop net/mlx5: E-Switch, Use iterator for vlan and min-inline setups net/mlx5: E-Switch, Reg/unreg function changed event at correct stage net/mlx5: E-Switch, Consolidate eswitch function number of VFs net/mlx5: E-Switch, Refactor eswitch SR-IOV interface net/mlx5: Handle host PF vport mac/guid for ECPF net/mlx5: E-Switch, Use correct flags when configuring vlan net/mlx5: Reduce dependency on enabled_vfs counter and num_vfs net/mlx5: Don't handle VF func change if host PF is disabled net/mlx5: Limit scope of mlx5_get_next_phys_dev() to PCI PF devices net/mlx5: Move pci status reg access mutex to mlx5_pci_init net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_type ... Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-03net/mlx5: Expose device definitions for object eventsYishai Hadas1-0/+21
Expose an extra device definitions for objects events. It includes: object_type values for legacy objects and generic data header for any other object. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: Report EQE data upon CQ completionYishai Hadas1-2/+2
Report EQE data upon CQ completion to let upper layers use this data. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: mlx5_core_create_cq() enhancementsYishai Hadas1-1/+1
Enhance mlx5_core_create_cq() to get the command out buffer from the callers to let them use the output. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: Expose the API to register for ANY eventYishai Hadas1-0/+2
Expose the API to register for ANY event, mlx5_ib will be able to use this functionality for its needs. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-03net/mlx5: Use event mask based on device capabilitiesYishai Hadas3-5/+16
Use the reported device capabilities for the supported user events (i.e. affiliated and un-affiliated) to set the EQ mask. As the event mask can be up to 256 defined by 4 entries of u64 change the applicable code to work accordingly. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2019-07-02rdma/siw: application interfaceBernard Metzler2-0/+186
Broken up commit to add the Soft iWarp RDMA driver. Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-01net/mlx5: E-Switch, Consider host PF for inline mode and vlan popBodong Wang1-0/+1
When ECPF is the eswitch manager, host PF is treated like other VFs. Driver should do the same for inline mode and vlan pop. Add new iterators to include host PF if ECPF is the eswitch manager. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: E-Switch, Refactor eswitch SR-IOV interfaceBodong Wang1-3/+3
Devlink eswitch mode is not necessarily related to SR-IOV, e.g, ECPF can be at offload mode when SR-IOV is not enabled. Rename the interface and eswitch mode names to decouple from SR-IOV, and cleanup eswitch messages accordingly. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Handle host PF vport mac/guid for ECPFBodong Wang1-1/+2
When ECPF is eswitch manager, it has the privilege to query and configure the mac and node guid of host PF. While vport number of host PF is 0, the vport command should be issued with other_vport set in this case as the cmd is issued by ECPF vport(0xfffe). Add a specific function to query own vport mac. Low level functions are used by vport manager to query/modify any vport mac and node guid. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Reduce dependency on enabled_vfs counter and num_vfsParav Pandit1-1/+0
While enabling SR-IOV, PCI core already checks that if SR-IOV is already enabled, it returns failure error code. Hence, remove such duplicate check from mlx5_core driver. While at it, make mlx5_device_disable_sriov() to perform cleanup of VFs in reverse order of mlx5_device_enable_sriov(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Don't handle VF func change if host PF is disabledBodong Wang1-1/+2
When ECPF eswitch manager is at offloads mode, it monitors functions changed event from host PF side and acts according to the number of VFs enabled/disabled. As ECPF and host PF work in two independent hosts, it's possible that host PF OS reboots but ECPF system is still kept on and continues monitoring events from host PF. When kernel from host PF side is booting, PCI iov driver does sriov_init and compute_max_vf_buses by iterating over all valid num of VFs. This triggers FLR and generates functions changed events, even though host PF HCA is not enabled at this time. However, ECPF is not aware of this information, and still handles these events as usual. ECPF system will see massive number of reps are created, but destroyed immediately once creation finished. To eliminate this noise, a bit is added to host parameter context to indicate host PF is disabled. ECPF will not handle the VF changed event if this bit is set. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Rename mlx5_pci_dev_type to mlx5_coredev_typeHuy Nguyen1-3/+8
Rename mlx5_pci_dev_type to mlx5_coredev_type to distinguish different mlx5 device types. mlx5_coredev_type represents mlx5_core_dev instance type. Hence keep mlx5_coredev_type in mlx5_core_dev structure. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01{IB, net}/mlx5: E-Switch, Use index of rep for vport to IB port mappingBodong Wang1-0/+2
In the single IB device mode, the mapping between vport number and rep relies on a counter. However for dynamic vport allocation, it is desired to keep consistent map of eswitch vport and IB port. Hence, simplify code to remove the free running counter and instead use the available vport index during load/unload sequence from the eswitch. Signed-off-by: Bodong Wang <bodong@mellanox.com> Suggested-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Added MCQI and MCQS registers' description to ifcShay Agroskin2-2/+58
Given a fw component index, the MCQI register allows us to query this component's information (e.g. its version and capabilities). Given a fw component index, the MCQS register allows us to query the status of a fw component, including its type and state (e.g. PRESET/IN_USE). It can be used to find the index of a component of a specific type, by sequentially increasing the component index, and querying each time the type of the returned component. If max component index is reached, 'last_index_flag' is set by the HCA. These registers' description was added to query the running and pending fw version of the HCA. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-07-01net/mlx5: Add hardware definitions for sub functionsParav Pandit1-3/+96
Update mlx5 device interface data structures for: 1. New command definitions for allocating, deallocating SF 2. Query SF partition 3. Eswitch SF fields 4. HCA CAP SF fields 5. Extend Eswitch functions command for SF Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-28IB/{rdmavt, hfi1, qib}: Add helpers to hide SWQE WR detailsMichael J. Ruhl1-0/+50
Add some helper functions to hide struct rvt_swqe details. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{rdmavt, hfi1, qib}: Remove AH refcount for UD QPsMichael J. Ruhl2-5/+20
Historically rdmavt destroy_ah() has returned an -EBUSY when the AH has a non-zero reference count. IBTA 11.2.2 notes no such return value or error case: Output Modifiers: - Verb results: - Operation completed successfully. - Invalid HCA handle. - Invalid address handle. ULPs never test for this error and this will leak memory. The reference count exists to allow for driver independent progress mechanisms to process UD SWQEs in parallel with post sends. The SWQE will hold a reference count until the UD SWQE completes and then drops the reference. Fix by removing need to reference count the AH. Add a UD specific allocation to each SWQE entry to cache the necessary information for independent progress. Copy the information during the post send processing. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/{hfi1, qib, rdmavt}: Put qp in error state when cq is fullKamenee Arumugam2-5/+45
When a completion queue is full, the associated queue pairs are not put into the error state. According to the IBTA specification, this is a violation. Quote from IBTA spec: C9-218: A Requester Class F error occurs when the CQ is inaccessible or full and an attempt is made to complete a WQE. The Affected QP shall be moved to the error state and affiliated asynchronous errors generated as described in 11.6.3.1 Affiliated Asynchronous Events on page 678. The current WQE and any subsequent WQEs are left in an unknown state. C11-37: The CI shall generate a CQ Error when a CQ overrun is detected. This condition will result in an Affiliated Asynchronous Error for any associated Work Queues when they attempt to use that CQ. Completions can no longer be added to the CQ. It is not guaranteed that completions present in the CQ at the time the error occurred can be retrieved. Possible causes include a CQ overrun or a CQ protection error. Put the qp in error state when cq is full. Implement a state called full to continue to put other associated QPs in error state. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/rdmavt: Fracture single lock used for posting and processing RWQEsKamenee Arumugam1-0/+7
Usage of single lock prevents fetching posted and processing receive work queue entries from progressing simultaneously and impacts overall performance. Fracture the single lock used for posting and processing Receive Work Queue Entries (RWQEs) to allow the circular buffer to be filled and emptied at the same time. Two new spinlocks - one for the producers and one for the consumers used for posting and processing RWQEs simultaneously and the two indices are define on two different cache lines. The threshold count is used to avoid reading other index in different cache line every time. Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Move receive work queue struct into uapi directoryKamenee Arumugam2-21/+60
The rvt_rwqe and rvt_rwq struct elements are shared between rdmavt and the providers but are not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2, The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. Move rvt_rwqe and rvt_rwq struct into rvt-abi.h header in uapi directory. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28IB/hfi1: Move rvt_cq_wc struct into uapi directoryKamenee Arumugam3-6/+80
The rvt_cq_wc struct elements are shared between rdmavt and the providers but not in uapi directory. As per the comment in https://marc.info/?l=linux-rdma&m=152296522708522&w=2 The hfi1 driver and the rdma core driver are not using shared structures in the uapi directory. In that case, move rvt_cq_wc struct into the rvt-abi.h header file and create a rvt_k_cq_w for the kernel completion queue. Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-28Merge tag 'v5.2-rc6' into rdma.git for-nextJason Gunthorpe1563-11568/+1967
For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-26net/mlx5e: Specifying known origin of packets matching the flowJianbo Liu1-0/+1
In vport metadata matching, source port number is replaced by metadata. While FW has no idea about what it is in the metadata, a syndrome will happen. Specify a known origin to avoid the syndrome. However, there is no functional change because ANY_VPORT (0) is filled in flow_source, the same default value as before, as a pre-step towards metadata matching for fast path. There are two other values can be filled in flow_source. When setting 0x1, packet matching this rule is from uplink, while 0x2 is for packet from other local vports. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLsJianbo Liu1-0/+17
When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport as it is not represented by single VHCA only in this case. So we change to match on metadata instead of source vport. To do that, a rule is created in all vports and uplink ingress ACLs, to save the source vport number and vhca id in the packet's metadata in order to match on it later. The metadata register used is the first of the 32-bit type C registers. It can be used for matching and header modify operations. The higher 16 bits of this register are for vhca id, and the lower 16 ones is for vport number. This change is not for dual-port RoCE only. If HW and FW allow, the vport metadata matching is enabled by default. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Add flow context for flow tagJianbo Liu1-4/+11
Refactor the flow data structures, add new flow_context and move flow_tag into it, as flow_tag doesn't belong to the rule action. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-26net/mlx5: Introduce vport metadata matching bits and enum constantsJianbo Liu1-7/+49
When a dual-port VHCA sends a RoCE packet on its non-native port, and the packet arrives to its affiliated vport FDB, a mismatch might occur on the rules that match the packet source vport. So we replace the match on source port with the match on metadata that was configured in ingress ACL, and that metadata will be passed further also to the NIC RX table of the eswitch manager. Introduce vport metadata matching bits and enum constants as a pre-step towards metadata matching. o metadata type C registers in the misc parameters 2 fields. o esw_uplink_ingress_acl bit in esw cap. If it set, the device supports ingress ACL for the uplink vport. o fdb_to_vport_reg_* bits in flow table cap and esw vport context, to support propagating the metadata to the nic rx through the loopback path. o flow_source in flow context, to indicate the known origin of packets. o enum constants, to support the above bits. Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-25RDMA/netlink: Audit policy settings for netlink attributesDoug Ledford2-4/+6
For all string attributes for which we don't currently accept the element as input, we only use it as output, set the string length to RDMA_NLDEV_ATTR_EMPTY_STRING which is defined as 1. That way we will only accept a null string for that element. This will prevent someone from writing a new input routine that uses the element without also updating the policy to have a valid value. Also while there, make sure the existing entries that are valid have the correct policy, if not, correct the policy. Remove unnecessary checks for nla_strlcpy() overflow once the policy has been set correctly. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24net/mlx5: Convert mkey_table to XArrayMatthew Wilcox2-16/+2
The lock protecting the data structure does not need to be an rwlock. The only read access to the lock is in an error path, and if that's limiting your scalability, you have bigger performance problems. Eliminate mlx5_mkey_table in favour of using the xarray directly. reg_mr_callback must use GFP_ATOMIC for allocating XArray nodes as it may be called in interrupt context. This also fixes a minor bug where SRCU locking was being used on the radix tree read side, when RCU was needed too. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-06-24RDMA/mlx5: Remove unused IB_WR_REG_SIG_MR codeIsrael Rukshin1-19/+0
IB_WR_REG_SIG_MR is not needed after IB_WR_REG_MR_INTEGRITY was used. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/rw: Use IB_WR_REG_MR_INTEGRITY for PI handoverIsrael Rukshin1-9/+0
Replace the old signature handover API with the new one. The new API simplifes PI handover code complexity for ULPs and improve performance. For RW API it will reduce the maximum number of work requests per task and the need of dealing with multiple MRs (and their registrations and invalidations) per task. All the mappings and registration of the data and the protection buffers is done by the LLD using a single WR and a special MR type (IB_MR_TYPE_INTEGRITY) for the PI handover operation. The setup of the tested benchmark (using iSER ULP): - 2 servers with 24 cores (1 initiator and 1 target) - ConnectX-4/ConnectX-5 adapters - 24 target sessions with 1 LUN each - ramdisk backstore - PI active Performance results running fio (24 jobs, 128 iodepth) using write_generate=1 and read_verify=1 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1243.3K/1182.3K 1725.1K/1680.2K 4k 571233/528835 743293/748259 32k 72388/71086 71789/93573 Using write_generate=0 and read_verify=0 (w/w.o patch): bs IOPS(read) IOPS(write) ---- ---------- ---------- 512 1572.1K/1427.2K 1823.5K/1724.3K 4k 921992/916194 753772/768267 32k 75052/73960 73180/95484 There is a performance degradation when writing big block sizes. Degradation is caused by the complexity of combining multiple indirections and perform RDMA READ operation from it. This will be fixed in the following patches by reducing the indirections if possible. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-06-24RDMA/core: Validate integrity handover device capMax Gurtovoy1-0/+1
Protect the case that a ULP tries to allocate a QP with signature enabled flag while the LLD doesn't support this feature. While we're here, also move integrity_en attribute from mlx5_qp to ib_qp as a preparation for adding new integrity API to the rw-API (that is part of ib_core module). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>