aboutsummaryrefslogtreecommitdiffstats
path: root/include/rdma/ib_verbs.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-06-25IB: Make ib_init_ah_attr_from_wc set sgid_attrParav Pandit1-0/+7
The work completion is inspected to determine what dgid table entry was used to receieve the packet, produces a sgid_attr that matches and sticks it in the ah_attr. All callers of this function are now required to release the ah_attr on success. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-19IB/core: Expose ib_ucontext from a given ib_uverbs_fileYishai Hadas1-0/+2
Drivers that use the IOCTL API may have the ib_uverbs_file and need a way to get the related ib_ucontext from it, this is enabled by this patch. Downstream patches from this series will use it. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18IB/core: add max_send_sge and max_recv_sge attributesSteve Wise1-1/+2
This patch replaces the ib_device_attr.max_sge with max_send_sge and max_recv_sge. It allows ulps to take advantage of devices that have very different send and recv sge depths. For example cxgb4 has a max_recv_sge of 4, yet a max_send_sge of 16. Splitting out these attributes allows much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW API. Consider a large RDMA WRITE that has 16 scattergather entries. With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of 16, it can be done with 1 WRITE WR. Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18RDMA: Hold the sgid_attr inside the struct ib_ah/qpJason Gunthorpe1-0/+4
If the AH has a GRH then hold a reference to the sgid_attr inside the common struct. If the QP is modified with an AV that includes a GRH then also hold a reference to the sgid_attr inside the common struct. This informs the cache that the sgid_index is in-use so long as the AH or QP using it exists. This also means that all drivers can access the sgid_attr directly from the ah_attr instead of querying the cache during their UD post-send paths. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18RDMA: Convert drivers to use sgid_attr instead of sgid_indexParav Pandit1-4/+4
The core code now ensures that all driver callbacks that receive an rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present. Drivers can use this pointer instead of calling a query function with sgid_index. This simplifies the drivers and also avoids races where a gid_index lookup may return different data if it is changed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18IB{cm, core}: Introduce and use ah_attr copy, move, replace APIsJason Gunthorpe1-0/+5
Introduce AH attribute copy, move and replace APIs to be used by core and provider drivers. In CM code flow when ah attribute might be re-initialized twice while processing incoming request, or initialized once while from path record while sending out CM requests. Therefore use rdma_move_ah_attr API to handle such scenarios instead of memcpy(). Provider drivers keeps a copy ah_attr during the lifetime of the ah. Therefore, use rdma_replace_ah_attr() which conditionally release reference to old ah_attr and holds reference to new attribute whose referrence is released when the AH is freed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18IB/core: Add a sgid_attr pointer to struct rdma_ah_attrJason Gunthorpe1-0/+7
The sgid_attr will ultimately replace the sgid_index in the ah_attr. This will allow for all layers to have a consistent view of what gid table entry was selected as processing runs through all stages of the stack. This commit introduces the pointer and ensures it is set before calling any driver callback that includes a struct ah_attr callback, allowing future patches to adjust both the drivers and the callers to use sgid_attr instead of sgid_index. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gidParav Pandit1-4/+0
If the gid_attr argument is NULL then the functions behave identically to rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything can be consolidated to one function. Now that all callers either use rdma_query_gid() or ib_get_cached_gid(), ib_query_gid() API is removed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18RDMA: Use GID from the ib_gid_attr during the add_gid() callbackParav Pandit1-2/+1
Now that ib_gid_attr contains the GID, make use of that in the add_gid() callback functions for the provider drivers to simplify the add_gid() implementations. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18IB/core: Introduce GID entry reference countsParav Pandit1-0/+1
In order to be able to expose pointers to the ib_gid_attrs in the GID table we need to make it so the value of the pointer cannot be changed. Thus each GID table entry gets a unique piece of kref'd memory that is written only during initialization and remains constant for its lifetime. This eventually will allow the struct ib_gid_attrs to be returned without copy from many of query the APIs, but it also provides a way to track when all users of a HW table index go away. For roce we no longer allow an in-use HW table index to be re-used for a new an different entry. When a GID table entry needs to be removed it is hidden from the find API, but remains as a valid HW index and all ib_gid_attr points remain valid. The HW index is not relased until all users put the kref. Later patches will broadly replace the use of the sgid_index integer with the kref'd structure. Ultimately this will prevent security problems where the OS changes the properties of a HW GID table entry while an active user object is still using the entry. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-12Convert infiniband uverbs to struct_sizeMatthew Wilcox1-4/+1
The flows were hidden from the C compiler; expose them as a zero-length array to allow struct_size to work. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-04RDMA/core: introduce check masks for T10-PI offloadMax Gurtovoy1-0/+13
T10-PI offload capability is currently supported in iSER protocol only, and the definition of the HCA protection information checks are missing from the core layer. Add those definition to avoid code duplication in other drivers (such iSER target and NVMeoF). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04Merge tag 'verbs_flow_counters' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git into for-nextJason Gunthorpe1-2/+41
Pull verbs counters series from Leon Romanovsky: ==================== Verbs flow counters support This series comes to allow user space applications to monitor real time traffic activity and events of the verbs objects it manages, e.g.: ibv_qp, ibv_wq, ibv_flow. The API enables generic counters creation and define mapping to association with a verbs object, the current mlx5 driver is using this API for flow counters. With this API, an application can monitor the entire life cycle of object activity, defined here as a static counters attachment. This API also allows dynamic counters monitoring of measurement points for a partial period in the verbs object life cycle. In addition it presents the implementation of the generic counters interface. This will be achieved by extending flow creation by adding a new flow count specification type which allows the user to associate a previously created flow counters using the generic verbs counters interface to the created flow, once associated the user could read statistics by using the read function of the generic counters interface. The API includes: 1. create and destroyed API of a new counters objects 2. read the counters values from HW Note: Attaching API to allow application to define the measurement points per objects is a user space only API and this data is passed to kernel when the counted object (e.g. flow) is created with the counters object. =================== * tag 'verbs_flow_counters': IB/mlx5: Add counters read support IB/mlx5: Add flow counters read support IB/mlx5: Add flow counters binding support IB/mlx5: Add counters create and destroy support IB/uverbs: Add support for flow counters IB/core: Add support for flow counters IB/core: Support passing uhw for create_flow IB/uverbs: Add read counters support IB/core: Introduce counters read verb IB/uverbs: Add create/destroy counters support IB/core: Introduce counters object and its create/destroy IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure net/mlx5: Export flow counter related API net/mlx5: Use flow counter pointer as input to the query function
2018-06-02IB/core: Add support for flow countersRaed Salem1-1/+14
A counters object could be attached to flow on creation by providing the counter specification action. General counters description which count packets and bytes are introduced, downstream patches from this series will use them as part of flow counters binding. In addition, increase number of flow specifications supported layers to 10 upon adding count specification and for the previously added drop specification. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02IB/core: Support passing uhw for create_flowMatan Barak1-1/+2
This is required when user-space drivers need to pass extra information regarding how to handle this flow steering specification. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02IB/core: Introduce counters read verbRaed Salem1-0/+14
The user supplies counters instance and a reference to an output array of uint64_t. The driver reads the hardware counters values and writes them to the output index location in the user supplied array. All counters values are represented as uint64_t types. To be able to successfully read the data the counters must be first bound to an IB object. Downstream patches will present binding method for flow counters. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02IB/core: Introduce counters object and its create/destroyRaed Salem1-0/+11
A verbs application may need to get statistics and info on various aspects of a verb object (e.g. Flow, QP, ...), in general case the application will state which object's counters its interested in (we refer to this action as attach), bind this new counters object to the appropriate verb object and on later stage read their values using the counters object. This series introduces a general API for counters object that may accumulate any ib object counters type, bound and read on demand. Counters instance is allocated on an IB context and belongs to that context. Upon successful creation the counters can be bound to a verbs object so that hardware counter instances can be created and read. Downstream patches in this series will introduce the attach, bind and the read functionality. Counters instance can be de-allocated, upon successful destruction the related hardware resources are released. Prior to destroy call the user must first make sure that the counters is not being used by any IB object, e.g. not attached to any of its counted type otherwise an EBUSY error is invoked. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Raed Salem <raeds@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-28Merge branch 'mr_fix' into git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma for-nextJason Gunthorpe1-0/+14
Update mlx4 to support user MR creation against read-only memory, previously it required the memory to be writable. Based on rdma for-rc due to dependencies. * mr_fix: (2 commits) IB/mlx4: Mark user MR as writable if actual virtual memory is writable IB/core: Make testing MR flags for writability a static inline function
2018-05-28IB/core: Make testing MR flags for writability a static inline functionJack Morgenstein1-0/+14
Make the MR writability flags check, which is performed in umem.c, a static inline function in file ib_verbs.h This allows the function to be used by low-level infiniband drivers. Cc: <stable@vger.kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-05-16IB/uverbs: Introduce a MPLS steering match filterAriel Levkovich1-0/+15
Add a new MPLS steering match filter that can match against a single MPLS tag field. Since the MPLS header can reside in different locations in the packet's protocol stack as well as be encapsulated with a tunnel protocol, it is required to know the exact location of the header in the protocol stack. Therefore, when including the MPLS protocol spec in the specs list, it is mandatory to provide the list in an ordered manner, so that it represents the actual header order in a matching packet. Drivers that process the spec list and apply the matching rule should treat the position of the MPLS spec in the spec list as the actual location of the MPLS label in the packet's protocol stack. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16IB/uverbs: Introduce a GRE steering match filterAriel Levkovich1-0/+17
Adding a new GRE steering match filter that can match against key and protocol fields. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-06Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-41/+171
Pull rdma updates from Jason Gunthorpe: "Doug and I are at a conference next week so if another PR is sent I expect it to only be bug fixes. Parav noted yesterday that there are some fringe case behavior changes in his work that he would like to fix, and I see that Intel has a number of rc looking patches for HFI1 they posted yesterday. Parav is again the biggest contributor by patch count with his ongoing work to enable container support in the RDMA stack, followed by Leon doing syzkaller inspired cleanups, though most of the actual fixing went to RC. There is one uncomfortable series here fixing the user ABI to actually work as intended in 32 bit mode. There are lots of notes in the commit messages, but the basic summary is we don't think there is an actual 32 bit kernel user of drivers/infiniband for several good reasons. However we are seeing people want to use a 32 bit user space with 64 bit kernel, which didn't completely work today. So in fixing it we required a 32 bit rxe user to upgrade their userspace. rxe users are still already quite rare and we think a 32 bit one is non-existing. - Fix RDMA uapi headers to actually compile in userspace and be more complete - Three shared with netdev pull requests from Mellanox: * 7 patches, mostly to net with 1 IB related one at the back). This series addresses an IRQ performance issue (patch 1), cleanups related to the fix for the IRQ performance problem (patches 2-6), and then extends the fragmented completion queue support that already exists in the net side of the driver to the ib side of the driver (patch 7). * Mostly IB, with 5 patches to net that are needed to support the remaining 10 patches to the IB subsystem. This series extends the current 'representor' framework when the mlx5 driver is in switchdev mode from being a netdev only construct to being a netdev/IB dev construct. The IB dev is limited to raw Eth queue pairs only, but by having an IB dev of this type attached to the representor for a switchdev port, it enables DPDK to work on the switchdev device. * All net related, but needed as infrastructure for the rdma driver - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers - SRP performance updates - IB uverbs write path cleanup patch series from Leon - Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to set the port for ib_srpt to listen on in configfs in order for it to be enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port) - TSO and Scatter FCS support in mlx4 - Refactor of modify_qp routine to resolve problems seen while working on new code that is forthcoming - More refactoring and updates of RDMA CM for containers support from Parav - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory' user API features - Infrastructure updates for the new IOCTL interface, based on increased usage - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit kernel as was originally intended. See the commit messages for extensive details - Syzkaller bugs and code cleanups motivated by them" * tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits) IB/rxe: Fix for oops in rxe_register_device on ppc64le arch IB/mlx5: Device memory mr registration support net/mlx5: Mkey creation command adjustments IB/mlx5: Device memory support in mlx5_ib net/mlx5: Query device memory capabilities IB/uverbs: Add device memory registration ioctl support IB/uverbs: Add alloc/free dm uverbs ioctl support IB/uverbs: Add device memory capabilities reporting IB/uverbs: Expose device memory capabilities to user RDMA/qedr: Fix wmb usage in qedr IB/rxe: Removed GID add/del dummy routines RDMA/qedr: Zero stack memory before copying to user space IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR IB/mlx5: Add information for querying IPsec capabilities IB/mlx5: Add IPsec support for egress and ingress {net,IB}/mlx5: Add ipsec helper IB/mlx5: Add modify_flow_action_esp verb IB/mlx5: Add implementation for create and destroy action_xfrm IB/uverbs: Introduce ESP steering match filter IB/uverbs: Add modify ESP flow_action ...
2018-04-05IB/uverbs: Add device memory registration ioctl supportAriel Levkovich1-0/+11
Adding new ioctl method for the MR object - REG_DM_MR. This command can be used by users to register an allocated device memory buffer as an MR and receive lkey and rkey to be used within work requests. It is added as a new method under the MR object and using a new ib_device callback - reg_dm_mr. The command creates a standard ib_mr object which represents the registered memory. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05IB/uverbs: Add alloc/free dm uverbs ioctl supportAriel Levkovich1-1/+19
This change adds uverbs support for allocation/freeing of device memory commands. A new uverbs object is defined of type idr to represent and track the new resource type allocation per context. The API requires provider driver to implement 2 new ib_device callbacks - one for allocation and one for deallocation which return and accept (respectively) the ib_dm object which represents the allocated memory on the device. The support is added via the ioctl command infrastructure only. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05IB/uverbs: Add device memory capabilities reportingAriel Levkovich1-0/+1
This change allows vendors to report device memory capability max_dm_size - to user via uverbs command. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Introduce ESP steering match filterMatan Barak1-0/+16
Adding a new ESP steering match filter that could match against spi and seq used in IPSec protocol. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add modify ESP flow_actionMatan Barak1-0/+4
flow_actions of ESP type could be modified during runtime. This could be common for example when ESN should be changed. Adding a new UVERBS_FLOW_ACTION_ESP_MODIFY method for changing ESP parameters of an existing ESP flow_action. The new method uses the UVERBS_FLOW_ACTION_ESP_CREATE attributes, but adds a new IB_FLOW_ACTION_ESP_FLAGS_MOD_ESP_ATTRS which means ESP_ATTRS should be changed. In addition, we add a new FLOW_ACTION_ESP_REPLAY_NONE replay type that could be used when one wants to disable a replay protection over a specific flow_action. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Introduce egress flow steeringBoris Pismenny1-1/+2
The egress flag indicates that this flow steering rule is for egress traffic. The scope of an egress rule is port-wide, meaning all packets originated from that port, which match the steering rule specification will be effected by this steering rule's action. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add action_handle flow steering specificationMatan Barak1-0/+8
Binding a flow_action to flow steering rule requires using a new specification. Therefore, adding such an IB_FLOW_SPEC_ACTION_HANDLE flow specification. Flow steering rules could use flow_action(s) and as of that we need to avoid deleting flow_action(s) as long as they're being used. Moreover, when the attached rules are deleted, action_handle reference count should be decremented. Introducing a new mechanism of flow resources to keep track on the attached action_handle(s). Later on, this mechanism should be extended to other attached flow steering resources like flow counters. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add flow_action create and destroy verbsMatan Barak1-0/+65
A verbs application may receive and transmits packets using a data path pipeline. Sometimes, the first stage in the receive pipeline or the last stage in the transmit pipeline involves transforming a packet, either in order to make it easier for later stages to process it or to prepare it for transmission over the wire. Such transformation could be stripping/encapsulating the packet (i.e. vxlan), decrypting/encrypting it (i.e. ipsec), altering headers, doing some complex FPGA changes, etc. Some hardware could do such transformations without software data path intervention at all. The flow steering API supports steering a packet (either to a QP or dropping it) and some simple packet immutable actions (i.e. tagging a packet). Complex actions, that may change the packet, could bloat the flow steering API extensively. Sometimes the same action should be applied to several flows. In this case, it's easier to bind several flows to the same action and modify it than change all matching flows. Introducing a new flow_action object that abstracts any packet transformation (out of a standard and well defined set of actions). This flow_action object could be tied to a flow steering rule via a new specification. Currently, we support esp flow_action, which encrypts or decrypts a packet according to the given parameters. However, we present a flexible schema that could be used to other transformation actions tied to flow rules. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA: Use ib_gid_attr during GID modificationParav Pandit1-19/+14
Now that ib_gid_attr contains device, port and index, simplify the provider APIs add_gid() and del_gid() to use device, port and index fields from the ib_gid_attr attributes structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/core: Refactor GID modify code for RoCEParav Pandit1-1/+4
Code is refactored to prepare separate functions for RoCE which can do more complex operations related to reference counting, while still maintainining code readability. This includes (a) Simplification to not perform netdevice checks and modifications for IB link layer. (b) Do not add RoCE GID entry which has NULL netdevice; instead return an error. (c) If GID addition fails at provider level add_gid(), do not add the entry in the cache and keep the entry marked as INVALID. (d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they can be used even for modifying default GIDs. This avoid some code duplication in modifying default GIDs. (e) find_gid() routine refers to the data entry flags to qualify a GID as valid or invalid GID rather than depending on attributes and zeroness of the GID content. (f) gid_table_reserve_default() sets the GID default attribute at beginning while setting up the GID table. There is no need to use default_gid flag in low level functions such as write_gid(), add_gid(), del_gid(), as they never need to update the DEFAULT property of the GID entry while during GID table update. As as result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer searchable as described below. A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB spec version 1.3 section 4.1.1, point (6) whose snippet is below. "The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as the Reserved GID. It shall never be assigned to any endport. It shall not be used as a destination address or in a global routing header (GRH)." GID table cache now only stores valid GID entries. Before this patch, Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using ib_find_cached_gid_by_port() and other similar find routines. Zero GID is no longer searchable as it shall not to be present in GRH or path recored entry as described in IB spec version 1.3 section 4.1.1, point (6), section 12.7.10 and section 12.7.20. ib_cache_update() is simplified to check link layer once, use unified locking scheme for all link layers, removed temporary gid table allocation/free logic. Additionally, (a) Expand ib_gid_attr to store port and index so that GID query routines can get port and index information from the attribute structure. (b) Expand ib_gid_attr to store device as well so that in future code when GID reference counting is done, device is used to reach back to the GID table entry. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/core: Update query_gid documentation for HCA driversParav Pandit1-0/+4
query_gid() should return right GID value for iWarp and IB link layers. It is a no-op for RoCE link layer. Update the documentation to reflect this. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Protect against concurrent access to hardware statsMark Bloch1-0/+4
Currently access to hardware stats buffer isn't protected, this can result in multiple writes and reads at the same time to the same memory location. This can lead to providing an incorrect value to the user. Add a mutex to protect against it. Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-26net: Drop NETDEV_UNREGISTER_FINALKirill Tkhai1-2/+2
Last user is gone after bdf5bd7f2132 "rds: tcp: remove register_netdevice_notifier infrastructure.", so we can remove this netdevice command. This allows to delete rtnl_lock() in netdev_run_todo(), which is hot path for net namespace unregistration. dev_change_net_namespace() and netdev_wait_allrefs() have rcu_barrier() before NETDEV_UNREGISTER_FINAL call, and the source commits say they were introduced to delemit the call with NETDEV_UNREGISTER, but this patch leaves them on the places, since they require additional analysis, whether we need in them for something else. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-19IB/uverbs: Safely extend existing attributesMatan Barak1-4/+9
Previously, we've used UVERBS_ATTR_SPEC_F_MIN_SZ for extending existing attributes. The behavior of this flag was the kernel accepts anything bigger than the minimum size it specified. This is unsafe, since in order to safely extend an attribute, we need to make sure unknown size is zeroed. Replacing UVERBS_ATTR_SPEC_F_MIN_SZ with UVERBS_ATTR_SPEC_F_MIN_SZ_OR_ZERO, which essentially checks that the unknown size is zero. In addition, attributes are now decorated with UVERBS_ATTR_TYPE and UVERBS_ATTR_STRUCT, so we can provide the minimum and known length. Users of this flag needs to use copy_from_or_zero functions/macros. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/uverbs: Extend uverbs_ioctl header with driver_idMatan Barak1-0/+2
Extending uverbs_ioctl header with driver_id and another reserved field. driver_id should be used in order to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB support and thus we add some reserved fields for future usage. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19RDMA/verbs: Remove restrack entry from XRCD structureLeon Romanovsky1-4/+0
XRCD object is not implemented in the restrack, so lets remove it. Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/core: Remove unimplemented ib_peek_cqParav Pandit1-12/+0
ib_peek_cq() verb doesn't seem be implemented in current code. There is some past reference to it at [1] about it being unimplemented. Lot of user documentation created out of kdoc refers to this unimplemented API. Therefore, remove unimplemented API. [1] http://lists.openfabrics.org/pipermail/ofw/2008-May/002465.html Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-15IB/{core, ipoib}: Simplify ib_find_gid() for unused ndevParav Pandit1-1/+1
ib_find_gid() is only used by IPoIB driver. For IB link layer, GID table entries are not based on netdevice. Netdevice parameter is unused here. Therefore, it is removed. Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-14RDMA/verbs: Simplify modify QP checkLeon Romanovsky1-3/+3
All callers to ib_modify_qp_is_ok() provides enum ib_qp_state makes the checks of out-of-scope redundant. Let's remove them together with updating function signature to return boolean result. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-08RDMA/nldev: provide detailed MR informationSteve Wise1-0/+5
Implement the RDMA nldev netlink interface for dumping detailed MR information. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-01IB/core: Map iWarp AH type to undefined in rdma_ah_find_typeDon Hiatt1-6/+14
iWarp devices do not support the creation of address handles so return AH_ATTR_TYPE_UNDEFINED for all iWarp devices. While we are here reduce the size of port_num to u8 and add a comment. Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Reported-by: Parav Pandit <parav@mellanox.com> CC: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29RDMA/restrack: Add general infrastructure to track RDMA resourcesLeon Romanovsky1-0/+19
The RDMA subsystem has very strict set of objects to work with, but it completely lacks tracking facilities and has no visibility of resource utilization. The following patch adds such infrastructure to keep track of RDMA resources to help with debugging of user space applications. The primary user of this infrastructure is RDMA nldev netlink (following patches), to be exposed to userspace via rdmatool, but it is not limited too that. At this stage, the main three objects (PD, CQ and QP) are added, and more will be added later. Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29RDMA/core: Save kernel caller name when creating PD and CQ objectsLeon Romanovsky1-3/+10
The KBUILD_MODNAME variable contains the module name and it is known for kernel users during compilation, so let's reuse it to track the owners. Followup patches will store this for resource tracking. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29RDMA/core: Use the MODNAME instead of the function name for pd callersLeon Romanovsky1-1/+1
Each of our modules only allocates a PD in one place, so there isn't any loss in detail, while MODNAME is more useful and recognizable as something to expose to the user. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-29RDMA: Move enum ib_cq_creation_flags to uapi headersJason Gunthorpe1-5/+0
The flags field the enum is used with comes directly from the uapi so it belongs in the uapi headers for clarity and so userspace can use it. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15RDMA/core: Clarify rdma_ah_find_typeParav Pandit1-2/+1
iWARP does not use rdma_ah_attr_type, and for this reason we do not have a RDMA_AH_ATTR_TYPE_IWARP. rdma_ah_find_type should not even be called on iwarp ports and for clarity it shouldn't have a special test for iWarp. This changes the result from RDMA_AH_ATTR_TYPE_ROCE to RDMA_AH_ATTR_TYPE_IB when wrongly called on an iWarp port. Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15IB/core: Fix ib_wc structure size to remain in 64 bytes boundaryBodong Wang1-1/+1
The change of slid from u16 to u32 results in sizeof(struct ib_wc) cross 64B boundary, which causes more cache misses. This patch rearranges the fields and remain the size to 64B. Pahole output before this change: struct ib_wc { union { u64 wr_id; /* 8 */ struct ib_cqe * wr_cqe; /* 8 */ }; /* 0 8 */ enum ib_wc_status status; /* 8 4 */ enum ib_wc_opcode opcode; /* 12 4 */ u32 vendor_err; /* 16 4 */ u32 byte_len; /* 20 4 */ struct ib_qp * qp; /* 24 8 */ union { __be32 imm_data; /* 4 */ u32 invalidate_rkey; /* 4 */ } ex; /* 32 4 */ u32 src_qp; /* 36 4 */ int wc_flags; /* 40 4 */ u16 pkey_index; /* 44 2 */ /* XXX 2 bytes hole, try to pack */ u32 slid; /* 48 4 */ u8 sl; /* 52 1 */ u8 dlid_path_bits; /* 53 1 */ u8 port_num; /* 54 1 */ u8 smac[6]; /* 55 6 */ /* XXX 1 byte hole, try to pack */ u16 vlan_id; /* 62 2 */ /* --- cacheline 1 boundary (64 bytes) --- */ u8 network_hdr_type; /* 64 1 */ /* size: 72, cachelines: 2, members: 17 */ /* sum members: 62, holes: 2, sum holes: 3 */ /* padding: 7 */ /* last cacheline: 8 bytes */ }; Pahole output after this change: struct ib_wc { union { u64 wr_id; /* 8 */ struct ib_cqe * wr_cqe; /* 8 */ }; /* 0 8 */ enum ib_wc_status status; /* 8 4 */ enum ib_wc_opcode opcode; /* 12 4 */ u32 vendor_err; /* 16 4 */ u32 byte_len; /* 20 4 */ struct ib_qp * qp; /* 24 8 */ union { __be32 imm_data; /* 4 */ u32 invalidate_rkey; /* 4 */ } ex; /* 32 4 */ u32 src_qp; /* 36 4 */ u32 slid; /* 40 4 */ int wc_flags; /* 44 4 */ u16 pkey_index; /* 48 2 */ u8 sl; /* 50 1 */ u8 dlid_path_bits; /* 51 1 */ u8 port_num; /* 52 1 */ u8 smac[6]; /* 53 6 */ /* XXX 1 byte hole, try to pack */ u16 vlan_id; /* 60 2 */ u8 network_hdr_type; /* 62 1 */ /* size: 64, cachelines: 1, members: 17 */ /* sum members: 62, holes: 1, sum holes: 1 */ /* padding: 1 */ }; Cc: <stable@vger.kernel.org> # v4.13 Fixes: 7db20ecd1d97 ("IB/core: Change wc.slid from 16 to 32 bits") Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08{net, IB}/mlx5: Manage port association for multiport RoCEDaniel Jurgens1-0/+8
When mlx5_ib_add is called determine if the mlx5 core device being added is capable of dual port RoCE operation. If it is, determine whether it is a master device or a slave device using the num_vhca_ports and affiliate_nic_vport_criteria capabilities. If the device is a slave, attempt to find a master device to affiliate it with. Devices that can be affiliated will share a system image guid. If none are found place it on a list of unaffiliated ports. If a master is found bind the port to it by configuring the port affiliation in the NIC vport context. Similarly when mlx5_ib_remove is called determine the port type. If it's a slave port, unaffiliate it from the master device, otherwise just remove it from the unaffiliated port list. The IB device is registered as a multiport device, even if a 2nd port is not available for affiliation. When the 2nd port is affiliated later the GID cache must be refreshed in order to get the default GIDs for the 2nd port in the cache. Export roce_rescan_device to provide a mechanism to refresh the cache after a new port is bound. In a multiport configuration all IB object (QP, MR, PD, etc) related commands should flow through the master mlx5_core_dev, other commands must be sent to the slave port mlx5_core_mdev, an interface is provide to get the correct mdev for non IB object commands. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>