aboutsummaryrefslogtreecommitdiffstats
path: root/include/rdma/ib_verbs.h (follow)
AgeCommit message (Collapse)AuthorFilesLines
2015-09-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-87/+116
Pull inifiniband/rdma updates from Doug Ledford: "This is a fairly sizeable set of changes. I've put them through a decent amount of testing prior to sending the pull request due to that. There are still a few fixups that I know are coming, but I wanted to go ahead and get the big, sizable chunk into your hands sooner rather than waiting for those last few fixups. Of note is the fact that this creates what is intended to be a temporary area in the drivers/staging tree specifically for some cleanups and additions that are coming for the RDMA stack. We deprecated two drivers (ipath and amso1100) and are waiting to hear back if we can deprecate another one (ehca). We also put Intel's new hfi1 driver into this area because it needs to be refactored and a transfer library created out of the factored out code, and then it and the qib driver and the soft-roce driver should all be modified to use that library. I expect drivers/staging/rdma to be around for three or four kernel releases and then to go away as all of the work is completed and final deletions of deprecated drivers are done. Summary of changes for 4.3: - Create drivers/staging/rdma - Move amso1100 driver to staging/rdma and schedule for deletion - Move ipath driver to staging/rdma and schedule for deletion - Add hfi1 driver to staging/rdma and set TODO for move to regular tree - Initial support for namespaces to be used on RDMA devices - Add RoCE GID table handling to the RDMA core caching code - Infrastructure to support handling of devices with differing read and write scatter gather capabilities - Various iSER updates - Kill off unsafe usage of global mr registrations - Update SRP driver - Misc mlx4 driver updates - Support for the mr_alloc verb - Support for a netlink interface between kernel and user space cache daemon to speed path record queries and route resolution - Ininitial support for safe hot removal of verbs devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits) IB/ipoib: Suppress warning for send only join failures IB/ipoib: Clean up send-only multicast joins IB/srp: Fix possible protection fault IB/core: Move SM class defines from ib_mad.h to ib_smi.h IB/core: Remove unnecessary defines from ib_mad.h IB/hfi1: Add PSM2 user space header to header_install IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY mlx5: Fix incorrect wc pkey_index assignment for GSI messages IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow IB/uverbs: reject invalid or unknown opcodes IB/cxgb4: Fix if statement in pick_local_ip6adddrs IB/sa: Fix rdma netlink message flags IB/ucma: HW Device hot-removal support IB/mlx4_ib: Disassociate support IB/uverbs: Enable device removal when there are active user space applications IB/uverbs: Explicitly pass ib_dev to uverbs commands IB/uverbs: Fix race between ib_uverbs_open and remove_one IB/uverbs: Fix reference counting usage of event files IB/core: Make ib_dealloc_pd return void IB/srp: Create an insecure all physical rkey only if needed ...
2015-08-30IB/uverbs: Enable device removal when there are active user space applicationsYishai Hadas1-0/+1
Enables the uverbs_remove_one to succeed despite the fact that there are running IB applications working with the given ib device. This functionality enables a HW device to be unbind/reset despite the fact that there are running user space applications using it. It exposes a new IB kernel API named 'disassociate_ucontext' which lets a driver detaching its HW resources from a given user context without crashing/terminating the application. In case a driver implemented the above API and registered with ib_uverb there will be no dependency between its device to its uverbs_device. Upon calling remove_one of ib_uverbs the call should return after disassociating the open HW resources without waiting to clients disconnecting. In case driver didn't implement this API there will be no change to current behaviour and uverbs_remove_one will return only when last client has disconnected and reference count on uverbs device became 0. In case the lower driver device was removed any application will continue working over some zombie HCA, further calls will ended with an immediate error. Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Make ib_dealloc_pd return voidJason Gunthorpe1-5/+1
The majority of callers never check the return value, and even if they did, they can't do anything about a failure. All possible failure cases represent a bug in the caller, so just WARN_ON inside the function instead. This fixes a few random errors: net/rd/iw.c infinite loops while it fails. (racing with EBUSY?) This also lays the ground work to get rid of error return from the drivers. Most drivers do not error, the few that do are broken since it cannot be handled. Since uverbs can legitimately make use of EBUSY, open code the check. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Guarantee that a local_dma_lkey is availableJason Gunthorpe1-7/+2
Every single ULP requires a local_dma_lkey to do anything with a QP, so let us ensure one exists for every PD created. If the driver can supply a global local_dma_lkey then use that, otherwise ask the driver to create a local use all physical memory MR associated with the new PD. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Acked-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/mlx4: Implement ib_device callbacksMoni Shoua1-0/+2
get_netdev: get the net_device on the physical port of the IB transport port. In port aggregation mode it is required to return the netdev of the active port. modify_gid: note for a change in the RoCE gid cache. Handle this by writing to the harsware GID table. It is possible that indexes in cahce and hardware tables won't match so a translation is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Add RoCE GID table managementMatan Barak1-2/+64
RoCE GIDs are based on IP addresses configured on Ethernet net-devices which relate to the RDMA (RoCE) device port. Currently, each of the low-level drivers that support RoCE (ocrdma, mlx4) manages its own RoCE port GID table. As there's nothing which is essentially vendor specific, we generalize that, and enhance the RDMA core GID cache to do this job. In order to populate the GID table, we listen for events: (a) netdev up/down/change_addr events - if a netdev is built onto our RoCE device, we need to add/delete its IPs. This involves adding all GIDs related to this ndev, add default GIDs, etc. (b) inet events - add new GIDs (according to the IP addresses) to the table. For programming the port RoCE GID table, providers must implement the add_gid and del_gid callbacks. RoCE GID management requires us to state the associated net_device alongside the GID. This information is necessary in order to manage the GID table. For example, when a net_device is removed, its associated GIDs need to be removed as well. RoCE mandates generating a default GID for each port, based on the related net-device's IPv6 link local. In contrast to the GID based on the regular IPv6 link-local (as we generate GID per IP address), the default GID is also available when the net device is down (in order to support loopback). Locking is done as follows: The patch modify the GID table code both for new RoCE drivers implementing the add_gid/del_gid callbacks and for current RoCE and IB drivers that do not. The flows for updating the table are different, so the locking requirements are too. While updating RoCE GID table, protection against multiple writers is achieved via mutex_lock(&table->lock). Since writing to a table requires us to find an entry (possible a free entry) in the table and then modify it, this mutex protects both the find_gid and write_gid ensuring the atomicity of the action. Each entry in the GID cache is protected by rwlock. In RoCE, writing (usually results from netdev notifier) involves invoking the vendor's add_gid and del_gid callbacks, which could sleep. Therefore, an invalid flag is added for each entry. Updates for RoCE are done via a workqueue, thus sleeping is permitted. In IB, updates are done in write_lock_irq(&device->cache.lock), thus write_gid isn't allowed to sleep and add_gid/del_gid are not called. When passing net-device into/out-of the GID cache, the device is always passed held (dev_hold). The code uses a single work item for updating all RDMA devices, following a netdev or inet notifier. The patch moves the cache from being a client (which was incorrect, as the cache is part of the IB infrastructure) to being explicitly initialized/freed when a device is registered/removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Drop ib_alloc_fast_reg_mrSagi Grimberg1-11/+0
Fully replaced by a more generic and suitable ib_alloc_mr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB: Modify ib_create_mr APISagi Grimberg1-22/+15
Use ib_alloc_mr with specific parameters. Change the existing callers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Get rid of redundant verb ib_destroy_mrSagi Grimberg1-10/+0
This was added in a thought of uniting all mr allocation and deallocation routines but the fact is we have a single deallocation routine already, ib_dereg_mr. And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr (includes only signature stuff for now). And, fixup the only callers (iser/isert) accordingly. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Find the network device matching connection parametersYotam Kenneth1-0/+27
In the case of IPoIB, and maybe in other cases, the network device is managed by an upper-layer protocol (ULP). In order to expose this network device to other users of the IB device, let ULPs implement a callback that returns network device according to connection parameters. The IB device and port, together with the P_Key and the GID should be enough to uniquely identify the ULP net device. However, in current kernels there can be multiple IPoIB interfaces created with the same GID. Furthermore, such configuration may be desireable to support ipvlan-like configurations for RDMA CM with IPoIB. To resolve the device in these cases the code will also take the IP address as an additional input. Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: lock client data with lists_rwsemHaggai Eran1-1/+3
An ib_client callback that is called with the lists_rwsem locked only for read is protected from changes to the IB client lists, but not from ib_unregister_device() freeing its client data. This is because ib_unregister_device() will remove the device from the device list with lists_rwsem locked for write, but perform the rest of the cleanup, including the call to remove() without that lock. Mark client data that is undergoing de-registration with a new going_down flag in the client data context. Lock the client data list with lists_rwsem for write in addition to using the spinlock, so that functions calling the callback would be able to lock only lists_rwsem for read and let callbacks sleep. Since ib_unregister_client() now marks the client data context, no need for remove() to search the context again, so pass the client data directly to remove() callbacks. Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28RDMA/Core: remove rdma_cap_read_multi_sge() helperSteve Wise1-28/+0
This functionality already exists via the max_sge_rd device capability. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-05core: Remove the ib_reg_phys_mr() and ib_rereg_phys_mr() verbsChuck Lever1-46/+0
The verbs are obsolete. The ib_rereg_phys_mr() verb is not used by kernel ULPs, and the last ib_reg_phys_mr() call site in the kernel tree has now been removed. Two staging tree call sites remain in the Lustre client. The Lustre team has been notified of the deprecation of reg_phys_mr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-07-14IB: Add rdma_cap_ib_switch helper and use where appropriateHal Rosenstock1-3/+17
Persuant to Liran's comments on node_type on linux-rdma mailing list: In an effort to reform the RDMA core and ULPs to minimize use of node_type in struct ib_device, an additional bit is added to struct ib_device for is_switch (IB switch). This is needed to be initialized by any IB switch device driver. This is a NEW requirement on such device drivers which are all "out of tree". In addition, an ib_switch helper was added to ib_verbs.h based on the is_switch device bit rather than node_type (although those should be consistent). The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs) as well as (IPoIB and SRP) ULPs are updated where appropriate to use this new helper. In some cases, the helper is now used under the covers of using rdma_[start end]_port rather than the open coding previously used. Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Add OPA MAD core capability flagIra Weiny1-0/+28
Add OPA MAD support flags to the core capability immutable flags. In addition add the rdma_cap_opa_mad helper function for core functions to use to detect OPA MAD support. OPA MADs share a common header with IBTA MADs but with some differences for increased performance. Sharing a common header with IBTA MADs allows us to share most of the MAD processing code when dealing with OPA MADs in addition to supporting some IBTA MADs on OPA devices. OPA MADs differ in the following ways: 1) MADs are variable size up to 2K IBTA defined MADs remain fixed at 256 bytes 2) OPA SMPs must carry valid PKeys 3) OPA SMP packets are a different format The MAD stack will use this new functionality to determine if OPA MAD processing should occur on individual device ports. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/mad: Add support for additional MAD info to/from driversIra Weiny1-3/+6
In order to support alternate sized MADs (and variable sized MADs on OPA devices) add in/out MAD size parameters to the process_mad core call. In addition, add an out_mad_pkey_index to communicate the pkey index the driver wishes the MAD stack to use when sending OPA MAD responses. The out MAD size and the out MAD PKey index are required by the MAD stack to generate responses on OPA devices. Furthermore, the in and out MAD parameters are made generic by specifying them as ib_mad_hdr rather than ib_mad. Drivers are modified as needed and are protected by BUG_ON flags if the MAD sizes passed to them is incorrect. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Add ability for drivers to report an alternate MAD size.Ira Weiny1-0/+18
Add max MAD size to the device immutable data set and have all drivers that support MADs report the current IB MAD size (IB_MGMT_MAD_SIZE) to the core. Verify MAD size data in both the MAD core and when reading the immutable data. OPA drivers will report alternate MAD sizes in subsequent patches. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Pass hardware specific data in query_deviceMatan Barak1-1/+2
Vendors should be able to pass vendor specific data to/from user-space via query_device uverb. In order to do this, we need to pass the vendors' specific udata. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Add timestamp_mask and hca_core_clock to query_deviceMatan Barak1-0/+2
In order to expose timestamp we need to expose two new attributes in query_device to be used for CQ completion time-stamping: timestamp_mask - how many bits are valid in the timestamp, where timestamp values could be 64bits the most. hca_core_clock - timestamp is given in HW cycles, the frequency in KHZ units of the HCA, necessary in order to convert cycles to seconds. This is added both to ib_query_device and its respective uverbs counterpart. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Add CQ creation time-stamping flagMatan Barak1-0/+4
Add CQ creation flag which dictates that the created CQ will report completion time-stamp value in the WC. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Change ib_create_cq to use struct ib_cq_init_attrMatan Barak1-4/+3
Currently, ib_create_cq uses cqe and comp_vecotr instead of the extendible ib_cq_init_attr struct. Earlier patches already changed the vendors to work with ib_cq_init_attr. This patch changes the consumers too. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12IB/core: Change provider's API of create_cq to be extendibleMatan Barak1-2/+8
Add a new ib_cq_init_attr structure which contains the previous cqe (minimum number of CQ entries) and comp_vector (completion vector) in addition to a new flags field. All vendors' create_cq callbacks are changed in order to work with the new API. This commit does not change any functionality. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2 Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-10IB/core: Don't advertise SA in RoCE port capabilitiesMoni Shoua1-1/+0
The Subnet Administrator (SA) is not a component of the RoCE spec. Therefore, it should not be a capability of a RoCE port. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-02IB/core cleanup: Add const to args - agent_send_responseIra Weiny1-4/+5
In order to support constant callers of agent_send_response we add const specifiers to the its pointer arguments. Adjust the call tree accordingly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-02IB/core cleanup: Add const on args - device->process_madIra Weiny1-3/+3
The process_mad device function declares some parameters as "in". Make those parameters const and adjust the call tree under process_mad in the various drivers accordingly. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-02IB/core cleanup: Add const to RDMA helpersIra Weiny1-12/+12
The ib_device passed to the new RDMA helpers is constant. Declare the ib_device as const in the following functions. rdma_protocol_ib rdma_protocol_roce rdma_protocol_iwarp rdma_ib_or_roce rdma_cap_ib_mad rdma_cap_ib_smi rdma_cap_ib_cm rdma_cap_iw_cm rdma_cap_ib_sa rdma_cap_ib_mcast rdma_cap_af_ib rdma_cap_eth_ah Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Hal Rosenstock <hal@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-20Merge branches 'bart-srp', 'generic-errors', 'ira-cleanups' and 'mwang-v8' into k.o/for-4.2Doug Ledford1-2/+299
2015-05-20IB/core: Change rdma_protocol_iboe to roceIra Weiny1-2/+2
After discussion upstream, it was agreed to transition the usage of iboe in the kernel to roce. This keeps our terminology consistent with what was finalized in the IBTA Annex 16 and IBTA Annex 17 publications. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-20IB/core: Convert core to use bitfield for capsIra Weiny1-16/+48
Remove query_protocol callback Use the new Core Capability bits for: rdma_protocol_* rdma_cap_ib_mad rdma_cap_ib_smi rdma_cap_ib_cm rdma_cap_iw_cm rdma_cap_ib_sa rdma_cap_ib_mcast rdma_cap_af_ib rdma_cap_eth_ah Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-20IB/core: Add per port immutable struct to ib_deviceIra Weiny1-2/+17
As of commit 5eb620c81ce3 "IB/core: Add helpers for uncached GID and P_Key searches"; pkey_tbl_len and gid_tbl_len are immutable data which are stored in the ib_device. The per port core capability flags to be added later are also immutable data to be stored in the ib_device object. In preparation for this create a structure for per port immutable data and place the pkey and gid table lengths within this structure. "get_port_immutable" is added as a mandatory device function to allow the drivers to fill in this data. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/core, cma: Nice log-friendly string helpersSagi Grimberg1-0/+4
Some of us keep revisiting the code to decode enumerations that appear in out logs. Let's borrow the nice logging helpers that exists in xprtrdma and rds for CMA events, IB events and WC statuses. Reviewd-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/core: Create common start/end port functionsIra Weiny1-0/+27
Previously start_port and end_port were defined in 2 places, cache.c and device.c and this prevented their use in other modules. Make these common functions, change the name to reflect the rdma name space, and update existing users. Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Improve docs for rdma-helpersMichael Wang1-40/+92
Increase the level of documentation for the rdma_cap_* helpers introduced by Michael Wang <yun.wang@profitbricks.com>. This patch is loosely based on a patch Michael wrote to enhance the documentation of these functions, but has been significantly modified in terms of verbiage. In addition, the comments were moved from a kernel Documentation/infiniband/ file to being inline in the header file itself for the functions in question. Finally, the documentation was formated in proper kdoc format. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_eth_ah()Michael Wang1-0/+15
Introduce helper rdma_cap_eth_ah() to help us check if the port of an IB device support Ethernet Address Handler. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_af_ib()Michael Wang1-0/+15
Introduce helper rdma_cap_af_ib() to help us check if the port of an IB device support Native Infiniband Address. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_read_multi_sge()Michael Wang1-0/+16
Introduce helper rdma_cap_read_multi_sge() to help us check if the port of an IB device support RDMA Read Multiple Scatter-Gather Entries. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_ib_mcast()Michael Wang1-0/+15
Introduce helper rdma_cap_ib_mcast() to help us check if the port of an IB device support Infiniband Multicast. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_ib_sa()Michael Wang1-0/+15
Introduce helper rdma_cap_ib_sa() to help us check if the port of an IB device support Infiniband Subnet Administration. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_iw_cm()Michael Wang1-0/+15
Introduce helper rdma_cap_iw_cm() to help us check if the port of an IB device support IWARP Communication Manager. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_ib_cm()Michael Wang1-0/+15
Introduce helper rdma_cap_ib_cm() to help us check if the port of an IB device support Infiniband Communication Manager. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_ib_smi()Michael Wang1-0/+15
Introduce helper rdma_cap_ib_smi() to help us check if the port of an IB device support Infiniband Subnet Management Interface. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Use management helper rdma_cap_ib_mad()Michael Wang1-0/+15
Introduce helper rdma_cap_ib_mad() to help us check if the port of an IB device support Infiniband Management Datagrams. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Implement raw management helpersMichael Wang1-0/+22
Add raw helpers: rdma_protocol_ib rdma_protocol_iboe rdma_protocol_iwarp rdma_ib_or_iboe (transition, clean up later) To help us detect which technology the port supported. Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-05-18IB/Verbs: Implement new callback query_protocol()Michael Wang1-0/+9
Add new callback query_protocol() and implement for each HW. Mapping List: node-type link-layer transport protocol nes RNIC ETH IWARP IWARP amso1100 RNIC ETH IWARP IWARP cxgb3 RNIC ETH IWARP IWARP cxgb4 RNIC ETH IWARP IWARP usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP ocrdma IB_CA ETH IB IBOE mlx4 IB_CA IB/ETH IB IB/IBOE mlx5 IB_CA IB IB IB ehca IB_CA IB IB IB ipath IB_CA IB IB IB mthca IB_CA IB IB IB qib IB_CA IB IB IB Signed-off-by: Michael Wang <yun.wang@profitbricks.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-02-06Revert "IB/core: Add support for extended query device caps"Yann Droneaud1-4/+1
While commit 7e36ef8205ff ("IB/core: Temporarily disable ex_query_device uverb") is correct as it makes the extended QUERY_DEVICE uverb (which came as part of commit 5a77abf9a97a ("IB/core: Add support for extended query device caps") and commit 860f10a799c8 ("IB/core: Add flags for on demand paging support")) not available to userspace, it doesn't address the initial issue regarding ib_copy_to_udata() [1][2]. Additionally, further discussions around this new uverb seems to conclude it would require a different data structure than the one currently described in <rdma/ib_user_verbs.h> [3]. Both of these issues require a revert of the changes, so this patch partially reverts commit 8cdd312cfed7 ("IB/mlx5: Implement the ODP capability query verb") and commit 860f10a799c8 ("IB/core: Add flags for on demand paging support") and fully reverts commit 5a77abf9a97a ("IB/core: Add support for extended query device caps"). [1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps" http://mid.gmane.org/1418733236.2779.26.camel@opteya.com [2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb" http://mid.gmane.org/1423067503.3030.83.camel@opteya.com [3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask" http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com Cc: Eli Cohen <eli@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/core: Implement support for MMU notifiers regarding on demand paging regionsHaggai Eran1-0/+19
* Add an interval tree implementation for ODP umems. Create an interval tree for each ucontext (including a count of the number of ODP MRs in this context, semaphore, etc.), and register ODP umems in the interval tree. * Add MMU notifiers handling functions, using the interval tree to notify only the relevant umems and underlying MRs. * Register to receive MMU notifier events from the MM subsystem upon ODP MR registration (and unregister accordingly). * Add a completion object to synchronize the destruction of ODP umems. * Add mechanism to abort page faults when there's a concurrent invalidation. The way we synchronize between concurrent invalidations and page faults is by keeping a counter of currently running invalidations, and a sequence number that is incremented whenever an invalidation is caught. The page fault code checks the counter and also verifies that the sequence number hasn't progressed before it updates the umem's page tables. This is similar to what the kvm module does. In order to prevent the case where we register a umem in the middle of an ongoing notifier, we also keep a per ucontext counter of the total number of active mmu notifiers. We only enable new umems when all the running notifiers complete. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yuval Dagan <yuvalda@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/core: Add support for on demand paging regionsShachar Raindel1-0/+2
* Extend the umem struct to keep the ODP related data. * Allocate and initialize the ODP related information in the umem (page_list, dma_list) and freeing as needed in the end of the run. * Store a reference to the process PID struct in the ucontext. Used to safely obtain the task_struct and the mm during fault handling, without preventing the task destruction if needed. * Add 2 helper functions: ib_umem_odp_map_dma_pages and ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses of specific pages of the umem (and, currently, pin them). * Support for page faults only - IB core will keep the reference on the pages used and call put_page when freeing an ODP umem area. Invalidations support will be added in a later patch. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/core: Add flags for on demand paging supportSagi Grimberg1-2/+26
* Add a configuration option for enable on-demand paging support in the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a later patch, this configuration option will select the MMU_NOTIFIER configuration option to enable mmu notifiers. * Add a flag for on demand paging (ODP) support in the IB device capabilities. * Add a flag to request ODP MR in the access flags to reg_mr. * Fail registrations done with the ODP flag when the low-level driver doesn't support this. * Change the conditions in which an MR will be writable to explicitly specify the access flags. This is to avoid making an MR writable just because it is an ODP MR. * Add a ODP capabilities to the extended query device verb. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/core: Add support for extended query device capsEli Cohen1-1/+4
Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09IB/mlx5, iser, isert: Add Signature API additionsSagi Grimberg1-18/+14
Expose more signature setting parameters. We modify the signature API to allow usage of some new execution parameters relevant to data integrity feature. This patch modifies ib_sig_domain structure by: - Deprecate DIF type in signature API (operation will be determined by the parameters alone, no DIF type awareness) - Add APPTAG check bitmask (for input domain) - Add REFTAG remap (increment) flag for each domain - Add APPTAG/REFTAG escape options for each domain The mlx5 driver is modified to follow the new parameters in HW signature setup. At the moment the callers (iser/isert) hard-code new parameters (by DIF type). In the future, callers will retrieve them from the scsi command structure. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>