aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core/Makefile (follow)
AgeCommit message (Collapse)AuthorFilesLines
2017-05-23IB/core: Enforce PKey security on QPsDaniel Jurgens1-1/+2
Add new LSM hooks to allocate and free security contexts and check for permission to access a PKey. Allocate and free a security context when creating and destroying a QP. This context is used for controlling access to PKeys. When a request is made to modify a QP that changes the port, PKey index, or alternate path, check that the QP has permission for the PKey in the PKey table index on the subnet prefix of the port. If the QP is shared make sure all handles to the QP also have access. Store which port and PKey index a QP is using. After the reset to init transition the user can modify the port, PKey index and alternate path independently. So port and PKey settings changes can be a merge of the previous settings and the new ones. In order to maintain access control if there are PKey table or subnet prefix change keep a list of all QPs are using each PKey index on each port. If a change occurs all QPs using that device and port must have access enforced for the new cache settings. These changes add a transaction to the QP modify process. Association with the old port and PKey index must be maintained if the modify fails, and must be removed if it succeeds. Association with the new port and PKey index must be established prior to the modify and removed if the modify fails. 1. When a QP is modified to a particular Port, PKey index or alternate path insert that QP into the appropriate lists. 2. Check permission to access the new settings. 3. If step 2 grants access attempt to modify the QP. 4a. If steps 2 and 3 succeed remove any prior associations. 4b. If ether fails remove the new setting associations. If a PKey table or subnet prefix changes walk the list of QPs and check that they have permission. If not send the QP to the error state and raise a fatal error event. If it's a shared QP make sure all the QPs that share the real_qp have permission as well. If the QP that owns a security structure is denied access the security structure is marked as such and the QP is added to an error_list. Once the moving the QP to error is complete the security structure mark is cleared. Maintaining the lists correctly turns QP destroy into a transaction. The hardware driver for the device frees the ib_qp structure, so while the destroy is in progress the ib_qp pointer in the ib_qp_security struct is undefined. When the destroy process begins the ib_qp_security structure is marked as destroying. This prevents any action from being taken on the QP pointer. After the QP is destroyed successfully it could still listed on an error_list wait for it to be processed by that flow before cleaning up the structure. If the destroy fails the QPs port and PKey settings are reinserted into the appropriate lists, the destroying flag is cleared, and access control is enforced, in case there were any cache changes during the destroy flow. To keep the security changes isolated a new file is used to hold security related functionality. Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Acked-by: Doug Ledford <dledford@redhat.com> [PM: merge fixup in ib_verbs.h and uverbs_cmd.c] Signed-off-by: Paul Moore <paul@paul-moore.com>
2017-04-05IB/core: Add idr based standard typesMatan Barak1-1/+1
This patch adds the standard idr based types. These types are used in downstream patches in order to initialize, destroy and lookup IB standard objects which are based on idr objects. An idr object requires filling out several parameters. Its op pointer should point to uverbs_idr_ops and its size should be at least the size of ib_uobject. We add a macro to make the type declaration easier. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-05IB/core: Add support for idr typesMatan Barak1-1/+2
The new ioctl infrastructure supports driver specific objects. Each such object type has a hot unplug function, allocation size and an order of destruction. When a ucontext is created, a new list is created in this ib_ucontext. This list contains all objects created under this ib_ucontext. When a ib_ucontext is destroyed, we traverse this list several time destroying the various objects by the order mentioned in the object type description. If few object types have the same destruction order, they are destroyed in an order opposite to their creation. Adding an object is done in two parts. First, an object is allocated and added to idr tree. Then, the command's handlers (in downstream patches) could work on this object and fill in its required details. After a successful command, the commit part is called and the user objects become ucontext visible. If the handler failed, alloc_abort should be called. Removing an uboject is done by calling lookup_get with the write flag and finalizing it with destroy_commit. A major change from the previous code is that we actually destroy the kernel object itself in destroy_commit (rather than just the uobject). We should make sure idr (per-uverbs-file) and list (per-ucontext) could be accessed concurrently without corrupting them. Signed-off-by: Matan Barak <matanb@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-01-10IB/core: added support to use rdma cgroup controllerParav Pandit1-0/+1
Added support APIs for IB core to register/unregister every IB/RDMA device with rdma cgroup for tracking rdma resources. IB core registers with rdma cgroup controller. Added support APIs for uverbs layer to make use of rdma controller. Added uverbs layer to perform resource charge/uncharge functionality. Added support during query_device uverb operation to ensure it returns resource limits by honoring rdma cgroup configured limits. Signed-off-by: Parav Pandit <pandit.parav@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-05-24IB/SA: Integrate ib_sa module into ib_core moduleMark Bloch1-6/+3
Consolidate ib_sa into ib_core, this commit eliminates ib_sa.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/MAD: Integrate ib_mad module into ib_core moduleMark Bloch1-4/+3
Consolidate ib_mad into ib_core, this commit eliminates ib_mad.ko and makes it part of ib_core.ko Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-24IB/core: Integrate IB address resolution module into coreLeon Romanovsky1-4/+2
IB address resolution is declared as a module (ib_addr.ko) which loads itself before IB core module (ib_core.ko). It causes to the scenario where IB netlink which is initialized by IB core can't be used by ib_addr.ko. In order to solve it, we are converting ib_addr.ko to be part of IB core module. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: generic RDMA READ/WRITE APIChristoph Hellwig1-1/+1
This supports both manual mapping of lots of SGEs, as well as using MRs from the QP's MR pool, for iWarp or other cases where it's more optimal. For now, MRs are only used for iWARP transports. The user of the RDMA-RW API must allocate the QP MR pool as well as size the SQ accordingly. Thanks to Steve Wise for testing, fixing and rewriting the iWarp support, and to Sagi Grimberg for ideas, reviews and fixes. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-13IB/core: add a simple MR poolChristoph Hellwig1-1/+1
Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-23IB/cma: Add configfs for rdma_cmMatan Barak1-0/+2
Users would like to control the behaviour of rdma_cm. For example, old applications which don't set the required RoCE gid type could be executed on RoCE V2 network types. In order to support this configuration, we implement a configfs for rdma_cm. In order to use the configfs, one needs to mount it and mkdir <IB device name> inside rdma_cm directory. The patch adds support for a single configuration file, default_roce_mode. The mode can either be "IB/RoCE v1" or "RoCE v2". Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-12-11IB: add a proper completion queue abstractionChristoph Hellwig1-1/+1
This adds an abstraction that allows ULPs to simply pass a completion object and completion callback with each submitted WR and let the RDMA core handle the nitty gritty details of how to handle completion interrupts and poll the CQ. In detail there is a new ib_cqe structure which just contains the completion callback, and which can be used to get at the containing object using container_of. It is pointed to by the WR and WC as an alternative to the wr_id field, similar to how many ULPs already use the field to store a pointer using casts. A driver using the new completion callbacks allocates it's CQs using the new ib_create_cq API, which in addition to the number of CQEs and the completion vectors also takes a mode on how we poll for CQEs. Three modes are available: direct for drivers that never take CQ interrupts and just poll for them, softirq to poll from softirq context using the to be renamed blk-iopoll infrastructure which takes care of rearming and budgeting, or a workqueue for consumer who want to be called from user context. Thanks a lot to Sagi Grimberg who helped reviewing the API, wrote the current version of the workqueue code because my two previous attempts sucked too much and converted the iSER initiator to the new API. Signed-off-by: Christoph Hellwig <hch@lst.de>
2015-08-30IB/core: Add RoCE GID table managementMatan Barak1-1/+2
RoCE GIDs are based on IP addresses configured on Ethernet net-devices which relate to the RDMA (RoCE) device port. Currently, each of the low-level drivers that support RoCE (ocrdma, mlx4) manages its own RoCE port GID table. As there's nothing which is essentially vendor specific, we generalize that, and enhance the RDMA core GID cache to do this job. In order to populate the GID table, we listen for events: (a) netdev up/down/change_addr events - if a netdev is built onto our RoCE device, we need to add/delete its IPs. This involves adding all GIDs related to this ndev, add default GIDs, etc. (b) inet events - add new GIDs (according to the IP addresses) to the table. For programming the port RoCE GID table, providers must implement the add_gid and del_gid callbacks. RoCE GID management requires us to state the associated net_device alongside the GID. This information is necessary in order to manage the GID table. For example, when a net_device is removed, its associated GIDs need to be removed as well. RoCE mandates generating a default GID for each port, based on the related net-device's IPv6 link local. In contrast to the GID based on the regular IPv6 link-local (as we generate GID per IP address), the default GID is also available when the net device is down (in order to support loopback). Locking is done as follows: The patch modify the GID table code both for new RoCE drivers implementing the add_gid/del_gid callbacks and for current RoCE and IB drivers that do not. The flows for updating the table are different, so the locking requirements are too. While updating RoCE GID table, protection against multiple writers is achieved via mutex_lock(&table->lock). Since writing to a table requires us to find an entry (possible a free entry) in the table and then modify it, this mutex protects both the find_gid and write_gid ensuring the atomicity of the action. Each entry in the GID cache is protected by rwlock. In RoCE, writing (usually results from netdev notifier) involves invoking the vendor's add_gid and del_gid callbacks, which could sleep. Therefore, an invalid flag is added for each entry. Updates for RoCE are done via a workqueue, thus sleeping is permitted. In IB, updates are done in write_lock_irq(&device->cache.lock), thus write_gid isn't allowed to sleep and add_gid/del_gid are not called. When passing net-device into/out-of the GID cache, the device is always passed held (dev_hold). The code uses a single work item for updating all RDMA devices, following a netdev or inet notifier. The patch moves the cache from being a client (which was incorrect, as the cache is part of the IB infrastructure) to being explicitly initialized/freed when a device is registered/removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2014-12-15IB/core: Implement support for MMU notifiers regarding on demand paging regionsHaggai Eran1-1/+1
* Add an interval tree implementation for ODP umems. Create an interval tree for each ucontext (including a count of the number of ODP MRs in this context, semaphore, etc.), and register ODP umems in the interval tree. * Add MMU notifiers handling functions, using the interval tree to notify only the relevant umems and underlying MRs. * Register to receive MMU notifier events from the MM subsystem upon ODP MR registration (and unregister accordingly). * Add a completion object to synchronize the destruction of ODP umems. * Add mechanism to abort page faults when there's a concurrent invalidation. The way we synchronize between concurrent invalidations and page faults is by keeping a counter of currently running invalidations, and a sequence number that is incremented whenever an invalidation is caught. The page fault code checks the counter and also verifies that the sequence number hasn't progressed before it updates the umem's page tables. This is similar to what the kvm module does. In order to prevent the case where we register a umem in the middle of an ongoing notifier, we also keep a per ucontext counter of the total number of active mmu notifiers. We only enable new umems when all the running notifiers complete. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yuval Dagan <yuvalda@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15IB/core: Add support for on demand paging regionsShachar Raindel1-0/+1
* Extend the umem struct to keep the ODP related data. * Allocate and initialize the ODP related information in the umem (page_list, dma_list) and freeing as needed in the end of the run. * Store a reference to the process PID struct in the ucontext. Used to safely obtain the task_struct and the mm during fault handling, without preventing the task destruction if needed. * Add 2 helper functions: ib_umem_odp_map_dma_pages and ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses of specific pages of the umem (and, currently, pin them). * Support for page faults only - IB core will keep the reference on the pages used and call put_page when freeing an ODP umem area. Invalidations support will be added in a later patch. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10RDMA/core: Add support for iWARP Port Mapper user space serviceTatyana Nikolova1-1/+1
This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP Port Mapper implementation is based on the port mapper specification section in the Sockets Direct Protocol paper - http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf Existing iWARP RDMA providers use the same IP address as the native TCP/IP stack when creating RDMA connections. They need a mechanism to claim the TCP ports used for RDMA connections to prevent TCP port collisions when other host applications use TCP ports. The iWARP Port Mapper provides a standard mechanism to accomplish this. Without this service it is possible for RDMA application to bind/listen on the same port which is already being used by native TCP host application. If that happens the incoming TCP connection data can be passed to the RDMA stack with error. The iWARP Port Mapper solution doesn't contain any changes to the existing network stack in the kernel space. All the changes are contained with the infiniband tree and also in user space. The iWARP Port Mapper service is implemented as a user space daemon process. Source for the IWPM service is located at http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary The iWARP driver (port mapper client) sends to the IWPM service the local IP address and TCP port it has received from the RDMA application, when starting a connection. The IWPM service performs a socket bind from user space to get an available TCP port, called a mapped port, and communicates it back to the client. In that sense, the IWPM service is used to map the TCP port, which the RDMA application uses to any port available from the host TCP port space. The mapped ports are used in iWARP RDMA connections to avoid collisions with native TCP stack which is aware that these ports are taken. When an RDMA connection using a mapped port is terminated, the client notifies the IWPM service, which then releases the TCP port. The message exchange between the IWPM service and the iWARP drivers (between user space and kernel space) is implemented using netlink sockets. 1) Netlink interface functions are added: ibnl_unicast() and ibnl_mulitcast() for sending netlink messages to user space 2) The signature of the existing ibnl_put_msg() is changed to be more generic 3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW corresponding to the two iWarp drivers - nes and cxgb4 which use the IWPM service 4) Enums are added to enumerate the attributes in the netlink messages, which are exchanged between the user space IWPM service and the iWARP drivers Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com> [ Fold in range checking fixes and nlh_next removal as suggested by Dan Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-19IB/core: Make ib_addr a core IB moduleMatan Barak1-2/+3
IP based addressing introduces the usage of rdma_addr_find_dmac_by_grh() within ib_core. Since this function is declared in ib_addr, ib_addr should be a part of the core INFINIBAND modules, rather than INFINIBAND_ADDR_TRANS. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-20RDMA: Add netlink infrastructureRoland Dreier1-1/+1
Add basic RDMA netlink infrastructure that allows for registration of RDMA clients for which data is to be exported and supplies message construction callbacks. Signed-off-by: Nir Muchtar <nirm@voltaire.com> [ Reorganize a few things, add CONFIG_NET dependency. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2007-05-08IB/uverbs: Export ib_umem_get()/ib_umem_release() to modulesRoland Dreier1-2/+2
Export ib_umem_get()/ib_umem_release() and put low-level drivers in control of when to call ib_umem_get() to pin and DMA map userspace, rather than always calling it in ib_uverbs_reg_mr() before calling the low-level driver's reg_user_mr method. Also move these functions to be in the ib_core module instead of ib_uverbs, so that driver modules using them do not depend on ib_uverbs. This has a number of advantages: - It is better design from the standpoint of making generic code a library that can be used or overridden by device-specific code as the details of specific devices dictate. - Drivers that do not need to pin userspace memory regions do not need to take the performance hit of calling ib_mem_get(). For example, although I have not tried to implement it in this patch, the ipath driver should be able to avoid pinning memory and just use copy_{to,from}_user() to access userspace memory regions. - Buffers that need special mapping treatment can be identified by the low-level driver. For example, it may be possible to solve some Altix-specific memory ordering issues with mthca CQs in userspace by mapping CQ buffers with extra flags. - Drivers that need to pin and DMA map userspace memory for things other than memory regions can use ib_umem_get() directly, instead of hacks using extra parameters to their reg_phys_mr method. For example, the mlx4 driver that is pending being merged needs to pin and DMA map QP and CQ buffers, but it does not need to create a memory key for these buffers. So the cleanest solution is for mlx4 to call ib_umem_get() in the create_qp and create_cq methods. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-02-16IB/sa: Track multicast join/leave requestsSean Hefty1-1/+1
The IB SA tracks multicast join/leave requests on a per port basis and does not do any reference counting: if two users of the same port join the same group, and one leaves that group, then the SA will remove the port from the group even though there is one user who wants to stay a member left. Therefore, in order to support multiple users of the same multicast group from the same port, we need to perform reference counting locally. To do this, add an multicast submodule to ib_sa to perform reference counting of multicast join/leave operations. Modify ib_ipoib (the only in-kernel user of multicast) to use the new interface. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-12-12RDMA/cma: Export rdma cm interface to userspaceSean Hefty1-1/+5
Export the rdma cm interfaces to userspace via a misc device. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22RDMA: iWARP Core Changes.Tom Tucker1-1/+3
Modifications to the existing rdma header files, core files, drivers, and ulp files to support iWARP, including: - Hook iWARP CM into the build system and use it in rdma_cm. - Convert enum ib_node_type to enum rdma_node_type, which includes the possibility of RDMA_NODE_RNIC, and update everything for this. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17IB: IP address based RDMA connection managerSean Hefty1-2/+4
Kernel connection management agent over InfiniBand that connects based on IP addresses. The agent defines a generic RDMA connection abstraction to support clients wanting to connect over different RDMA devices. The agent also handles RDMA device hotplug events on behalf of clients. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17IB: address translation to map IP toIB addresses (GIDs)Sean Hefty1-1/+5
Add an address translation service that maps IP addresses to InfiniBand GID addresses using IPoIB. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-06-17IB: common handling for marshalling parameters to/from userspaceSean Hefty1-1/+2
Provide common handling for marshalling data between userspace clients and kernel InfiniBand drivers. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-09-07[PATCH] IB: clean up user access config optionsJames Lentini1-2/+3
Add a new config option INFINIBAND_USER_MAD to control whether we build ib_umad. Change INFINIBAND_USER_VERBS to INFINIBAND_USER_ACCESS, and have it control ib_ucm and ib_uat as well as ib_uverbs. Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-08-26[PATCH] IB: move include files to include/rdmaRoland Dreier1-2/+0
Move the InfiniBand headers from drivers/infiniband/include to include/rdma. This allows InfiniBand-using code to live elsewhere, and lets us remove the ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles. Signed-off-by: Roland Dreier <rolandd@cisco.com>
2005-07-27[PATCH] IB: Hook up userspace CM to the make systemHal Rosenstock1-1/+3
Hook up userspace CM to the make system Signed-off-by: Libor Michalek <libor@topspin.com> Signed-off-by: Hal Rosenstock <halr@voltaire.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27[PATCH] IB: Add the kernel CM implementationHal Rosenstock1-1/+4
Add the kernel CM implementation Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Hal Rosenstock <halr@voltaire.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-27[PATCH] IB: Add RMPP implementationHal Rosenstock1-1/+1
Add RMPP implementation. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Hal Rosenstock <halr@voltaire.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-07[PATCH] IB uverbs: hook up Kconfig/MakefileRoland Dreier1-1/+4
Hook up InfiniBand userspace verbs to Kconfig and the make system. Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-04-16Linux-2.6.12-rc2Linus Torvalds1-0/+12
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!