aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2015-08-30IB/mlx4: Demote mcg message from warning to debugJack Morgenstein1-2/+6
The mcg "too many pending requests" warning message fills the log when OpenSM is downed. Demote the message from warning level to debug level. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/mlx4: Fix potential deadlock when sending mad to wireJack Morgenstein1-3/+4
send_mad_to_wire takes the same spinlock that is taken in the interrupt context. Therefore, it needs irqsave/restore. Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Remove needless bracketizationDoug Ledford1-7/+4
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30RDMA/ocrdma: Incorporate the moving of GID Table mgmt to IB/CoreSomnath Kotur5-233/+60
1.Change query_gid hook to return value from IB/Core GID management APIs. 2.Get rid of all the netdev notifier chain subscription code as well as maintenance of SGID Table in memory. 3.Implement get_netdev hook in driver. Signed-off-by: Somnath Kotur <somnath.kotur@avagotech.com> Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/mlx4: Replace mechanism for RoCE GID managementMoni Shoua4-494/+35
Manage RoCE gid table with logic in IB/core, which is common to all vendors, and remove the mechanism from the mlx4 IB driver. Since management of the GID cache may lead to index mismatch with the hardware GID table, a translation between indexes is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/mlx4: Implement ib_device callbacksMoni Shoua5-4/+257
get_netdev: get the net_device on the physical port of the IB transport port. In port aggregation mode it is required to return the netdev of the active port. modify_gid: note for a change in the RoCE gid cache. Handle this by writing to the harsware GID table. It is possible that indexes in cahce and hardware tables won't match so a translation is required when modifying a QP or creating an address handle. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30net/mlx4: Postpone the registration of net_deviceMoni Shoua3-15/+25
The mlx4 network driver was registered in the context of the 'add' function of the core driver (called when HW should be registered). This makes the netdev event NETDEV_REGISTER to be sent in a context where the answer to get_protocol_dev() callback returns NULL. This may be confusing to listeners of netdev events. This patch is a preparation to the patch that implements the get_netdev() callback in the IB/mlx4 driver. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Add RoCE table bonding supportMatan Barak1-20/+285
Handling bonding and other devices require us to all all GIDs of the net-devices which are upper-devices of the RoCE port related net-device. Active-backup configurations imposes even more challenges as the default GID should only be set on the active devices (this is necessary as otherwise the same MAC could be used for several slaves and thus several slaves will have identical GIDs). Managing these configurations are done by listening to: (a) NETDEV_CHANGEUPPER event (1) if a related net-device is linked, delete all inactive slaves default GIDs and add the upper device GIDs. (2) if a related net-device is unlinked, delete all upper GIDs and add the default GIDs. (b) NETDEV_BONDING_FAILOVER: (1) delete the bond GIDs from inactive slaves (2) delete the inactive slave's default GIDs (3) Add the bond GIDs to the active slave. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: missing curly braces in ib_find_gid()Dan Carpenter1-1/+2
Smatch says that, based on the indenting, we should probably add curly braces here. Fixes: 03db3a2d81e6 ('IB/core: Add RoCE GID table management') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Add RoCE GID table managementMatan Barak6-108/+1338
RoCE GIDs are based on IP addresses configured on Ethernet net-devices which relate to the RDMA (RoCE) device port. Currently, each of the low-level drivers that support RoCE (ocrdma, mlx4) manages its own RoCE port GID table. As there's nothing which is essentially vendor specific, we generalize that, and enhance the RDMA core GID cache to do this job. In order to populate the GID table, we listen for events: (a) netdev up/down/change_addr events - if a netdev is built onto our RoCE device, we need to add/delete its IPs. This involves adding all GIDs related to this ndev, add default GIDs, etc. (b) inet events - add new GIDs (according to the IP addresses) to the table. For programming the port RoCE GID table, providers must implement the add_gid and del_gid callbacks. RoCE GID management requires us to state the associated net_device alongside the GID. This information is necessary in order to manage the GID table. For example, when a net_device is removed, its associated GIDs need to be removed as well. RoCE mandates generating a default GID for each port, based on the related net-device's IPv6 link local. In contrast to the GID based on the regular IPv6 link-local (as we generate GID per IP address), the default GID is also available when the net device is down (in order to support loopback). Locking is done as follows: The patch modify the GID table code both for new RoCE drivers implementing the add_gid/del_gid callbacks and for current RoCE and IB drivers that do not. The flows for updating the table are different, so the locking requirements are too. While updating RoCE GID table, protection against multiple writers is achieved via mutex_lock(&table->lock). Since writing to a table requires us to find an entry (possible a free entry) in the table and then modify it, this mutex protects both the find_gid and write_gid ensuring the atomicity of the action. Each entry in the GID cache is protected by rwlock. In RoCE, writing (usually results from netdev notifier) involves invoking the vendor's add_gid and del_gid callbacks, which could sleep. Therefore, an invalid flag is added for each entry. Updates for RoCE are done via a workqueue, thus sleeping is permitted. In IB, updates are done in write_lock_irq(&device->cache.lock), thus write_gid isn't allowed to sleep and add_gid/del_gid are not called. When passing net-device into/out-of the GID cache, the device is always passed held (dev_hold). The code uses a single work item for updating all RDMA devices, following a netdev or inet notifier. The patch moves the cache from being a client (which was incorrect, as the cache is part of the IB infrastructure) to being explicitly initialized/freed when a device is registered/removed. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Make ib_alloc_device init the kobjectJason Gunthorpe4-87/+68
This gets rid of the weird in-between state where struct ib_device was allocated but the kobject didn't work. Consequently ib_device_release is now guaranteed to be called in all situations and we needn't duplicate its kfrees on error paths. Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30net/bonding: Export bond_option_active_slave_get_rcuMatan Barak2-13/+7
Some consumers of the netdev events API would like to know who is the active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER events occur. For example, when managing RoCE GIDs, GIDs based on the bond's ips should only be set on the port which corresponds to active slave netdevice. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30net: Add info for NETDEV_CHANGEUPPER eventMatan Barak2-2/+24
Some consumers of NETDEV_CHANGEUPPER event would like to know which upper device was linked/unlinked and what operation was carried. Add information in the notifier info block for that purpose. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30net/ipv6: Export addrconf_ifid_eui48Matan Barak2-31/+31
For loopback purposes, RoCE devices should have a default GID in the port GID table, even when the interface is down. In order to do so, we use the IPv6 link local address which would have been genenrated for the related Ethernet netdevice when it goes up as a default GID. addrconf_ifid_eui48 is used to gernerate this address, export it. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Drop ib_alloc_fast_reg_mrSagi Grimberg2-40/+3
Fully replaced by a more generic and suitable ib_alloc_mr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/hfi1: Support ib_alloc_mr verbMike Marciniszyn3-3/+11
Ported from upstream qib commit 68c02e232b8a ("qib: Support ib_alloc_mr verb") Tested-by: Jubin John <jubin.john@intel.com> Reviewed-by: Jubin John <jubin.john@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30qib: Support ib_alloc_mr verbSagi Grimberg3-4/+11
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30nes: Support ib_alloc_mr verbSagi Grimberg1-5/+14
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30cxgb3: Support ib_alloc_mr verbSagi Grimberg1-4/+10
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30iw_cxgb4: Support ib_alloc_mr verbSagi Grimberg3-5/+13
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30ocrdma: Support ib_alloc_mr verbSagi Grimberg3-5/+12
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30mlx4: Support ib_alloc_mr verbSagi Grimberg3-6/+12
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30mlx5: Drop mlx5_ib_alloc_fast_reg_mrSagi Grimberg3-47/+0
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30RDS: Convert to ib_alloc_mrSagi Grimberg2-4/+6
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30svcrdma: limit FRMR page list lengths to device maxSteve Wise1-2/+4
Svcrdma was incorrectly allocating fastreg MRs and page lists using RPCSVC_MAXPAGES, which can exceed the device capabilities. So limit the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30xprtrdma, svcrdma: Convert to ib_alloc_mrSagi Grimberg2-4/+4
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/srp: Convert to ib_alloc_mrSagi Grimberg1-1/+2
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30iser-target: Convert to ib_alloc_mrSagi Grimberg1-2/+4
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/iser: Convert to ib_alloc_mrSagi Grimberg1-3/+4
Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB: Modify ib_create_mr APISagi Grimberg7-46/+58
Use ib_alloc_mr with specific parameters. Change the existing callers. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Get rid of redundant verb ib_destroy_mrSagi Grimberg7-61/+15
This was added in a thought of uniting all mr allocation and deallocation routines but the fact is we have a single deallocation routine already, ib_dereg_mr. And, move mlx5_ib_destroy_mr specific logic into mlx5_ib_dereg_mr (includes only signature stuff for now). And, fixup the only callers (iser/isert) accordingly. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cma: Fix net_dev reference leak with failed requestsHaggai Eran1-0/+4
When no matching listening ID is found for a given request, the net_dev that was used to find the request isn't released. Fixes: 0b3ca768fcb0 ("IB/cma: Use found net_dev for passive connections") Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cm: Remove compare_data checksHaggai Eran5-107/+23
Now that there are no ib_cm clients using the compare_data feature for matching IB CM requests' private data, remove the compare_data parameter of ib_cm_listen and remove the code implementing the feature. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cma: Share ib_cm_ids between rdma_cm_idsHaggai Eran1-55/+4
Use ib_cm_insert_listen to create listening IB CM IDs or share existing ones if needed. When given a request on a specific CM ID, the code now matches the request to the RDMA CM ID based on the request parameters, so it no longer needs to rely on the ib_cm's private data matching capabilities. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cma: Use found net_dev for passive connectionsHaggai Eran1-27/+49
When receiving a new connection in cma_req_handler, we actually already know the net_dev that is used for the connection's creation. Instead of calling cma_translate_addr to resolve the new connection id's source address, just use the net_dev that was found. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cma: Validate routing of incoming requestsHaggai Eran1-3/+92
Pass incoming request parameters through the relevant IPv4/IPv6 routing tables and make sure the network stack is configured to handle such requests. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cma: Add net_dev and private data checks to RDMA CMHaggai Eran1-3/+185
Instead of relying on a the ib_cm module to check an incoming CM request's private data header, add these checks to the RDMA CM module. This allows a following patch to to clean up the ib_cm interface and remove the code that looks into the private headers. It will also allow supporting namespaces in RDMA CM by making these checks namespace aware later on. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cm: Expose BTH P_Key in CM and SIDR request eventsHaggai Eran2-0/+26
The rdma_cm module will later use the P_Key from the BTH to de-mux requests. See discussion at: http://www.spinics.net/lists/netdev/msg336067.html Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Liran Liss <liranl@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cma: Helper functions to access port space IDRsHaggai Eran1-21/+60
Add helper functions to access the IDRs by port-space and port number. Pass around the port-space enum in cma.c instead of using pointers to port-space IDRs. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cma: Refactor RDMA IP CM private-data parsing codeHaggai Eran1-65/+105
When receiving a connection request, rdma_cm needs to associate the request with a network device, in order to disambiguate requests. To do this, it needs to know the request's destination IP. For this the module needs to allow getting this information from the private data in the request packet, instead of relying on the information already being in the listening RDMA CM ID. When creating a new incoming connection ID, the code in cma_save_ip{4,6}_info can no longer rely on the listener's private data to find the port number, so it reads it from the requested service ID. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cm: Share listening CM IDsHaggai Eran2-6/+120
Enabling network namespaces for RDMA CM will allow processes on different namespaces to listen on the same port. In order to leave namespace support out of the CM layer, this requires that multiple RDMA CM IDs will be able to share a single CM ID. This patch adds infrastructure to retrieve an existing listening ib_cm_id, based on its device and service ID, or create a new one if one does not already exist. It also adds a reference count for such instances (cm_id_private.listen_sharecount), and prevents cm_destroy_id from destroying a CM if it is still shared. See the relevant discussion [1]. [1] Re: [PATCH v3 for-next 05/13] IB/cm: Reference count ib_cm_ids http://www.spinics.net/lists/netdev/msg328860.html Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/cm: Expose service ID in request eventsHaggai Eran2-0/+4
Expose the service ID on an incoming CM or SIDR request to the event handler. This will allow the RDMA CM module to de-multiplex connection requests based on the information encoded in the service ID. Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/ipoib: Return IPoIB devices matching connection parametersGuy Shapiro1-1/+228
Implement the get_net_device_by_port_pkey_ip callback that returns network device to ib_core according to connection parameters. Check the ipoib device and iterate over all child devices to look for a match. For each IPoIB device we iterate through all upper devices when searching for a matching IP, in order to support bonding. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Find the network device matching connection parametersYotam Kenneth2-0/+73
In the case of IPoIB, and maybe in other cases, the network device is managed by an upper-layer protocol (ULP). In order to expose this network device to other users of the IB device, let ULPs implement a callback that returns network device according to connection parameters. The IB device and port, together with the P_Key and the GID should be enough to uniquely identify the ULP net device. However, in current kernels there can be multiple IPoIB interfaces created with the same GID. Furthermore, such configuration may be desireable to support ipvlan-like configurations for RDMA CM with IPoIB. To resolve the device in these cases the code will also take the IP address as an additional input. Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: lock client data with lists_rwsemHaggai Eran16-52/+82
An ib_client callback that is called with the lists_rwsem locked only for read is protected from changes to the IB client lists, but not from ib_unregister_device() freeing its client data. This is because ib_unregister_device() will remove the device from the device list with lists_rwsem locked for write, but perform the rest of the cleanup, including the call to remove() without that lock. Mark client data that is undergoing de-registration with a new going_down flag in the client data context. Lock the client data list with lists_rwsem for write in addition to using the spinlock, so that functions calling the callback would be able to lock only lists_rwsem for read and let callbacks sleep. Since ib_unregister_client() now marks the client data context, no need for remove() to search the context again, so pass the client data directly to remove() callbacks. Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30IB/core: Add rwsem to allow reading device list or client listHaggai Eran1-12/+28
Currently the RDMA subsystem's device list and client list are protected by a single mutex. This prevents adding user-facing APIs that iterate these lists, since using them may cause a deadlock. The patch attempts to solve this problem by adding a read-write semaphore to protect the lists. Readers now don't need the mutex, and are safe just by read-locking the semaphore. The ib_register_device, ib_register_client, ib_unregister_device, and ib_unregister_client functions are modified to lock the semaphore for write during their respective list modification. Also, in order to make sure client callbacks are called only between add() and remove() calls, the code is changed to only add items to the lists after the add() calls and remove from the lists before the remove() calls. This patch attempts to solve a similar need [1] that was seen in the RoCE v2 patch series. [1] http://www.spinics.net/lists/linux-rdma/msg24733.html Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Matan Barak <matanb@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28RDMA/Core: remove rdma_cap_read_multi_sge() helperSteve Wise1-28/+0
This functionality already exists via the max_sge_rd device capability. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28svcrdma: Use max_sge_rd for destination read depthsSteve Wise3-11/+6
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28ipath,qib: Expose max_sge_rd correctlySteve Wise2-0/+2
Applications must not assume that max_sge and max_sge_rd are the same, Hence expose max_sge_rd correctly as well. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-28mlx4, mlx5, mthca: Expose max_sge_rd correctlySagi Grimberg3-0/+3
Applications must not assume that max_sge and max_sge_rd are the same, Hence expose max_sge_rd correctly as well. Reported-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>