aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/core (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-04-23IB/core: Fix deleting default GIDs when changing mac adddressParav Pandit2-25/+29
Before [1], When MAC address of the netdevice is changed, default GID is supposed to get deleted and added back which affects the node and/or port GUID in below sequence. netdevice_event() -> NETDEV_CHANGEADDR default_del_cmd() del_netdev_default_ips() bond_delete_netdev_default_gids() ib_cache_gid_set_default_gid() ib_cache_gid_del() add_cmd() [..] However, ib_cache_gid_del() was not getting invoked in non bonding scenarios because event_ndev and rdma_ndev are same. Therefore, fix such condition to ignore checking upper device when event ndev and rdma_dev are same; similar to bond_set_netdev_default_gids(). Which this fix ib_cache_gid_del() is invoked correctly; however ib_cache_gid_del() doesn't find the default GID for deletion because find_gid() was given default_gid = false with GID_ATTR_FIND_MASK_DEFAULT set. But it was getting overwritten by ib_cache_gid_set_default_gid() later on as part of add_cmd(). Therefore, mac address change used to work for default GID. With refactor series [1], this incorrect behavior is detected. Therefore, when deleting default GID, set default_gid and set MASK flag. when deleting IP based GID, clear default_gid and set MASK flag. [1] https://patchwork.kernel.org/patch/10319151/ Fixes: 238fdf48f2b5 ("IB/core: Add RoCE table bonding support") Fixes: 598ff6bae689 ("IB/core: Refactor GID modify code for RoCE") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-23IB/core: Fix to avoid deleting IPv6 look alike default GIDsParav Pandit1-4/+13
When IPv6 link local address is removed, if it matches with the default GID, default GID(s)s gets removed which may not be a desired behavior. This behavior is introduced by refactor work in Fixes tag. When IPv6 link address is removed, removing its equivalent RoCEv2 GID which exactly matches with default RoCEv2 GID, is right thing to do. However achieving it correctly requires lot more changes, likely in roce_gid_mgmt.c and core/cache.c. This should be done as independent patch. Therefore, this patch preserves behavior of not deleteing default GIDs. This is done by providing explicit hint to consider default GID property using mask and default_gid; similar to add_gid(). Fixes: 598ff6bae68 ("IB/core: Refactor GID modify code for RoCE") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-23IB/core: Don't allow default GID addition at non reseved slotsParav Pandit1-8/+12
Default GIDs are marked reserved at the start of the GID table at index 0 and 1 by gid_table_reserve_default(). Currently when default GID is requested, it can still allocates an empty slot which was not marked as RESERVED for default GID, which is incorrect. At least in current code flow of roce_gid_mgmt.c, in theory we can still request to allocate more than one/two default GIDs depending on how upper devices are setup. Therefore, it is better for cache layer to only allow our reserved slots to be used by default GID allocation requests. Fixes: 598ff6bae689 ("IB/core: Refactor GID modify code for RoCE") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-23RDMA/ucma: Allow resolving address w/o specifying source addressRoland Dreier1-1/+1
The RDMA CM will select a source device and address by consulting the routing table if no source address is passed into rdma_resolve_address(). Userspace will ask for this by passing an all-zero source address in the RESOLVE_IP command. Unfortunately the new check for non-zero address size rejects this with EINVAL, which breaks valid userspace applications. Fix this by explicitly allowing a zero address family for the source. Fixes: 2975d5de6428 ("RDMA/ucma: Check AF family prior resolving address") Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-19RDMA/ucma: Check for a cm_id->device in all user calls that need itJason Gunthorpe1-12/+24
This is done by auditing all callers of ucma_get_ctx and switching the ones that unconditionally touch ->device to ucma_get_ctx_dev. This covers a little less than half of the call sites. The 11 remaining call sites to ucma_get_ctx() were manually audited. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-17IB/uverbs: Add missing braces in anonymous union initializersGeert Uytterhoeven1-6/+6
With gcc-4.1.2: drivers/infiniband/core/uverbs_std_types_flow_action.c:366: error: unknown field ‘ptr’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:367: error: unknown field ‘type’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:367: warning: missing braces around initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:367: warning: (near initialization for ‘uverbs_flow_action_esp_keymat[0].<anonymous>.<anonymous>’) drivers/infiniband/core/uverbs_std_types_flow_action.c:368: error: unknown field ‘min_len’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:368: warning: excess elements in union initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:368: warning: (near initialization for ‘uverbs_flow_action_esp_keymat[0].<anonymous>’) drivers/infiniband/core/uverbs_std_types_flow_action.c:368: error: unknown field ‘len’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:368: warning: excess elements in union initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:368: warning: (near initialization for ‘uverbs_flow_action_esp_keymat[0].<anonymous>’) drivers/infiniband/core/uverbs_std_types_flow_action.c:369: error: unknown field ‘flags’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:369: warning: excess elements in union initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:369: warning: (near initialization for ‘uverbs_flow_action_esp_keymat[0].<anonymous>’) drivers/infiniband/core/uverbs_std_types_flow_action.c:376: error: unknown field ‘ptr’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:377: error: unknown field ‘type’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:377: warning: missing braces around initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:377: warning: (near initialization for ‘uverbs_flow_action_esp_replay[0].<anonymous>.<anonymous>’) drivers/infiniband/core/uverbs_std_types_flow_action.c:379: error: unknown field ‘len’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:379: warning: excess elements in union initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:379: warning: (near initialization for ‘uverbs_flow_action_esp_replay[0].<anonymous>’) drivers/infiniband/core/uverbs_std_types_flow_action.c:383: error: unknown field ‘ptr’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:384: error: unknown field ‘type’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:385: error: unknown field ‘min_len’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:385: warning: excess elements in union initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:385: warning: (near initialization for ‘uverbs_flow_action_esp_replay[1].<anonymous>’) drivers/infiniband/core/uverbs_std_types_flow_action.c:385: error: unknown field ‘len’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:385: warning: excess elements in union initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:385: warning: (near initialization for ‘uverbs_flow_action_esp_replay[1].<anonymous>’) drivers/infiniband/core/uverbs_std_types_flow_action.c:386: error: unknown field ‘flags’ specified in initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:386: warning: excess elements in union initializer drivers/infiniband/core/uverbs_std_types_flow_action.c:386: warning: (near initialization for ‘uverbs_flow_action_esp_replay[1].<anonymous>’) Add the missing braces to fix this. Fixes: 2eb9beaee5d7 ("IB/uverbs: Add flow_action create and destroy verbs") Fixes: 7d12f8d5a164 ("IB/uverbs: Add modify ESP flow_action") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-16RDMA/ucma: ucma_context reference leak in error pathShamir Rabinovitch1-3/+3
Validating input parameters should be done before getting the cm_id otherwise it can leak a cm_id reference. Fixes: 6a21dfc0d0db ("RDMA/ucma: Limit possible option size") Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-06Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds28-1079/+2579
Pull rdma updates from Jason Gunthorpe: "Doug and I are at a conference next week so if another PR is sent I expect it to only be bug fixes. Parav noted yesterday that there are some fringe case behavior changes in his work that he would like to fix, and I see that Intel has a number of rc looking patches for HFI1 they posted yesterday. Parav is again the biggest contributor by patch count with his ongoing work to enable container support in the RDMA stack, followed by Leon doing syzkaller inspired cleanups, though most of the actual fixing went to RC. There is one uncomfortable series here fixing the user ABI to actually work as intended in 32 bit mode. There are lots of notes in the commit messages, but the basic summary is we don't think there is an actual 32 bit kernel user of drivers/infiniband for several good reasons. However we are seeing people want to use a 32 bit user space with 64 bit kernel, which didn't completely work today. So in fixing it we required a 32 bit rxe user to upgrade their userspace. rxe users are still already quite rare and we think a 32 bit one is non-existing. - Fix RDMA uapi headers to actually compile in userspace and be more complete - Three shared with netdev pull requests from Mellanox: * 7 patches, mostly to net with 1 IB related one at the back). This series addresses an IRQ performance issue (patch 1), cleanups related to the fix for the IRQ performance problem (patches 2-6), and then extends the fragmented completion queue support that already exists in the net side of the driver to the ib side of the driver (patch 7). * Mostly IB, with 5 patches to net that are needed to support the remaining 10 patches to the IB subsystem. This series extends the current 'representor' framework when the mlx5 driver is in switchdev mode from being a netdev only construct to being a netdev/IB dev construct. The IB dev is limited to raw Eth queue pairs only, but by having an IB dev of this type attached to the representor for a switchdev port, it enables DPDK to work on the switchdev device. * All net related, but needed as infrastructure for the rdma driver - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers - SRP performance updates - IB uverbs write path cleanup patch series from Leon - Add RDMA_CM support to ib_srpt. This is disabled by default. Users need to set the port for ib_srpt to listen on in configfs in order for it to be enabled (/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port) - TSO and Scatter FCS support in mlx4 - Refactor of modify_qp routine to resolve problems seen while working on new code that is forthcoming - More refactoring and updates of RDMA CM for containers support from Parav - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device memory' user API features - Infrastructure updates for the new IOCTL interface, based on increased usage - ABI compatibility bug fixes to fully support 32 bit userspace on 64 bit kernel as was originally intended. See the commit messages for extensive details - Syzkaller bugs and code cleanups motivated by them" * tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits) IB/rxe: Fix for oops in rxe_register_device on ppc64le arch IB/mlx5: Device memory mr registration support net/mlx5: Mkey creation command adjustments IB/mlx5: Device memory support in mlx5_ib net/mlx5: Query device memory capabilities IB/uverbs: Add device memory registration ioctl support IB/uverbs: Add alloc/free dm uverbs ioctl support IB/uverbs: Add device memory capabilities reporting IB/uverbs: Expose device memory capabilities to user RDMA/qedr: Fix wmb usage in qedr IB/rxe: Removed GID add/del dummy routines RDMA/qedr: Zero stack memory before copying to user space IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR IB/mlx5: Add information for querying IPsec capabilities IB/mlx5: Add IPsec support for egress and ingress {net,IB}/mlx5: Add ipsec helper IB/mlx5: Add modify_flow_action_esp verb IB/mlx5: Add implementation for create and destroy action_xfrm IB/uverbs: Introduce ESP steering match filter IB/uverbs: Add modify ESP flow_action ...
2018-04-05IB/uverbs: Add device memory registration ioctl supportAriel Levkovich4-12/+154
Adding new ioctl method for the MR object - REG_DM_MR. This command can be used by users to register an allocated device memory buffer as an MR and receive lkey and rkey to be used within work requests. It is added as a new method under the MR object and using a new ib_device callback - reg_dm_mr. The command creates a standard ib_mr object which represents the registered memory. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05IB/uverbs: Add alloc/free dm uverbs ioctl supportAriel Levkovich4-2/+112
This change adds uverbs support for allocation/freeing of device memory commands. A new uverbs object is defined of type idr to represent and track the new resource type allocation per context. The API requires provider driver to implement 2 new ib_device callbacks - one for allocation and one for deallocation which return and accept (respectively) the ib_dm object which represents the allocated memory on the device. The support is added via the ioctl command infrastructure only. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-05IB/uverbs: Add device memory capabilities reportingAriel Levkovich1-0/+6
This change allows vendors to report device memory capability max_dm_size - to user via uverbs command. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Introduce ESP steering match filterMatan Barak2-0/+12
Adding a new ESP steering match filter that could match against spi and seq used in IPSec protocol. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add modify ESP flow_actionMatan Barak1-5/+92
flow_actions of ESP type could be modified during runtime. This could be common for example when ESN should be changed. Adding a new UVERBS_FLOW_ACTION_ESP_MODIFY method for changing ESP parameters of an existing ESP flow_action. The new method uses the UVERBS_FLOW_ACTION_ESP_CREATE attributes, but adds a new IB_FLOW_ACTION_ESP_FLAGS_MOD_ESP_ATTRS which means ESP_ATTRS should be changed. In addition, we add a new FLOW_ACTION_ESP_REPLAY_NONE replay type that could be used when one wants to disable a replay protection over a specific flow_action. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add action_handle flow steering specificationMatan Barak3-8/+100
Binding a flow_action to flow steering rule requires using a new specification. Therefore, adding such an IB_FLOW_SPEC_ACTION_HANDLE flow specification. Flow steering rules could use flow_action(s) and as of that we need to avoid deleting flow_action(s) as long as they're being used. Moreover, when the attached rules are deleted, action_handle reference count should be decremented. Introducing a new mechanism of flow resources to keep track on the attached action_handle(s). Later on, this mechanism should be extended to other attached flow steering resources like flow counters. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add flow_action create and destroy verbsMatan Barak4-2/+363
A verbs application may receive and transmits packets using a data path pipeline. Sometimes, the first stage in the receive pipeline or the last stage in the transmit pipeline involves transforming a packet, either in order to make it easier for later stages to process it or to prepare it for transmission over the wire. Such transformation could be stripping/encapsulating the packet (i.e. vxlan), decrypting/encrypting it (i.e. ipsec), altering headers, doing some complex FPGA changes, etc. Some hardware could do such transformations without software data path intervention at all. The flow steering API supports steering a packet (either to a QP or dropping it) and some simple packet immutable actions (i.e. tagging a packet). Complex actions, that may change the packet, could bloat the flow steering API extensively. Sometimes the same action should be applied to several flows. In this case, it's easier to bind several flows to the same action and modify it than change all matching flows. Introducing a new flow_action object that abstracts any packet transformation (out of a standard and well defined set of actions). This flow_action object could be tied to a flow steering rule via a new specification. Currently, we support esp flow_action, which encrypts or decrypts a packet according to the given parameters. However, we present a flexible schema that could be used to other transformation actions tied to flow rules. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Refactor kern_spec_to_ib_spec_filterMatan Barak2-16/+37
The current implementation of kern_spec_to_ib_spec_filter, which takes a uAPI based flow steering specification and creates the respective kernel API flow steering structure, gets a ib_uverbs_flow_spec structure. The new flow_action uAPI gets a match mask and filter from user-space which aren't encoded in the flow steering's ib_uverbs_flow_spec structure. Exporting the logic out of kern_spec_to_ib_spec_filter to get user-space blobs rather than ib_uverbs_flow_spec structure. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-04IB/uverbs: Add enum attribute type to ioctl() interfaceMatan Barak1-9/+30
Methods sometimes need to get one attribute out of a group of pre-defined attributes. This is an enum-like behavior. Since this is a common requirement, we add a new ENUM attribute to the generic uverbs ioctl() layer. This attribute is embedded in methods, like any other attributes we currently have. ENUM attributes point to an array of standard UVERBS_ATTR_PTR_IN. The user-space encodes the enum's attribute id in the id field and the internal PTR_IN attr id in the enum_data.elem_id field. This ENUM attribute could be shared by several attributes and it can get UVERBS_ATTR_SPEC_F_MANDATORY flag, stating this attribute must be supported by the kernel, like any other attribute. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA: Use ib_gid_attr during GID modificationParav Pandit1-3/+2
Now that ib_gid_attr contains device, port and index, simplify the provider APIs add_gid() and del_gid() to use device, port and index fields from the ib_gid_attr attributes structure. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/core: Refactor GID modify code for RoCEParav Pandit2-234/+268
Code is refactored to prepare separate functions for RoCE which can do more complex operations related to reference counting, while still maintainining code readability. This includes (a) Simplification to not perform netdevice checks and modifications for IB link layer. (b) Do not add RoCE GID entry which has NULL netdevice; instead return an error. (c) If GID addition fails at provider level add_gid(), do not add the entry in the cache and keep the entry marked as INVALID. (d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they can be used even for modifying default GIDs. This avoid some code duplication in modifying default GIDs. (e) find_gid() routine refers to the data entry flags to qualify a GID as valid or invalid GID rather than depending on attributes and zeroness of the GID content. (f) gid_table_reserve_default() sets the GID default attribute at beginning while setting up the GID table. There is no need to use default_gid flag in low level functions such as write_gid(), add_gid(), del_gid(), as they never need to update the DEFAULT property of the GID entry while during GID table update. As as result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer searchable as described below. A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB spec version 1.3 section 4.1.1, point (6) whose snippet is below. "The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as the Reserved GID. It shall never be assigned to any endport. It shall not be used as a destination address or in a global routing header (GRH)." GID table cache now only stores valid GID entries. Before this patch, Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using ib_find_cached_gid_by_port() and other similar find routines. Zero GID is no longer searchable as it shall not to be present in GRH or path recored entry as described in IB spec version 1.3 section 4.1.1, point (6), section 12.7.10 and section 12.7.20. ib_cache_update() is simplified to check link layer once, use unified locking scheme for all link layers, removed temporary gid table allocation/free logic. Additionally, (a) Expand ib_gid_attr to store port and index so that GID query routines can get port and index information from the attribute structure. (b) Expand ib_gid_attr to store device as well so that in future code when GID reference counting is done, device is used to reach back to the GID table entry. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03IB/core: Simplify ib_query_gid to always refer to cacheParav Pandit2-14/+5
Currently following inconsistencies exist. 1. ib_query_gid() returns GID from the software cache for a RoCE port and returns GID from the HCA for an IB port. This is incorrect because software GID cache is maintained regardless of HCA port type. 2. GID is queries from the HCA via ib_query_gid and updated in the software cache for IB link layer. Both of them might not be in sync. ULPs such as SRP initiator, SRP target, IPoIB driver have historically used ib_query_gid() API to query the GID. However CM used cached version during CM processing, When software cache was introduced, this inconsitency remained. In order to simplify, improve readability and avoid link layer specific above inconsistencies, this patch brings following changes. 1. ib_query_gid() always refers to the cache layer regardless of link layer. 2. cache module who reads the GID entry from HCA and builds the cache, directly invokes the HCA provider verb's query_gid() callback function. 3. ib_query_port() is being called in early stage where GID cache is not yet build while reading port immutable property. Therefore it needs to read the default GID from the HCA for IB link layer to publish the subnet prefix. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit1-1/+3
ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-03RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA deviceRoland Dreier1-0/+3
Check to make sure that ctx->cm_id->device is set before we use it. Otherwise userspace can trigger a NULL dereference by doing RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device. Cc: <stable@vger.kernel.org> Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-04-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-22/+53
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-29RDMA/cma: Move rdma_cm_state to cma_priv.hParav Pandit1-0/+14
rdma_cm_state enum is internal to rdma_cm kernel module. It is not required to expose state enums to ULP modules. So lets keep its scope limited to rdma_cm module in cma_priv.h file. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29IB/addr: Constify dst_entry pointerParav Pandit1-5/+7
Make dst_entry pointer as const struct dst_entry* to improve code readablity to make sure that dst structure fields are not modified by various functions which are using it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29RDMA: Use u64_to_user_ptr everywhereJason Gunthorpe3-20/+20
This is already used in many places, get the rest of them too, only to make the code a bit clearer & simpler. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29RDMA/nldev: Provide netdevice name and indexLeon Romanovsky1-5/+26
Export the net device name and index to easily find connection between IB devices and relevant net devices. We also updated the comment regarding the devices without FW. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29net: Introduce net_rwsem to protect net_namespace_listKirill Tkhai1-0/+2
rtnl_lock() is used everywhere, and contention is very high. When someone wants to iterate over alive net namespaces, he/she has no a possibility to do that without exclusive lock. But the exclusive rtnl_lock() in such places is overkill, and it just increases the contention. Yes, there is already for_each_net_rcu() in kernel, but it requires rcu_read_lock(), and this can't be sleepable. Also, sometimes it may be need really prevent net_namespace_list growth, so for_each_net_rcu() is not fit there. This patch introduces new rw_semaphore, which will be used instead of rtnl_mutex to protect net_namespace_list. It is sleepable and allows not-exclusive iterations over net namespaces list. It allows to stop using rtnl_lock() in several places (what is made in next patches) and makes less the time, we keep rtnl_mutex. Here we just add new lock, while the explanation of we can remove rtnl_lock() there are in next patches. Fine grained locks generally are better, then one big lock, so let's do that with net_namespace_list, while the situation allows that. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-28RDMA/CMA: Add rdma_port_space to UAPISteve Wise1-15/+17
Since the rdma_port_space enum is being passed between user and kernel for user cm_id setup, we need it in a UAPI header. So add it to rdma_user_cm.h. This also fixes the cm_id restrack changes which pass up the port space value via the RDMA_NLDEV_ATTR_RES_PS attribute. Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-28RDMA/ucma: Introduce safer rdma_addr_size() variantsRoland Dreier2-17/+33
There are several places in the ucma ABI where userspace can pass in a sockaddr but set the address family to AF_IB. When that happens, rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6, and the ucma kernel code might end up copying past the end of a buffer not sized for a struct sockaddr_ib. Fix this by introducing new variants int rdma_addr_size_in6(struct sockaddr_in6 *addr); int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr); that are type-safe for the types used in the ucma ABI and return 0 if the size computed is bigger than the size of the type passed in. We can use these new variants to check what size userspace has passed in before copying any addresses. Reported-by: <syzbot+6800425d54ed3ed8135d@syzkaller.appspotmail.com> Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Refer to RoCE port property to decide building cacheParav Pandit1-2/+2
IB core maintains the GID cache entries for the GID table. This cache table has to be maintained regardless of HCA's support of GID table. For IB and iWarp ports, cache is created by querying the HCA. For RoCE cache is created based on netdev events. Therefore just refer to the RoCE port property of the {device, port} to decide whether to build cache by querying HCA or from netdev events. There is no need to check if HCA support GID table or not. ib_cache_update() referred to RoCE attribute before validating port. Though in all current callers port is valid, it is incorrect to query RoCE port property before validating the port. Therefore, rdma_protocol_roce() check is done after rdma_is_port_valid() verifies that port is valid. Fixes: 115b68aa6ea4 ("IB/ocrdma: Removed GID add/del null routines") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Search GID only for IB link layerParav Pandit1-1/+1
Even though API is only used by IPoIB driver, its incorrect to refer RoCE GID table property to search for GID. Look for only IB link layer to search for the GID. Fixes: dbb12562f7c2 ("IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Refer to RoCE port property instead of GID table propertyParav Pandit1-1/+1
ib_find_gid_by_filter() searches GID with filter only for RoCE link layer regardless of HCA's support for GID table. Therefore, right way to lookup is compare RoCE port property and not the GID table property. Fixes: 99b27e3b5da0 ("IB/cache: Add ib_find_gid_by_filter cache API") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Generate GID change event regardless of RoCE GID table propertyParav Pandit1-7/+5
Due to following reasons, GID table event is generated regardless of GID table property. 1. GID table cache is maintained at ib core layer regardless of link layer. 2. GID change event has no relation with IB link layer. 3. GID change event also doesn't depend on whether HCA supports GID table or not. Fixes: f3906bd36087 ("IB/core: Refactor GID cache's ib_dispatch_event") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/cm: Block processing alternate path handling RoCE Rx cm messagesParav Pandit1-0/+14
Due to below reasons, it is better to not support alternate path receive messages for RoCE in near term. 1. Alternate path for RoCE is not supported at rdmacm layer. 2. It is not supported in uverbs/core layer for RoCE. 3. Alternate path for IPv6 for link local address cannot resolve route determinstically without a valid incoming interface id whose usecase make sense only with dual port mode. 4. init_av_from_path while processing LAP messages for IB and RoCE can lead to adding duplicate entry of AV into the port list, leads to list corruption. 5. rdma-core userspace a well known userspace implementation has removed support of libucm which use ucm.ko module, which is the only module that can trigger alternate path related messages. 6. ucm kernel module is requested to be removed from the IB core in patch [1]. [1] https://patchwork.kernel.org/patch/10268503/ Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27IB/core: Protect against concurrent access to hardware statsMark Bloch1-6/+28
Currently access to hardware stats buffer isn't protected, this can result in multiple writes and reads at the same time to the same memory location. This can lead to providing an incorrect value to the user. Add a mutex to protect against it. Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/ucma: Fix uABI structure layouts for 32/64 compatJason Gunthorpe1-2/+7
The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles. The kernel requires it to be the expected length or longer so 32 bit builds running on a 64 bit kernel will not work. Retain full compat by having all kernels accept a struct with or without the trailing reserved field. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/ucma: Check that device exists prior to accessing itLeon Romanovsky1-2/+4
Ensure that device exists prior to accessing its properties. Reported-by: <syzbot+71655d44855ac3e76366@syzkaller.appspotmail.com> Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/ucma: Check that device is connected prior to access itLeon Romanovsky1-0/+5
Add missing check that device is connected prior to access it. [ 55.358652] BUG: KASAN: null-ptr-deref in rdma_init_qp_attr+0x4a/0x2c0 [ 55.359389] Read of size 8 at addr 00000000000000b0 by task qp/618 [ 55.360255] [ 55.360432] CPU: 1 PID: 618 Comm: qp Not tainted 4.16.0-rc1-00071-gcaf61b1b8b88 #91 [ 55.361693] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 [ 55.363264] Call Trace: [ 55.363833] dump_stack+0x5c/0x77 [ 55.364215] kasan_report+0x163/0x380 [ 55.364610] ? rdma_init_qp_attr+0x4a/0x2c0 [ 55.365238] rdma_init_qp_attr+0x4a/0x2c0 [ 55.366410] ucma_init_qp_attr+0x111/0x200 [ 55.366846] ? ucma_notify+0xf0/0xf0 [ 55.367405] ? _get_random_bytes+0xea/0x1b0 [ 55.367846] ? urandom_read+0x2f0/0x2f0 [ 55.368436] ? kmem_cache_alloc_trace+0xd2/0x1e0 [ 55.369104] ? refcount_inc_not_zero+0x9/0x60 [ 55.369583] ? refcount_inc+0x5/0x30 [ 55.370155] ? rdma_create_id+0x215/0x240 [ 55.370937] ? _copy_to_user+0x4f/0x60 [ 55.371620] ? mem_cgroup_commit_charge+0x1f5/0x290 [ 55.372127] ? _copy_from_user+0x5e/0x90 [ 55.372720] ucma_write+0x174/0x1f0 [ 55.373090] ? ucma_close_id+0x40/0x40 [ 55.373805] ? __lru_cache_add+0xa8/0xd0 [ 55.374403] __vfs_write+0xc4/0x350 [ 55.374774] ? kernel_read+0xa0/0xa0 [ 55.375173] ? fsnotify+0x899/0x8f0 [ 55.375544] ? fsnotify_unmount_inodes+0x170/0x170 [ 55.376689] ? __fsnotify_update_child_dentry_flags+0x30/0x30 [ 55.377522] ? handle_mm_fault+0x174/0x320 [ 55.378169] vfs_write+0xf7/0x280 [ 55.378864] SyS_write+0xa1/0x120 [ 55.379270] ? SyS_read+0x120/0x120 [ 55.379643] ? mm_fault_error+0x180/0x180 [ 55.380071] ? task_work_run+0x7d/0xd0 [ 55.380910] ? __task_pid_nr_ns+0x120/0x140 [ 55.381366] ? SyS_read+0x120/0x120 [ 55.381739] do_syscall_64+0xeb/0x250 [ 55.382143] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 55.382841] RIP: 0033:0x7fc2ef803e99 [ 55.383227] RSP: 002b:00007fffcc5f3be8 EFLAGS: 00000217 ORIG_RAX: 0000000000000001 [ 55.384173] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc2ef803e99 [ 55.386145] RDX: 0000000000000057 RSI: 0000000020000080 RDI: 0000000000000003 [ 55.388418] RBP: 00007fffcc5f3c00 R08: 0000000000000000 R09: 0000000000000000 [ 55.390542] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400480 [ 55.392916] R13: 00007fffcc5f3cf0 R14: 0000000000000000 R15: 0000000000000000 [ 55.521088] Code: e5 4d 1e ff 48 89 df 44 0f b6 b3 b8 01 00 00 e8 65 50 1e ff 4c 8b 2b 49 8d bd b0 00 00 00 e8 56 50 1e ff 41 0f b6 c6 48 c1 e0 04 <49> 03 85 b0 00 00 00 48 8d 78 08 48 89 04 24 e8 3a 4f 1e ff 48 [ 55.525980] RIP: rdma_init_qp_attr+0x52/0x2c0 RSP: ffff8801e2c2f9d8 [ 55.532648] CR2: 00000000000000b0 [ 55.534396] ---[ end trace 70cee64090251c0b ]--- Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Fixes: d541e45500bd ("IB/core: Convert ah_attr from OPA to IB when copying to user") Reported-by: <syzbot+7b62c837c2516f8f38c8@syzkaller.appspotmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27RDMA/rdma_cm: Fix use after free race with process_one_reqJason Gunthorpe1-0/+9
process_one_req() can race with rdma_addr_cancel(): CPU0 CPU1 ==== ==== process_one_work() debug_work_deactivate(work); process_one_req() rdma_addr_cancel() mutex_lock(&lock); set_timeout(&req->work,..); __queue_work() debug_work_activate(work); mutex_unlock(&lock); mutex_lock(&lock); [..] list_del(&req->list); mutex_unlock(&lock); [..] // ODEBUG explodes since the work is still queued. kfree(req); Causing ODEBUG to detect the use after free: ODEBUG: free active (active state 0) object type: work_struct hint: process_one_req+0x0/0x6c0 include/net/dst.h:165 WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291 debug_print_object+0x166/0x220 lib/debugobjects.c:288 kvm: emulating exchange as write Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ib_addr process_one_req Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x1f4/0x2b0 lib/bug.c:186 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986 RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288 RSP: 0000:ffff8801d966f210 EFLAGS: 00010086 RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000 RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8 R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001 R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0 __debug_check_no_obj_freed lib/debugobjects.c:745 [inline] debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774 kfree+0xc7/0x260 mm/slab.c:3799 process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592 process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113 worker_thread+0x223/0x1990 kernel/workqueue.c:2247 kthread+0x33c/0x400 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406 Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to MAC") Reported-by: <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27net: Drop pernet_operations::asyncKirill Tkhai1-1/+0
Synchronous pernet_operations are not allowed anymore. All are asynchronous. So, drop the structure member. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23IB/cma: Resolve route only while receiving CM requestsParav Pandit3-0/+11
Currently CM request for RoCE follows following flow. rdma_create_id() rdma_resolve_addr() rdma_resolve_route() For RC QPs: rdma_connect() ->cma_connect_ib() ->ib_send_cm_req() ->cm_init_av_by_path() ->ib_init_ah_attr_from_path() For UD QPs: rdma_connect() ->cma_resolve_ib_udp() ->ib_send_cm_sidr_req() ->cm_init_av_by_path() ->ib_init_ah_attr_from_path() In both the flows, route is already resolved before sending CM requests. Therefore, code is refactored to avoid resolving route second time in ib_cm layer. ib_init_ah_attr_from_path() is extended to resolve route when it is not yet resolved for RoCE link layer. This is achieved by caller setting route_resolved field in path record whenever it has route already resolved. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller6-41/+65
Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-22IB/core: Refer to RoCE port property instead of GID table propertyParav Pandit1-1/+1
ib_query_gid() in commit [1] refers to RoCE GID table capability of the HCA using rdma_cap_roce_gid_table(). ib_core maintains the GID table cache regardless of the HCA provider drivers capability to maintain RoCE GID table. Therefore, whether to return a GID table entry from the software cache or from HCA should be done based on whether the port is RoCE or not. [1] commit 03db3a2d81e6 ("IB/core: Add RoCE GID table management") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22RDMA/restrack: Remove ambiguity in resource track clean logicLeon Romanovsky1-1/+44
The restrack clean routine had simple, but powerful WARN_ON check to see if all resources are cleared prior to releasing device. The WARN_ON check performed very well, but lack of information which device caused to resource leak, the object type and origin made debug to be fun and challenging at the same time. The fact that all dumps were the same because restrack_clean() is called in dealloc() didn't help either. So let's fix spelling error and convert WARN_ON to be more debug friendly. The dmesg cut below gives example of how the output will look output for the case fixed in patch [1] [ 438.421372] restrack: ------------[ cut here ]------------ [ 438.423448] restrack: BUG: RESTRACK detected leak of resources on mlx5_2 [ 438.425600] restrack: Kernel PD object allocated by mlx5_ib is not freed [ 438.427753] restrack: Kernel CQ object allocated by mlx5_ib is not freed [ 438.429660] restrack: ------------[ cut here ]------------ [1] https://patchwork.kernel.org/patch/10298695/ Cc: Michal Kalderon <Michal.Kalderon@cavium.com> Cc: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21RDMA/ucma: Correct option size check using optlenChien Tin Tung1-1/+1
The option size check is using optval instead of optlen causing the set option call to fail. Use the correct field, optlen, for size check. Fixes: 6a21dfc0d0db ("RDMA/ucma: Limit possible option size") Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21RDMA/restrack: Move restrack_clean to be symmetrical to restrack_initLeon Romanovsky1-2/+1
The fact that resource tracking commit 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources") was added immediately after commit 16c1975f1032 ("IB/mlx5: Create profile infrastructure to add and remove stages") caused us to miss the fact that PD and CQ are created after ib_register_device, but released after ib_unregister_device() and not before as it is expected from normal flow. Fix introduced in commit 42cea83f9524 ("IB/mlx5: Fix cleanup order on unload") revealed this fact, so this patch is needed to avoid from restrack warnings It fixes resource tracking warnings during shutdown. [ 43.473906] CPU: 5 PID: 3016 Comm: modprobe Not tainted 4.16.0-rc5-for-linust-perf-2018-03-19_07-01-58-14 #1 [ 43.473907] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014 [ 43.473919] RIP: 0010:rdma_restrack_clean+0x25/0x30 [ib_core] [ 43.473921] RSP: 0018:ffffc9000267be48 EFLAGS: 00010282 [ 43.473924] RAX: 0000000000000000 RBX: ffff88033c690070 RCX: 0000000180080006 [ 43.473925] RDX: ffff88035ce922e0 RSI: ffffea000cf1a200 RDI: ffff88033c6907c8 [ 43.473926] RBP: ffff88033c690070 R08: ffff88033c689000 R09: 0000000180080006 [ 43.473927] R10: 000000003c68a001 R11: ffff88033c689000 R12: ffff88033c690000 [ 43.473929] R13: ffff88033c69005c R14: 0000000000000000 R15: 0000000000000000 [ 43.473932] FS: 00007f5928359740(0000) GS:ffff88036c540000(0000) knlGS:0000000000000000 [ 43.473933] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 43.473935] CR2: 00007ffffc760cc8 CR3: 000000035620c000 CR4: 00000000000006e0 [ 43.473940] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 43.473941] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 43.473942] Call Trace: [ 43.473969] ib_unregister_device+0xf5/0x190 [ib_core] [ 43.474000] __mlx5_ib_remove+0x2e/0x40 [mlx5_ib] [ 43.474098] mlx5_remove_device+0xf5/0x120 [mlx5_core] [ 43.474132] mlx5_unregister_interface+0x37/0x90 [mlx5_core] [ 43.474142] mlx5_ib_cleanup+0xc/0x16a [mlx5_ib] [ 43.474152] SyS_delete_module+0x159/0x260 [ 43.474159] do_syscall_64+0x61/0x110 [ 43.474165] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 43.474168] RIP: 0033:0x7f59278466b7 [ 43.474170] RSP: 002b:00007ffffc763e38 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0 [ 43.474172] RAX: ffffffffffffffda RBX: 000000000130d590 RCX: 00007f59278466b7 [ 43.474173] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000000000130d5f8 [ 43.474175] RBP: 0000000000000000 R08: 00007f5927b0b060 R09: 00007f59278b6a40 [ 43.474176] R10: 00007ffffc763bc0 R11: 0000000000000202 R12: 0000000000000000 [ 43.474177] R13: 0000000000000001 R14: 000000000130d5f8 R15: 0000000000000000 [ 43.474179] Code: 84 00 00 00 00 00 0f 1f 44 00 00 48 83 c7 28 31 c0 eb 0c 48 83 c0 08 48 3d 00 08 00 00 74 0f 48 8d 14 07 48 8b 12 48 85 d2 74 e8 <0f> 0b c3 f3 c3 66 0f 1f 44 00 00 0f 1f 44 00 00 53 48 8b 47 28 [ 43.474221] ---[ end trace e89771e2250ffc23 ]--- Fixes: 42cea83f9524 ("IB/mlx5: Fix cleanup order on unload") Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-20RDMA/ucma: Ensure that CM_ID exists prior to access itLeon Romanovsky1-6/+9
Prior to access UCMA commands, the context should be initialized and connected to CM_ID with ucma_create_id(). In case user skips this step, he can provide non-valid ctx without CM_ID and cause to multiple NULL dereferences. Also there are situations where the create_id can be raced with other user access, ensure that the context is only shared to other threads once it is fully initialized to avoid the races. [ 109.088108] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 109.090315] IP: ucma_connect+0x138/0x1d0 [ 109.092595] PGD 80000001dc02d067 P4D 80000001dc02d067 PUD 1da9ef067 PMD 0 [ 109.095384] Oops: 0000 [#1] SMP KASAN PTI [ 109.097834] CPU: 0 PID: 663 Comm: uclose Tainted: G B 4.16.0-rc1-00062-g2975d5de6428 #45 [ 109.100816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 [ 109.105943] RIP: 0010:ucma_connect+0x138/0x1d0 [ 109.108850] RSP: 0018:ffff8801c8567a80 EFLAGS: 00010246 [ 109.111484] RAX: 0000000000000000 RBX: 1ffff100390acf50 RCX: ffffffff9d7812e2 [ 109.114496] RDX: 1ffffffff3f507a5 RSI: 0000000000000297 RDI: 0000000000000297 [ 109.117490] RBP: ffff8801daa15600 R08: 0000000000000000 R09: ffffed00390aceeb [ 109.120429] R10: 0000000000000001 R11: ffffed00390aceea R12: 0000000000000000 [ 109.123318] R13: 0000000000000120 R14: ffff8801de6459c0 R15: 0000000000000118 [ 109.126221] FS: 00007fabb68d6700(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000 [ 109.129468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 109.132523] CR2: 0000000000000020 CR3: 00000001d45d8003 CR4: 00000000003606b0 [ 109.135573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 109.138716] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 109.142057] Call Trace: [ 109.144160] ? ucma_listen+0x110/0x110 [ 109.146386] ? wake_up_q+0x59/0x90 [ 109.148853] ? futex_wake+0x10b/0x2a0 [ 109.151297] ? save_stack+0x89/0xb0 [ 109.153489] ? _copy_from_user+0x5e/0x90 [ 109.155500] ucma_write+0x174/0x1f0 [ 109.157933] ? ucma_resolve_route+0xf0/0xf0 [ 109.160389] ? __mod_node_page_state+0x1d/0x80 [ 109.162706] __vfs_write+0xc4/0x350 [ 109.164911] ? kernel_read+0xa0/0xa0 [ 109.167121] ? path_openat+0x1b10/0x1b10 [ 109.169355] ? fsnotify+0x899/0x8f0 [ 109.171567] ? fsnotify_unmount_inodes+0x170/0x170 [ 109.174145] ? __fget+0xa8/0xf0 [ 109.177110] vfs_write+0xf7/0x280 [ 109.179532] SyS_write+0xa1/0x120 [ 109.181885] ? SyS_read+0x120/0x120 [ 109.184482] ? compat_start_thread+0x60/0x60 [ 109.187124] ? SyS_read+0x120/0x120 [ 109.189548] do_syscall_64+0xeb/0x250 [ 109.192178] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 109.194725] RIP: 0033:0x7fabb61ebe99 [ 109.197040] RSP: 002b:00007fabb68d5e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [ 109.200294] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fabb61ebe99 [ 109.203399] RDX: 0000000000000120 RSI: 00000000200001c0 RDI: 0000000000000004 [ 109.206548] RBP: 00007fabb68d5ec0 R08: 0000000000000000 R09: 0000000000000000 [ 109.209902] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fabb68d5fc0 [ 109.213327] R13: 0000000000000000 R14: 00007fff40ab2430 R15: 00007fabb68d69c0 [ 109.216613] Code: 88 44 24 2c 0f b6 84 24 6e 01 00 00 88 44 24 2d 0f b6 84 24 69 01 00 00 88 44 24 2e 8b 44 24 60 89 44 24 30 e8 da f6 06 ff 31 c0 <66> 41 83 7c 24 20 1b 75 04 8b 44 24 64 48 8d 74 24 20 4c 89 e7 [ 109.223602] RIP: ucma_connect+0x138/0x1d0 RSP: ffff8801c8567a80 [ 109.226256] CR2: 0000000000000020 Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") Reported-by: <syzbot+36712f50b0552615bf59@syzkaller.appspotmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/uverbs: Enable ioctl() uAPI by default for new verbsMatan Barak2-4/+2
Enable the ioctl() uAPI for IB by default if the standard write() uAPI (INFINIBAND_USER_ACCESS) is enabled. Verbs that are also available under the old write() uAPI are put inside a new INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI Kconfig. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new fileMatan Barak4-177/+215
Currently, all objects are declared in uverbs_std_types. This could lead to a huge file once we implement all objects, methods and handlers. Moving each object to its own file to keep the files smaller and more readable. uverbs_std_types.c will only contain the parsing tree definition and objects without any methods. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>