aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/ulp/ipoib/ipoib_main.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-01-29IB/ipoib: Make ipoib_intercept_dev_id_attr() staticKamal Heib1-1/+1
The function ipoib_intercept_dev_id_attr() is only used in ipoib_main.c Fixes: f6350da41dc7 ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-01-24infiniband: ipoib: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-3/+1
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-2/+2
Pull rdma updates from Jason Gunthorpe: "This has been a fairly typical cycle, with the usual sorts of driver updates. Several series continue to come through which improve and modernize various parts of the core code, and we finally are starting to get the uAPI command interface cleaned up. - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5, qib, rxe, usnic - Rework the entire syscall flow for uverbs to be able to run over ioctl(). Finally getting past the historic bad choice to use write() for command execution - More functional coverage with the mlx5 'devx' user API - Start of the HFI1 series for 'TID RDMA' - SRQ support in the hns driver - Support for new IBTA defined 2x lane widths - A big series to consolidate all the driver function pointers into a big struct and have drivers provide a 'static const' version of the struct instead of open coding initialization - New 'advise_mr' uAPI to control device caching/loading of page tables - Support for inline data in SRPT - Modernize how umad uses the driver core and creates cdev's and sysfs files - First steps toward removing 'uobject' from the view of the drivers" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits) RDMA/srpt: Use kmem_cache_free() instead of kfree() RDMA/mlx5: Signedness bug in UVERBS_HANDLER() IB/uverbs: Signedness bug in UVERBS_HANDLER() IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported IB/umad: Start using dev_groups of class IB/umad: Use class_groups and let core create class file IB/umad: Refactor code to use cdev_device_add() IB/umad: Avoid destroying device while it is accessed IB/umad: Simplify and avoid dynamic allocation of class IB/mlx5: Fix wrong error unwind IB/mlx4: Remove set but not used variable 'pd' RDMA/iwcm: Don't copy past the end of dev_name() string IB/mlx5: Fix long EEH recover time with NVMe offloads IB/mlx5: Simplify netdev unbinding IB/core: Move query port to ioctl RDMA/nldev: Expose port_cap_flags2 IB/core: uverbs copy to struct or zero helper IB/rxe: Reuse code which sets port state IB/rxe: Make counters thread safe IB/mlx5: Use the correct commands for UMEM and UCTX allocation ...
2018-12-12RDMA: Start use ib_device_opsKamal Heib1-2/+2
Make all the required change to start use the ib_device_ops structure. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06net: core: dev: Add extack argument to dev_change_flags()Petr Machata1-3/+3
In order to pass extack together with NETDEV_PRE_UP notifications, it's necessary to route the extack to __dev_open() from diverse (possibly indirect) callers. One prominent API through which the notification is invoked is dev_change_flags(). Therefore extend dev_change_flags() with and extra extack argument and update all users. Most of the calls end up just encoding NULL, but several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available. Since the function declaration line is changed anyway, name the other function arguments to placate checkpatch. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+35
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle with many of the commits being smallish code fixes and improvements across the drivers. - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and rxe - Memory window support in hns - mlx5 user API 'flow mutate/steering' allows accessing the full packet mangling and matching machinery from user space - Support inter-working with verbs API calls in the 'devx' mlx5 user API, and provide options to use devx with less privilege - Modernize the use of syfs and the device interface to use attribute groups and cdev properly for uverbs, and clean up some of the core code's device list management - More progress on net namespaces for RDMA devices - Consolidate driver BAR mmapping support into core code helpers and rework how RDMA holds poitners to mm_struct for get_user_pages cases - First pass to use 'dev_name' instead of ib_device->name - Device renaming for RDMA devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (242 commits) IB/mlx5: Add support for extended atomic operations RDMA/core: Fix comment for hw stats init for port == 0 RDMA/core: Refactor ib_register_device() function RDMA/core: Fix unwinding flow in case of error to register device ib_srp: Remove WARN_ON in srp_terminate_io() IB/mlx5: Allow scatter to CQE without global signaled WRs IB/mlx5: Verify that driver supports user flags IB/mlx5: Support scatter to CQE for DC transport type RDMA/drivers: Use core provided API for registering device attributes RDMA/core: Allow existing drivers to set one sysfs group per device IB/rxe: Remove unnecessary enum values RDMA/umad: Use kernel API to allocate umad indexes RDMA/uverbs: Use kernel API to allocate uverbs indexes RDMA/core: Increase total number of RDMA ports across all devices IB/mlx4: Add port and TID to MAD debug print IB/mlx4: Enable debug print of SMPs RDMA/core: Rename ports_parent to ports_kobj RDMA/core: Do not expose unsupported counters IB/mlx4: Refer to the device kobject instead of ports_parent RDMA/nldev: Allow IB device rename through RDMA netlink ...
2018-10-10RDMA/netdev: Fix netlink support in IPoIBDenis Drozdov1-53/+70
IPoIB netlink support was broken by the below commit since integrating the rdma_netdev support relies on an allocation flow for netdevs that was controlled by the ipoib driver while netdev's rtnl_newlink implementation assumes that the netdev will be allocated by netlink. Such situation leads to crash in __ipoib_device_add, once trying to reuse netlink device. This patch fixes the kernel oops for both mlx4 and mlx5 devices triggered by the following command: Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-10RDMA/netdev: Hoist alloc_netdev_mqs out of the driverDenis Drozdov1-13/+8
netdev has several interfaces that expect to call alloc_netdev_mqs from the core code, with the driver only providing the arguments. This is incompatible with the rdma_netdev interface that returns the netdev directly. Thus re-organize the API used by ipoib so that the verbs core code calls alloc_netdev_mqs for the driver. This is done by allowing the drivers to provide the allocation parameters via a 'get_params' callback and then initializing an allocated netdev as a second step. Fixes: cd565b4b51e5 ("IB/IPoIB: Support acceleration options callbacks") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-09-13IB/ipoib: Log sysfs 'dev_id' accesses from userspaceArseny Maslennikov1-0/+31
Some tools may currently be using only the deprecated attribute; let's print an elaborate and clear deprecation notice to kmsg. To do that, we have to replace the whole sysfs file, since we inherit the original one from netdev. Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-13IB/ipoib: Use dev_port to expose network interface port numbersArseny Maslennikov1-0/+2
Some InfiniBand network devices have multiple ports on the same PCI function. This initializes the `dev_port' sysfs field of those network interfaces with their port number. Prior to this the kernel erroneously used the `dev_id' sysfs field of those network interfaces to convey the port number to userspace. The use of `dev_id' was considered correct until Linux 3.15, when another field, `dev_port', was defined for this particular purpose and `dev_id' was reserved for distinguishing stacked ifaces (e.g: VLANs) with the same hardware address as their parent device. Similar fixes to net/mlx4_en and many other drivers, which started exporting this information through `dev_id' before 3.15, were accepted into the kernel 4 years ago. See 76a066f2a2a0 (`net/mlx4_en: Expose port number through sysfs'). Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-06IB/ipoib: Ensure that MTU isn't less than minimum permittedMuhammad Sammar1-1/+2
It is illegal to change MTU to a value lower than the minimum MTU stated in ethernet spec. In addition to that we need to add 4 bytes for encapsulation header (IPOIB_ENCAP_LEN). Before "ifconfig ib0 mtu 0" command, succeeds while it obviously shouldn't. Signed-off-by: Muhammad Sammar <muhammads@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-02IB/ipoib: Maintain the child_intfs list from ndo_init/uninitJason Gunthorpe1-0/+16
This fixes a bug in the netlink path where the vlan_rwsem was not held around __ipoib_vlan_add causing the child_intfs to be manipulated unsafely. In the process this greatly simplifies the vlan_rwsem write side locking to only cover a single non-sleeping statement. This also further increases the safety of the removal ordering by holding the netdev of the parent while the child is active to ensure most bugs become either an oops on a NULL priv or a deadlock on the netdev refcount. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02IB/ipoib: Do not remove child devices from within the ndo_uninitJason Gunthorpe1-11/+17
Switching to priv_destructor and needs_free_netdev created a subtle ordering problem in ipoib_remove_one. Now that unregister_netdev frees the netdev and priv we must ensure that the children are unregistered before trying to unregister the parent, or child unregister will use after free. The solution is to unregister the children, then parent, in the same batch all while holding the rtnl_lock. This closes all the races where a new child could have been added and ensures proper ordering. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02IB/ipoib: Get rid of the sysfs_mutexJason Gunthorpe1-6/+1
This mutex was introduced to deal with the deadlock formed by calling unregister_netdev from within the sysfs callback of a netdev. Now that we have priv_destructor and needs_free_netdev we can switch to the more targeted solution of running the unregister from a work queue. This avoids the deadlock and gets rid of the mutex. The next patch in the series needs this mutex eliminated to create atomicity of unregisteration. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02RDMA/netdev: Use priv_destructor for netdev cleanupJason Gunthorpe1-37/+64
Now that the unregister_netdev flow for IPoIB no longer relies on external code we can now introduce the use of priv_destructor and needs_free_netdev. The rdma_netdev flow is switched to use the netdev common priv_destructor instead of the special free_rdma_netdev and the IPOIB ULP adjusted: - priv_destructor needs to switch to point to the ULP's destructor which will then call the rdma_ndev's in the right order - We need to be careful around the error unwind of register_netdev as it sometimes calls priv_destructor on failure - ULPs need to use ndo_init/uninit to ensure proper ordering of failures around register_netdev Switching to priv_destructor is a necessary pre-requisite to using the rtnl new_link mechanism. The VNIC user for rdma_netdev should also be revised, but that is left for another patch. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Denis Drozdov <denisd@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02IB/ipoib: Move init code to ndo_initJason Gunthorpe1-82/+111
Now that we have a proper ndo_uninit, move code that naturally pairs with the ndo_uninit into ndo_init. This allows the netdev core to natually handle ordering. This fixes the situation where register_netdev can fail before calling ndo_init, in which case it wouldn't call ndo_uninit either. Also move a bunch of duplicated init code that is shared between child and parent for clarity. Now the child and parent register functions look very similar. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02IB/ipoib: Move all uninit code into ndo_uninitJason Gunthorpe1-27/+33
Currently uninit is sometimes done twice in error flows, and is sprinkled a bit all over the place. Improve the clarity of the design by moving all uninit only into ndo_uinit. Some duplication is removed: - Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but this duplicates the process in ipoib_neigh_hash_init - Flushing priv->wq was sometimes done before unregister, but that duplicates what has been done in ndo_uninit Uniniting the IB event queue must remain before unregister_netdev as it requires the RTNL lock to be dropped, this is moved to a helper to make that flow really clear and remove some duplication in error flows. If register_netdev fails (and ndo_init is NULL) then it almost always calls ndo_uninit, which lets us remove all the extra code from the error unwinds. The next patch in the series will close the 'almost always' hole by pairing a proper ndo_init with ndo_uninit. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-08-02IB/ipoib: Use cancel_delayed_work_sync for neigh-clean taskErez Shitrit1-23/+10
The neigh_reap_task is self restarting, but so long as we call cancel_delayed_work_sync() it will be guaranteed to not be running and never start again. Thus we don't need to have the racy IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes calling flush_workqueue after the cancel. This fixes a situation where the GC work could have been left running in some rare situations. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-08-02IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWNJason Gunthorpe1-3/+0
This essentially duplicates the netdev's reg_state, so just use that directly. The reg_state is updated under the rntl_lock, and all places using GOING_DOWN already acquire the rtnl_lock so checking is safe. Since the only place we use GOING_DOWN is for the parent device this does not fix any bugs, but it is a step to tidy up the unregister flow so that after later patches the flow is uniform and sane. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-24IB/ipoib: Fix error return code in ipoib_dev_init()Wei Yongjun1-1/+2
Fix to return a negative error code from the ipoib_neigh_hash_init() error handling case instead of 0, as done elsewhere in this function. Fixes: 515ed4f3aab4 ("IB/IPoIB: Separate control and data related initializations") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-23IPoIB: use kvzalloc to allocate an array of bucket pointersJan Dakinevich1-2/+2
This table by default takes 32KiB which is 3rd memory order. Meanwhile, this memory is not aimed for DMA operation and could be safely allocated by vmalloc. Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Reviewed-by: HÃ¥kon Bugge <haakon.bugge@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09RDMA/ipoib: Fix use of sizeof()Kamal Heib1-5/+5
Make sure to use sizeof(...) instead of sizeof ... which is more preferred. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-09RDMA/ipoib: Prefer unsigned int to bare use of unsignedKamal Heib1-1/+3
This commit replaces all the unsigned definitions in favour of 'unsigned int' which is preferred. Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-25IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'Parav Pandit1-1/+1
While processing a path record entry in CM messages the associated GID attribute is now also supplied. Currently for RoCE a netdevice's net namespace pointer and ifindex are stored in path record entry. Both of these fields of the netdev can change anytime while processing CM messages. Additionally storing net namespace without holding reference will lead to use-after-free crash. Therefore it is removed. Netdevice information for RoCE is instead provided via referenced gid attribute in ib_cm requests. Such a design leads to a situation where the kernel can crash when the net pointer becomes invalid. However today it is always initialized to init_net, which cannot become invalid. In order to support processing packets in any arbitrary namespace of the received packet, it is necessary to avoid such conditions. This patch removes the dependency on the net pointer and ifindex; instead it will rely on SGID attribute which contains a pointer to netdev. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-25IB: Make init_ah_attr_grh_fields set sgid_attrParav Pandit1-1/+3
Use the sgid and other information from the path record to figure out the sgid_attrs. Store the selected table entry in the sgid_attr for everything else to use. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-18IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gidParav Pandit1-2/+2
If the gid_attr argument is NULL then the functions behave identically to rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything can be consolidated to one function. Now that all callers either use rdma_query_gid() or ib_get_cached_gid(), ib_query_gid() API is removed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-12treewide: Use array_size() in vzalloc()Kees Cook1-1/+2
The vzalloc() function has no 2-factor argument form, so multiplication factors need to be wrapped in array_size(). This patch replaces cases of: vzalloc(a * b) with: vzalloc(array_size(a, b)) as well as handling cases of: vzalloc(a * b * c) with: vzalloc(array3_size(a, b, c)) This does, however, attempt to ignore constant size factors like: vzalloc(4 * 1024) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( vzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | vzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( vzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | vzalloc( - sizeof(u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | vzalloc( - sizeof(char) * COUNT + COUNT , ...) | vzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( vzalloc( - sizeof(TYPE) * (COUNT_ID) + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_ID + array_size(COUNT_ID, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT_CONST + array_size(COUNT_CONST, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT_ID) + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_ID + array_size(COUNT_ID, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT_CONST) + array_size(COUNT_CONST, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT_CONST + array_size(COUNT_CONST, sizeof(THING)) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ vzalloc( - SIZE * COUNT + array_size(COUNT, SIZE) , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( vzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | vzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( vzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | vzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( vzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | vzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( vzalloc(C1 * C2 * C3, ...) | vzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants. @@ expression E1, E2; constant C1, C2; @@ ( vzalloc(C1 * C2, ...) | vzalloc( - E1 * E2 + array_size(E1, E2) , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook1-3/+4
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-05-22RDMA/ipoib: drop skb on path record lookup failureEvgenii Smirnov1-41/+21
In unicast_arp_send function there is an inconsistency in error handling of path_rec_start call. If path_rec_start is called because of an absent ah field, skb will be dropped. But if it is called on a creation of a new path, or if the path is invalid, skb will be added to the tail of path queue. In case of a new path it will be dropped on path_free, but in case of invalid path it can stay in the queue forever. This patch unifies the behavior, dropping skb in all cases of path_rec_start failure. Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-22RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE eventsDoug Ledford1-6/+27
We do a light flush on CLIENT_REREG and SM_CHANGE events. This goes through and marks paths invalid. But we weren't always checking for this validity when we needed to, and so we could keep using a path marked invalid. What's more, once we establish a path with a valid ah, we put a pointer to the ah in the neigh struct directly, so even if we mark the path as invalid, as long as the neigh has a direct pointer to the ah, it keeps using the old, outdated ah. To fix this we do several things. 1) Put the valid flag in the ah instead of the path struct, so when we put the ah pointer directly in the neigh struct, we can easily check the validity of the ah on send events. 2) Check the neigh->ah and neigh->ah->valid elements in the needed places, and if we have an ah, but it's invalid, then invoke a refresh of the ah. 3) Fix the various places that check for path, but didn't check for path->valid (now path->ah && path->ah->valid). Reported-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com> Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27IB/ipoib: fix ipoib_start_xmit()'s return typeLuc Van Oostenryck1-1/+1
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', which is a typedef for an enum type, but the implementation in this driver returns an 'int'. Fix this by returning 'netdev_tx_t' in this driver too. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-01IB/ipoib: Fix for potential no-carrier stateAlex Estrin1-0/+3
On reboot SM can program port pkey table before ipoib registered its event handler, which could result in missing pkey event and leave root interface with initial pkey value from index 0. Since OPA port starts with invalid pkey in index 0, root interface will fail to initialize and stay down with no-carrier flag. For IB ipoib interface may end up with pkey different from value opensm put in pkey table idx 0, resulting in connectivity issues (different mcast groups, for example). Close the window by calling event handler after registration to make sure ipoib pkey is in sync with port pkey table. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08Merge branch 'bart-srpt-for-next' into k.o/wip/dl-for-nextDoug Ledford1-7/+18
Merging in 12 patch series from Bart that required changes in the current for-rc branch in order to apply cleanly. Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-02IB/ipoib: Fix race condition in neigh creationErez Shitrit1-7/+18
When using enhanced mode for IPoIB, two threads may execute xmit in parallel to two different TX queues while the target is the same. In this case, both of them will add the same neighbor to the path's neigh link list and we might see the following message: list_add double add: new=ffff88024767a348, prev=ffff88024767a348... WARNING: lib/list_debug.c:31__list_add_valid+0x4e/0x70 ipoib_start_xmit+0x477/0x680 [ib_ipoib] dev_hard_start_xmit+0xb9/0x3e0 sch_direct_xmit+0xf9/0x250 __qdisc_run+0x176/0x5d0 __dev_queue_xmit+0x1f5/0xb10 __dev_queue_xmit+0x55/0xb10 Analysis: Two SKB are scheduled to be transmitted from two cores. In ipoib_start_xmit, both gets NULL when calling ipoib_neigh_get. Two calls to neigh_add_path are made. One thread takes the spin-lock and calls ipoib_neigh_alloc which creates the neigh structure, then (after the __path_find) the neigh is added to the path's neigh link list. When the second thread enters the critical section it also calls ipoib_neigh_alloc but in this case it gets the already allocated ipoib_neigh structure, which is already linked to the path's neigh link list and adds it again to the list. Which beside of triggering the list, it creates a loop in the linked list. This loop leads to endless loop inside path_rec_completion. Solution: Check list_empty(&neigh->list) before adding to the list. Add a similar fix in "ipoib_multicast.c::ipoib_mcast_send" Fixes: b63b70d87741 ('IPoIB: Use a private hash table for path lookup in xmit path') Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18IB/{core, cm, cma, ipoib}: Rename ib_init_ah_from_path to ib_init_ah_attr_from_pathParav Pandit1-1/+2
Since ib_init_ah_from_path initializes the address handle attribute, it is renamed to reflect so. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18IB/ipoib: Update pathrec field if not valid recordErez Shitrit1-15/+34
In case that the PathRecord is not valid (SM changed its network prefix) ipoib will continue issue PathQuery requests with the same parameters that are in its database, which are no longer valid anymore. Now the driver in that case will re-initialize the record from a valid place (the priv structure keeps the updated values), and a valid request will be issued. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-18IB/ipoib: Avoid memory leak if the SA returns a different DGIDErez Shitrit1-0/+16
The ipoib path database is organized around DGIDs from the LLADDR, but the SA is free to return a different GID when asked for path. This causes a bug because the SA's modified DGID is copied into the database key, even though it is no longer the correct lookup key, causing a memory leak and other malfunctions. Ensure the database key does not change after the SA query completes. Demonstration of the bug is as follows ipoib wants to send to GID fe80:0000:0000:0000:0002:c903:00ef:5ee2, it creates new record in the DB with that gid as a key, and issues a new request to the SM. Now, the SM from some reason returns path-record with other SGID (for example, 2001:0000:0000:0000:0002:c903:00ef:5ee2 that contains the local subnet prefix) now ipoib will overwrite the current entry with the new one, and if new request to the original GID arrives ipoib will not find it in the DB (was overwritten) and will create new record that in its turn will also be overwritten by the response from the SM, and so on till the driver eats all the device memory. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-13IB/ipoib: Warn when one port fails to initializeYuval Shaia1-3/+4
If one port fails to initialize an error message should indicate the reason and driver should continue serving the working port(s) and other HCA(s). Fixes: e4b2d06892c7 ("IB/ipoib: Remove device when one port fails to init"). Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-11IB/ipoib: Replace printk with pr_warnYuval Shaia1-12/+11
pr_* is the preferred way to print messages, replace all printk(KERN_WARN, ...) with pr_warn. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-10-25IB/ipoib: Use NAPI in UD/TX flowsErez Shitrit1-5/+19
Instead of explicit call to poll_cq of the tx ring, use the NAPI mechanism to handle the completions of each packet that has been sent to the HW. The next major changes were taken: * The driver init completion function in the creation of the send CQ, that function triggers the napi scheduling. * The driver uses CQ for RX for both modes UD and CM, and CQ for TX for CM and UD. Cc: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18Merge branch 'timer_setup' into for-nextDoug Ledford1-2/+1
Conflicts: drivers/infiniband/hw/cxgb4/cm.c drivers/infiniband/hw/qib/qib_driver.c drivers/infiniband/hw/qib/qib_mad.c There were minor fixups needed in these files. Just minor context diffs due to patches from independent sources touching the same basic area. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18Merge branch 'for-next-early' into for-nextDoug Ledford1-2/+2
The early for-next branch was based on v4.14-rc2, while the shared pull request I got from Mellanox used a v4.14-rc4 base. I'm making the branch that was the shared Mellanox pull request the new for-next branch and merging the early for-next branch into it. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-09IB/ipoib: Convert timers to use timer_setup()Kees Cook1-2/+1
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Alex Vesker <valex@mellanox.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Zhu Yanjun <yanjun.zhu@oracle.com> Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Yuval Shaia <yuval.shaia@oracle.com> Cc: linux-rdma@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/ipoib: Remove device when one port fails to initYuval Shaia1-1/+2
Call ipoib_remove_one when one of the IPoIB ports fails to initialize in order not to leave the module in unstable state. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB: Move PCI dependency from root KConfig to HW's KConfigsYuval Shaia1-1/+0
No reason to have dependency on PCI for the entire infiniband stack so move it to KConfig of only the drivers that actually using PCI. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/ipoib: Fix inconsistency with free_netdev and free_rdma_netdevAlex Vesker1-4/+11
Call free_rdma_netdev instead of free_netdev each time we want to release a netdevice. This call is also relevant for future freeing of offloaded child interfaces. This patch also adds a missing call for free netdevice when releasing a parent interface that has child interfaces using ipoib_remove_one. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24Merge branch 'mellanox' into k.o/for-nextDoug Ledford1-9/+6
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/ipoib: Enable ioctl for to IPoIB rdma netdevsFeras Daoud1-0/+15
Adds support for ioctl callback in the RDMA netdevs to allow supporting functions not handled by the generic interface code. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lockErez Shitrit1-0/+5
In order to avoid deadlock between sysfs functions (like create/delete child) and remove_one (both of them are using the sysfs lock and rtnl_lock) the driver will use a state mutex for sync. That will fix traces as the following: schedule+0x3e/0x90 kernfs_drain+0x75/0xf0 ? wait_woken+0x90/0x90 __kernfs_remove+0x12e/0x1c0 kernfs_remove+0x25/0x40 sysfs_remove_dir+0x57/0x90 kobject_del+0x22/0x60 device_del+0x195/0x230 pm_runtime_set_memalloc_noio+0xac/0xf0 netdev_unregister_kobject+0x71/0x80 rollback_registered_many+0x205/0x2f0 rollback_registered+0x31/0x40 unregister_netdevice_queue+0x58/0xb0 unregister_netdev+0x20/0x30 ipoib_remove_one+0xb7/0x240 [ib_ipoib] ib_unregister_device+0xbc/0x1b0 [ib_core] ib_unregister_mad_agent+0x29/0x30 [ib_core] mlx4_ib_remove+0x67/0x280 [mlx4_ib] INFO: task echo:24082 blocked for more than 120 seconds. Tainted: G OE 4.1.12-37.5.1.el6uek.x86_64 #2 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Call Trace: schedule+0x3e/0x90 schedule_preempt_disabled+0xe/0x10 __mutex_lock_slowpath+0x95/0x110 ? _rcu_barrier+0x177/0x220 mutex_lock+0x23/0x40 rtnl_lock+0x15/0x20 netdev_run_todo+0x81/0x1f0 rtnl_unlock+0xe/0x10 ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib] delete_child+0x69/0x80 [ib_ipoib] dev_attr_store+0x20/0x30 sysfs_kf_write+0x41/0x50 Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-24RDMA/(core, ulp): Convert register/unregister event handler to be voidLeon Romanovsky1-9/+1
The functions ib_register_event_handler() and ib_unregister_event_handler() always returned success and they can't fail. Let's convert those functions to be void, remove redundant checks and cleanup tons of goto statements. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>