2021-10-13  RDMA/core: Set sgtable nents when using ib_dma_virt_map_sg()  (Logan Gunthorpe; 1 file, +6/-1)

ib_dma_map_sgtable_attrs() should be mapping the sgls and setting nents, but the ib_uses_virt_dma() path falls back to ib_dma_virt_map_sg(), which will not set the nents in the sgtable. Check the return value (per the map_sg calling convention) and set sgt->nents appropriately on success.

Fixes: 79fbd3e1241c ("RDMA: Use the sg_table directly and remove the opencoded version from umem")
Link: https://lore.kernel.org/r/20211013165942.89806-1-logang@deltatee.com
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
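A sketch of the fixed helper, reconstructed from the description above (simplified; the exact error value is an assumption, not a verbatim quote of the patch):

    static inline int ib_dma_map_sgtable_attrs(struct ib_device *dev,
                                               struct sg_table *sgt,
                                               enum dma_data_direction direction,
                                               unsigned long dma_attrs)
    {
            int nents;

            if (ib_uses_virt_dma(dev)) {
                    nents = ib_dma_virt_map_sg(dev, sgt->sgl, sgt->orig_nents);
                    if (!nents)
                            return -EIO;    /* map_sg convention: 0 means failure */
                    sgt->nents = nents;     /* previously left unset on this path */
                    return 0;
            }
            return dma_map_sgtable(dev->dma_device, sgt, direction, dma_attrs);
    }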
2021-10-12  RDMA/rxe: Convert kernel UD post send to use ah_num  (Bob Pearson; 1 file, +3/-6)

Modify ib_post_send for kernel UD sends to put the AH index into the WQE instead of the address vector.

Link: https://lore.kernel.org/r/20211007204051.10086-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/rxe: Lookup kernel AH from ah index in UD WQEs  (Bob Pearson; 2 files, +24/-4)

Add code to rxe_get_av() in rxe_av.c to use the AH index in UD send WQEs to look up the kernel AH. For old user providers, continue to use the AV passed in WQEs. Move the setting of pkt->rxe to before the call to rxe_get_av() to get access to the AH pool.

Link: https://lore.kernel.org/r/20211007204051.10086-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/rxe: Replace ah->pd by ah->ibah.pd  (Bob Pearson; 1 file, +5/-1)

The pd field in struct rxe_ah is redundant with the pd field in rdma-core's ib_ah. Eliminate the pd field in rxe_ah and add an inline to extract the pd from the ibah field.

Link: https://lore.kernel.org/r/20211007204051.10086-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
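For illustration, the accessor can be as small as this (a sketch; to_rpd() is rxe's existing container_of()-style wrapper for PDs):

    static inline struct rxe_pd *rxe_ah_pd(struct rxe_ah *ah)
    {
            return to_rpd(ah->ibah.pd);
    }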
2021-10-12  RDMA/rxe: Create AH index and return to user space  (Bob Pearson; 3 files, +39/-2)

Make changes to rdma_user_rxe.h to allow indexing AH objects, passing the index in UD send WRs to the driver and returning the index to the rxe provider.

Modify rxe_create_ah() to add an index to the AH when created and, if called from a new user provider, return it to user space. If called from an old provider, mark the AH as not having a useful index. Modify rxe_destroy_ah() to drop the index before deleting the object.

Link: https://lore.kernel.org/r/20211007204051.10086-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/rxe: Change AH objects to indexed  (Bob Pearson; 2 files, +6/-2)

Make changes to rxe_param.h and rxe_pool.c to allow indexing of AH objects. Valid indices are non-zero, so older providers can be detected.

Link: https://lore.kernel.org/r/20211007204051.10086-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/rxe: Move AV from rxe_send_wqe to rxe_send_wr  (Bob Pearson; 3 files, +6/-3)

Move the struct rxe_av av from struct rxe_send_wqe to struct rxe_send_wr, placing it in wr.ud at the same offset as it was previously. This has the effect of increasing the size of struct rxe_send_wr while keeping the size of struct rxe_send_wqe the same. This better reflects the use of this field, which is only used for UD sends. This change has no effect on ABI compatibility, so the modified rxe driver will operate with older versions of rdma-core.

Link: https://lore.kernel.org/r/20211007204051.10086-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
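Schematically, the user-visible WR now carries the AV inside wr.ud. The following is a simplified sketch only; the stand-in rxe_av body and the exact sibling fields are assumptions, and the real layout lives in rdma_user_rxe.h:

    struct rxe_av { __u8 raw[40]; };        /* stand-in for the real definition */

    struct rxe_send_wr {
            __aligned_u64 wr_id;
            union {
                    struct {
                            __u32 remote_qpn;
                            __u32 remote_qkey;
                            __u16 pkey_index;
                            __u16 reserved;
                            __u32 ah_num;
                            struct rxe_av av;       /* moved from rxe_send_wqe */
                    } ud;
                    /* rdma/atomic/mw variants elided */
            } wr;
    };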
2021-10-12  RDMA/mlx4: Return missed an error if device doesn't support steering  (Leon Romanovsky; 1 file, +3/-1)

The error flow fixed in this patch is not possible, because all kernel users of the create-QP interface check that the device supports steering before setting the IB_QP_CREATE_NETIF_QP flag.

Fixes: c1c98501121e ("IB/mlx4: Add support for steerable IB UD QPs")
Link: https://lore.kernel.org/r/91c61f6e60eb0240f8bbc321fda7a1d2986dd03c.1634023677.git.leonro@nvidia.com
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/irdma: Remove irdma_cqp_up_map_cmd()  (Zhu Yanjun; 2 files, +0/-36)

The function irdma_cqp_up_map_cmd() is not used. So remove it.

Link: https://lore.kernel.org/r/20211011110128.4057-5-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/irdma: Remove irdma_get_hw_addr()  (Zhu Yanjun; 2 files, +0/-12)

The function irdma_get_hw_addr() is not used. So remove it.

Link: https://lore.kernel.org/r/20211011110128.4057-4-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/irdma: Remove irdma_sc_send_lsmm_nostag()  (Zhu Yanjun; 2 files, +1/-39)

The function irdma_sc_send_lsmm_nostag() is not used. So remove it.

Link: https://lore.kernel.org/r/20211011110128.4057-3-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/irdma: Remove irdma_uk_mw_bind()  (Zhu Yanjun; 2 files, +1/-60)

The function irdma_uk_mw_bind() is not used. So remove it.

Link: https://lore.kernel.org/r/20211011110128.4057-2-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA: Remove redundant 'flush_workqueue()' calls  (Christophe JAILLET; 6 files, +2/-10)

'destroy_workqueue()' already drains the queue before destroying it, so there is no need to flush it explicitly. Remove the redundant 'flush_workqueue()' calls.

This was generated with coccinelle:

    @@
    expression E;
    @@
    - flush_workqueue(E);
      destroy_workqueue(E);

Link: https://lore.kernel.org/r/ca7bac6e6c9c5cc8d04eec3944edb13de0e381a3.1633874776.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
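As a concrete (hypothetical) instance of the pattern the semantic patch rewrites:

    #include <linux/workqueue.h>

    static struct workqueue_struct *example_wq;     /* hypothetical queue */

    static void example_teardown(void)
    {
            /* Before the patch:
             *      flush_workqueue(example_wq);
             *      destroy_workqueue(example_wq);
             * destroy_workqueue() already drains the queue, so only this remains:
             */
            destroy_workqueue(example_wq);
    }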
2021-10-12  RDMA/iwpm: Remove redundant initialization of pointer err_str  (Colin Ian King; 1 file, +1/-1)

The pointer err_str is being initialized with a value that is never read; it is updated later on. The assignment is redundant and can be removed.

Link: https://lore.kernel.org/r/20211007173942.21933-1-colin.king@canonical.com
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/hns: Use dma_alloc_coherent() instead of kmalloc/dma_map_single()  (Cai Huoqing; 1 file, +5/-15)

Replacing kmalloc/kfree/dma_map_single/dma_unmap_single() with dma_alloc_coherent/dma_free_coherent() helps to reduce code size and simplify the code, and coherent DMA avoids cache maintenance on every transfer. The SoC that this driver supports does not have incoherent DMA, so this makes the code follow the DMA API properly with no performance impact. Currently there are missing dma sync calls around the DMA transfers.

Link: https://lore.kernel.org/r/20210926061116.282-1-caihuoqing@baidu.com
Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Wenpeng Liang <liangwenpeng@huawei.com>
Tested-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
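A hypothetical before/after sketch of the conversion (generic, not the hns driver's actual code):

    #include <linux/dma-mapping.h>
    #include <linux/slab.h>

    /* Before: a kmalloc'd buffer with a streaming mapping. */
    static void *alloc_buf_streaming(struct device *dev, size_t size,
                                     dma_addr_t *dma)
    {
            void *buf = kmalloc(size, GFP_KERNEL);

            if (!buf)
                    return NULL;
            *dma = dma_map_single(dev, buf, size, DMA_BIDIRECTIONAL);
            if (dma_mapping_error(dev, *dma)) {
                    kfree(buf);
                    return NULL;
            }
            return buf;
    }

    /* After: one coherent allocation replaces both steps. */
    static void *alloc_buf_coherent(struct device *dev, size_t size,
                                    dma_addr_t *dma)
    {
            return dma_alloc_coherent(dev, size, dma, GFP_KERNEL);
    }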
2021-10-12  RDMA/mlx5: Add optional counter support in get_hw_stats callback  (Aharon Landau; 1 file, +85/-3)

When get_hw_stats is called, query and return the optional counter statistics as well.

Link: https://lore.kernel.org/r/20211008122439.166063-14-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/mlx5: Add modify_op_stat() support  (Aharon Landau; 3 files, +76/-6)

Add support for ib callback modify_op_stat() to add or remove an optional counter. When adding, a steering flow table is created with a rule that catches and counts all the matching packets. When removing, the table and flow counter are destroyed.

Link: https://lore.kernel.org/r/20211008122439.166063-13-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/mlx5: Add steering support in optional flow counters  (Aharon Landau; 3 files, +212/-0)

Add steering infrastructure for adding and removing optional counters. This allows the counters to be added and removed dynamically, so as not to hurt performance.

Link: https://lore.kernel.org/r/20211008122439.166063-12-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/mlx5: Support optional counters in hw_stats initialization  (Aharon Landau; 2 files, +75/-16)

Add optional counter support when allocating and initializing the hw_stats structure. Optional counters have the IB_STAT_FLAG_OPTIONAL flag set and are disabled by default.

Link: https://lore.kernel.org/r/20211008122439.166063-11-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/nldev: Allow optional-counter status configuration through RDMA netlink  (Aharon Landau; 1 file, +57/-4)

Provide an option to allow users to enable/disable optional counters through RDMA netlink. This is limited to users with the ADMIN capability only.

Examples:

1. Enable optional counters cc_rx_ce_pkts and cc_rx_cnp_pkts (and disable all others):

   $ sudo rdma statistic set link rocep8s0f0/1 optional-counters \
         cc_rx_ce_pkts,cc_rx_cnp_pkts

2. Remove all optional counters:

   $ sudo rdma statistic unset link rocep8s0f0/1 optional-counters

Link: https://lore.kernel.org/r/20211008122439.166063-10-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/nldev: Split nldev_stat_set_mode_doit out of nldev_stat_set_doit  (Aharon Landau; 1 file, +70/-46)

In order to allow expansion of the set command with more set options, take the set mode out of the main set function.

Link: https://lore.kernel.org/r/20211008122439.166063-9-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/nldev: Add support to get status of all counters  (Aharon Landau; 2 files, +103/-0)

This patch adds the ability to get the name, index and status of all counters for each link through RDMA netlink. This can be used by user space to get the current optional-counter mode.

Examples:

$ rdma statistic mode
link rocep8s0f0/1 optional-counters cc_rx_ce_pkts

$ rdma statistic mode supported
link rocep8s0f0/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts
link rocep8s0f1/1 supported optional-counters cc_rx_ce_pkts,cc_rx_cnp_pkts,cc_tx_cnp_pkts

Link: https://lore.kernel.org/r/20211008122439.166063-8-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  RDMA/counter: Add optional counter support  (Aharon Landau; 5 files, +74/-10)

An optional counter is a driver-specific counter that may be dynamically enabled/disabled. This enhancement allows drivers to expose counters which are, for example, mutually exclusive and cannot be enabled at the same time, counters that might degrade performance, optional debug counters, etc.

Optional counters are marked with the IB_STAT_FLAG_OPTIONAL flag. They are not exported in sysfs, and must be at the end of all stats; otherwise the attr->show() in sysfs would get wrong indexes for hwcounters that are behind optional counters.

Link: https://lore.kernel.org/r/20211008122439.166063-7-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Neta Ostrovsky <netao@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
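An illustrative sketch of such a counter table (the counter names are made up; struct rdma_stat_desc and IB_STAT_FLAG_OPTIONAL are introduced by this series):

    #include <rdma/ib_verbs.h>

    static const struct rdma_stat_desc example_descs[] = {
            { .name = "rx_pkts" },                          /* mandatory counter */
            { .name = "cc_rx_ce_pkts",
              .flags = IB_STAT_FLAG_OPTIONAL },             /* optional: must come last */
    };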
2021-10-12  RDMA/counter: Add an is_disabled field in struct rdma_hw_stats  (Aharon Landau; 3 files, +26/-1)

Add a bitmap to the rdma_hw_stats structure, where each bit indicates whether the corresponding counter is currently disabled. By default, hwcounters are enabled.

Link: https://lore.kernel.org/r/20211008122439.166063-6-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
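A sketch of how a driver might consult the bitmap when refreshing values (assuming the rdma_hw_stats layout described in this series; the body of the loop is a placeholder):

    #include <linux/bitops.h>
    #include <rdma/ib_verbs.h>

    static void example_refresh(struct rdma_hw_stats *stats)
    {
            int i;

            for (i = 0; i < stats->num_counters; i++) {
                    if (test_bit(i, stats->is_disabled))
                            continue;               /* counter is switched off */
                    stats->value[i] = 0;            /* stand-in for a HW query */
            }
    }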
2021-10-12  RDMA/core: Add a helper API rdma_free_hw_stats_struct  (Mark Zhang; 4 files, +47/-31)

Add a new API rdma_free_hw_stats_struct to pair with rdma_alloc_hw_stats_struct (which is also de-inlined). This will be useful when there is more alloc/free work in the following patches.

Link: https://lore.kernel.org/r/20211008122439.166063-5-markzhang@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
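A hedged usage sketch of the new pair (the descriptor array is hypothetical; the lifespan constant is the core's default):

    #include <rdma/ib_verbs.h>

    static int example_stats_lifetime(void)
    {
            static const struct rdma_stat_desc descs[] = {
                    { .name = "rx_pkts" },          /* hypothetical counter */
            };
            struct rdma_hw_stats *stats;

            stats = rdma_alloc_hw_stats_struct(descs, ARRAY_SIZE(descs),
                                               RDMA_HW_STATS_DEFAULT_LIFESPAN);
            if (!stats)
                    return -ENOMEM;
            /* ... use stats ... */
            rdma_free_hw_stats_struct(stats);       /* replaces a bare kfree() */
            return 0;
    }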
2021-10-12  RDMA/counter: Add a descriptor in struct rdma_hw_stats  (Aharon Landau; 13 files, +252/-245)

Add a counter statistic descriptor structure to rdma_hw_stats. In addition to the counter name, more meta-information will be added. This code extension is needed for optional-counter support in the following patches.

Link: https://lore.kernel.org/r/20211008122439.166063-4-markzhang@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12  Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux  (Jason Gunthorpe; 4 files, +69/-11)

For dependencies in the following patches.

* mellanox/mlx5-next:
  net/mlx5: Add priorities for counters in RDMA namespaces
  net/mlx5: Add ifc bits to support optional counters

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-09  net/mlx5: Add priorities for counters in RDMA namespaces  (Aharon Landau; 3 files, +50/-8)

Add additional flow steering priorities in the RDMA namespace. This allows adding flow counters to count filtered RDMA traffic and then continue processing in the regular RDMA steering flow.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-09  net/mlx5: Add ifc bits to support optional counters  (Aharon Landau; 1 file, +19/-3)

Add the bth_opcode field and the relevant bits. This field will be used to capture and count congestion notification packets (CNP).

Add the source_vhca_port support bit. This field will be used to check the capability to use source_vhca_port as a match criterion in dual-port cases.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-06  RDMA/efa: CQ notifications  (Gal Pressman; 11 files, +625/-55)

This patch adds support for CQ notifications through the standard verbs API.

In order to achieve that, a new event queue (EQ) object is introduced, which is in charge of reporting completion events to the driver. On driver load, EQs are allocated and their affinity is set to a single CPU. When a user app creates a CQ with a completion channel, the completion vector number is converted to an EQ number, which is in charge of reporting the CQ events.

In addition, the CQ creation admin command now returns an offset for the CQ doorbell, which is mapped to the userspace provider and is used to arm the CQ when requested by the user.

The EQs use a single doorbell (located on the registers BAR), which encodes the EQ number and arm as part of the doorbell value. The EQs are polled by the driver on each new EQE and re-armed when the poll is completed.

Link: https://lore.kernel.org/r/20211003105605.29222-1-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06  RDMA/rxe: Remove duplicate settings  (Xiao Yang; 1 file, +0/-4)

Remove duplicate settings for vendor_err and qp_num.

Link: https://lore.kernel.org/r/20210930094813.226888-5-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06  RDMA/rxe: Set partial attributes when completion status != IBV_WC_SUCCESS  (Xiao Yang; 1 file, +25/-20)

As the ibv_poll_cq() manual says, only partial attributes are valid when the completion status != IBV_WC_SUCCESS.

Link: https://lore.kernel.org/r/20210930094813.226888-4-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
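A consumer-side sketch of that rule, using the libibverbs API the manual documents (error handling simplified):

    #include <stdio.h>
    #include <infiniband/verbs.h>

    static void example_drain(struct ibv_cq *cq)
    {
            struct ibv_wc wc;

            while (ibv_poll_cq(cq, 1, &wc) > 0) {
                    if (wc.status != IBV_WC_SUCCESS) {
                            /* Only wr_id, status, qp_num and vendor_err are
                             * meaningful on error completions. */
                            fprintf(stderr, "wr %llu: %s (vendor_err 0x%x)\n",
                                    (unsigned long long)wc.wr_id,
                                    ibv_wc_status_str(wc.status), wc.vendor_err);
                            continue;
                    }
                    /* byte_len, opcode, etc. are valid only here */
            }
    }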
2021-10-06  RDMA/rxe: Change the is_user member of struct rxe_cq to bool  (Xiao Yang; 2 files, +2/-3)

Make the is_user members of struct rxe_qp/rxe_cq have the same type.

Link: https://lore.kernel.org/r/20210930094813.226888-3-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06  RDMA/rxe: Remove the is_user members of struct rxe_sq/rxe_rq/rxe_srq  (Xiao Yang; 4 files, +0/-9)

The is_user members of struct rxe_sq/rxe_rq/rxe_srq have been unused since commit ae6e843fe08d ("RDMA/rxe: Add memory barriers to kernel queues"). In this case, it is fine to remove them directly.

Link: https://lore.kernel.org/r/20210930094813.226888-2-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06  RDMA/irdma: Delete unused struct irdma_bth  (Zhu Yanjun; 1 file, +0/-8)

The struct irdma_bth is not used, so remove it.

Link: https://lore.kernel.org/r/20211006201531.469650-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-05  IB/hfi1: Use string_upper() instead of an open coded variant  (Andy Shevchenko; 1 file, +4/-6)

Use string_upper() from the string helper module instead of an open-coded variant.

Link: https://lore.kernel.org/r/20211001123153.67379-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-05  RDMA/rw: switch to dma_map_sgtable()  (Logan Gunthorpe; 1 file, +41/-25)

There are a couple of subtle error path bugs related to mapping the sgls:

- In rdma_rw_ctx_init(), dma_unmap would be called with an sg that could have been incremented from the original call, as well as an nents that is the DMA-mapped entry count, not the original number of nents passed when mapping.

- Similarly in rdma_rw_ctx_signature_init(), both sg and prot_sg were unmapped with the incorrect number of nents.

To fix this, switch to the sgtable interface for mapping, which conveniently stores the original nents for unmapping. This will get cleaned up further once the DMA mapping interface supports P2PDMA and pci_p2pdma_map_sg() can be removed.

Fixes: 0e353e34e1e7 ("IB/core: add RW API support for signature MRs")
Fixes: a060b5629ab0 ("IB/core: generic RDMA READ/WRITE API")
Link: https://lore.kernel.org/r/20211001213215.3761-1-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
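A generic sketch of why the sgtable interface removes the bug class (not the rdma_rw code itself): the sg_table records both the caller's entry count and the DMA-mapped count.

    #include <linux/dma-mapping.h>

    static int example_map(struct device *dev, struct scatterlist *sg,
                           unsigned int sg_cnt)
    {
            /* orig_nents is kept for unmapping; mapping fills in nents. */
            struct sg_table sgt = { .sgl = sg, .orig_nents = sg_cnt };
            int ret;

            ret = dma_map_sgtable(dev, &sgt, DMA_TO_DEVICE, 0);
            if (ret)
                    return ret;
            /* ... use the sgt.nents mapped entries ... */
            dma_unmap_sgtable(dev, &sgt, DMA_TO_DEVICE, 0);
            return 0;
    }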
2021-10-04  RDMA/mlx5: Avoid taking MRs from larger MR cache pools when a pool is empty  (Aharon Landau; 1 file, +9/-17)

Currently, if a cache entry is empty, the driver will try to take MRs from larger cache entries. This behavior consumes a lot of memory. In addition, when searching for an mkey in an entry, the entry is locked. When using a multithreaded application with the old behavior, the threads will block each other more often, which can hurt performance, as can be seen in the table below. Therefore, avoid it by creating a new mkey when the requested cache entry is empty.

The test was performed on a machine with an Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz, 44 cores. Here are the time measurements for allocating MRs of 2^6 pages. The search in the cache started from entry 6.

+------------+---------------------+---------------------+
|            |    Old behavior     |    New behavior     |
|            +----------+----------+----------+----------+
|            | 1 thread | 5 thread | 1 thread | 5 thread |
+============+==========+==========+==========+==========+
|  1,000 MRs |   14 ms  |   30 ms  |   14 ms  |   80 ms  |
+------------+----------+----------+----------+----------+
| 10,000 MRs |  135 ms  |   6 sec  |  173 ms  |  880 ms  |
+------------+----------+----------+----------+----------+
|100,000 MRs | 11.2 sec |  57 sec  | 1.74 sec |  8.8 sec |
+------------+----------+----------+----------+----------+

Link: https://lore.kernel.org/r/71af2770c737b936f7b10f457f0ef303ffcf7ad7.1632644527.git.leonro@nvidia.com
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04  RDMA/rtrs-clt: Follow "one entry one value" rule for IO migration stats  (Md Haris Iqbal; 3 files, +28/-13)

This commit divides the sysfs entry cpu_migration into two different entries: one for "from cpus" and the other for "to cpus".

Link: https://lore.kernel.org/r/20210922125333.351454-8-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04  RDMA/rtrs: Do not allow sessname to contain special symbols / and .  (Md Haris Iqbal; 2 files, +11/-0)

Allowing these characters in sessname can lead to unexpected results, particularly because / is used as a separator between files in a path, and . points to the current directory.

Link: https://lore.kernel.org/r/20210922125333.351454-7-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Reviewed-by: Aleksei Marov <aleksei.marov@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
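A sketch of the kind of validation the patch adds (function name is hypothetical):

    #include <linux/errno.h>
    #include <linux/string.h>

    static int validate_sessname(const char *sessname)
    {
            /* Reject names that could be interpreted as path components. */
            if (strchr(sessname, '/') || strchr(sessname, '.'))
                    return -EINVAL;
            return 0;
    }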
2021-10-04  RDMA/rtrs: Introduce destroy_cq helper  (Md Haris Iqbal; 1 file, +13/-12)

The same code snippet is used twice; to avoid duplication, replace it with a destroy_cq helper.

Link: https://lore.kernel.org/r/20210922125333.351454-6-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04  RDMA/rtrs: Replace duplicate check with is_pollqueue helper  (Jack Wang; 1 file, +8/-3)

The if (con->cid >= con->sess->irq_con_num) check can be replaced with an is_pollqueue() helper.

Link: https://lore.kernel.org/r/20210922125333.351454-5-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
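The helper itself is tiny; a sketch (types as in the rtrs sources, assumed here):

    static inline bool is_pollqueue(struct rtrs_con *con)
    {
            /* connection ids at or beyond irq_con_num are poll queues */
            return con->cid >= con->sess->irq_con_num;
    }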
2021-10-04  RDMA/rtrs: Fix warning when use poll mode on client side.  (Jack Wang; 2 files, +15/-3)

When testing with poll mode, it will fail and lead to the warning below on the client side:

$ echo "sessname=bla path=gid:fe80::2:c903:4e:d0b3@gid:fe80::2:c903:8:ca17 device_path=/dev/nullb2 nr_poll_queues=-1" | \
  sudo tee /sys/devices/virtual/rnbd-client/ctl/map_device

rnbd_client L597: Mapping device /dev/nullb2 on session bla, (access_mode: rw, nr_poll_queues: 8)
WARNING: CPU: 3 PID: 9886 at drivers/infiniband/core/cq.c:447 ib_cq_pool_get+0x26f/0x2a0 [ib_core]

The problem is that in the poll-queue case we still need to call ib_alloc_cq/ib_free_cq; we can't use the CQ pool API for a poll queue. As both client and server use shared functions from rtrs, set irq_con_num to con_num on the server side, which is the total number of connections of the session; this way we can tell whether an rtrs_con requires a poll queue.

Follow-up patches will replace the duplicated code with helpers.

Link: https://lore.kernel.org/r/20210922125333.351454-4-haris.iqbal@ionos.com
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04  RDMA/rtrs: Remove len parameter from helper print functions of sysfs  (Md Haris Iqbal; 5 files, +12/-20)

Since we have changed all sysfs show functions to use sysfs_emit, we no longer require the len (PAGE_SIZE) in our helper print functions. So remove it from the function parameters.

Link: https://lore.kernel.org/r/20210922125333.351454-3-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04  RDMA/rtrs: Use sysfs_emit instead of s*printf function for sysfs show  (Md Haris Iqbal; 2 files, +12/-14)

The sysfs_emit function was added to be aware of the PAGE_SIZE maximum of the temporary buffer used for outputting sysfs content, so there are no possible overruns. Replace the uses of any s*printf functions in the sysfs show functions with sysfs_emit.

Link: https://lore.kernel.org/r/20210922125333.351454-2-haris.iqbal@ionos.com
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
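A generic sketch of the conversion pattern for a sysfs show function (attribute and value are hypothetical):

    #include <linux/device.h>
    #include <linux/sysfs.h>

    static ssize_t example_show(struct device *dev,
                                struct device_attribute *attr, char *page)
    {
            /* was: return sprintf(page, "%d\n", 42); */
            return sysfs_emit(page, "%d\n", 42);    /* 42 stands in for real state */
    }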
2021-10-04  RDMA/cma: Split apart the multiple uses of the same list heads  (Jason Gunthorpe; 2 files, +27/-18)

Two list heads in the rdma_id_private are being used for multiple purposes, to save a few bytes of memory. Give the different purposes different names and union the memory that is clearly exclusive.

list splits into device_item and listen_any_item. device_item is threaded onto the cma_device's list and listen_any goes onto the listen_any_list. IDs doing any listen cannot have devices.

listen_list splits into listen_item and listen_list. listen_list is on the parent listen any rdma_id_private and listen_item is on child listen that is bound to a specific cma_dev.

Which name should be used in which case depends on the state and other factors of the rdma_id_private. Remap all the confusing references to make sense with the new names, so at least there is some hope of matching the necessary preconditions with each access.

Link: https://lore.kernel.org/r/0-v1-a5ead4a0c19d+c3a-cma_list_head_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
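Schematically, the split pairs can share storage through anonymous unions, since each pair is mutually exclusive (a simplified sketch, not the full rdma_id_private definition):

    #include <linux/list.h>

    struct rdma_id_private_sketch {
            union {
                    struct list_head device_item;           /* on a cma_device's list */
                    struct list_head listen_any_item;       /* on listen_any_list */
            };
            union {
                    struct list_head listen_item;   /* child listen bound to a cma_dev */
                    struct list_head listen_list;   /* parent listen-any's child list */
            };
    };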
2021-10-04  Merge tag 'v5.15-rc4' into rdma.git for-next  (Jason Gunthorpe; 721 files, +7471/-4094)

Merged due to dependencies in the following patches.

Conflict in drivers/infiniband/hw/hfi1/ipoib_tx.c resolved by hand to take the %p change and txq stats rename together.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-04  RDMA/rxe: Bump up default maximum values used via uverbs  (Rao Shoaib; 1 file, +16/-14)

In our internal testing we have found that the default maximum values are too small. Ideally there should be no limits, but since maximum values are reported via ibv_query_device, we have to return some value. So the default maximums have been changed to large values.

Link: https://lore.kernel.org/r/20210915011220.307585-1-Rao.Shoaib@oracle.com
Signed-off-by: Rao Shoaib <Rao.Shoaib@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-03  Linux 5.15-rc4  (Linus Torvalds; 1 file, +1/-1)
2021-10-03  elf: don't use MAP_FIXED_NOREPLACE for elf interpreter mappings  (Chen Jingwen; 1 file, +1/-1)

In commit b212921b13bd ("elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings") we still leave MAP_FIXED_NOREPLACE in place for load_elf_interp.

Unfortunately, this will cause the kernel to fail to start with:

  1 (init): Uhuuh, elf segment at 00003ffff7ffd000 requested but the memory is mapped already
  Failed to execute /init (error -17)

The reason is that the elf interpreter (ld.so) has overlapping segments.

  readelf -l ld-2.31.so

  Program Headers:
    Type  Offset             VirtAddr           PhysAddr           FileSiz            MemSiz             Flags  Align
    LOAD  0x0000000000000000 0x0000000000000000 0x0000000000000000 0x000000000002c94c 0x000000000002c94c R E    0x10000
    LOAD  0x000000000002dae0 0x000000000003dae0 0x000000000003dae0 0x00000000000021e8 0x0000000000002320 RW     0x10000
    LOAD  0x000000000002fe00 0x000000000003fe00 0x000000000003fe00 0x00000000000011ac 0x0000000000001328 RW     0x10000

The reason for this problem is the same as described in commit ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments"). Not only executable binaries but also elf interpreters (e.g. ld.so) can have overlapping elf segments, so we had better drop MAP_FIXED_NOREPLACE and go back to MAP_FIXED in load_elf_interp.

Fixes: 4ed28639519c ("fs, elf: drop MAP_FIXED usage from elf_map")
Cc: <stable@vger.kernel.org> # v4.19
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Chen Jingwen <chenjingwen6@huawei.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>