linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2014-07-30	mlx5: Adjust events to use unsigned long param instead of void *	Jack Morgenstein	1	-7/+7
	In the event flow, we currently pass only a port number in the void *data argument. Rather than pass a pointer to the event handlers, we should use an "unsigned long" parameter, and pass the port number value directly. In the future, if necessary for some events, we can use the unsigned long parameter to pass a pointer. Based on a patch by Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30	mlx5: minor fixes (mainly avoidance of hidden casts)	Jack Morgenstein	6	-7/+7
	There were many places where parameters which should be u8/u16 were integer type. Additionally, in 2 places, a check for a non-null pointer was added before dereferencing the pointer (this is actually a bug fix). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30	mlx5: Move pci device handling from mlx5_ib to mlx5_core	Jack Morgenstein	7	-305/+196
	In preparation for a new mlx5 device which is VPI (i.e., ports can be either IB or ETH), move the pci device functionality from mlx5_ib to mlx5_core. This involves the following changes: 1. Move mlx5_core_dev struct out of mlx5_ib_dev. mlx5_core_dev is now an independent structure maintained by mlx5_core. mlx5_ib_dev now has a pointer to that struct. This requires changing a lot of places where the core_dev struct was accessed via mlx5_ib_dev (now, this needs to be a pointer dereference). 2. All PCI initializations are now done in mlx5_core. Thus, it is now mlx5_core which does pci_register_device (and not mlx5_ib, as was previously). 3. mlx5_ib now registers itself with mlx5_core as an "interface" driver. This is very similar to the mechanism employed for the mlx4 (ConnectX) driver. Once the HCA is initialized (by mlx5_core), it invokes the interface drivers to do their initializations. 4. There is a new event handler which the core registers: mlx5_core_event(). This event handler invokes the event handlers registered by the interfaces. Based on a patch by Eli Cohen <eli@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	4	-12/+27
	Conflicts: drivers/infiniband/hw/cxgb4/device.c The cxgb4 conflict was simply overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21	iw_cxgb4: Don't limit TPTE count to 32KB	Hariprasad Shenai	2	-2/+1
	Use the size advertised by FW Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21	iw_cxgb4: advertise the correct device max attributes	Hariprasad Shenai	5	-33/+32
	Advertise the actual max limits for things like qp depths, number of qps, cqs, etc. Clean up the queue allocation for qps and cqs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21	iw_cxgb4: Support query_qp() verb	Hariprasad Shenai	1	-0/+6
	Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21	iw_cxgb4: log detailed warnings for negative advice	Hariprasad Shenai	1	-6/+23
	Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17	iw_cxgb4: fix for 64-bit integer division	Hariprasad Shenai	1	-1/+2
	Fixed error introduced in commit id 7730b4c (" cxgb4/iw_cxgb4: work request logging feature") while compiling on 32 bit architecture reported by kbuild. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17	cxgb4/iw_cxgb4: Move common defines to cxgb4	Anish Bhatt	1	-1/+0
	This define is used by cxgb4i and iw_cxgb4, moving to avoid code duplication Signed-off-by: Anish Bhatt <anish@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16	Merge branches 'cxgb4' and 'mlx5' into for-next	Roland Dreier	1	-1/+1

2014-07-16	IB/mlx5: Enable "block multicast loopback" for kernel consumers	Or Gerlitz	1	-1/+1
	In commit f360d88a2efd, we advertise blocking multicast loopback to both kernel and userspace consumers, but don't allow kernel consumers (e.g IPoIB) to use it with their UD QPs. Fix that. Fixes: f360d88a2efd ("IB/mlx5: Add block multicast loopback support") Reported-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-15	cxgb4/iw_cxgb4: work request logging feature	Hariprasad Shenai	5	-0/+174
	This commit enhances the iwarp driver to optionally keep a log of rdma work request timining data for kernel mode QPs. If iw_cxgb4 module option c4iw_wr_log is set to non-zero, each work request is tracked and timing data maintained in a rolling log that is 4096 entries deep by default. Module option c4iw_wr_log_size_order allows specifing a log2 size to use instead of the default order of 12 (4096 entries). Both module options are read-only and must be passed in at module load time to set them. IE: modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10 The timing data is viewable via the iw_cxgb4 debugfs file "wr_log". Writing anything to this file will clear all the timing data. Data tracked includes: - The host time when the work request was posted, just before ringing the doorbell. The host time when the completion was polled by the application. This is also the time the log entry is created. The delta of these two times is the amount of time took processing the work request. - The qid of the EQ used to post the work request. - The work request opcode. - The cqe wr_id field. For sq completions requests this is the swsqe index. For recv completions this is the MSN of the ingress SEND. This value can be used to match log entries from this log with firmware flowc event entries. - The sge timestamp value just before ringing the doorbell when posting, the sge timestamp value just after polling the completion, and CQE.timestamp field from the completion itself. With these three timestamps we can track the latency from post to poll, and the amount of time the completion resided in the CQ before being reaped by the application. With debug firmware, the sge timestamp is also logged by firmware in its flowc history so that we can compute the latency from posting the work request until the firmware sees it. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15	cxgb4/iw_cxgb4: display TPTE on errors	Hariprasad Shenai	3	-11/+76
	With ingress WRITE or READ RESPONSE errors, HW provides the offending stag from the packet. This patch adds logic to log the parsed TPTE in this case. cxgb4 now exports a function to read a TPTE entry from adapter memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15	cxgb4/iw_cxgb4: use firmware ord/ird resource limits	Hariprasad Shenai	5	-34/+117
	Advertise a larger max read queue depth for qps, and gather the resource limits from fw and use them to avoid exhaustinq all the resources. Design: cxgb4: Obtain the max_ordird_qp and max_ird_adapter device params from FW at init time and pass them up to the ULDs when they attach. If these parameters are not available, due to older firmware, then hard-code the values based on the known values for older firmware. iw_cxgb4: Fix the c4iw_query_device() to report these correct values based on adapter parameters. ibv_query_device() will always return: max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp) max_res_rd_atom = max_ird_adapter Bump up the per qp max module option to 32, allowing it to be increased by the user up to the device max of max_ordird_qp. 32 seems to be sufficient to maximize throughput for streaming read benchmarks. Fail connection setup if the negotiated IRD exhausts the available adapter ird resources. So the driver will track the amount of ird resource in use and not send an RI_WR/INIT to FW that would reduce the available ird resources below zero. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15	iw_cxgb4: Detect Ing. Padding Boundary at run-time	Hariprasad Shenai	6	-16/+43
	Updates iw_cxgb4 to determine the Ingress Padding Boundary from cxgb4_lld_info, and take subsequent actions. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15	net: set name_assign_type in alloc_netdev()	Tom Gundersen	2	-3/+3
	Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) \| -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) \| -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-13	RDMA/cxgb4: Call iwpm_init() only once	Steve Wise	3	-9/+12
	We need to only register with the iwpm core once. Currently it is being done for every adapter, which causes a failure for each adapter but the first, making multiple adapters unusable. Fixes: 9eccfe109b27 ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08	RDMA/cxgb4: Initialize the device status page	Steve Wise	1	-0/+1
	The status page is mapped to user processes and allows sharing the device state between the kernel and user processes. This state isn't getting initialized and thus intermittently causes problems. Namely, the user process can mistakenly think the user doorbell writes are disabled which causes SQ work requests to never get fetched by HW. Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes"). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: <stable@vger.kernel.org> # v3.15 Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08	RDMA/cxgb4: Clean up connection on ARP error	Hariprasad S	1	-1/+10
	Based on origninal work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-08	RDMA/cxgb4: Fix skb_leak in reject_cr()	Hariprasad S	1	-1/+0
	Based on origninal work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-07-01	rdma/cxgb4: Fixes cxgb4 probe failure in VM when PF is exposed through PCI Passthrough	Hariprasad Shenai	1	-1/+2
	Change logic which determines our Physical Function at PCI Probe time. Now we read the PL_WHOAMI register and get the Physical Function. Pass Physical Function to Upper Layer Drivers in lld_info structure in the new field "pf" added to lld_info. This is useful for the cases where the PF, say PF4, is attached to a Virtual Machine via some form of "PCI Pass Through" technology and the PCI Function shows up as PF0 in the VM. Based on original work by Casey Leedom <leedom@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending	Linus Torvalds	3	-60/+46
	Pull SCSI target updates from Nicholas Bellinger: "The highlights this round include: - Add support for T10 PI pass-through between vhost-scsi + virtio-scsi (MST + Paolo + MKP + nab) - Add support for T10 PI in qla2xxx target mode (Quinn + MKP + hch + nab, merged through scsi.git) - Add support for percpu-ida pre-allocation in qla2xxx target code (Quinn + nab) - A number of iser-target fixes related to hardening the network portal shutdown path (Sagi + Slava) - Fix response length residual handling for a number of control CDBs (Roland + Christophe V.) - Various iscsi RFC conformance fixes in the CHAP authentication path (Tejas and Calsoft folks + nab) - Return TASK_SET_FULL status for tcm_fc(FCoE) DataIn + Response failures (Vasu + Jun + nab) - Fix long-standing ABORT_TASK + session reset hang (nab) - Convert iser-initiator + iser-target to include T10 bytes into EDTL (Sagi + Or + MKP + Mike Christie) - Fix NULL pointer dereference regression related to XCOPY introduced in v3.15 + CC'ed to v3.12.y (nab)" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (34 commits) target: Fix NULL pointer dereference for XCOPY in target_put_sess_cmd vhost-scsi: Include prot_bytes into expected data transfer length TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire libiscsi, iser: Adjust data_length to include protection information scsi_cmnd: Introduce scsi_transfer_length helper target: Report correct response length for some commands target/sbc: Check that the LBA and number of blocks are correct in VERIFY target/sbc: Remove sbc_check_valid_sectors() Target/iscsi: Fix sendtargets response pdu for iser transport Target/iser: Fix a wrong dereference in case discovery session is over iser iscsi-target: Fix ABORT_TASK + connection reset iscsi_queue_req memory leak target: Use complete_all for se_cmd->t_transport_stop_comp target: Set CMD_T_ACTIVE bit for Task Management Requests target: cleanup some boolean tests target/spc: Simplify INQUIRY EVPD=0x80 tcm_fc: Generate TASK_SET_FULL status for response failures tcm_fc: Generate TASK_SET_FULL status for DataIN failures iscsi-target: Reject mutual authentication with reflected CHAP_C iscsi-target: Remove no-op from iscsit_tpg_del_portal_group iscsi-target: Fix CHAP_A parameter list handling ...
2014-06-12	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next	Linus Torvalds	7	-18/+124
	Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
2014-06-11	libiscsi, iser: Adjust data_length to include protection information	Sagi Grimberg	1	-24/+10
	In case protection information exists over the wire iscsi header data length is required to include it. Use protection information aware scsi helpers to set the correct transfer length. In order to avoid breakage, remove iser transfer length checks for each task as they are not always true and somewhat redundant anyway. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Mike Christie <michaelc@cs.wisc.edu> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11	Target/iscsi: Fix sendtargets response pdu for iser transport	Sagi Grimberg	1	-1/+1
	In case the transport is iser we should not include the iscsi target info in the sendtargets text response pdu. This causes sendtargets response to include the target info twice. Modify iscsit_build_sendtargets_response to filter transport types that don't match. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reported-by: Slava Shwartsman <valyushash@gmail.com> Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11	Target/iser: Fix a wrong dereference in case discovery session is over iser	Sagi Grimberg	1	-1/+3
	In case the discovery session is carried over iser, we can't access the assumed network portal since the default portal is used. In this case we don't really need to allocate the fastreg pool, just prepare to the text pdu that will follow. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reported-by: Alex Tabachnik <alext@mellanox.com> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-10	iw_cxgb4: don't truncate the recv window size	Hariprasad Shenai	2	-4/+52
	Fixed a bug that shows up with recv window sizes that exceed the size of the RCV_BUFSIZ field in opt0 (>= 1024K). If the recv window exceeds this, then we specify the max possible in opt0, add add the rest in via a RX_DATA_ACK credits. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10	iw_cxgb4: Choose appropriate hw mtu index and ISS for iWARP connections	Hariprasad Shenai	2	-11/+63
	Select the appropriate hw mtu index and initial sequence number to optimize hw memory performance. Add new cxgb4_best_aligned_mtu() which allows callers to provide enough information to be used to [possibly] select an MTU which will result in the TCP Data Segment Size (AKA Maximum Segment Size) to be an aligned value. If an RTR message exhange is required, then align the ISS to 8B - 1 + 4, so that after the SYN the send seqno will align on a 4B boundary. The RTR message exchange will leave the send seqno aligned on an 8B boundary. If an RTR is not required, then align the ISS to 8B - 1. The goal is to have the send seqno be 8B aligned when we send the first FPDU. Based on original work by Casey Leedom <leeedom@chelsio.com> and Steve Wise <swise@opengridcomputing.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10	iw_cxgb4: Allocate and use IQs specifically for indirect interrupts	Hariprasad Shenai	3	-2/+8
	Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA RXQs, which also handle direct interrupts for offload CPLs during RDMA connection setup/teardown. The intended T4 usage model, however, is to have indirect interrupts flow through dedicated IQs. IE not to mix indirect interrupts with CPL messages in an IQ. This patch adds the concept of RDMA concentrator IQs, or CIQs, setup and maintained by the LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will flow through the LLD's RDMA RXQs, and CQ interrupts flow through the CIQs. Design: cxgb4 creates and exports an array of CIQs for the RDMA ULD. These IQs are sized according to the max available CQs available at adapter init. In addition, these IQs don't need FL buffers since they only service indirect interrupts. One CIQ is setup per RX channel similar to the RDMA RXQs. iw_cxgb4 will utilize these CIQs based on the vector value passed into create_cq(). The num_comp_vectors advertised by iw_cxgb4 will be the number of CIQs configured, and thus the vector value will be the index into the array of CIQs. Based on original work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10	Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband	Linus Torvalds	54	-637/+3316
	Pull main InfiniBand/RDMA updates from Roland Dreier: - add iWARP port mapper to avoid conflicts between RDMA and normal stack TCP connections. - fixes for i386 / x86-64 structure padding differences (ABI compatibility for 32-on-64) from Yann Droneaud. - a pile of SRP initiator fixes from Bart Van Assche. - fixes for a writeback / memory allocation deadlock with NFS over IPoIB connected mode from Jiri Kosina. - the usual fixes and cleanups to mlx4, mlx5, cxgb4 and other low-level drivers. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (61 commits) RDMA/cxgb4: Add support for iWARP Port Mapper user space service RDMA/nes: Add support for iWARP Port Mapper user space service RDMA/core: Add support for iWARP Port Mapper user space service IB/mlx4: Fix gfp passing in create_qp_common() IB/umad: Fix use-after-free on close IB/core: Fix kobject leak on device register error flow RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp mlx4_core: Fix GFP flags parameters to be gfp_t IB/core: Fix port kobject deletion during error flow IB/core: Remove unneeded kobject_get/put calls IB/core: Fix sparse warnings about redeclared functions IB/mad: Fix sparse warning about gfp_t use IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO IB: Add a QP creation flag to use GFP_NOIO allocations IB: Return error for unsupported QP creation flags IB: Allow build of hw/ and ulp/ subdirectories independently mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp IB/srp: Avoid problems if a header uses pr_fmt IB/umad: Fix error handling ...
2014-06-10	Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next	Roland Dreier	52	-566/+3227

2014-06-10	RDMA/cxgb4: Add support for iWARP Port Mapper user space service	Steve Wise	3	-43/+262
	Based on original work by Vipul Pandya. Signed-off-by: Steve Wise <swise@opengridcomputing.com> [ Fix htons -> ntohs to make sparse happy. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10	RDMA/nes: Add support for iWARP Port Mapper user space service	Tatyana Nikolova	4	-64/+296
	Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-10	RDMA/core: Add support for iWARP Port Mapper user space service	Tatyana Nikolova	6	-4/+1549
	This patch adds iWARP Port Mapper (IWPM) Version 2 support. The iWARP Port Mapper implementation is based on the port mapper specification section in the Sockets Direct Protocol paper - http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf Existing iWARP RDMA providers use the same IP address as the native TCP/IP stack when creating RDMA connections. They need a mechanism to claim the TCP ports used for RDMA connections to prevent TCP port collisions when other host applications use TCP ports. The iWARP Port Mapper provides a standard mechanism to accomplish this. Without this service it is possible for RDMA application to bind/listen on the same port which is already being used by native TCP host application. If that happens the incoming TCP connection data can be passed to the RDMA stack with error. The iWARP Port Mapper solution doesn't contain any changes to the existing network stack in the kernel space. All the changes are contained with the infiniband tree and also in user space. The iWARP Port Mapper service is implemented as a user space daemon process. Source for the IWPM service is located at http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary The iWARP driver (port mapper client) sends to the IWPM service the local IP address and TCP port it has received from the RDMA application, when starting a connection. The IWPM service performs a socket bind from user space to get an available TCP port, called a mapped port, and communicates it back to the client. In that sense, the IWPM service is used to map the TCP port, which the RDMA application uses to any port available from the host TCP port space. The mapped ports are used in iWARP RDMA connections to avoid collisions with native TCP stack which is aware that these ports are taken. When an RDMA connection using a mapped port is terminated, the client notifies the IWPM service, which then releases the TCP port. The message exchange between the IWPM service and the iWARP drivers (between user space and kernel space) is implemented using netlink sockets. 1) Netlink interface functions are added: ibnl_unicast() and ibnl_mulitcast() for sending netlink messages to user space 2) The signature of the existing ibnl_put_msg() is changed to be more generic 3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW corresponding to the two iWarp drivers - nes and cxgb4 which use the IWPM service 4) Enums are added to enumerate the attributes in the netlink messages, which are exchanged between the user space IWPM service and the iWARP drivers Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com> [ Fold in range checking fixes and nlh_next removal as suggested by Dan Carpenter and Steve Wise. Fix sparse endianness in hash. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-09	IB/mlx4: Fix gfp passing in create_qp_common()	Jiri Kosina	1	-2/+2
	There are two kzalloc() calls which were not converted to use value of gfp passed to create_qp_common() instead of using hardcoded GFP_KERNEL in 40f2287bd583 ("IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO"). Fix this by passing gfp value down properly. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-07	Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending	Linus Torvalds	1	-0/+2
	Pull SCSI target fixes from Nicholas Bellinger: "Here are the remaining fixes for v3.15. This series includes: - iser-target fix for ImmediateData exception reference count bug (Sagi + nab) - iscsi-target fix for MC/S login + potential iser-target MRDSL buffer overrun (Santosh + Roland) - iser-target fix for v3.15-rc multi network portal shutdown regression (nab) - target fix for allowing READ_CAPCITY during ALUA Standby access state (Chris + nab) - target fix for NULL pointer dereference of alua_access_state for un-configured devices (Chris + nab)" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix alua_access_state attribute OOPs for un-configured devices target: Allow READ_CAPACITY opcode in ALUA Standby access state iser-target: Fix multi network portal shutdown regression iscsi-target: Fix wrong buffer / buffer overrun in iscsi_change_param_value() iser-target: Add missing target_put_sess_cmd for ImmedateData failure
2014-06-06	IB/umad: Fix use-after-free on close	Bart Van Assche	1	-11/+19
	Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n> triggers a use-after-free. __fput() invokes f_op->release() before it invokes cdev_put(). Make sure that the ib_umad_device structure is freed by the cdev_put() call instead of f_op->release(). This avoids that changing the port mode from IB into Ethernet and back to IB followed by restarting opensmd triggers the following kernel oops: general protection fault: 0000 [#1] PREEMPT SMP RIP: 0010:[<ffffffff810cc65c>] [<ffffffff810cc65c>] module_put+0x2c/0x170 Call Trace: [<ffffffff81190f20>] cdev_put+0x20/0x30 [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0 [<ffffffff8118e35e>] ____fput+0xe/0x10 [<ffffffff810723bc>] task_work_run+0xac/0xe0 [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0 [<ffffffff814b8398>] int_signal+0x12/0x17 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051 Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Yann Droneaud <ydroneaud@opteya.com> Cc: <stable@vger.kernel.org> # 3.x: 8ec0a0e6b58: IB/umad: Fix error handling Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05	IB/core: Fix kobject leak on device register error flow	Haggai Eran	1	-26/+26
	The ports kobject isn't being released during error flow in device registration. This patch refactors the ports kobject cleanup into a single function called from both the error flow in device registration and from the unregistration function. A couple of attributes aren't being deleted (iw_stats_group, and ib_class_attributes). While this may be handled implicitly by the destruction of their kobjects, it seems better to handle all the attributes the same way. Signed-off-by: Haggai Eran <haggaie@mellanox.com> [ Make free_port_list_attributes() static. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-05	RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp	Yann Droneaud	2	-2/+4
	The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct c4iw_alloc_ucontext_resp gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name c4iw_alloc_ucontext_resp \ drivers/infiniband/hw/cxgb4/iw_cxgb4.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100 --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100 @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp { __u64 status_page_key; /* 0 8 / __u32 status_page_size; / 8 4 / - / size: 12, cachelines: 1, members: 2 / - / last cacheline: 12 bytes / + / size: 16, cachelines: 1, members: 2 / + / padding: 4 / + / last cacheline: 16 bytes */ }; This ABI disagreement will make an x86_64 kernel try to write past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to write past the i386 userspace provided buffer and the uverbs will fail. If the structure is on a page boundary and the next page is not mapped, ib_copy_to_udata() will fail and the uverb will fail. Additionally, as reported by Dan Carpenter, without the implicit padding being properly cleared, an information leak would take place in most architectures. This patch adds an explicit padding to struct c4iw_alloc_ucontext_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_alloc_ucontext() not writting this padding field to userspace. This way, x86_64 kernel will be able to write struct c4iw_alloc_ucontext_resp as expected by unpatched and patched i386 libcxgb4. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Link: http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain Link: http://marc.info/?i=20140328082428.GH25192@mwanda Cc: <stable@vger.kernel.org> Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes") Reported-by: Yann Droneaud <ydroneaud@opteya.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04	IB/core: Fix port kobject deletion during error flow	Haggai Eran	1	-9/+17
	When encountering an error during the add_port function, adding a port to sysfs, the port kobject is freed without being deleted from sysfs. Instead of freeing it directly, the patch uses kobject_put to release the kobject and delete it. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04	IB/core: Remove unneeded kobject_get/put calls	Haggai Eran	1	-5/+2
	The ib_core module will call kobject_get on the parent object of each kobject it creates. This is redundant since kobject_add does that anyway. As a side effect, this patch should fix leaking the ports kobject and the device kobject during unregister flow, since the previous code didn't seem to take into account the kobject_get calls on behalf of the child kobjects. Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-04	IB/core: Fix sparse warnings about redeclared functions	Roland Dreier	1	-4/+4
	Fix a few functions that are declared with __attribute_const__ in the ib_verbs.h header file but defined without it in verbs.c. This gets rid of the following sparse warnings: drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-03	iser-target: Add missing target_put_sess_cmd for ImmedateData failure	Nicholas Bellinger	1	-0/+2
	This patch addresses a bug where an early exception for SCSI WRITE with ImmediateData=Yes was missing the target_put_sess_cmd() call to drop the extra se_cmd->cmd_kref reference obtained during the normal iscsit_setup_scsi_cmd() codepath execution. This bug was manifesting itself during session shutdown within isert_cq_rx_comp_err() where target_wait_for_sess_cmds() would end up waiting indefinately for the last se_cmd->cmd_kref put to occur for the failed SCSI WRITE + ImmediateData descriptors. This fix follows what traditional iscsi-target code already does for the same failure case within iscsit_get_immediate_data(). Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: Sagi Grimberg <sagig@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-03	IB/mad: Fix sparse warning about gfp_t use	Roland Dreier	1	-1/+1
	Properly convert gfp_t & result to bool to fix: drivers/infiniband/core/sa_query.c:621:33: warning: incorrect type in initializer (different base types) drivers/infiniband/core/sa_query.c:621:33: expected bool [unsigned] [usertype] preload drivers/infiniband/core/sa_query.c:621:33: got restricted gfp_t Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02	IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO	Jiri Kosina	4	-19/+25
	Modify the various routines used to allocate memory resources which serve QPs in mlx4 to get an input GFP directive. Have the Ethernet driver to use GFP_KERNEL in it's QP allocations as done prior to this commit, and the IB driver to use GFP_NOIO when the IB verbs IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02	IB: Add a QP creation flag to use GFP_NOIO allocations	Or Gerlitz	1	-3/+15
	This addresses a problem where NFS client writes over IPoIB connected mode may deadlock on memory allocation/writeback. The problem is not directly memory reclamation. There is an indirect dependency between network filesystems writing back pages and ipoib_cm_tx_init() due to how a kworker is used. Page reclaim cannot make forward progress until ipoib_cm_tx_init() succeeds and it is stuck in page reclaim itself waiting for network transmission. Ordinarily this situation may be avoided by having the caller use GFP_NOFS but ipoib_cm_tx_init() does not have that information. To address this, take a general approach and add a new QP creation flag that tells the low-level hardware driver to use GFP_NOIO for the memory allocations related to the new QP. Use the new flag in the ipoib connected mode path, and if the driver doesn't support it, re-issue the QP creation without the flag. Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02	IB: Return error for unsupported QP creation flags	Or Gerlitz	2	-1/+5
	Fix the usnic and thw qib drivers to err when QP creation flags that they don't understand are provided. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02	IB: Allow build of hw/ and ulp/ subdirectories independently	Yann Droneaud	3	-17/+19
	It is not possible to build only the drivers/infiniband/hw/ (or ulp/) subdirectory with command such as: $ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/ This fails with following error messages: make[2]: Nothing to be done for `all'. make[2]: Nothing to be done for `relocs'. CHK include/config/kernel.release Using /home/ydroneaud/src/linux as source for kernel GEN /home/ydroneaud/src/linux/obj-x86_64/Makefile CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CALL /home/ydroneaud/src/linux/scripts/checksyscalls.sh /home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory make[2]: * No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'. Stop. make[1]: * [drivers/infiniband/hw/] Error 2 make: *** [sub-make] Error 2 This patch creates a Makefile in hw/ and ulp/ and moves each corresponding parts of drivers/infiniband/Makefile in the new Makefiles. It should not break build except if some hw/ drivers or ulp/ were allowed previously to be built while CONFIG_INFINIBAND is set to 'n', but according to drivers/infiniband/Kconfig, it's not possible. So it should be safe to apply. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02	Revert "net/mlx4_en: Use affinity hint"	David S. Miller	1	-1/+1
	This reverts commit 70a640d0dae3a9b1b222ce673eb5d92c263ddd61. Signed-off-by: David S. Miller <davem@davemloft.net>