aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/hfi1/verbs.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-10-16Merge branch 'for-rc' into rdma.git for-nextJason Gunthorpe1-1/+7
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git This is required to resolve dependencies of the next series of RDMA patches. The code motion conflicts in drivers/infiniband/core/cache.c were resolved. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03IB/{hfi1, qib, rdmavt}: Move send completion logic to rdmavtVenkata Sandeep Dhanalakota1-3/+6
Moving send completion code into rdmavt in order to have shared logic between qib and hfi1 drivers. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03IB/{hfi1, qib, rdmavt}: Move copy SGE logic into rdmavtBrian Welty1-222/+4
This patch moves hfi1_copy_sge() into rdmavt for sharing with qib. This patch also moves all the wss_*() functions into rdmavt as several wss_*() functions are called from hfi1_copy_sge() When SGE copy mode is adaptive, cacheless copy may be done in some cases for performance reasons. In those cases, X86 cacheless copy function is called since the drivers that use rdmavt and may set SGE copy mode to adaptive are X86 only. For this reason, this patch adds "depends on X86_64" to rdmavt/Kconfig. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30IB/hfi1: Prepare resource waits for dual legDennis Dalessandro1-6/+5
Current implementation allows each qp to have only one send engine. As such, each qp has only one list to queue prebuilt packets when send engine resources are not available. To improve performance, it is desired to support multiple send engines for each qp. This patch creates the framework to support two send engines (two legs) for each qp for the TID RDMA protocol, which can be easily extended to support more send engines. It achieves the goal by creating a leg specific struct, iowait_work in the iowait struct, to hold the work_struct and the tx_list as well as a pointer to the parent iowait struct. The hfi1_pkt_state now has an additional field to record the current legs work structure and that is now passed to all egress waiters to determine the leg that needs to wait via a new iowait helper. The APIs are adjusted to use the new leg specific struct as required. Many new and modified helpers are added to support this change. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30IB/rdmavt: Rename check_send_wqe as setup_wqeKaike Wan1-1/+1
The driver-provided function check_send_wqe allows the hardware driver to check and set up the incoming send wqe before it is inserted into the swqe ring. This patch will rename it as setup_wqe to better reflect its usage. In addition, this function is only called when all setup is complete in rdmavt. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20IB/hfi1: Fix SL array bounds checkIra Weiny1-1/+7
The SL specified by a user needs to be a valid SL. Add a range check to the user specified SL value which protects from running off the end of the SL to SC table. CC: stable@vger.kernel.org Fixes: 7724105686e7 ("IB/hfi1: add driver files") Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-11IB/hfi1: Suppress a compiler warningBart Van Assche1-1/+1
Avoid that the following compiler warning is reported when building with gcc 8: drivers/infiniband/hw/hfi1/verbs.c:1896:2: warning: 'strncpy' output may be truncated copying 64 bytes from a string of length 64 [-Wstringop-truncation] Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-10RDMA/hfi1: Move grh_required into update_sm_ahJason Gunthorpe1-9/+0
grh_required is intended to be a global setting where all AV's will require a GRH, not just the sm_lid. Move the special logic to the creation of the SM AH. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-19IB/rdmavt, IB/hfi1: Create device dependent s_flagsMike Marciniszyn1-2/+2
Move some s_flags defines out of rdmavt and into hfi1 because they are hfi1 specific and therefore should remain in the driver instead of bubbling up to rdmavt. Document device specific ranges in rdmavt and remap those in hfi1. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-18IB/core: add max_send_sge and max_recv_sge attributesSteve Wise1-1/+2
This patch replaces the ib_device_attr.max_sge with max_send_sge and max_recv_sge. It allows ulps to take advantage of devices that have very different send and recv sge depths. For example cxgb4 has a max_recv_sge of 4, yet a max_send_sge of 16. Splitting out these attributes allows much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW API. Consider a large RDMA WRITE that has 16 scattergather entries. With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of 16, it can be done with 1 WRITE WR. Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24IB/hfi1: Add support for 16B Management PacketsDon Hiatt1-7/+18
16B Management Packets (L4=0x08) replace the BTH and DETH of normal MAD packet packets with a header containing the the source and destination queue pair numbers; fields that were originally retrieved from the BTH/DETH are now populated from this header as well as from the 16B LRH (e.g. pkey). 16B Management Packets are used as an optimized management format on 16B fabrics. These management packets have an opcode of IB_OPCODE_UD_SEND_ONLY, a fixed 3Byte pad, and a header length of 24Bytes. The decision as to when we send a management packet is based upon either the source or destination queue pair number being 0 or 1. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09IB/{hfi1, rdmavt, qib}: Implement CQ completion vector supportSebastian Sanchez1-3/+4
Currently the driver doesn't support completion vectors. These are used to indicate which sets of CQs should be grouped together into the same vector. A vector is a CQ processing thread that runs on a specific CPU. If an application has several CQs bound to different completion vectors, and each completion vector runs on different CPUs, then the completion queue workload is balanced. This helps scale as more nodes are used. Implement CQ completion vector support using a global workqueue where a CQ entry is queued to the CPU corresponding to the CQ's completion vector. Since the workqueue is global, it's guaranteed to always be there when queueing CQ entries; Therefore, the RCU locking for cq->rdi->worker in the hot path is superfluous. Each completion vector is assigned to a different CPU. The number of completion vectors available is computed by taking the number of online, physical CPUs from the local NUMA node and subtracting the CPUs used for kernel receive queues and the general interrupt. Special use cases: * If there are no CPUs left for completion vectors, the same CPU for the general interrupt is used; Therefore, there would only be one completion vector available. * For multi-HFI systems, the number of completion vectors available for each device is the total number of completion vectors in the local NUMA node divided by the number of devices in the same NUMA node. If there's a division remainder, the first device to get initialized gets an extra completion vector. Upon a CQ creation, an invalid completion vector could be specified. Handle it as follows: * If the completion vector is less than 0, set it to 0. * Set the completion vector to the result of the passed completion vector moded with the number of device completion vectors available. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09IB/hfi1: Rework fault injection machineryMitko Haralanov1-8/+5
The packet fault injection code present in the HFI1 driver had some issues which not only fragment the code but also created user confusion. Furthermore, it suffered from the following issues: 1. The fault_packet method only worked for received packets. This meant that the only fault injection mode available for sent packets is fault_opcode, which did not allow for random packet drops on all egressing packets. 2. The mask available for the fault_opcode mode did not really work due to the fact that the opcode values are not bits in a bitmask but rather sequential integer values. Creating a opcode/mask pair that would successfully capture a set of packets was nearly impossible. 3. The code was fragmented and used too many debugfs entries to operate and control. This was confusing to users. 4. It did not allow filtering fault injection on a per direction basis - egress vs. ingress. In order to improve or fix the above issues, the following changes have been made: 1. The fault injection methods have been combined into a single fault injection facility. As such, the fault injection has been plugged into both the send and receive code paths. Regardless of method used the fault injection will operate on both egress and ingress packets. 2. The type of fault injection - by packet or by opcode - is now controlled by changing the boolean value of the file "opcode_mode". When the value is set to True, fault injection is done by opcode. Otherwise, by packet. 2. The masking ability has been removed in favor of a bitmap that holds opcodes of interest (one bit per opcode, a total of 256 bits). This works in tandem with the "opcode_mode" value. When the value of "opcode_mode" is False, this bitmap is ignored. When the value is True, the bitmap lists all opcodes to be considered for fault injection. By default, the bitmap is empty. When the user wants to filter by opcode, the user sets the corresponding bit in the bitmap by echo'ing the bit position into the 'opcodes' file. This gets around the issue that the set of opcodes does not lend itself to effective masks and allow for extremely fine-grained filtering by opcode. 4. fault_packet and fault_opcode methods have been combined. Hence, there is only one debugfs directory controlling the entire operation of the fault injection machinery. This reduces the number of debugfs entries and provides a more unified user experience. 5. A new control files - "direction" - is provided to allow the user to control the direction of packets, which are subject to fault injection. 6. A new control file - "skip_usec" - is added that would allow the user to specify a "timeout" during which no fault injection will occur. In addition, the following bug fixes have been applied: 1. The fault injection code has been split into its own header and source files. This was done to better organize the code and support conditional compilation without littering the code with #ifdef's. 2. The method by which the TX PIO packets were being marked for drop conflicted with the way send contexts were being setup. As a result, the send context was repeatedly being reset. 3. The fault injection only makes sense when the user can control it through the debugfs entries. However, a kernel configuration can enable fault injection but keep fault injection debugfs entries disabled. Therefore, it makes sense that the HFI fault injection code depends on both. 4. Error suppression did not take into account the method by which PIO packets were being dropped. Therefore, even with error suppression turned on, errors would still be displayed to the screen. A larger enough packet drop percentage would case the kernel to crash because the driver would be stuck printing errors. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-03-19IB/uverbs: Extend uverbs_ioctl header with driver_idMatan Barak1-1/+1
Extending uverbs_ioctl header with driver_id and another reserved field. driver_id should be used in order to identify the driver. Since every driver could have its own parsing tree, this is necessary for strace support. Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB support and thus we add some reserved fields for future usage. Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01IB/hfi1: Remove dependence on qp->s_hdrwordsMitko Haralanov1-5/+5
The s_hdrwords variable was used to indicate whether a packet was already built on a previous iteration of the send engine. This variable assumed the protection of the QP's RVT_S_BUSY flag, which was required since the the QP's s_lock was dropped just prior to the packet being queued on the one of the egress mechanisms. Support for multiple send engine instantiations require that the field not be used due to concurency issues. The ps.txreq signals the "already built" without the potential concurency issues. Fix by getting rid of all s_hdrword usage. A wrapper is added to test for the already built case that used to use s_hdrwords. What used to be stored in s_hdrwords is now in the txreq. The PBC is not counted, but is added in the pio/sdma code paths prior to posting the packet. Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-05IB/{rdmavt, hfi1, qib}: Remove get_card_name() downcallMichael J. Ruhl1-1/+0
rdmavt has a down call to client drivers to retrieve a crafted card name. This name should be the IB defined name. Rather than craft the name each time it is needed, simply retrieve the IB allocated name from the IB device. Update the function name to reflect its application. Clean up driver code to match this change. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-05IB/{rdmavt, hfi1, qib}: Self determine driver nameMichael J. Ruhl1-3/+0
Currently the HFI and QIB drivers allow the IB core to assign a unit number to the driver name string. If multiple devices exist in a system, there is a possibility that the device unit number and the IB core number will be mismatched. Fix by using the driver defined unit number to generate the device name. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-12-11IB/hfi1: Use 4096 for default active MTU in query_qpJan Sokolowski1-1/+1
Currently, if a port is queried that has an invalid Maximum Transmission Unit, driver reports default MTU of 2048. This in incorrect. Use default value of 4096 if invalid. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-10-30IB/hfi1: Add tx_opcode_stats like the opcode_statsMike Marciniszyn1-0/+24
This patch adds tx_opcode_stats to parallel the (rx)opcode_stats in the debugfs. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18Merge branch 'timer_setup' into for-nextDoug Ledford1-5/+4
Conflicts: drivers/infiniband/hw/cxgb4/cm.c drivers/infiniband/hw/qib/qib_driver.c drivers/infiniband/hw/qib/qib_mad.c There were minor fixups needed in these files. Just minor context diffs due to patches from independent sources touching the same basic area. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18IB/hfi1: Convert timers to use timer_setup()Kees Cook1-5/+4
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Switches test of .data field to .function, since .data will be going away. Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18Merge branch 'hfi1' into k.o/for-nextDoug Ledford1-14/+8
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18IB/hfi1: Eliminate allocation while atomicDon Hiatt1-14/+8
The PIO trailing buffer was being dynamically allocated but the kcalloc return value was not being checked. Further, the GFP_KERNEL was being used even though the send engine might be called with interrupts disabled. Since the maximum size of the trailing buffer is only 12 bytes (CRC = 4, LT = 1, Pad = 0 to 7 bytes) just statically allocate the buffer, remove the alloc entirely and share it with the SDMA engine by making it global. Reported-by: Leon Romanovsky <leon@kernel.org> Fixes: 566d53a82644 ("IB/hfi1: Enhance PIO/SDMA send for 16B") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-14IB/hfi1: Remove set-but-not-used variablesBart Van Assche1-10/+0
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Enable RDMA_CAP_OPA_AH in hfi driver to support extended LIDsDasaratharaman Chandramouli1-1/+2
Enabling this bit helps core components query for extended address support using the rdma_cap_opa_ah interface. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Enhance PIO/SDMA send for 16BDon Hiatt1-69/+128
PIO/SDMA send logic now uses the hdr_type field to determine the type of packet that has been constructed. Based on the hdr_type, certain things such as PBC flags, padding count and the LT extra trailing bytes are determined. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/rdmavt, hfi1, qib: Enhance rdmavt and hfi1 to use 32 bit lidsDasaratharaman Chandramouli1-22/+1
Increase lid used in hfi1 driver to 32 bits. qib continues to use 16 bit lids. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Determine 9B/16B L2 header type based on Address handleDon Hiatt1-0/+12
When address handle attributes are initialized, the LIDs are transformed to be in the 32 bit LID space. When constructing the header, hfi1 driver will look at the LID to determine the packet header to be created. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Add support to process 16B header errorsDon Hiatt1-8/+31
Enhance hdr_rcverr() to also handle errors during 16B bypass packet receive. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Add support to send 16B bypass packetsDon Hiatt1-3/+21
We introduce struct hfi1_opa_header as a union of ib (9B) and 16B headers. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Add support to receive 16B bypass packetsDon Hiatt1-7/+10
We introduce a struct hfi1_16b_header to support 16B headers. 16B bypass packets are received by the driver and processed similar to 9B packets. Add basic support to handle 16B packets. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/rdmavt, hfi1, qib: Modify check_ah() to account for extended LIDsDon Hiatt1-0/+5
rvt_check_ah() delegates lid verification to underlying driver. Underlying driver uses different conditions to check for dlid depending on whether the device supports extended LIDs Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-10RDMA: Simplify get firmware interfaceLeon Romanovsky1-3/+2
There is a need to forward FW version to user space application through RDMA netlink. In order to make it safe, there is need to declare nla_policy and limit the size of FW string. The new define IB_FW_VERSION_NAME_MAX will limit the size of FW version string. That define was chosen to be equal to ETHTOOL_FWVERS_LEN, because many drivers anyway are limited by that value indirectly. The introduction of this define allows us to remove the string size from get_fw_str function signature. Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2017-07-31IB/hfi1: Send MAD traps until repressedMichael J. Ruhl1-0/+5
A trap should be sent to the FM until the FM sends a repress message. This is in line with the IBTA 13.4.9. Add the ability to resend traps until a repress message is received. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael N. Henry <michael.n.henry@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Serve the most starved iowait entry firstKaike Wan1-2/+4
When an egress resource(SDMA descriptors, pio credits) is not available, a sending thread will be put on the resource's wait queue. When the resource becomes available again, up to a fixed number of sending threads can be awakened sequentially and removed from the wait queue, depending on the number of waiting threads and the number of free resources. Since each awakened sending thread will send as many packets as possible, it is highly likely that the first sending thread will consume all the egress resources. Subsequently, it will be put back to the end of the wait queue. Depending on the timing when the later sending threads wake up, they may not be able to send any packet and be again put back to the end of the wait queue sequentially, right behind the first sending thread. This starvation cycle continues until some sending threads exceed their retry limit and consequently fail. This patch fixes the issue by two simple approaches: (1) Any starved sending thread will be put to the head of the wait queue while a served sending thread will be put to the tail; (2) The most starved sending thread will be served first. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-24Merge branch 'hfi1' into k.o/for-4.14Doug Ledford1-58/+48
2017-07-05IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdevNiranjana Vishwanathapura1-1/+0
IPOIB is calling free_rdma_netdev even though alloc_rdma_netdev has returned -EOPNOTSUPP. Move free_rdma_netdev from ib_device structure to rdma_netdev structure thus ensuring proper cleanup function is called for the rdma net device. Fix the following trace: ib0: Failed to modify QP to ERROR state BUG: unable to handle kernel paging request at 0000000000001d20 IP: hfi1_vnic_free_rn+0x26/0xb0 [hfi1] Call Trace: ipoib_remove_one+0xbe/0x160 [ib_ipoib] ib_unregister_device+0xd0/0x170 [ib_core] rvt_unregister_device+0x29/0x90 [rdmavt] hfi1_unregister_ib_device+0x1a/0x100 [hfi1] remove_one+0x4b/0x220 [hfi1] pci_device_remove+0x39/0xc0 device_release_driver_internal+0x141/0x200 driver_detach+0x3f/0x80 bus_remove_driver+0x55/0xd0 driver_unregister+0x2c/0x50 pci_unregister_driver+0x2a/0xa0 hfi1_mod_cleanup+0x10/0xf65 [hfi1] SyS_delete_module+0x171/0x250 do_syscall_64+0x67/0x150 entry_SYSCALL64_slow_path+0x25/0x25 Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/hfi1: Modify handling of physical link state by Host DriverByczkowski, Jakub1-1/+1
Ensure states returned to the Fabric Manager are consistent with the OPA specification by caching the physical state along with the logical state. Reviewed-by: Stuart Summers <john.s.summers@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andrzej Kotlowski <andrzej.kotlowski@intel.com> Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/core,rdmavt,hfi1,opa-vnic: Send OPA cap_mask3 in trapVishwanathapura, Niranjana1-1/+5
Provide the ability for IB clients to modify the OPA specific capability mask and include this mask in the subsequent trap data. Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Michael N. Henry <michael.n.henry@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/hfi1: Setup common IB fields in hfi1_packet structDon Hiatt1-51/+38
We move many common IB fields into the hfi1_packet structure and set them up in a single function. This allows us to set the fields in a single place and not deal with them throughout the driver. Reviewed-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/hfi1: Separate input/output header tracingDon Hiatt1-4/+3
Calls to trace incoming packets will now receive the packet context as parameter. This enables trace support for future packet types. Header trace output is in the format <field>:<value> which makes parsing easier. input_ibhdr trace before change: <idle>-0 [001] d.h. 5904.250925: input_ibhdr: [0000:05:00.0] vl 0 lver 0 sl 0 lnh 2,LRH_BTH dlid 0002 len 18 slid 0001 op 0x64,UD_SEND_ONLY se 0 m 0 pad 0 tver 0 pkey 0xffff f 0 b 0 qpn 0x000001 a 0 psn 0x000001b2 deth qkey 0x80010000 sqpn 0x000001 input_ibhdr trace after change: <idle>-0 [001] d.h. 6655.714488: input_ibhdr: [0000:05:00.0] (IB) len:124 sc:0 dlid:0x0001 slid:0x0002 lnh:2,LRH_BTH lver:0 sl:0 age:0 becn:0 fecn:0 l4:0 rc:0 entropy:0 op:0x64,UD_SEND_ONLY se:0 m:0 pad:0 tver:0 pkey:0x7fff f:0 b:0 qpn:0x000001 a:0 psn:0x00000036 hlen:8 deth qkey:0x80010000 sqpn:0x000001 Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/hfi1: Add functions to parse BTH/IB headersDon Hiatt1-3/+3
Improve code readablity by adding inline functions to read specific BTH/IB fields without knowledge of byte offsets. Reviewed-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Define 'ib' and 'roce' rdma_ah_attr typesDasaratharaman Chandramouli1-0/+4
rdma_ah_attr can now be either ib or roce allowing core components to use one type or the other and also to define attributes unique to a specific type. struct ib_ah is also initialized with the type when its first created. This ensures that calls such as modify_ah dont modify the type of the address handle attribute. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Use rdma_ah_attr accessor functionsDasaratharaman Chandramouli1-10/+10
Modify core and driver components to use accessor functions introduced to access individual fields of rdma_ah_attr Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename ib_create_ah to rdma_create_ahDasaratharaman Chandramouli1-1/+1
Rename ib_create_ah to rdma_create_ah so its in sync with the rename of the ib address handle attribute Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-01IB/core: Rename struct ib_ah_attr to rdma_ah_attrDasaratharaman Chandramouli1-4/+4
This patch simply renames struct ib_ah_attr to rdma_ah_attr as these fields specify attributes that are not necessarily specific to IB. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/hfi1: Prevent kernel QP post send hard lockupsMike Marciniszyn1-2/+2
The driver progress routines can call cond_resched() when a timeslice is exhausted and irqs are enabled. If the ULP had been holding a spin lock without disabling irqs and the post send directly called the progress routine, the cond_resched() could yield allowing another thread from the same ULP to deadlock on that same lock. Correct by replacing the current hfi1_do_send() calldown with a unique one for post send and adding an argument to hfi1_do_send() to indicate that the send engine is running in a thread. If the routine is not running in a thread, avoid calling cond_resched(). CC: <stable@vger.kernel.org> # 4.7.x- Fixes: Commit 831464ce4b74 ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets") Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/hfi1: Add functions to parse 9B headersDon Hiatt1-4/+4
These inline functions improve code readability by enabling callers to read specific fields from the header without knowledge of byte offsets. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/hfi1: Rename hdr2sc to hfi1_9B_get_sc5Dasaratharaman Chandramouli1-1/+1
The function really returned the 5-bit sc value from the header and rhf. hdr2sc didn't quite describe what it did. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Don Hiatt <don.hiatt@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/rdmavt/hfi1/qib: Use the MGID and MLID for multicast addressingMichael J. Ruhl1-1/+1
The Infiniband spec defines "A multicast address is defined by a MGID and a MLID" (section 10.5). The current code only uses the MGID for identifying multicast groups. Update the driver to be compliant with this definition. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>