aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/hfi1 (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-12-11RDMA/hfi1: Initialize ib_device_ops structKamal Heib1-7/+11
Initialize ib_device_ops with the supported operations using ib_set_device_ops(). Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Reduce lock contention on iowait_lock for sdma and pioMike Marciniszyn8-31/+27
Commit 4e045572e2c2 ("IB/hfi1: Add unique txwait_lock for txreq events") laid the ground work to support per resource waiting locking. This patch adds that with a lock unique to each sdma engine and pio sendcontext and makes necessary changes for verbs, PSM, and vnic to use the new locks. This is particularly beneficial for smaller messages that will exhaust resources at a faster rate. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Close VNIC sdma_progress sleep windowMike Marciniszyn1-10/+5
The call to sdma_progress() is called outside the wait lock. In this case, there is a race condition where sdma_progress() can return false and the sdma_engine can idle. If that happens, there will be no more sdma interrupts to cause the wakeup and the vnic_sdma xmit will hang. Fix by moving the lock to enclose the sdma_progress() call. Also, delete the tx_retry. The need for this was removed by: commit bcad29137a97 ("IB/hfi1: Serve the most starved iowait entry first") Fixes: 64551ede6cd1 ("IB/hfi1: VNIC SDMA support") Reviewed-by: Gary Leshner <Gary.S.Leshner@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Allow the driver to initialize QP priv structMike Marciniszyn5-0/+65
This patch adds an interface to allow the driver to initialize the QP priv struct when the QP is created and after the qpn has been assigned. A field is added to the QP priv struct to reference the rcd and two new files are added to contain the function to initialize the rcd field so that more TID RDMA related code can be added here later. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Add OPFN and TID RDMA capability bitsKaike Wan1-8/+11
The OPFN and TID RDMA capability bits are added to allow users to control which feature is enabled and disabled. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Unreserve a reserved request when it is completedKaike Wan1-0/+2
Currently, When a reserved operation is completed, its entry in the send queue will not be unreserved, which leads to the miscalculation of qp->s_avail and thus the triggering of a WARN_ON call trace. This patch fixes the problem by unreserving the reserved operation when it is completed. Fixes: 856cc4c237ad ("IB/hfi1: Add the capability for reserved operations") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Consider LMC in 16B/bypass ingress packet checkAshutosh Dixit1-1/+1
Ingress packet check for 16B/bypass packets should consider the port LMC. Not doing this will result in packets sent to the LMC LIDs getting dropped. The check is implemented in HW for 9B packets. Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Incorrect sizing of sge for PIO will OOPsMichael J. Ruhl1-0/+2
An incorrect sge sizing in the HFI PIO path will cause an OOPs similar to this: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] hfi1_verbs_send_pio+0x3d8/0x530 [hfi1] PGD 0 Oops: 0000 1 SMP Call Trace: ? hfi1_verbs_send_dma+0xad0/0xad0 [hfi1] hfi1_verbs_send+0xdf/0x250 [hfi1] ? make_rc_ack+0xa80/0xa80 [hfi1] hfi1_do_send+0x192/0x430 [hfi1] hfi1_do_send_from_rvt+0x10/0x20 [hfi1] rvt_post_send+0x369/0x820 [rdmavt] ib_uverbs_post_send+0x317/0x570 [ib_uverbs] ib_uverbs_write+0x26f/0x420 [ib_uverbs] ? security_file_permission+0x21/0xa0 vfs_write+0xbd/0x1e0 ? mntput+0x24/0x40 SyS_write+0x7f/0xe0 system_call_fastpath+0x16/0x1b Fix by adding the missing sizing check to correctly determine the sge length. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Limit VNIC use of SDMA engines to the available countMichael J. Ruhl1-2/+2
VNIC assumes that all SDMA engines have been configured for use. This is not necessarily true (i.e. if the count was constrained by the module parameter). Update VNICs usage to use the configured count, rather than the hardware count. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Gary Leshner <gary.s.leshner@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Correctly process FECN and BECN in packetsMitko Haralanov5-66/+104
A CA is supposed to ignore FECN bits in multicast, ACK, and CNP packets. This patch corrects the behavior of the HFI1 driver in this regard by ignoring FECNs in those packet types. While fixing the above behavior, fix the extraction of the FECN and BECN bits from the packet headers for both 9B and 16B packets. Furthermore, this patch corrects the driver's response to a FECN in RDMA READ RESPONSE packets. Instead of sending an "empty" ACK, the driver now sends a CNP packet. While editing that code path, add the missing trace for CNP packets. Fixes: 88733e3b8450 ("IB/hfi1: Add 16B UD support") Fixes: f59fb9e05109 ("IB/hfi1: Fix handling of FECN marked multicast packet") Reviewed-by: Kaike Wan <kaike.wan@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Ignore LNI errors before DC8051 transitions to Polling stateKaike Wan1-1/+46
When it is requested to change its physical state back to Offline while in the process to go up, DC8051 will set the ERROR field in the DC8051_DBG_ERR_INFO_SET_BY_8051 register. This ERROR field will remain until the next time when DC8051 transitions from Offline to Polling. Subsequently, when the host requests DC8051 to change its physical state to Polling again, it may receive a DC8051 interrupt with the stale ERROR field still in DC8051_DBG_ERR_INFO_SET_BY_8051. If the host link state has been changed to Polling, this stale ERROR will force the host to transition to Offline state, resulting in a vicious cycle of Polling ->Offline->Polling->Offline. On the other hand, if the host link state is still Offline when the stale ERROR is received, the stale ERROR will be ignored, and the link will come up correctly. This patch implements the correct behavior by changing host link state to Polling only after DC8051 changes its physical state to Polling. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-12-06IB/hfi1: Dump pio info for non-user send contextsKaike Wan4-0/+81
This patch dumps the pio info for non-user send contexts to assist debugging in the field. Reviewed-by: Mike Marciniczyn <mike.marciniszyn@intel.com> Reviewed-by: Mike Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-26Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks"Michal Hocko1-1/+0
Revert 5ff7091f5a2ca ("mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks"). MMU_INVALIDATE_DOES_NOT_BLOCK flags was the only one used and it is no longer needed since 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers"). We now have a full support for per range !blocking behavior so we can drop the stop gap workaround which the per notifier flag was used for. Link: http://lkml.kernel.org/r/20180827112623.8992-4-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-26Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds33-1472/+1362
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle with many of the commits being smallish code fixes and improvements across the drivers. - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and rxe - Memory window support in hns - mlx5 user API 'flow mutate/steering' allows accessing the full packet mangling and matching machinery from user space - Support inter-working with verbs API calls in the 'devx' mlx5 user API, and provide options to use devx with less privilege - Modernize the use of syfs and the device interface to use attribute groups and cdev properly for uverbs, and clean up some of the core code's device list management - More progress on net namespaces for RDMA devices - Consolidate driver BAR mmapping support into core code helpers and rework how RDMA holds poitners to mm_struct for get_user_pages cases - First pass to use 'dev_name' instead of ib_device->name - Device renaming for RDMA devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (242 commits) IB/mlx5: Add support for extended atomic operations RDMA/core: Fix comment for hw stats init for port == 0 RDMA/core: Refactor ib_register_device() function RDMA/core: Fix unwinding flow in case of error to register device ib_srp: Remove WARN_ON in srp_terminate_io() IB/mlx5: Allow scatter to CQE without global signaled WRs IB/mlx5: Verify that driver supports user flags IB/mlx5: Support scatter to CQE for DC transport type RDMA/drivers: Use core provided API for registering device attributes RDMA/core: Allow existing drivers to set one sysfs group per device IB/rxe: Remove unnecessary enum values RDMA/umad: Use kernel API to allocate umad indexes RDMA/uverbs: Use kernel API to allocate uverbs indexes RDMA/core: Increase total number of RDMA ports across all devices IB/mlx4: Add port and TID to MAD debug print IB/mlx4: Enable debug print of SMPs RDMA/core: Rename ports_parent to ports_kobj RDMA/core: Do not expose unsupported counters IB/mlx4: Refer to the device kobject instead of ports_parent RDMA/nldev: Allow IB device rename through RDMA netlink ...
2018-10-25Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds1-1/+0
Pull PCI updates from Bjorn Helgaas: - Fix ASPM link_state teardown on removal (Lukas Wunner) - Fix misleading _OSC ASPM message (Sinan Kaya) - Make _OSC optional for PCI (Sinan Kaya) - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set (Patrick Talbert) - Remove x86 and arm64 node-local allocation for host bridge structures (Punit Agrawal) - Pay attention to device-specific _PXM node values (Jonathan Cameron) - Support new Immediate Readiness bit (Felipe Balbi) - Differentiate between pciehp surprise and safe removal (Lukas Wunner) - Remove unnecessary pciehp includes (Lukas Wunner) - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner) - Tolerate PCIe Slot Presence Detect being hardwired to zero to workaround broken hardware, e.g., the Wilocity switch/wireless device (Lukas Wunner) - Unify pciehp controller & slot structs (Lukas Wunner) - Constify hotplug_slot_ops (Lukas Wunner) - Drop hotplug_slot_info (Lukas Wunner) - Embed hotplug_slot struct into users instead of allocating it separately (Lukas Wunner) - Initialize PCIe port service drivers directly instead of relying on initcall ordering (Keith Busch) - Restore PCI config state after a slot reset (Keith Busch) - Save/restore DPC config state along with other PCI config state (Keith Busch) - Reference count devices during AER handling to avoid race issue with concurrent hot removal (Keith Busch) - If an Upstream Port reports ERR_FATAL, don't try to read the Port's config space because it is probably unreachable (Keith Busch) - During error handling, use slot-specific reset instead of secondary bus reset to avoid link up/down issues on hotplug ports (Keith Busch) - Restore previous AER/DPC handling that does not remove and re-enumerate devices on ERR_FATAL (Keith Busch) - Notify all drivers that may be affected by error recovery resets (Keith Busch) - Always generate error recovery uevents, even if a driver doesn't have error callbacks (Keith Busch) - Make PCIe link active reporting detection generic (Keith Busch) - Support D3cold in PCIe hierarchies during system sleep and runtime, including hotplug and Thunderbolt ports (Mika Westerberg) - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots are empty or occupied (Jon Derrick) - Remove duplicated include from pci/pcie/err.c and unused variable from cpqphp (YueHaibing) - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza Pawandeep) - Uninline PCI bus accessors for better ftracing (Keith Busch) - Remove unused AER Root Port .error_resume method (Keith Busch) - Use kfifo in AER instead of a local version (Keith Busch) - Use threaded IRQ in AER bottom half (Keith Busch) - Use managed resources in AER core (Keith Busch) - Reuse pcie_port_find_device() for AER injection (Keith Busch) - Abstract AER interrupt handling to disconnect error injection (Keith Busch) - Refactor AER injection callbacks to simplify future improvments (Keith Busch) - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski) - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko) - Add switch fall-through annotations (Gustavo A. R. Silva) - Remove unused Switchtec quirk variable (Joshua Abraham) - Fix pci.c kernel-doc warning (Randy Dunlap) - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig) - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng) - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless dmesg errors (Logan Gunthorpe) - Update Switchtec NTB documentation (Wesley Yung) - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz) - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang) - Add PCI support for peer-to-peer DMA (Logan Gunthorpe) - Add sysfs group for PCI peer-to-peer memory statistics (Logan Gunthorpe) - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan Gunthorpe) - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan Gunthorpe) - Add PCI peer-to-peer DMA driver writer's documentation (Logan Gunthorpe) - Add block layer flag to indicate driver support for PCI peer-to-peer DMA (Logan Gunthorpe) - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P memory (Logan Gunthorpe) - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan Gunthorpe) - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan Gunthorpe) - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise, Christoph Hellwig, Logan Gunthorpe) - Cache VF config space size to optimize enumeration of many VFs (KarimAllah Ahmed) - Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas) - Fix VMD AERSID quirk Device ID matching (Jon Derrick) - Fix Cadence PHY handling during probe (Alan Douglas) - Signal Cadence Endpoint interrupts via AXI region 0 instead of last region (Alan Douglas) - Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan Douglas) - Remove redundant controller tests for "device_type == pci" (Rob Herring) - Document R-Car E3 (R8A77990) bindings (Tho Vu) - Add device tree support for R-Car r8a7744 (Biju Das) - Drop unused mvebu PCIe capability code (Thomas Petazzoni) - Add shared PCI bridge emulation code (Thomas Petazzoni) - Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni) - Add aardvark Root Port emulation (Thomas Petazzoni) - Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach) - Add initial power management for i.MX7 (Leonard Crestez) - Add PME_Turn_Off support for i.MX7 (Leonard Crestez) - Fix qcom runtime power management error handling (Bjorn Andersson) - Update TI dra7xx unaligned access errata workaround for host mode as well as endpoint mode (Vignesh R) - Fix kirin section mismatch warning (Nathan Chancellor) - Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare) - Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I) - Update Keystone to use MRRS quirk for host bridge instead of open coding (Kishon Vijay Abraham I) - Refactor Keystone link establishment (Kishon Vijay Abraham I) - Simplify and speed up Keystone link training (Kishon Vijay Abraham I) - Remove unused Keystone host_init argument (Kishon Vijay Abraham I) - Merge Keystone driver files into one (Kishon Vijay Abraham I) - Remove redundant Keystone platform_set_drvdata() (Kishon Vijay Abraham I) - Rename Keystone functions for uniformity (Kishon Vijay Abraham I) - Add Keystone device control module DT binding (Kishon Vijay Abraham I) - Use SYSCON API to get Keystone control module device IDs (Kishon Vijay Abraham I) - Clean up Keystone PHY handling (Kishon Vijay Abraham I) - Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I) - Clean up Keystone config space access checks (Kishon Vijay Abraham I) - Get Keystone outbound window count from DT (Kishon Vijay Abraham I) - Clean up Keystone outbound window configuration (Kishon Vijay Abraham I) - Clean up Keystone DBI setup (Kishon Vijay Abraham I) - Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I) - Fix Keystone IRQ status checking (Kishon Vijay Abraham I) - Add debug messages for all Keystone errors (Kishon Vijay Abraham I) - Clean up Keystone includes and macros (Kishon Vijay Abraham I) - Fix Mediatek unchecked return value from devm_pci_remap_iospace() (Gustavo A. R. Silva) - Fix Mediatek endpoint/port matching logic (Honghui Zhang) - Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui Zhang) - Remove redundant Mediatek PM domain check (Honghui Zhang) - Convert Mediatek to pci_host_probe() (Honghui Zhang) - Fix Mediatek MSI enablement (Honghui Zhang) - Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang) - Add Mediatek loadable module support (Honghui Zhang) - Detach VMD resources after stopping root bus to prevent orphan resources (Jon Derrick) - Convert pcitest build process to that used by other tools (iio, perf, etc) (Gustavo Pimentel) * tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits) PCI/AER: Refactor error injection fallbacks PCI/AER: Abstract AER interrupt handling PCI/AER: Reuse existing pcie_port_find_device() interface PCI/AER: Use managed resource allocations PCI: pcie: Remove redundant 'default n' from Kconfig PCI: aardvark: Implement emulated root PCI bridge config space PCI: mvebu: Convert to PCI emulated bridge config space PCI: mvebu: Drop unused PCI express capability code PCI: Introduce PCI bridge emulated config space common logic PCI: vmd: Detach resources after stopping root bus nvmet: Optionally use PCI P2P memory nvmet: Introduce helper functions to allocate and free request SGLs nvme-pci: Add support for P2P memory in requests nvme-pci: Use PCI p2pmem subsystem to manage the CMB IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() block: Add PCI P2P flag for request queue PCI/P2PDMA: Add P2P DMA driver writer's documentation docs-rst: Add a new directory for PCI documentation PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset ...
2018-10-17RDMA/drivers: Use core provided API for registering device attributesParav Pandit3-37/+36
Use rdma_set_device_sysfs_group() to register device attributes and simplify the driver. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16Merge branch 'for-rc' into rdma.git for-nextJason Gunthorpe5-13/+56
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git This is required to resolve dependencies of the next series of RDMA patches. The code motion conflicts in drivers/infiniband/core/cache.c were resolved. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavtVenkata Sandeep Dhanalakota1-330/+2
This patch moves ruc_loopback() from hfi1 into rdmavt for code sharing with the qib driver. Reviewed-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03IB/{hfi1, qib, rdmavt}: Move send completion logic to rdmavtVenkata Sandeep Dhanalakota6-60/+19
Moving send completion code into rdmavt in order to have shared logic between qib and hfi1 drivers. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03IB/{hfi1, qib, rdmavt}: Move copy SGE logic into rdmavtBrian Welty7-272/+26
This patch moves hfi1_copy_sge() into rdmavt for sharing with qib. This patch also moves all the wss_*() functions into rdmavt as several wss_*() functions are called from hfi1_copy_sge() When SGE copy mode is adaptive, cacheless copy may be done in some cases for performance reasons. In those cases, X86 cacheless copy function is called since the drivers that use rdmavt and may set SGE copy mode to adaptive are X86 only. For this reason, this patch adds "depends on X86_64" to rdmavt/Kconfig. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-02PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() callsOza Pawandeep1-1/+0
After bfcb79fca19d ("PCI/ERR: Run error recovery callbacks for all affected devices"), AER errors are always cleared by the PCI core and drivers don't need to do it themselves. Remove calls to pci_cleanup_aer_uncorrect_error_status() from device driver error recovery functions. Signed-off-by: Oza Pawandeep <poza@codeaurora.org> [bhelgaas: changelog, remove PCI core changes, remove unused variables] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-09-30IB/hfi1: Use VL15 for SM packetsKaike Wan1-2/+12
Subnet Management Packets (SMP) should exclusively use VL15 and their SL is ignored (IBTA v1.3, Section 3.5.8.2). Therefore, when an SMP is posted, the SL in the address handle can be set to 0 by a user application. Consequently, when an address handle is created by the IB core, some fields in struct rvt_ah may not be set correctly by using the SL2SC and SC2VL tables at the time. Subsequently, when the request is post sent, the incoming swqe may fail the validation check, resulting in the rejection of the send request. This patch fixes the problem by using VL15 for any validation, ignoring the SL in the address handle. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30IB/hfi1: Add mtu check for operational data VLsAlex Estrin1-4/+22
Since Virtual Lanes BCT credits and MTU are set through separate MADs, we have to ensure both are valid, and data VLs are ready for transmission before we allow port transition to Armed state. Fixes: 5e2d6764a729 ("IB/hfi1: Verify port data VLs credits on transition to Armed") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30IB/hfi1: Add static trace for iowaitKaike Wan3-1/+59
This patch adds the static trace for resource wait. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30IB/hfi1: Prepare resource waits for dual legDennis Dalessandro13-157/+356
Current implementation allows each qp to have only one send engine. As such, each qp has only one list to queue prebuilt packets when send engine resources are not available. To improve performance, it is desired to support multiple send engines for each qp. This patch creates the framework to support two send engines (two legs) for each qp for the TID RDMA protocol, which can be easily extended to support more send engines. It achieves the goal by creating a leg specific struct, iowait_work in the iowait struct, to hold the work_struct and the tx_list as well as a pointer to the parent iowait struct. The hfi1_pkt_state now has an additional field to record the current legs work structure and that is now passed to all egress waiters to determine the leg that needs to wait via a new iowait helper. The APIs are adjusted to use the new leg specific struct as required. Many new and modified helpers are added to support this change. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30IB/rdmavt: Rename check_send_wqe as setup_wqeKaike Wan3-6/+11
The driver-provided function check_send_wqe allows the hardware driver to check and set up the incoming send wqe before it is inserted into the swqe ring. This patch will rename it as setup_wqe to better reflect its usage. In addition, this function is only called when all setup is complete in rdmavt. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-30IB/hfi1: Error path MAD response size is incorrectMichael J. Ruhl1-2/+2
If a MAD packet has incorrect header information, the logic uses the reply path to report the error. The reply path expects *resp_len to be set prior to return. Unfortunately, *resp_len is set to 0 for this path. This causes an incorrect response packet. Fix by ensuring that the *resp_len is defaulted to the incoming packet size (wc->bytes_len - sizeof(GRH)). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-27Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaGreg Kroah-Hartman5-13/+56
Jason writes: "Second RDMA rc pull request - Fix a long standing race bug when destroying comp_event file descriptors - srp, hfi1, bnxt_re: Various driver crashes from missing validation and other cases - Fixes for regressions in patches merged this window in the gid cache, devx, ucma and uapi." * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/core: Set right entry state before releasing reference IB/mlx5: Destroy the DEVX object upon error flow IB/uverbs: Free uapi on destroy RDMA/bnxt_re: Fix system crash during RDMA resource initialization IB/hfi1: Fix destroy_qp hang after a link down IB/hfi1: Fix context recovery when PBC has an UnsupportedVL IB/hfi1: Invalid user input can result in crash IB/hfi1: Fix SL array bounds check RDMA/uverbs: Fix validity check for modify QP IB/srp: Avoid that sg_reset -d ${srp_device} triggers an infinite loop ucma: fix a use-after-free in ucma_resolve_ip() RDMA/uverbs: Atomically flush and mark closed the comp event queue cxgb4: fix abort_req_rss6 struct
2018-09-26IB/hfi1: Move UnsupportedVL bits definitions to the correct headerMichael J. Ruhl2-8/+4
The UnsupportedVL SendCtrl register bit information is defined in the module rather than the chip register header file. Move the defines to the appropriate header file. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20IB/hfi1: Fix destroy_qp hang after a link downMichael J. Ruhl3-9/+41
rvt_destroy_qp() cannot complete until all in process packets have been released from the underlying hardware. If a link down event occurs, an application can hang with a kernel stack similar to: cat /proc/<app PID>/stack quiesce_qp+0x178/0x250 [hfi1] rvt_reset_qp+0x23d/0x400 [rdmavt] rvt_destroy_qp+0x69/0x210 [rdmavt] ib_destroy_qp+0xba/0x1c0 [ib_core] nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma] nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma] nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma] nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma] process_one_work+0x17a/0x440 worker_thread+0x126/0x3c0 kthread+0xcf/0xe0 ret_from_fork+0x58/0x90 0xffffffffffffffff quiesce_qp() waits until all outstanding packets have been freed. This wait should be momentary. During a link down event, the cleanup handling does not ensure that all packets caught by the link down are flushed properly. This is caused by the fact that the freeze path and the link down event is handled the same. This is not correct. The freeze path waits until the HFI is unfrozen and then restarts PIO. A link down is not a freeze event. The link down path cannot restart the PIO until link is restored. If the PIO path is restarted before the link comes up, the application (QP) using the PIO path will hang (until link is restored). Fix by separating the linkdown path from the freeze path and use the link down path for link down events. Close a race condition sc_disable() by acquiring both the progress and release locks. Close a race condition in sc_stop() by moving the setting of the flag bits under the alloc lock. Cc: <stable@vger.kernel.org> # 4.9.x+ Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20IB/hfi1: Fix context recovery when PBC has an UnsupportedVLMichael J. Ruhl1-2/+7
If a packet stream uses an UnsupportedVL (virtual lane), the send engine will not send the packet, and it will not indicate that an error has occurred. This will cause the packet stream to block. HFI has 8 virtual lanes available for packet streams. Each lane can be enabled or disabled using the UnsupportedVL mask. If a lane is disabled, adding a packet to the send context must be disallowed. The current mask for determining unsupported VLs defaults to 0 (allow all). This is incorrect. Only the VLs that are defined should be allowed. Determine which VLs are disabled (mtu == 0), and set the appropriate unsupported bit in the mask. The correct mask will allow the send engine to error on the invalid VL, and error recovery will work correctly. Cc: <stable@vger.kernel.org> # 4.9.x+ Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20IB/hfi1: Invalid user input can result in crashMichael J. Ruhl1-1/+1
If the number of packets in a user sdma request does not match the actual iovectors being sent, sdma_cleanup can be called on an uninitialized request structure, resulting in a crash similar to this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1] PGD 8000001044f61067 PUD 1052706067 PMD 0 Oops: 0000 [#1] SMP CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G OE ------------ 3.10.0-862.el7.x86_64 #1 Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016 task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000 RIP: 0010:[<ffffffffc0ae8bb7>] [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1] RSP: 0018:ffff8b2ed1f9bab0 EFLAGS: 00010286 RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000 RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000 RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06 R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000 R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540 FS: 00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0 Call Trace: [<ffffffffc0b0570d>] user_sdma_send_pkts+0xdcd/0x1990 [hfi1] [<ffffffff9fe75fb0>] ? gup_pud_range+0x140/0x290 [<ffffffffc0ad3105>] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1] [<ffffffffc0b0777b>] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1] [<ffffffffc0ac193a>] hfi1_aio_write+0xba/0x110 [hfi1] [<ffffffffa001a2bb>] do_sync_readv_writev+0x7b/0xd0 [<ffffffffa001bede>] do_readv_writev+0xce/0x260 [<ffffffffa022b089>] ? tty_ldisc_deref+0x19/0x20 [<ffffffffa02268c0>] ? n_tty_ioctl+0xe0/0xe0 [<ffffffffa001c105>] vfs_writev+0x35/0x60 [<ffffffffa001c2bf>] SyS_writev+0x7f/0x110 [<ffffffffa051f7d5>] system_call_fastpath+0x1c/0x21 Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb <48> 8b 51 08 49 89 d4 83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02 RIP [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1] RSP <ffff8b2ed1f9bab0> CR2: 0000000000000008 There are two exit points from user_sdma_send_pkts(). One (free_tx) merely frees the slab entry and one (free_txreq) cleans the sdma_txreq prior to freeing the slab entry. The free_txreq variation can only be called after one of the sdma_init*() variations has been called. In the panic case, the slab entry had been allocated but not inited. Fix the issue by exiting through free_tx thus avoiding sdma_clean(). Cc: <stable@vger.kernel.org> # 4.9.x+ Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-20IB/hfi1: Fix SL array bounds checkIra Weiny1-1/+7
The SL specified by a user needs to be a valid SL. Add a range check to the user specified SL value which protects from running off the end of the SL to SC table. CC: stable@vger.kernel.org Fixes: 7724105686e7 ("IB/hfi1: add driver files") Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11IB/hfi1,PCI: Allow bus reset while probingDennis Dalessandro1-7/+4
Calling into the new API to reset the secondary bus results in a deadlock. This occurs because the device/bus is already locked at probe time. Reverting back to the old behavior while the API is improved. Link: https://bugzilla.kernel.org/show_bug.cgi?id=200985 Fixes: c6a44ba950d1 ("PCI: Rename pci_try_reset_bus() to pci_reset_bus()") Fixes: 409888e0966e ("IB/hfi1: Use pci_try_reset_bus() for initiating PCI Secondary Bus Reset") Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Cc: Sinan Kaya <okaya@codeaurora.org>
2018-09-11IB/hfi1: set_intr_bits uses incorrect source for register modificationMichael J. Ruhl1-1/+1
HFI IRQ enable bits are not being set correctly. Send context error and DC IRQs are not being enabled correctly. In addition, send context error IRQs are not being delivered. Because of this, send context errors are not being handled correctly when they occur. When setting the IRQ bits, if an IRQ range is used, and the last bit is on a register boundary (bit 63), the calculated index for the final register modification is incorrect (index + 1 vs. index). The incorrect index calculation causes incorrect IRQ bits to be set. In this case the send context error IRQ is NOT enabled. Fix by using the 'last' value rather than the counted 'src' value to determine the final index to use. This satisfies all cases. Fixes: a2f7bbdc2dba ("IB/hfi1: Rework the IRQ API to be more flexible") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11IB/hfi1: Missing return value in error path for user sdmaMichael J. Ruhl1-1/+3
If the set_txreq_header_agh() function returns an error, the exit path is chosen. In this path, the code fails to set the return value. This will cause the caller to not realize an error has occurred. Set the return value correctly in the error path. Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11IB/hfi1: Right size user_sdma sequence numbers and related variablesMichael J. Ruhl4-11/+11
Hardware limits the maximum number of packets to u16 packets. Match that size for all relevant sequence numbers in the user_sdma engine. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11IB/hfi1: Remove race conditions in user_sdma send pathMichael J. Ruhl2-18/+15
Packet queue state is over used to determine SDMA descriptor availablitity and packet queue request state. cpu 0 ret = user_sdma_send_pkts(req, pcount); cpu 0 if (atomic_read(&pq->n_reqs)) cpu 1 IRQ user_sdma_txreq_cb calls pq_update() (state to _INACTIVE) cpu 0 xchg(&pq->state, SDMA_PKT_Q_ACTIVE); At this point pq->n_reqs == 0 and pq->state is incorrectly SDMA_PKT_Q_ACTIVE. The close path will hang waiting for the state to return to _INACTIVE. This can also change the state from _DEFERRED to _ACTIVE. However, this is a mostly benign race. Remove the racy code path. Use n_reqs to determine if a packet queue is active or not. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11IB/hfi1: Eliminate races in the SDMA send error pathMichael J. Ruhl2-51/+39
pq_update() can only be called in two places: from the completion function when the complete (npkts) sequence of packets has been submitted and processed, or from setup function if a subset of the packets were submitted (i.e. the error path). Currently both paths can call pq_update() if an error occurrs. This race will cause the n_req value to go negative, hanging file_close(), or cause a crash by freeing the txlist more than once. Several variables are used to determine SDMA send state. Most of these are unnecessary, and have code inspectible races between the setup function and the completion function, in both the send path and the error path. The request 'status' value can be set by the setup or by the completion function. This is code inspectibly racy. Since the status is not needed in the completion code or by the caller it has been removed. The request 'done' value races between usage by the setup and the completion function. The completion function does not need this. When the number of processed packets matches npkts, it is done. The 'has_error' value races between usage of the setup and the completion function. This can cause incorrect error handling and leave the n_req in an incorrect value (i.e. negative). Simplify the code by removing all of the unneeded state checks and variables. Clean up iovs node when it is freed. Eliminate race conditions in the error path: If all packets are submitted, the completion handler will set the completion status correctly (ok or aborted). If all packets are not submitted, the caller must wait until the submitted packets have completed, and then set the completion status. These two change eliminate the race condition in the error path. Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-11IB/{hfi1, qib, rdmavt}: Schedule multi RC/UC packets instead of postingMichael J. Ruhl2-8/+7
The post_send() path determines if it should post directly or, schedule the post for later. The current logic is: if the swqe ring is empty or (for hfi1) wqe->length <= piothreshold post the send else schedule This can allow large requests to call the send engine directly. Large requests can potentially produce a large number of packets prior to returning to the caller, blocking the caller from posting more requests, and allowing better parallel processing. Allow the driver(s) more say in this logic (pass call_send to the driver, rather than examining a return value). Update hfi1/qib logic to schedule the send engine if an RC or UC message is larger than the QP MTU size. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-01IB/hfi1: Move URGENT IRQ enable to hfi1_rcvctrl()Michael J. Ruhl3-8/+12
User contexts use the receive URGENT interrupt. However, enabling the IRQ SRC in the file_ops module is not as clean as it could be. Augment the _rcvctl() function to be able to enable/disable the IRQ source. Use the new interface from file_ops to enable/disable the IRQ. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: Rework the IRQ API to be more flexibleMichael J. Ruhl6-104/+162
The current IRQ API is an all or nothing interface. This has two problems: 1. All IRQs are enabled regardless of use 2. Moving from general interrupt to MSIx handling is difficult Introduce a new API to enable/disable specific IRQs or a range of IRQs. Do not enable and disable all IRQs in one step. Rework various modules to enable/disable IRQs when needed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: PCIe bus width retryKamenee Arumugam1-3/+8
Retry the PCIe link training up to 'pcie_retry' times if the PCIe link width is narrower than the previous width. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: Make the MSIx resource allocation a bit more flexibleMichael J. Ruhl8-243/+295
The current method of allocating MSIx resources is a bit cumbersome, and not very easily added to. Refactor and re-order the code paths into a more consistent interface. Update the interface so that allocations are not order dependent. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: Prepare for new HFI1 MSIx APIMichael J. Ruhl7-295/+414
The current HFI1 MSIx API is difficult to follow, change, or add to. In anticipation of moving to an more flexible API, move the current MSIx functionality to the new msix.c module. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: Get the hfi1_devdata structure as early as possibleMichael J. Ruhl4-86/+59
Currently several things occur before the hfi1_devdata structure is allocated. This leads to an inconsistent logging ability and makes it more difficult to restructure some code paths. Allocate (and do a minimal init) the structure as soon as possible. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: tune_pcie_caps is arbitrarily placed, poorlyMichael J. Ruhl3-10/+15
The tune_pcie_caps needs to occur sometime after PCI is enabled, but before the HFI is enabled. Currently it is placed in the MSIx allocation code which doesn't really fit. Moving it to just after the gen3 bump. Clean up the associated code (modules, etc.). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: Remove duplicated definesMichael J. Ruhl1-10/+0
TXREQ defines are duplicated, incompletely, in the sdma header file. Remove duplicate defines. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: Rework file list in MakefileDennis Dalessandro1-6/+34
We want to keep files in alphabetical order in our makefile, however this just makes for messy diffs when adding (or removing) files. Let's just clean this up and make it line by line. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-08-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-3/+21
Pull more rdma updates from Jason Gunthorpe: "This is the SMC cleanup promised, a randconfig regression fix, and kernel oops fix. Summary: - Switch SMC over to rdma_get_gid_attr and remove the compat - Fix a crash in HFI1 with some BIOS's - Fix a randconfig failure" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/ucm: fix UCM link error IB/hfi1: Invalid NUMA node information can cause a divide by zero RDMA/smc: Replace ib_query_gid with rdma_get_gid_attr