aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/hfi1/init.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-10-03IB/{hfi1, qib, rdmavt}: Move copy SGE logic into rdmavtBrian Welty1-6/+0
This patch moves hfi1_copy_sge() into rdmavt for sharing with qib. This patch also moves all the wss_*() functions into rdmavt as several wss_*() functions are called from hfi1_copy_sge() When SGE copy mode is adaptive, cacheless copy may be done in some cases for performance reasons. In those cases, X86 cacheless copy function is called since the drivers that use rdmavt and may set SGE copy mode to adaptive are X86 only. For this reason, this patch adds "depends on X86_64" to rdmavt/Kconfig. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-09-01IB/hfi1: Rework the IRQ API to be more flexibleMichael J. Ruhl1-3/+23
The current IRQ API is an all or nothing interface. This has two problems: 1. All IRQs are enabled regardless of use 2. Moving from general interrupt to MSIx handling is difficult Introduce a new API to enable/disable specific IRQs or a range of IRQs. Do not enable and disable all IRQs in one step. Rework various modules to enable/disable IRQs when needed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: Make the MSIx resource allocation a bit more flexibleMichael J. Ruhl1-2/+2
The current method of allocating MSIx resources is a bit cumbersome, and not very easily added to. Refactor and re-order the code paths into a more consistent interface. Update the interface so that allocations are not order dependent. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-09-01IB/hfi1: Get the hfi1_devdata structure as early as possibleMichael J. Ruhl1-35/+42
Currently several things occur before the hfi1_devdata structure is allocated. This leads to an inconsistent logging ability and makes it more difficult to restructure some code paths. Allocate (and do a minimal init) the structure as soon as possible. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-07-03IB/hfi1: Set in_use_ctxts bits for user ctxts onlyMichael J. Ruhl1-1/+0
The in_use_ctxts bitmask is for user receive contexts only. Setting it for any other type of receive context is incorrect. Move initial set of in_use_ctxts bits from the general context init to the user context specific init. Having this bit set can allow contexts to be incorrectly identified by some IRQ handlers. This will allow handle_user_interrupt() will now filter user contexts correctly. Clean up redundant is_rcv_urgent_int() user context check. A follow on patch will clean up an incorrect code path in the is_rcv_avail_int(). Fixes: 8737ce95c463 ("IB/hfi1: Fix an assign/ordering issue with shared context IDs") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22IB/hfi1: Reorg ctxtdata and rightsize fieldsMike Marciniszyn1-2/+2
Many fields in ctxtdata are incorrectly sized and the organization of the fields within the structure is a jumble. Fix by: - Correcting oversize fields. - Putting fields common to all contexts at the top with hot fields at the top. - Moving PSM fields to the bottom of the structure. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22IB/hfi1: Remove caches of chip CSRsMike Marciniszyn1-1/+1
Remove the sizeable cache of the chip sizing CSRs and replace with CSR reads as needed. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22IB/hfi1: Remove unused/writeonly devdata fieldsMike Marciniszyn1-3/+0
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-22IB/hfi1: Remove rcvhdrq_sizeMike Marciniszyn1-9/+2
The usage of this ctxt data field is not hot path and the value can be computed on demand to cut down the ctxtdata bloat. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19IB/hfi1: Remove rcvhdrsizeMike Marciniszyn1-1/+1
The field is based on a constant that can never change. Use the define to assign the register instead. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19IB/hfi1: Move rhf_offset from devdata to ctxtdataMike Marciniszyn1-0/+2
This field should be in ctxtdata to allow for better locality of access by eliminating a dd dereference. The new field is now side-by-side with rcvhdrqentsize since the rhf_offset is a function of the rcvhdrqentsize. Both fields are now correctly sized as u8. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19IB/hfi1: Move normal functions from hfi1_devdata to const arrayMike Marciniszyn1-18/+1
The current implementation precludes having receive context specific packet type receive handlers. Fix this by adding adding c99 const array for the existing handlers and remove the current 72 bytes of pointers from devdata. A new pointer in hfi1_ctxtdata will point to the const array. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04IB/hfi1: Fix comment on default hdr entry sizeMike Marciniszyn1-2/+2
The comment for the default header queue entry size is incorrect. Correct the comment and fix the resulting S_IRUGO warning that shows up in the widened patch context. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04IB/hfi1: Rename exp_lock to exp_mutexKaike Wan1-1/+1
The mutex exp_lock in struct hfi1_ctxtdata is used to protect all Expected TID data of a user context. This patch renames it to exp_mutex to better reflect its identity and prepare for upcoming patches. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-04IB/hfi1: Fix user context tail allocation for DMA_RTAILMike Marciniszyn1-5/+4
The following code fails to allocate a buffer for the tail address that the hardware DMAs into when the user context DMA_RTAIL is set. if (HFI1_CAP_KGET_MASK(rcd->flags, DMA_RTAIL)) { rcd->rcvhdrtail_kvaddr = dma_zalloc_coherent( &dd->pcidev->dev, PAGE_SIZE, &dma_hdrqtail, gfp_flags); if (!rcd->rcvhdrtail_kvaddr) goto bail_free; rcd->rcvhdrqtailaddr_dma = dma_hdrqtail; } So the rcvhdrtail_kvaddr would then be NULL. The mmap logic fails to check for a NULL rcvhdrtail_kvaddr. The fix is to test for both user and kernel DMA_TAIL options during the allocation as well as testing for a NULL rcvhdrtail_kvaddr during the mmap processing. Additionally, all downstream testing of the capmask for DMA_RTAIL have been eliminated in favor of testing rcvhdrtail_kvaddr. Cc: <stable@vger.kernel.org> # 4.9.x Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-24IB/hfi1: Cleanup of exp_rcvMike Marciniszyn1-3/+1
The knowledge of the internal workings of the expect receive is too distributed. Fix by: - right size several rcd fields associated with expect receive - making an init entrance to init all the lists - consolidate all the allocations into an array anchored in the rcd Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09IB/{hfi1, rdmavt, qib}: Implement CQ completion vector supportSebastian Sanchez1-2/+13
Currently the driver doesn't support completion vectors. These are used to indicate which sets of CQs should be grouped together into the same vector. A vector is a CQ processing thread that runs on a specific CPU. If an application has several CQs bound to different completion vectors, and each completion vector runs on different CPUs, then the completion queue workload is balanced. This helps scale as more nodes are used. Implement CQ completion vector support using a global workqueue where a CQ entry is queued to the CPU corresponding to the CQ's completion vector. Since the workqueue is global, it's guaranteed to always be there when queueing CQ entries; Therefore, the RCU locking for cq->rdi->worker in the hot path is superfluous. Each completion vector is assigned to a different CPU. The number of completion vectors available is computed by taking the number of online, physical CPUs from the local NUMA node and subtracting the CPUs used for kernel receive queues and the general interrupt. Special use cases: * If there are no CPUs left for completion vectors, the same CPU for the general interrupt is used; Therefore, there would only be one completion vector available. * For multi-HFI systems, the number of completion vectors available for each device is the total number of completion vectors in the local NUMA node divided by the number of devices in the same NUMA node. If there's a division remainder, the first device to get initialized gets an extra completion vector. Upon a CQ creation, an invalid completion vector could be specified. Handle it as follows: * If the completion vector is less than 0, set it to 0. * Set the completion vector to the result of the passed completion vector moded with the number of device completion vectors available. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09IB/{hfi1, qib}: Add handling of kernel restartAlex Estrin1-0/+13
A warm restart will fail to unload the driver, leaving link state potentially flapping up to the point the BIOS resets the adapter. Correct the issue by hooking the shutdown pci method, which will bring port down. Cc: <stable@vger.kernel.org> # 4.9.x Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failureSebastian Sanchez1-10/+27
When allocating device data, if there's an allocation failure, the already allocated memory won't be freed such as per-cpu counters. Fix memory leaks in exception path by creating a common reentrant clean up function hfi1_clean_devdata() to be used at driver unload time and device data allocation failure. To accomplish this, free_platform_config() and clean_up_i2c() are changed to be reentrant to remove dependencies when they are called in different order. This helps avoid NULL pointer dereferences introduced by this patch if those two functions weren't reentrant. In addition, set dd->int_counter, dd->rcv_limit, dd->send_schedule and dd->tx_opstats to NULL after they're freed in hfi1_clean_devdata(), so that hfi1_clean_devdata() is fully reentrant. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03IB/hfi1: Fix NULL pointer dereference when invalid num_vls is usedSebastian Sanchez1-0/+2
When an invalid num_vls is used as a module parameter, the code execution follows an exception path where the macro dd_dev_err() expects dd->pcidev->dev not to be NULL in hfi1_init_dd(). This causes a NULL pointer dereference. Fix hfi1_init_dd() by initializing dd->pcidev and dd->pcidev->dev earlier in the code. If a dd exists, then dd->pcidev and dd->pcidev->dev always exists. BUG: unable to handle kernel NULL pointer dereference at 00000000000000f0 IP: __dev_printk+0x15/0x90 Workqueue: events work_for_cpu_fn RIP: 0010:__dev_printk+0x15/0x90 Call Trace: dev_err+0x6c/0x90 ? hfi1_init_pportdata+0x38d/0x3f0 [hfi1] hfi1_init_dd+0xdd/0x2530 [hfi1] ? pci_conf1_read+0xb2/0xf0 ? pci_read_config_word.part.9+0x64/0x80 ? pci_conf1_write+0xb0/0xf0 ? pcie_capability_clear_and_set_word+0x57/0x80 init_one+0x141/0x490 [hfi1] local_pci_probe+0x3f/0xa0 work_for_cpu_fn+0x10/0x20 process_one_work+0x152/0x350 worker_thread+0x1cf/0x3e0 kthread+0xf5/0x130 ? max_active_store+0x80/0x80 ? kthread_bind+0x10/0x10 ? do_syscall_64+0x6e/0x1a0 ? SyS_exit_group+0x10/0x10 ret_from_fork+0x35/0x40 Cc: <stable@vger.kernel.org> # 4.9.x Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03IB/hfi1 Use correct type for num_user_contextMichael J. Ruhl1-2/+2
The module parameter num_user_context is defined as 'int' and defaults to -1. The module_param_named() says that it is uint. Correct module_param_named() type information and update the modinfo text to reflect the default value. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-01IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_nodeKamenee Arumugam1-8/+9
Kzalloc_node API doesn't check for overflows in size multiplication. While kcalloc API check for overflows in size multiplication but these implementations are not NUMA-aware. This conversion allowed for correcting an allocation used in the hot path to be on the local NUMA and ensure us overflow free multiplication for the size of a memory allocation. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit timesKamenee Arumugam1-0/+9
HFI's counters SendWaitCnt and SendWaitVlCnt are in units of TXE cycle time (at 805MHz). OPA counters PortXmitWait and PortVLXmtWait are in units of flit times. Convert the counter values to flit units using following conversion formula: PortXmitWait = SendWaitCnt * 2 * (4 /link_width) * (25 Gbps /link_speed) PortVLXmitWait = SendWaitVLCnt * 2 * (4 /link_width) * (25 Gbps /link_speed) At link up or downgrade events, the link width can change. To ensure accurate counter calculations, sample the counters after the events, during counter requests, and then aggregate the OPA counters. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01IB/hfi1: Fix for early release of sdma contextAlex Estrin1-0/+1
With IRQF_SHARED flag set and CONFIG_DEBUG_SHIRQ enabled module removal may result in panic in sdma_interrupt() routine if associated sdma context was released before pci_free_irq(); [ 9198.939885] BUG: unable to handle kernel NULL pointer dereference at (null) [ 9198.940514] IP: sdma_make_progress+0xa5/0x450 [hfi1] [ 9198.941114] PGD 170bdc0067 P4D 170bdc0067 PUD 172063e067 PMD 0 [ 9198.941783] Oops: 0000 [#1] SMP ..... [ 9198.958877] CPU: 132 PID: 64173 Comm: rmmod Tainted: G OE 4.14.0-rc4+ #1 [ 9198.961032] Hardware name: Intel Corporation S7200AP/S7200AP, BIOS S72C610.86B.01.02.0118.080620171935 08/06/2017 [ 9198.963323] task: ffff9681397f0000 task.stack: ffffae1647c40000 [ 9198.965695] RIP: 0010:sdma_make_progress+0xa5/0x450 [hfi1] [ 9198.968082] RSP: 0018:ffffae1647c43be8 EFLAGS: 00010046 [ 9198.970503] RAX: 0000000000000000 RBX: ffff9680ce8b5ca8 RCX: 0000000000000000 [ 9198.973006] RDX: 0000000000000000 RSI: 0000000001a00d28 RDI: ffff9680ce8b5ca0 [ 9198.975546] RBP: ffffae1647c43c40 R08: ffff96814325ec00 R09: 00000000ffffffff [ 9198.978142] R10: 000000004325e501 R11: ffff96814325ec00 R12: ffff9680ce8b5c44 [ 9198.980779] R13: ffff9680ce8b5ca0 R14: 0000000000000000 R15: ffff9680ce8b5b00 [ 9198.983462] FS: 00007f31196ba740(0000) GS:ffff96819df00000(0000) knlGS:0000000000000000 [ 9198.986231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9198.989036] CR2: 0000000000000000 CR3: 000000170833f000 CR4: 00000000001406e0 [ 9198.991911] Call Trace: [ 9198.994847] sdma_engine_interrupt+0x82/0x100 [hfi1] [ 9198.997852] sdma_interrupt+0x61/0xc0 [hfi1] [ 9199.000852] __free_irq+0x1b3/0x2d0 [ 9199.003873] free_irq+0x35/0x70 [ 9199.006909] pci_free_irq+0x1c/0x30 [ 9199.009999] clean_up_interrupts+0x53/0xf0 [hfi1] [ 9199.013137] hfi1_start_cleanup+0x117/0x190 [hfi1] [ 9199.016315] postinit_cleanup+0x1d/0x270 [hfi1] [ 9199.019529] remove_one+0x1f3/0x210 [hfi1] [ 9199.022738] pci_device_remove+0x39/0xc0 [ 9199.025974] device_release_driver_internal+0x141/0x210 [ 9199.029268] driver_detach+0x3f/0x80 [ 9199.032580] bus_remove_driver+0x55/0xd0 [ 9199.035931] driver_unregister+0x2c/0x50 [ 9199.039321] pci_unregister_driver+0x2a/0xa0 [ 9199.042755] hfi1_mod_cleanup+0x10/0xb50 [hfi1] [ 9199.046196] SyS_delete_module+0x171/0x250 ... Fix by exporting sdma_clean() and removing from sdma_exit(). sdma_exit() now just manipulates the engine state, leaving the memory free to sdma_clean() which is now called just before the dd is freed. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-01IB/hfi1: Re-order IRQ cleanup to address driver cleanup raceMichael J. Ruhl1-1/+3
The pci_request_irq() interfaces always adds the IRQF_SHARED bit to all IRQ requests. When the kernel is built with CONFIG_DEBUG_SHIRQ config flag, if the IRQF_SHARED bit is set, a call to the IRQ handler is made from the __free_irq() function. This is testing a race condition between the IRQ cleanup and an IRQ racing the cleanup. The HFI driver should be able to handle this race, but does not. This race can cause traces that start with this footprint: BUG: unable to handle kernel NULL pointer dereference at (null) Call Trace: <hfi1 irq handler> ... __free_irq+0x1b3/0x2d0 free_irq+0x35/0x70 pci_free_irq+0x1c/0x30 clean_up_interrupts+0x53/0xf0 [hfi1] hfi1_start_cleanup+0x122/0x190 [hfi1] postinit_cleanup+0x1d/0x280 [hfi1] remove_one+0x233/0x250 [hfi1] pci_device_remove+0x39/0xc0 Export IRQ cleanup function so it can be called from other modules. Using the exported cleanup function: Re-order the driver cleanup code to clean up IRQ resources before other resources, eliminating the race. Re-order error path for init so that the race does not occur. Reduce severity on spurious error message for SDMA IRQs to info. Reviewed-by: Alex Estrin <alex.estrin@intel.com> Reviewed-by: Patel Jay P <jay.p.patel@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-05IB/{rdmavt, hfi1, qib}: Self determine driver nameMichael J. Ruhl1-0/+2
Currently the HFI and QIB drivers allow the IB core to assign a unit number to the driver name string. If multiple devices exist in a system, there is a possibility that the device unit number and the IB core number will be mismatched. Fix by using the driver defined unit number to generate the device name. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-11-13IB/hfi1: Do not allocate PIO send contexts for VNICNiranjana Vishwanathapura1-2/+1
OPA VNIC does not use PIO contexts and instead only uses SDMA engines. Do not allocate PIO contexts for VNIC ports. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-30IB/hfi1: Add tx_opcode_stats like the opcode_statsMike Marciniszyn1-0/+7
This patch adds tx_opcode_stats to parallel the (rx)opcode_stats in the debugfs. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18Merge branch 'timer_setup' into for-nextDoug Ledford1-1/+1
Conflicts: drivers/infiniband/hw/cxgb4/cm.c drivers/infiniband/hw/qib/qib_driver.c drivers/infiniband/hw/qib/qib_mad.c There were minor fixups needed in these files. Just minor context diffs due to patches from independent sources touching the same basic area. Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18IB/hfi1: Convert timers to use timer_setup()Kees Cook1-1/+1
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Switches test of .data field to .function, since .data will be going away. Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/hfi1: Add a safe wrapper for _rcd_get_by_indexMichael J. Ruhl1-0/+21
hfi1_rcd_get_by_index assumes that the given index is in the correct range. In most cases this is correct because the index is bounded by a loop. For these cases, adding a range check to the function is redundant. For the use case that is not bounded by the loop range, a _safe wrapper function is needed to validate the index before accessing the rcd array. Add a _safe wrapper to _get_by_index to validate the index range. Update appropriate call sites with the new _safe function. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/hfi1: Remove unused hfi1_cpulist variablesJan Sokolowski1-16/+0
Following variables: hfi1_cpulist and hfi1_cpulist_count are unused. Remove them. Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-27IB/hfi1: Remove unnecessary error messages on alloc failuresJan Sokolowski1-6/+0
Per-cpu variables int_counter, rcv_limit, and send_schedule print unnecessary error messages on failed allocations. Remove the error messages. Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hf1: User context locking is inconsistentMichael J. Ruhl1-50/+84
There is a mixture of mutex and spinlocks to protect receive context (rcd/uctxt) information. This is not used consistently. Use the mutex to protect device receive context information only. Use the spinlock to protect sub context information only. Protect access to items in the rcd array with a spinlock and reference count. Remove spinlock around dd->rcd array cleanup. Since interrupts are disabled and cleaned up before this point, this lock is not useful. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Protect context array set/clear with spinlockMichael J. Ruhl1-62/+137
The rcd array can be accessed from user context or during interrupts. Protecting this with a mutex isn't a good idea because the mutex should not be used from an IRQ. Protect the allocation and freeing of rcd array elements with a spinlock. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-22IB/hfi1: Revert egress pkey check enforcementAlex Estrin1-1/+0
Current code has some serious flaws. Disarm the flag pending an appropriate patch. Fixes: 53526500f301 ("IB/hfi1: Permanently enable P_Key checking in HFI") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Create workqueue for link eventsSebastian Sanchez1-0/+26
Currently, link down interrupts queue link entries on a workqueue intended for sending events only. Create a workqueue for queuing link events. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Pass the context pointer rather than the indexMichael J. Ruhl1-7/+7
The hfi1_rcvctrl() function receives an index which it then converts to an rcd. Since most functions have the rcd, use that instead. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-07-31IB/hfi1: Size rcd array index correctly and consistentlyMichael J. Ruhl1-4/+5
The array index for the rcd array is sized several different ways throughout the code. Use the user interface size (u16) as the standard size and update the necessary code to reflect this. u16 is large enough for the largest amount of supported contexts. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/hfi1: Resolve kernel panics by reference counting receive contextsMichael J. Ruhl1-10/+48
Base receive contexts can be used by sub contexts. Because of this, resources for the context cannot be completely freed until all sub contexts are done using the base context. Introduce a reference count so that the base receive context can be freed only when all sub contexts are done with it. Use the provided function call for setting default send context integrity rather than the manual method. The cleanup path does not set all variables back to NULL after freeing resources. Since the clean up code can get called more than once, (e.g. during context close and on the error path), it is necessary to make sure that all the variables are NULLed. Possible crash are: BUG: unable to handle kernel paging request at 0000000001908900 IP: read_csr+0x24/0x30 [hfi1] RIP: 0010:read_csr+0x24/0x30 [hfi1] Call Trace: sc_disable+0x40/0x110 [hfi1] hfi1_file_close+0x16f/0x360 [hfi1] __fput+0xe7/0x210 ____fput+0xe/0x10 or kernel BUG at mm/slub.c:3877! RIP: 0010:kfree+0x14f/0x170 Call Trace: hfi1_free_ctxtdata+0x19a/0x2b0 [hfi1] ? hfi1_user_exp_rcv_grp_free+0x73/0x80 [hfi1] hfi1_file_close+0x20f/0x360 [hfi1] __fput+0xe7/0x210 ____fput+0xe/0x10 Fixes: Commit 62239fc6e554 ("IB/hfi1: Clean up on context initialization failure") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-06-27IB/hfi1: Initialize TID lists to avoid crash on cleanupMichael J. Ruhl1-0/+4
The expected receive lists (tid_xxx_list) are not initialized until late in the receive context initialization. If an error happens before the initialization, a NULL pointer access will occur during cleanup. Initialized the lists sooner rather than later to avoid this Oops: IP: unlock_exp_tids.isra.11+0x26/0xd0 [hfi1] RIP: 0010:unlock_exp_tids.isra.11+0x26/0xd0 [hfi1] Call Trace: hfi1_user_exp_rcv_free+0x79/0xb0 [hfi1] hfi1_file_close+0x87/0x360 [hfi1] __fput+0xe7/0x210 ____fput+0xe/0x10 Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04IB/hfi1: Clean up on context initialization failureMichael J. Ruhl1-6/+4
The error path for context initialization is not consistent. Cleanup all resources on failure. Removed unused variable user_event_mask. Add the _BASE_FAILED bit to the event flags so that a base context can notify waiting sub contexts that they cannot continue. Running out of sub contexts is an EBUSY result, not EINVAL. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04IB/hfi1: Fix an assign/ordering issue with shared context IDsMichael J. Ruhl1-1/+2
The current algorithm for generating sub-context IDs is FILO. If the contexts are not closed in that order, the uniqueness of the ID will be compromised. I.e. logging the creation/deletion of context IDs with an application that assigns and closes in a FIFO order reveals: cache_id: assign: uctxt: 3 sub_ctxt: 0 cache_id: assign: uctxt: 3 sub_ctxt: 1 cache_id: assign: uctxt: 3 sub_ctxt: 2 cache_id: close: uctxt: 3 sub_ctxt: 0 cache_id: assign: uctxt: 3 sub_ctxt: 2 <<< The sub_ctxt ID 2 is reused incorrectly. Update the sub-context ID assign algorithm to use a bitmask of in_use contexts. The new algorithm will allow the contexts to be closed in any order, and will only re-use unused contexts. Size subctxt and subctxt_cnt to match the user API size. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04IB/hfi1: Clean up context initializationMichael J. Ruhl1-8/+2
Context initialization mixes base context init with sub context init. This is bad because contexts can be reused, and on reuse, reinit things that should not re-initialized. Normalize comments and function names to refer to base context and sub context (not main, shared or slaves). Separate the base context initialization from sub context initialization. hfi1_init_ctxt() cannot return an error so changed to a void and remove error message. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04IB/hfi1: Name function prototype parametersMichael J. Ruhl1-1/+1
To improve the readability of function prototypes, give the parameters names. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04IB/hfi1: Return an error on memory allocation failureMichael J. Ruhl1-0/+1
If the eager buffer allocation fails, it is necessary to return an error code. Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04IB/hfi1: Adjust default eager_buffer_size to 8MBTymoteusz Kielan1-2/+2
Performance analysis shows benefits for PSM2 in increasing eager buffer size from 2MB to 8MB. The change has neutral impact on verbs. Make change to the module parameter's default value. Allocation ring down was verified to work with the larger buffer size. Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Tymoteusz Kielan <tymoteusz.kielan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04IB/hfi1: Fix yield logic in send engineMike Marciniszyn1-1/+2
When there are many RC QPs and an RDMA READ request is sent, timeouts occur on the requester side because of fairness among RC QPs on their relative SDMA engine on the responder side. This also hits write and send, but to a lesser extent. Complicating the issue is that the current code checks if workqueue is congested before scheduling other QPs, however, this check is based on the number of active entries in the workqueue, which was found to be too big to for workqueue_congested() to be effective. Fix by reducing the number of active entries as revealed by experimentation from the default of num_sdma to HFI1_MAX_ACTIVE_WORKQUEUE_ENTRIES. Retry counts were monitored to determine the correct value. Tracing to investigate any future issues is also added. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/hfi1: Fix unbalanced braces around elseDennis Dalessandro1-2/+4
Add missing braces around else blocks in a few places to make checkpatch happy. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-04-28IB/hfi1: Permanently enable P_Key checking in HFINeel Desai1-0/+3
Ingress and egress port P_Key checking should always be performed for HFIs. This patch will enable ingress and egress P_Key checking when the port is initialized and will ignore the P_Key information sent by the FM in the port info structure which is meant to be used only by the switch. Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Neel Desai <neel.desai@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>