aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/eq.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2019-02-14net/mlx5: Relocate vport macros to the vport header fileBodong Wang1-0/+1
These are two macros in the driver general header which deal with the number of total vports and if a vport is vport manager. Such macros are vport entities, better to place them at the vport header file. This patch doesn't change any functionality. Signed-off-by: Bodong Wang <bodong@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-14net/mlx5: Add host params change eventBodong Wang1-0/+3
In Embedded CPU (EC) configurations, the EC driver needs to know when the number of virtual functions change on the corresponding PF at the host side. This is required so the EC driver can create or destroy representor net devices that represent the VFs ports. Whenever a change in the number of VFs occurs, firmware will generate an event towards the EC which will trigger a work to complete the rest of the handling. The specifics of the handling will be introduced in a downstream patch. Signed-off-by: Bodong Wang <bodong@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-11mlx5: use RCU lock in mlx5_eq_cq_get()Cong Wang1-6/+6
mlx5_eq_cq_get() is called in IRQ handler, the spinlock inside gets a lot of contentions when we test some heavy workload with 60 RX queues and 80 CPU's, and it is clearly shown in the flame graph. In fact, radix_tree_lookup() is perfectly fine with RCU read lock, we don't have to take a spinlock on this hot path. This is pretty much similar to commit 291c566a2891 ("net/mlx4_core: Fix racy CQ (Completion Queue) free"). Slow paths are still serialized with the spinlock, and with synchronize_irq() it should be safe to just move the fast path to RCU read lock. This patch itself reduces the latency by about 50% for our memcached workload on a 4.14 kernel we test. In upstream, as pointed out by Saeed, this spinlock gets some rework in commit 02d92f790364 ("net/mlx5: CQ Database per EQ"), so the difference could be smaller. Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10net/mlx5: Add monitor commands layout and event dataEyal Davidovich1-0/+3
Will be used in downstream patch to monitor counter changes by the HCA and report it to the driver by an event. The driver will update its counters cached data accordingly. Signed-off-by: Eyal Davidovich <eyald@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: Device events, Use async events chainSaeed Mahameed1-157/+0
Move all the generic async events handling into new specific events handling file events.c to keep eq.c file clean from concrete event logic handling. Use new API to register for NOTIFY_ANY to handle generic events and dispatch allowed events to mlx5_core consumers (mlx5_ib and mlx5e) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: CQ ERR, Use async events chainSaeed Mahameed1-22/+44
Remove the explicit call to mlx5_eq_cq_event on MLX5_EVENT_TYPE_CQ_ERROR and register a specific CQ ERROR handler via the new API. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: Resource tables, Use async events chainSaeed Mahameed1-29/+0
Remove the explicit call to QP/SRQ resources events handlers on several FW events and let resources logic register resources events notifiers via the new API. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: CmdIF, Use async events chainSaeed Mahameed1-4/+0
Remove the explicit call to mlx5_cmd_comp_handler on MLX5_EVENT_TYPE_CMD and let command interface to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: FWPage, Use async events chainSaeed Mahameed1-11/+0
Remove the explicit call to mlx5_core_req_pages_handler on MLX5_EVENT_TYPE_PAGE_REQUEST and let FW page logic to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: E-Switch, Use async events chainSaeed Mahameed1-4/+0
Remove the explicit call to mlx5_eswitch_vport_event on MLX5_EVENT_TYPE_NIC_VPORT_CHANGE and let the eswitch register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: Clock, Use async events chainSaeed Mahameed1-4/+0
Remove the explicit call to mlx5_pps_event on MLX5_EVENT_TYPE_PPS_EVENT and let clock logic to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: FPGA, Use async events chainSaeed Mahameed1-5/+0
Remove the explicit call to mlx5_fpga_event on MLX5_EVENT_TYPE_FPGA_ERROR or MLX5_EVENT_TYPE_FPGA_QP_ERROR let fpga core to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: FWTrace, Use async events chainSaeed Mahameed1-4/+0
Remove the explicit call to mlx5_fw_tracer_event on MLX5_EVENT_TYPE_DEVICE_TRACER and let fw tracer to register its own handler when its ready. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-26net/mlx5: EQ, Introduce atomic notifier chain subscription APISaeed Mahameed1-3/+39
Use atomic_notifier_chain to fire firmware events at internal mlx5 core components such as eswitch/fpga/clock/FW tracer/etc.., this is to avoid explicit calls from low level mlx5_core to upper components and to simplify the mlx5_core API for future developments. Simply provide register/unregister notifiers API and call the notifier chain on firmware async events. Example: to subscribe to a FW event: struct mlx5_nb port_event; MLX5_NB_INIT(&port_event, port_event_handler, PORT_CHANGE); mlx5_eq_notifier_register(mdev, &port_event); where: - port_event_handler is the notifier block callback. - PORT_EVENT is the suffix of MLX5_EVENT_TYPE_PORT_CHANGE. The above will guarantee that port_event_handler will receive all FW events of the type MLX5_EVENT_TYPE_PORT_CHANGE. To receive all FW/HW events one can subscribe to MLX5_EVENT_TYPE_NOTIFY_ANY. The next few patches will start moving all mlx5 core components to use this new API and cleanup mlx5_eq_async_int misx handler from component explicit calls and specific logic. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-11-20net/mlx5: EQ, Make EQE access methods inlineSaeed Mahameed1-23/+0
These are one/two liner generic EQ access methods, better have them declared static inline in eq.h. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20{net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMASaeed Mahameed1-252/+0
Use the new generic EQ API to move all ODP RDMA data structures and logic form mlx5 core driver into mlx5_ib driver. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Generic EQSaeed Mahameed1-63/+180
Add mlx5_eq_{create/destroy}_generic APIs and EQE access methods, for mlx5 core consumers generic EQs. This API will be used in downstream patch to move page fault (RDMA ODP) EQ logic into mlx5_ib rdma driver, hence it will use a generic EQ. Current mlx5 EQ allocation scheme: On load mlx5 allocates 4 (for async) + #cores (for data completions) MSIX vectors, mlx5 core will assign 3 MSIX vectors for internal async EQs and will use all of the #cores MSIX vectors for completion EQs, (One vector is going to be reserved for a generic EQ). After this patch an external user (e.g mlx5_ib) of mlx5_core can use this new API to create new generic EQs with the reserved msix vector index for that eq. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Different EQ typesSaeed Mahameed1-151/+225
In mlx5 we have three types of usages for EQs, 1. Asynchronous EQs, used internally by mlx5 core for a. FW command completions b. FW page requests c. one EQ for all other Asynchronous events 2. Completion EQs, used for CQ completion (we create one per core) 3. *Special type of EQ (page fault) used for RDMA on demand paging (ODP). *The 3rd type shouldn't be special at least in mlx5 core, it is yet another async events EQ with specific use case, it will be removed in the next two patches, and will completely move its logic to mlx5_ib, as it is rdma specific. In this patch we remove use case (eq type) specific fields from struct mlx5_eq into a new eq type specific structures. struct mlx5_eq_async; truct mlx5_eq_comp; struct mlx5_eq_pagefault; Separate between their type specific flows. In the future we will allow users to create there own generic EQs. for now we will allow only one for ODP in next patches. We will introduce event listeners registration API for those who want to receive mlx5 async events. After that mlx5 eq handling will be clean from feature/user specific handling. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Privatize eq_table and friendsSaeed Mahameed1-21/+81
Move unnecessary EQ table structures and declaration from the public include/linux/mlx5/driver.h into the private area of mlx5_core and into eq.c/eq.h. Introduce new mlx5 EQ APIs: mlx5_comp_vectors_count(dev); mlx5_comp_irq_get_affinity_mask(dev, vector); And use them from mlx5_ib or mlx5e netdevice instead of direct access to mlx5_core internal structures. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, irq_info and rmap belong to eq_tableSaeed Mahameed1-19/+21
irq_info and rmap are EQ properties of the driver, and only needed for EQ objects, move them to the eq_table EQs database structure. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Create all EQs in one placeSaeed Mahameed1-25/+96
Instead of creating the EQ table in three steps at driver load, - allocate irq vectors - allocate async EQs - allocate completion EQs Gather all of the procedures into one function in eq.c and call it from driver load. This will help us reduce the EQ and EQ table private structures visibility to eq.c in downstream refactoring. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Move all EQ logic to eq.cSaeed Mahameed1-0/+176
Move completion EQs flows from main.c to eq.c, reasons: 1) It is where this logic belongs. 2) It will help centralize the EQ logic in one file for downstream refactoring, and future extensions/updates. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Remove redundant completion EQ list lockSaeed Mahameed1-2/+0
Completion EQs list is only modified on driver load/unload, locking is not required, remove it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-20net/mlx5: EQ, Remove unused fields and structuresSaeed Mahameed1-11/+0
Some fields and structures are not referenced nor used by the driver, remove them. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-11-12net/mlx5: Use multi threaded workqueue for page fault handlingMoni Shoua1-2/+3
Page fault events are processed in a workqueue context. Since each QP can have up to two concurrent unrelated page-faults, one for requester and one for responder, page-fault handling can be done in parallel. Achieve this by changing the workqueue to be multi-threaded. The number of threads is the same as the number of command interface channels to avoid command interface bottlenecks. In addition to multi-threads, change the workqueue flags to give it high priority. Stress benchmark shows that before this change 85% of page faults were waiting in queue 8 seconds or more while after the change 98% of page faults were waiting in queue 64 milliseconds or less. The number of threads was chosen as the number of channels to the command interface. Fixes: d9aaed838765 ("{net,IB}/mlx5: Refactor page fault handling") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-10-10net/mlx5: Take only bit 24-26 of wqe.pftype_wq for page fault typeHuy Nguyen1-1/+1
The HW spec defines only bits 24-26 of pftype_wq as the page fault type, use the required mask to ensure that. Fixes: d9aaed838765 ("{net,IB}/mlx5: Refactor page fault handling") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-08-13net/mlx5e: clock.c depends on CONFIG_PTP_1588_CLOCKMoshe Shemesh1-0/+1
lib/clock.c includes clock related functions which require ptp support. Thus compile out lib/clock.c and add the needed function stubs in case kconfig CONFIG_PTP_1588_CLOCK is off. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23net/mlx5: FW tracer, events handlingFeras Daoud1-0/+11
The tracer has one event, event 0x26, with two subtypes: - Subtype 0: Ownership change - Subtype 1: Traces available An ownership change occurs in the following cases: 1- Owner releases his ownership, in this case, an event will be sent to inform others to reattempt acquire ownership. 2- Ownership was taken by a higher priority tool, in this case the owner should understand that it lost ownership, and go through tear down flow. The second subtype indicates that there are traces in the trace buffer, in this case, the driver polls the tracer buffer for new traces, parse them and prepares the messages for printing. The HW starts tracing from the first address in the tracer buffer. Driver receives an event notifying that new trace block exists. HW posts a timestamp event at the last 8B of every 256B block. Comparing the timestamp to the last handled timestamp would indicate that this is a new trace block. Once the new timestamp is detected, the entire block is considered valid. Block validation and parsing, should be done after copying the current block to a different location, in order to avoid block overwritten during processing. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-31net/mlx5: Add FPGA QP error eventIlan Tayari1-2/+5
The FPGA queue pair (QP) event fires whenever a QP on the FPGA transitions to the error state. At this stage, this event is unrecoverable, it may become recoverable in the future. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31net/mlx5: Add temperature warning event to logIlan Tayari1-0/+23
Temperature warning event is sent by FW to indicate high temperature as detected by one of the sensors on the board. Add handling of this event by writing the numbers of the alert sensors to the kernel log. Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-10net/mlx5: Free IRQs in shutdown pathDaniel Jurgens1-0/+28
Some platforms require IRQs to be free'd in the shutdown path. Otherwise they will fail to be reallocated after a kexec. Fixes: 8812c24d28f4 ("net/mlx5: Add fast unload support in shutdown flow") Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15net/mlx5: Remove redundant EQ API exportsSaeed Mahameed1-3/+0
EQ structure and API is private to mlx5_core driver only, external drivers should not have access or the means to manipulate EQ objects. Remove redundant exports and move API functions out of the linux/mlx5 include directory into the driver's mlx5_core.h private include file. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15net/mlx5: Move CQ completion and event forwarding logic to eq.cSaeed Mahameed1-2/+47
Since CQ tree is now per EQ, CQ completion and event forwarding became specific implementation of EQ logic, this patch moves that logic to eq.c and makes those functions static. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15net/mlx5: EQ add/del CQ APISaeed Mahameed1-0/+34
Add API to add/del CQ to/from EQs CQ table to be used in cq.c upon CQ creation/destruction, as CQ table is now private to eq.c. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15net/mlx5: CQ Database per EQSaeed Mahameed1-2/+8
Before this patch the driver had one CQ database protected via one spinlock, this spinlock is meant to synchronize between CQ adding/removing and CQ IRQ interrupt handling. On a system with large number of CPUs and on a work load that requires lots of interrupts, this global spinlock becomes a very nasty hotspot and introduces a contention between the active cores, which will significantly hurt performance and becomes a bottleneck that prevents seamless cpu scaling. To solve this we simply move the CQ database and its spinlock to be per EQ (IRQ), thus per core. Tested with: system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores netperf command: ./super_netperf 200 -P 0 -t TCP_RR -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M WITHOUT THIS PATCH: Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle Average: all 4.32 0.00 36.15 0.09 0.00 34.02 0.00 0.00 0.00 25.41 Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271 Overhead Command Shared Object Symbol + 14.28% swapper [kernel.vmlinux] [k] intel_idle + 12.25% swapper [kernel.vmlinux] [k] queued_spin_lock_slowpath + 10.29% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath + 1.32% netserver [kernel.vmlinux] [k] mlx5e_xmit WITH THIS PATCH: Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle Average: all 4.27 0.00 34.31 0.01 0.00 18.71 0.00 0.00 0.00 42.69 Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483 Overhead Command Shared Object Symbol + 23.33% swapper [kernel.vmlinux] [k] intel_idle + 1.69% netserver [kernel.vmlinux] [k] mlx5e_xmit Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-06Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-1/+1
Pull more rdma updates from Doug Ledford: "Items of note: - two patches fix a regression in the 4.15 kernel. The 4.14 kernel worked fine with NVMe over Fabrics and mlx5 adapters. That broke in 4.15. The fix is here. - one of the patches (the endian notation patch from Lijun) looks like a lot of lines of change, but it's mostly mechanical in nature. It amounts to the biggest chunk of change in it (it's about 2/3rds of the overall pull request). Summary: - Clean up some function signatures in rxe for clarity - Tidy the RDMA netlink header to remove unimplemented constants - bnxt_re driver fixes, one is a regression this window. - Minor hns driver fixes - Various fixes from Dan Carpenter and his tool - Fix IRQ cleanup race in HFI1 - HF1 performance optimizations and a fix to report counters in the right units - Fix for an IPoIB startup sequence race with the external manager - Oops fix for the new kabi path - Endian cleanups for hns - Fix for mlx5 related to the new automatic affinity support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits) net/mlx5: increase async EQ to avoid EQ overrun mlx5: fix mlx5_get_vector_affinity to start from completion vector 0 RDMA/hns: Fix the endian problem for hns IB/uverbs: Use the standard kConfig format for experimental IB: Update references to libibverbs IB/hfi1: Add 16B rcvhdr trace support IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node IB/core: Avoid a potential OOPs for an unused optional parameter IB/core: Map iWarp AH type to undefined in rdma_ah_find_type IB/ipoib: Fix for potential no-carrier state IB/hfi1: Show fault stats in both TX and RX directions IB/hfi1: Remove blind constants from 16B update IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times IB/hfi1: Do not override given pcie_pset value IB/hfi1: Optimize process_receive_ib() IB/hfi1: Remove unnecessary fecn and becn fields IB/hfi1: Look up ibport using a pointer in receive path IB/hfi1: Optimize packet type comparison using 9B and bypass code paths IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet IB/hfi1: Remove dependence on qp->s_hdrwords ...
2018-02-05net/mlx5: increase async EQ to avoid EQ overrunMax Gurtovoy1-1/+1
Currently the async EQ has 256 entries only. It might not be big enough for the SW to handle all the needed pending events. For example, in case of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target and the target goes down, the FW will raise 1024 "last WQE reached" events and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond it the FW should be able to delay the event and raise it later on using internal backpressure mechanism. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-0/+18
Pull networking updates from David Miller: 1) Significantly shrink the core networking routing structures. Result of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf 2) Add netdevsim driver for testing various offloads, from Jakub Kicinski. 3) Support cross-chip FDB operations in DSA, from Vivien Didelot. 4) Add a 2nd listener hash table for TCP, similar to what was done for UDP. From Martin KaFai Lau. 5) Add eBPF based queue selection to tun, from Jason Wang. 6) Lockless qdisc support, from John Fastabend. 7) SCTP stream interleave support, from Xin Long. 8) Smoother TCP receive autotuning, from Eric Dumazet. 9) Lots of erspan tunneling enhancements, from William Tu. 10) Add true function call support to BPF, from Alexei Starovoitov. 11) Add explicit support for GRO HW offloading, from Michael Chan. 12) Support extack generation in more netlink subsystems. From Alexander Aring, Quentin Monnet, and Jakub Kicinski. 13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From Russell King. 14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso. 15) Many improvements and simplifications to the NFP driver bpf JIT, from Jakub Kicinski. 16) Support for ipv6 non-equal cost multipath routing, from Ido Schimmel. 17) Add resource abstration to devlink, from Arkadi Sharshevsky. 18) Packet scheduler classifier shared filter block support, from Jiri Pirko. 19) Avoid locking in act_csum, from Davide Caratti. 20) devinet_ioctl() simplifications from Al viro. 21) More TCP bpf improvements from Lawrence Brakmo. 22) Add support for onlink ipv6 route flag, similar to ipv4, from David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits) tls: Add support for encryption using async offload accelerator ip6mr: fix stale iterator net/sched: kconfig: Remove blank help texts openvswitch: meter: Use 64-bit arithmetic instead of 32-bit tcp_nv: fix potential integer overflow in tcpnv_acked r8169: fix RTL8168EP take too long to complete driver initialization. qmi_wwan: Add support for Quectel EP06 rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK ipmr: Fix ptrdiff_t print formatting ibmvnic: Wait for device response when changing MAC qlcnic: fix deadlock bug tcp: release sk_frag.page in tcp_disconnect ipv4: Get the address of interface correctly. net_sched: gen_estimator: fix lockdep splat net: macb: Handle HRESP error net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring ipv6: addrconf: break critical section in addrconf_verify_rtnl() ipv6: change route cache aging logic i40e/i40evf: Update DESC_NEEDED value to reflect larger value bnxt_en: cleanup DIM work on device shutdown ...
2018-01-30Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.gitJason Gunthorpe1-8/+12
To resolve conflicts in: drivers/infiniband/hw/mlx5/main.c drivers/infiniband/hw/mlx5/qp.c From patches merged into the -rc cycle. The conflict resolution matches what linux-next has been carrying. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-19net/mlx5e: Poll event queue upon TX timeout before performing full channels recoveryEran Ben Elisha1-0/+18
Up until this patch, on every TX timeout we would try to do channels recovery. However, in case of a lost interrupt for an EQ, the channel associated to it cannot be recovered if reopened as it would never get another interrupt on sent/received traffic, and eventually ends up with another TX timeout (Restarting the EQ is not part of channel recovery). This patch adds a mechanism for explicitly polling EQ in case of a TX timeout in order to recover from a lost interrupt. If this is not the case (no pending EQEs), perform a channels full recovery as usual. Once a lost EQE is recovered, it triggers the NAPI to run and handle all pending completions. This will free some budget in the bql (via calling netdev_tx_completed_queue) or by clearing pending TXWQEs and waking up the queue. One of the above actions will move the queue to be ready for transmit again. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-08net/mlx5: Add DCT command interfaceMoni Shoua1-1/+8
Add a missing command interface to work with a DCT. It includes: creating, destroying and get events for. Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-19net/mlx5: Stay in polling mode when command EQ destroy failsMoshe Shemesh1-3/+1
During unload, on mlx5_stop_eqs we move command interface from events mode to polling mode, but if command interface EQ destroy fail we move back to events mode. That's wrong since even if we fail to destroy command interface EQ, we do release its irq, so no interrupts will be received. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: Cleanup IRQs in case of unload failureMoshe Shemesh1-7/+13
When mlx5_stop_eqs fails to destroy any of the eqs it returns with an error. In such failure flow the function will return without releasing all EQs irqs and then pci_free_irq_vectors will fail. Fix by only warn on destroy EQ failure and continue to release other EQs and their irqs. It fixes the following kernel trace: kernel: kernel BUG at drivers/pci/msi.c:352! ... ... kernel: Call Trace: kernel: pci_disable_msix+0xd3/0x100 kernel: pci_free_irq_vectors+0xe/0x20 kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core] Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19net/mlx5: Fix misspelling in the error message and commentEugenia Emantayev1-1/+1
Fix misspelling in word syndrome. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-14net/mlx5: PTP code migration to driver core sectionFeras Daoud1-2/+1
PTP code is moved to core section of mlx5 driver in order to share it between ethernet and infiniband. This movement involves the following changes: - Change mlx5e_ prefix to be mlx5_ - Add clock structs to Core - Add clock object to mlx5_core_dev - Call Init/Uninit clock from core init/cleanup - Rename mlx5e_tstamp to be mlx5_clock Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Eitan Rabin <rabin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-7/+2
Pull networking updates from David Miller: 1) Support ipv6 checksum offload in sunvnet driver, from Shannon Nelson. 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric Dumazet. 3) Allow generic XDP to work on virtual devices, from John Fastabend. 4) Add bpf device maps and XDP_REDIRECT, which can be used to build arbitrary switching frameworks using XDP. From John Fastabend. 5) Remove UFO offloads from the tree, gave us little other than bugs. 6) Remove the IPSEC flow cache, from Florian Westphal. 7) Support ipv6 route offload in mlxsw driver. 8) Support VF representors in bnxt_en, from Sathya Perla. 9) Add support for forward error correction modes to ethtool, from Vidya Sagar Ravipati. 10) Add time filter for packet scheduler action dumping, from Jamal Hadi Salim. 11) Extend the zerocopy sendmsg() used by virtio and tap to regular sockets via MSG_ZEROCOPY. From Willem de Bruijn. 12) Significantly rework value tracking in the BPF verifier, from Edward Cree. 13) Add new jump instructions to eBPF, from Daniel Borkmann. 14) Rework rtnetlink plumbing so that operations can be run without taking the RTNL semaphore. From Florian Westphal. 15) Support XDP in tap driver, from Jason Wang. 16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal. 17) Add Huawei hinic ethernet driver. 18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan Delalande. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits) i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq i40e: avoid NVM acquire deadlock during NVM update drivers: net: xgene: Remove return statement from void function drivers: net: xgene: Configure tx/rx delay for ACPI drivers: net: xgene: Read tx/rx delay for ACPI rocker: fix kcalloc parameter order rds: Fix non-atomic operation on shared flag variable net: sched: don't use GFP_KERNEL under spin lock vhost_net: correctly check tx avail during rx busy polling net: mdio-mux: add mdio_mux parameter to mdio_mux_init() rxrpc: Make service connection lookup always check for retry net: stmmac: Delete dead code for MDIO registration gianfar: Fix Tx flow control deactivation cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 cxgb4: Fix pause frame count in t4_get_port_stats cxgb4: fix memory leak tun: rename generic_xdp to skb_xdp tun: reserve extra headroom only when XDP is set net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping net: dsa: bcm_sf2: Advertise number of egress queues ...
2017-09-03Merge tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdmaLinus Torvalds1-7/+25
Pull rdma updates from Doug Ledford: "This is a big pull request. Of note is that I'm sending you the new ioctl API for the rdma subsystem. We put it up on linux-api@, but didn't get much response. The API is complex, but it solves two different problems in one go: 1) The bi-directional nature of the RDMA file write calls, which created the security hole we had to handle (and for which the fix is now causing problems for systems in production, we were a bit over zealous in the fix and the ability to open a device, then fork, then create new queue pairs on the device and use them is broken). 2) The bloat caused by different vendors implementing extensions to the base verbs API. Each vendor's hardware is slightly different, and the hardware might be suitable for one extension but not another. By the time we add generic extensions for all the different ways that the different hardware can offload things, the API becomes bloated. Things like our completion structs have started to exceed a cache line in size because of all the elements needed to support this. That in turn shows up heavily in the performance graphs with a noticable drop in performance on 100Gigabit links as our completion structs go from occupying one cache line to 1+. This API makes things like the completion structs modular in a very similar way to netlink so that your structs can only include the items needed for the offloads/features you are actually using on a given queue pair. In that way we support everything, but only use what we need, and our structs stay smaller. The ioctl API is better explained by the posting on linux-api@ than I can explain it here, so I'll just leave it at that. The rest of the pull request is typical stuff. Updates for 4.14 kernel merge window - Lots of hfi1 driver updates (mixed with a few qib and core updates as well) - rxe updates - various mlx updates - Set default roce type to RoCEv2 - Several larger fixes for bnxt_re that were too big for -rc - Several larger fixes for qedr that, likewise, were too big for -rc - Misc core changes - Make the hns_roce driver compilable on arches other than aarch64 so we can more easily debug build issues related to it - Add rdma-netlink infrastructure updates - Add automatic IRQ affinity infrastructure - Add 32bit lid support - Lots of misc fixes across the subsystem from random people - Autoloading of RDMA netlink modules - PCI pool cleanups from Romain Perier - mlx5 driver feature additions and fixes - Hardware tag matchine feature - Fix sleeping in atomic when resolving roce ah - Add experimental ioctl interface as posted to linux-api@" * tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits) IB/core: Expose ioctl interface through experimental Kconfig IB/core: Assign root to all drivers IB/core: Add completion queue (cq) object actions IB/core: Add legacy driver's user-data IB/core: Export ioctl enum types to user-space IB/core: Explicitly destroy an object while keeping uobject IB/core: Add macros for declaring methods and attributes IB/core: Add uverbs merge trees functionality IB/core: Add DEVICE object and root tree structure IB/core: Declare an object instead of declaring only type attributes IB/core: Add new ioctl interface RDMA/vmw_pvrdma: Fix a signedness RDMA/vmw_pvrdma: Report network header type in WC IB/core: Add might_sleep() annotation to ib_init_ah_from_wc() IB/cm: Fix sleeping in atomic when RoCE is used IB/core: Add support to finalize objects in one transaction IB/core: Add a generic way to execute an operation on a uobject Documentation: Hardware tag matching IB/mlx5: Support IB_SRQT_TM net/mlx5: Add XRQ support ...
2017-08-24net/mlx5: Add a blank line after declarations V2Saeed Mahameed1-1/+1
The blank line should be after u32 val = ... and not after __be32 __iomem *addr = ... Fixes: ad5b39a95c83 ("net/mlx5: Add a blank line after declarations") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reported-by: Joe Perches <joe@perches.com>
2017-08-20net/mlx5: Add a blank line after declarationsOr Gerlitz1-0/+1
To fix these checkpatch complaints: WARNING: Missing a blank line after declarations Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-10Merge branches '32bit_lid' and 'irq_affinity' into k.o/merge-testDoug Ledford1-7/+2
Conflicts: drivers/infiniband/hw/mlx5/main.c - Both add new code include/rdma/ib_verbs.h - Both add new code Signed-off-by: Doug Ledford <dledford@redhat.com>