aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/intel/igb/igb_main.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-11-07intel-ethernet: software timestamp skbs as late as possibleJacob Keller1-2/+2
Many of the Intel Ethernet drivers call skb_tx_timestamp() earlier than necessary. Move the calls to this function to the latest point possible, just prior to notifying hardware of the new Tx packet when we bump the tail register. This affects i40e, iavf, igb, igc, and ixgbe. The e100, e1000, e1000e, fm10k, and ice drivers already call the skb_tx_timestamp() function just prior to indicating the Tx packet to hardware, so they do not need to be changed. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-10-25Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds1-9/+0
Pull PCI updates from Bjorn Helgaas: - Fix ASPM link_state teardown on removal (Lukas Wunner) - Fix misleading _OSC ASPM message (Sinan Kaya) - Make _OSC optional for PCI (Sinan Kaya) - Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set (Patrick Talbert) - Remove x86 and arm64 node-local allocation for host bridge structures (Punit Agrawal) - Pay attention to device-specific _PXM node values (Jonathan Cameron) - Support new Immediate Readiness bit (Felipe Balbi) - Differentiate between pciehp surprise and safe removal (Lukas Wunner) - Remove unnecessary pciehp includes (Lukas Wunner) - Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner) - Tolerate PCIe Slot Presence Detect being hardwired to zero to workaround broken hardware, e.g., the Wilocity switch/wireless device (Lukas Wunner) - Unify pciehp controller & slot structs (Lukas Wunner) - Constify hotplug_slot_ops (Lukas Wunner) - Drop hotplug_slot_info (Lukas Wunner) - Embed hotplug_slot struct into users instead of allocating it separately (Lukas Wunner) - Initialize PCIe port service drivers directly instead of relying on initcall ordering (Keith Busch) - Restore PCI config state after a slot reset (Keith Busch) - Save/restore DPC config state along with other PCI config state (Keith Busch) - Reference count devices during AER handling to avoid race issue with concurrent hot removal (Keith Busch) - If an Upstream Port reports ERR_FATAL, don't try to read the Port's config space because it is probably unreachable (Keith Busch) - During error handling, use slot-specific reset instead of secondary bus reset to avoid link up/down issues on hotplug ports (Keith Busch) - Restore previous AER/DPC handling that does not remove and re-enumerate devices on ERR_FATAL (Keith Busch) - Notify all drivers that may be affected by error recovery resets (Keith Busch) - Always generate error recovery uevents, even if a driver doesn't have error callbacks (Keith Busch) - Make PCIe link active reporting detection generic (Keith Busch) - Support D3cold in PCIe hierarchies during system sleep and runtime, including hotplug and Thunderbolt ports (Mika Westerberg) - Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots are empty or occupied (Jon Derrick) - Remove duplicated include from pci/pcie/err.c and unused variable from cpqphp (YueHaibing) - Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza Pawandeep) - Uninline PCI bus accessors for better ftracing (Keith Busch) - Remove unused AER Root Port .error_resume method (Keith Busch) - Use kfifo in AER instead of a local version (Keith Busch) - Use threaded IRQ in AER bottom half (Keith Busch) - Use managed resources in AER core (Keith Busch) - Reuse pcie_port_find_device() for AER injection (Keith Busch) - Abstract AER interrupt handling to disconnect error injection (Keith Busch) - Refactor AER injection callbacks to simplify future improvments (Keith Busch) - Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski) - Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko) - Add switch fall-through annotations (Gustavo A. R. Silva) - Remove unused Switchtec quirk variable (Joshua Abraham) - Fix pci.c kernel-doc warning (Randy Dunlap) - Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig) - Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng) - Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid useless dmesg errors (Logan Gunthorpe) - Update Switchtec NTB documentation (Wesley Yung) - Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz) - Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang) - Add PCI support for peer-to-peer DMA (Logan Gunthorpe) - Add sysfs group for PCI peer-to-peer memory statistics (Logan Gunthorpe) - Add PCI peer-to-peer DMA scatterlist mapping interface (Logan Gunthorpe) - Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan Gunthorpe) - Add PCI peer-to-peer DMA driver writer's documentation (Logan Gunthorpe) - Add block layer flag to indicate driver support for PCI peer-to-peer DMA (Logan Gunthorpe) - Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P memory (Logan Gunthorpe) - Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan Gunthorpe) - Add nvme-pci support for PCI peer-to-peer memory in requests (Logan Gunthorpe) - Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise, Christoph Hellwig, Logan Gunthorpe) - Cache VF config space size to optimize enumeration of many VFs (KarimAllah Ahmed) - Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas) - Fix VMD AERSID quirk Device ID matching (Jon Derrick) - Fix Cadence PHY handling during probe (Alan Douglas) - Signal Cadence Endpoint interrupts via AXI region 0 instead of last region (Alan Douglas) - Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan Douglas) - Remove redundant controller tests for "device_type == pci" (Rob Herring) - Document R-Car E3 (R8A77990) bindings (Tho Vu) - Add device tree support for R-Car r8a7744 (Biju Das) - Drop unused mvebu PCIe capability code (Thomas Petazzoni) - Add shared PCI bridge emulation code (Thomas Petazzoni) - Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni) - Add aardvark Root Port emulation (Thomas Petazzoni) - Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach) - Add initial power management for i.MX7 (Leonard Crestez) - Add PME_Turn_Off support for i.MX7 (Leonard Crestez) - Fix qcom runtime power management error handling (Bjorn Andersson) - Update TI dra7xx unaligned access errata workaround for host mode as well as endpoint mode (Vignesh R) - Fix kirin section mismatch warning (Nathan Chancellor) - Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare) - Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I) - Update Keystone to use MRRS quirk for host bridge instead of open coding (Kishon Vijay Abraham I) - Refactor Keystone link establishment (Kishon Vijay Abraham I) - Simplify and speed up Keystone link training (Kishon Vijay Abraham I) - Remove unused Keystone host_init argument (Kishon Vijay Abraham I) - Merge Keystone driver files into one (Kishon Vijay Abraham I) - Remove redundant Keystone platform_set_drvdata() (Kishon Vijay Abraham I) - Rename Keystone functions for uniformity (Kishon Vijay Abraham I) - Add Keystone device control module DT binding (Kishon Vijay Abraham I) - Use SYSCON API to get Keystone control module device IDs (Kishon Vijay Abraham I) - Clean up Keystone PHY handling (Kishon Vijay Abraham I) - Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I) - Clean up Keystone config space access checks (Kishon Vijay Abraham I) - Get Keystone outbound window count from DT (Kishon Vijay Abraham I) - Clean up Keystone outbound window configuration (Kishon Vijay Abraham I) - Clean up Keystone DBI setup (Kishon Vijay Abraham I) - Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I) - Fix Keystone IRQ status checking (Kishon Vijay Abraham I) - Add debug messages for all Keystone errors (Kishon Vijay Abraham I) - Clean up Keystone includes and macros (Kishon Vijay Abraham I) - Fix Mediatek unchecked return value from devm_pci_remap_iospace() (Gustavo A. R. Silva) - Fix Mediatek endpoint/port matching logic (Honghui Zhang) - Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui Zhang) - Remove redundant Mediatek PM domain check (Honghui Zhang) - Convert Mediatek to pci_host_probe() (Honghui Zhang) - Fix Mediatek MSI enablement (Honghui Zhang) - Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang) - Add Mediatek loadable module support (Honghui Zhang) - Detach VMD resources after stopping root bus to prevent orphan resources (Jon Derrick) - Convert pcitest build process to that used by other tools (iio, perf, etc) (Gustavo Pimentel) * tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits) PCI/AER: Refactor error injection fallbacks PCI/AER: Abstract AER interrupt handling PCI/AER: Reuse existing pcie_port_find_device() interface PCI/AER: Use managed resource allocations PCI: pcie: Remove redundant 'default n' from Kconfig PCI: aardvark: Implement emulated root PCI bridge config space PCI: mvebu: Convert to PCI emulated bridge config space PCI: mvebu: Drop unused PCI express capability code PCI: Introduce PCI bridge emulated config space common logic PCI: vmd: Detach resources after stopping root bus nvmet: Optionally use PCI P2P memory nvmet: Introduce helper functions to allocate and free request SGLs nvme-pci: Add support for P2P memory in requests nvme-pci: Use PCI p2pmem subsystem to manage the CMB IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]() block: Add PCI P2P flag for request queue PCI/P2PDMA: Add P2P DMA driver writer's documentation docs-rst: Add a new directory for PCI documentation PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset ...
2018-10-02PCI/AER: Remove pci_cleanup_aer_uncorrect_error_status() callsOza Pawandeep1-9/+0
After bfcb79fca19d ("PCI/ERR: Run error recovery callbacks for all affected devices"), AER errors are always cleared by the PCI core and drivers don't need to do it themselves. Remove calls to pci_cleanup_aer_uncorrect_error_status() from device driver error recovery functions. Signed-off-by: Oza Pawandeep <poza@codeaurora.org> [bhelgaas: changelog, remove PCI core changes, remove unused variables] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-09-25Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-30/+0
Version bump conflict in batman-adv, take what's in net-next. iavf conflict, adjustment of netdev_ops in net-next conflicting with poll controller method removal in net. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23igb: remove ndo_poll_controllerEric Dumazet1-30/+0
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for unlimited amount of time, since one cpu is generally not able to drain all the queues under load. igb uses NAPI for TX completions, so we better let core networking stack call the napi->poll() to avoid the capture. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-18intel-ethernet: use correct module licenseJesse Brandeburg1-1/+1
We recently updated all our SPDX identifiers to correctly indicate our net/ethernet/intel/* drivers were always released and intended to be released under GPL v2, but the MODULE_LICENSE declaration was never updated. Fix the MODULE_LICENSE to be GPL v2, for all our drivers. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-24igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()Jia-Ju Bai1-2/+2
igb_sw_init() is never called in atomic context. It calls kzalloc() and kcalloc() with GFP_ATOMIC, which is not necessary. GFP_ATOMIC can be replaced with GFP_KERNEL. This is found by a static analysis tool named DCNS written by myself. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-24igb: Use an advanced ctx descriptor for launchtimeJesus Sanchez-Palencia1-1/+2
On i210, Launchtime (TxTime) requires the usage of an "Advanced Transmit Context Descriptor" for retrieving the timestamp of a packet. The igb driver correctly builds such descriptor on the segmentation flow (i.e. igb_tso()) or on the checksum one (i.e. igb_tx_csum()), but the feature is broken for AF_PACKET if the IGB_TX_FLAGS_VLAN is not set, which happens due to an early return on igb_tx_csum(). This flag is only set by the kernel when a VLAN interface is used, thus we can't just rely on it. Here we are fixing this issue by checking if launchtime is enabled for the current tx_ring before performing the early return. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-08-16Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds1-1/+0
Pull pci updates from Bjorn Helgaas: - Decode AER errors with names similar to "lspci" (Tyler Baicar) - Expose AER statistics in sysfs (Rajat Jain) - Clear AER status bits selectively based on the type of recovery (Oza Pawandeep) - Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru Gagniuc) - Don't clear AER status bits if we're using the "Firmware-First" strategy where firmware owns the registers (Alexandru Gagniuc) - Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy Shevchenko) - Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas) - Defer DPC event handling to work queue (Keith Busch) - Use threaded IRQ for DPC bottom half (Keith Busch) - Print AER status while handling DPC events (Keith Busch) - Work around IDT switch ACS Source Validation erratum (James Puthukattukaran) - Emit diagnostics for all cases of PCIe Link downtraining (Links operating slower than they're capable of) (Alexandru Gagniuc) - Skip VFs when configuring Max Payload Size (Myron Stowe) - Reduce Root Port Max Payload Size if necessary when hot-adding a device below it (Myron Stowe) - Simplify SHPC existence/permission checks (Bjorn Helgaas) - Remove hotplug sample skeleton driver (Lukas Wunner) - Convert pciehp to threaded IRQ handling (Lukas Wunner) - Improve pciehp tolerance of missed events and initially unstable links (Lukas Wunner) - Clear spurious pciehp events on resume (Lukas Wunner) - Add pciehp runtime PM support, including for Thunderbolt controllers (Lukas Wunner) - Support interrupts from pciehp bridges in D3hot (Lukas Wunner) - Mark fall-through switch cases before enabling -Wimplicit-fallthrough (Gustavo A. R. Silva) - Move DMA-debug PCI init from arch code to PCI core (Christoph Hellwig) - Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is supplied (Heiner Kallweit) - Unify PCI and DMA direction #defines (Shunyong Yang) - Add PCI_DEVICE_DATA() macro (Andy Shevchenko) - Check for VPD completion before checking for timeout (Bert Kenward) - Limit Netronome NFP5000 config space size to work around erratum (Jakub Kicinski) - Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit) - Document ACPI description of PCI host bridges (Bjorn Helgaas) - Add "pci=disable_acs_redir=" parameter to disable ACS redirection for peer-to-peer DMA support (we don't have the peer-to-peer support yet; this is just one piece) (Logan Gunthorpe) - Clean up devm_of_pci_get_host_bridge_resources() resource allocation (Jan Kiszka) - Fixup resizable BARs after suspend/resume (Christian K├Ânig) - Make "pci=earlydump" generic (Sinan Kaya) - Fix ROM BAR access routines to stay in bounds and check for signature correctly (Rex Zhu) - Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer) - Expand documentation for pci_add_dma_alias() (Logan Gunthorpe) - To avoid bus errors, enable PASID only if entire path supports End-End TLP prefixes (Sinan Kaya) - Unify slot and bus reset functions and remove hotplug knowledge from callers (Sinan Kaya) - Add Function-Level Reset quirks for Intel and Samsung NVMe devices to fix guest reboot issues (Alex Williamson) - Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD Controller (Bjorn Helgaas) - Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt) - Remove Aardvark outbound window configuration (Evan Wang) - Fix Aardvark bridge window sizing issue (Zachary Zhang) - Convert Aardvark to use pci_host_probe() to reduce code duplication (Thomas Petazzoni) - Correct the Cadence cdns_pcie_writel() signature (Alan Douglas) - Add Cadence support for optional generic PHYs (Alan Douglas) - Add Cadence power management ops (Alan Douglas) - Remove redundant variable from Cadence driver (Colin Ian King) - Add Kirin MSI support (Xiaowei Song) - Drop unnecessary root_bus_nr setting from exynos, imx6, keystone, armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn Guo) - Move link notification settings from DesignWare core to individual drivers (Gustavo Pimentel) - Add endpoint library MSI-X interfaces (Gustavo Pimentel) - Correct signature of endpoint library IRQ interfaces (Gustavo Pimentel) - Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel) - Add endpoint library MSI-X test support (Gustavo Pimentel) - Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation (Jia-Ju Bai) - Add more devices to Broadcom PAXC quirk (Ray Jui) - Work around corrupted Broadcom PAXC config space to enable SMMU and GICv3 ITS (Ray Jui) - Disable MSI parsing to work around broken Broadcom PAXC logic in some devices (Ray Jui) - Hide unconfigured functions to work around a Broadcom PAXC defect (Ray Jui) - Lower iproc log level to reduce console output during boot (Ray Jui) - Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi) - Fix mobiveil missing include file (Lorenzo Pieralisi) - Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi) - Fix mvebu I/O space remapping issues (Thomas Petazzoni) - Use generic pci_host_bridge in mvebu instead of ARM-specific API (Thomas Petazzoni) - Whitelist VMD devices with fast interrupt handlers to avoid sharing vectors with slow handlers (Keith Busch) * tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits) PCI/AER: Don't clear AER bits if error handling is Firmware-First PCI: Limit config space size for Netronome NFP5000 PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips PCI/VPD: Check for VPD access completion before checking for timeout PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry PCI: Match Root Port's MPS to endpoint's MPSS as necessary PCI: Skip MPS logic for Virtual Functions (VFs) PCI: Add function 1 DMA alias quirk for Marvell 88SS9183 PCI: Check for PCIe Link downtraining PCI: Add ACS Redirect disable quirk for Intel Sunrise Point PCI: Add device-specific ACS Redirect disable infrastructure PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support PCI: Allow specifying devices using a base bus and path of devfns PCI: Make specifying PCI devices in kernel parameters reusable PCI: Hide ACS quirk declarations inside PCI core PCI: Delay after FLR of Intel DC P3700 NVMe PCI: Disable Samsung SM961/PM961 NVMe before FLR PCI: Export pcie_has_flr() PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers() ...
2018-08-07igb_main: Mark expected switch fall-throughsGustavo A. R. Silva1-0/+2
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 200521 ("Missing break in switch") Addresses-Coverity-ID: 114797 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-06igb: Remove unnecessary include of <linux/pci-aspm.h>Bjorn Helgaas1-1/+0
The igb driver doesn't need anything provided by pci-aspm.h, so remove the unnecessary include of it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Sinan Kaya <okaya@kernel.org> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-26igb: Use dma_wmb() instead of wmb() before doorbell writesVenkatesh Srinivas1-2/+2
igb writes to doorbells to post transmit and receive descriptors; after writing descriptors to memory but before writing to doorbells, use dma_wmb() rather than wmb(). wmb() is more heavyweight than necessary before doorbell writes. On x86, this avoids SFENCEs before doorbell writes in both the tx and rx refill paths. Signed-off-by: Venkatesh Srinivas <venkateshs@google.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-04igb: Add support for ETF offloadJesus Sanchez-Palencia1-17/+121
Implement HW offload support for SO_TXTIME through igb's Launchtime feature. This is done by extending igb_setup_tc() so it supports TC_SETUP_QDISC_ETF and configuring i210 so time based transmit arbitration is enabled. The FQTSS transmission mode added before is extended so strict priority (SP) queues wait for stream reservation (SR) ones. igb_config_tx_modes() is extended so it can support enabling/disabling Launchtime following the previous approach used for the credit-based shaper (CBS). As the previous flow, FQTSS transmission mode is enabled automatically by the driver once Launchtime (or CBS, as before) is enabled. Similarly, it's automatically disabled when the feature is disabled for the last queue that had it setup on. The driver just consumes the transmit times from the skbuffs directly, so no special handling is done in case an 'invalid' time is provided. We assume this has been handled by the ETF qdisc already. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Only call skb_tx_timestamp after descriptors are readyJesus Sanchez-Palencia1-2/+2
Currently, skb_tx_timestamp() is being called before the Tx descriptors are prepared in igb_xmit_frame_ring(), which happens during either the igb_tso() or igb_tx_csum() calls. Given that now the skb->tstamp might be used to carry the timestamp for SO_TXTIME, we must only call skb_tx_timestamp() after the information has been copied into the Tx descriptors. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Refactor igb_offload_cbs()Jesus Sanchez-Palencia1-9/+14
Split code into a separate function (igb_offload_apply()) that will be used by ETF offload implementation. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Only change Tx arbitration when CBS is onJesus Sanchez-Palencia1-16/+33
Currently the data transmission arbitration algorithm - DataTranARB field on TQAVCTRL reg - is always set to CBS when the Tx mode is changed from legacy to 'Qav' mode. Make that configuration a bit more granular in preparation for the upcoming Launchtime enabling patches, since CBS and Launchtime can be enabled separately. That is achieved by moving the DataTranARB setup to igb_config_tx_modes() instead. Similarly, when disabling CBS we must check if it has been disabled for all queues, and clear the DataTranARB accordingly. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Refactor igb_configure_cbs()Jesus Sanchez-Palencia1-32/+28
Make this function retrieve what it needs from the Tx ring being addressed since it already relies on what had been saved on it before. Also, since this function will be used by the upcoming Launchtime patches rename it to better reflect its intention. Note that Launchtime is not part of what 802.1Qav specifies, but the i210 datasheet refers to this set of functionality as "Qav Transmission Mode". Here we also perform a tiny refactor at is_any_cbs_enabled(), and add further documentation to igb_setup_tx_mode(). Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26net: sched: pass extack pointer to block binds and cb registrationJohn Hurley1-1/+1
Pass the extact struct from a tc qdisc add to the block bind function and, in turn, to the setup_tc ndo of binding device via the tc_block_offload struct. Pass this back to any block callback registrations to allow netlink logging of fails in the bind process. Signed-off-by: John Hurley <john.hurley@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-25cls_flower: fix error values for commands not supported by driversJiri Pirko1-1/+1
-EOPNOTSUPP is the error value that should be reported if a flower command is not supported by a driver. Fix it in couple of Intel drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12treewide: kzalloc() -> kcalloc()Kees Cook1-3/+4
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-04igb: Wait 10ms just once after TX queues resetSergey Nemov1-5/+8
Move 10ms sleep out of function resetting TX queue. Reset all the TX queues in one turn and wait for all of them just once. Use usleep_range() instead of mdelay() in order not to affect transmission on other interfaces. Signed-off-by: Sergey Nemov <sergey.nemov@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04igb: Clear TSICR interrupts together with ICRJoanna Yurdal1-0/+2
Issuing "ip link set up/down" can block TSICR interrupts, what results in missing PTP Tx timestamp and no PPS pulse generation. Problem happens when the link is set up with the TSICR interrupts pending. ICR is cleared before enabling interrupts, while TSICR is not. When all TSICR interrupts are pending at this moment, time_sync interrupt will never be generated. TSICR should be cleared as well. In order to reproduce the issue: 1. Setup linux with IEEE 1588 grandmaster and PPS output enabled 2. Continue setting link up/down with random intervals between commands 3. Wait until PPS is not generated ( only one pulse is generated and PPS dies), and ptp4l complains constantly about Tx timeout. Signed-off-by: Joanna Yurdal <jyu@trackman.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-27net: intel: Cleanup the copyright/license headersJeff Kirsher1-22/+1
After many years of having a ~30 line copyright and license header to our source files, we are finally able to reduce that to one line with the advent of the SPDX identifier. Also caught a few files missing the SPDX license identifier, so fixed them up. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-25igb: Add support for adding offloaded clsflower filtersVinicius Costa Gomes1-2/+186
This allows filters added by tc-flower and specifying MAC addresses, Ethernet types, and the VLAN priority field, to be offloaded to the controller. This reuses most of the infrastructure used by ethtool, but clsflower filters are kept in a separated list, so they are invisible to ethtool. To setup clsflower offloading: $ tc qdisc replace dev eth0 handle 100: parent root mqprio \ num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 2@2 hw 0 (clsflower offloading depends on the netword driver to be configured with multiple traffic classes, we use mqprio's 'num_tc' parameter to set it to 3) $ tc qdisc add dev eth0 ingress Examples of filters: $ tc filter add dev eth0 parent ffff: flower \ dst_mac aa:aa:aa:aa:aa:aa \ hw_tc 2 skip_sw (just a simple filter filtering for the destination MAC address and steering that traffic to queue 2) $ tc filter add dev enp2s0 parent ffff: proto 0x22f0 flower \ src_mac cc:cc:cc:cc:cc:cc \ hw_tc 1 skip_sw (as the i210 doesn't support steering traffic based on the source address alone, we need to use another steering traffic, in this case we are using the ethernet type (0x22f0) to steer traffic to queue 1) Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Add the skeletons for tc-flower offloadingVinicius Costa Gomes1-0/+66
This adds basic functions needed to implement offloading for filters created by tc-flower. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Allow filters to be added for the local MAC addressVinicius Costa Gomes1-4/+36
Users expect that when adding a steering filter for the local MAC address, that all the traffic directed to that address will go to some queue. Currently, it's not possible to configure entries in the "in use" state, which is the normal state of the local MAC address entry (it is the default), this patch allows to override the steering configuration of "in use" entries, if the filter to be added match the address and address type (source or destination) of an existing entry. There is a bit of a special handling for entries referring to the local MAC address, when they are removed, only the steering configuration is reset. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Add support for enabling queue steering in filtersVinicius Costa Gomes1-0/+26
On some igb models (82575 and i210) the MAC address filters can control to which queue the packet will be assigned. This extends the 'state' with one more state to signify that queue selection should be enabled for that filter. As 82575 parts are no longer easily obtained (and this was developed against i210), only support for the i210 model is enabled. These functions are exported and will be used in the next patch. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Add support for MAC address filters specifying source addressesVinicius Costa Gomes1-5/+35
Makes it possible to direct packets to queues based on their source address. Documents the expected usage of the 'flags' parameter. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Enable the hardware traffic class feature bit for igb modelsVinicius Costa Gomes1-0/+3
This will allow functionality depending on the hardware being traffic class aware to work. In particular the tc-flower offloading checks verifies that this bit is set. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Fix queue selection on MAC filters on i210Vinicius Costa Gomes1-2/+7
On the RAH registers there are semantic differences on the meaning of the "queue" parameter for traffic steering depending on the controller model: there is the 82575 meaning, which "queue" means a RX Hardware Queue, and the i350 meaning, where it is a reception pool. The previous behaviour was having no effect for i210 based controllers because the QSEL bit of the RAH register wasn't being set. This patch separates the condition in discrete cases, so the different handling is clearer. Fixes: 83c21335c876 ("igb: improve MAC filter handling") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24igb: Fix the transmission mode of queue 0 for Qav modeVinicius Costa Gomes1-1/+16
When Qav mode is enabled, queue 0 should be kept on Stream Reservation mode. From the i210 datasheet, section 8.12.19: "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to Qav." ("QueueMode 1b" represents the Stream Reservation mode) The solution is to give queue 0 the all the credits it might need, so it has priority over queue 1. A situation where this can happen is when cbs is "installed" only on queue 1, leaving queue 0 alone. For example: $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \ hicredit 30 sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-23intel: add SPDX identifiers to all the Intel driversJeff Kirsher1-0/+1
Add the SPDX identifiers to all the Intel wired LAN driver files, as outlined in Documentation/process/license-rules.rst. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05igb: Fix a test with HWTSTAMP_TX_ONChristophe JAILLET1-1/+1
'HWTSTAMP_TX_ON' should be handled as a value, not as a bit mask. The modified code should behave the same, because HWTSTAMP_TX_ON is 1 and no other possible values of 'tx_type' would match the test. However, this is more future-proof, should other values be allowed one day. See 'struct hwtstamp_config' in 'include/uapi/linux/net_tstamp.h' This fixes a warning reported by smatch: igb_xmit_frame_ring() warn: bit shifter 'HWTSTAMP_TX_ON' used for logical '&' Fixes: 26bd4e2db06be ("igb: protect TX timestamping from API misuse") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05igb: Do not call netif_device_detach() when PCIe link goes missingMika Westerberg1-2/+1
When the driver notices that PCIe link is gone by reading 0xffffffff from a register it clears hw->hw_addr and then calls netif_device_detach(). This happens when the PCIe device is physically unplugged for example the user disconnected the Thunderbolt cable. However, netif_device_detach() prevents netif_unregister() from bringing the device down properly including tearing down MSI-X vectors. This triggers following crash during the driver removal: igb 0000:0b:00.0 enp11s0f0: PCIe link lost, device now detached ------------[ cut here ]------------ kernel BUG at drivers/pci/msi.c:352! invalid opcode: 0000 [#1] PREEMPT SMP PTI ... Call Trace: pci_disable_msix+0xc9/0xf0 igb_reset_interrupt_capability+0x58/0x60 [igb] igb_remove+0x90/0x100 [igb] pci_device_remove+0x31/0xa0 device_release_driver_internal+0x152/0x210 pci_stop_bus_device+0x78/0xa0 pci_stop_bus_device+0x38/0xa0 pci_stop_bus_device+0x38/0xa0 pci_stop_bus_device+0x26/0xa0 pci_stop_bus_device+0x38/0xa0 pci_stop_and_remove_bus_device+0x9/0x20 trim_stale_devices+0xee/0x130 ? _raw_spin_unlock_irqrestore+0xf/0x30 trim_stale_devices+0x8f/0x130 ? _raw_spin_unlock_irqrestore+0xf/0x30 trim_stale_devices+0xa1/0x130 ? get_slot_status+0x8b/0xc0 acpiphp_check_bridge.part.7+0xf9/0x140 acpiphp_hotplug_notify+0x170/0x1f0 ... To prevent the crash do not call netif_device_detach() in igb_rd32(). This should be fine because hw->hw_addr is set to NULL preventing future hardware access of the now missing device. Link: https://bugzilla.kernel.org/show_bug.cgi?id=198181 Reported-by: Ferenc Boldog <ferenc.boldog@gmail.com> Reported-by: Nikolay Bogoychev <nheart@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05igb: add VF trust infrastructureCorinna Vinschen1-3/+27
* Add a per-VF value to know if a VF is trusted, by default don't trust VFs. * Implement netdev op to trust VFs (igb_ndo_set_vf_trust) and add trust status to ndo_get_vf_config output. * Allow a trusted VF to change MAC and MAC filters even if MAC has been administratively set. Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Delete an error message for a failed memory allocation in igb_enable_sriov()Markus Elfring1-2/+0
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Free IRQs when device is hotpluggedLyude Paul1-1/+1
Recently I got a Caldigit TS3 Thunderbolt 3 dock, and noticed that upon hotplugging my kernel would immediately crash due to igb: [ 680.825801] kernel BUG at drivers/pci/msi.c:352! [ 680.828388] invalid opcode: 0000 [#1] SMP [ 680.829194] Modules linked in: igb(O) thunderbolt i2c_algo_bit joydev vfat fat btusb btrtl btbcm btintel bluetooth ecdh_generic hp_wmi sparse_keymap rfkill wmi_bmof iTCO_wdt intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul snd_pcm rtsx_pci_ms mei_me snd_timer memstick snd pcspkr mei soundcore i2c_i801 tpm_tis psmouse shpchp wmi tpm_tis_core tpm video hp_wireless acpi_pad rtsx_pci_sdmmc mmc_core crc32c_intel serio_raw rtsx_pci mfd_core xhci_pci xhci_hcd i2c_hid i2c_core [last unloaded: igb] [ 680.831085] CPU: 1 PID: 78 Comm: kworker/u16:1 Tainted: G O 4.15.0-rc3Lyude-Test+ #6 [ 680.831596] Hardware name: HP HP ZBook Studio G4/826B, BIOS P71 Ver. 01.03 06/09/2017 [ 680.832168] Workqueue: kacpi_hotplug acpi_hotplug_work_fn [ 680.832687] RIP: 0010:free_msi_irqs+0x180/0x1b0 [ 680.833271] RSP: 0018:ffffc9000030fbf0 EFLAGS: 00010286 [ 680.833761] RAX: ffff8803405f9c00 RBX: ffff88033e3d2e40 RCX: 000000000000002c [ 680.834278] RDX: 0000000000000000 RSI: 00000000000000ac RDI: ffff880340be2178 [ 680.834832] RBP: 0000000000000000 R08: ffff880340be1ff0 R09: ffff8803405f9c00 [ 680.835342] R10: 0000000000000000 R11: 0000000000000040 R12: ffff88033d63a298 [ 680.835822] R13: ffff88033d63a000 R14: 0000000000000060 R15: ffff880341959000 [ 680.836332] FS: 0000000000000000(0000) GS:ffff88034f440000(0000) knlGS:0000000000000000 [ 680.836817] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 680.837360] CR2: 000055e64044afdf CR3: 0000000001c09002 CR4: 00000000003606e0 [ 680.837954] Call Trace: [ 680.838853] pci_disable_msix+0xce/0xf0 [ 680.839616] igb_reset_interrupt_capability+0x5d/0x60 [igb] [ 680.840278] igb_remove+0x9d/0x110 [igb] [ 680.840764] pci_device_remove+0x36/0xb0 [ 680.841279] device_release_driver_internal+0x157/0x220 [ 680.841739] pci_stop_bus_device+0x7d/0xa0 [ 680.842255] pci_stop_bus_device+0x2b/0xa0 [ 680.842722] pci_stop_bus_device+0x3d/0xa0 [ 680.843189] pci_stop_and_remove_bus_device+0xe/0x20 [ 680.843627] trim_stale_devices+0xf3/0x140 [ 680.844086] trim_stale_devices+0x94/0x140 [ 680.844532] trim_stale_devices+0xa6/0x140 [ 680.845031] ? get_slot_status+0x90/0xc0 [ 680.845536] acpiphp_check_bridge.part.5+0xfe/0x140 [ 680.846021] acpiphp_hotplug_notify+0x175/0x200 [ 680.846581] ? free_bridge+0x100/0x100 [ 680.847113] acpi_device_hotplug+0x8a/0x490 [ 680.847535] acpi_hotplug_work_fn+0x1a/0x30 [ 680.848076] process_one_work+0x182/0x3a0 [ 680.848543] worker_thread+0x2e/0x380 [ 680.848963] ? process_one_work+0x3a0/0x3a0 [ 680.849373] kthread+0x111/0x130 [ 680.849776] ? kthread_create_worker_on_cpu+0x50/0x50 [ 680.850188] ret_from_fork+0x1f/0x30 [ 680.850601] Code: 43 14 85 c0 0f 84 d5 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 14 0f 86 c5 fe ff ff 8b 7b 10 01 ef e8 b7 e4 d2 ff 48 83 78 70 00 74 e3 <0f> 0b 49 8d b5 a0 00 00 00 e8 62 6f d3 ff e9 c7 fe ff ff 48 8b [ 680.851497] RIP: free_msi_irqs+0x180/0x1b0 RSP: ffffc9000030fbf0 As it turns out, normally the freeing of IRQs that would fix this is called inside of the scope of __igb_close(). However, since the device is already gone by the point we try to unregister the netdevice from the driver due to a hotplug we end up seeing that the netif isn't present and thus, forget to free any of the device IRQs. So: make sure that if we're in the process of dismantling the netdev, we always allow __igb_close() to be called so that IRQs may be freed normally. Additionally, only allow igb_close() to be called from __igb_close() if it hasn't already been called for the given adapter. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") Cc: Todd Fujinaka <todd.fujinaka@intel.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: stable@vger.kernel.org Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Clarify idleslope config constraintsJesus Sanchez-Palencia1-0/+14
By design, the idleslope increments are restricted to 16.384kbps steps. Add a comment to igb_main.c making that explicit and add one example that illustrates the impact of that. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: add function to get maximum RSS queuesZhang Shengju1-2/+10
This patch adds a new function igb_get_max_rss_queues() to get maximum RSS queues, this will reduce duplicate code and facilitate future maintenance. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Allow to remove administratively set MAC on VFsCorinna Vinschen1-11/+31
Before libvirt modifies the MAC address and vlan tag for an SRIOV VF for use by a virtual machine (either using vfio device assignment or macvtap passthru mode), it saves the current MAC address and vlan tag so that it can reset them to their original value when the guest is done. Libvirt can't leave the VF MAC set to the value used by the now-defunct guest since it may be started again later using a different VF, but it certainly shouldn't just pick any random value, either. So it saves the state of everything prior to using the VF, and resets it to that. The igb driver initializes the MAC addresses of all VFs to 00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK netlink message, also visible in the list of VFs in the output of "ip link show"). But when libvirt attempts to restore the MAC address back to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel responds with "Invalid argument". Forbidding a reset back to the original value leaves the VF MAC at the value set for the now-defunct virtual machine. Especially on a system with NetworkManager enabled, this has very bad consequences, since NetworkManager forces all interfacess to be IFF_UP all the time - if the same virtual machine is restarted using a different VF (or even on a different host), there will be multiple interfaces watching for traffic with the same MAC address. To allow libvirt to revert to the original state, we need a way to remove the administrative set MAC on a VF, to allow normal host operation again, and to reset/overwrite the VF MAC via VF netdev. This patch implements the outlined scenario by allowing to set the VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF. igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0, so it's possible to reset the VF MAC back to the original value via the VF netdev. Note: Recent patches to libvirt allow for a workaround if the NIC isn't capable of resetting the administrative MAC back to all 0, but in theory the NIC should allow resetting the MAC in the first place. Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Tested-by: Aaron Brown <arron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-21igb: Use smp_rmb rather than read_barrier_dependsBrian King1-1/+1
The original issue being fixed in this patch was seen with the ixgbe driver, but the same issue exists with igb as well, as the code is very similar. read_barrier_depends is not sufficient to ensure loads following it are not speculatively loaded out of order by the CPU, which can result in stale data being loaded, causing potential system crashes. Cc: stable <stable@vger.kernel.org> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-10/+357
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-11-08net_sch: cbs: Change TC_SETUP_CBS to TC_SETUP_QDISC_CBSNogah Frankel1-1/+1
Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS to match the new convention.. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar1-1/+1
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Several conflicts here. NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to nfp_fl_output() needed some adjustments because the code block is in an else block now. Parallel additions to net/pkt_cls.h and net/sch_generic.h A bug fix in __tcp_retransmit_skb() conflicted with some of the rbtree changes in net-next. The tc action RCU callback fixes in 'net' had some overlap with some of the recent tcf_block reworking. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-27igb: Add support for CBS offloadAndre Guedes1-0/+347
This patch adds support for Credit-Based Shaper (CBS) qdisc offload from Traffic Control system. This support enable us to leverage the Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav standard which was merged into 802.1Q in 2014. It enables traffic prioritization and bandwidth reservation via the Credit-Based Shaper which is implemented in hardware by i210 controller. The patch introduces the igb_setup_tc() function which implements the support for CBS qdisc hardware offload in the IGB driver. CBS offload is the only traffic control offload supported by the driver at the moment. FQTSS transmission mode from i210 controller is automatically enabled by the IGB driver when the CBS is enabled for the first hardware queue. Likewise, FQTSS mode is automatically disabled when CBS is disabled for the last hardware queue. Changing FQTSS mode requires NIC reset. FQTSS feature is supported by i210 controller only. Signed-off-by: Andre Guedes <andre.guedes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Henrik Austad <henrik@austad.us> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-26igb: Fix TX map failure pathJean-Philippe Brucker1-1/+1
When the driver cannot map a TX buffer, instead of rolling back gracefully and retrying later, we currently get a panic: [ 159.885994] igb 0000:00:00.0: TX DMA map failed [ 159.886588] Unable to handle kernel paging request at virtual address ffff00000a08c7a8 ... [ 159.897031] PC is at igb_xmit_frame_ring+0x9c8/0xcb8 Fix the erroneous test that leads to this situation. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-25locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()Mark Rutland1-1/+1
Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18ethernet/intel: Convert timers to use timer_setup()Kees Cook1-10/+8
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Switches test of .data field to .function, since .data will be going away. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10igb: check memory allocation failureChristophe JAILLET1-0/+2
Check memory allocation failures and return -ENOMEM in such cases, as already done for other memory allocations in this function. This avoids NULL pointers dereference. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Aaron Brown <aaron.f.brown@intel.com Acked-by: PJ Waskiewicz <peter.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>