path: root/drivers/net/ethernet/intel/igb (follow)
AgeCommit message (Collapse)AuthorFilesLines
2018-04-25igb: Add support for adding offloaded clsflower filtersVinicius Costa Gomes2-2/+188
This allows filters added by tc-flower and specifying MAC addresses, Ethernet types, and the VLAN priority field, to be offloaded to the controller. This reuses most of the infrastructure used by ethtool, but clsflower filters are kept in a separated list, so they are invisible to ethtool. To setup clsflower offloading: $ tc qdisc replace dev eth0 handle 100: parent root mqprio \ num_tc 3 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 2@2 hw 0 (clsflower offloading depends on the netword driver to be configured with multiple traffic classes, we use mqprio's 'num_tc' parameter to set it to 3) $ tc qdisc add dev eth0 ingress Examples of filters: $ tc filter add dev eth0 parent ffff: flower \ dst_mac aa:aa:aa:aa:aa:aa \ hw_tc 2 skip_sw (just a simple filter filtering for the destination MAC address and steering that traffic to queue 2) $ tc filter add dev enp2s0 parent ffff: proto 0x22f0 flower \ src_mac cc:cc:cc:cc:cc:cc \ hw_tc 1 skip_sw (as the i210 doesn't support steering traffic based on the source address alone, we need to use another steering traffic, in this case we are using the ethernet type (0x22f0) to steer traffic to queue 1) Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Add the skeletons for tc-flower offloadingVinicius Costa Gomes1-0/+66
This adds basic functions needed to implement offloading for filters created by tc-flower. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Add MAC address support for ethtool nftuple filtersVinicius Costa Gomes1-4/+39
This adds the capability of configuring the queue steering of arriving packets based on their source and destination MAC addresses. Source address steering (i.e. driving traffic to a specific queue), for the i210, does not work, but filtering does (i.e. accepting traffic based on the source address). So, trying to add a filter specifying only a source address will be an error. In practical terms this adds support for the following use cases, characterized by these examples: $ ethtool -N eth0 flow-type ether dst aa:aa:aa:aa:aa:aa action 0 (this will direct packets with destination address "aa:aa:aa:aa:aa:aa" to the RX queue 0) $ ethtool -N eth0 flow-type ether src 44:44:44:44:44:44 \ proto 0x22f0 action 3 (this will direct packets with source address "44:44:44:44:44:44" and ethertype 0x22f0 to the RX queue 3) Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Enable nfc filters to specify MAC addressesVinicius Costa Gomes2-0/+32
This allows igb_add_filter()/igb_erase_filter() to work on filters that include MAC addresses (both source and destination). For now, this only exposes the functionality, the next commit glues ethtool into this. Later in this series, these APIs are used to allow offloading of cls_flower filters. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Allow filters to be added for the local MAC addressVinicius Costa Gomes1-4/+36
Users expect that when adding a steering filter for the local MAC address, that all the traffic directed to that address will go to some queue. Currently, it's not possible to configure entries in the "in use" state, which is the normal state of the local MAC address entry (it is the default), this patch allows to override the steering configuration of "in use" entries, if the filter to be added match the address and address type (source or destination) of an existing entry. There is a bit of a special handling for entries referring to the local MAC address, when they are removed, only the steering configuration is reset. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Add support for enabling queue steering in filtersVinicius Costa Gomes3-0/+33
On some igb models (82575 and i210) the MAC address filters can control to which queue the packet will be assigned. This extends the 'state' with one more state to signify that queue selection should be enabled for that filter. As 82575 parts are no longer easily obtained (and this was developed against i210), only support for the i210 model is enabled. These functions are exported and will be used in the next patch. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Add support for MAC address filters specifying source addressesVinicius Costa Gomes3-5/+37
Makes it possible to direct packets to queues based on their source address. Documents the expected usage of the 'flags' parameter. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Enable the hardware traffic class feature bit for igb modelsVinicius Costa Gomes1-0/+3
This will allow functionality depending on the hardware being traffic class aware to work. In particular the tc-flower offloading checks verifies that this bit is set. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Fix queue selection on MAC filters on i210Vinicius Costa Gomes1-2/+7
On the RAH registers there are semantic differences on the meaning of the "queue" parameter for traffic steering depending on the controller model: there is the 82575 meaning, which "queue" means a RX Hardware Queue, and the i350 meaning, where it is a reception pool. The previous behaviour was having no effect for i210 based controllers because the QSEL bit of the RAH register wasn't being set. This patch separates the condition in discrete cases, so the different handling is clearer. Fixes: 83c21335c876 ("igb: improve MAC filter handling") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25igb: Fix not adding filter elements to the listVinicius Costa Gomes1-1/+1
Because the order of the parameters passes to 'hlist_add_behind()' was inverted, the 'parent' node was added "behind" the 'input', as input is not in the list, this causes the 'input' node to be lost. Fixes: 0e71def25281 ("igb: add support of RX network flow classification") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24igb: Fix the transmission mode of queue 0 for Qav modeVinicius Costa Gomes1-1/+16
When Qav mode is enabled, queue 0 should be kept on Stream Reservation mode. From the i210 datasheet, section 8.12.19: "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to Qav." ("QueueMode 1b" represents the Stream Reservation mode) The solution is to give queue 0 the all the credits it might need, so it has priority over queue 1. A situation where this can happen is when cbs is "installed" only on queue 1, leaving queue 0 alone. For example: $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \ hicredit 30 sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26ethernet: Use octal not symbolic permissionsJoe Perches1-1/+1
Prefer the direct use of octal for permissions. Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace and some typing. Miscellanea: o Whitespace neatening around these conversions. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23intel: add SPDX identifiers to all the Intel driversJeff Kirsher21-0/+21
Add the SPDX identifiers to all the Intel wired LAN driver files, as outlined in Documentation/process/license-rules.rst. Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-05igb: Fix a test with HWTSTAMP_TX_ONChristophe JAILLET1-1/+1
'HWTSTAMP_TX_ON' should be handled as a value, not as a bit mask. The modified code should behave the same, because HWTSTAMP_TX_ON is 1 and no other possible values of 'tx_type' would match the test. However, this is more future-proof, should other values be allowed one day. See 'struct hwtstamp_config' in 'include/uapi/linux/net_tstamp.h' This fixes a warning reported by smatch: igb_xmit_frame_ring() warn: bit shifter 'HWTSTAMP_TX_ON' used for logical '&' Fixes: 26bd4e2db06be ("igb: protect TX timestamping from API misuse") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05igb: Do not call netif_device_detach() when PCIe link goes missingMika Westerberg1-2/+1
When the driver notices that PCIe link is gone by reading 0xffffffff from a register it clears hw->hw_addr and then calls netif_device_detach(). This happens when the PCIe device is physically unplugged for example the user disconnected the Thunderbolt cable. However, netif_device_detach() prevents netif_unregister() from bringing the device down properly including tearing down MSI-X vectors. This triggers following crash during the driver removal: igb 0000:0b:00.0 enp11s0f0: PCIe link lost, device now detached ------------[ cut here ]------------ kernel BUG at drivers/pci/msi.c:352! invalid opcode: 0000 [#1] PREEMPT SMP PTI ... Call Trace: pci_disable_msix+0xc9/0xf0 igb_reset_interrupt_capability+0x58/0x60 [igb] igb_remove+0x90/0x100 [igb] pci_device_remove+0x31/0xa0 device_release_driver_internal+0x152/0x210 pci_stop_bus_device+0x78/0xa0 pci_stop_bus_device+0x38/0xa0 pci_stop_bus_device+0x38/0xa0 pci_stop_bus_device+0x26/0xa0 pci_stop_bus_device+0x38/0xa0 pci_stop_and_remove_bus_device+0x9/0x20 trim_stale_devices+0xee/0x130 ? _raw_spin_unlock_irqrestore+0xf/0x30 trim_stale_devices+0x8f/0x130 ? _raw_spin_unlock_irqrestore+0xf/0x30 trim_stale_devices+0xa1/0x130 ? get_slot_status+0x8b/0xc0 acpiphp_check_bridge.part.7+0xf9/0x140 acpiphp_hotplug_notify+0x170/0x1f0 ... To prevent the crash do not call netif_device_detach() in igb_rd32(). This should be fine because hw->hw_addr is set to NULL preventing future hardware access of the now missing device. Link: https://bugzilla.kernel.org/show_bug.cgi?id=198181 Reported-by: Ferenc Boldog <ferenc.boldog@gmail.com> Reported-by: Nikolay Bogoychev <nheart@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-05igb: add VF trust infrastructureCorinna Vinschen2-3/+28
* Add a per-VF value to know if a VF is trusted, by default don't trust VFs. * Implement netdev op to trust VFs (igb_ndo_set_vf_trust) and add trust status to ndo_get_vf_config output. * Allow a trusted VF to change MAC and MAC filters even if MAC has been administratively set. Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Clear TXSTMP when ptp_tx_work() is timeoutDaniel Hua1-0/+9
Problem description: After ethernet cable connect and disconnect for several iterations on a device with i210, tx timestamp will stop being put into the socket. Steps to reproduce: 1. Setup a device with i210 and wire it to a 802.1AS capable switch ( Extreme Networks Summit x440 is used in our case) 2. Have the gptp daemon running on the device and make sure it is synced with the switch 3. Have the switch disable and enable the port, wait for the device gets resynced with the switch 4. Iterates step 3 until the device is not albe to get resynced 5. Review the log in dmesg and you will see warning message "igb : clearing Tx timestamp hang" Root cause: If ptp_tx_work() gets scheduled just before the port gets disabled, a LINK DOWN event will be processed before ptp_tx_work(), which may cause timeout in ptp_tx_work(). In the timeout logic, the TSYNCTXCTL's TXTT bit (Transmit timestamp valid bit) is not cleared, causing no new timestamp loaded to TXSTMP register. Consequently therefore, no new interrupt is triggerred by TSICR.TXTS bit and no more Tx timestamp send to the socket. Signed-off-by: Daniel Hua <daniel.hua@ni.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Delete an error message for a failed memory allocation in igb_enable_sriov()Markus Elfring1-2/+0
Omit an extra message for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Free IRQs when device is hotpluggedLyude Paul1-1/+1
Recently I got a Caldigit TS3 Thunderbolt 3 dock, and noticed that upon hotplugging my kernel would immediately crash due to igb: [ 680.825801] kernel BUG at drivers/pci/msi.c:352! [ 680.828388] invalid opcode: 0000 [#1] SMP [ 680.829194] Modules linked in: igb(O) thunderbolt i2c_algo_bit joydev vfat fat btusb btrtl btbcm btintel bluetooth ecdh_generic hp_wmi sparse_keymap rfkill wmi_bmof iTCO_wdt intel_rapl x86_pkg_temp_thermal coretemp crc32_pclmul snd_pcm rtsx_pci_ms mei_me snd_timer memstick snd pcspkr mei soundcore i2c_i801 tpm_tis psmouse shpchp wmi tpm_tis_core tpm video hp_wireless acpi_pad rtsx_pci_sdmmc mmc_core crc32c_intel serio_raw rtsx_pci mfd_core xhci_pci xhci_hcd i2c_hid i2c_core [last unloaded: igb] [ 680.831085] CPU: 1 PID: 78 Comm: kworker/u16:1 Tainted: G O 4.15.0-rc3Lyude-Test+ #6 [ 680.831596] Hardware name: HP HP ZBook Studio G4/826B, BIOS P71 Ver. 01.03 06/09/2017 [ 680.832168] Workqueue: kacpi_hotplug acpi_hotplug_work_fn [ 680.832687] RIP: 0010:free_msi_irqs+0x180/0x1b0 [ 680.833271] RSP: 0018:ffffc9000030fbf0 EFLAGS: 00010286 [ 680.833761] RAX: ffff8803405f9c00 RBX: ffff88033e3d2e40 RCX: 000000000000002c [ 680.834278] RDX: 0000000000000000 RSI: 00000000000000ac RDI: ffff880340be2178 [ 680.834832] RBP: 0000000000000000 R08: ffff880340be1ff0 R09: ffff8803405f9c00 [ 680.835342] R10: 0000000000000000 R11: 0000000000000040 R12: ffff88033d63a298 [ 680.835822] R13: ffff88033d63a000 R14: 0000000000000060 R15: ffff880341959000 [ 680.836332] FS: 0000000000000000(0000) GS:ffff88034f440000(0000) knlGS:0000000000000000 [ 680.836817] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 680.837360] CR2: 000055e64044afdf CR3: 0000000001c09002 CR4: 00000000003606e0 [ 680.837954] Call Trace: [ 680.838853] pci_disable_msix+0xce/0xf0 [ 680.839616] igb_reset_interrupt_capability+0x5d/0x60 [igb] [ 680.840278] igb_remove+0x9d/0x110 [igb] [ 680.840764] pci_device_remove+0x36/0xb0 [ 680.841279] device_release_driver_internal+0x157/0x220 [ 680.841739] pci_stop_bus_device+0x7d/0xa0 [ 680.842255] pci_stop_bus_device+0x2b/0xa0 [ 680.842722] pci_stop_bus_device+0x3d/0xa0 [ 680.843189] pci_stop_and_remove_bus_device+0xe/0x20 [ 680.843627] trim_stale_devices+0xf3/0x140 [ 680.844086] trim_stale_devices+0x94/0x140 [ 680.844532] trim_stale_devices+0xa6/0x140 [ 680.845031] ? get_slot_status+0x90/0xc0 [ 680.845536] acpiphp_check_bridge.part.5+0xfe/0x140 [ 680.846021] acpiphp_hotplug_notify+0x175/0x200 [ 680.846581] ? free_bridge+0x100/0x100 [ 680.847113] acpi_device_hotplug+0x8a/0x490 [ 680.847535] acpi_hotplug_work_fn+0x1a/0x30 [ 680.848076] process_one_work+0x182/0x3a0 [ 680.848543] worker_thread+0x2e/0x380 [ 680.848963] ? process_one_work+0x3a0/0x3a0 [ 680.849373] kthread+0x111/0x130 [ 680.849776] ? kthread_create_worker_on_cpu+0x50/0x50 [ 680.850188] ret_from_fork+0x1f/0x30 [ 680.850601] Code: 43 14 85 c0 0f 84 d5 fe ff ff 31 ed eb 0f 83 c5 01 39 6b 14 0f 86 c5 fe ff ff 8b 7b 10 01 ef e8 b7 e4 d2 ff 48 83 78 70 00 74 e3 <0f> 0b 49 8d b5 a0 00 00 00 e8 62 6f d3 ff e9 c7 fe ff ff 48 8b [ 680.851497] RIP: free_msi_irqs+0x180/0x1b0 RSP: ffffc9000030fbf0 As it turns out, normally the freeing of IRQs that would fix this is called inside of the scope of __igb_close(). However, since the device is already gone by the point we try to unregister the netdevice from the driver due to a hotplug we end up seeing that the netif isn't present and thus, forget to free any of the device IRQs. So: make sure that if we're in the process of dismantling the netdev, we always allow __igb_close() to be called so that IRQs may be freed normally. Additionally, only allow igb_close() to be called from __igb_close() if it hasn't already been called for the given adapter. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 9474933caf21 ("igb: close/suspend race in netif_device_detach") Cc: Todd Fujinaka <todd.fujinaka@intel.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: stable@vger.kernel.org Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Clarify idleslope config constraintsJesus Sanchez-Palencia1-0/+14
By design, the idleslope increments are restricted to 16.384kbps steps. Add a comment to igb_main.c making that explicit and add one example that illustrates the impact of that. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: add function to get maximum RSS queuesZhang Shengju3-33/+12
This patch adds a new function igb_get_max_rss_queues() to get maximum RSS queues, this will reduce duplicate code and facilitate future maintenance. Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-24igb: Allow to remove administratively set MAC on VFsCorinna Vinschen1-11/+31
Before libvirt modifies the MAC address and vlan tag for an SRIOV VF for use by a virtual machine (either using vfio device assignment or macvtap passthru mode), it saves the current MAC address and vlan tag so that it can reset them to their original value when the guest is done. Libvirt can't leave the VF MAC set to the value used by the now-defunct guest since it may be started again later using a different VF, but it certainly shouldn't just pick any random value, either. So it saves the state of everything prior to using the VF, and resets it to that. The igb driver initializes the MAC addresses of all VFs to 00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK netlink message, also visible in the list of VFs in the output of "ip link show"). But when libvirt attempts to restore the MAC address back to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel responds with "Invalid argument". Forbidding a reset back to the original value leaves the VF MAC at the value set for the now-defunct virtual machine. Especially on a system with NetworkManager enabled, this has very bad consequences, since NetworkManager forces all interfacess to be IFF_UP all the time - if the same virtual machine is restarted using a different VF (or even on a different host), there will be multiple interfaces watching for traffic with the same MAC address. To allow libvirt to revert to the original state, we need a way to remove the administrative set MAC on a VF, to allow normal host operation again, and to reset/overwrite the VF MAC via VF netdev. This patch implements the outlined scenario by allowing to set the VF MAC to 00:00:00:00:00:00 via RTM_SETLINK on the PF. igb_ndo_set_vf_mac resets the IGB_VF_FLAG_PF_SET_MAC flag to 0, so it's possible to reset the VF MAC back to the original value via the VF netdev. Note: Recent patches to libvirt allow for a workaround if the NIC isn't capable of resetting the administrative MAC back to all 0, but in theory the NIC should allow resetting the MAC in the first place. Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Tested-by: Aaron Brown <arron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-21igb: Use smp_rmb rather than read_barrier_dependsBrian King1-1/+1
The original issue being fixed in this patch was seen with the ixgbe driver, but the same issue exists with igb as well, as the code is very similar. read_barrier_depends is not sufficient to ensure loads following it are not speculatively loaded out of order by the CPU, which can result in stale data being loaded, causing potential system crashes. Cc: stable <stable@vger.kernel.org> Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds4-10/+394
Pull networking updates from David Miller: "Highlights: 1) Maintain the TCP retransmit queue using an rbtree, with 1GB windows at 100Gb this really has become necessary. From Eric Dumazet. 2) Multi-program support for cgroup+bpf, from Alexei Starovoitov. 3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew Lunn. 4) Add meter action support to openvswitch, from Andy Zhou. 5) Add a data meta pointer for BPF accessible packets, from Daniel Borkmann. 6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet. 7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli. 8) More work to move the RTNL mutex down, from Florian Westphal. 9) Add 'bpftool' utility, to help with bpf program introspection. From Jakub Kicinski. 10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper Dangaard Brouer. 11) Support 'blocks' of transformations in the packet scheduler which can span multiple network devices, from Jiri Pirko. 12) TC flower offload support in cxgb4, from Kumar Sanghvi. 13) Priority based stream scheduler for SCTP, from Marcelo Ricardo Leitner. 14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg. 15) Add RED qdisc offloadability, and use it in mlxsw driver. From Nogah Frankel. 16) eBPF based device controller for cgroup v2, from Roman Gushchin. 17) Add some fundamental tracepoints for TCP, from Song Liu. 18) Remove garbage collection from ipv6 route layer, this is a significant accomplishment. From Wei Wang. 19) Add multicast route offload support to mlxsw, from Yotam Gigi" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits) tcp: highest_sack fix geneve: fix fill_info when link down bpf: fix lockdep splat net: cdc_ncm: GetNtbFormat endian fix openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start netem: remove unnecessary 64 bit modulus netem: use 64 bit divide by rate tcp: Namespace-ify sysctl_tcp_default_congestion_control net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() ipv6: set all.accept_dad to 0 by default uapi: fix linux/tls.h userspace compilation error usbnet: ipheth: prevent TX queue timeouts when device not ready vhost_net: conditionally enable tx polling uapi: fix linux/rxrpc.h userspace compilation errors net: stmmac: fix LPI transitioning for dwmac4 atm: horizon: Fix irq release error net-sysfs: trigger netlink notification on ifalias change via sysfs openvswitch: Using kfree_rcu() to simplify the code openvswitch: Make local function ovs_nsh_key_attr_size() static openvswitch: Fix return value check in ovs_meter_cmd_features() ...
2017-11-08net_sch: cbs: Change TC_SETUP_CBS to TC_SETUP_QDISC_CBSNogah Frankel1-1/+1
Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS to match the new convention.. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-07Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar1-1/+1
Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+1
Several conflicts here. NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to nfp_fl_output() needed some adjustments because the code block is in an else block now. Parallel additions to net/pkt_cls.h and net/sch_generic.h A bug fix in __tcp_retransmit_skb() conflicted with some of the rbtree changes in net-next. The tc action RCU callback fixes in 'net' had some overlap with some of the recent tcf_block reworking. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-27igb: Add support for CBS offloadAndre Guedes4-0/+384
This patch adds support for Credit-Based Shaper (CBS) qdisc offload from Traffic Control system. This support enable us to leverage the Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav standard which was merged into 802.1Q in 2014. It enables traffic prioritization and bandwidth reservation via the Credit-Based Shaper which is implemented in hardware by i210 controller. The patch introduces the igb_setup_tc() function which implements the support for CBS qdisc hardware offload in the IGB driver. CBS offload is the only traffic control offload supported by the driver at the moment. FQTSS transmission mode from i210 controller is automatically enabled by the IGB driver when the CBS is enabled for the first hardware queue. Likewise, FQTSS mode is automatically disabled when CBS is disabled for the last hardware queue. Changing FQTSS mode requires NIC reset. FQTSS feature is supported by i210 controller only. Signed-off-by: Andre Guedes <andre.guedes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Henrik Austad <henrik@austad.us> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-26igb: Fix TX map failure pathJean-Philippe Brucker1-1/+1
When the driver cannot map a TX buffer, instead of rolling back gracefully and retrying later, we currently get a panic: [ 159.885994] igb 0000:00:00.0: TX DMA map failed [ 159.886588] Unable to handle kernel paging request at virtual address ffff00000a08c7a8 ... [ 159.897031] PC is at igb_xmit_frame_ring+0x9c8/0xcb8 Fix the erroneous test that leads to this situation. Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-25locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()Mark Rutland2-2/+2
Please do not apply this to mainline directly, instead please re-run the coccinelle script shown below and apply its output. For several reasons, it is desirable to use {READ,WRITE}_ONCE() in preference to ACCESS_ONCE(), and new code is expected to use one of the former. So far, there's been no reason to change most existing uses of ACCESS_ONCE(), as these aren't harmful, and changing them results in churn. However, for some features, the read/write distinction is critical to correct operation. To distinguish these cases, separate read/write accessors must be used. This patch migrates (most) remaining ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following coccinelle script: ---- // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and // WRITE_ONCE() // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch virtual patch @ depends on patch @ expression E1, E2; @@ - ACCESS_ONCE(E1) = E2 + WRITE_ONCE(E1, E2) @ depends on patch @ expression E; @@ - ACCESS_ONCE(E) + READ_ONCE(E) ---- Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davem@davemloft.net Cc: linux-arch@vger.kernel.org Cc: mpe@ellerman.id.au Cc: shuah@kernel.org Cc: snitzer@redhat.com Cc: thor.thayer@linux.intel.com Cc: tj@kernel.org Cc: viro@zeniv.linux.org.uk Cc: will.deacon@arm.com Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-18ethernet/intel: Convert timers to use timer_setup()Kees Cook1-10/+8
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Switches test of .data field to .function, since .data will be going away. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: intel-wired-lan@lists.osuosl.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10igb: check memory allocation failureChristophe JAILLET1-0/+2
Check memory allocation failures and return -ENOMEM in such cases, as already done for other memory allocations in this function. This avoids NULL pointers dereference. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Aaron Brown <aaron.f.brown@intel.com Acked-by: PJ Waskiewicz <peter.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-08igb: support BCM54616 PHYJohn W Linville3-0/+8
The management port on an Edgecore AS7712-32 switch uses an igb MAC, but it uses a BCM54616 PHY. Without a patch like this, loading the igb module produces dmesg output like this: [ 3.439125] igb: Copyright (c) 2007-2014 Intel Corporation. [ 3.439866] igb: probe of 0000:00:14.0 failed with error -2 Signed-off-by: John W Linville <linville@tuxdriver.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-08igb: do not drop PF mailbox lock after read of VF messageGreg Edwards4-12/+26
When the PF receives a mailbox message from the VF, it grabs the mailbox lock, reads the VF message from the mailbox, ACKs the message and drops the lock. While the PF is performing the action for the VF message, nothing prevents another VF message from being posted to the mailbox. The current code handles this condition by just dropping any new VF messages without processing them. This results in a mailbox timeout in the VM for posted messages waiting for an ACK, and the VF is reset by the igbvf_watchdog_task in the VM. Given the right sequence of VF messages and mailbox timeouts, this condition can go on ad infinitum. Modify the PF mailbox read method to take an 'unlock' argument that optionally leaves the mailbox locked by the PF after reading the VF message. This ensures another VF message is not posted to the mailbox until after the PF has completed processing the VF message and written its reply. Signed-off-by: Greg Edwards <gedwards@ddn.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-08igb: expose mailbox unlock methodGreg Edwards3-0/+41
Add a mailbox unlock method to e1000_mbx_operations, which will be used to unlock the PF/VF mailbox by the PF. Signed-off-by: Greg Edwards <gedwards@ddn.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-08igb: add argument names to mailbox op function declarationsGreg Edwards2-13/+14
Signed-off-by: Greg Edwards <gedwards@ddn.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-08igb: Remove incorrect "unexpected SYS WRAP" log messageCorinna Vinschen1-2/+0
TSAUXC.DisableSystime is never set, so SYSTIM runs into a SYS WRAP every 1100 secs on 80580/i350/i354 (40 bit SYSTIM) and every 35000 secs on 80576 (45 bit SYSTIM). This wrap event sets the TSICR.SysWrap bit unconditionally. However, checking TSIM at interrupt time shows that this event does not actually cause the interrupt. Rather, it's just bycatch while the actual interrupt is caused by, for instance, TSICR.TXTS. The conclusion is that the SYS WRAP is actually expected, so the "unexpected SYS WRAP" message is entirely bogus and just helps to confuse users. Drop it. Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Acked-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-08igb: protect TX timestamping from API misuseCliff Spradlin1-1/+2
HW timestamping can only be requested for a packet if the NIC is first setup via ioctl(SIOCSHWTSTAMP). If this step was skipped, then the igb driver still allowed TX packets to request HW timestamping. In this situation, the _IGB_PTP_TX_IN_PROGRESS flag was set and would never clear. This prevented any future HW timestamping requests to succeed. Fix this by checking that the NIC is configured for HW TX timestamping before accepting a HW TX timestamping request. Signed-off-by: Cliff Spradlin <cspradlin@google.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-08igb: Fix error of RX network flow classificationGangfeng Huang1-2/+2
After add an ethertype filter, if user change the adapter speed several times, the error "ethtool -N: etype filters are all used" is reported by igb driver. In older patch, function igb_nfc_filter_exit() and igb_nfc_filter_restore() is not paried. igb_nfc_filter_restore() exist in igb_up(), but function igb_nfc_filter_exit() is exist in __igb_close(). In the process of speed changing, only igb_nfc_filter_restore() is called, it will take a position of ethertype bitmap. Reproduce steps: Step 1: Add a etype filter by ethtool $ethtool -N eth0 flow-type ether proto 0x88F8 action 1 Step 2: Change the adapter speed to 100M/full duplex $ethtool -s eth0 speed 100 duplex full Step 3: Change the adapter speed to 1000M/full duplex ethtool -s eth0 speed 1000 duplex full Repeat step2 and step3, then dmesg the system log, you can find the error message, add new ethtype filter is also failed. This fixing is move igb_nfc_filter_exit() from __igb_close() to igb_down() to make igb_nfc_filter_restore()/igb_nfc_filter_exit() is paired. Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-07igb: make a few local functions staticColin Ian King1-6/+6
Clean up a few sparse warnings, these following functions can be made static: drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol 'igb_add_mac_filter' was not declared. Should it be static? drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol 'igb_del_mac_filter' was not declared. Should it be static? drivers/net/ethernet/intel/igb/igb_main.c: warning: symbol 'igb_set_vf_mac_filter' was not declared. Should it be static? Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06igb: Remove useless argumentBenjamin Poirier3-7/+7
Given that all callers of igb_update_stats() pass the same two arguments: (adapter, &adapter->stats64), the second argument can be removed. Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06igb: check for Tx timestamp timeouts during watchdogJacob Keller3-0/+31
The igb driver has logic to handle only one Tx timestamp at a time, using a state bit lock to avoid multiple requests at once. It may be possible, if incredibly unlikely, that a Tx timestamp event is requested but never completes. Since we use an interrupt scheme to determine when the Tx timestamp occurred we would never clear the state bit in this case. Add an igb_ptp_tx_hang() function similar to the already existing igb_ptp_rx_hang() function. This function runs in the watchdog routine and makes sure we eventually recover from this case instead of permanently disabling Tx timestamps. Note: there is no currently known way to cause this without hacking the driver code to force it. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06igb: add statistic indicating number of skipped Tx timestampsJacob Keller3-0/+4
The igb driver can only handle one Tx timestamp request at a time. This means it is possible for an application timestamp request to be ignored. There is no easy way for an administrator to determine if this occurred. Add a new statistic which tracks this, tx_hwtstamp_skipped. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06igb: avoid permanent lock of *_PTP_TX_IN_PROGRESSJacob Keller1-5/+18
The igb driver uses a state bit lock to avoid handling more than one Tx timestamp request at once. This is required because hardware is limited to a single set of registers for Tx timestamps. The state bit lock is not properly cleaned up during igb_xmit_frame_ring() if the transmit fails such as due to DMA or TSO failure. In some hardware this results in blocking timestamps until the service task times out. In other hardware this results in a permanent lock of the timestamp bit because we never receive an interrupt indicating the timestamp occurred, since indeed the packet was never transmitted. Fix this by checking for DMA and TSO errors in igb_xmit_frame_ring() and properly cleaning up after ourselves when these occur. Reported-by: Reported-by: David Mirabito <davidm@metamako.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06igb: fix race condition with PTP_TX_IN_PROGRESS bitsJacob Keller1-2/+10
Hardware related to the igb driver has a limitation of only handling one Tx timestamp at a time. Thus, the driver uses a state bit lock to enforce that only one timestamp request is honored at a time. Unfortunately this suffers from a simple race condition. The bit lock is not cleared until after skb_tstamp_tx() is called notifying the stack of a new Tx timestamp. Even a well behaved application which sends only one timestamp request at once and waits for a response might wake up and send a new packet before the bit lock is cleared. This results in needlessly dropping some Tx timestamp requests. We can fix this by unlocking the state bit as soon as we read the Timestamp register, as this is the first point at which it is safe to unlock. To avoid issues with the skb pointer, we'll use a copy of the pointer and set the global variable in the driver structure to NULL first. This ensures that the next timestamp request does not modify our local copy of the skb pointer. This ensures that well behaved applications do not accidentally race with the unlock bit. Obviously an application which sends multiple Tx timestamp requests at once will still only timestamp one packet at a time. Unfortunately there is nothing we can do about this. Reported-by: David Mirabito <davidm@metamako.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06igb: mark PM functions as __maybe_unusedArnd Bergmann1-13/+5
The new wake function is only used by the suspend/resume handlers that are defined in inside of an #ifdef, which can cause this harmless warning: drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function] Removing the #ifdef, instead using a __maybe_unused annotation simplifies the code and avoids the warning. Fixes: b90fa8763560 ("igb: Enable reading of wake up packet") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06igb: Explicitly select page 0 at initializationMatwey V Kornilov1-0/+1
The functions igb_read_phy_reg_gs40g/igb_write_phy_reg_gs40g (which were removed in 2a3cdea) explicitly selected the required page at every phy_reg access. Currently, igb_get_phy_id_82575 relays on the fact that page 0 is already selected. The assumption is not fulfilled for my Lex 3I380CW motherboard with integrated dual i211 based gigabit ethernet. This leads to igb initialization failure and network interfaces are not working: igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k igb: Copyright (c) 2007-2014 Intel Corporation. igb: probe of 0000:01:00.0 failed with error -2 igb: probe of 0000:02:00.0 failed with error -2 In order to fix it, we explicitly select page 0 before first access to phy registers. See also: https://bugzilla.suse.com/show_bug.cgi?id=1009911 See also: http://www.lex.com.tw/products/pdf/3I380A&3I380CW.pdf Fixes: 2a3cdea ("igb: Remove GS40G specific defines/functions") Cc: <stable@vger.kernel.org> # 4.5+ Signed-off-by: Matwey V Kornilov <matwey@sai.msu.ru> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-05-21net: ethernet: update drivers to handle HWTSTAMP_FILTER_NTP_ALLMiroslav Lichvar1-0/+1
Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid filter and update drivers which can timestamp all packets, or which explicitly list unsupported filters instead of using a default case, to handle the filter. CC: Richard Cochran <richardcochran@gmail.com> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-08scripts/spelling.txt: add regsiter -> register spelling mistakeStephen Boyd1-1/+1
This typo is quite common. Fix it and add it to the spelling file so that checkpatch catches it earlier. Link: http://lkml.kernel.org/r/20170317011131.6881-2-sboyd@codeaurora.org Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-20igb: Enable reading of wake up packetKim Tatt Chuah2-1/+56
Currently, in igb_resume(), igb driver ignores the Wake Up Status (WUS) and Wake Up Packet Memory (WUPM) registers. This patch enables the igb driver to read the WUPM if the controller was woken by a wake up packet that is not more than 128 bytes long (maximum WUPM size), then pass it up the kernel network stack. Signed-off-by: Kim Tatt Chuah <kim.tatt.chuah@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>