linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-04-07	tcp/dccp: fix inet_reuseport_add_sock()	Eric Dumazet	2	-2/+4
	David Ahern reported panics in __inet_hash() caused by my recent commit. The reason is inet_reuseport_add_sock() was still using sk_nulls_for_each_rcu() instead of sk_for_each_rcu(). SO_REUSEPORT enabled listeners were causing an instant crash. While chasing this bug, I found that I forgot to clear SOCK_RCU_FREE flag, as it is inherited from the parent at clone time. Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07	Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue	David S. Miller	9	-94/+53
	Jeff Kirsher says: ==================== 1GbE Intel Wired LAN Driver Updates 2016-04-06 This series contains updates to e1000, e1000e, igb and Kconfig. Alex fixes igb where we were casting the MAC address as __beXX and then passing it into le32_to_cpu, when we could simply cast as __lexx to maintain consistency since it is already little endian. Then enabled bulk free in transmit cleanup for igb. John Holland enables igb to pickup the MAC address from a device tree blob when CONFIG_OF has been enabled. Doron Shikmoni fixes a bug in the output of "ethtool -m ethX" where the data byte appeared duplicated. Stefan fixes up e1000 and e1000e ethtool offline tests which were calling dev_close() which causes IFF_UP to be cleared which removes teh interface routes and some addresses, so use ndo_stop() instead. Jiri Benc cleans up some old links in the Kconfig for Intel drivers where we referred to a URL which is no longer valid. I am so glad Jiri has the time in his day to spend clicking on and testing all the URL links in the the kernel. Arika Chen reverts the addition of a 'rtnl_unlock()' which had a unmatched 'rtnl_lock()' call before it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-07	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue	David S. Miller	19	-88/+311
	Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-04-06 This series contains updates to i40e and i40evf. Deepthi adds a debug message to display the MSIx vector count for hardware capabilities. Shannon removed the setting of debug_mask at startup to take care of an issue where all the device capabilities getting printed when we had not asked for it. Moved the NVM status out of the admin queue structure, since it should really stay with the other NVM data structures. Akeem added the flush routine to the end of the reset flow to avoid problems in the pass-through routines. Jesse moves a local variable deeper into the depths of the driver where the light is low and the context is great. Then cleaned up the tx_ring argument since it was not making good arguments. Improved performance by not "checking for FCoE" by re-ordering the FCoE checks. Anjali adds the support for changing a VF from non-trusted to trusted and vice-versa. Mitch adds opcodes and structures to support RSS configuration by PF driver on behalf of the VF driver. Fixed how the VLAN feature flags are set. Kiran added defines for RSS, flow director, flexible payload and IPv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	Revert "igb: Fix a deadlock in igb_sriov_reinit"	Arika Chen	1	-1/+0
	This reverts commit 3eb14ea8d958 ("igb: Fix a deadlock in igb_sriov_reinit") It is the same as commit f468adc944ef ("igb: missing rtnl_unlock in igb_sriov_reinit()") There is no rtnl_lock() in igb_resume before, rtnl_unlock will cause a deadlock. Signed-off-by: Arika Chen <arika.chen@huawei.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	net: intel: remove dead links	Jiri Benc	1	-66/+14
	The Kconfig for Intel NICs references two different URLs for the "Adapter & Driver ID Guide". Neither of those two links works. The current URL seems to be http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000005584.html but given it's apparently constantly changing, there's no point in having it in the help text. Just keep a generic pointer to http://support.intel.com. Hopefully, this one will have a longer live. It still works, at least. Furthermore, remove a link to "the latest Intel PRO/100 network driver for Linux", this has no place in the mainline kernel and the latest Linux driver it offers is from 2006, anyway. Signed-off-by: Jiri Benc <jbenc@redhat.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40evf: properly handle VLAN features	Mitch Williams	1	-15/+12
	Correctly set the VLAN feature flags after setting the rest of the netdev flags. And don't set them in hw_features, because these can't be controlled by the VF driver. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e/i40evf: Bump patch from 1.5.2 to 1.5.5	Harshitha Ramamurthy	2	-2/+2
	Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Input set mask constants for RSS, flow director, and flex bytes	Kiran Patil	2	-0/+75
	Add defines for input set mask (RSS, flow director, flexible payload), including defines specific to IPv6. Change-ID: Ie95ef7d0916a4d6ca011c194283f959774c8dce9 Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Move NVM event wait check to NVM code	Shannon Nelson	3	-30/+33
	The logic that checks AQ events for NVM done events is better kept in nvm.c with the rest of the nvmupdate handling code. Change-ID: I2ea58980df8ecaa3726b28a37bff3dfcb8df03dc Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Add RSS configuration to virtual channel	Mitch Williams	2	-6/+84
	Add opcodes and structures to support RSS configuration by PF driver on behalf of the VF drivers. This reduces complexity in the VF driver and allows us to support future hardware designs without modifying the VF driver. Change-ID: I8c75765c630eacb71f95967f1109a198542593ac Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Move NVM variable out of AQ struct	Shannon Nelson	6	-11/+11
	The NVM update status info should stay collected together, not spread across different structs. Change-ID: Ic16f9e9fd79945d865bb7226184c889884585025 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Restrict VF poll mode to only single function mode devices	Shannon Nelson	1	-1/+9
	The VFs can request their queues to be set up into polling mode, rather than interrupt mode, which works well for supporting things like DPDK, but this should not be available when working in an multi-function support device. Change-ID: Id36792e4e7422db8f2033336507211f68f14ff6f Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Patch to support trusted VF	Anjali Singhai Jain	3	-0/+45
	This patch adds hook to support changing a VF from not-trusted to trusted and vice-versa. Fixed the wrappers and function prototype. Changed the dmesg to reflex the current state better. This patch also disables turning on/off trusted VF in MFP mode. Change-ID: Ibcd910935c01f0be1f3fdd6d427230291ee92ebe Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e/i40evf: Faster RX via avoiding FCoE	Jesse Brandeburg	5	-14/+30
	As it turns out, calling into other files from hot path hurts performance a lot. In this case the majority of the time we call "check FCoE" and the packet is not FCoE, but this call was taking 5% of our total cycles spent on receive. Change-ID: I080552c26e7060bc7b78504dc2763f6f0b3d8c76 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e/i40evf: Drop unused tx_ring argument	Jesse Brandeburg	2	-8/+4
	Some of the tx_ring arguments can be deleted since they are not used. Change-ID: I99275b0f191d7f63ec2f05061919904940c36f31 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e/i40evf: Move stack var deeper	Jesse Brandeburg	2	-2/+4
	A local variable could move down inside the context where it is used. Change-ID: I9caba9e1eacf921037077f2665cbce83fd8e95d6 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Move HW flush	Akeem G Abodunrin	1	-0/+1
	This patch moves the HW flush routine to the end of the reset flow, after the completion of writing to the device VFLR registers- the benefit is to avoid problems in the passthrough routines. Change-ID: Ieb56866f21895e6c1fc514b7328c3df79807a57c Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Leave debug_mask cleared at init	Shannon Nelson	1	-1/+0
	Don't set our internal debug_mask at startup unless we get specific signal to from the debug module parameter. This should take care of the issue with all the device capabilities getting printed even when we hadn't asked for the debug info. Change-ID: I7fbc6bd8b11ed9b0631ec018ff36015a04100b6c Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	i40e: Inserting a HW capability display info	Deepthi Kavalur	1	-0/+3
	Display MSIx vector count for HW capabilities. Change-ID: I4b41e9b50360cf660e7fbcb85b9390fedcf313b1 Signed-off-by: Deepthi Kavalur <deepthi.kavalur@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	e1000: call ndo_stop() instead of dev_close() when running offline selftest	Stefan Assmann	3	-6/+8
	Calling dev_close() causes IFF_UP to be cleared which will remove the interfaces routes and some addresses. That's probably not what the user intended when running the offline selftest. Besides this does not happen if the interface is brought down before the test, so the current behaviour is inconsistent. Instead call the net_device_ops ndo_stop function directly and avoid touching IFF_UP at all. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	Merge branch 'mlxsw-dcb'	David S. Miller	8	-50/+1381
	Jiri Pirko says: ==================== mlxsw: Introduce support for Data Center Bridging Ido says: This patchset introduces support for Quality of Service (QoS) as part of the IEEE Data Center Bridiging (DCB) standards. Patches 1-9 do the required device initialization. Specifically, patches 1-6 initialize the ports' headroom buffers, which are used at ingress to store incoming packets while they go through the switch's pipeline. Patches 7-9 complete them by initializing the egress scheduling. The pipeline mentioned above determines the packet's egress port(s) and traffic class. Ideally, once out of the pipeline the packet moves to the switch's shared buffer (to be introduced in Jiri's patchset, currently default values are used) and scheduled for transmission according to its traffic class. The egress scheduling is configured according to the 802.1Qaz standard, which is part of the DCB infrastructure supported by Linux. This is introduced in patches 10-12. Even after going through the pipeline packets are not always eligible to enter the shared buffer. This is determined by the amount of available space and the quotas associated with the packet. However, if flow control is enabled and the packet is associated with the lossless flow, then it will stay in the headroom and won't be discarded. This is introduced in patches 13-17. Please check individual commit messages for more info, as I tried to keep them pretty detailed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Add IEEE 802.1Qbb PFC support	Ido Schimmel	4	-11/+158
	Implement the appropriate DCB ops and allow a user to configure certain traffic classes as lossless. The operation configures PFC for both the egress (respecting PFC frames) and ingress (sending PFC frames) parts of the port. At egress, when a PFC frame is received for a PFC enabled priority, then all the priorities mapped to the same TC are stopped. At ingress, the priority group (PG) buffers to which the enabled PFC priorities are mapped are configured to be lossless. PFC frames will be transmitted when the Xoff threshold is crossed. The user-supplied delay parameter is used to determine the PG's size according to the following formula: PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU In the worst case scenario the delay will be made up of packets that are all of size CELL_SIZE + 1, which means each packet will require almost twice its true size when buffered in the switch. We therefore multiply this value by the "cell factor", which is close to 2. Another MTU is added in case the transmitting host already started transmitting a maximum length frame when the PFC packet was received. As with PAUSE enabled ports, when the port's MTU is changed both the PGs' size and threshold are adjusted accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: reg: Introduce per priority counters	Ido Schimmel	3	-5/+63
	We are going to add support for PFC as part of DCB ops, which requires us to report the number of PFC frames sent and received per priority. Add per priority counters in order to report number of PFC frames sent and received per priority. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Add support for PAUSE frames	Ido Schimmel	3	-10/+95
	When a packet ingress the switch it's placed in its assigned priority group (PG) buffer in the port's headroom buffer while it goes through the switch's pipeline. After going through the pipeline - which determines its egress port(s) and traffic class - it's moved to the switch's shared buffer awaiting transmission. However, some packets are not eligible to enter the shared buffer due to exceeded quotas or insufficient space. Marking their associated PGs as lossless will cause the packets to accumulate in the PG buffer. Another reason for packets accumulation are complicated pipelines (e.g. involving a lot of ACLs). To prevent packets from being dropped a user can enable PAUSE frames on the port. This will mark all the active PGs as lossless and set their size according to the maximum delay, as it's not configured by user. +----------------+ + \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| Delay \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| Xon/Xoff threshold +----------------+ + \| \| \| \| \| \| 2 * MTU \| \| \| +----------------+ + The delay (612 [Cells]) was calculated according to worst-case scenario involving maximum MTU and 100m cables. After marking the PGs as lossless the device is configured to respect incoming PAUSE frames (Rx PAUSE) and generate PAUSE frames (Tx PAUSE) according to user's settings. Whenever the port's headroom configuration changes we take into account the PAUSE configuration, so that we correctly set the PG's type (lossy / lossless), size and threshold. This can happen when: a) The port's MTU changes, as it directly affects the PG's size. b) A PG is created following user configuration, by binding a priority to it. Note that the relevant SUPPORTED flags were already mistakenly set by the driver before this commit. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: reg: Add lossless settings for PBMC register	Ido Schimmel	1	-0/+35
	When configuring PAUSE frames and PFC we'll need to configure the Xon/Xoff threshold for the priority group (PG) buffers. Add the Xon/Xoff threshold fields to the PBMC register so that we can configure these when needed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: reg: Add Port Flow Control Configuration register	Ido Schimmel	1	-0/+131
	Add the Port Flow Control Configuration (PFCC) register, which configures both flow control and Priority-based Flow Control (PFC). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Allow setting maximum rate for a TC	Ido Schimmel	3	-3/+77
	Allow a user to set maximum rate for a particular TC using DCB ops. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Add IEEE 802.1Qaz ETS support	Ido Schimmel	3	-10/+283
	Implement the appropriate DCB ops and allow a user to configure: * Priority to traffic class (TC) mapping with a total of 8 supported TCs * Transmission selection algorithm (TSA) for each TC and the corresponding weights in case of weighted round robin (WRR) As previously explained, we treat the priority group (PG) buffer in the port's headroom as the ingress counterpart of the egress TC. Therefore, when a certain priority to TC mapping is configured, we also configure the port's headroom buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)	Ido Schimmel	5	-0/+101
	Introduce basic infrastructure for DCB and add the missing ops in following patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Initialize egress scheduling	Ido Schimmel	1	-0/+111
	Before introducing support for DCB ops we should first make sure we initialize the relevant parts in the device correctly. Specifically, the egress scheduling. The device supports a superset of the 802.1Qaz standard with 4 hierarchy levels that can be linked to each other in multiple ways and with different transmission selection algorithms (TSA) employed between them. However, since we only intend to support the 802.1Qaz standard we flatten the hierarchies and let the user configure via DCB ops the TSA and max rate shaper at the subgroup hierarchy (see figure below) and the mapping between switch priority to traffic class. By default, all switch priorities are mapped to traffic class 0, strict priority is employed and max shaper is disabled. Default configuration: switch priority 0 ... switch priority 7 + + \| \| +----------------------------------+ \| +--v--+ +-----+ Traffic Class \| \| \| \| Hierarchy \| TC0 \| ... \| TC7 \| \| \| \| \| +--+--+ +--+--+ \| \| +--v--+ +--v--+ Subgroup \| SG0 \| \| SG7 \| Hierarchy \| \| \| \| +-----+ +-----+ \| TSA \| \| TSA \| +-----+ ... +-----+ \| MAX \| \| MAX \| +--+--+ +--+--+ \| \| +---------------+----------------+ \| +--v--+ Group \| \| Hierarchy \| GR0 \| \| \| +--+--+ \| +--v--+ Port \| \| Hierarchy \| PR0 \| \| \| +-----+ Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: reg: Add QoS Switch Traffic Class Table register	Ido Schimmel	1	-0/+55
	As part of DCB ops we'll have to configure the priority to traffic class mapping of a port. Add the QoS Switch Traffic Class Table (QTCT) register, which configures the mapping between the packet switch priority and traffic class on the transmit port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: reg: Add QoS ETS Element Configuration register	Ido Schimmel	1	-0/+127
	We are going to introduce support for DCB, so we need to be able to configure the traffic selection algorithm (TSA) used by each traffic class (TC), as well as the bandwidth percentage allocated to each TC in case of ETS. Add the QoS ETS Element Configuration register, which controls the above parameters. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Set port's shared buffer size to 0	Ido Schimmel	2	-0/+4
	In addition to the priority group (PG) buffers in the headroom, the device enables the allocation of headroom shared buffer, which can be shared between different PGs. However, we are not going to use the headroom shared buffer and instead allow the user to use its size for PGs or the switch's shared buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: reg: Use correct PBMC register length	Ido Schimmel	1	-1/+1
	The last field of the PBMC register is at offset 0x64 and its size is 0x8, so the correct register's length is 0x6C bytes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Correctly configure headroom size	Ido Schimmel	2	-10/+34
	When packets ingress the switch they are assigned a switch priority and directed to the corresponding priority group (PG) buffer in the port's headroom buffer. Since we now map all switch priorities to priority group 0 (PG0) by default, there is no need to allocate the other priority groups during initialization. The only exception is PG9, which is used for control traffic. At minimum, the PG should be able to store the currently classified packet (pipeline latency isn't 0) and also the packets arriving during the classification time. However, an incoming packet will not be buffered if there is no available MTU-sized buffer space for storing it. The buffer needed to accommodate for pipeline latency is variable and needs to take into account both the current link speed and current latency of the pipeline, which is time-dependent. Testing showed that setting the PG's size to twice the current MTU is optimal. Since PG9 is used strictly for control packets and not subject to flow control, we are not going to resize it according to user configuration, so we simply set it according to worst case scenario, which is twice the maximum MTU. In any case, later patches in the series will allow a user to direct lossless flows to other PGs than PG0 and set their size to accommodate for round-trip propagation delay. The above change also requires us to resize the PG buffer whenever the port's MTU is changed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Add bytes to cells helper	Ido Schimmel	2	-34/+34
	Buffers in the switch store packets in units called buffer cells. Add a helper to convert from bytes to cells, so that the actual number of cells required (result is round up) is returned. Also, drop the SB (shared buffer) acronym from the BYTES_PER_CELL macro, as this unit is also used in the ports' buffers and not only the switch's shared buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: spectrum: Map all switch priorities to priority group 0	Ido Schimmel	1	-1/+24
	During transmission, the skb's priority is used to map the skb to a traffic class, where the idea is to group priorities with similar characteristics (e.g. lossy, lossless) to the same traffic class. By default, all priorities are mapped to traffic class 0. In the device, we model the skb's priority as the switch priority, which is assigned to a packet according to its PCP value and ingress port (untagged packets are assigned the port's default switch priority - 0). At ingress, the packet is directed to a priority group (PG) buffer in the port's headroom buffer according to the packet's switch priority and switch priority to buffer mapping. While it's possible to configure the egress mapping between skb's priority (switch priority) and traffic class, there is no mechanism to configure the ingress mapping to a PG. In order to keep things simple and since grouping certain priorities into a traffic class at egress also implies they should be grouped the same at ingress, treat a PG as the ingress counterpart of an egress traffic class. Having established the above, during initialization map all the switch priorities to PG0 in accordance with the Linux defaults for traffic class mapping. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	mlxsw: reg: Add Port Prio To Buffer register	Ido Schimmel	1	-0/+83
	When packets ingress the switch they are assigned a switch priority number that dictates the packet's priority group (PG) buffer in the port's headroom buffer. Add the Port Prio To Buffer (PPTB) register, which configures the switch priority to PG mapping. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	e1000e: call ndo_stop() instead of dev_close() when running offline selftest	Stefan Assmann	3	-8/+10
	Calling dev_close() causes IFF_UP to be cleared which will remove the interfaces routes and some addresses. That's probably not what the user intended when running the offline selftest. Besides this does not happen if the interface is brought down before the test, so the current behaviour is inconsistent. Instead call the net_device_ops ndo_stop function directly and avoid touching IFF_UP at all. Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue	David S. Miller	11	-43/+84
	Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-04-05 This series contains updates to i40e and i40evf only. Colin Ian King cleaned up a redundant NULL check which was found by static analysis. Anjali enables geneve receive offload for XL710/X710 devices. Mitch cleans up unused variable in i40e_vc_get_vf_resources_msg(). Fixed the driver to actually be able to adjust VLAN tagging features through ethtool, as expected. Fixed a problem where VF resets would get lost by the PF preventing the VF driver from initializing. Also put users mind at ease by lowering some message levels since many of these conditions can happen any time VFs are enabled or disabled and are not really indicative a fatal problems, unless they happen continuously. Shannon disables the link polling to lessen the admin queue traffic especially since the link event mask usage has been fixed recently. Alex Duyck fixes the i40e and i40evf drivers to correctly update checksums for frames up to 16776960 in length which should be more than large enough for all possible TSO frames in the near future. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	Merge branch 'vxlan-gpe'	David S. Miller	5	-40/+258
	Jiri Benc says: ==================== vxlan: implement Generic Protocol Extension (GPE) v3: just rebased on top of the current net-next, no changes This patchset implements VXLAN-GPE. It follows the same model as the tun/tap driver: depending on the chosen mode, the vxlan interface is created either as ARPHRD_ETHER (non-GPE) or ARPHRD_NONE (GPE). Note that the internal fdb control plane cannot be used together with VXLAN-GPE and attempt to configure it will be rejected by the driver. In fact, COLLECT_METADATA is required to be set for now. This can be relaxed in the future by adding support for static PtP configuration; it will be backward compatible and won't affect existing users. The previous version of the patchset supported two GPE modes, L2 and L3. The L2 mode (now called "ether mode" in the code) was removed from this version. It can be easily added later if there's demand. The L3 mode is now called "raw mode" and supports also encapsulated Ethernet headers (via ETH_P_TEB). The only limitation of not having "ether mode" for GPE is for ip route based encapsulation: with such setup, only IP packets can be encapsulated. Meaning no Ethernet encapsulation. It seems there's not much use for this, though. If it turns out to be useful, we'll add it. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	vxlan: implement GPE	Jiri Benc	3	-17/+222
	Implement VXLAN-GPE. Only COLLECT_METADATA is supported for now (it is possible to support static configuration, too, if there is demand for it). The GPE header parsing has to be moved before iptunnel_pull_header, as we need to know the protocol. v2: Removed what was called "L2 mode" in v1 of the patchset. Only "L3 mode" (now called "raw mode") is added by this patch. This mode does not allow Ethernet header to be encapsulated in VXLAN-GPE when using ip route to specify the encapsulation, IP header is encapsulated instead. The patch does support Ethernet to be encapsulated, though, using ETH_P_TEB in skb->protocol. This will be utilized by other COLLECT_METADATA users (openvswitch in particular). If there is ever demand for Ethernet encapsulation with VXLAN-GPE using ip route, it's easy to add a new flag switching the interface to "Ethernet mode" (called "L2 mode" in v1 of this patchset). For now, leave this out, it seems we don't need it. Disallowed more flag combinations, especially RCO with GPE. Added comment explaining that GBP and GPE cannot be set together. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	ip_tunnel: implement __iptunnel_pull_header	Jiri Benc	2	-6/+13
	Allow calling of iptunnel_pull_header without special casing ETH_P_TEB inner protocol. Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	vxlan: move fdb code to common location in vxlan_xmit	Jiri Benc	1	-11/+11
	Handle VXLAN_F_COLLECT_METADATA before VXLAN_F_PROXY. The latter does not make sense with the former, as it needs populated fdb which does not happen in metadata mode. After this cleanup, the fdb code in vxlan_xmit is moved to a common location and can be later skipped for VXLAN-GPE which does not necessarily carry inner Ethernet header. v2: changed commit description to not reference L3 mode Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	vxlan: move Ethernet initialization to a separate function	Jiri Benc	1	-7/+13
	This will allow to initialize vxlan in ARPHRD_NONE mode based on the passed rtnl attributes. v2: renamed "l2mode" to "ether". Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	igb: Garbled output for "ethtool -m"	Doron Shikmoni	1	-1/+2
	Garbled output for "ethtool -m ethX", in igb-driven NICs with module / plugin EEPROM (i.e. SFP information). Each output data byte appears duplicated. In igb_ethtool.c, igb_get_module_eeprom() is reading the EEPROM via i2c; the eeprom offset for each word that's read via igb_read_phy_reg_i2c() was passed in #words, whereas it needs to be a byte offset. This patches fixes the bug. Signed-off-by: Doron Shikmoni <doron.shikmoni@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	cxgb4/cxgb4vf: Deprecate module parameter dflt_msg_enable	Hariprasad Shenai	2	-2/+4
	Message level can be set through ethtool, so deprecate module parameter which is used to set the same. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-06	igb: allow setting MAC address on i211 using a device tree blob	John Holland	1	-3/+6
	The Intel i211 LOM PCIe Ethernet controllers' iNVM operates as an OTP and has no external EEPROM interface [1]. The following allows the driver to pickup the MAC address from a device tree blob when CONFIG_OF has been enabled. [1] http://www.intel.com/content/www/us/en/embedded/products/networking/i211-ethernet-controller-datasheet.html Signed-off-by: John Holland <jotihojr@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	igb: Add support for bulk Tx cleanup & cleanup boolean logic	Alexander Duyck	1	-5/+7
	This patch enables bulk free in Tx cleanup for igb and cleans up the boolean logic in the polling routines for igb in the hopes of avoiding any mix-ups similar to what occurred with i40e and i40evf. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06	igb: Fix sparse warning about passing __beXX into leXX_to_cpup	Alexander Duyck	1	-4/+6
	We were casting the addr as __beXX and then passing it into le32_to_cpu because the device expects the MAC address to be in network order even though the register set is little endian. Instead of casting it as __beXX we can just cast it as __leXX in order to maintain consistency since the region of memory is already in little endian order as far as we are concerned. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>