linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2012-07-20	ipv4: Cache input routes in fib_info nexthops.	David S. Miller	3	-12/+46
	Caching input routes is slightly simpler than output routes, since we don't need to be concerned with nexthop exceptions. (locally destined, and routed packets, never trigger PMTU events or redirects that will be processed by us). However, we have to elide caching for the DIRECTSRC and non-zero itag cases. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: Cache output routes in fib_info nexthops.	David S. Miller	3	-43/+101
	If we have an output route that lacks nexthop exceptions, we can cache it in the FIB info nexthop. Such routes will have DST_HOST cleared because such routes refer to a family of destinations, rather than just one. The sequence of the handling of exceptions during route lookup is adjusted to make the logic work properly. Before we allocate the route, we lookup the exception. Then we know if we will cache this route or not, and therefore whether DST_HOST should be set on the allocated route. Then we use DST_HOST to key off whether we should store the resulting route, during rt_set_nexthop(), in the FIB nexthop cache. With help from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: Kill routes during PMTU/redirect updates.	David S. Miller	2	-12/+30
	Mark them obsolete so there will be a re-lookup to fetch the FIB nexthop exception info. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	net: Document dst->obsolete better.	David S. Miller	7	-21/+35
	Add a big comment explaining how the field works, and use defines instead of magic constants for the values assigned to it. Suggested by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: Adjust semantics of rt->rt_gateway.	David S. Miller	8	-17/+25
	In order to allow prefixed routes, we have to adjust how rt_gateway is set and interpreted. The new interpretation is: 1) rt_gateway == 0, destination is on-link, nexthop is iph->daddr 2) rt_gateway != 0, destination requires a nexthop gateway Abstract the fetching of the proper nexthop value using a new inline helper, rt_nexthop(), as suggested by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net> Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
2012-07-20	ipv4: Remove 'rt_dst' from 'struct rtable'	David S. Miller	3	-38/+9
	Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: Remove 'rt_mark' from 'struct rtable'	David Miller	4	-10/+3
	Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: Kill 'rt_src' from 'struct rtable'	David Miller	3	-21/+15
	Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: Remove rt_key_{src,dst,tos} from struct rtable.	David Miller	3	-38/+9
	They are always used in contexts where they can be reconstituted, or where the finally resolved rt->rt_{src,dst} is semantically equivalent. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: Kill ip_route_input_noref().	David Miller	6	-24/+12
	The "noref" argument to ip_route_input_common() is now always ignored because we do not cache routes, and in that case we must always grab a reference to the resulting 'dst'. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: Delete routing cache.	David S. Miller	3	-933/+13
	The ipv4 routing cache is non-deterministic, performance wise, and is subject to reasonably easy to launch denial of service attacks. The routing cache works great for well behaved traffic, and the world was a much friendlier place when the tradeoffs that led to the routing cache's design were considered. What it boils down to is that the performance of the routing cache is a product of the traffic patterns seen by a system rather than being a product of the contents of the routing tables. The former of which is controllable by external entitites. Even for "well behaved" legitimate traffic, high volume sites can see hit rates in the routing cache of only ~%10. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	atl1c: fix issue of io access mode for AR8152 v2.1	Cloud Ren	2	-1/+20
	When io access mode is enabled by BOOTROM or BIOS for AR8152 v2.1, the register can't be read/write by memory access mode. Clearing Bit 8 of Register 0x21c could fixed the issue. Signed-off-by: Cloud Ren <cjren@qca.qualcomm.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: xiong <xiong@qca.qualcomm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	tun: fix a crash bug and a memory leak	Mikulas Patocka	3	-0/+7
	This patch fixes a crash tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel -> sock_release -> iput(SOCK_INODE(sock)) introduced by commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d The problem is that this socket is embedded in struct tun_struct, it has no inode, iput is called on invalid inode, which modifies invalid memory and optionally causes a crash. sock_release also decrements sockets_in_use, this causes a bug that "sockets: used" field in /proc/*/net/sockstat keeps on decreasing when creating and closing tun devices. This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs sock_release to not free the inode and not decrement sockets_in_use, fixing both memory corruption and sockets_in_use underflow. It should be backported to 3.3 an 3.4 stabke. Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ipv4: show pmtu in route list	Julian Anastasov	1	-1/+5
	Override the metrics with rt_pmtu Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	team: add multiqueue support	Jiri Pirko	2	-5/+68
	Largely copied from bonding code. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	bond_sysfs: use real_num_tx_queues rather than params.tx_queue	Jiri Pirko	1	-1/+1
	Since now number of tx queues can be specified during bond instance creation and therefore it may differ from params.tx_queues, use rather real_num_tx_queues for boundary check. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	net: rename bond_queue_mapping to slave_dev_queue_mapping	Jiri Pirko	2	-4/+4
	As this is going to be used not only by bonding. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	rtnl: allow to specify number of rx and tx queues on device creation	Jiri Pirko	2	-2/+15
	This patch introduces IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES by which userspace can set number of rx and/or tx queues to be allocated for newly created netdevice. This overrides ops->get_num_[tr]x_queues() Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	rtnl: allow to specify different num for rx and tx queue count	Jiri Pirko	3	-18/+22
	Also cut out unused function parameters and possible err in return value. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	net: honour netif_set_real_num_tx_queues() retval	Jiri Pirko	1	-1/+6
	In netif_copy_real_num_queues() the return value of netif_set_real_num_tx_queues() should be checked. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	tcp: improve latencies of timer triggered events	Eric Dumazet	4	-51/+71
	Modern TCP stack highly depends on tcp_write_timer() having a small latency, but current implementation doesn't exactly meet the expectations. When a timer fires but finds the socket is owned by the user, it rearms itself for an additional delay hoping next run will be more successful. tcp_write_timer() for example uses a 50ms delay for next try, and it defeats many attempts to get predictable TCP behavior in term of latencies. Use the recently introduced tcp_release_cb(), so that the user owning the socket will call various handlers right before socket release. This will permit us to post a followup patch to address the tcp_tso_should_defer() syndrome (some deferred packets have to wait RTO timer to be transmitted, while cwnd should allow us to send them sooner) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	tcp: fix ABC in tcp_slow_start()	Eric Dumazet	1	-2/+3
	When/if sysctl_tcp_abc > 1, we expect to increase cwnd by 2 if the received ACK acknowledges more than 2*MSS bytes, in tcp_slow_start() Problem is this RFC 3465 statement is not correctly coded, as the while () loop increases snd_cwnd one by one. Add a new variable to avoid this off-by one error. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: John Heffner <johnwheffner@gmail.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	tcp: use hash_32() in tcp_metrics	Eric Dumazet	2	-16/+11
	Fix a missing roundup_pow_of_two(), since tcpmhash_entries is not guaranteed to be a power of two. Uses hash_32() instead of custom hash. tcpmhash_entries should be an unsigned int. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	tcp: Return bool instead of int where appropriate	Vijay Subramanian	1	-8/+8
	Applied to a set of static inline functions in tcp_input.c Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ixgbe: use PCI_VENDOR_ID_INTEL	Jon Mason	3	-9/+6
	Use PCI_VENDOR_ID_INTEL from pci_ids.h instead of creating its own vendor ID #define. Signed-off-by: Jon Mason <jdmason@kudzu.us> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Bruce Allan <bruce.w.allan@intel.com> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com> Cc: Don Skidmore <donald.c.skidmore@intel.com> Cc: Greg Rose <gregory.v.rose@intel.com> Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Cc: Alex Duyck <alexander.h.duyck@intel.com> Cc: John Ronciak <john.ronciak@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	ixgb: use PCI_VENDOR_ID_INTEL	Jon Mason	3	-12/+8
	Use PCI_VENDOR_ID_INTEL from pci_ids.h instead of creating its own vendor ID #define. Signed-off-by: Jon Mason <jdmason@kudzu.us> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Bruce Allan <bruce.w.allan@intel.com> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com> Cc: Don Skidmore <donald.c.skidmore@intel.com> Cc: Greg Rose <gregory.v.rose@intel.com> Cc: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Cc: Alex Duyck <alexander.h.duyck@intel.com> Cc: John Ronciak <john.ronciak@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	myri10ge: update MAINTAINERS	Jon Mason	1	-1/+0
	Remove myself from myri10ge MAINTAINERS list Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	can: janz-ican3: add support for one shot mode	Ira W. Snyder	1	-1/+7
	The Janz VMOD-ICAN3 hardware has support for one shot packet transmission. This means that a packet will be attempted to be sent once, with no automatic retries. The SocketCAN core has a controller-wide setting for this mode: CAN_CTRLMODE_ONE_SHOT. The Janz VMOD-ICAN3 hardware supports this flag on a per-packet level, but the SocketCAN core does not. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20	can: janz-ican3: avoid firmware lockup caused by infinite bus error quota	Ira W. Snyder	1	-1/+11
	If the bus error quota is set to infinite and the host CPU cannot keep up, the Janz VMOD-ICAN3 firmware will stop responding to control messages until the controller is reset. The firmware will automatically stop sending bus error messages when the quota is reached, and will only resume sending bus error messages when the quota is re-set to a positive value. This limitation is worked around by setting the bus error quota to one message, and then re-setting the quota to one message every time a bus error message is received. By doing this, the firmware never stops responding to control messages. The CAN bus can be reset without a hard-reset of the controller card. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20	can: janz-ican3: fix support for CAN_RAW_RECV_OWN_MSGS	Ira W. Snyder	1	-46/+157
	The Janz VMOD-ICAN3 firmware does not support any sort of TX-done notification or interrupt. The driver previously used the hardware loopback to attempt to work around this deficiency, but this caused all sockets to receive all messages, even if CAN_RAW_RECV_OWN_MSGS is off. Using the new function ican3_cmp_echo_skb(), we can drop the loopback messages and return the original skbs. This fixes the issues with CAN_RAW_RECV_OWN_MSGS. A private skb queue is used to store the echo skbs. This avoids the need for any index management. Due to a lack of TX-error interrupts, bus errors are permanently enabled, and are used as a TX-error notification. This is used to drop an echo skb when transmission fails. Bus error packets are not generated if the user has not enabled bus error reporting. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20	can: janz-ican3: fix error and byte counters	Ira W. Snyder	1	-5/+6
	The error and byte counter statistics were being incremented incorrectly. For example, a TX error would be counted both in tx_errors and rx_errors. This corrects the problem so that tx_errors and rx_errors are only incremented for errors caused by packets sent to the bus. Error packets generated by the driver are not counted. The byte counters are only increased for packets which are actually transmitted or received from the bus. Error packets generated by the driver are not counted. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20	can: janz-ican3: cleanup of ican3_to_can_frame and can_frame_to_ican3	Marc Kleine-Budde	1	-5/+5
	This patch cleans up the ICAN3 to Linux CAN frame and vice versa conversion functions: - RX: Use get_can_dlc() to limit the dlc value. - RX+TX: Don't copy the whole frame, only copy the amount of bytes specified in cf->can_dlc. Acked-by: Ira W. Snyder <iws@ovro.caltech.edu> Tested-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20	can: janz-ican3: drop invalid skbs	Ira W. Snyder	1	-0/+3
	The commit which added the janz-ican3 driver and commit 3ccd4c61 "can: Unify droping of invalid tx skbs and netdev stats" were committed into mainline Linux during the same merge window. Therefore, the addition of this code to the janz-ican3 driver was forgotten. This patch adds the expected code. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20	can: janz-ican3: remove dead code	Ira W. Snyder	1	-2/+0
	The code which used this variable was removed during review, before the driver was added to mainline Linux. It is now dead code, and can be removed. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20	can: flexcan: add 2nd clock to support imx53 and newer	Steffen Trumtrar	1	-18/+27
	This patch adds support for a second clock to the flexcan driver. On modern freescale ARM cores like the imx53 and imx6q two clocks ("ipg" and "per") must be enabled in order to access the CAN core. In the original driver, the clock was requested without specifying the connection id, further all mainline ARM archs with flexcan support (imx28, imx25, imx35) register their flexcan clock without a connection id, too. This patch first renames the existing clk variable to clk_ipg and converts it to devm for easier error handling. The connection id "ipg" is added to the devm_clk_get() call. Then a second clock "per" is requested. As all archs don't specify a connection id, both clk_get return the same clock. This ensures compatibility to existing flexcan support and adds support for imx53 at the same time. After this patch hits mainline, the archs may give their existing flexcan clock the "ipg" connection id and implement a dummy "per" clock. This patch has been tested on imx28 (unmodified clk tree) and on imx53 with a seperate "ipg" and "per" clock. Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Steffen Trumtrar <s.trumtrar@pengutronix.de> Acked-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-20	can: mark bittiming_const pointer in struct can_priv as const	Marc Kleine-Budde	15	-15/+15
	This patch marks the bittiming_const pointer as in the struct can_pric as "const". This allows us to mark the struct can_bittiming_const in the CAN drivers as "const", too. Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-07-19	ixgbe: Enable FCoE FSO and CRC offloads based on CAPABLE instead of ENABLED flag	Alexander Duyck	2	-8/+13
	Instead of only setting the FCOE segmentation offload and CRC offload flags if we enable FCoE, we could just set them always since there are no modifications needed to the hardware or adapter FCoE structure in order to use these features. The advantage to this is that if FCoE enablement fails, for example because SR-IOV was enabled on 82599, we will still have use of the FCoE segmentation offload and Tx/Rx CRC offloads which should still help to improve the FCoE performance. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ixgbe: Only enable anti-spoof on VF pools	Alexander Duyck	1	-9/+11
	The current logic is enabling anti-spoof on all pools and then clearing anti-spoof on just the first PF pool. The correct approach is to only set anti-spoof on the VF pools and to leave all of the PF pools unchecked. This allows for items such as FCoE to use adjacent pools within the PF for transmit and receive queues without the traffic being blocked by this security feature. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ixgbe: Correctly set SAN MAC RAR pool to default pool of PF	Alexander Duyck	6	-3/+46
	This change corrects an issue in which an FCoE enabled adapter was always setting the FCoE SAN MAC MPSAR register to 0x1. This results in the first VF being assigned the SAN MAC address in the case of SR-IOV and as such is incorrect. To resolve this I am adding a new function that will update the SAN MAC pool address after reset. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ixgbe: Make FCoE allocation and configuration closer to how rings work	Alexander Duyck	4	-122/+154
	This patch changes the behavior of the FCoE configuration so that it is much closer to how the main body of the ixgbe driver works for ring allocation. The first piece is the ixgbe_fcoe_ddp_enable/disable calls. These allocate the percpu values and if successful set the fcoe_ddp_xid value indicating that we can support DDP. The next piece is the ixgbe_setup/free_ddp_resources calls. These are called on open/close and will allocate and free the DMA pools. Finally ixgbe_configure_fcoe is now just register configuration. It can go through and enable the registers for the FCoE redirection offload, and FIP configuration without any interference from the DDP pool allocation. The net result of all this is two fold. First it adds a certain amount of exception handling. So for example if ixgbe_setup_fcoe_resources fails we will actually generate an error in open and refuse to bring up the interface. Secondly it provides a much more graceful failure case than the previous model which would skip setting up the registers for FCoE on failure to allocate DDP resources leaving no Rx functionality enabled instead of just disabling DDP. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ixgbe: Merge all FCoE percpu values into a single structure	Alexander Duyck	3	-86/+86
	This change merges the 2 statistics values for noddp and noddp_ext_buff and the dma_pool into a single structure that can be allocated per CPU. The advantages to this are several fold. First we only need to do one alloc_percpu call now instead of 3, so that means less overhead for handling memory allocation failures. Secondly in the case of ixgbe_fcoe_ddp_setup we only need to call get_cpu once which makes things a bit cleaner since we can drop a put_cpu() from the exception path. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ixgbe: Cleanup configuration of FCoE registers	Alexander Duyck	2	-27/+32
	This change makes it so we always use the FCoE redirection table. We just set all 8 entries to the same value in the case of only having one queue for FCoE. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ixgbe: Drop references to deprecated pci_ DMA api and instead use dma_ API	Alexander Duyck	2	-17/+17
	The networking side of the code had already been updated to use dma_ calls instead of the old pci_ calls. However it looks like the FCoE code was never updated. This change goes through and moves everything from the pci APIs to the dma APIs. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ixgbe: Fix memory leak when SR-IOV VFs are direct assigned	Alexander Duyck	1	-4/+11
	The VF driver had a memory leak that would occur if VFs were assigned to a guest. The amount of leak would vary with the number of VFs but could max out at about 14K per PF. To reproduce the leak all you would need to do is enable all the VFs on the first PF. Then start a loop of loading and unloading the driver with max_vfs=63 for the first port. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ixgbe: Use VMDq offset to indicate the default pool	Alexander Duyck	3	-19/+21
	This change makes it so that we can use the VMDq ring feature offset value to determine the default pool instead of using num_vfs. The reason for this change is to avoid issues should we fail to allocate vfinfo but have pre-existing VFs. What should happen in this case is that num_vfs will go to 0, but the VMDq offset will contain the location of the first PF pool. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <Sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19	ipv4: Fix again the time difference calculation	Julian Anastasov	1	-1/+1
	Fix again the diff value in rt_bind_exception after collision of two latest patches, my original commit actually fixed the same problem. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19	net-tcp: Fast Open client - cookie-less mode	Yuchung Cheng	5	-3/+15
	In trusted networks, e.g., intranet, data-center, the client does not need to use Fast Open cookie to mitigate DoS attacks. In cookie-less mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless of cookie availability. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19	net-tcp: Fast Open client - detecting SYN-data drops	Yuchung Cheng	4	-8/+37
	On paths with firewalls dropping SYN with data or experimental TCP options, Fast Open connections will have experience SYN timeout and bad performance. The solution is to track such incidents in the cookie cache and disables Fast Open temporarily. Since only the original SYN includes data and/or Fast Open option, the SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect such drops. If a path has recurring Fast Open SYN drops, Fast Open is disabled for 2^(recurring_losses) minutes starting from four minutes up to roughly one and half day. sendmsg with MSG_FASTOPEN flag will succeed but it behaves as connect() then write(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19	net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)	Yuchung Cheng	7	-12/+92
	sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2) and write(2). The application should replace connect() with it to send data in the opening SYN packet. For blocking socket, sendmsg() blocks until all the data are buffered locally and the handshake is completed like connect() call. It returns similar errno like connect() if the TCP handshake fails. For non-blocking socket, it returns the number of bytes queued (and transmitted in the SYN-data packet) if cookie is available. If cookie is not available, it transmits a data-less SYN packet with Fast Open cookie request option and returns -EINPROGRESS like connect(). Using MSG_FASTOPEN on connecting or connected socket will result in simlar errno like repeating connect() calls. Therefore the application should only use this flag on new sockets. The buffer size of sendmsg() is independent of the MSS of the connection. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19	net-tcp: Fast Open client - receiving SYN-ACK	Yuchung Cheng	1	-5/+35
	On receiving the SYN-ACK after SYN-data, the client needs to a) update the cached MSS and cookie (if included in SYN-ACK) b) retransmit the data not yet acknowledged by the SYN-ACK in the final ACK of the handshake. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>