linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2012-10-01	vxlan: virtual extensible lan	stephen hemminger	1	-0/+47
	This is an implementation of Virtual eXtensible Local Area Network as described in draft RFC: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02 The driver integrates a Virtual Tunnel Endpoint (VTEP) functionality that learns MAC to IP address mapping. This implementation has not been tested only against the Linux userspace implementation using TAP, not against other vendor's equipment. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31	tcp: TCP Fast Open Server - header & support functions	Jerry Chu	1	-7/+22
	This patch adds all the necessary data structure and support functions to implement TFO server side. It also documents a number of flags for the sysctl_tcp_fastopen knob, and adds a few Linux extension MIBs. In addition, it includes the following: 1. a new TCP_FASTOPEN socket option an application must call to supply a max backlog allowed in order to enable TFO on its listener. 2. A number of key data structures: "fastopen_rsk" in tcp_sock - for a big socket to access its request_sock for retransmission and ack processing purpose. It is non-NULL iff 3WHS not completed. "fastopenq" in request_sock_queue - points to a per Fast Open listener data structure "fastopen_queue" to keep track of qlen (# of outstanding Fast Open requests) and max_qlen, among other things. "listener" in tcp_request_sock - to point to the original listener for book-keeping purpose, i.e., to maintain qlen against max_qlen as part of defense against IP spoofing attack. 3. various data structure and functions, many in tcp_fastopen.c, to support server side Fast Open cookie operations, including /proc/sys/net/ipv4/tcp_fastopen_key to allow manual rekeying. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31	net:stmmac: Remove bus_id from mdio platform data.	Srinivas Kandagatla	1	-5/+0
	This patch removes bus_id from mdio platform data, The reason to remove bus_id is, stmmac mdio bus_id is always same as stmmac bus-id, so there is no point in passing this in different variable. Also stmmac ethernet driver connects to phy with bus_id passed its platform data. So, having single bus-id is much simpler. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31	tcp: Increase timeout for SYN segments	Alex Bergmann	1	-2/+6
	Commit 9ad7c049 ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side") changed the initRTO from 3secs to 1sec in accordance to RFC6298 (former RFC2988bis). This reduced the time till the last SYN retransmission packet gets sent from 93secs to 31secs. RFC1122 is stating that the retransmission should be done for at least 3 minutes, but this seems to be quite high. "However, the values of R1 and R2 may be different for SYN and data segments. In particular, R2 for a SYN segment MUST be set large enough to provide retransmission of the segment for at least 3 minutes. The application can close the connection (i.e., give up on the open attempt) sooner, of course." This patch increases the value of TCP_SYN_RETRIES to the value of 6, providing a retransmission window of 63secs. The comments for SYN and SYNACK retries have also been updated to describe the current settings. The same goes for the documentation file "Documentation/networking/ip-sysctl.txt". Signed-off-by: Alexander Bergmann <alex@linlab.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23	batman-adv: Add the backbone gateway list to debugfs	Simon Wunderlich	1	-3/+4
	This is especially useful if there are no claims yet, but we still want to know which gateways are using bridge loop avoidance in the network. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-22	bonding: support for IPv6 transmit hashing	John Eaglesham	1	-5/+25
	Currently the "bonding" driver does not support load balancing outgoing traffic in LACP mode for IPv6 traffic. IPv4 (and TCP or UDP over IPv4) are currently supported; this patch adds transmit hashing for IPv6 (and TCP or UDP over IPv6), bringing IPv6 up to par with IPv4 support in the bonding driver. In addition, bounds checking has been added to all transmit hashing functions. The algorithm chosen (xor'ing the bottom three quads of the source and destination addresses together, then xor'ing each byte of that result into the bottom byte, finally xor'ing with the last bytes of the MAC addresses) was selected after testing almost 400,000 unique IPv6 addresses harvested from server logs. This algorithm had the most even distribution for both big- and little-endian architectures while still using few instructions. Its behavior also attempts to closely match that of the IPv4 algorithm. The IPv6 flow label was intentionally not included in the hash as it appears to be unset in the vast majority of IPv6 traffic sampled, and the current algorithm not using the flow label already offers a very even distribution. Fragmented IPv6 packets are handled the same way as fragmented IPv4 packets, ie, they are not balanced based on layer 4 information. Additionally, IPv6 packets with intermediate headers are not balanced based on layer 4 information. In practice these intermediate headers are not common and this should not cause any problems, and the alternative (a packet-parsing loop and look-up table) seemed slow and complicated for little gain. Tested-by: John Eaglesham <linux@8192.net> Signed-off-by: John Eaglesham <linux@8192.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14	netconsole.txt: revision of examples for the receiver of kernel messages	Dirk Gouders	1	-2/+17
	There are at least 4 implementations of netcat with the BSD-based being the only one that has to be used without the -p switch to specify the listening port. Jan Engelhardt suggested to add an example for socat(1). Signed-off-by: Dirk Gouders <gouders@et.bocholt.fh-gelsenkirchen.de> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30	ipv4: remove rt_cache_rebuild_count	Eric Dumazet	1	-6/+0
	After IP route cache removal, rt_cache_rebuild_count is no longer used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22	net-next: minor cleanups for bonding documentation	Rick Jones	1	-3/+3
	The section titled "Configuring Bonding for Maximum Throughput" is actually section twelve not thirteen, and there are a couple of words spelled incorrectly. Signed-off-by: Rick Jones <rick.jones2@hp.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-22	sctp: Implement quick failover draft from tsvwg	Neil Horman	1	-0/+14
	I've seen several attempts recently made to do quick failover of sctp transports by reducing various retransmit timers and counters. While its possible to implement a faster failover on multihomed sctp associations, its not particularly robust, in that it can lead to unneeded retransmits, as well as false connection failures due to intermittent latency on a network. Instead, lets implement the new ietf quick failover draft found here: http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05 This will let the sctp stack identify transports that have had a small number of errors, and avoid using them quickly until their reliability can be re-established. I've tested this out on two virt guests connected via multiple isolated virt networks and believe its in compliance with the above draft and works well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: "David S. Miller" <davem@davemloft.net> CC: linux-sctp@vger.kernel.org CC: joe@perches.com Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch	David S. Miller	1	-1/+1
	Jesse Gross says: ==================== A few bug fixes and small enhancements for net-next/3.6. ... Ansis Atteka (1): openvswitch: Do not send notification if ovs_vport_set_options() failed Ben Pfaff (1): openvswitch: Check gso_type for correct sk_buff in queue_gso_packets(). Jesse Gross (2): openvswitch: Enable retrieval of TCP flags from IPv6 traffic. openvswitch: Reset upper layer protocol info on internal devices. Leo Alterman (1): openvswitch: Fix typo in documentation. Pravin B Shelar (1): openvswitch: Check currect return value from skb_gso_segment() Raju Subramanian (1): openvswitch: Replace Nicira Networks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20	openvswitch: Fix typo in documentation.	Leo Alterman	1	-1/+1
	Signed-off-by: Leo Alterman <lalterman@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-07-19	net-tcp: Fast Open client - cookie-less mode	Yuchung Cheng	1	-0/+2
	In trusted networks, e.g., intranet, data-center, the client does not need to use Fast Open cookie to mitigate DoS attacks. In cookie-less mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless of cookie availability. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19	net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)	Yuchung Cheng	1	-0/+11
	sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2) and write(2). The application should replace connect() with it to send data in the opening SYN packet. For blocking socket, sendmsg() blocks until all the data are buffered locally and the handshake is completed like connect() call. It returns similar errno like connect() if the TCP handshake fails. For non-blocking socket, it returns the number of bytes queued (and transmitted in the SYN-data packet) if cookie is available. If cookie is not available, it transmits a data-less SYN packet with Fast Open cookie request option and returns -EINPROGRESS like connect(). Using MSG_FASTOPEN on connecting or connected socket will result in simlar errno like repeating connect() calls. Therefore the application should only use this flag on new sockets. The buffer size of sendmsg() is independent of the MSS of the connection. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19	bridge: update documentation references	stephen hemminger	1	-3/+10
	Update the references to bridge utilities and web pages to current locations Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17	tcp: implement RFC 5961 3.2	Eric Dumazet	1	-0/+5
	Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s \| grep TCPChallengeACK) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11	tcp: TCP Small Queues	Eric Dumazet	1	-0/+14
	This introduce TSQ (TCP Small Queues) TSQ goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. sk->sk_wmem_alloc not allowed to grow above a given limit, allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given time. TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use. As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2 : It can help to reduce latencies of high prio packets, having smaller TSO packets. This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan() [2] is called for the already queued skbs. Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) < 8ms on 100Mbit (instead of 132 ms) I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and both side socket autotuning no longer use 4 Mbytes. As skb destructor cannot restart xmit itself ( as qdisc lock might be taken at this point ), we delegate the work to a tasklet. We use one tasklest per cpu for performance reasons. If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments. [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable [2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Dave Taht <dave.taht@bufferbloat.net> Cc: Tom Herbert <therbert@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10	vxge/s2io: remove dead URLs	Jon Mason	2	-19/+2
	URLs to neterion.com and s2io.com no longer resolve. Remove all references to these URLs in the driver source and documentation. Signed-off-by: Jon Mason <jdmason@kudzu.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-01	stmmac: update the driver Documentation and add EEE	Giuseppe CAVALLARO	1	-6/+30
	This patch updates the stmmac's documentation adding some missing files in the section used to describe the internal driver's structure. Also the patch adds a new section to describe the EEE support. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-30	ipv4: Clarify in docs that accept_local requires rp_filter.	David S. Miller	1	-3/+8
	Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25	Documentation/networking/caif: Update documentation	Sjur Brændeland	1	-64/+27
	Update drawing and remove description of old features. Add HSI and USB link layers to the drawing. Reported-by: Joerg Reisenweber <joerg.reisenweber@stericssion.com> Signed-off-by: Sjur Brændeland <sjur.brandeland@stericssion.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-19	canfd: update documentation according to CAN FD extensions	Oliver Hartkopp	1	-8/+146
	Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-06-18	batman-adv: Add get_ethtool_stats() support	Martin Hundebøll	1	-0/+5
	Added additional counters in a bat_stats structure, which are exported through the ethtool api. The counters are specific to batman-adv and includes: forwarded packets and bytes management packets and bytes (aggregated OGMs at this point) translation table packets New counters are added by extending "enum bat_counters" in types.h and adding corresponding descriptive string(s) to bat_counters_strings in soft-iface.c. Counters are increased by calling batadv_add_counter() and incremented by one by calling batadv_inc_counter(). Signed-off-by: Martin Hundebøll <martin@hundeboll.net> Signed-off-by: Sven Eckelmann <sven@narfation.org>
2012-06-12	ipv4: Add interface option to enable routing of 127.0.0.0/8	Thomas Graf	1	-0/+5
	Routing of 127/8 is tradtionally forbidden, we consider packets from that address block martian when routing and do not process corresponding ARP requests. This is a sane default but renders a huge address space practically unuseable. The RFC states that no address within the 127/8 block should ever appear on any network anywhere but it does not forbid the use of such addresses outside of the loopback device in particular. For example to address a pool of virtual guests behind a load balancer. This patch adds a new interface option 'route_localnet' enabling routing of the 127/8 address block and processing of ARP requests on a specific interface. Note that for the feature to work, the default local route covering 127/8 dev lo needs to be removed. Example: $ sysctl -w net.ipv4.conf.eth0.route_localnet=1 $ ip route del 127.0.0.0/8 dev lo table local $ ip addr add 127.1.0.1/16 dev eth0 $ ip route flush cache V2: Fix invalid check to auto flush cache (thanks davem) Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-06	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	1	-19/+25

2012-06-06	Merge branch 'master' of git://gitorious.org/linux-can/linux-can-next	David S. Miller	1	-16/+16

2012-06-06	stmmac: update driver's doc	Giuseppe CAVALLARO	1	-19/+25
	Fixed the driver's documentation that was obsolete and didn't report new platform fields (recently added). Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-23	can: update documentation wording error frames -> error messages	Oliver Hartkopp	1	-16/+16
	As Heinz-Juergen Oertel pointed out 'CAN error frames' are a already defined term for the CAN protocol violation indication on the wire. To avoid confusion with the error messages created by CAN drivers available via CAN RAW sockets update the documentation and change the naming from 'error frames' to 'error messages' or 'error message frames'. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2012-05-22	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial	Linus Torvalds	1	-1/+1
	Pull trivial updates from Jiri Kosina: "As usual, it's mostly typo fixes, redundant code elimination and some documentation updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits) edac, mips: don't change code that has been removed in edac/mips tree xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer lib: Change mail address of Oskar Schirmer net: Change mail address of Oskar Schirmer arm/m68k: Change mail address of Sebastian Hess i2c: Change mail address of Oskar Schirmer net: Fix tcp_build_and_update_options comment in struct tcp_sock atomic64_32.h: fix parameter naming mismatch Kconfig: replace "--- help ---" with "---help---" c2port: fix bogus Kconfig "default no" edac: Fix spelling errors. qla1280: Remove redundant NULL check before release_firmware() call remoteproc: remove redundant NULL check before release_firmware() qla2xxx: Remove redundant NULL check before release_firmware() call. aic94xx: Get rid of redundant NULL check before release_firmware() call tehuti: delete redundant NULL check before release_firmware() qlogic: get rid of a redundant test for NULL before call to release_firmware() bna: remove redundant NULL test before release_firmware() tg3: remove redundant NULL test before release_firmware() call typhoon: get rid of redundant conditional before all to release_firmware() ...
2012-05-17	drivers/net: delete all code/drivers depending on CONFIG_MCA	Paul Gortmaker	2	-5/+2
	The support for CONFIG_MCA is being removed, since the 20 year old hardware simply isn't capable of meeting today's software demands on CPU and memory resources. This commit removes any MCA specific net drivers, and removes any MCA specific probe/support code from drivers that were doing a dual ISA/MCA role. Cc: "David S. Miller" <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-16	Documentation/networking/ieee802154: update MAC chapter	alex.bluesman.smirnov@gmail.com	1	-16/+59
	Update the documentation according to latest changes. Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15	tokenring: delete all remaining driver support	Paul Gortmaker	5	-358/+0
	This represents the mass deletion of the of the tokenring support. It gets rid of: - the net/tr.c which the drivers depended on - the drivers/net component - the Kbuild infrastructure around it - any tokenring related CONFIG_ settings in any defconfigs - the tokenring headers in the include/linux dir - the firmware associated with the tokenring drivers. - any associated token ring documentation. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-05-14	batman-adv: README cleanups	Sven Eckelmann	1	-6/+5
	- Add routing_algo - Remove date from README: The date has to be updated when a patch touches the README. Therefore, nearly every feature will modify this date. It can happens quite often that not only one feature is currently in development or waiting on the mailinglist. This creates merge conflicts when applying a patchset. The date itself doesn't provide any additional information when this file is only available in a release tarball or as part of a SCM repository. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-05-08	netfilter: bridge: optionally set indev to vlan	Pablo Neira Ayuso	1	-2/+11
	if net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge netfilter removes the vlan header temporarily and then feeds the packet to ip(6)tables. When the new "bridge-nf-pass-vlan-input-device" sysctl is on (default off), then bridge netfilter will also set the in-interface to the vlan interface; if such an interface exists. This is needed to make iptables REDIRECT target work with "vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to match the vlan device name. Also update Documentation with current brnf default settings. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-07	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	1	-2/+2
	Conflicts: drivers/net/ethernet/intel/e1000e/param.c drivers/net/wireless/iwlwifi/iwl-agn-rx.c drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c drivers/net/wireless/iwlwifi/iwl-trans.h Resolved the iwlwifi conflict with mainline using 3-way diff posted by John Linville and Stephen Rothwell. In 'net' we added a bug fix to make iwlwifi report a more accurate skb->truesize but this conflicted with RX path changes that happened meanwhile in net-next. In e1000e a conflict arose in the validation code for settings of adapter->itr. 'net-next' had more sophisticated logic so that logic was used. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02	tcp: change tcp_adv_win_scale and tcp_rmem[2]	Eric Dumazet	1	-2/+2
	tcp_adv_win_scale default value is 2, meaning we expect a good citizen skb to have skb->len / skb->truesize ratio of 75% (3/4) In 2.6 kernels we (mis)accounted for typical MSS=1460 frame : 1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392. So these skbs were considered as not bloated. With recent truesize fixes, a typical MSS=1460 frame truesize is now the more precise : 2048 + 256 = 2304. But 2304 * 3/4 = 1728. So these skb are not good citizen anymore, because 1460 < 1728 (GRO can escape this problem because it build skbs with a too low truesize.) This also means tcp advertises a too optimistic window for a given allocated rcvspace : When receiving frames, sk_rmem_alloc can hit sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often, especially when application is slow to drain its receive queue or in case of losses (netperf is fast, scp is slow). This is a major latency source. We should adjust the len/truesize ratio to 50% instead of 75% This patch : 1) changes tcp_adv_win_scale default to 1 instead of 2 2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account better truesize tracking and to allow autotuning tcp receive window to reach same value than before. Note that same amount of kernel memory is consumed compared to 2.6 kernels. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02	tcp: early retransmit	Yuchung Cheng	1	-0/+14
	This patch implements RFC 5827 early retransmit (ER) for TCP. It reduces DUPACK threshold (dupthresh) if outstanding packets are less than 4 to recover losses by fast recovery instead of timeout. While the algorithm is simple, small but frequent network reordering makes this feature dangerous: the connection repeatedly enter false recovery and degrade performance. Therefore we implement a mitigation suggested in the appendix of the RFC that delays entering fast recovery by a small interval, i.e., RTT/4. Currently ER is conservative and is disabled for the rest of the connection after the first reordering event. A large scale web server experiment on the performance impact of ER is summarized in section 6 of the paper "Proportional Rate Reduction for TCP”, IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf Note that Linux has a similar feature called THIN_DUPACK. The differences are THIN_DUPACK do not mitigate reorderings and is only used after slow start. Currently ER is disabled if THIN_DUPACK is enabled. I would be happy to merge THIN_DUPACK feature with ER if people think it's a good idea. ER is enabled by sysctl_tcp_early_retrans: 0: Disables ER 1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4. 2: (Default) reduce dupthresh like mode 1. In addition, delay entering fast recovery by RTT/4. Note: mode 2 is implemented in the third part of this patch series. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27	net: doc: merge /proc/sys/net/core/* documents into one place	Shan Wei	1	-4/+1
	All parameter descriptions in /proc/sys/net/core/* now is separated two places. So, merge them into Documentation/sysctl/net.txt. Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16	Documentation: Fix typo in multiple files in Documentation	Masanari Iida	1	-1/+1
	Correct multiple spelling typo in Documentation. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Rob Landley <rob@landley.net> Reported-by: Anders Larsen <al@alarsen.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-04-12	Merge branch 'master' into for-davem	John W. Linville	1	-7/+3
	Conflicts: drivers/net/wireless/iwlwifi/iwl-testmode.c net/wireless/nl80211.c
2012-04-12	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless	John W. Linville	11	-84/+68

2012-04-11	batman-adv: export claim tables through debugfs	Simon Wunderlich	1	-3/+2
	Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-11	batman-adv: add basic bridge loop avoidance code	Simon Wunderlich	1	-6/+8
	This second version of the bridge loop avoidance for batman-adv avoids loops between the mesh and a backbone (usually a LAN). By connecting multiple batman-adv mesh nodes to the same ethernet segment a loop can be created when the soft-interface is bridged into that ethernet segment. A simple visualization of the loop involving the most common case - a LAN as ethernet segment: node1 <-- LAN --> node2 \| \| wifi <-- mesh --> wifi Packets from the LAN (e.g. ARP broadcasts) will circle forever from node1 or node2 over the mesh back into the LAN. With this patch, batman recognizes backbone gateways, nodes which are part of the mesh and backbone/LAN at the same time. Each backbone gateway "claims" clients from within the mesh to handle them exclusively. By restricting that only responsible backbone gateways may handle their claimed clients traffic, loops are effectively avoided. Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-04-10	mac80211: set HT channel before association	Johannes Berg	1	-7/+3
	Changing the channel type during operation is confusing to some drivers and will be hard to handle in multi-channel scenarios. Instead of changing the channel, set it to the right HT channel before authenticating/associating and don't change it -- just update the 20/40 MHz restrictions in rate control as needed when changed by the AP. This also fixes a problem that Paul missed in his fix for the "regulatory makes us deaf" issue -- when we couldn't use 40 MHz we still associated saying we were using 40 MHz, which could in similarly broken APs make us never even connect successfully. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller	11	-84/+68

2012-04-06	doc, net: Update ndo_start_xmit return type and values	Ben Hutchings	1	-10/+12
	Commit dc1f8bf68b311b1537cb65893430b6796118498a ('netdev: change transmit to limited range type') changed the required return type and 9a1654ba0b50402a6bd03c7b0fe9b0200a5ea7b1 ('net: Optimize hard_start_xmit() return checking') changed the valid numerical return values. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06	doc, net: Remove instruction to set net_device::trans_start	Ben Hutchings	1	-5/+2
	Commit 08baf561083bc27a953aa087dd8a664bb2b88e8e ('net: txq_trans_update() helper') made it unnecessary for most drivers to set net_device::trans_start (or netdev_queue::trans_start). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06	doc, net: Update netdev operation names	Ben Hutchings	2	-14/+14
	Commits d314774cf2cd5dfeb39a00d37deee65d4c627927 ('netdev: network device operations infrastructure') and 008298231abbeb91bc7be9e8b078607b816d1a4a ('netdev: add more functions to netdevice ops') moved and renamed net device operation pointers. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06	doc, net: Update documentation of synchronisation for TX multiqueue	Ben Hutchings	1	-3/+3
	Commits e308a5d806c852f56590ffdd3834d0df0cbed8d7 ('netdev: Add netdev->addr_list_lock protection.') and e8a0464cc950972824e2e128028ae3db666ec1ed ('netdev: Allocate multiple queues for TX.') introduced more fine-grained locks. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-06	doc, net: Remove obsolete reference to dev->poll	Ben Hutchings	1	-2/+1
	Commit bea3348eef27e6044b6161fd04c3152215f96411 ('[NET]: Make NAPI polling independent of struct net_device objects.') removed the automatic disabling of NAPI polling by dev_close(), and drivers must now do this themselves. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>