aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking/snmp_counter.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking/snmp_counter.rst')
-rw-r--r--Documentation/networking/snmp_counter.rst415
1 files changed, 387 insertions, 28 deletions
diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst
index b0dfdaaca512..52b026be028f 100644
--- a/Documentation/networking/snmp_counter.rst
+++ b/Documentation/networking/snmp_counter.rst
@@ -1,16 +1,17 @@
-===========
+============
SNMP counter
-===========
+============
This document explains the meaning of SNMP counters.
General IPv4 counters
-====================
+=====================
All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or
ARP packets.
* IpInReceives
+
Defined in `RFC1213 ipInReceives`_
.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26
@@ -23,6 +24,7 @@ and so on). It indicates the number of aggregated segments after
GRO/LRO.
* IpInDelivers
+
Defined in `RFC1213 ipInDelivers`_
.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28
@@ -33,6 +35,7 @@ supported protocols will be delivered, if someone listens on the raw
socket, all valid IP packets will be delivered.
* IpOutRequests
+
Defined in `RFC1213 ipOutRequests`_
.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28
@@ -42,6 +45,7 @@ multicast packets, and would always be updated together with
IpExtOutOctets.
* IpExtInOctets and IpExtOutOctets
+
They are Linux kernel extensions, no RFC definitions. Please note,
RFC1213 indeed defines ifInOctets and ifOutOctets, but they
are different things. The ifInOctets and ifOutOctets include the MAC
@@ -49,6 +53,7 @@ layer header size but IpExtInOctets and IpExtOutOctets don't, they
only include the IP layer header and the IP layer data.
* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts
+
They indicate the number of four kinds of ECN IP packets, please refer
`Explicit Congestion Notification`_ for more details.
@@ -60,6 +65,7 @@ for the same packet, you might find that IpInReceives count 1, but
IpExtInNoECTPkts counts 2 or more.
* IpInHdrErrors
+
Defined in `RFC1213 ipInHdrErrors`_. It indicates the packet is
dropped due to the IP header error. It might happen in both IP input
and IP forward paths.
@@ -67,6 +73,7 @@ and IP forward paths.
.. _RFC1213 ipInHdrErrors: https://tools.ietf.org/html/rfc1213#page-27
* IpInAddrErrors
+
Defined in `RFC1213 ipInAddrErrors`_. It will be increased in two
scenarios: (1) The IP address is invalid. (2) The destination IP
address is not a local address and IP forwarding is not enabled
@@ -74,6 +81,7 @@ address is not a local address and IP forwarding is not enabled
.. _RFC1213 ipInAddrErrors: https://tools.ietf.org/html/rfc1213#page-27
* IpExtInNoRoutes
+
This counter means the packet is dropped when the IP stack receives a
packet and can't find a route for it from the route table. It might
happen when IP forwarding is enabled and the destination IP address is
@@ -81,6 +89,7 @@ not a local address and there is no route for the destination IP
address.
* IpInUnknownProtos
+
Defined in `RFC1213 ipInUnknownProtos`_. It will be increased if the
layer 4 protocol is unsupported by kernel. If an application is using
raw socket, kernel will always deliver the packet to the raw socket
@@ -89,10 +98,12 @@ and this counter won't be increased.
.. _RFC1213 ipInUnknownProtos: https://tools.ietf.org/html/rfc1213#page-27
* IpExtInTruncatedPkts
+
For IPv4 packet, it means the actual data size is smaller than the
"Total Length" field in the IPv4 header.
* IpInDiscards
+
Defined in `RFC1213 ipInDiscards`_. It indicates the packet is dropped
in the IP receiving path and due to kernel internal reasons (e.g. no
enough memory).
@@ -100,20 +111,23 @@ enough memory).
.. _RFC1213 ipInDiscards: https://tools.ietf.org/html/rfc1213#page-28
* IpOutDiscards
+
Defined in `RFC1213 ipOutDiscards`_. It indicates the packet is
dropped in the IP sending path and due to kernel internal reasons.
.. _RFC1213 ipOutDiscards: https://tools.ietf.org/html/rfc1213#page-28
* IpOutNoRoutes
+
Defined in `RFC1213 ipOutNoRoutes`_. It indicates the packet is
dropped in the IP sending path and no route is found for it.
.. _RFC1213 ipOutNoRoutes: https://tools.ietf.org/html/rfc1213#page-29
ICMP counters
-============
+=============
* IcmpInMsgs and IcmpOutMsgs
+
Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_
.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
@@ -126,6 +140,7 @@ IcmpOutMsgs would still be updated if the IP header is constructed by
a userspace program.
* ICMP named types
+
| These counters include most of common ICMP types, they are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
@@ -180,6 +195,7 @@ straightforward. The 'In' counter means kernel receives such a packet
and the 'Out' counter means kernel sends such a packet.
* ICMP numeric types
+
They are IcmpMsgInType[N] and IcmpMsgOutType[N], the [N] indicates the
ICMP type number. These counters track all kinds of ICMP packets. The
ICMP type number definition could be found in the `ICMP parameters`_
@@ -192,12 +208,14 @@ IcmpMsgOutType8 would increase 1. And if kernel gets an ICMP Echo Reply
packet, IcmpMsgInType0 would increase 1.
* IcmpInCsumErrors
+
This counter indicates the checksum of the ICMP packet is
wrong. Kernel verifies the checksum after updating the IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has bad checksum, the
IcmpInMsgs would be updated but none of IcmpMsgInType[N] would be updated.
* IcmpInErrors and IcmpOutErrors
+
Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_
.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
@@ -209,7 +227,7 @@ and the sending packet path use IcmpOutErrors. When IcmpInCsumErrors
is increased, IcmpInErrors would always be increased too.
relationship of the ICMP counters
--------------------------------
+---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal or larger than IcmpInMsgs. When kernel
@@ -229,8 +247,9 @@ IcmpInMsgs should be less than the sum of IcmpMsgOutType[N] plus
IcmpInErrors.
General TCP counters
-==================
+====================
* TcpInSegs
+
Defined in `RFC1213 tcpInSegs`_
.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48
@@ -247,6 +266,7 @@ isn't aware of GRO. So if two packets are merged by GRO, the TcpInSegs
counter would only increase 1.
* TcpOutSegs
+
Defined in `RFC1213 tcpOutSegs`_
.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48
@@ -258,6 +278,7 @@ GSO, so if a packet would be split to 2 by GSO, TcpOutSegs will
increase 2.
* TcpActiveOpens
+
Defined in `RFC1213 tcpActiveOpens`_
.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47
@@ -267,6 +288,7 @@ state. Every time TcpActiveOpens increases 1, TcpOutSegs should always
increase 1.
* TcpPassiveOpens
+
Defined in `RFC1213 tcpPassiveOpens`_
.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47
@@ -275,6 +297,7 @@ It means the TCP layer receives a SYN, replies a SYN+ACK, come into
the SYN-RCVD state.
* TcpExtTCPRcvCoalesce
+
When packets are received by the TCP layer and are not be read by the
application, the TCP layer will try to merge them. This counter
indicate how many packets are merged in such situation. If GRO is
@@ -282,12 +305,14 @@ enabled, lots of packets would be merged by GRO, these packets
wouldn't be counted to TcpExtTCPRcvCoalesce.
* TcpExtTCPAutoCorking
+
When sending packets, the TCP layer will try to merge small packets to
a bigger one. This counter increase 1 for every packet merged in such
situation. Please refer to the LWN article for more details:
https://lwn.net/Articles/576263/
* TcpExtTCPOrigDataSent
+
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explaination below::
@@ -297,6 +322,7 @@ explaination below::
more useful to track the TCP retransmission rate.
* TCPSynRetrans
+
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explaination below::
@@ -304,6 +330,7 @@ explaination below::
retransmissions into SYN, fast-retransmits, timeout retransmits, etc.
* TCPFastOpenActiveFail
+
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explaination below::
@@ -313,6 +340,7 @@ explaination below::
.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd
* TcpExtListenOverflows and TcpExtListenDrops
+
When kernel receives a SYN from a client, and if the TCP accept queue
is full, kernel will drop the SYN and add 1 to TcpExtListenOverflows.
At the same time kernel will also add 1 to TcpExtListenDrops. When a
@@ -337,6 +365,54 @@ to the accept queue.
TCP Fast Open
+=============
+* TcpEstabResets
+
+Defined in `RFC1213 tcpEstabResets`_.
+
+.. _RFC1213 tcpEstabResets: https://tools.ietf.org/html/rfc1213#page-48
+
+* TcpAttemptFails
+
+Defined in `RFC1213 tcpAttemptFails`_.
+
+.. _RFC1213 tcpAttemptFails: https://tools.ietf.org/html/rfc1213#page-48
+
+* TcpOutRsts
+
+Defined in `RFC1213 tcpOutRsts`_. The RFC says this counter indicates
+the 'segments sent containing the RST flag', but in linux kernel, this
+couner indicates the segments kerenl tried to send. The sending
+process might be failed due to some errors (e.g. memory alloc failed).
+
+.. _RFC1213 tcpOutRsts: https://tools.ietf.org/html/rfc1213#page-52
+
+* TcpExtTCPSpuriousRtxHostQueues
+
+When the TCP stack wants to retransmit a packet, and finds that packet
+is not lost in the network, but the packet is not sent yet, the TCP
+stack would give up the retransmission and update this counter. It
+might happen if a packet stays too long time in a qdisc or driver
+queue.
+
+* TcpEstabResets
+
+The socket receives a RST packet in Establish or CloseWait state.
+
+* TcpExtTCPKeepAlive
+
+This counter indicates many keepalive packets were sent. The keepalive
+won't be enabled by default. A userspace program could enable it by
+setting the SO_KEEPALIVE socket option.
+
+* TcpExtTCPSpuriousRTOs
+
+The spurious retransmission timeout detected by the `F-RTO`_
+algorithm.
+
+.. _F-RTO: https://tools.ietf.org/html/rfc5682
+
+TCP Fast Path
============
When kernel receives a TCP packet, it has two paths to handler the
packet, one is fast path, another is slow path. The comment in kernel
@@ -370,22 +446,23 @@ will disable the fast path at first, and try to enable it after kernel
receives packets.
* TcpExtTCPPureAcks and TcpExtTCPHPAcks
+
If a packet set ACK flag and has no data, it is a pure ACK packet, if
kernel handles it in the fast path, TcpExtTCPHPAcks will increase 1,
if kernel handles it in the slow path, TcpExtTCPPureAcks will
increase 1.
* TcpExtTCPHPHits
+
If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits will
increase 1.
TCP abort
-========
-
-
+=========
* TcpExtTCPAbortOnData
+
It means TCP layer has data in flight, but need to close the
connection. So TCP layer sends a RST to the other side, indicate the
connection is not closed very graceful. An easy way to increase this
@@ -404,11 +481,13 @@ when the application closes a connection, kernel will send a RST
immediately and increase the TcpExtTCPAbortOnData counter.
* TcpExtTCPAbortOnClose
+
This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation,
kernel will send a RST to the other side of the TCP connection.
* TcpExtTCPAbortOnMemory
+
When an application closes a TCP connection, kernel still need to track
the connection, let it complete the TCP disconnect process. E.g. an
app calls the close method of a socket, kernel sends fin to the other
@@ -430,10 +509,12 @@ the tcp_mem. Please refer the tcp_mem section in the `TCP man page`_:
* TcpExtTCPAbortOnTimeout
+
This counter will increase when any of the TCP timers expire. In such
situation, kernel won't send RST, just give up the connection.
* TcpExtTCPAbortOnLinger
+
When a TCP connection comes into FIN_WAIT_2 state, instead of waiting
for the fin packet from the other side, kernel could send a RST and
delete the socket immediately. This is not the default behavior of
@@ -441,6 +522,7 @@ Linux kernel TCP stack. By configuring the TCP_LINGER2 socket option,
you could let kernel follow this behavior.
* TcpExtTCPAbortFailed
+
The kernel TCP layer will send RST if the `RFC2525 2.17 section`_ is
satisfied. If an internal error occurs during this process,
TcpExtTCPAbortFailed will be increased.
@@ -448,7 +530,7 @@ TcpExtTCPAbortFailed will be increased.
.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50
TCP Hybrid Slow Start
-====================
+=====================
The Hybrid Slow Start algorithm is an enhancement of the traditional
TCP congestion window Slow Start algorithm. It uses two pieces of
information to detect whether the max bandwidth of the TCP path is
@@ -464,23 +546,27 @@ relate with the Hybrid Slow Start algorithm.
.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf
* TcpExtTCPHystartTrainDetect
+
How many times the ACK train length threshold is detected
* TcpExtTCPHystartTrainCwnd
+
The sum of CWND detected by ACK train length. Dividing this value by
TcpExtTCPHystartTrainDetect is the average CWND which detected by the
ACK train length.
* TcpExtTCPHystartDelayDetect
+
How many times the packet delay threshold is detected.
* TcpExtTCPHystartDelayCwnd
+
The sum of CWND detected by packet delay. Dividing this value by
TcpExtTCPHystartDelayDetect is the average CWND which detected by the
packet delay.
TCP retransmission and congestion control
-======================================
+=========================================
The TCP protocol has two retransmission mechanisms: SACK and fast
recovery. They are exclusive with each other. When SACK is enabled,
the kernel TCP stack would use SACK, or kernel would use fast
@@ -499,12 +585,14 @@ https://pdfs.semanticscholar.org/0e9c/968d09ab2e53e24c4dca5b2d67c7f7140f8e.pdf
.. _RFC6582: https://tools.ietf.org/html/rfc6582
* TcpExtTCPRenoRecovery and TcpExtTCPSackRecovery
+
When the congestion control comes into Recovery state, if sack is
used, TcpExtTCPSackRecovery increases 1, if sack is not used,
TcpExtTCPRenoRecovery increases 1. These two counters mean the TCP
stack begins to retransmit the lost packets.
* TcpExtTCPSACKReneging
+
A packet was acknowledged by SACK, but the receiver has dropped this
packet, so the sender needs to retransmit this packet. In this
situation, the sender adds 1 to TcpExtTCPSACKReneging. A receiver
@@ -515,6 +603,7 @@ the RTO expires for this packet, then the sender assumes this packet
has been dropped by the receiver.
* TcpExtTCPRenoReorder
+
The reorder packet is detected by fast recovery. It would only be used
if SACK is disabled. The fast recovery algorithm detects recorder by
the duplicate ACK number. E.g., if retransmission is triggered, and
@@ -525,6 +614,7 @@ order packet. Thus the sender would find more ACks than its
expectation, and the sender knows out of order occurs.
* TcpExtTCPTSReorder
+
The reorder packet is detected when a hole is filled. E.g., assume the
sender sends packet 1,2,3,4,5, and the receiving order is
1,2,4,5,3. When the sender receives the ACK of packet 3 (which will
@@ -534,6 +624,7 @@ fill the hole), two conditions will let TcpExtTCPTSReorder increase
than the retransmission timestamp.
* TcpExtTCPSACKReorder
+
The reorder packet detected by SACK. The SACK has two methods to
detect reorder: (1) DSACK is received by the sender. It means the
sender sends the same packet more than one times. And the only reason
@@ -545,6 +636,28 @@ packet yet, the sender would know packet 4 is out of order. The TCP
stack of kernel will increase TcpExtTCPSACKReorder for both of the
above scenarios.
+* TcpExtTCPSlowStartRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is 'Loss'.
+
+* TcpExtTCPFastRetrans
+
+The TCP stack wants to retransmit a packet and the congestion control
+state is not 'Loss'.
+
+* TcpExtTCPLostRetransmit
+
+A SACK points out that a retransmission packet is lost again.
+
+* TcpExtTCPRetransFail
+
+The TCP stack tries to deliver a retransmission packet to lower layers
+but the lower layers return an error.
+
+* TcpExtTCPSynRetrans
+
+The TCP stack retransmits a SYN packet.
DSACK
=====
@@ -558,39 +671,95 @@ sender side.
.. _RFC2883 : https://tools.ietf.org/html/rfc2883
* TcpExtTCPDSACKOldSent
+
The TCP stack receives a duplicate packet which has been acked, so it
sends a DSACK to the sender.
* TcpExtTCPDSACKOfoSent
+
The TCP stack receives an out of order duplicate packet, so it sends a
DSACK to the sender.
* TcpExtTCPDSACKRecv
-The TCP stack receives a DSACK, which indicate an acknowledged
+The TCP stack receives a DSACK, which indicates an acknowledged
duplicate packet is received.
* TcpExtTCPDSACKOfoRecv
+
The TCP stack receives a DSACK, which indicate an out of order
duplicate packet is received.
+invalid SACK and DSACK
+====================
+When a SACK (or DSACK) block is invalid, a corresponding counter would
+be updated. The validation method is base on the start/end sequence
+number of the SACK block. For more details, please refer the comment
+of the function tcp_is_sackblock_valid in the kernel source code. A
+SACK option could have up to 4 blocks, they are checked
+individually. E.g., if 3 blocks of a SACk is invalid, the
+corresponding counter would be updated 3 times. The comment of the
+`Add counters for discarded SACK blocks`_ patch has additional
+explaination:
+
+.. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32
+
+* TcpExtTCPSACKDiscard
+This counter indicates how many SACK blocks are invalid. If the invalid
+SACK block is caused by ACK recording, the TCP stack will only ignore
+it and won't update this counter.
+
+* TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo
+When a DSACK block is invalid, one of these two counters would be
+updated. Which counter will be updated depends on the undo_marker flag
+of the TCP socket. If the undo_marker is not set, the TCP stack isn't
+likely to re-transmit any packets, and we still receive an invalid
+DSACK block, the reason might be that the packet is duplicated in the
+middle of the network. In such scenario, TcpExtTCPDSACKIgnoredNoUndo
+will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld
+will be updated. As implied in its name, it might be an old packet.
+
+SACK shift
+=========
+The linux networking stack stores data in sk_buff struct (skb for
+short). If a SACK block acrosses multiple skb, the TCP stack will try
+to re-arrange data in these skb. E.g. if a SACK block acknowledges seq
+10 to 15, skb1 has seq 10 to 13, skb2 has seq 14 to 20. The seq 14 and
+15 in skb2 would be moved to skb1. This operation is 'shift'. If a
+SACK block acknowledges seq 10 to 20, skb1 has seq 10 to 13, skb2 has
+seq 14 to 20. All data in skb2 will be moved to skb1, and skb2 will be
+discard, this operation is 'merge'.
+
+* TcpExtTCPSackShifted
+A skb is shifted
+
+* TcpExtTCPSackMerged
+A skb is merged
+
+* TcpExtTCPSackShiftFallback
+A skb should be shifted or merged, but the TCP stack doesn't do it for
+some reasons.
+
TCP out of order
-===============
+================
* TcpExtTCPOFOQueue
+
The TCP layer receives an out of order packet and has enough memory
to queue it.
* TcpExtTCPOFODrop
+
The TCP layer receives an out of order packet but doesn't have enough
memory, so drops it. Such packets won't be counted into
TcpExtTCPOFOQueue.
* TcpExtTCPOFOMerge
+
The received out of order packet has an overlay with the previous
packet. the overlay part will be dropped. All of TcpExtTCPOFOMerge
packets will also be counted into TcpExtTCPOFOQueue.
TCP PAWS
-=======
+========
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
which is used to drop old packets. It depends on the TCP
timestamps. For detail information, please refer the `timestamp wiki`_
@@ -600,13 +769,15 @@ and the `RFC of PAWS`_.
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps
* TcpExtPAWSActive
+
Packets are dropped by PAWS in Syn-Sent status.
* TcpExtPAWSEstab
+
Packets are dropped by PAWS in any status other than Syn-Sent.
TCP ACK skip
-===========
+============
In some scenarios, kernel would avoid sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When kernel decides to skip an ACK
@@ -618,6 +789,7 @@ it has no data.
.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
* TcpExtTCPACKSkippedSynRecv
+
The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
TCP stack receives a SYN and replies SYN+ACK. Now the TCP stack is
waiting for an ACK. Generally, the TCP stack doesn't need to send ACK
@@ -631,6 +803,7 @@ increase TcpExtTCPACKSkippedSynRecv.
* TcpExtTCPACKSkippedPAWS
+
The ACK is skipped due to PAWS (Protect Against Wrapped Sequence
numbers) check fails. If the PAWS check fails in Syn-Recv, Fin-Wait-2
or Time-Wait statuses, the skipped ACK would be counted to
@@ -639,18 +812,22 @@ TcpExtTCPACKSkippedTimeWait. In all other statuses, the skipped ACK
would be counted to TcpExtTCPACKSkippedPAWS.
* TcpExtTCPACKSkippedSeq
+
The sequence number is out of window and the timestamp passes the PAWS
check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait.
* TcpExtTCPACKSkippedFinWait2
+
The ACK is skipped in Fin-Wait-2 status, the reason would be either
PAWS check fails or the received sequence number is out of window.
* TcpExtTCPACKSkippedTimeWait
+
Tha ACK is skipped in Time-Wait status, the reason would be either
PAWS check failed or the received sequence number is out of window.
* TcpExtTCPACKSkippedChallenge
+
The ACK is skipped if the ACK is a challenge ACK. The RFC 5961 defines
3 kind of challenge ACK, please refer `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
@@ -662,12 +839,194 @@ unacknowledged number (more strict than `RFC 5961 section 5.2`_).
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11
+TCP receive window
+==================
+* TcpExtTCPWantZeroWindowAdv
+
+Depending on current memory usage, the TCP stack tries to set receive
+window to zero. But the receive window might still be a no-zero
+value. For example, if the previous window size is 10, and the TCP
+stack receives 3 bytes, the current window size would be 7 even if the
+window size calculated by the memory usage is zero.
+
+* TcpExtTCPToZeroWindowAdv
+
+The TCP receive window is set to zero from a no-zero value.
+
+* TcpExtTCPFromZeroWindowAdv
+
+The TCP receive window is set to no-zero value from zero.
+
+
+Delayed ACK
+===========
+The TCP Delayed ACK is a technique which is used for reducing the
+packet count in the network. For more details, please refer the
+`Delayed ACK wiki`_
+
+.. _Delayed ACK wiki: https://en.wikipedia.org/wiki/TCP_delayed_acknowledgment
+
+* TcpExtDelayedACKs
+
+A delayed ACK timer expires. The TCP stack will send a pure ACK packet
+and exit the delayed ACK mode.
+
+* TcpExtDelayedACKLocked
+
+A delayed ACK timer expires, but the TCP stack can't send an ACK
+immediately due to the socket is locked by a userspace program. The
+TCP stack will send a pure ACK later (after the userspace program
+unlock the socket). When the TCP stack sends the pure ACK later, the
+TCP stack will also update TcpExtDelayedACKs and exit the delayed ACK
+mode.
+
+* TcpExtDelayedACKLost
+
+It will be updated when the TCP stack receives a packet which has been
+ACKed. A Delayed ACK loss might cause this issue, but it would also be
+triggered by other reasons, such as a packet is duplicated in the
+network.
+
+Tail Loss Probe (TLP)
+=====================
+TLP is an algorithm which is used to detect TCP packet loss. For more
+details, please refer the `TLP paper`_.
+
+.. _TLP paper: https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01
+
+* TcpExtTCPLossProbes
+
+A TLP probe packet is sent.
+
+* TcpExtTCPLossProbeRecovery
+
+A packet loss is detected and recovered by TLP.
+
+TCP Fast Open
+=============
+TCP Fast Open is a technology which allows data transfer before the
+3-way handshake complete. Please refer the `TCP Fast Open wiki`_ for a
+general description.
+
+.. _TCP Fast Open wiki: https://en.wikipedia.org/wiki/TCP_Fast_Open
+
+* TcpExtTCPFastOpenActive
+
+When the TCP stack receives an ACK packet in the SYN-SENT status, and
+the ACK packet acknowledges the data in the SYN packet, the TCP stack
+understand the TFO cookie is accepted by the other side, then it
+updates this counter.
+
+* TcpExtTCPFastOpenActiveFail
+
+This counter indicates that the TCP stack initiated a TCP Fast Open,
+but it failed. This counter would be updated in three scenarios: (1)
+the other side doesn't acknowledge the data in the SYN packet. (2) The
+SYN packet which has the TFO cookie is timeout at least once. (3)
+after the 3-way handshake, the retransmission timeout happens
+net.ipv4.tcp_retries1 times, because some middle-boxes may black-hole
+fast open after the handshake.
+
+* TcpExtTCPFastOpenPassive
+
+This counter indicates how many times the TCP stack accepts the fast
+open request.
+
+* TcpExtTCPFastOpenPassiveFail
+
+This counter indicates how many times the TCP stack rejects the fast
+open request. It is caused by either the TFO cookie is invalid or the
+TCP stack finds an error during the socket creating process.
+
+* TcpExtTCPFastOpenListenOverflow
+
+When the pending fast open request number is larger than
+fastopenq->max_qlen, the TCP stack will reject the fast open request
+and update this counter. When this counter is updated, the TCP stack
+won't update TcpExtTCPFastOpenPassive or
+TcpExtTCPFastOpenPassiveFail. The fastopenq->max_qlen is set by the
+TCP_FASTOPEN socket operation and it could not be larger than
+net.core.somaxconn. For example:
+
+setsockopt(sfd, SOL_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));
+
+* TcpExtTCPFastOpenCookieReqd
+
+This counter indicates how many times a client wants to request a TFO
+cookie.
+
+SYN cookies
+===========
+SYN cookies are used to mitigate SYN flood, for details, please refer
+the `SYN cookies wiki`_.
+
+.. _SYN cookies wiki: https://en.wikipedia.org/wiki/SYN_cookies
+
+* TcpExtSyncookiesSent
+
+It indicates how many SYN cookies are sent.
+
+* TcpExtSyncookiesRecv
+
+How many reply packets of the SYN cookies the TCP stack receives.
+
+* TcpExtSyncookiesFailed
+
+The MSS decoded from the SYN cookie is invalid. When this counter is
+updated, the received packet won't be treated as a SYN cookie and the
+TcpExtSyncookiesRecv counter wont be updated.
+
+Challenge ACK
+=============
+For details of challenge ACK, please refer the explaination of
+TcpExtTCPACKSkippedChallenge.
+
+* TcpExtTCPChallengeACK
+
+The number of challenge acks sent.
+
+* TcpExtTCPSYNChallenge
+
+The number of challenge acks sent in response to SYN packets. After
+updates this counter, the TCP stack might send a challenge ACK and
+update the TcpExtTCPChallengeACK counter, or it might also skip to
+send the challenge and update the TcpExtTCPACKSkippedChallenge.
+
+prune
+=====
+When a socket is under memory pressure, the TCP stack will try to
+reclaim memory from the receiving queue and out of order queue. One of
+the reclaiming method is 'collapse', which means allocate a big sbk,
+copy the contiguous skbs to the single big skb, and free these
+contiguous skbs.
+
+* TcpExtPruneCalled
+
+The TCP stack tries to reclaim memory for a socket. After updates this
+counter, the TCP stack will try to collapse the out of order queue and
+the receiving queue. If the memory is still not enough, the TCP stack
+will try to discard packets from the out of order queue (and update the
+TcpExtOfoPruned counter)
+
+* TcpExtOfoPruned
+
+The TCP stack tries to discard packet on the out of order queue.
+
+* TcpExtRcvPruned
+
+After 'collapse' and discard packets from the out of order queue, if
+the actually used memory is still larger than the max allowed memory,
+this counter will be updated. It means the 'prune' fails.
+
+* TcpExtTCPRcvCollapsed
+
+This counter indicates how many skbs are freed during 'collapse'.
examples
-=======
+========
ping test
---------
+---------
Run the ping command against the public dns server 8.8.8.8::
nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
@@ -711,7 +1070,7 @@ and its corresponding Echo Reply packet are constructed by:
So the IpExtInOctets and IpExtOutOctets are 20+16+48=84.
tcp 3-way handshake
-------------------
+-------------------
On server side, we run::
nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
@@ -753,7 +1112,7 @@ ACK, so client sent 2 packets, received 1 packet, TcpInSegs increased
1, TcpOutSegs increased 2.
TCP normal traffic
------------------
+------------------
Run nc on server::
nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
@@ -876,7 +1235,7 @@ and the packet received from client qualified for fast path, so it
was counted into 'TcpExtTCPHPHits'.
TcpExtTCPAbortOnClose
---------------------
+---------------------
On the server side, we run below python script::
import socket
@@ -910,7 +1269,7 @@ If we run tcpdump on the server side, we could find the server sent a
RST after we type Ctrl-C.
TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
------------------------------------------------
+---------------------------------------------------
Below is an example which let the orphan socket count be higher than
net.ipv4.tcp_max_orphans.
Change tcp_max_orphans to a smaller value on client::
@@ -1032,7 +1391,7 @@ FIN_WAIT_1 state finally. So we wait for a few minutes, we could find
TcpExtTCPAbortOnTimeout 10 0.0
TcpExtTCPAbortOnLinger
----------------------
+----------------------
The server side code::
nstatuser@nstat-b:~$ cat server_linger.py
@@ -1077,7 +1436,7 @@ After run client_linger.py, check the output of nstat::
TcpExtTCPAbortOnLinger 1 0.0
TcpExtTCPRcvCoalesce
--------------------
+--------------------
On the server, we run a program which listen on TCP port 9000, but
doesn't read any data::
@@ -1137,7 +1496,7 @@ the receiving queue. So the TCP layer merged the two packets, and we
could find the TcpExtTCPRcvCoalesce increased 1.
TcpExtListenOverflows and TcpExtListenDrops
-----------------------------------------
+-------------------------------------------
On server, run the nc command, listen on port 9000::
nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
@@ -1185,7 +1544,7 @@ TcpExtListenOverflows and TcpExtListenDrops would be larger, because
the SYN of the 4th nc was dropped, the client was retrying.
IpInAddrErrors, IpExtInNoRoutes and IpOutNoRoutes
-----------------------------------------------
+-------------------------------------------------
server A IP address: 192.168.122.250
server B IP address: 192.168.122.251
Prepare on server A, add a route to server B::
@@ -1280,7 +1639,7 @@ a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes.
TcpExtTCPACKSkippedSynRecv
-------------------------
+--------------------------
In this test, we send 3 same SYN packets from client to server. The
first SYN will let server create a socket, set it to Syn-Recv status,
and reply a SYN/ACK. The second SYN will let server reply the SYN/ACK
@@ -1328,7 +1687,7 @@ Check snmp cunter on nstat-b::
As we expected, TcpExtTCPACKSkippedSynRecv is 1.
TcpExtTCPACKSkippedPAWS
-----------------------
+-----------------------
To trigger PAWS, we could send an old SYN.
On nstat-b, let nc listen on port 9000::
@@ -1365,7 +1724,7 @@ failed, the nstat-b replied an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.
TcpExtTCPACKSkippedSeq
---------------------
+----------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have valid
timestamp (to pass PAWS check) but the sequence number is out of
window. The linux TCP stack would avoid to skip if the packet has