aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/bonding.txt31
-rw-r--r--Documentation/networking/filter.txt12
-rw-r--r--Documentation/networking/i40e.txt7
-rw-r--r--Documentation/networking/ip-sysctl.txt38
-rw-r--r--Documentation/networking/packet_mmap.txt18
-rw-r--r--Documentation/networking/phy.txt18
-rw-r--r--Documentation/networking/pktgen.txt28
-rw-r--r--Documentation/networking/timestamping.txt16
-rw-r--r--Documentation/networking/timestamping/timestamping.c7
9 files changed, 112 insertions, 63 deletions
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9c723ecd0025..eeb5b2e97bed 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -542,10 +542,10 @@ mode
XOR policy: Transmit based on the selected transmit
hash policy. The default policy is a simple [(source
- MAC address XOR'd with destination MAC address) modulo
- slave count]. Alternate transmit policies may be
- selected via the xmit_hash_policy option, described
- below.
+ MAC address XOR'd with destination MAC address XOR
+ packet type ID) modulo slave count]. Alternate transmit
+ policies may be selected via the xmit_hash_policy option,
+ described below.
This mode provides load balancing and fault tolerance.
@@ -801,10 +801,11 @@ xmit_hash_policy
layer2
- Uses XOR of hardware MAC addresses to generate the
- hash. The formula is
+ Uses XOR of hardware MAC addresses and packet type ID
+ field to generate the hash. The formula is
- (source MAC XOR destination MAC) modulo slave count
+ hash = source MAC XOR destination MAC XOR packet type ID
+ slave number = hash modulo slave count
This algorithm will place all traffic to a particular
network peer on the same slave.
@@ -819,7 +820,7 @@ xmit_hash_policy
Uses XOR of hardware MAC addresses and IP addresses to
generate the hash. The formula is
- hash = source MAC XOR destination MAC
+ hash = source MAC XOR destination MAC XOR packet type ID
hash = hash XOR source IP XOR destination IP
hash = hash XOR (hash RSHIFT 16)
hash = hash XOR (hash RSHIFT 8)
@@ -2301,13 +2302,13 @@ broadcast: Like active-backup, there is not much advantage to this
bandwidth.
Additionally, the linux bonding 802.3ad implementation
- distributes traffic by peer (using an XOR of MAC addresses),
- so in a "gatewayed" configuration, all outgoing traffic will
- generally use the same device. Incoming traffic may also end
- up on a single device, but that is dependent upon the
- balancing policy of the peer's 8023.ad implementation. In a
- "local" configuration, traffic will be distributed across the
- devices in the bond.
+ distributes traffic by peer (using an XOR of MAC addresses
+ and packet type ID), so in a "gatewayed" configuration, all
+ outgoing traffic will generally use the same device. Incoming
+ traffic may also end up on a single device, but that is
+ dependent upon the balancing policy of the peer's 8023.ad
+ implementation. In a "local" configuration, traffic will be
+ distributed across the devices in the bond.
Finally, the 802.3ad mode mandates the use of the MII monitor,
therefore, the ARP monitor is not available in this mode.
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index ee78eba78a9d..c48a9704bda8 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -586,12 +586,12 @@ team driver's classifier for its load-balancing mode, netfilter's xt_bpf
extension, PTP dissector/classifier, and much more. They are all internally
converted by the kernel into the new instruction set representation and run
in the eBPF interpreter. For in-kernel handlers, this all works transparently
-by using sk_unattached_filter_create() for setting up the filter, resp.
-sk_unattached_filter_destroy() for destroying it. The macro
-SK_RUN_FILTER(filter, ctx) transparently invokes eBPF interpreter or JITed
-code to run the filter. 'filter' is a pointer to struct sk_filter that we
-got from sk_unattached_filter_create(), and 'ctx' the given context (e.g.
-skb pointer). All constraints and restrictions from sk_chk_filter() apply
+by using bpf_prog_create() for setting up the filter, resp.
+bpf_prog_destroy() for destroying it. The macro
+BPF_PROG_RUN(filter, ctx) transparently invokes eBPF interpreter or JITed
+code to run the filter. 'filter' is a pointer to struct bpf_prog that we
+got from bpf_prog_create(), and 'ctx' the given context (e.g.
+skb pointer). All constraints and restrictions from bpf_check_classic() apply
before a conversion to the new layout is being done behind the scenes!
Currently, the classic BPF format is being used for JITing on most of the
diff --git a/Documentation/networking/i40e.txt b/Documentation/networking/i40e.txt
index f737273c6dc1..a251bf4fe9c9 100644
--- a/Documentation/networking/i40e.txt
+++ b/Documentation/networking/i40e.txt
@@ -69,8 +69,11 @@ Additional Configurations
FCoE
----
- Fiber Channel over Ethernet (FCoE) hardware offload is not currently
- supported.
+ The driver supports Fiber Channel over Ethernet (FCoE) and Data Center
+ Bridging (DCB) functionality. Configuring DCB and FCoE is outside the scope
+ of this driver doc. Refer to http://www.open-fcoe.org/ for FCoE project
+ information and http://www.open-lldp.org/ or email list
+ e1000-eedc@lists.sourceforge.net for DCB information.
MAC and VLAN anti-spoofing feature
----------------------------------
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ab42c95f9985..29a93518bf18 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -101,19 +101,17 @@ ipfrag_high_thresh - INTEGER
Maximum memory used to reassemble IP fragments. When
ipfrag_high_thresh bytes of memory is allocated for this purpose,
the fragment handler will toss packets until ipfrag_low_thresh
- is reached.
+ is reached. This also serves as a maximum limit to namespaces
+ different from the initial one.
ipfrag_low_thresh - INTEGER
- See ipfrag_high_thresh
+ Maximum memory used to reassemble IP fragments before the kernel
+ begins to remove incomplete fragment queues to free up resources.
+ The kernel still accepts new fragments for defragmentation.
ipfrag_time - INTEGER
Time in seconds to keep an IP fragment in memory.
-ipfrag_secret_interval - INTEGER
- Regeneration interval (in seconds) of the hash secret (or lifetime
- for the hash secret) for IP fragments.
- Default: 600
-
ipfrag_max_dist - INTEGER
ipfrag_max_dist is a non-negative integer value which defines the
maximum "disorder" which is allowed among fragments which share a
@@ -1132,6 +1130,15 @@ flowlabel_consistency - BOOLEAN
FALSE: disabled
Default: TRUE
+auto_flowlabels - BOOLEAN
+ Automatically generate flow labels based based on a flow hash
+ of the packet. This allows intermediate devices, such as routers,
+ to idenfify packet flows for mechanisms like Equal Cost Multipath
+ Routing (see RFC 6438).
+ TRUE: enabled
+ FALSE: disabled
+ Default: false
+
anycast_src_echo_reply - BOOLEAN
Controls the use of anycast addresses as source addresses for ICMPv6
echo reply
@@ -1153,11 +1160,6 @@ ip6frag_low_thresh - INTEGER
ip6frag_time - INTEGER
Time in seconds to keep an IPv6 fragment in memory.
-ip6frag_secret_interval - INTEGER
- Regeneration interval (in seconds) of the hash secret (or lifetime
- for the hash secret) for IPv6 fragments.
- Default: 600
-
conf/default/*:
Change the interface-specific default settings.
@@ -1210,6 +1212,18 @@ accept_ra_defrtr - BOOLEAN
Functional default: enabled if accept_ra is enabled.
disabled if accept_ra is disabled.
+accept_ra_from_local - BOOLEAN
+ Accept RA with source-address that is found on local machine
+ if the RA is otherwise proper and able to be accepted.
+ Default is to NOT accept these as it may be an un-intended
+ network loop.
+
+ Functional default:
+ enabled if accept_ra_from_local is enabled
+ on a specific interface.
+ disabled if accept_ra_from_local is disabled
+ on a specific interface.
+
accept_ra_pinfo - BOOLEAN
Learn Prefix Information in Router Advertisement.
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 38112d512f47..a6d7cb91069e 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -1008,14 +1008,9 @@ hardware timestamps to be used. Note: you may need to enable the generation
of hardware timestamps with SIOCSHWTSTAMP (see related information from
Documentation/networking/timestamping.txt).
-PACKET_TIMESTAMP accepts the same integer bit field as
-SO_TIMESTAMPING. However, only the SOF_TIMESTAMPING_SYS_HARDWARE
-and SOF_TIMESTAMPING_RAW_HARDWARE values are recognized by
-PACKET_TIMESTAMP. SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over
-SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set.
-
- int req = 0;
- req |= SOF_TIMESTAMPING_SYS_HARDWARE;
+PACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING:
+
+ int req = SOF_TIMESTAMPING_RAW_HARDWARE;
setsockopt(fd, SOL_PACKET, PACKET_TIMESTAMP, (void *) &req, sizeof(req))
For the mmap(2)ed ring buffers, such timestamps are stored in the
@@ -1023,14 +1018,13 @@ tpacket{,2,3}_hdr structure's tp_sec and tp_{n,u}sec members. To determine
what kind of timestamp has been reported, the tp_status field is binary |'ed
with the following possible bits ...
- TP_STATUS_TS_SYS_HARDWARE
TP_STATUS_TS_RAW_HARDWARE
TP_STATUS_TS_SOFTWARE
... that are equivalent to its SOF_TIMESTAMPING_* counterparts. For the
-RX_RING, if none of those 3 are set (i.e. PACKET_TIMESTAMP is not set),
-then this means that a software fallback was invoked *within* PF_PACKET's
-processing code (less precise).
+RX_RING, if neither is set (i.e. PACKET_TIMESTAMP is not set), then a
+software fallback was invoked *within* PF_PACKET's processing code (less
+precise).
Getting timestamps for the TX_RING works as follows: i) fill the ring frames,
ii) call sendto() e.g. in blocking mode, iii) wait for status of relevant
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 3544c98401fd..e839e7efc835 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -272,6 +272,8 @@ Writing a PHY driver
txtsamp: Requests a transmit timestamp at the PHY level for a 'skb'
set_wol: Enable Wake-on-LAN at the PHY level
get_wol: Get the Wake-on-LAN status at the PHY level
+ read_mmd_indirect: Read PHY MMD indirect register
+ write_mmd_indirect: Write PHY MMD indirect register
Of these, only config_aneg and read_status are required to be
assigned by the driver code. The rest are optional. Also, it is
@@ -284,7 +286,21 @@ Writing a PHY driver
Feel free to look at the Marvell, Cicada, and Davicom drivers in
drivers/net/phy/ for examples (the lxt and qsemi drivers have
- not been tested as of this writing)
+ not been tested as of this writing).
+
+ The PHY's MMD register accesses are handled by the PAL framework
+ by default, but can be overridden by a specific PHY driver if
+ required. This could be the case if a PHY was released for
+ manufacturing before the MMD PHY register definitions were
+ standardized by the IEEE. Most modern PHYs will be able to use
+ the generic PAL framework for accessing the PHY's MMD registers.
+ An example of such usage is for Energy Efficient Ethernet support,
+ implemented in the PAL. This support uses the PAL to access MMD
+ registers for EEE query and configuration if the PHY supports
+ the IEEE standard access mechanisms, or can use the PHY's specific
+ access interfaces if overridden by the specific PHY driver. See
+ the Micrel driver in drivers/net/phy/ for an example of how this
+ can be implemented.
Board Fixups
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index 0e30c7845b2b..0dffc6e37902 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -24,6 +24,34 @@ For monitoring and control pktgen creates:
/proc/net/pktgen/ethX
+Tuning NIC for max performance
+==============================
+
+The default NIC setting are (likely) not tuned for pktgen's artificial
+overload type of benchmarking, as this could hurt the normal use-case.
+
+Specifically increasing the TX ring buffer in the NIC:
+ # ethtool -G ethX tx 1024
+
+A larger TX ring can improve pktgen's performance, while it can hurt
+in the general case, 1) because the TX ring buffer might get larger
+than the CPUs L1/L2 cache, 2) because it allow more queueing in the
+NIC HW layer (which is bad for bufferbloat).
+
+One should be careful to conclude, that packets/descriptors in the HW
+TX ring cause delay. Drivers usually delay cleaning up the
+ring-buffers (for various performance reasons), thus packets stalling
+the TX ring, might just be waiting for cleanup.
+
+This cleanup issues is specifically the case, for the driver ixgbe
+(Intel 82599 chip). This driver (ixgbe) combine TX+RX ring cleanups,
+and the cleanup interval is affected by the ethtool --coalesce setting
+of parameter "rx-usecs".
+
+For ixgbe use e.g "30" resulting in approx 33K interrupts/sec (1/30*10^6):
+ # ethtool -C ethX rx-usecs 30
+
+
Viewing threads
===============
/proc/net/pktgen/kpktgend_0
diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index bc3554124903..897f942b976b 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -40,7 +40,7 @@ the set bits correspond to data that is available, then the control
message will not be generated:
SOF_TIMESTAMPING_SOFTWARE: report systime if available
-SOF_TIMESTAMPING_SYS_HARDWARE: report hwtimetrans if available
+SOF_TIMESTAMPING_SYS_HARDWARE: report hwtimetrans if available (deprecated)
SOF_TIMESTAMPING_RAW_HARDWARE: report hwtimeraw if available
It is worth noting that timestamps may be collected for reasons other
@@ -88,13 +88,12 @@ hwtimeraw is the original hardware time stamp. Filled in if
SOF_TIMESTAMPING_RAW_HARDWARE is set. No assumptions about its
relation to system time should be made.
-hwtimetrans is the hardware time stamp transformed so that it
-corresponds as good as possible to system time. This correlation is
-not perfect; as a consequence, sorting packets received via different
-NICs by their hwtimetrans may differ from the order in which they were
-received. hwtimetrans may be non-monotonic even for the same NIC.
-Filled in if SOF_TIMESTAMPING_SYS_HARDWARE is set. Requires support
-by the network device and will be empty without that support.
+hwtimetrans is always zero. This field is deprecated. It used to hold
+hw timestamps converted to system time. Instead, expose the hardware
+clock device on the NIC directly as a HW PTP clock source, to allow
+time conversion in userspace and optionally synchronize system time
+with a userspace PTP stack such as linuxptp. For the PTP clock API,
+see Documentation/ptp/ptp.txt.
SIOCSHWTSTAMP, SIOCGHWTSTAMP:
@@ -185,7 +184,6 @@ struct skb_shared_hwtstamps {
* since arbitrary point in time
*/
ktime_t hwtstamp;
- ktime_t syststamp; /* hwtstamp transformed to system time base */
};
Time stamps for outgoing packets are to be generated as follows:
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
index 8ba82bfe6a33..5cdfd743447b 100644
--- a/Documentation/networking/timestamping/timestamping.c
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -76,7 +76,6 @@ static void usage(const char *error)
" SOF_TIMESTAMPING_RX_HARDWARE - hardware time stamping of incoming packets\n"
" SOF_TIMESTAMPING_RX_SOFTWARE - software fallback for incoming packets\n"
" SOF_TIMESTAMPING_SOFTWARE - request reporting of software time stamps\n"
- " SOF_TIMESTAMPING_SYS_HARDWARE - request reporting of transformed HW time stamps\n"
" SOF_TIMESTAMPING_RAW_HARDWARE - request reporting of raw HW time stamps\n"
" SIOCGSTAMP - check last socket time stamp\n"
" SIOCGSTAMPNS - more accurate socket time stamp\n");
@@ -202,9 +201,7 @@ static void printpacket(struct msghdr *msg, int res,
(long)stamp->tv_sec,
(long)stamp->tv_nsec);
stamp++;
- printf("HW transformed %ld.%09ld ",
- (long)stamp->tv_sec,
- (long)stamp->tv_nsec);
+ /* skip deprecated HW transformed */
stamp++;
printf("HW raw %ld.%09ld",
(long)stamp->tv_sec,
@@ -361,8 +358,6 @@ int main(int argc, char **argv)
so_timestamping_flags |= SOF_TIMESTAMPING_RX_SOFTWARE;
else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SOFTWARE"))
so_timestamping_flags |= SOF_TIMESTAMPING_SOFTWARE;
- else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_SYS_HARDWARE"))
- so_timestamping_flags |= SOF_TIMESTAMPING_SYS_HARDWARE;
else if (!strcasecmp(argv[i], "SOF_TIMESTAMPING_RAW_HARDWARE"))
so_timestamping_flags |= SOF_TIMESTAMPING_RAW_HARDWARE;
else