diff options
Diffstat (limited to 'Documentation/networking/timestamping.rst')
-rw-r--r-- | Documentation/networking/timestamping.rst | 203 |
1 files changed, 149 insertions, 54 deletions
diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst index 03f7beade470..7aabead90648 100644 --- a/Documentation/networking/timestamping.rst +++ b/Documentation/networking/timestamping.rst @@ -55,7 +55,8 @@ struct __kernel_sock_timeval format. SO_TIMESTAMP_OLD returns incorrect timestamps after the year 2038 on 32 bit machines. -1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW): +1.2 SO_TIMESTAMPNS (also SO_TIMESTAMPNS_OLD and SO_TIMESTAMPNS_NEW) +------------------------------------------------------------------- This option is identical to SO_TIMESTAMP except for the returned data type. Its struct timespec allows for higher resolution (ns) timestamps than the @@ -139,6 +140,14 @@ SOF_TIMESTAMPING_TX_ACK: cumulative acknowledgment. The mechanism ignores SACK and FACK. This flag can be enabled via both socket options and control messages. +SOF_TIMESTAMPING_TX_COMPLETION: + Request tx timestamps on packet tx completion. The completion + timestamp is generated by the kernel when it receives packet a + completion report from the hardware. Hardware may report multiple + packets at once, and completion timestamps reflect the timing of the + report and not actual tx time. This flag can be enabled via both + socket options and control messages. + 1.3.2 Timestamp Reporting ^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -157,7 +166,8 @@ SOF_TIMESTAMPING_SYS_HARDWARE: SOF_TIMESTAMPING_RAW_HARDWARE: Report hardware timestamps as generated by - SOF_TIMESTAMPING_TX_HARDWARE when available. + SOF_TIMESTAMPING_TX_HARDWARE or SOF_TIMESTAMPING_RX_HARDWARE + when available. 1.3.3 Timestamp Options @@ -178,7 +188,8 @@ SOF_TIMESTAMPING_OPT_ID: identifier and returns that along with the timestamp. The identifier is derived from a per-socket u32 counter (that wraps). For datagram sockets, the counter increments with each sent packet. For stream - sockets, it increments with every byte. + sockets, it increments with every byte. For stream sockets, also set + SOF_TIMESTAMPING_OPT_ID_TCP, see the section below. The counter starts at zero. It is initialized the first time that the socket option is enabled. It is reset each time the option is @@ -191,6 +202,49 @@ SOF_TIMESTAMPING_OPT_ID: among all possibly concurrently outstanding timestamp requests for that socket. + The process can optionally override the default generated ID, by + passing a specific ID with control message SCM_TS_OPT_ID (not + supported for TCP sockets):: + + struct msghdr *msg; + ... + cmsg = CMSG_FIRSTHDR(msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_TS_OPT_ID; + cmsg->cmsg_len = CMSG_LEN(sizeof(__u32)); + *((__u32 *) CMSG_DATA(cmsg)) = opt_id; + err = sendmsg(fd, msg, 0); + + +SOF_TIMESTAMPING_OPT_ID_TCP: + Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP + timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the + counter increments for stream sockets, but its starting point is + not entirely trivial. This option fixes that. + + For stream sockets, if SOF_TIMESTAMPING_OPT_ID is set, this should + always be set too. On datagram sockets the option has no effect. + + A reasonable expectation is that the counter is reset to zero with + the system call, so that a subsequent write() of N bytes generates + a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP + implements this behavior under all conditions. + + SOF_TIMESTAMPING_OPT_ID without modifier often reports the same, + especially when the socket option is set when no data is in + transmission. If data is being transmitted, it may be off by the + length of the output queue (SIOCOUTQ). + + The difference is due to being based on snd_una versus write_seq. + snd_una is the offset in the stream acknowledged by the peer. This + depends on factors outside of process control, such as network RTT. + write_seq is the last byte written by the process. This offset is + not affected by external inputs. + + The difference is subtle and unlikely to be noticed when configured + at initial socket creation, when no data is queued or sent. But + SOF_TIMESTAMPING_OPT_ID_TCP behavior is more robust regardless of + when the socket option is set. SOF_TIMESTAMPING_OPT_CMSG: Support recv() cmsg for all timestamped packets. Control messages @@ -235,6 +289,23 @@ SOF_TIMESTAMPING_OPT_TX_SWHW: two separate messages will be looped to the socket's error queue, each containing just one timestamp. +SOF_TIMESTAMPING_OPT_RX_FILTER: + Filter out spurious receive timestamps: report a receive timestamp + only if the matching timestamp generation flag is enabled. + + Receive timestamps are generated early in the ingress path, before a + packet's destination socket is known. If any socket enables receive + timestamps, packets for all socket will receive timestamped packets. + Including those that request timestamp reporting with + SOF_TIMESTAMPING_SOFTWARE and/or SOF_TIMESTAMPING_RAW_HARDWARE, but + do not request receive timestamp generation. This can happen when + requesting transmit timestamps only. + + Receiving spurious timestamps is generally benign. A process can + ignore the unexpected non-zero value. But it makes behavior subtly + dependent on other sockets. This flag isolates the socket for more + deterministic behavior. + New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate regardless of the setting of sysctl net.core.tstamp_allow_data. @@ -326,7 +397,8 @@ enabling SOF_TIMESTAMPING_OPT_ID and comparing the byte offset at send time with the value returned for each timestamp. It can prevent the situation by always flushing the TCP stack in between requests, for instance by enabling TCP_NODELAY and disabling TCP_CORK and -autocork. +autocork. After linux-4.7, a better way to prevent coalescing is +to use MSG_EOR flag at sendmsg() time. These precautions ensure that the timestamp is generated only when all bytes have passed a timestamp point, assuming that the network stack @@ -461,8 +533,8 @@ implicitly defined. ts[0] holds a software timestamp if set, ts[1] is again deprecated and ts[2] holds a hardware timestamp if set. -3. Hardware Timestamping configuration: SIOCSHWTSTAMP and SIOCGHWTSTAMP -======================================================================= +3. Hardware Timestamping configuration: ETHTOOL_MSG_TSCONFIG_SET/GET +==================================================================== Hardware time stamping must also be initialized for each device driver that is expected to do hardware time stamping. The parameter is defined in @@ -475,18 +547,20 @@ include/uapi/linux/net_tstamp.h as:: }; Desired behavior is passed into the kernel and to a specific device by -calling ioctl(SIOCSHWTSTAMP) with a pointer to a struct ifreq whose -ifr_data points to a struct hwtstamp_config. The tx_type and -rx_filter are hints to the driver what it is expected to do. If -the requested fine-grained filtering for incoming packets is not -supported, the driver may time stamp more than just the requested types -of packets. +calling the tsconfig netlink socket ``ETHTOOL_MSG_TSCONFIG_SET``. +The ``ETHTOOL_A_TSCONFIG_TX_TYPES``, ``ETHTOOL_A_TSCONFIG_RX_FILTERS`` and +``ETHTOOL_A_TSCONFIG_HWTSTAMP_FLAGS`` netlink attributes are then used to set +the struct hwtstamp_config accordingly. + +The ``ETHTOOL_A_TSCONFIG_HWTSTAMP_PROVIDER`` netlink nested attribute is used +to select the source of the hardware time stamping. It is composed of an index +for the device source and a qualifier for the type of time stamping. Drivers are free to use a more permissive configuration than the requested configuration. It is expected that drivers should only implement directly the most generic mode that can be supported. For example if the hardware can -support HWTSTAMP_FILTER_V2_EVENT, then it should generally always upscale -HWTSTAMP_FILTER_V2_L2_SYNC_MESSAGE, and so forth, as HWTSTAMP_FILTER_V2_EVENT +support HWTSTAMP_FILTER_PTP_V2_EVENT, then it should generally always upscale +HWTSTAMP_FILTER_PTP_V2_L2_SYNC, and so forth, as HWTSTAMP_FILTER_PTP_V2_EVENT is more generic (and more useful to applications). A driver which supports hardware time stamping shall update the struct @@ -499,9 +573,16 @@ Only a processes with admin rights may change the configuration. User space is responsible to ensure that multiple processes don't interfere with each other and that the settings are reset. -Any process can read the actual configuration by passing this -structure to ioctl(SIOCGHWTSTAMP) in the same way. However, this has -not been implemented in all drivers. +Any process can read the actual configuration by requesting tsconfig netlink +socket ``ETHTOOL_MSG_TSCONFIG_GET``. + +The legacy configuration is the use of the ioctl(SIOCSHWTSTAMP) with a pointer +to a struct ifreq whose ifr_data points to a struct hwtstamp_config. +The tx_type and rx_filter are hints to the driver what it is expected to do. +If the requested fine-grained filtering for incoming packets is not +supported, the driver may time stamp more than just the requested types +of packets. ioctl(SIOCGHWTSTAMP) is used in the same way as the +ioctl(SIOCSHWTSTAMP). However, this has not been implemented in all drivers. :: @@ -546,9 +627,10 @@ not been implemented in all drivers. -------------------------------------------------------- A driver which supports hardware time stamping must support the -SIOCSHWTSTAMP ioctl and update the supplied struct hwtstamp_config with -the actual values as described in the section on SIOCSHWTSTAMP. It -should also support SIOCGHWTSTAMP. +ndo_hwtstamp_set NDO or the legacy SIOCSHWTSTAMP ioctl and update the +supplied struct hwtstamp_config with the actual values as described in +the section on SIOCSHWTSTAMP. It should also support ndo_hwtstamp_get or +the legacy SIOCGHWTSTAMP. Time stamps for received packets must be stored in the skb. To get a pointer to the shared time stamp structure of the skb call skb_hwtstamps(). Then @@ -581,8 +663,8 @@ Time stamps for outgoing packets are to be generated as follows: and hardware timestamping is not possible (SKBTX_IN_PROGRESS not set). - As soon as the driver has sent the packet and/or obtained a hardware time stamp for it, it passes the time stamp back by - calling skb_hwtstamp_tx() with the original skb, the raw - hardware time stamp. skb_hwtstamp_tx() clones the original skb and + calling skb_tstamp_tx() with the original skb, the raw + hardware time stamp. skb_tstamp_tx() clones the original skb and adds the timestamps, therefore the original skb has to be freed now. If obtaining the hardware time stamp somehow fails, then the driver should not fall back to software time stamping. The rationale is that @@ -624,35 +706,50 @@ interfaces of a DSA switch to share the same PHC. By design, PTP timestamping with a DSA switch does not need any special handling in the driver for the host port it is attached to. However, when the host port also supports PTP timestamping, DSA will take care of intercepting -the ``.ndo_do_ioctl`` calls towards the host port, and block attempts to enable +the ``.ndo_eth_ioctl`` calls towards the host port, and block attempts to enable hardware timestamping on it. This is because the SO_TIMESTAMPING API does not allow the delivery of multiple hardware timestamps for the same packet, so anybody else except for the DSA switch port must be prevented from doing so. -In code, DSA provides for most of the infrastructure for timestamping already, -in generic code: a BPF classifier (``ptp_classify_raw``) is used to identify -PTP event messages (any other packets, including PTP general messages, are not -timestamped), and provides two hooks to drivers: - -- ``.port_txtstamp()``: The driver is passed a clone of the timestampable skb - to be transmitted, before actually transmitting it. Typically, a switch will - have a PTP TX timestamp register (or sometimes a FIFO) where the timestamp - becomes available. There may be an IRQ that is raised upon this timestamp's - availability, or the driver might have to poll after invoking - ``dev_queue_xmit()`` towards the host interface. Either way, in the - ``.port_txtstamp()`` method, the driver only needs to save the clone for - later use (when the timestamp becomes available). Each skb is annotated with - a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the driver's - job of keeping track of which clone belongs to which skb. - -- ``.port_rxtstamp()``: The original (and only) timestampable skb is provided - to the driver, for it to annotate it with a timestamp, if that is immediately - available, or defer to later. On reception, timestamps might either be - available in-band (through metadata in the DSA header, or attached in other - ways to the packet), or out-of-band (through another RX timestamping FIFO). - Deferral on RX is typically necessary when retrieving the timestamp needs a - sleepable context. In that case, it is the responsibility of the DSA driver - to call ``netif_rx_ni()`` on the freshly timestamped skb. +In the generic layer, DSA provides the following infrastructure for PTP +timestamping: + +- ``.port_txtstamp()``: a hook called prior to the transmission of + packets with a hardware TX timestamping request from user space. + This is required for two-step timestamping, since the hardware + timestamp becomes available after the actual MAC transmission, so the + driver must be prepared to correlate the timestamp with the original + packet so that it can re-enqueue the packet back into the socket's + error queue. To save the packet for when the timestamp becomes + available, the driver can call ``skb_clone_sk`` , save the clone pointer + in skb->cb and enqueue a tx skb queue. Typically, a switch will have a + PTP TX timestamp register (or sometimes a FIFO) where the timestamp + becomes available. In case of a FIFO, the hardware might store + key-value pairs of PTP sequence ID/message type/domain number and the + actual timestamp. To perform the correlation correctly between the + packets in a queue waiting for timestamping and the actual timestamps, + drivers can use a BPF classifier (``ptp_classify_raw``) to identify + the PTP transport type, and ``ptp_parse_header`` to interpret the PTP + header fields. There may be an IRQ that is raised upon this + timestamp's availability, or the driver might have to poll after + invoking ``dev_queue_xmit()`` towards the host interface. + One-step TX timestamping do not require packet cloning, since there is + no follow-up message required by the PTP protocol (because the + TX timestamp is embedded into the packet by the MAC), and therefore + user space does not expect the packet annotated with the TX timestamp + to be re-enqueued into its socket's error queue. + +- ``.port_rxtstamp()``: On RX, the BPF classifier is run by DSA to + identify PTP event messages (any other packets, including PTP general + messages, are not timestamped). The original (and only) timestampable + skb is provided to the driver, for it to annotate it with a timestamp, + if that is immediately available, or defer to later. On reception, + timestamps might either be available in-band (through metadata in the + DSA header, or attached in other ways to the packet), or out-of-band + (through another RX timestamping FIFO). Deferral on RX is typically + necessary when retrieving the timestamp needs a sleepable context. In + that case, it is the responsibility of the DSA driver to call + ``netif_rx()`` on the freshly timestamped skb. 3.2.2 Ethernet PHYs ^^^^^^^^^^^^^^^^^^^ @@ -672,7 +769,7 @@ ethtool ioctl operations for them need to be mediated by their respective MAC driver. Therefore, as opposed to DSA switches, modifications need to be done to each individual MAC driver for PHY timestamping support. This entails: -- Checking, in ``.ndo_do_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)`` +- Checking, in ``.ndo_eth_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)`` is true or not. If it is, then the MAC driver should not process this request but instead pass it on to the PHY using ``phy_mii_ioctl()``. @@ -714,11 +811,9 @@ Documentation/devicetree/bindings/ptp/timestamper.txt for more details. 3.2.4 Other caveats for MAC drivers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Stacked PHCs, especially DSA (but not only) - since that doesn't require any -modification to MAC drivers, so it is more difficult to ensure correctness of -all possible code paths - is that they uncover bugs which were impossible to -trigger before the existence of stacked PTP clocks. One example has to do with -this line of code, already presented earlier:: +The use of stacked PHCs may uncover MAC driver bugs which were impossible to +trigger without them. One example has to do with this line of code, already +presented earlier:: skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS; @@ -731,7 +826,7 @@ For example, a typical driver design for TX timestamping might be to split the transmission part into 2 portions: 1. "TX": checks whether PTP timestamping has been previously enabled through - the ``.ndo_do_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the + the ``.ndo_eth_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP``"). If this is true, it sets the "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as |