Diffstat (limited to 'Documentation/networking/timestamping.rst')
-rw-r--r-- Documentation/networking/timestamping.rst | 165
1 files changed, 165 insertions, 0 deletions
diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
index 1adead6a4527..03f7beade470 100644
--- a/Documentation/networking/timestamping.rst
+++ b/Documentation/networking/timestamping.rst
@@ -589,3 +589,168 @@ Time stamps for outgoing packets are to be generated as follows:
this would occur at a later time in the processing pipeline than other
software time stamping and therefore could lead to unexpected deltas
between time stamps.
+
+3.2 Special considerations for stacked PTP Hardware Clocks
+----------------------------------------------------------
+
+There are situations when there may be more than one PHC (PTP Hardware Clock)
+in the data path of a packet. The kernel has no explicit mechanism to allow the
+user to select which PHC to use for timestamping Ethernet frames. Instead, the
+assumption is that the outermost PHC is always the most preferable, and that
+kernel drivers collaborate towards achieving that goal. Currently there are 3
+cases of stacked PHCs, detailed below:
+
+3.2.1 DSA (Distributed Switch Architecture) switches
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These are Ethernet switches which have one of their ports connected to an
+(otherwise completely unaware) host Ethernet interface, and perform the role of
+a port multiplier with optional forwarding acceleration features. Each DSA
+switch port is visible to the user as a standalone (virtual) network interface,
+and its network I/O is performed, under the hood, indirectly through the host
+interface (redirecting to the host port on TX, and intercepting frames on RX).
+
+When a DSA switch is attached to a host port, PTP synchronization suffers,
+since the switch's variable queuing delay introduces path delay jitter between
+the host port and its PTP partner. For this reason, some DSA
+switches include a timestamping clock of their own, and have the ability to
+perform network timestamping on their own MAC, such that path delays only
+measure wire and PHY propagation latencies. Timestamping DSA switches are
+supported in Linux and expose the same ABI as any other network interface
+(although DSA interfaces are virtual in terms of network I/O, they do have
+their own PHC). It is typical, but not mandatory, for all interfaces of a DSA
+switch to share the same PHC.
+
+By design, PTP timestamping with a DSA switch does not need any special
+handling in the driver for the host port it is attached to. However, when the
+host port also supports PTP timestamping, DSA will take care of intercepting
+the ``.ndo_do_ioctl`` calls towards the host port, and block attempts to enable
+hardware timestamping on it. This is because the SO_TIMESTAMPING API does not
+allow the delivery of multiple hardware timestamps for the same packet, so
+anybody other than the DSA switch port must be prevented from doing so.
+
+The generic DSA code already provides most of the infrastructure for
+timestamping: a BPF classifier (``ptp_classify_raw``) is used to identify PTP
+event messages (any other packets, including PTP general messages, are not
+timestamped), and two hooks are offered to drivers:
+
+- ``.port_txtstamp()``: The driver is passed a clone of the timestampable skb
+ to be transmitted, before actually transmitting it. Typically, a switch will
+ have a PTP TX timestamp register (or sometimes a FIFO) where the timestamp
+ becomes available. There may be an IRQ that is raised upon this timestamp's
+ availability, or the driver might have to poll after invoking
+ ``dev_queue_xmit()`` towards the host interface. Either way, in the
+ ``.port_txtstamp()`` method, the driver only needs to save the clone for
+ later use (when the timestamp becomes available). Each skb is annotated with
+ a pointer to its clone, in ``DSA_SKB_CB(skb)->clone``, to ease the driver's
+ job of keeping track of which clone belongs to which skb.
+
+- ``.port_rxtstamp()``: The original (and only) timestampable skb is provided
+ to the driver, for it to annotate it with a timestamp, if that is immediately
+ available, or defer to later. On reception, timestamps might either be
+ available in-band (through metadata in the DSA header, or attached in other
+ ways to the packet), or out-of-band (through another RX timestamping FIFO).
+ Deferral on RX is typically necessary when retrieving the timestamp needs a
+ sleepable context. In that case, it is the responsibility of the DSA driver
+ to call ``netif_rx_ni()`` on the freshly timestamped skb.
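+
+The TX side of the two hooks above can be pictured with a small userspace
+model. This is only a sketch and not kernel code: ``struct mock_skb``,
+``struct mock_port`` and both functions are hypothetical stand-ins, and the
+single pending-clone slot (with new clones declined while it is busy) is an
+assumption made to keep the example short. It shows the hook merely saving the
+clone, and a later step (IRQ or polling in a real driver) attaching the
+timestamp once the hardware has produced it.
+

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cloned, timestampable skb. */
struct mock_skb {
	bool has_hwtstamp;
	uint64_t hwtstamp_ns;
};

/* Hypothetical per-port state: one clone waiting for its TX timestamp. */
struct mock_port {
	struct mock_skb *pending_clone;
};

/*
 * Models what a .port_txtstamp() implementation has to do: only remember
 * the clone for later. Returns false when the (assumed) single slot is
 * already occupied, so the caller knows the clone was not taken.
 */
static bool mock_port_txtstamp(struct mock_port *port, struct mock_skb *clone)
{
	if (port->pending_clone != NULL)
		return false;

	port->pending_clone = clone;
	return true;
}

/*
 * Models the point where the timestamp becomes available. The saved clone
 * is annotated with the timestamp and handed back; a real driver would
 * then complete it towards the socket error queue.
 */
static struct mock_skb *mock_tx_tstamp_ready(struct mock_port *port,
					     uint64_t ns)
{
	struct mock_skb *clone = port->pending_clone;

	if (clone == NULL)
		return NULL;

	clone->hwtstamp_ns = ns;
	clone->has_hwtstamp = true;
	port->pending_clone = NULL;

	return clone;
}
```

+
+A switch with a timestamp FIFO rather than a single register would keep a
+queue of clones instead of one pointer; the split between "save the clone now"
+and "annotate it later" stays the same.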
+
+3.2.2 Ethernet PHYs
+^^^^^^^^^^^^^^^^^^^
+
+These are devices that typically fulfill a Layer 1 role in the network stack,
+hence they do not have a representation in terms of a network interface as DSA
+switches do. However, PHYs may be able to detect and timestamp PTP packets, for
+performance reasons: timestamps taken as close as possible to the wire have the
+potential to yield a more stable and precise synchronization.
+
+A PHY driver that supports PTP timestamping must create a ``struct
+mii_timestamper`` and add a pointer to it in ``phydev->mii_ts``. The presence
+of this pointer will be checked by the networking stack.
+
+Since PHYs do not have network interface representations, the timestamping and
+ethtool ioctl operations for them need to be mediated by their respective MAC
+driver. Therefore, as opposed to DSA switches, modifications need to be done
+to each individual MAC driver for PHY timestamping support. This entails:
+
+- Checking, in ``.ndo_do_ioctl``, whether ``phy_has_hwtstamp(netdev->phydev)``
+ is true or not. If it is, then the MAC driver should not process this request
+ but instead pass it on to the PHY using ``phy_mii_ioctl()``.
+
+- On RX, special intervention may or may not be needed, depending on the
+ function used to deliver skb's up the network stack. In the case of plain
+ ``netif_rx()`` and similar, MAC drivers must check whether
+ ``skb_defer_rx_timestamp(skb)`` is necessary or not - and if it is, don't
+ call ``netif_rx()`` at all. If ``CONFIG_NETWORK_PHY_TIMESTAMPING`` is
+ enabled, and ``skb->dev->phydev->mii_ts`` exists, its ``.rxtstamp()`` hook
+ will be called now, to determine, using logic very similar to DSA, whether
+ deferral for RX timestamping is necessary. Again like DSA, it becomes the
+ responsibility of the PHY driver to send the packet up the stack when the
+ timestamp is available.
+
+ For other skb receive functions, such as ``napi_gro_receive`` and
+ ``netif_receive_skb``, the stack automatically checks whether
+ ``skb_defer_rx_timestamp()`` is necessary, so this check is not needed inside
+ the driver.
+
+- On TX, again, special intervention might or might not be needed. The
+ function that calls the ``mii_ts->txtstamp()`` hook is named
+ ``skb_clone_tx_timestamp()``. This function can be called directly (in which
+ case explicit MAC driver support is indeed needed), but it is also invoked
+ from the ``skb_tx_timestamp()`` call, which many MAC drivers already perform
+ for software timestamping purposes. Therefore, if a
+ MAC supports software timestamping, it does not need to do anything further
+ at this stage.
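+
+The ioctl mediation from the first item in the list above can be modelled in
+userspace as follows. This is a hedged sketch, not kernel code: every
+``mock_`` type and function is hypothetical, and ``mock_phy_has_hwtstamp()``
+only approximates ``phy_has_hwtstamp()`` by testing for the presence of the
+``mii_ts`` pointer. It shows a MAC driver declining to process the
+timestamping request itself whenever the attached PHY can timestamp.
+

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mii_timestamper and struct phy_device. */
struct mock_mii_ts {
	int unused;
};

struct mock_phy {
	struct mock_mii_ts *mii_ts;	/* non-NULL if the PHY can timestamp */
	int hwtstamp_requests;		/* requests passed on to the PHY */
};

/* Hypothetical stand-in for a MAC's net_device plus private data. */
struct mock_netdev {
	struct mock_phy *phydev;	/* may be NULL (no PHY attached) */
	int mac_hwtstamp_requests;	/* requests handled by the MAC */
};

enum mock_handler {
	MOCK_HANDLED_BY_MAC,
	MOCK_HANDLED_BY_PHY,
};

/* Simplified model of phy_has_hwtstamp(): is a timestamper attached? */
static bool mock_phy_has_hwtstamp(const struct mock_phy *phy)
{
	return phy != NULL && phy->mii_ts != NULL;
}

/*
 * Models the timestamping branch of a MAC driver's ioctl handler: if the
 * PHY can timestamp, the MAC does not process the request and passes it
 * on (phy_mii_ioctl() in a real driver); otherwise it configures itself.
 */
static enum mock_handler mock_mac_ioctl_hwtstamp(struct mock_netdev *ndev)
{
	if (mock_phy_has_hwtstamp(ndev->phydev)) {
		ndev->phydev->hwtstamp_requests++;
		return MOCK_HANDLED_BY_PHY;
	}

	ndev->mac_hwtstamp_requests++;
	return MOCK_HANDLED_BY_MAC;
}
```

+
+The counters exist only so that the behaviour is observable in the sketch: a
+request counted on the PHY is never also counted on the MAC, which mirrors the
+rule that a single entity timestamps a given packet.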
+
+3.2.3 MII bus snooping devices
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These perform the same role as timestamping Ethernet PHYs, save for the fact
+that they are discrete devices and can therefore be used in conjunction with
+any PHY even if it doesn't support timestamping. In Linux, they are
+discoverable and attachable to a ``struct phy_device`` through Device Tree, and
+otherwise use the same mii_ts infrastructure as timestamping PHYs. See
+Documentation/devicetree/bindings/ptp/timestamper.txt for more details.
+
+3.2.4 Other caveats for MAC drivers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A caveat of stacked PHCs is that they uncover bugs which were impossible to
+trigger before the existence of stacked PTP clocks. This applies especially
+(but not only) to DSA: since it doesn't require any modification to MAC
+drivers, it is more difficult to ensure correctness of all possible code paths.
+One example has to do with this line of code, already presented earlier::
+
+ skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+
+Any TX timestamping logic, be it a plain MAC driver, a DSA switch driver, a PHY
+driver or a MII bus snooping device driver, should set this flag.
+But a MAC driver that is unaware of PHC stacking might get tripped up by
+somebody other than itself setting this flag, and deliver a duplicate
+timestamp.
+For example, a typical driver design for TX timestamping might be to split the
+transmission part into 2 portions:
+
+1. "TX": checks whether PTP timestamping has been previously enabled through
+ the ``.ndo_do_ioctl`` ("``priv->hwtstamp_tx_enabled == true``") and the
+ current skb requires a TX timestamp ("``skb_shinfo(skb)->tx_flags &
+ SKBTX_HW_TSTAMP``"). If this is true, it sets the
+ "``skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS``" flag. Note: as
+ described above, in the case of a stacked PHC system, this condition should
+ never trigger, as this MAC is certainly not the outermost PHC. But this is
+ not where the typical issue is. Transmission proceeds with this packet.
+
+2. "TX confirmation": Transmission has finished. The driver checks whether it
+ is necessary to collect any TX timestamp for it. Here is where the typical
+ issues are: the MAC driver takes a shortcut and only checks whether
+ "``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``" was set. With a stacked
+ PHC system, this is incorrect because this MAC driver is not the only entity
+ in the TX data path that could have set SKBTX_IN_PROGRESS in the first
+ place.
+
+The correct solution for this problem is for MAC drivers to have a compound
+check in their "TX confirmation" portion, not only for
+"``skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS``", but also for
+"``priv->hwtstamp_tx_enabled == true``". Because the rest of the system ensures
+that PTP timestamping is not enabled for anything other than the outermost PHC,
+this enhanced check will avoid delivering a duplicated TX timestamp to user
+space.
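+
+The difference between the shortcut and the compound check described above can
+be demonstrated with a small userspace model. It is a sketch under stated
+assumptions rather than driver code: ``struct mock_skb``, ``struct mock_priv``
+and all three functions are hypothetical, and the flag values are illustrative
+definitions local to the example.
+

```c
#include <stdbool.h>

/* Illustrative flag values, defined locally for this sketch. */
#define SKBTX_HW_TSTAMP		(1u << 0)
#define SKBTX_IN_PROGRESS	(1u << 2)

/* Hypothetical stand-ins for the skb and the MAC driver's private data. */
struct mock_skb {
	unsigned int tx_flags;
};

struct mock_priv {
	bool hwtstamp_tx_enabled;	/* set via the timestamping ioctl */
};

/* "TX" portion: claim the timestamp only if this MAC was asked to. */
static void mock_mac_xmit(const struct mock_priv *priv, struct mock_skb *skb)
{
	if (priv->hwtstamp_tx_enabled && (skb->tx_flags & SKBTX_HW_TSTAMP))
		skb->tx_flags |= SKBTX_IN_PROGRESS;
}

/*
 * "TX confirmation", shortcut version: looks at the skb flag only, so it
 * also fires when somebody else in the data path set SKBTX_IN_PROGRESS.
 */
static bool mock_txconf_shortcut(const struct mock_priv *priv,
				 const struct mock_skb *skb)
{
	(void)priv;	/* deliberately ignored: this is the bug */

	return (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}

/*
 * "TX confirmation", compound version: the MAC delivers a timestamp only
 * if timestamping was enabled on it and the flag is set on the skb.
 */
static bool mock_txconf_compound(const struct mock_priv *priv,
				 const struct mock_skb *skb)
{
	return priv->hwtstamp_tx_enabled &&
	       (skb->tx_flags & SKBTX_IN_PROGRESS) != 0;
}
```

+
+With ``hwtstamp_tx_enabled`` false on the MAC and ``SKBTX_IN_PROGRESS`` already
+set by, say, a DSA switch driver, the shortcut version reports that a timestamp
+must be delivered while the compound version does not, which is the duplicate
+that the enhanced check avoids.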