Diffstat (limited to 'Documentation/networking/dsa/dsa.txt')
-rw-r--r-- | Documentation/networking/dsa/dsa.txt | 584 |
1 files changed, 0 insertions, 584 deletions
diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.txt
deleted file mode 100644
index 43ef767bc440..000000000000
--- a/Documentation/networking/dsa/dsa.txt
+++ /dev/null
@@ -1,584 +0,0 @@
-Distributed Switch Architecture
-===============================
-
-Introduction
-============
-
-This document describes the Distributed Switch Architecture (DSA) subsystem
-design principles, limitations, interactions with other subsystems, and how to
-develop drivers for this subsystem as well as a TODO for developers interested
-in joining the effort.
-
-Design principles
-=================
-
-The Distributed Switch Architecture is a subsystem which was primarily designed
-to support Marvell Ethernet switches (MV88E6xxx, a.k.a Linkstreet product line)
-using Linux, but has since evolved to support other vendors as well.
-
-The original philosophy behind this design was to be able to use unmodified
-Linux tools such as bridge, iproute2, ifconfig to work transparently whether
-they configured/queried a switch port network device or a regular network
-device.
-
-An Ethernet switch is typically comprised of multiple front-panel ports, and one
-or more CPU or management port. The DSA subsystem currently relies on the
-presence of a management port connected to an Ethernet controller capable of
-receiving Ethernet frames from the switch. This is a very common setup for all
-kinds of Ethernet switches found in Small Home and Office products: routers,
-gateways, or even top-of-the rack switches. This host Ethernet controller will
-be later referred to as "master" and "cpu" in DSA terminology and code.
-
-The D in DSA stands for Distributed, because the subsystem has been designed
-with the ability to configure and manage cascaded switches on top of each other
-using upstream and downstream Ethernet links between switches. These specific
-ports are referred to as "dsa" ports in DSA terminology and code. A collection
-of multiple switches connected to each other is called a "switch tree".
-
-For each front-panel port, DSA will create specialized network devices which are
-used as controlling and data-flowing endpoints for use by the Linux networking
-stack. These specialized network interfaces are referred to as "slave" network
-interfaces in DSA terminology and code.
-
-The ideal case for using DSA is when an Ethernet switch supports a "switch tag"
-which is a hardware feature making the switch insert a specific tag for each
-Ethernet frames it received to/from specific ports to help the management
-interface figure out:
-
-- what port is this frame coming from
-- what was the reason why this frame got forwarded
-- how to send CPU originated traffic to specific ports
-
-The subsystem does support switches not capable of inserting/stripping tags, but
-the features might be slightly limited in that case (traffic separation relies
-on Port-based VLAN IDs).
-
-Note that DSA does not currently create network interfaces for the "cpu" and
-"dsa" ports because:
-
-- the "cpu" port is the Ethernet switch facing side of the management
-  controller, and as such, would create a duplication of feature, since you
-  would get two interfaces for the same conduit: master netdev, and "cpu" netdev
-
-- the "dsa" port(s) are just conduits between two or more switches, and as such
-  cannot really be used as proper network interfaces either, only the
-  downstream, or the top-most upstream interface makes sense with that model
-
-Switch tagging protocols
-------------------------
-
-DSA currently supports 5 different tagging protocols, and a tag-less mode as
-well. The different protocols are implemented in:
-
-net/dsa/tag_trailer.c: Marvell's 4 trailer tag mode (legacy)
-net/dsa/tag_dsa.c: Marvell's original DSA tag
-net/dsa/tag_edsa.c: Marvell's enhanced DSA tag
-net/dsa/tag_brcm.c: Broadcom's 4 bytes tag
-net/dsa/tag_qca.c: Qualcomm's 2 bytes tag
-
-The exact format of the tag protocol is vendor specific, but in general, they
-all contain something which:
-
-- identifies which port the Ethernet frame came from/should be sent to
-- provides a reason why this frame was forwarded to the management interface
-
-Master network devices
-----------------------
-
-Master network devices are regular, unmodified Linux network device drivers for
-the CPU/management Ethernet interface. Such a driver might occasionally need to
-know whether DSA is enabled (e.g.: to enable/disable specific offload features),
-but the DSA subsystem has been proven to work with industry standard drivers:
-e1000e, mv643xx_eth etc. without having to introduce modifications to these
-drivers. Such network devices are also often referred to as conduit network
-devices since they act as a pipe between the host processor and the hardware
-Ethernet switch.
-
-Networking stack hooks
-----------------------
-
-When a master netdev is used with DSA, a small hook is placed in in the
-networking stack is in order to have the DSA subsystem process the Ethernet
-switch specific tagging protocol. DSA accomplishes this by registering a
-specific (and fake) Ethernet type (later becoming skb->protocol) with the
-networking stack, this is also known as a ptype or packet_type. A typical
-Ethernet Frame receive sequence looks like this:
-
-Master network device (e.g.: e1000e):
-
-Receive interrupt fires:
-- receive function is invoked
-- basic packet processing is done: getting length, status etc.
-- packet is prepared to be processed by the Ethernet layer by calling
-  eth_type_trans
-
-net/ethernet/eth.c:
-
-eth_type_trans(skb, dev)
-        if (dev->dsa_ptr != NULL)
-                -> skb->protocol = ETH_P_XDSA
-
-drivers/net/ethernet/*:
-
-netif_receive_skb(skb)
-        -> iterate over registered packet_type
-                -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()
-
-net/dsa/dsa.c:
-        -> dsa_switch_rcv()
-                -> invoke switch tag specific protocol handler in
-                   net/dsa/tag_*.c
-
-net/dsa/tag_*.c:
-        -> inspect and strip switch tag protocol to determine originating port
-        -> locate per-port network device
-        -> invoke eth_type_trans() with the DSA slave network device
-        -> invoked netif_receive_skb()
-
-Past this point, the DSA slave network devices get delivered regular Ethernet
-frames that can be processed by the networking stack.
-
-Slave network devices
----------------------
-
-Slave network devices created by DSA are stacked on top of their master network
-device, each of these network interfaces will be responsible for being a
-controlling and data-flowing end-point for each front-panel port of the switch.
-These interfaces are specialized in order to:
-
-- insert/remove the switch tag protocol (if it exists) when sending traffic
-  to/from specific switch ports
-- query the switch for ethtool operations: statistics, link state,
-  Wake-on-LAN, register dumps...
-- external/internal PHY management: link, auto-negotiation etc.
-
-These slave network devices have custom net_device_ops and ethtool_ops function
-pointers which allow DSA to introduce a level of layering between the networking
-stack/ethtool, and the switch driver implementation.
-
-Upon frame transmission from these slave network devices, DSA will look up which
-switch tagging protocol is currently registered with these network devices, and
-invoke a specific transmit routine which takes care of adding the relevant
-switch tag in the Ethernet frames.
-
-These frames are then queued for transmission using the master network device
-ndo_start_xmit() function, since they contain the appropriate switch tag, the
-Ethernet switch will be able to process these incoming frames from the
-management interface and delivers these frames to the physical switch port.
-
-Graphical representation
-------------------------
-
-Summarized, this is basically how DSA looks like from a network device
-perspective:
-
-
-                 |---------------------------
-                 | CPU network device (eth0)|
-                 ----------------------------
-                 | <tag added by switch     |
-                 |                          |
-                 |                          |
-                 |        tag added by CPU> |
-        |--------------------------------------------|
-        | Switch driver                              |
-        |--------------------------------------------|
-           ||         ||         ||
-        |-------|  |-------|  |-------|
-        | sw0p0 |  | sw0p1 |  | sw0p2 |
-        |-------|  |-------|  |-------|
-
-Slave MDIO bus
---------------
-
-In order to be able to read to/from a switch PHY built into it, DSA creates a
-slave MDIO bus which allows a specific switch driver to divert and intercept
-MDIO reads/writes towards specific PHY addresses. In most MDIO-connected
-switches, these functions would utilize direct or indirect PHY addressing mode
-to return standard MII registers from the switch builtin PHYs, allowing the PHY
-library and/or to return link status, link partner pages, auto-negotiation
-results etc..
-
-For Ethernet switches which have both external and internal MDIO busses, the
-slave MII bus can be utilized to mux/demux MDIO reads and writes towards either
-internal or external MDIO devices this switch might be connected to: internal
-PHYs, external PHYs, or even external switches.
-
-Data structures
----------------
-
-DSA data structures are defined in include/net/dsa.h as well as
-net/dsa/dsa_priv.h.
-
-dsa_chip_data: platform data configuration for a given switch device, this
-structure describes a switch device's parent device, its address, as well as
-various properties of its ports: names/labels, and finally a routing table
-indication (when cascading switches)
-
-dsa_platform_data: platform device configuration data which can reference a
-collection of dsa_chip_data structure if multiples switches are cascaded, the
-master network device this switch tree is attached to needs to be referenced
-
-dsa_switch_tree: structure assigned to the master network device under
-"dsa_ptr", this structure references a dsa_platform_data structure as well as
-the tagging protocol supported by the switch tree, and which receive/transmit
-function hooks should be invoked, information about the directly attached switch
-is also provided: CPU port. Finally, a collection of dsa_switch are referenced
-to address individual switches in the tree.
-
-dsa_switch: structure describing a switch device in the tree, referencing a
-dsa_switch_tree as a backpointer, slave network devices, master network device,
-and a reference to the backing dsa_switch_ops
-
-dsa_switch_ops: structure referencing function pointers, see below for a full
-description.
-
-Design limitations
-==================
-
-Limits on the number of devices and ports
------------------------------------------
-
-DSA currently limits the number of maximum switches within a tree to 4
-(DSA_MAX_SWITCHES), and the number of ports per switch to 12 (DSA_MAX_PORTS).
-These limits could be extended to support larger configurations would this need
-arise.
-
-Lack of CPU/DSA network devices
--------------------------------
-
-DSA does not currently create slave network devices for the CPU or DSA ports, as
-described before. This might be an issue in the following cases:
-
-- inability to fetch switch CPU port statistics counters using ethtool, which
-  can make it harder to debug MDIO switch connected using xMII interfaces
-
-- inability to configure the CPU port link parameters based on the Ethernet
-  controller capabilities attached to it: http://patchwork.ozlabs.org/patch/509806/
-
-- inability to configure specific VLAN IDs / trunking VLANs between switches
-  when using a cascaded setup
-
-Common pitfalls using DSA setups
---------------------------------
-
-Once a master network device is configured to use DSA (dev->dsa_ptr becomes
-non-NULL), and the switch behind it expects a tagging protocol, this network
-interface can only exclusively be used as a conduit interface. Sending packets
-directly through this interface (e.g.: opening a socket using this interface)
-will not make us go through the switch tagging protocol transmit function, so
-the Ethernet switch on the other end, expecting a tag will typically drop this
-frame.
-
-Slave network devices check that the master network device is UP before allowing
-you to administratively bring UP these slave network devices. A common
-configuration mistake is forgetting to bring UP the master network device first.
-
-Interactions with other subsystems
-==================================
-
-DSA currently leverages the following subsystems:
-
-- MDIO/PHY library: drivers/net/phy/phy.c, mdio_bus.c
-- Switchdev: net/switchdev/*
-- Device Tree for various of_* functions
-
-MDIO/PHY library
-----------------
-
-Slave network devices exposed by DSA may or may not be interfacing with PHY
-devices (struct phy_device as defined in include/linux/phy.h), but the DSA
-subsystem deals with all possible combinations:
-
-- internal PHY devices, built into the Ethernet switch hardware
-- external PHY devices, connected via an internal or external MDIO bus
-- internal PHY devices, connected via an internal MDIO bus
-- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a
-  fixed PHYs
-
-The PHY configuration is done by the dsa_slave_phy_setup() function and the
-logic basically looks like this:
-
-- if Device Tree is used, the PHY device is looked up using the standard
-  "phy-handle" property, if found, this PHY device is created and registered
-  using of_phy_connect()
-
-- if Device Tree is used, and the PHY device is "fixed", that is, conforms to
-  the definition of a non-MDIO managed PHY as defined in
-  Documentation/devicetree/bindings/net/fixed-link.txt, the PHY is registered
-  and connected transparently using the special fixed MDIO bus driver
-
-- finally, if the PHY is built into the switch, as is very common with
-  standalone switch packages, the PHY is probed using the slave MII bus created
-  by DSA
-
-
-SWITCHDEV
----------
-
-DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and
-more specifically with its VLAN filtering portion when configuring VLANs on top
-of per-port slave network devices. Since DSA primarily deals with
-MDIO-connected switches, although not exclusively, SWITCHDEV's
-prepare/abort/commit phases are often simplified into a prepare phase which
-checks whether the operation is supported by the DSA switch driver, and a commit
-phase which applies the changes.
-
-As of today, the only SWITCHDEV objects supported by DSA are the FDB and VLAN
-objects.
-
-Device Tree
------------
-
-DSA features a standardized binding which is documented in
-Documentation/devicetree/bindings/net/dsa/dsa.txt. PHY/MDIO library helper
-functions such as of_get_phy_mode(), of_phy_connect() are also used to query
-per-port PHY specific details: interface connection, MDIO bus location etc..
-
-Driver development
-==================
-
-DSA switch drivers need to implement a dsa_switch_ops structure which will
-contain the various members described below.
-
-register_switch_driver() registers this dsa_switch_ops in its internal list
-of drivers to probe for. unregister_switch_driver() does the exact opposite.
-
-Unless requested differently by setting the priv_size member accordingly, DSA
-does not allocate any driver private context space.
-
-Switch configuration
---------------------
-
-- tag_protocol: this is to indicate what kind of tagging protocol is supported,
-  should be a valid value from the dsa_tag_protocol enum
-
-- probe: probe routine which will be invoked by the DSA platform device upon
-  registration to test for the presence/absence of a switch device. For MDIO
-  devices, it is recommended to issue a read towards internal registers using
-  the switch pseudo-PHY and return whether this is a supported device. For other
-  buses, return a non-NULL string
-
-- setup: setup function for the switch, this function is responsible for setting
-  up the dsa_switch_ops private structure with all it needs: register maps,
-  interrupts, mutexes, locks etc.. This function is also expected to properly
-  configure the switch to separate all network interfaces from each other, that
-  is, they should be isolated by the switch hardware itself, typically by creating
-  a Port-based VLAN ID for each port and allowing only the CPU port and the
-  specific port to be in the forwarding vector. Ports that are unused by the
-  platform should be disabled. Past this function, the switch is expected to be
-  fully configured and ready to serve any kind of request. It is recommended
-  to issue a software reset of the switch during this setup function in order to
-  avoid relying on what a previous software agent such as a bootloader/firmware
-  may have previously configured.
-
-PHY devices and link management
--------------------------------
-
-- get_phy_flags: Some switches are interfaced to various kinds of Ethernet PHYs,
-  if the PHY library PHY driver needs to know about information it cannot obtain
-  on its own (e.g.: coming from switch memory mapped registers), this function
-  should return a 32-bits bitmask of "flags", that is private between the switch
-  driver and the Ethernet PHY driver in drivers/net/phy/*.
-
-- phy_read: Function invoked by the DSA slave MDIO bus when attempting to read
-  the switch port MDIO registers. If unavailable, return 0xffff for each read.
-  For builtin switch Ethernet PHYs, this function should allow reading the link
-  status, auto-negotiation results, link partner pages etc..
-
-- phy_write: Function invoked by the DSA slave MDIO bus when attempting to write
-  to the switch port MDIO registers. If unavailable return a negative error
-  code.
-
-- adjust_link: Function invoked by the PHY library when a slave network device
-  is attached to a PHY device. This function is responsible for appropriately
-  configuring the switch port link parameters: speed, duplex, pause based on
-  what the phy_device is providing.
-
-- fixed_link_update: Function invoked by the PHY library, and specifically by
-  the fixed PHY driver asking the switch driver for link parameters that could
-  not be auto-negotiated, or obtained by reading the PHY registers through MDIO.
-  This is particularly useful for specific kinds of hardware such as QSGMII,
-  MoCA or other kinds of non-MDIO managed PHYs where out of band link
-  information is obtained
-
-Ethtool operations
-------------------
-
-- get_strings: ethtool function used to query the driver's strings, will
-  typically return statistics strings, private flags strings etc.
-
-- get_ethtool_stats: ethtool function used to query per-port statistics and
-  return their values. DSA overlays slave network devices general statistics:
-  RX/TX counters from the network device, with switch driver specific statistics
-  per port
-
-- get_sset_count: ethtool function used to query the number of statistics items
-
-- get_wol: ethtool function used to obtain Wake-on-LAN settings per-port, this
-  function may, for certain implementations also query the master network device
-  Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN
-
-- set_wol: ethtool function used to configure Wake-on-LAN settings per-port,
-  direct counterpart to set_wol with similar restrictions
-
-- set_eee: ethtool function which is used to configure a switch port EEE (Green
-  Ethernet) settings, can optionally invoke the PHY library to enable EEE at the
-  PHY level if relevant. This function should enable EEE at the switch port MAC
-  controller and data-processing logic
-
-- get_eee: ethtool function which is used to query a switch port EEE settings,
-  this function should return the EEE state of the switch port MAC controller
-  and data-processing logic as well as query the PHY for its currently configured
-  EEE settings
-
-- get_eeprom_len: ethtool function returning for a given switch the EEPROM
-  length/size in bytes
-
-- get_eeprom: ethtool function returning for a given switch the EEPROM contents
-
-- set_eeprom: ethtool function writing specified data to a given switch EEPROM
-
-- get_regs_len: ethtool function returning the register length for a given
-  switch
-
-- get_regs: ethtool function returning the Ethernet switch internal register
-  contents. This function might require user-land code in ethtool to
-  pretty-print register values and registers
-
-Power management
-----------------
-
-- suspend: function invoked by the DSA platform device when the system goes to
-  suspend, should quiesce all Ethernet switch activities, but keep ports
-  participating in Wake-on-LAN active as well as additional wake-up logic if
-  supported
-
-- resume: function invoked by the DSA platform device when the system resumes,
-  should resume all Ethernet switch activities and re-configure the switch to be
-  in a fully active state
-
-- port_enable: function invoked by the DSA slave network device ndo_open
-  function when a port is administratively brought up, this function should be
-  fully enabling a given switch port. DSA takes care of marking the port with
-  BR_STATE_BLOCKING if the port is a bridge member, or BR_STATE_FORWARDING if it
-  was not, and propagating these changes down to the hardware
-
-- port_disable: function invoked by the DSA slave network device ndo_close
-  function when a port is administratively brought down, this function should be
-  fully disabling a given switch port. DSA takes care of marking the port with
-  BR_STATE_DISABLED and propagating changes to the hardware if this port is
-  disabled while being a bridge member
-
-Bridge layer
-------------
-
-- port_bridge_join: bridge layer function invoked when a given switch port is
-  added to a bridge, this function should be doing the necessary at the switch
-  level to permit the joining port from being added to the relevant logical
-  domain for it to ingress/egress traffic with other members of the bridge.
-
-- port_bridge_leave: bridge layer function invoked when a given switch port is
-  removed from a bridge, this function should be doing the necessary at the
-  switch level to deny the leaving port from ingress/egress traffic from the
-  remaining bridge members. When the port leaves the bridge, it should be aged
-  out at the switch hardware for the switch to (re) learn MAC addresses behind
-  this port.
-
-- port_stp_state_set: bridge layer function invoked when a given switch port STP
-  state is computed by the bridge layer and should be propagated to switch
-  hardware to forward/block/learn traffic. The switch driver is responsible for
-  computing a STP state change based on current and asked parameters and perform
-  the relevant ageing based on the intersection results
-
-Bridge VLAN filtering
----------------------
-
-- port_vlan_filtering: bridge layer function invoked when the bridge gets
-  configured for turning on or off VLAN filtering. If nothing specific needs to
-  be done at the hardware level, this callback does not need to be implemented.
-  When VLAN filtering is turned on, the hardware must be programmed with
-  rejecting 802.1Q frames which have VLAN IDs outside of the programmed allowed
-  VLAN ID map/rules. If there is no PVID programmed into the switch port,
-  untagged frames must be rejected as well. When turned off the switch must
-  accept any 802.1Q frames irrespective of their VLAN ID, and untagged frames are
-  allowed.
-
-- port_vlan_prepare: bridge layer function invoked when the bridge prepares the
-  configuration of a VLAN on the given port. If the operation is not supported
-  by the hardware, this function should return -EOPNOTSUPP to inform the bridge
-  code to fallback to a software implementation. No hardware setup must be done
-  in this function. See port_vlan_add for this and details.
-
-- port_vlan_add: bridge layer function invoked when a VLAN is configured
-  (tagged or untagged) for the given switch port
-
-- port_vlan_del: bridge layer function invoked when a VLAN is removed from the
-  given switch port
-
-- port_vlan_dump: bridge layer function invoked with a switchdev callback
-  function that the driver has to call for each VLAN the given port is a member
-  of. A switchdev object is used to carry the VID and bridge flags.
-
-- port_fdb_add: bridge layer function invoked when the bridge wants to install a
-  Forwarding Database entry, the switch hardware should be programmed with the
-  specified address in the specified VLAN Id in the forwarding database
-  associated with this VLAN ID. If the operation is not supported, this
-  function should return -EOPNOTSUPP to inform the bridge code to fallback to
-  a software implementation.
-
-Note: VLAN ID 0 corresponds to the port private database, which, in the context
-of DSA, would be the its port-based VLAN, used by the associated bridge device.
-
-- port_fdb_del: bridge layer function invoked when the bridge wants to remove a
-  Forwarding Database entry, the switch hardware should be programmed to delete
-  the specified MAC address from the specified VLAN ID if it was mapped into
-  this port forwarding database
-
-- port_fdb_dump: bridge layer function invoked with a switchdev callback
-  function that the driver has to call for each MAC address known to be behind
-  the given port. A switchdev object is used to carry the VID and FDB info.
-
-- port_mdb_prepare: bridge layer function invoked when the bridge prepares the
-  installation of a multicast database entry. If the operation is not supported,
-  this function should return -EOPNOTSUPP to inform the bridge code to fallback
-  to a software implementation. No hardware setup must be done in this function.
-  See port_fdb_add for this and details.
-
-- port_mdb_add: bridge layer function invoked when the bridge wants to install
-  a multicast database entry, the switch hardware should be programmed with the
-  specified address in the specified VLAN ID in the forwarding database
-  associated with this VLAN ID.
-
-Note: VLAN ID 0 corresponds to the port private database, which, in the context
-of DSA, would be the its port-based VLAN, used by the associated bridge device.
-
-- port_mdb_del: bridge layer function invoked when the bridge wants to remove a
-  multicast database entry, the switch hardware should be programmed to delete
-  the specified MAC address from the specified VLAN ID if it was mapped into
-  this port forwarding database.
-
-- port_mdb_dump: bridge layer function invoked with a switchdev callback
-  function that the driver has to call for each MAC address known to be behind
-  the given port. A switchdev object is used to carry the VID and MDB info.
-
-TODO
-====
-
-Making SWITCHDEV and DSA converge towards an unified codebase
--------------------------------------------------------------
-
-SWITCHDEV properly takes care of abstracting the networking stack with offload
-capable hardware, but does not enforce a strict switch device driver model. On
-the other DSA enforces a fairly strict device driver model, and deals with most
-of the switch specific. At some point we should envision a merger between these
-two subsystems and get the best of both worlds.
-
-Other hanging fruits
---------------------
-
-- making the number of ports fully dynamic and not dependent on DSA_MAX_PORTS
-- allowing more than one CPU/management interface:
-  http://comments.gmane.org/gmane.linux.network/365657
-- porting more drivers from other vendors:
-  http://comments.gmane.org/gmane.linux.network/365510