aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/include/linux/dsa (follow)
AgeCommit message (Collapse)AuthorFilesLines
2020-01-05net: dsa: Make deferred_xmit private to sja1105Vladimir Oltean1-0/+3
There are 3 things that are wrong with the DSA deferred xmit mechanism: 1. Its introduction has made the DSA hotpath ever so slightly more inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs to be initialized to false for every transmitted frame, in order to figure out whether the driver requested deferral or not (a very rare occasion, rare even for the only driver that does use this mechanism: sja1105). That was necessary to avoid kfree_skb from freeing the skb. 2. Because L2 PTP is a link-local protocol like STP, it requires management routes and deferred xmit with this switch. But as opposed to STP, the deferred work mechanism needs to schedule the packet rather quickly for the TX timstamp to be collected in time and sent to user space. But there is no provision for controlling the scheduling priority of this deferred xmit workqueue. Too bad this is a rather specific requirement for a feature that nobody else uses (more below). 3. Perhaps most importantly, it makes the DSA core adhere a bit too much to the NXP company-wide policy "Innovate Where It Doesn't Matter". The sja1105 is probably the only DSA switch that requires some frames sent from the CPU to be routed to the slave port via an out-of-band configuration (register write) rather than in-band (DSA tag). And there are indeed very good reasons to not want to do that: if that out-of-band register is at the other end of a slow bus such as SPI, then you limit that Ethernet flow's throughput to effectively the throughput of the SPI bus. So hardware vendors should definitely not be encouraged to design this way. We do _not_ want more widespread use of this mechanism. Luckily we have a solution for each of the 3 issues: For 1, we can just remove that variable in the skb->cb and counteract the effect of kfree_skb with skb_get, much to the same effect. The advantage, of course, being that anybody who doesn't use deferred xmit doesn't need to do any extra operation in the hotpath. For 2, we can create a kernel thread for each port's deferred xmit work. If the user switch ports are named swp0, swp1, swp2, the kernel threads will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15 character length limit on kernel thread names). With this, the user can change the scheduling priority with chrt $(pidof swp2_xmit). For 3, we can actually move the entire implementation to the sja1105 driver. So this patch deletes the generic implementation from the DSA core and adds a new one, more adequate to the requirements of PTP TX timestamping, in sja1105_main.c. Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-05net: dsa: sja1105: Always send through management routes in slot 0Vladimir Oltean1-1/+0
I finally found out how the 4 management route slots are supposed to be used, but.. it's not worth it. The description from the comment I've just deleted in this commit is still true: when more than 1 management slot is active at the same time, the switch will match frames incoming [from the CPU port] on the lowest numbered management slot that matches the frame's DMAC. My issue was that one was not supposed to statically assign each port a slot. Yes, there are 4 slots and also 4 non-CPU ports, but that is a mere coincidence. Instead, the switch can be used like this: every management frame gets a slot at the right of the most recently assigned slot: Send mgmt frame 1 through S0: S0 x x x Send mgmt frame 2 through S1: S0 S1 x x Send mgmt frame 3 through S2: S0 S1 S2 x Send mgmt frame 4 through S3: S0 S1 S2 S3 The difference compared to the old usage is that the transmission of frames 1-4 doesn't need to wait until the completion of the management route. It is safe to use a slot to the right of the most recently used one, because by protocol nobody will program a slot to your left and "steal" your route towards the correct egress port. So there is a potential throughput benefit here. But mgmt frame 5 has no more free slot to use, so it has to wait until _all_ of S0, S1, S2, S3 are full, in order to use S0 again. And that's actually exactly the problem: I was looking for something that would bring more predictable transmission latency, but this is exactly the opposite: 3 out of 4 frames would be transmitted quicker, but the 4th would draw the short straw and have a worse worst-case latency than before. Useless. Things are made even worse by PTP TX timestamping, which is something I won't go deeply into here. Suffice to say that the fact there is a driver-level lock on the SPI bus offsets any potential throughput gains that parallelism might bring. So there's no going back to the multi-slot scheme, remove the "mgmt_slot" variable from sja1105_port and the dummy static assignment made at probe time. While passing by, also remove the assignment to casc_port altogether. Don't pretend that we support cascaded setups. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30net: dsa: sja1105: Use PTP core's dedicated kernel thread for RX timestampingVladimir Oltean1-2/+0
And move the queue of skb's waiting for RX timestamps into the ptp_data structure, since it isn't needed if PTP is not compiled. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-02net: dsa: sja1105: Fix sleeping while atomic in .port_hwtstamp_setVladimir Oltean1-1/+3
Currently this stack trace can be seen with CONFIG_DEBUG_ATOMIC_SLEEP=y: [ 41.568348] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909 [ 41.576757] in_atomic(): 1, irqs_disabled(): 0, pid: 208, name: ptp4l [ 41.583212] INFO: lockdep is turned off. [ 41.587123] CPU: 1 PID: 208 Comm: ptp4l Not tainted 5.3.0-rc6-01445-ge950f2d4bc7f-dirty #1827 [ 41.599873] [<c0313d7c>] (unwind_backtrace) from [<c030e13c>] (show_stack+0x10/0x14) [ 41.607584] [<c030e13c>] (show_stack) from [<c1212d50>] (dump_stack+0xd4/0x100) [ 41.614863] [<c1212d50>] (dump_stack) from [<c037dfc8>] (___might_sleep+0x1c8/0x2b4) [ 41.622574] [<c037dfc8>] (___might_sleep) from [<c122ea90>] (__mutex_lock+0x48/0xab8) [ 41.630368] [<c122ea90>] (__mutex_lock) from [<c122f51c>] (mutex_lock_nested+0x1c/0x24) [ 41.638340] [<c122f51c>] (mutex_lock_nested) from [<c0c6fe08>] (sja1105_static_config_reload+0x30/0x27c) [ 41.647779] [<c0c6fe08>] (sja1105_static_config_reload) from [<c0c7015c>] (sja1105_hwtstamp_set+0x108/0x1cc) [ 41.657562] [<c0c7015c>] (sja1105_hwtstamp_set) from [<c0feb650>] (dev_ifsioc+0x18c/0x330) [ 41.665788] [<c0feb650>] (dev_ifsioc) from [<c0febbd8>] (dev_ioctl+0x320/0x6e8) [ 41.673064] [<c0febbd8>] (dev_ioctl) from [<c0f8b1f4>] (sock_ioctl+0x334/0x5e8) [ 41.680340] [<c0f8b1f4>] (sock_ioctl) from [<c05404a8>] (do_vfs_ioctl+0xb0/0xa10) [ 41.687789] [<c05404a8>] (do_vfs_ioctl) from [<c0540e3c>] (ksys_ioctl+0x34/0x58) [ 41.695151] [<c0540e3c>] (ksys_ioctl) from [<c0301000>] (ret_fast_syscall+0x0/0x28) [ 41.702768] Exception stack(0xe8495fa8 to 0xe8495ff0) [ 41.707796] 5fa0: beff4a8c 00000001 00000011 000089b0 beff4a8c beff4a80 [ 41.715933] 5fc0: beff4a8c 00000001 0000000c 00000036 b6fa98c8 004e19c1 00000001 00000000 [ 41.724069] 5fe0: 004dcedc beff4a6c 004c0738 b6e7af4c [ 41.729860] BUG: scheduling while atomic: ptp4l/208/0x00000002 [ 41.735682] INFO: lockdep is turned off. Enabling RX timestamping will logically disturb the fastpath (processing of meta frames). Replace bool hwts_rx_en with a bit that is checked atomically from the fastpath and temporarily unset from the sleepable context during a change of the RX timestamping process (a destructive operation anyways, requires switch reset). If found unset, the fastpath (net/dsa/tag_sja1105.c) will just drop any received meta frame and not take the meta_lock at all. Fixes: a602afd200f5 ("net: dsa: sja1105: Expose PTP timestamping ioctls to userspace") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08net: dsa: sja1105: Add a state machine for RX timestampingVladimir Oltean1-0/+7
Meta frame reception relies on the hardware keeping its promise that it will send no other traffic towards the CPU port between a link-local frame and a meta frame. Otherwise there is no other way to associate the meta frame with the link-local frame it's holding a timestamp of. The receive function is made stateful, and buffers a timestampable frame until its meta frame arrives, then merges the two, drops the meta and releases the link-local frame up the stack. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08net: dsa: sja1105: Add a global sja1105_tagger_data structureVladimir Oltean1-0/+15
This will be used to keep state for RX timestamping. It is global because the switch serializes timestampable and meta frames when trapping them towards the CPU port (lower port indices have higher priority) and therefore having one state machine per port would create unnecessary complications. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08net: dsa: sja1105: Build a minimal understanding of meta framesVladimir Oltean1-0/+11
Meta frames are sent on the CPU port by the switch if RX timestamping is enabled. They contain a partial timestamp of the previous frame. They are Ethernet frames with the Ethernet header constructed out of: - SJA1105_META_DMAC - SJA1105_META_SMAC - ETH_P_SJA1105_META The Ethernet payload will be decoded in a follow-up patch. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08net: dsa: sja1105: Add logic for TX timestampingVladimir Oltean1-0/+1
On TX, timestamping is performed synchronously from the port_deferred_xmit worker thread. In management routes, the switch is requested to take egress timestamps (again partial), which are reconstructed and appended to a clone of the skb that was just sent. The cloning is done by DSA and we retrieve the pointer from the structure that DSA keeps in skb->cb. Then these clones are enqueued to the socket's error queue for application-level processing. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-08net: dsa: tag_8021q: Create helper function for removing VLAN headerVladimir Oltean1-9/+7
This removes the existing implementation from tag_sja1105, which was partially incorrect (it was not changing the MAC header offset, thereby leaving it to point 4 bytes earlier than it should have). This overwrites the VLAN tag by moving the Ethernet source and destination MACs 4 bytes to the right. Then skb->data (assumed to be pointing immediately after the EtherType) is temporarily pushed to the beginning of the new Ethernet header, the new Ethernet header offset and length are recorded, then skb->data is moved back to where it was. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31net: dsa: sja1105: Don't store frame type in skb->cbVladimir Oltean1-12/+0
Due to a confusion I thought that eth_type_trans() was called by the network stack whereas it can actually be called by network drivers to figure out the skb protocol and next packet_type handlers. In light of the above, it is not safe to store the frame type from the DSA tagger's .filter callback (first entry point on RX path), since GRO is yet to be invoked on the received traffic. Hence it is very likely that the skb->cb will actually get overwritten between eth_type_trans() and the actual DSA packet_type handler. Of course, what this patch fixes is the actual overwriting of the SJA1105_SKB_CB(skb)->type field from the GRO layer, which made all frames be seen as SJA1105_FRAME_TYPE_NORMAL (0). Fixes: 227d07a07ef1 ("net: dsa: sja1105: Add support for traffic through standalone ports") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05net: dsa: sja1105: Add support for traffic through standalone portsVladimir Oltean1-9/+26
In order to support this, we are creating a make-shift switch tag out of a VLAN trunk configured on the CPU port. Termination of normal traffic on switch ports only works when not under a vlan_filtering bridge. Termination of management (PTP, BPDU) traffic works under all circumstances because it uses a different tagging mechanism (incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105. There are two types of traffic: regular and link-local. The link-local traffic received on the CPU port is trapped from the switch's regular forwarding decisions because it matched one of the two DMAC filters for management traffic. On transmission, the switch requires special massaging for these link-local frames. Due to a weird implementation of the switching IP, by default it drops link-local frames that originate on the CPU port. It needs to be told where to forward them to, through an SPI command ("management route") that is valid for only a single frame. So when we're sending link-local traffic, we are using the dsa_defer_xmit mechanism. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05net: dsa: Optional VLAN-based port separation for switches without taggingVladimir Oltean1-0/+76
This patch provides generic DSA code for using VLAN (802.1Q) tags for the same purpose as a dedicated switch tag for injection/extraction. It is based on the discussions and interest that has been so far expressed in https://www.spinics.net/lists/netdev/msg556125.html. Unlike all other DSA-supported tagging protocols, CONFIG_NET_DSA_TAG_8021Q does not offer a complete solution for drivers (nor can it). Instead, it provides generic code that driver can opt into calling: - dsa_8021q_xmit: Inserts a VLAN header with the specified contents. Can be called from another tagging protocol's xmit function. Currently the LAN9303 driver is inserting headers that are simply 802.1Q with custom fields, so this is an opportunity for code reuse. - dsa_8021q_rcv: Retrieves the TPID and TCI from a VLAN-tagged skb. Removing the VLAN header is left as a decision for the caller to make. - dsa_port_setup_8021q_tagging: For each user port, installs an Rx VID and a Tx VID, for proper untagged traffic identification on ingress and steering on egress. Also sets up the VLAN trunk on the upstream (CPU or DSA) port. Drivers are intentionally left to call this function explicitly, depending on the context and hardware support. The expected switch behavior and VLAN semantics should not be violated under any conditions. That is, after calling dsa_port_setup_8021q_tagging, the hardware should still pass all ingress traffic, be it tagged or untagged. For uniformity with the other tagging protocols, a module for the dsa_8021q_netdev_ops structure is registered, but the typical usage is to set up another tagging protocol which selects CONFIG_NET_DSA_TAG_8021Q, and calls the API from tag_8021q.h. Null function definitions are also provided so that a "depends on" is not forced in the Kconfig. This tagging protocol only works when switch ports are standalone, or when they are added to a VLAN-unaware bridge. It will probably remain this way for the reasons below. When added to a bridge that has vlan_filtering 1, the bridge core will install its own VLANs and reset the pvids through switchdev. For the bridge core, switchdev is a write-only pipe. All VLAN-related state is kept in the bridge core and nothing is read from DSA/switchdev or from the driver. So the bridge core will break this port separation because it will install the vlan_default_pvid into all switchdev ports. Even if we could teach the bridge driver about switchdev preference of a certain vlan_default_pvid (task difficult in itself since the current setting is per-bridge but we would need it per-port), there would still exist many other challenges. Firstly, in the DSA rcv callback, a driver would have to perform an iterative reverse lookup to find the correct switch port. That is because the port is a bridge slave, so its Rx VID (port PVID) is subject to user configuration. How would we ensure that the user doesn't reset the pvid to a different value (which would make an O(1) translation impossible), or to a non-unique value within this DSA switch tree (which would make any translation impossible)? Finally, not all switch ports are equal in DSA, and that makes it difficult for the bridge to be completely aware of this anyway. The CPU port needs to transmit tagged packets (VLAN trunk) in order for the DSA rcv code to be able to decode source information. But the bridge code has absolutely no idea which switch port is the CPU port, if nothing else then just because there is no netdevice registered by DSA for the CPU port. Also DSA does not currently allow the user to specify that they want the CPU port to do VLAN trunking anyway. VLANs are added to the CPU port using the same flags as they were added on the user port. So the VLANs installed by dsa_port_setup_8021q_tagging per driver request should remain private from the bridge's and user's perspective, and should not alter the VLAN semantics observed by the user. In the current implementation a VLAN range ending at 4095 (VLAN_N_VID) is reserved for this purpose. Each port receives a unique Rx VLAN and a unique Tx VLAN. Separate VLANs are needed for Rx and Tx because they serve different purposes: on Rx the switch must process traffic as untagged and process it with a port-based VLAN, but with care not to hinder bridging. On the other hand, the Tx VLAN is where the reachability restrictions are imposed, since by tagging frames in the xmit callback we are telling the switch onto which port to steer the frame. Some general guidance on how this support might be employed for real-life hardware (some comments made by Florian Fainelli): - If the hardware supports VLAN tag stacking, it should somehow back up its private VLAN settings when the bridge tries to override them. Then the driver could re-apply them as outer tags. Dedicating an outer tag per bridge device would allow identical inner tag VID numbers to co-exist, yet preserve broadcast domain isolation. - If the switch cannot handle VLAN tag stacking, it should disable this port separation when added as slave to a vlan_filtering bridge, in that case having reduced functionality. - Drivers for old switches that don't support the entire VLAN_N_VID range will need to rework the current range selection mechanism. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03net: dsa: sja1105: Add support for VLAN operationsVladimir Oltean1-0/+4
VLAN filtering cannot be properly disabled in SJA1105. So in order to emulate the "no VLAN awareness" behavior (not dropping traffic that is tagged with a VID that isn't configured on the port), we need to hack another switch feature: programmable TPID (which is 0x8100 for 802.1Q). We are reprogramming the TPID to a bogus value which leaves the switch thinking that all traffic is untagged, and therefore accepts it. Under a vlan_filtering bridge, the proper TPID of ETH_P_8021Q is installed again, and the switch starts identifying 802.1Q-tagged traffic. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-03net: dsa: Introduce driver for NXP SJA1105 5-port L2 switchVladimir Oltean1-0/+19
At this moment the following is supported: * Link state management through phylib * Autonomous L2 forwarding managed through iproute2 bridge commands. IP termination must be done currently through the master netdevice, since the switch is unmanaged at this point and using DSA_TAG_PROTO_NONE. Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Georg Waibel <georg.waibel@sensor-technik.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-04net: dsa: lan9303: phy_addr_sel_strap rename and retypeEgil Hjelmeland1-1/+1
chip->phy_addr_sel_strap is declared as a bool, but is also used as an integer address base. Rename 'phy_addr_sel_strap' to 'phy_addr_base', and change type to int. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-08net: dsa: lan9303: Protect ALR operations with mutexEgil Hjelmeland1-0/+1
ALR table operations are a sequence of related register operations which should be protected from concurrent access. The alr_cache should also be protected. Add alr_mutex doing that. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08net: dsa: lan9303: Correct register names in commentsEgil Hjelmeland1-3/+5
Two comments refer to registers, but lack the LAN9303_ prefix. Fix that. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03net: Define eth_stp_addr in linux/etherdevice.hEgil Hjelmeland1-2/+0
The lan9303 driver defines eth_stp_addr as a synonym to eth_reserved_addr_base to get the STP ethernet address 01:80:c2:00:00:00. eth_reserved_addr_base is also used to define the start of Bridge Reserved ethernet address range, which happen to be the STP address. br_dev_setup refer to eth_reserved_addr_base as a definition of STP address. Clean up by: - Move the eth_stp_addr definition to linux/etherdevice.h - Use eth_stp_addr instead of eth_reserved_addr_base in br_dev_setup. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-01net: dsa: lan9303: Add STP ALR entry on port 0Egil Hjelmeland1-0/+2
STP BPDUs arriving on user ports must sent to CPU port only, for processing by the SW bridge. Add an ALR entry with STP state override to fix that. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-27net: dsa: lan9303: Move struct lan9303 to include/linux/dsa/lan9303.hEgil Hjelmeland1-0/+36
The next patch require net/dsa/tag_lan9303.c to access struct lan9303. Therefore move struct lan9303 definitions from drivers/net/dsa/lan9303.h to new file include/linux/dsa/lan9303.h. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>