aboutsummaryrefslogtreecommitdiffstatshomepage
path: root/drivers/net/ethernet/mscc/ocelot.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2022-09-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+7
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridgeVladimir Oltean1-0/+7
Currently the following set of commands fails: $ ip link add br0 type bridge # vlan_filtering 0 $ ip link set swp0 master br0 $ bridge vlan port vlan-id swp0 1 PVID Egress Untagged $ bridge vlan add dev swp0 vid 10 Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs. Dumping ocelot->vlans, one can see that the 2 egress-untagged VLANs on swp0 are vid 1 (the bridge PVID) and vid 4094, a PVID used privately by the driver for VLAN-unaware bridging. So this is why bridge vid 10 is refused, despite 'bridge vlan' showing a single egress untagged VLAN. As mentioned in the comment added, having this private VLAN does not impose restrictions to the hardware configuration, yet it is a bookkeeping problem. There are 2 possible solutions. One is to make the functions that operate on VLAN-unaware pvids: - ocelot_add_vlan_unaware_pvid() - ocelot_del_vlan_unaware_pvid() - ocelot_port_setup_dsa_8021q_cpu() - ocelot_port_teardown_dsa_8021q_cpu() call something different than ocelot_vlan_member_(add|del)(), the latter being the real problem, because it allocates a struct ocelot_bridge_vlan *vlan which it adds to ocelot->vlans. We don't really *need* the private VLANs in ocelot->vlans, it's just that we have the extra convenience of having the vlan->portmask cached in software (whereas without these structures, we'd have to create a raw ocelot_vlant_rmw_mask() procedure which reads back the current port mask from hardware). The other solution is to filter out the private VLANs from ocelot_port_num_untagged_vlans(), since they aren't what callers care about. We only need to do this to the mentioned function and not to ocelot_port_num_tagged_vlans(), because private VLANs are never egress-tagged. Nothing else seems to be broken in either solution, but the first one requires more rework which will conflict with the net-next change 36a0bf443585 ("net: mscc: ocelot: set up tag_8021q CPU ports independent of user port affinity"), and I'd like to avoid that. So go with the other one. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220927122042.1100231-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20net: dsa: felix: add support for changing DSA masterVladimir Oltean1-1/+2
Changing the DSA master means different things depending on the tagging protocol in use. For NPI mode ("ocelot" and "seville"), there is a single port which can be configured as NPI, but DSA only permits changing the CPU port affinity of user ports one by one. So changing a user port to a different NPI port globally changes what the NPI port is, and breaks the user ports still using the old one. To address this while still permitting the change of the NPI port, require that the user ports which are still affine to the old NPI port are down, and cannot be brought up until they are all affine to the same NPI port. The tag_8021q mode ("ocelot-8021q") is more flexible, in that each user port can be freely assigned to one CPU port or to the other. This works by filtering host addresses towards both tag_8021q CPU ports, and then restricting the forwarding from a certain user port only to one of the two tag_8021q CPU ports. Additionally, the 2 tag_8021q CPU ports can be placed in a LAG. This works by enabling forwarding via PGID_SRC from a certain user port towards the logical port ID containing both tag_8021q CPU ports, but then restricting forwarding per packet, via the LAG hash codes in PGID_AGGR, to either one or the other. When we change the DSA master to a LAG device, DSA guarantees us that the LAG has at least one lower interface as a physical DSA master. But DSA masters can come and go as lowers of that LAG, and ds->ops->port_change_master() will not get called, because the DSA master is still the same (the LAG). So we need to hook into the ds->ops->port_lag_{join,leave} calls on the CPU ports and update the logical port ID of the LAG that user ports are assigned to. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: propagate extack to port_lag_joinVladimir Oltean1-2/+6
Drivers could refuse to offload a LAG configuration for a variety of reasons, mainly having to do with its TX type. Additionally, since DSA masters may now also be LAG interfaces, and this will translate into a call to port_lag_join on the CPU ports, there may be extra restrictions there. Propagate the netlink extack to this DSA method in order for drivers to give a meaningful error message back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-09net: mscc: ocelot: move more PTP code from the lib to ocelot_ptp.cVladimir Oltean1-478/+0
Decongest ocelot.c a bit more by moving all PTP related logic (including timestamp processing and PTP packet traps) to ocelot_ptp.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: mscc: ocelot: unexport ocelot_port_fdb_do_dump from the common libVladimir Oltean1-44/+0
ocelot_port_fdb_do_dump() is only used by ocelot_net.c, so move it there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: mscc: ocelot: move stats code to ocelot_stats.cVladimir Oltean1-207/+7
The main C file of the ocelot switch lib, ocelot.c, is getting larger and larger, and there are plans to add more logic related to stats. So it seems like an appropriate moment to split the statistics code to a new file. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: dsa: felix: check the 32-bit PSFP stats against overflowVladimir Oltean1-0/+3
The Felix PSFP counters suffer from the same problem as the ocelot ndo_get_stats64 ones - they are 32-bit, so they can easily overflow and this can easily go undetected. Add a custom hook in ocelot_check_stats_work() through which driver specific actions can be taken, and update the stats for the existing PSFP filters from that hook. Previously, vsc9959_psfp_filter_add() and vsc9959_psfp_filter_del() were serialized with respect to each other via rtnl_lock(). However, with the new entry point into &psfp->sfi_list coming from the periodic worker, we now need an explicit mutex to serialize access to these lists. We used to keep a struct felix_stream_filter_counters on stack, through which vsc9959_psfp_stats_get() - a FLOW_CLS_STATS callback - would retrieve data from vsc9959_psfp_counters_get(). We need to become smarter about that in 3 ways: - we need to keep a persistent set of counters for each stream instead of keeping them on stack - we need to promote those counters from u32 to u64, and create a procedure that properly keeps 64-bit counters. Since we clear the hardware counters anyway, and we poll every 2 seconds, a simple increment of a u64 counter with a u32 value will perfectly do the job. - FLOW_CLS_STATS also expect incremental counters, so we also need to zeroize our u64 counters every time sch_flower calls us Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-09net: mscc: ocelot: make access to STAT_VIEW sleepable againVladimir Oltean1-11/+37
To support SPI-controlled switches in the future, access to SYS_STAT_CFG_STAT_VIEW needs to be done outside of any spinlock protected region, but it still needs to be serialized (by a mutex). Split the ocelot->stats_lock spinlock into a mutex that serializes indirect access to hardware registers (ocelot->stat_view_lock) and a spinlock that serializes access to the u64 ocelot->stats array. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-23net: mscc: ocelot: adjust forwarding domain for CPU ports in a LAGVladimir Oltean1-0/+19
Currently when we have 2 CPU ports configured for DSA tag_8021q mode and we put them in a LAG, a PGID dump looks like this: PGID_SRC[0] = ports 4, PGID_SRC[1] = ports 4, PGID_SRC[2] = ports 4, PGID_SRC[3] = ports 4, PGID_SRC[4] = ports 0, 1, 2, 3, 4, 5, PGID_SRC[5] = no ports (ports 0-3 are user ports, ports 4 and 5 are CPU ports) There are 2 problems with the configuration above: - user ports should enable forwarding towards both CPU ports, not just 4, and the aggregation PGIDs should prune one CPU port or the other from the destination port mask, based on a hash computed from packet headers. - CPU ports should not be allowed to forward towards themselves and also not towards other ports in the same LAG as themselves The first problem requires fixing up the PGID_SRC of user ports, when ocelot_port_assigned_dsa_8021q_cpu_mask() is called. We need to say that when a user port is assigned to a tag_8021q CPU port and that port is in a LAG, it should forward towards all ports in that LAG. The second problem requires fixing up the PGID_SRC of port 4, to remove ports 4 and 5 (in a LAG) from the allowed destinations. After this change, the PGID source masks look as follows: PGID_SRC[0] = ports 4, 5, PGID_SRC[1] = ports 4, 5, PGID_SRC[2] = ports 4, 5, PGID_SRC[3] = ports 4, 5, PGID_SRC[4] = ports 0, 1, 2, 3, PGID_SRC[5] = no ports Note that PGID_SRC[5] still looks weird (it should say "0, 1, 2, 3" just like PGID_SRC[4] does), but I've tested forwarding through this CPU port and it doesn't seem like anything is affected (it appears that PGID_SRC[4] is being looked up on forwarding from the CPU, since both ports 4 and 5 have logical port ID 4). The reason why it looks weird is because we've never called ocelot_port_assign_dsa_8021q_cpu() for any user port towards port 5 (all user ports are assigned to port 4 which is in a LAG with 5). Since things aren't broken, I'm willing to leave it like that for now and just document the oddity. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-23net: mscc: ocelot: set up tag_8021q CPU ports independent of user port affinityVladimir Oltean1-32/+32
This is a partial revert of commit c295f9831f1d ("net: mscc: ocelot: switch from {,un}set to {,un}assign for tag_8021q CPU ports"), because as it turns out, this isn't how tag_8021q CPU ports under a LAG are supposed to work. Under that scenario, all user ports are "assigned" to the single tag_8021q CPU port represented by the logical port corresponding to the bonding interface. So one CPU port in a LAG would have is_dsa_8021q_cpu set to true (the one whose physical port ID is equal to the logical port ID), and the other one to false. In turn, this makes 2 undesirable things happen: (1) PGID_CPU contains only the first physical CPU port, rather than both (2) only the first CPU port will be added to the private VLANs used by ocelot for VLAN-unaware bridging To make the driver behave in the same way for both bonded CPU ports, we need to bring back the old concept of setting up a port as a tag_8021q CPU port, and this is what deals with VLAN membership and PGID_CPU updating. But we also need the CPU port "assignment" (the user to CPU port affinity), and this is what updates the PGID_SRC forwarding rules. All DSA CPU ports are statically configured for tag_8021q mode when the tagging protocol is changed to ocelot-8021q. User ports are "assigned" to one CPU port or the other dynamically (this will be handled by a future change). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-17net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offsetVladimir Oltean1-6/+5
With so many counter addresses recently discovered as being wrong, it is desirable to at least have a central database of information, rather than two: one through the SYS_COUNT_* registers (used for ndo_get_stats64), and the other through the offset field of struct ocelot_stat_layout elements (used for ethtool -S). The strategy will be to keep the SYS_COUNT_* definitions as the single source of truth, but for that we need to expand our current definitions to cover all registers. Then we need to convert the ocelot region creation logic, and stats worker, to the read semantics imposed by going through SYS_COUNT_* absolute register addresses, rather than offsets of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been SYS_CNT, by the way). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: make struct ocelot_stat_layout array indexableVladimir Oltean1-13/+27
The ocelot counters are 32-bit and require periodic reading, every 2 seconds, by ocelot_port_update_stats(), so that wraparounds are detected. Currently, the counters reported by ocelot_get_stats64() come from the 32-bit hardware counters directly, rather than from the 64-bit accumulated ocelot->stats, and this is a problem for their integrity. The strategy is to make ocelot_get_stats64() able to cherry-pick individual stats from ocelot->stats the way in which it currently reads them out from SYS_COUNT_* registers. But currently it can't, because ocelot->stats is an opaque u64 array that's used only to feed data into ethtool -S. To solve that problem, we need to make ocelot->stats indexable, and associate each element with an element of struct ocelot_stat_layout used by ethtool -S. This makes ocelot_stat_layout a fat (and possibly sparse) array, so we need to change the way in which we access it. We no longer need OCELOT_STAT_END as a sentinel, because we know the array's size (OCELOT_NUM_STATS). We just need to skip the array elements that were left unpopulated for the switch revision (ocelot, felix, seville). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17net: mscc: ocelot: turn stats_lock into a spinlockVladimir Oltean1-6/+5
ocelot_get_stats64() currently runs unlocked and therefore may collide with ocelot_port_update_stats() which indirectly accesses the same counters. However, ocelot_get_stats64() runs in atomic context, and we cannot simply take the sleepable ocelot->stats_lock mutex. We need to convert it to an atomic spinlock first. Do that as a preparatory change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-19net: dsa: felix: update base time of time-aware shaper when adjusting PTP timeXiaoliang Yang1-0/+1
When adjusting the PTP clock, the base time of the TAS configuration will become unreliable. We need reset the TAS configuration by using a new base time. For example, if the driver gets a base time 0 of Qbv configuration from user, and current time is 20000. The driver will set the TAS base time to be 20000. After the PTP clock adjustment, the current time becomes 10000. If the TAS base time is still 20000, it will be a future time, and TAS entry list will stop running. Another example, if the current time becomes to be 10000000 after PTP clock adjust, a large time offset can cause the hardware to hang. This patch introduces a tas_clock_adjust() function to reset the TAS module by using a new base time after the PTP clock adjustment. This can avoid issues above. Due to PTP clock adjustment can occur at any time, it may conflict with the TAS configuration. We introduce a new TAS lock to serialize the access to the TAS registers. Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-23net: mscc: ocelot: switch from {,un}set to {,un}assign for tag_8021q CPU portsVladimir Oltean1-46/+74
There is a desire for the felix driver to gain support for multiple tag_8021q CPU ports, but the current model prevents it. This is because ocelot_apply_bridge_fwd_mask() only takes into consideration whether a port is a tag_8021q CPU port, but not whose CPU port it is. We need a model where we can have a direct affinity between an ocelot port and a tag_8021q CPU port. This serves as the basis for multiple CPU ports. Declare a "dsa_8021q_cpu" backpointer in struct ocelot_port which encodes that affinity. Repurpose the "ocelot_set_dsa_8021q_cpu" API to "ocelot_assign_dsa_8021q_cpu" to express the change of paradigm. Note that this change makes the first practical use of the new ocelot_port->index field in ocelot_port_unassign_dsa_8021q_cpu(), where we need to remove the old tag_8021q CPU port from the reserved VLAN range. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-23net: dsa: felix: directly call ocelot_port_{set,unset}_dsa_8021q_cpuVladimir Oltean1-0/+8
Absorb the final details of calling ocelot_port_{,un}set_dsa_8021q_cpu(), i.e. the need to lock &ocelot->fwd_domain_lock, into the callee, to simplify the caller and permit easier code reuse later. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-23net: dsa: felix: update bridge fwd mask from ocelot lib when changing tag_8021q CPUVladimir Oltean1-2/+5
Add more logic to ocelot_port_{,un}set_dsa_8021q_cpu() from the ocelot switch lib by encapsulating the ocelot_apply_bridge_fwd_mask() call that felix used to have. This is necessary because the CPU port change procedure will also need to do this, and it's good to reduce code duplication by having an entry point in the ocelot switch lib that does all that is needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-23net: dsa: felix: move the updating of PGID_CPU to the ocelot libVladimir Oltean1-0/+31
PGID_CPU must be updated every time a port is configured or unconfigured as a tag_8021q CPU port. The ocelot switch lib already has a hook for that operation, so move the updating of PGID_CPU to those hooks. These bits are pretty specific to DSA, so normally I would keep them out of the common switch lib, but when tag_8021q is in use, this has implications upon the forwarding mask determined by ocelot_apply_bridge_fwd_mask() and called extensively by the switch lib. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-12net: dsa: felix: bring the NPI port indirection for host flooding to surfaceVladimir Oltean1-3/+0
For symmetry with host FDBs and MDBs where the indirection is now handled outside the ocelot switch lib, do the same for bridge port flags (unicast/multicast/broadcast flooding). The only caller of the ocelot switch lib which uses the NPI port is the Felix DSA driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12net: dsa: felix: bring the NPI port indirection for host MDBs to surfaceVladimir Oltean1-6/+0
For symmetry with host FDBs where the indirection is now handled outside the ocelot switch lib, do the same for host MDB entries. The only caller of the ocelot switch lib which uses the NPI port is the Felix DSA driver. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12net: dsa: felix: program host FDB entries towards PGID_CPU for tag_8021q tooVladimir Oltean1-6/+1
I remembered why we had the host FDB migration procedure in place. It is true that host FDB entry migration can be done by changing the value of PGID_CPU, but the problem is that only host FDB entries learned while operating in NPI mode go to PGID_CPU. When the CPU port operates in tag_8021q mode, the FDB entries are learned towards the unicast PGID equal to the physical port number of this CPU port, bypassing the PGID_CPU indirection. So host FDB entries learned in tag_8021q mode are not migrated any longer towards the NPI port. Fix this by extracting the NPI port -> PGID_CPU redirection from the ocelot switch lib, moving it to the Felix DSA driver, and applying it for any CPU port regardless of its kind (NPI or tag_8021q). Fixes: a51c1c3f3218 ("net: dsa: felix: stop migrating FDBs back and forth on tag proto change") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-8/+3
No conflicts. Build issue in drivers/net/ethernet/sfc/ptp.c 54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static") 49e6123c65da ("net: sfc: fix memory leak due to ptp channel") https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-06net: dsa: felix: perform MDB migration based on ocelot->multicast listVladimir Oltean1-0/+61
The felix driver is the only user of dsa_port_walk_mdbs(), and there isn't even a good reason for it, considering that the host MDB entries are already saved by the ocelot switch lib in the ocelot->multicast list. Rewrite the multicast entry migration procedure around the ocelot->multicast list so we can delete dsa_port_walk_mdbs(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-05net: mscc: ocelot: mark traps with a bool instead of keeping them in a listVladimir Oltean1-8/+3
Since the blamed commit, VCAP filters can appear on more than one list. If their action is "trap", they are chained on ocelot->traps via filter->trap_list. This is in addition to their normal placement on the VCAP block->rules list head. Therefore, when we free a VCAP filter, we must remove it from all lists it is a member of, including ocelot->traps. There are at least 2 bugs which are direct consequences of this design decision. First is the incorrect usage of list_empty(), meant to denote whether "filter" is chained into ocelot->traps via filter->trap_list. This does not do the correct thing, because list_empty() checks whether "head->next == head", but in our case, head->next == head->prev == NULL. So we dereference NULL pointers and die when we call list_del(). Second is the fact that not all places that should remove the filter from ocelot->traps do so. One example is ocelot_vcap_block_remove_filter(), which is where we have the main kfree(filter). By keeping freed filters in ocelot->traps we end up in a use-after-free in felix_update_trapping_destinations(). Attempting to fix all the buggy patterns is a whack-a-mole game which makes the driver unmaintainable. Actually this is what the previous patch version attempted to do: https://patchwork.kernel.org/project/netdevbpf/patch/20220503115728.834457-3-vladimir.oltean@nxp.com/ but it introduced another set of bugs, because there are other places in which create VCAP filters, not just ocelot_vcap_filter_create(): - ocelot_trap_add() - felix_tag_8021q_vlan_add_rx() - felix_tag_8021q_vlan_add_tx() Relying on the convention that all those code paths must call INIT_LIST_HEAD(&filter->trap_list) is not going to scale. So let's do what should have been done in the first place and keep a bool in struct ocelot_vcap_filter which denotes whether we are looking at a trapping rule or not. Iterating now happens over the main VCAP IS2 block->rules. The advantage is that we no longer risk having stale references to a freed filter, since it is only present in that list. Fixes: e42bd4ed09aa ("net: mscc: ocelot: keep traps in a list") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-30net: ethernet: ocelot: remove the need for num_stats initializerColin Foster1-0/+5
There is a desire to share the oclot_stats_layout struct outside of the current vsc7514 driver. In order to do so, the length of the array needs to be known at compile time, and defined in the struct ocelot and struct felix_info. Since the array is defined in a .c file and would be declared in the header file via: extern struct ocelot_stat_layout[]; the size of the array will not be known at compile time to outside modules. To fix this, remove the need for defining the number of stats at compile time and allow this number to be determined at initialization. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25net: mscc: ocelot: don't add VID 0 to ocelot->vlans when leaving VLAN-aware bridgeVladimir Oltean1-2/+2
DSA, through dsa_port_bridge_leave(), first notifies the port of the fact that it left a bridge, then, if that bridge was VLAN-aware, it notifies the port of the change in VLAN awareness state, towards VLAN-unaware mode. So ocelot_port_vlan_filtering() can be called when ocelot_port->bridge is NULL, and this makes ocelot_add_vlan_unaware_pvid() create a struct ocelot_bridge_vlan with a vid of 0 and an "untagged" setting of true on that port. In a way this structure correctly reflects the reality, but by design, VID 0 (OCELOT_STANDALONE_PVID) was not meant to be kept in the bridge VLAN list of the driver, but managed separately. Having OCELOT_STANDALONE_PVID in ocelot->vlans makes us trip up on several sanity checks that did not expect to have this VID there. For example, after we leave a VLAN-aware bridge and we re-join it, we can no longer program egress-tagged VLANs to hardware: # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up # ip link set swp0 master br0 # ip link set swp0 nomaster # ip link set swp0 master br0 # bridge vlan add dev swp0 vid 100 Error: mscc_ocelot_switch_lib: Port with more than one egress-untagged VLAN cannot have egress-tagged VLANs. But this configuration is in fact supported by the hardware, since we could use OCELOT_PORT_TAG_NATIVE. According to its comment: /* all VLANs except the native VLAN and VID 0 are egress-tagged */ yet when assessing the eligibility for this mode, we do not check for VID 0 in ocelot_port_uses_native_vlan(), instead we just ensure that ocelot_port_num_untagged_vlans() == 1. This is simply because VID 0 doesn't have a bridge VLAN structure. The way I identify the problem is that ocelot_port_vlan_filtering(false) only means to call ocelot_add_vlan_unaware_pvid() when we dynamically turn off VLAN awareness for a bridge we are under, and the PVID changes from the bridge PVID to a reserved PVID based on the bridge number. Since OCELOT_STANDALONE_PVID is statically added to the VLAN table during ocelot_vlan_init() and never removed afterwards, calling ocelot_add_vlan_unaware_pvid() for it is not intended and does not serve any purpose. Fix the issue by avoiding the call to ocelot_add_vlan_unaware_pvid(vid=0) when we're resetting VLAN awareness after leaving the bridge, to become a standalone port. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-25net: mscc: ocelot: ignore VID 0 added by 8021q moduleVladimir Oltean1-0/+10
Both the felix DSA driver and ocelot switchdev driver declare dev->features & NETIF_F_HW_VLAN_CTAG_FILTER under certain circumstances*, so the 8021q module will add VID 0 to our RX filter when the port goes up, to ensure 802.1p traffic is not dropped. We treat VID 0 as a special value (OCELOT_STANDALONE_PVID) which deliberately does not have a struct ocelot_bridge_vlan associated with it. Instead, this gets programmed to the VLAN table in ocelot_vlan_init(). If we allow external calls to modify VID 0, we reach the following situation: # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up # ip link set swp0 master br0 # ip link set swp0 up # this adds VID 0 to ocelot->vlans with untagged=false bridge vlan port vlan-id swp0 1 PVID Egress Untagged # the bridge also adds VID 1 br0 1 PVID Egress Untagged # bridge vlan add dev swp0 vid 100 untagged Error: mscc_ocelot_switch_lib: Port with egress-tagged VLANs cannot have more than one egress-untagged (native) VLAN. This configuration should have been accepted, because ocelot_port_manage_port_tag() should select OCELOT_PORT_TAG_NATIVE. Yet it isn't, because we have an entry in ocelot->vlans which says VID 0 should be egress-tagged, something the hardware can't do. Fix this by suppressing additions/deletions on VID 0 and managing this VLAN exclusively using OCELOT_STANDALONE_PVID. *DSA toggles it when the port becomes VLAN-aware by joining a VLAN-aware bridge. Ocelot declares it unconditionally for some reason. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-19net: mscc: ocelot: fix broken IP multicast floodingVladimir Oltean1-0/+2
When the user runs: bridge link set dev $br_port mcast_flood on this command should affect not only L2 multicast, but also IPv4 and IPv6 multicast. In the Ocelot switch, unknown multicast gets flooded according to different PGIDs according to its type, and PGID_MC only handles L2 multicast. Therefore, by leaving PGID_MCIPV4 and PGID_MCIPV6 at their default value of 0, unknown IP multicast traffic is never flooded. Fixes: 421741ea5672 ("net: mscc: ocelot: offload bridge port flags to device") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220415151950.219660-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-03-17net: mscc: ocelot: offload per-flow mirroring using tc-mirred and VCAP IS2Vladimir Oltean1-3/+3
Per-flow mirroring with the VCAP IS2 TCAM (in itself handled as an offload for tc-flower) is done by setting the MIRROR_ENA bit from the action vector of the filter. The packet is mirrored to the port mask configured in the ANA:ANA:MIRRORPORTS register (the same port mask as the destinations for port-based mirroring). Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress protocol ip \ flower skip_sw ip_proto icmp \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: mscc: ocelot: add port mirroring support using tc-matchallVladimir Oltean1-0/+76
Ocelot switches perform port-based ingress mirroring if ANA:PORT:PORT_CFG field SRC_MIRROR_ENA is set, and egress mirroring if the port is in ANA:ANA:EMIRRORPORTS. Both ingress-mirrored and egress-mirrored frames are copied to the port mask from ANA:ANA:MIRRORPORTS. So the choice of limiting to a single mirror port via ocelot_mirror_get() and ocelot_mirror_put() may seem bizarre, but the hardware model doesn't map very well to the user space model. If the user wants to mirror the ingress of swp1 towards swp2 and the ingress of swp3 towards swp4, we'd have to program ANA:ANA:MIRRORPORTS with BIT(2) | BIT(4), and that would make swp1 be mirrored towards swp4 too, and swp3 towards swp2. But there are no tc-matchall rules to describe those actions. Now, we could offload a matchall rule with multiple mirred actions, one per desired mirror port, and force the user to stick to the multi-action rule format for subsequent matchall filters. But both DSA and ocelot have the flow_offload_has_one_action() check for the matchall offload, plus the fact that it will get cumbersome to cross-check matchall mirrors with flower mirrors (which will be added in the next patch). As a result, we limit the configuration to a single mirror port, with the possibility of lifting the restriction in the future. Frames injected from the CPU don't get egress-mirrored, since they are sent with the BYPASS bit in the injection frame header, and this bypasses the analyzer module (effectively also the mirroring logic). I don't know what to do/say about this. Functionality was tested with: tc qdisc add dev swp3 clsact tc filter add dev swp3 ingress \ matchall skip_sw \ action mirred egress mirror dev swp1 and pinging through swp3, while seeing that the ICMP replies are mirrored towards swp1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-15net: mscc: ocelot: fix build error due to missing IEEE_8021QAZ_MAX_TCSVladimir Oltean1-2/+2
IEEE_8021QAZ_MAX_TCS is defined in include/uapi/linux/dcbnl.h, which is included by net/dcbnl.h. Then, linux/netdevice.h conditionally includes net/dcbnl.h if CONFIG_DCB is enabled. Therefore, when CONFIG_DCB is disabled, this indirect dependency is broken. There isn't a good reason to include net/dcbnl.h headers into the ocelot switch library which exports low-level hardware API, so replace IEEE_8021QAZ_MAX_TCS with OCELOT_NUM_TC which has the same value. Fixes: 978777d0fb06 ("net: dsa: felix: configure default-prio and dscp priorities") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220315131215.273450-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-14net: dsa: felix: configure default-prio and dscp prioritiesVladimir Oltean1-0/+116
Follow the established programming model for this driver and provide shims in the felix DSA driver which call the implementations from the ocelot switch lib. The ocelot switchdev driver wasn't integrated with dcbnl due to lack of hardware availability. The switch doesn't have any fancy QoS classification enabled by default. The provided getters will create a default-prio app table entry of 0, and no dscp entry. However, the getters have been made to actually retrieve the hardware configuration rather than static values, to be future proof in case DSA will need this information from more call paths. For default-prio, there is a single field per port, in ANA_PORT_QOS_CFG, called QOS_DEFAULT_VAL. DSCP classification is enabled per-port, again via ANA_PORT_QOS_CFG (field QOS_DSCP_ENA), and individual DSCP values are configured as trusted or not through register ANA_DSCP_CFG (replicated 64 times). An untrusted DSCP value falls back to other QoS classification methods. If trusted, the selected ANA_DSCP_CFG register also holds the QoS class in the QOS_DSCP_VAL field. The hardware also supports DSCP remapping (DSCP value X is translated to DSCP value Y before the QoS class is determined based on the app table entry for Y) and DSCP packet rewriting. The dcbnl framework, for being so flexible in other useless areas, doesn't appear to support this. So this functionality has been left out. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: mscc: ocelot: accept configuring bridge port flags on the NPI portVladimir Oltean1-0/+3
In order for the Felix DSA driver to be able to turn on/off flooding towards its CPU port, we need to redirect calls on the NPI port to actually act upon the index in the analyzer block that corresponds to the CPU port module. This was never necessary until now because DSA (or the bridge) never called ocelot_port_bridge_flags() for the NPI port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: mscc: ocelot: enforce FDB isolation when VLAN-unawareVladimir Oltean1-17/+183
Currently ocelot uses a pvid of 0 for standalone ports and ports under a VLAN-unaware bridge, and the pvid of the bridge for ports under a VLAN-aware bridge. Standalone ports do not perform learning, but packets received on them are still subject to FDB lookups. So if the MAC DA that a standalone port receives has been also learned on a VLAN-unaware bridge port, ocelot will attempt to forward to that port, even though it can't, so it will drop packets. So there is a desire to avoid that, and isolate the FDBs of different bridges from one another, and from standalone ports. The ocelot switch library has two distinct entry points: the felix DSA driver and the ocelot switchdev driver. We need to code up a minimal bridge_num allocation in the ocelot switchdev driver too, this is copied from DSA with the exception that ocelot does not care about DSA trees, cross-chip bridging etc. So it only looks at its own ports that are already in the same bridge. The ocelot switchdev driver uses the bridge_num it has allocated itself, while the felix driver uses the bridge_num allocated by DSA. They are both stored inside ocelot_port->bridge_num by the common function ocelot_port_bridge_join() which receives the bridge_num passed by value. Once we have a bridge_num, we can only use it to enforce isolation between VLAN-unaware bridges. As far as I can see, ocelot does not have anything like a FID that further makes VLAN 100 from a port be different to VLAN 100 from another port with regard to FDB lookup. So we simply deny multiple VLAN-aware bridges. For VLAN-unaware bridges, we crop the 4000-4095 VLAN region and we allocate a VLAN for each bridge_num. This will be used as the pvid of each port that is under that VLAN-unaware bridge, for as long as that bridge is VLAN-unaware. VID 0 remains only for standalone ports. It is okay if all standalone ports use the same VID 0, since they perform no address learning, the FDB will contain no entry in VLAN 0, so the packets will always be flooded to the only possible destination, the CPU port. The CPU port module doesn't need to be member of the VLANs to receive packets, but if we use the DSA tag_8021q protocol, those packets are part of the data plane as far as ocelot is concerned, so there it needs to. Just ensure that the DSA tag_8021q CPU port is a member of all reserved VLANs when it is created, and is removed when it is deleted. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-24net: dsa: felix: support FDB entries on offloaded LAG interfacesVladimir Oltean1-1/+127
This adds the logic in the Felix DSA driver and Ocelot switch library. For Ocelot switches, the DEST_IDX that is the output of the MAC table lookup is a logical port (equal to physical port, if no LAG is used, or a dynamically allocated number otherwise). The allocation we have in place for LAG IDs is different from DSA's, so we can't use that: - DSA allocates a continuous range of LAG IDs starting from 1 - Ocelot appears to require that physical ports and LAG IDs are in the same space of [0, num_phys_ports), and additionally, ports that aren't in a LAG must have physical port id == logical port id The implication is that an FDB entry towards a LAG might need to be deleted and reinstalled when the LAG ID changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+5
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17net: mscc: ocelot: annotate which traps need PTP timestampingVladimir Oltean1-6/+8
The ocelot switch library does not need this information, but the felix DSA driver does. As a reminder, the VSC9959 switch in LS1028A doesn't have an IRQ line for packet extraction, so to be notified that a PTP packet needs to be dequeued, it receives that packet also over Ethernet, by setting up a packet trap. The Felix driver needs to install special kinds of traps for packets in need of RX timestamps, such that the packets are replicated both over Ethernet and over the CPU port module. But the Ocelot switch library sets up more than one trap for PTP event messages; it also traps PTP general messages, MRP control messages etc. Those packets don't need PTP timestamps, so there's no reason for the Felix driver to send them to the CPU port module. By knowing which traps need PTP timestamps, the Felix driver can adjust the traps installed using ocelot_trap_add() such that only those will actually get delivered to the CPU port module. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: keep traps in a listVladimir Oltean1-2/+8
When using the ocelot-8021q tagging protocol, the CPU port isn't configured as an NPI port, but is a regular port. So a "trap to CPU" operation is actually a "redirect" operation. So DSA needs to set up the trapping action one way or another, depending on the tagging protocol in use. To ease DSA's work of modifying the action, keep all currently installed traps in a list, so that DSA can live-patch them when the tagging protocol changes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: use a single VCAP filter for all MRP trapsVladimir Oltean1-5/+3
The MRP assist code installs a VCAP IS2 trapping rule for each port, but since the key and the action is the same, just the ingress port mask differs, there isn't any need to do this. We can save some space in the TCAM by using a single filter and adjusting the ingress port mask. Reuse the ocelot_trap_add() and ocelot_trap_del() functions for this purpose. Now that the cookies are no longer per port, we need to change the allocation scheme such that MRP traps use a fixed number. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17net: mscc: ocelot: consolidate cookie allocation for private VCAP rulesVladimir Oltean1-10/+10
Every use case that needed VCAP filters (in order: DSA tag_8021q, MRP, PTP traps) has hardcoded filter identifiers that worked well enough for that use case alone. But when two or more of those use cases would be used together, some of those identifiers would overlap, leading to breakage. Add definitions for each cookie and centralize them in ocelot_vcap.h, such that the overlaps are more obvious. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-15net: mscc: ocelot: fix use-after-free in ocelot_vlan_del()Vladimir Oltean1-1/+5
ocelot_vlan_member_del() will free the struct ocelot_bridge_vlan, so if this is the same as the port's pvid_vlan which we access afterwards, what we're accessing is freed memory. Fix the bug by determining whether to clear ocelot_port->pvid_vlan prior to calling ocelot_vlan_member_del(). Fixes: d4004422f6f9 ("net: mscc: ocelot: track the port pvid using a pointer") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14net: mscc: ocelot: use bulk reads for statsColin Foster1-15/+77
Create and utilize bulk regmap reads instead of single access for gathering stats. The background reading of statistics happens frequently, and over a few contiguous memory regions. High speed PCIe buses and MMIO access will probably see negligible performance increase. Lower speed buses like SPI and I2C could see significant performance increase, since the bus configuration and register access times account for a large percentage of data transfer time. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14net: mscc: ocelot: remove unnecessary stat reading from ethtoolColin Foster1-17/+16
The ocelot_update_stats function only needs to read from one port, yet it was updating the stats for all ports. Update to only read the stats that are necessary. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10net: mscc: ocelot: fix mutex lock error during ethtool stats readColin Foster1-4/+7
An ongoing workqueue populates the stats buffer. At the same time, a user might query the statistics. While writing to the buffer is mutex-locked, reading from the buffer wasn't. This could lead to buggy reads by ethtool. This patch fixes the former blamed commit, but the bug was introduced in the latter. Signed-off-by: Colin Foster <colin.foster@in-advantage.com> Fixes: 1e1caa9735f90 ("ocelot: Clean up stats update deferred work") Fixes: a556c76adc052 ("net: mscc: Add initial Ocelot switch support") Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/all/20220210150451.416845-2-colin.foster@in-advantage.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-05net: mscc: ocelot: fix all IP traffic getting trapped to CPU with PTP over IPVladimir Oltean1-0/+8
The filters for the PTP trap keys are incorrectly configured, in the sense that is2_entry_set() only looks at trap->key.ipv4.dport or trap->key.ipv6.dport if trap->key.ipv4.proto or trap->key.ipv6.proto is set to IPPROTO_TCP or IPPROTO_UDP. But we don't do that, so is2_entry_set() goes through the "else" branch of the IP protocol check, and ends up installing a rule for "Any IP protocol match" (because msk is also 0). The UDP port is ignored. This means that when we run "ptp4l -i swp0 -4", all IP traffic is trapped to the CPU, which hinders bridging. Fix this by specifying the IP protocol in the VCAP IS2 filters for PTP over UDP. Fixes: 96ca08c05838 ("net: mscc: ocelot: set up traps for PTP packets") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-13net: mscc: ocelot: don't let phylink re-enable TX PAUSE on the NPI portVladimir Oltean1-1/+4
Since commit b39648079db4 ("net: mscc: ocelot: disable flow control on NPI interface"), flow control should be disabled on the DSA CPU port when used in NPI mode. However, the commit blamed in the Fixes: tag below broke this, because it allowed felix_phylink_mac_link_up() to overwrite SYS_PAUSE_CFG_PAUSE_ENA for the DSA CPU port. This issue became noticeable since the device tree update from commit 8fcea7be5736 ("arm64: dts: ls1028a: mark internal links between Felix and ENETC as capable of flow control"). The solution is to check whether this is the currently configured NPI port from ocelot_phylink_mac_link_up(), and to not modify the statically disabled PAUSE frame transmission if it is. When the port is configured for lossless mode as opposed to tail drop mode, but the link partner (DSA master) doesn't observe the transmitted PAUSE frames, the switch termination throughput is much worse, as can be seen below. Before: root@debian:~# iperf3 -c 192.168.100.2 Connecting to host 192.168.100.2, port 5201 [ 5] local 192.168.100.1 port 37504 connected to 192.168.100.2 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 28.4 MBytes 238 Mbits/sec 357 22.6 KBytes [ 5] 1.00-2.00 sec 33.6 MBytes 282 Mbits/sec 426 19.8 KBytes [ 5] 2.00-3.00 sec 34.0 MBytes 285 Mbits/sec 343 21.2 KBytes [ 5] 3.00-4.00 sec 32.9 MBytes 276 Mbits/sec 354 22.6 KBytes [ 5] 4.00-5.00 sec 32.3 MBytes 271 Mbits/sec 297 18.4 KBytes ^C[ 5] 5.00-5.06 sec 2.05 MBytes 270 Mbits/sec 45 19.8 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-5.06 sec 163 MBytes 271 Mbits/sec 1822 sender [ 5] 0.00-5.06 sec 0.00 Bytes 0.00 bits/sec receiver After: root@debian:~# iperf3 -c 192.168.100.2 Connecting to host 192.168.100.2, port 5201 [ 5] local 192.168.100.1 port 49470 connected to 192.168.100.2 port 5201 [ ID] Interval Transfer Bitrate Retr Cwnd [ 5] 0.00-1.00 sec 112 MBytes 941 Mbits/sec 259 143 KBytes [ 5] 1.00-2.00 sec 110 MBytes 920 Mbits/sec 329 144 KBytes [ 5] 2.00-3.00 sec 112 MBytes 936 Mbits/sec 255 144 KBytes [ 5] 3.00-4.00 sec 110 MBytes 927 Mbits/sec 355 105 KBytes [ 5] 4.00-5.00 sec 110 MBytes 926 Mbits/sec 350 156 KBytes [ 5] 5.00-6.00 sec 110 MBytes 925 Mbits/sec 305 148 KBytes [ 5] 6.00-7.00 sec 110 MBytes 924 Mbits/sec 320 143 KBytes [ 5] 7.00-8.00 sec 110 MBytes 925 Mbits/sec 273 97.6 KBytes [ 5] 8.00-9.00 sec 109 MBytes 913 Mbits/sec 299 141 KBytes [ 5] 9.00-10.00 sec 110 MBytes 922 Mbits/sec 287 146 KBytes - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 1.08 GBytes 926 Mbits/sec 3032 sender [ 5] 0.00-10.00 sec 1.08 GBytes 925 Mbits/sec receiver Fixes: de274be32cb2 ("net: dsa: felix: set TX flow control according to the phylink_mac_link_up resolution") Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-07net: dsa: felix: add port fast age supportVladimir Oltean1-0/+37
Add support for flushing the MAC table on a given port in the ocelot switch library, and use this functionality in the felix DSA driver. This operation is needed when a port leaves a bridge to become standalone, and when the learning is disabled, and when the STP state changes to a state where no FDB entry should be present. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220107144229.244584-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-07net: mscc: ocelot: fix incorrect balancing with down LAG portsVladimir Oltean1-15/+11
Assuming the test setup described here: https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/ (swp1 and swp2 are in bond0, and bond0 is in a bridge with swp0) it can be seen that when swp1 goes down (on either board A or B), then traffic that should go through that port isn't forwarded anywhere. A dump of the PGID table shows the following: PGID_DST[0] = ports 0 PGID_DST[1] = ports 1 PGID_DST[2] = ports 2 PGID_DST[3] = ports 3 PGID_DST[4] = ports 4 PGID_DST[5] = ports 5 PGID_DST[6] = no ports PGID_AGGR[0] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[1] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[2] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[3] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[4] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[5] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[6] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[7] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[8] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[9] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[10] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[11] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[12] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[13] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[14] = ports 0, 1, 2, 3, 4, 5 PGID_AGGR[15] = ports 0, 1, 2, 3, 4, 5 PGID_SRC[0] = ports 1, 2 PGID_SRC[1] = ports 0 PGID_SRC[2] = ports 0 PGID_SRC[3] = no ports PGID_SRC[4] = no ports PGID_SRC[5] = no ports PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5 Whereas a "good" PGID configuration for that setup should have looked like this: PGID_DST[0] = ports 0 PGID_DST[1] = ports 1, 2 PGID_DST[2] = ports 1, 2 PGID_DST[3] = ports 3 PGID_DST[4] = ports 4 PGID_DST[5] = ports 5 PGID_DST[6] = no ports PGID_AGGR[0] = ports 0, 2, 3, 4, 5 PGID_AGGR[1] = ports 0, 2, 3, 4, 5 PGID_AGGR[2] = ports 0, 2, 3, 4, 5 PGID_AGGR[3] = ports 0, 2, 3, 4, 5 PGID_AGGR[4] = ports 0, 2, 3, 4, 5 PGID_AGGR[5] = ports 0, 2, 3, 4, 5 PGID_AGGR[6] = ports 0, 2, 3, 4, 5 PGID_AGGR[7] = ports 0, 2, 3, 4, 5 PGID_AGGR[8] = ports 0, 2, 3, 4, 5 PGID_AGGR[9] = ports 0, 2, 3, 4, 5 PGID_AGGR[10] = ports 0, 2, 3, 4, 5 PGID_AGGR[11] = ports 0, 2, 3, 4, 5 PGID_AGGR[12] = ports 0, 2, 3, 4, 5 PGID_AGGR[13] = ports 0, 2, 3, 4, 5 PGID_AGGR[14] = ports 0, 2, 3, 4, 5 PGID_AGGR[15] = ports 0, 2, 3, 4, 5 PGID_SRC[0] = ports 1, 2 PGID_SRC[1] = ports 0 PGID_SRC[2] = ports 0 PGID_SRC[3] = no ports PGID_SRC[4] = no ports PGID_SRC[5] = no ports PGID_SRC[6] = ports 0, 1, 2, 3, 4, 5 In other words, in the "bad" configuration, the attempt is to remove the inactive swp1 from the destination ports via PGID_DST. But when a MAC table entry is learned, it is learned towards PGID_DST 1, because that is the logical port id of the LAG itself (it is equal to the lowest numbered member port). So when swp1 becomes inactive, if we set PGID_DST[1] to contain just swp1 and not swp2, the packet will not have any chance to reach the destination via swp2. The "correct" way to remove swp1 as a destination is via PGID_AGGR (remove swp1 from the aggregation port groups for all aggregation codes). This means that PGID_DST[1] and PGID_DST[2] must still contain both swp1 and swp2. This makes the MAC table still treat packets destined towards the single-port LAG as "multicast", and the inactive ports are removed via the aggregation code tables. The change presented here is a design one: the ocelot_get_bond_mask() function used to take an "only_active_ports" argument. We don't need that. The only call site that specifies only_active_ports=true, ocelot_set_aggr_pgids(), must retrieve the entire bonding mask, because it must program that into PGID_DST. Additionally, it must also clear the inactive ports from the bond mask here, which it can't do if bond_mask just contains the active ports: ac = ocelot_read_rix(ocelot, ANA_PGID_PGID, i); ac &= ~bond_mask; <---- here /* Don't do division by zero if there was no active * port. Just make all aggregation codes zero. */ if (num_active_ports) ac |= BIT(aggr_idx[i % num_active_ports]); ocelot_write_rix(ocelot, ac, ANA_PGID_PGID, i); So it becomes the responsibility of ocelot_set_aggr_pgids() to take ocelot_port->lag_tx_active into consideration when populating the aggr_idx array. Fixes: 23ca3b727ee6 ("net: mscc: ocelot: rebalance LAGs on link up/down events") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220107164332.402133-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-14net_tstamp: add new flag HWTSTAMP_FLAG_BONDED_PHC_INDEXHangbin Liu1-4/+0
Since commit 94dd016ae538 ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device") the user could get bond active interface's PHC index directly. But when there is a failover, the bond active interface will change, thus the PHC index is also changed. This may break the user's program if they did not update the PHC timely. This patch adds a new hwtstamp_config flag HWTSTAMP_FLAG_BONDED_PHC_INDEX. When the user wants to get the bond active interface's PHC, they need to add this flag and be aware the PHC index may be changed. With the new flag. All flag checks in current drivers are removed. Only the checking in net_hwtstamp_validate() is kept. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>