aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/freescale/enetc/enetc.c (follow)
AgeCommit message (Collapse)AuthorFilesLines
2021-03-10net: enetc: squash enetc_alloc_cbdr and enetc_setup_cbdrVladimir Oltean1-3/+1
enetc_alloc_cbdr and enetc_setup_cbdr are always called one after another, so we can simplify the callers and make enetc_setup_cbdr do everything that's needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10net: enetc: save the DMA device for enetc_free_cbdrVladimir Oltean1-2/+2
We shouldn't need to pass the struct device *dev to enetc CBDR APIs over and over again, so save this inside struct enetc_cbdr::dma_dev and avoid calling it from the enetc_free_cbdr functions. This breaks the dependency of the cbdr API from struct enetc_si (the station interface). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10net: enetc: move the CBDR API to enetc_cbdr.cVladimir Oltean1-54/+0
Since there is a dedicated file in this driver for interacting with control BD rings, it makes sense to move these functions there. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-10net: add a helper to avoid issues with HW TX timestamping and SO_TXTIMEVladimir Oltean1-6/+2
As explained in commit 29d98f54a4fe ("net: enetc: allow hardware timestamping on TX queues with tc-etf enabled"), hardware TX timestamping requires an skb with skb->tstamp = 0. When a packet is sent with SO_TXTIME, the skb->skb_mstamp_ns corrupts the value of skb->tstamp, so the drivers need to explicitly reset skb->tstamp to zero after consuming the TX time. Create a helper named skb_txtime_consumed() which does just that. All drivers which offload TC_SETUP_QDISC_ETF should implement it, and it would make it easier to assess during review whether they do the right thing in order to be compatible with hardware timestamping or not. Suggested-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-08net: enetc: allow hardware timestamping on TX queues with tc-etf enabledVladimir Oltean1-0/+6
The txtime is passed to the driver in skb->skb_mstamp_ns, which is actually in a union with skb->tstamp (the place where software timestamps are kept). Since commit b50a5c70ffa4 ("net: allow simultaneous SW and HW transmit timestamping"), __sock_recv_timestamp has some logic for making sure that the two calls to skb_tstamp_tx: skb_tx_timestamp(skb) # Software timestamp in the driver -> skb_tstamp_tx(skb, NULL) and skb_tstamp_tx(skb, &shhwtstamps) # Hardware timestamp in the driver will both do the right thing and in a race-free manner, meaning that skb_tx_timestamp will deliver a cmsg with the software timestamp only, and skb_tstamp_tx with a non-NULL hwtstamps argument will deliver a cmsg with the hardware timestamp only. Why are races even possible? Well, because although the software timestamp skb->tstamp is private per skb, the hardware timestamp skb_hwtstamps(skb) lives in skb_shinfo(skb), an area which is shared between skbs and their clones. And skb_tstamp_tx works by cloning the packets when timestamping them, therefore attempting to perform hardware timestamping on an skb's clone will also change the hardware timestamp of the original skb. And the original skb might have been yet again cloned for software timestamping, at an earlier stage. So the logic in __sock_recv_timestamp can't be as simple as saying "does this skb have a hardware timestamp? if yes I'll send the hardware timestamp to the socket, otherwise I'll send the software timestamp", precisely because the hardware timestamp is shared. Instead, it's quite the other way around: __sock_recv_timestamp says "does this skb have a software timestamp? if yes, I'll send the software timestamp, otherwise the hardware one". This works because the software timestamp is not shared with clones. But that means we have a problem when we attempt hardware timestamping with skbs that don't have the skb->tstamp == 0. __sock_recv_timestamp will say "oh, yeah, this must be some sort of odd clone" and will not deliver the hardware timestamp to the socket. And this is exactly what is happening when we have txtime enabled on the socket: as mentioned, that is put in a union with skb->tstamp, so it is quite easy to mistake it. Do what other drivers do (intel igb/igc) and write zero to skb->tstamp before taking the hardware timestamp. It's of no use to us now (we're already on the TX confirmation path). Fixes: 0d08c9ec7d6e ("enetc: add support time specific departure base on the qos etf") Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: enetc: keep RX ring consumer index in sync with hardwareVladimir Oltean1-0/+2
The RX rings have a producer index owned by hardware, where newly received frame buffers are placed, and a consumer index owned by software, where newly allocated buffers are placed, in expectation of hardware being able to place frame data in them. Hardware increments the producer index when a frame is received, however it is not allowed to increment the producer index to match the consumer index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received BDs. Whenever the producer index matches the value of the consumer index, the ring has no unprocessed received frames and all BDs in the ring have been initialized/prepared by software, i.e. hardware owns all BDs in the ring. The code uses the next_to_clean variable to keep track of the producer index, and the next_to_use variable to keep track of the consumer index. The RX rings are seeded from enetc_refill_rx_ring, which is called from two places: 1. initially the ring is seeded until full with enetc_bd_unused(rx_ring), i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511: .ndo_open -> enetc_open -> enetc_setup_bdrs -> enetc_setup_rxbdr -> enetc_refill_rx_ring 2. then during the data path processing, it is refilled with 16 buffers at a time: enetc_msix -> napi_schedule -> enetc_poll -> enetc_clean_rx_ring -> enetc_refill_rx_ring There is just one problem: the initial seeding done during .ndo_open updates just the producer index (ENETC_RBPIR) with 0, and the software next_to_clean and next_to_use variables. Notably, it will not update the consumer index to make the hardware aware of the newly added buffers. Wait, what? So how does it work? Well, the reset values of the producer index and of the consumer index of a ring are both zero. As per the description in the second paragraph, it means that the ring is full of buffers waiting for hardware to put frames in them, which by coincidence is almost true, because we have in fact seeded 511 buffers into the ring. But will the hardware attempt to access the 512th entry of the ring, which has an invalid BD in it? Well, no, because in order to do that, it would have to first populate the first 511 entries, and the NAPI enetc_poll will kick in by then. Eventually, after 16 processed slots have become available in the RX ring, enetc_clean_rx_ring will call enetc_refill_rx_ring and then will [ finally ] update the consumer index with the new software next_to_use variable. From now on, the next_to_clean and next_to_use variables are in sync with the producer and consumer ring indices. So the day is saved, right? Well, not quite. Freeing the memory allocated for the rings is done in: enetc_close -> enetc_clear_bdrs -> enetc_clear_rxbdr -> this just disables the ring -> enetc_free_rxtx_rings -> enetc_free_rx_ring -> sets next_to_clean and next_to_use to 0 but again, nothing is committed to the hardware producer and consumer indices (yay!). The assumption is that the ring is disabled, so the indices don't matter anyway, and it's the responsibility of the "open" code path to set those up. .. Except that the "open" code path does not set those up properly. While initially, things almost work, during subsequent enetc_close -> enetc_open sequences, we have problems. To be precise, the enetc_open that is subsequent to enetc_close will again refill the ring with 511 entries, but it will leave the consumer index untouched. Untouched means, of course, equal to the value it had before disabling the ring and draining the old buffers in enetc_close. But as mentioned, enetc_setup_rxbdr will at least update the producer index though, through this line of code: enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0); so at this stage we'll have: next_to_clean=0 (in hardware 0) next_to_use=511 (in hardware we'll have the refill index prior to enetc_close) Again, the next_to_clean and producer index are in sync and set to correct values, so the driver manages to limp on. Eventually, 16 ring entries will be consumed by enetc_poll, and the savior enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then update the hardware consumer ring based upon the new next_to_use. So.. it works? Well, by coincidence, it almost does, but there's a circumstance where enetc_clean_rx_ring won't be there to save us. If the previous value of the consumer index was 15, there's a problem, because the NAPI poll sequence will only issue a refill when 16 or more buffers have been consumed. It's easiest to illustrate this with an example: ip link set eno0 up ip addr add 192.168.100.1/24 dev eno0 ping 192.168.100.1 -c 20 # ping this port from another board ip link set eno0 down ip link set eno0 up ping 192.168.100.1 -c 20 # ping it again from the same other board One by one: 1. ip link set eno0 up -> calls enetc_setup_rxbdr: -> calls enetc_refill_rx_ring(511 buffers) -> next_to_clean=0 (in hw 0) -> next_to_use=511 (in hw 0) 2. ping 192.168.100.1 -c 20 # ping this port from another board enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0) enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15) 20 packets transmitted, 20 packets received, 0% packet loss 3. ip link set eno0 down enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15) 4. ip link set eno0 up -> calls enetc_setup_rxbdr: -> calls enetc_refill_rx_ring(511 buffers) -> next_to_clean=0 (in hw 0) -> next_to_use=511 (in hw 15) 5. ping 192.168.100.1 -c 20 # ping it again from the same other board enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15) enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15) 20 packets transmitted, 12 packets received, 40% packet loss And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal to 15 for that to happen), no nothing. The hardware enters the condition where the producer (14) + 1 is equal to the consumer (15) index, which makes it believe it has no more free buffers to put packets in, so it starts discarding them: ip netns exec ns0 ethtool -S eno0 | grep -v ': 0' NIC statistics: Rx ring 0 discarded frames: 8 Summarized, if the interface receives between 16 and 32 (mod 512) frames and then there is a link flap, then the port will eventually die with no way to recover. If it receives less than 16 (mod 512) frames, then the initial NAPI poll [ before the link flap ] will not update the consumer index in hardware (it will remain zero) which will be ok when the buffers are later reinitialized. If more than 32 (mod 512) frames are received, the initial NAPI poll has the chance to refill the ring twice, updating the consumer index to at least 32. So after the link flap, the consumer index is still wrong, but the post-flap NAPI poll gets a chance to refill the ring once (because it passes through cleaned_cnt=15) and makes the consumer index be again back in sync with next_to_use. The solution to this problem is actually simple, we just need to write next_to_use into the hardware consumer index at enetc_open time, which always brings it back in sync after an initial buffer seeding process. The simpler thing would be to put the write to the consumer index into enetc_refill_rx_ring directly, but there are issues with the MDIO locking: in the NAPI poll code we have the enetc_lock_mdio() taken from top-level and we use the unlocked enetc_wr_reg_hot, whereas in enetc_open, the enetc_lock_mdio() is not taken at the top level, but instead by each individual enetc_wr_reg, so we are forced to put an additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of the code is left as a refactoring exercise. Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdrVladimir Oltean1-1/+0
The Station Interface Receive Interrupt Detect Register (SIRXIDR) contains a 16-bit wide mask of 'interrupt detected' events for each ring associated with a port. Bit i is write-1-to-clean for RX ring i. I have no explanation whatsoever how this line of code came to be inserted in the blamed commit. I checked the downstream versions of that patch and none of them have it. The somewhat comical aspect of it is that we're writing a binary number to the SIRXIDR register, which is derived from enetc_bd_unused(rx_ring). Since the RX rings have 512 buffer descriptors, we end up writing 511 to this register, which is 0x1ff, so we are effectively clearing the 'interrupt detected' event for rings 0-8. This register is not what is used for interrupt handling though - it only provides a summary for the entire SI. The hardware provides one separate Interrupt Detect Register per RX ring, which auto-clears upon read. So there doesn't seem to be any adverse effect caused by this bogus write. There is, however, one reason why this should be handled as a bugfix: next_to_clean _should_ be committed to hardware, just not to that register, and this was obscuring the fact that it wasn't. This is fixed in the next patch, and removing the bogus line now allows the fix patch to be backported beyond that point. Fixes: fd5736bf9f23 ("enetc: Workaround for MDIO register access issue") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: enetc: fix incorrect TPID when receiving 802.1ad tagged packetsVladimir Oltean1-8/+26
When the enetc ports have rx-vlan-offload enabled, they report a TPID of ETH_P_8021Q regardless of what was actually in the packet. When rx-vlan-offload is disabled, packets have the proper TPID. Fix this inconsistency by finishing the TODO left in the code. Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: enetc: take the MDIO lock only once per NAPI poll cycleVladimir Oltean1-22/+9
The workaround for the ENETC MDIO erratum caused a performance degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of 64B packets). This is due to excessive locking and unlocking in the fast path, which can be avoided. By taking the MDIO read-side lock only once per NAPI poll cycle, we are able to regain 54 Kpps (65%) of the performance hit. The rest of the performance degradation comes from the TX data path, but unfortunately it doesn't look like we can optimize that away easily, even with netdev_xmit_more(), there just isn't any skb batching done, to help with taking the MDIO lock less often than once per packet. We need to change the register accessor type for enetc_get_tx_tstamp, because it now runs under the enetc_lock_mdio as per the new call path detailed below: enetc_msix -> napi_schedule -> enetc_poll -> enetc_lock_mdio -> enetc_clean_tx_ring -> enetc_get_tx_tstamp -> enetc_clean_rx_ring -> enetc_unlock_mdio Fixes: fd5736bf9f23 ("enetc: Workaround for MDIO register access issue") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: enetc: initialize RFS/RSS memories for unused ports tooVladimir Oltean1-4/+4
Michael reports that since linux-next-20210211, the AER messages for ECC errors have started reappearing, and this time they can be reliably reproduced with the first ping on one of his LS1028A boards. $ ping 1[ 33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0 72.16.0.1 PING [ 33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000 172.16.0.1 (172.16.0.1): 56 data bytes 64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms 64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms $ devmem 0x1f8010e10 32 0xC0000006 It isn't clear why this is necessary, but it seems that for the errors to go away, we must clear the entire RFS and RSS memory, not just for the ports in use. Sadly the code is structured in such a way that we can't have unified logic for the used and unused ports. For the minimal initialization of an unused port, we need just to enable and ioremap the PF memory space, and a control buffer descriptor ring. Unused ports must then free the CBDR because the driver will exit, but used ports can not pick up from where that code path left, since the CBDR API does not reinitialize a ring when setting it up, so its producer and consumer indices are out of sync between the software and hardware state. So a separate enetc_init_unused_port function was created, and it gets called right after the PF memory space is enabled. Fixes: 07bf34a50e32 ("net: enetc: initialize the RFS and RSS memories") Reported-by: Michael Walle <michael@walle.cc> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-01net: enetc: don't overwrite the RSS indirection table when initializingVladimir Oltean1-8/+3
After the blamed patch, all RX traffic gets hashed to CPU 0 because the hashing indirection table set up in: enetc_pf_probe -> enetc_alloc_si_resources -> enetc_configure_si -> enetc_setup_default_rss_table is overwritten later in: enetc_pf_probe -> enetc_init_port_rss_memory which zero-initializes the entire port RSS table in order to avoid ECC errors. The trouble really is that enetc_init_port_rss_memory really neads enetc_alloc_si_resources to be called, because it depends upon enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si thing could have been better thought out, it has nothing to do in a function called "alloc_si_resources", especially since its counterpart, "free_si_resources", does nothing to unwind the configuration of the SI. The point is, we need to pull out enetc_configure_si out of enetc_alloc_resources, and move it after enetc_init_port_rss_memory. This allows us to set up the default RSS indirection table after initializing the memory. Fixes: 07bf34a50e32 ("net: enetc: initialize the RFS and RSS memories") Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-11-19Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-17/+45
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17enetc: Workaround for MDIO register access issueAlex Marginean1-17/+45
Due to a hardware issue, an access to MDIO registers that is concurrent with other ENETC register accesses may lead to the MDIO access being dropped or corrupted. The workaround introduces locking for all register accesses to the ENETC register space. To reduce performance impact, a readers-writers locking scheme has been implemented. The writer in this case is the MDIO access code (irrelevant whether that MDIO access is a register read or write), and the reader is any access code to non-MDIO ENETC registers. Also, the datapath functions acquire the read lock fewer times and use _hot accessors. All the rest of the code uses the _wa accessors which lock every register access. The commit introducing MDIO support is - commit ebfcb23d62ab ("enetc: Add ENETC PF level external MDIO support") but due to subsequent refactoring this patch is applicable on top of a later commit. Fixes: 6517798dd343 ("enetc: Make MDIO accessors more generic and export to include/linux/fsl") Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20201112182608.26177-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-04enetc: Remove Tx checksumming offload codeClaudiu Manoil1-46/+5
Tx checksumming has been defeatured and completely removed from the h/w reference manual. Made a little cleanup for the TSE case as this is complementary code. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20201103140213.3294-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-11enetc: Migrate to PHYLINK and PCS_LYNXClaudiu Manoil1-33/+20
This is a methodical transition of the driver from phylib to phylink, following the guidelines from sfp-phylink.rst. The MAC register configurations based on interface mode were moved from the probing path to the mac_config() hook. MAC enable and disable commands (enabling Rx and Tx paths at MAC level) were also extracted and assigned to their corresponding phylink hooks. As part of the migration to phylink, the serdes configuration from the driver was offloaded to the PCS_LYNX module, introduced in commit 0da4c3d393e4 ("net: phy: add Lynx PCS module"), the PCS_LYNX module being a mandatory component required to make the enetc driver work with phylink. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Ioana Ciornei <ioana.cionei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-08-03enetc: use napi_schedule to be compatible with PREEMPT_RTJiafei Pan1-1/+1
The driver calls napi_schedule_irqoff() from a context where, in RT, hardirqs are not disabled, since the IRQ handler is force-threaded. In the call path of this function, __raise_softirq_irqoff() is modifying its per-CPU mask of pending softirqs that must be processed, using or_softirq_pending(). The or_softirq_pending() function is not atomic, but since interrupts are supposed to be disabled, nobody should be preempting it, and the operation should be safe. Nonetheless, when running with hardirqs on, as in the PREEMPT_RT case, it isn't safe, and the pending softirqs mask can get corrupted, resulting in softirqs being lost and never processed. To have common code that works with PREEMPT_RT and with mainline Linux, we can use plain napi_schedule() instead. The difference is that napi_schedule() (via __napi_schedule) also calls local_irq_save, which disables hardirqs if they aren't already. But, since they already are disabled in non-RT, this means that in practice we don't see any measurable difference in throughput or latency with this patch. Signed-off-by: Jiafei Pan <Jiafei.Pan@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21enetc: Add adaptive interrupt coalescingClaudiu Manoil1-1/+47
Use the generic dynamic interrupt moderation (dim) framework to implement adaptive interrupt coalescing on Rx. With the per-packet interrupt scheme, a high interrupt rate has been noted for moderate traffic flows leading to high CPU utilization. The 'dim' scheme implemented by the current patch addresses this issue improving CPU utilization while using minimal coalescing time thresholds in order to preserve a good latency. On the Tx side use an optimal time threshold value by default. This value has been optimized for Tx TCP streams at a rate of around 85kpps on a 1G link, at which rate half of the Tx ring size (128) gets filled in 1500 usecs. Scaling this down to 2.5G links yields the current value of 600 usecs, which is conservative and gives good enough results for 1G links too (see next). Below are some measurement results for before and after this patch (and related dependencies) basically, for a 2 ARM Cortex-A72 @1.3Ghz CPUs system (32 KB L1 data cache), using 60secs log netperf TCP stream tests @ 1Gbit link (maximum throughput): 1) 1 Rx TCP flow, both Rx and Tx processed by the same NAPI thread on the same CPU: CPU utilization int rate (ints/sec) Before: 50%-60% (over 50%) 92k After: 13%-22% 3.5k-12k Comment: Major CPU utilization improvement for a single flow Rx TCP flow (i.e. netperf -t TCP_MAERTS) on a single CPU. Usually settles under 16% for longer tests. 2) 4 Rx TCP flows + 4 Tx TCP flows (+ pings to check the latency): Total CPU utilization Total int rate (ints/sec) Before: ~80% (spikes to 90%) ~100k After: 60% (more steady) ~4k Comment: Important improvement for this load test, while the ping test outcome does not show any notable difference compared to before. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21enetc: Add interrupt coalescing supportClaudiu Manoil1-6/+26
Enable programming of the interrupt coalescing registers and allow manual configuration of the coalescing time thresholds via ethtool. Packet thresholds have been fixed to predetermined values as there's no point in making them run-time configurable, also anticipating the dynamic interrupt moderation (DIM) algorithm which uses fixed packet thresholds as well. If the interface is up when the operation mode of traffic interrupt events is changed by the user (i.e. switching from default per-packet interrupts to coalesced interrupts), the traffic needs to be paused in the process. This patch also prepares the ground for introducing DIM on Rx. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21enetc: Fix interrupt coalescing register namingClaudiu Manoil1-2/+2
Interrupt coalescing registers naming in the current revision of the Ref Man (RM) is ICR, deprecating the ICIR name used in earlier (draft) versions of the RM. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21enetc: Factor out the traffic start/stop proceduresClaudiu Manoil1-25/+49
A reliable traffic pause (and reconfiguration) procedure is needed to be able to safely make h/w configuration changes during run-time, like changing the mode in which the interrupts are operating (i.e. with or without coalescing), as opposed to making on-the-fly register updates that may be subject to h/w or s/w concurrency issues. To this end, the code responsible of the run-time device configurations that basically starts resp. stops the traffic flow through the device has been extracted from the the enetc_open/_close procedures, to the separate standalone enetc_start/_stop procedures. Traffic stop should be as graceful as possible, it lets the executing napi threads to to finish while the interrupts stay disabled. But since the napi thread will try to re-enable interrupts by clearing the device's unmask register, the enable_irq/ disable_irq API has been used to avoid this potential concurrency issue and make the traffic pause procedure more reliable. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21enetc: Refine buffer descriptor ring sizesClaudiu Manoil1-2/+2
It's time to differentiate between Rx and Tx ring sizes. Not only Tx rings are processed differently than Rx rings, but their default number also differs - i.e. up to 8 Tx rings per device (8 traffic classes) vs. 2 Rx rings (one per CPU). So let's set Tx rings sizes to half the size of the Rx rings for now, to be conservative. The default ring sizes were decreased as well (to the next lower power of 2), to reduce the memory footprint, buffering etc., since the measurements I've made so far show that the rings are very unlikely to get full. This change also anticipates the introduction of the dynamic interrupt moderation (dim) algorithm which operates on maximum packet thresholds of 256 packets for Rx and 128 packets for Tx. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-2/+2
All conflicts seemed rather trivial, with some guidance from Saeed Mameed on the tc_ct.c one. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-26enetc: Fix tx rings bitmap iteration range, irq handlingClaudiu Manoil1-2/+2
The rings bitmap of an interrupt vector encodes which of the device's rings were assigned to that interrupt vector. Hence the iteration range of the tx rings bitmap (for_each_set_bit()) should be the total number of Tx rings of that netdevice instead of the number of rings assigned to the interrupt vector. Since there are 2 cores, and one interrupt vector for each core, the number of rings asigned to an interrupt vector is half the number of available rings. The impact of this error is that the upper half of the tx rings could still generate interrupts during napi polling. Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+26
Minor overlapping changes in xfrm_device.c, between the double ESP trailing bug fix setting the XFRM_INIT flag and the changes in net-next preparing for bonding encryption support. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-19enetc: Fix HW_VLAN_CTAG_TX|RX togglingClaudiu Manoil1-0/+26
VLAN tag insertion/extraction offload is correctly activated at probe time but deactivation of this feature (i.e. via ethtool) is broken. Toggling works only for Tx/Rx ring 0 of a PF, and is ignored for the other rings, including the VF rings. To fix this, the existing VLAN offload toggling code was extended to all the rings assigned to a netdevice, instead of the default ring 0 (likely a leftover from the early validation days of this feature). And the code was moved to the common set_features() function to fix toggling for the VF driver too. Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-18enetc: Use struct_size() helper in kzalloc()Gustavo A. R. Silva1-4/+2
Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. This code was detected with the help of Coccinelle and, audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net: enetc: add tc flower psfp offload driverPo Liu1-8/+17
This patch is to add tc flower offload for the enetc IEEE 802.1Qci(PSFP) function. There are four main feature parts to implement the flow policing and filtering for ingress flow with IEEE 802.1Qci features. They are stream identify(this is defined in the P802.1cb exactly but needed for 802.1Qci), stream filtering, stream gate and flow metering. Each function block includes many entries by index to assign parameters. So for one frame would be filtered by stream identify first, then flow into stream filter block by the same handle between stream identify and stream filtering. Then flow into stream gate control which assigned by the stream filtering entry. And then policing by the gate and limited by the max sdu in the filter block(optional). At last, policing by the flow metering block, index choosing at the fitering block. So you can see that each entry of block may link to many upper entries since they can be assigned same index means more streams want to share the same feature in the stream filtering or stream gate or flow metering. To implement such features, each stream filtered by source/destination mac address, some stream maybe also plus the vlan id value would be treated as one flow chain. This would be identified by the chain_index which already in the tc filter concept. Driver would maintain this chain and also with gate modules. The stream filter entry create by the gate index and flow meter(optional) entry id and also one priority value. Offloading only transfer the gate action and flow filtering parameters. Driver would create (or search same gate id and flow meter id and priority) one stream filter entry to set to the hardware. So stream filtering do not need transfer by the action offloading. This architecture is same with tc filter and actions relationship. tc filter maintain the list for each flow feature by keys. And actions maintain by the action list. Below showing a example commands by tc: > tc qdisc add dev eth0 ingress > ip link set eth0 address 10:00:80:00:00:00 > tc filter add dev eth0 parent ffff: protocol ip chain 11 \ flower skip_sw dst_mac 10:00:80:00:00:00 \ action gate index 10 \ sched-entry open 200000000 1 8000000 \ sched-entry close 100000000 -1 -1 Command means to set the dst_mac 10:00:80:00:00:00 to index 11 of stream identify module. Then setting the gate index 10 of stream gate module. Keep the gate open for 200ms and limit the traffic volume to 8MB in this sched-entry. Then direct the frames to the ingress queue 1. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net: enetc: add hw tc hw offload features for PSPF capabilityPo Liu1-0/+23
This patch is to let ethtool enable/disable the tc flower offload features. Hardware ENETC has the feature of PSFP which is for per-stream policing. When enable the tc hw offloading feature, driver would enable the IEEE 802.1Qci feature. It is only set the register enable bit for this feature not enable for any entry of per stream filtering and stream gate or stream identify but get how much capabilities for each feature. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10enetc: Add dynamic allocation of extended Rx BD ringsClaudiu Manoil1-12/+27
Hardware timestamping support (PTP) on Rx requires extended buffer descriptors, double the size of normal Rx descriptors. On the current controller revision only the timestamping offload requires extended Rx descriptors. Since Rx timestamping can be turned on/off at runtime, make Rx ring allocation configurable at runtime too. As a result, the static config option FSL_ENETC_HW_TIMESTAMPING can be dropped and the extended descriptors can be used only when Rx timestamping gets activated. The extension has the same size as the base descriptor, making the descriptor iterators easy to update for the extended case. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10enetc: Clean up Rx BD iterationClaudiu Manoil1-19/+12
Improve maintainability of the code iterating the Rx buffer descriptors to prepare it to support iterating extended Rx BD descriptors as well. Don't increment by one the h/w descriptor pointers explicitly, provide an iterator that takes care of the h/w details. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-02enetc: add support time specific departure base on the qos etfPo Liu1-0/+12
ENETC implement time specific departure capability, which enables the user to specify when a frame can be transmitted. When this capability is enabled, the device will delay the transmission of the frame so that it can be transmitted at the precisely specified time. The delay departure time up to 0.5 seconds in the future. If the departure time in the transmit BD has not yet been reached, based on the current time, the packet will not be transmitted. This driver was loaded by Qos driver ETF. User could load it by tc commands. Here are the example commands: tc qdisc add dev eth0 root handle 1: mqprio \ num_tc 8 map 0 1 2 3 4 5 6 7 hw 1 tc qdisc replace dev eth0 parent 1:8 etf \ clockid CLOCK_TAI delta 30000 offload These example try to set queue mapping first and then set queue 7 with 30us ahead dequeue time. Then user send test frame should set SO_TXTIME feature for socket. There are also some limitations for this feature in hardware: - Transmit checksum offloads and time specific departure operation are mutually exclusive. - Time Aware Shaper feature (Qbv) offload and time specific departure operation are mutually exclusive. Signed-off-by: Po Liu <Po.Liu@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-10enetc: add software timestampingMichael Walle1-0/+2
Provide a software TX timestamp and add it to the ethtool query interface. skb_tx_timestamp() is also needed if one would like to use PHY timestamping. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-06enetc: disable EEE autoneg by defaultYangbo Lu1-0/+5
The EEE support has not been enabled on ENETC, but it may connect to a PHY which supports EEE and advertises EEE by default, while its link partner also advertises EEE. If this happens, the PHY enters low power mode when the traffic rate is low and causes packet loss. This patch disables EEE advertisement by default for any PHY that ENETC connects to, to prevent the above unwanted outcome. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-25enetc: add support Credit Based Shaper(CBS) for hardware offloadPo Liu1-0/+2
The ENETC hardware support the Credit Based Shaper(CBS) which part of the IEEE-802.1Qav. The CBS driver was loaded by the sch_cbs interface when set in the QOS in the kernel. Here is an example command to set 20Mbits bandwidth in 1Gbits port for taffic class 7: tc qdisc add dev eth0 root handle 1: mqprio \ num_tc 8 map 0 1 2 3 4 5 6 7 hw 1 tc qdisc replace dev eth0 parent 1:8 cbs \ locredit -1470 hicredit 30 \ sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Po Liu <Po.Liu@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-21enetc: make enetc_setup_tc_mqprio staticMao Wenan1-1/+1
While using ARCH=mips CROSS_COMPILE=mips-linux-gnu- command to compile, make C=2 drivers/net/ethernet/freescale/enetc/enetc.o one warning can be found: drivers/net/ethernet/freescale/enetc/enetc.c:1439:5: warning: symbol 'enetc_setup_tc_mqprio' was not declared. Should it be static? This patch make symbol enetc_setup_tc_mqprio static. Fixes: 34c6adf1977b ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload") Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16enetc: update TSN Qbv PSPEED set according to adjust link speedPo Liu1-2/+11
ENETC has a register PSPEED to indicate the link speed of hardware. It is need to update accordingly. PSPEED field needs to be updated with the port speed for QBV scheduling purposes. Or else there is chance for gate slot not free by frame taking the MAC if PSPEED and phy speed not match. So update PSPEED when link adjust. This is implement by the adjust_link. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16enetc: Configure the Time-Aware Scheduler via tc-taprio offloadPo Liu1-5/+14
ENETC supports in hardware for time-based egress shaping according to IEEE 802.1Qbv. This patch implement the Qbv enablement by the hardware offload method qdisc tc-taprio method. Also update cbdr writeback to up level since control bd ring may writeback data to control bd ring. Signed-off-by: Po Liu <Po.Liu@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07enetc: fix return value for enetc_ioctl()Michael Walle1-1/+1
Return -EOPNOTSUPP instead of -EINVAL if the requested ioctl is not implemented. Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-07enetc: add ioctl() support for PHY-related opsMichael Walle1-1/+4
If there is an attached PHY try to handle the requested ioctl with its handler, which allows the userspace to access PHY registers, for example. This will make mii-diag and similar tools work. Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22net: Use skb accessors in network driversMatthew Wilcox (Oracle)1-1/+1
In preparation for unifying the skb_frag and bio_vec, use the fine accessors which already exist and use skb_frag_t instead of struct skb_frag_struct. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-28enetc: Enable TC offloading with mqprioCamelia Groza1-0/+56
Add support to configure multiple prioritized TX traffic classes with mqprio. Configure one BD ring per TC for the moment, one netdev queue per TC. Signed-off-by: Camelia Groza <camelia.groza@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-27enetc: fix le32/le16 degrading to integer warningsY.b. Lu1-7/+9
Fix blow sparse warning introduced by a previous patch. - restricted __le32 degrades to integer - restricted __le16 degrades to integer Fixes: d39823121911 ("enetc: add hardware timestamping support") Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-24enetc: add hardware timestamping supportY.b. Lu1-5/+153
This patch is to add hardware timestamping support for ENETC. On Rx, timestamping is enabled for all frames. On Tx, we only instruct the hardware to timestamp the frames marked accordingly by the stack. Because the RX BD ring dynamic allocation has not been supported and it is too expensive to use extended RX BDs if timestamping is not used, a Kconfig option is used to enable extended RX BDs in order to support hardware timestamping. This option will be removed once RX BD ring dynamic allocation is implemented. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-15enetc: Fix NULL dma address unmap for Tx BD extensionsClaudiu Manoil1-1/+3
For the unlikely case of TxBD extensions (i.e. ptp) the driver tries to unmap the tx_swbd corresponding to the extension, which is bogus as it has no buffer attached. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28enetc: include linux/vmalloc.h for vzalloc etcStephen Rothwell1-0/+1
Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24enetc: Add RFS and RSS supportClaudiu Manoil1-0/+77
A ternary match table is used for RFS. If multiple entries in the table match, the entry with the lowest numerical values index is chosen as the matching entry. Entries in the table are identified using an index which takes a value from 0 to PRFSCAPR[NUM_RFS]-1 when accessed by the PSI (PF). Portions of the RFS table can be assigned to each SI by the PSI (PF) driver in PSIaRFSCFGR. Assignments are cumulative, the entries assigned to SIn start after those assigned to SIn-1. The total assignments to all SIs must be equal to or less than the number available to the port as found in PRFSCAPR. For RSS, the Toeplitz hash function used requires two inputs, a 40B random secret key that is supplied through the PRSSKR0-9 registers as well as the relevant pieces of the packet header (n-tuple). The 6 LSB bits of the hash function result will then be used as a pointer to obtain the tag referenced in the 64 entry indirection table. The result will provide a winning group which will be used to help route the received packet. Signed-off-by: Alex Marginean <alexandru.marginean@nxp.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24enetc: Introduce basic PF and VF ENETC ethernet driversClaudiu Manoil1-0/+1526
ENETC is a multi-port virtualized Ethernet controller supporting GbE designs and Time-Sensitive Networking (TSN) functionality. ENETC is operating as an SR-IOV multi-PF capable Root Complex Integrated Endpoint (RCIE). As such, it contains multiple physical (PF) and virtual (VF) PCIe functions, discoverable by standard PCI Express. Introduce basic PF and VF ENETC ethernet drivers. The PF has access to the ENETC Port registers and resources and makes the required privileged configurations for the underlying VF devices. Common functionality is controlled through so called System Interface (SI) register blocks, PFs and VFs own a SI each. Though SI register blocks are almost identical, there are a few privileged SI level controls that are accessible only to PFs, and so the distinction is made between PF SIs (PSI) and VF SIs (VSI). As such, the bulk of the code, including datapath processing, basic h/w offload support and generic pci related configuration, is shared between the 2 drivers and is factored out in common source files (i.e. enetc.c). Major functionalities included (for both drivers): MSI-X support for Rx and Tx processing, assignment of Rx/Tx BD ring pairs to MSI-X entries, multi-queue support, Rx S/G (Rx frame fragmentation) and jumbo frame (up to 9600B) support, Rx paged allocation and reuse, Tx S/G support (NETIF_F_SG), Rx and Tx checksum offload, PF MAC filtering and initial control ring support, VLAN extraction/ insertion, PF Rx VLAN CTAG filtering, VF mac address config support, VF VLAN isolation support, etc. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>