Age | Commit message (Collapse) | Author | Files | Lines |
|
The default RX IRQ coalescing settings of one IRQ per packet can represent
a significant CPU load. However, increasing the coalescing unilaterally
can result in undesirable latency under low load. Adaptive IRQ
coalescing with DIM offers a way to adjust the coalescing settings based
on load.
This device only supports "CQE" mode [1], where each packet resets the
timer. Therefore, an interrupt is fired either when we receive
coalesce_count_rx packets or when the interface is idle for
coalesce_usec_rx. With this in mind, consider the following scenarios:
Link saturated
Here we want to set coalesce_count_rx to a large value, in order to
coalesce more packets and reduce CPU load. coalesce_usec_rx should
be set to at least the time for one packet. Otherwise the link will
be "idle" and we will get an interrupt for each packet anyway.
Bursts of packets
Each burst should be coalesced into a single interrupt, although it
may be prudent to reduce coalesce_count_rx for better latency.
coalesce_usec_rx should be set to at least the time for one packet
so bursts are coalesced. However, additional time beyond the packet
time will just increase latency at the end of a burst.
Sporadic packets
Due to low load, we can set coalesce_count_rx to 1 in order to
reduce latency to the minimum. coalesce_usec_rx does not matter in
this case.
Based on this analysis, I expected the CQE profiles to look something
like
usec = 0, pkts = 1 // Low load
usec = 16, pkts = 4
usec = 16, pkts = 16
usec = 16, pkts = 64
usec = 16, pkts = 256 // High load
Where usec is set to 16 to be a few us greater than the 12.3 us packet
time of a 1500 MTU packet at 1 GBit/s. However, the CQE profile is
instead
usec = 2, pkts = 256 // Low load
usec = 8, pkts = 128
usec = 16, pkts = 64
usec = 32, pkts = 64
usec = 64, pkts = 64 // High load
I found this very surprising. The number of coalesced packets
*decreases* as load increases. But as load increases we have more
opportunities to coalesce packets without affecting latency as much.
Additionally, the profile *increases* the usec as the load increases.
But as load increases, the gaps between packets will tend to become
smaller, making it possible to *decrease* usec for better latency at the
end of a "burst".
I consider the default CQE profile unsuitable for this NIC. Therefore,
we use the first profile outlined in this commit instead.
coalesce_usec_rx is set to 16 by default, but the user can customize it.
This may be necessary if they are using jumbo frames. I think adjusting
the profile times based on the link speed/mtu would be good improvement
for generic DIM.
In addition to the above profile problems, I noticed the following
additional issues with DIM while testing:
- DIM tends to "wander" when at low load, since the performance gradient
is pretty flat. If you only have 10p/ms anyway then adjusting the
coalescing settings will not affect throughput very much.
- DIM takes a long time to adjust back to low indices when load is
decreased following a period of high load. This is because it only
re-evaluates its settings once every 64 interrupts. However, at low
load 64 interrupts can be several seconds.
Finally: performance. This patch increases receive throughput with
iperf3 from 840 Mbits/sec to 938 Mbits/sec, decreases interrupts from
69920/sec to 316/sec, and decreases CPU utilization (4x Cortex-A53) from
43% to 9%.
[1] Who names this stuff?
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-5-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The cr variables now contain the same values as the control registers
themselves. Extract/calculate the values from the variables instead of
saving the user-specified values. This allows us to remove some
bookeeping, and also lets the user know what the actual coalesce
settings are.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-4-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In preparation for adaptive IRQ coalescing, we first need to support
adjusting the settings at runtime. The existing code doesn't require any
locking because
- dma_start is the only function that modifies rx/tx_dma_cr. It is
always called with IRQs and NAPI disabled, so nothing else is touching
the hardware.
- The IRQs don't race with poll, since the latter is a softirq.
- The IRQs don't race with dma_stop since they both just clear the
control registers.
- dma_stop doesn't race with poll since the former is called with NAPI
disabled.
However, once we introduce another function that modifies rx/tx_dma_cr,
we need to have some locking to prevent races. Introduce two locks to
protect these variables and their registers.
The control register values are now generated where the coalescing
settings are set. Converting coalescing settings to control register
values may require sleeping because of clk_get_rate. However, the
read/modify/write of the control registers themselves can't sleep
because it needs to happen in IRQ context. By pre-calculating the
control register values, we avoid introducing an additional mutex.
Since axienet_dma_start writes the control settings when it runs, we
don't bother updating the CR registers when rx/tx_dma_started is false.
This prevents any issues from writing to the control registers in the
middle of a reset sequence.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Combine the common parts of the CR calculations for better code reuse.
While we're at it, simplify the code a bit.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250206201036.1516800-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Instead of using literals, add some symbolic constants for the IRQ delay
timer calculation.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://patch.msgid.link/20250116232954.2696930-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These variables are set but never used. Remove them.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://patch.msgid.link/20240909161016.1149119-2-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/phy/phy_device.c
2560db6ede1a ("net: phy: Fix missing of_node_put() for leds")
1dce520abd46 ("net: phy: Use for_each_available_child_of_node_scoped()")
https://lore.kernel.org/20240904115823.74333648@canb.auug.org.au
Adjacent changes:
drivers/net/ethernet/xilinx/xilinx_axienet.h
drivers/net/ethernet/xilinx/xilinx_axienet_main.c
858430db28a5 ("net: xilinx: axienet: Fix race in axienet_stop")
76abb5d675c4 ("net: xilinx: axienet: Add statistics support")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
axienet_dma_err_handler can race with axienet_stop in the following
manner:
CPU 1 CPU 2
====================== ==================
axienet_stop()
napi_disable()
axienet_dma_stop()
axienet_dma_err_handler()
napi_disable()
axienet_dma_stop()
axienet_dma_start()
napi_enable()
cancel_work_sync()
free_irq()
Fix this by setting a flag in axienet_stop telling
axienet_dma_err_handler not to bother doing anything. I chose not to use
disable_work_sync to allow for easier backporting.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Link: https://patch.msgid.link/20240903175141.4132898-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for IFF_ALLMULTI by configuring a single filter to match the
multicast address bit. This allows us to keep promiscuous mode disabled,
even when we have more than four multicast addresses. An even better
solution would be to "pack" addresses into the available CAM registers,
but that can wait for a future series.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-6-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
drivers/net/ethernet/broadcom/bnxt/bnxt.h
c948c0973df5 ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops")
f2878cdeb754 ("bnxt_en: Add support to call FW to update a VNIC")
Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If a multicast address is removed but there are still some multicast
addresses, that address would remain programmed into the frame filter.
Fix this by explicitly setting the enable bit for each filter.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240822154059.1066595-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add support for reading the statistics counters, if they are enabled.
The counters may be 64-bit, but we can't detect this statically as
there's no ability bit for it and the counters are read-only. Therefore,
we assume the counters are 32-bits by default. To ensure we don't miss
an overflow, we read all counters at 13-second intervals. This should be
often enough to ensure the bytes counters don't wrap at 2.5 Gbit/s.
Another complication is that the counters may be reset when the device
is reset (depending on configuration). To ensure the counters persist
across link up/down (including suspend/resume), we maintain our own
versions along with the last counter value we saw. Because we might wait
up to 100 ms for the reset to complete, we use a mutex to protect
writing hw_stats. We can't sleep in ndo_get_stats64, so we use a seqlock
to protect readers.
We don't bother disabling the refresh work when we detect 64-bit
counters. This is because the reset issue requires us to read
hw_stat_base and reset_in_progress anyway, which would still require the
seqcount. And I don't think skipping the task is worth the extra
bookkeeping.
We can't use the byte counters for either get_stats64 or
get_eth_mac_stats. This is because the byte counters include everything
in the frame (destination address to FCS, inclusive). But
rtnl_link_stats64 wants bytes excluding the FCS, and
ethtool_eth_mac_stats wants to exclude the L2 overhead (addresses and
length/type). It might be possible to calculate the byte values Linux
expects based on the frame counters, but I think it is simpler to use
the existing software counters.
get_ethtool_stats is implemented for nonstandard statistics. This
includes the aforementioned byte counters, VLAN and PFC frame
counters, and user-defined (e.g. with custom RTL) counters.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Link: https://patch.msgid.link/20240820175343.760389-3-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml
c25504a0ba36 ("dt-bindings: net: fsl,qoriq-mc-dpmac: add missed property phys")
be034ee6c33d ("dt-bindings: net: fsl,qoriq-mc-dpmac: using unevaluatedProperties")
https://lore.kernel.org/20240815110934.56ae623a@canb.auug.org.au
drivers/net/dsa/vitesse-vsc73xx-core.c
5b9eebc2c7a5 ("net: dsa: vsc73xx: pass value in phy_write operation")
fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations")
2524d6c28bdc ("net: dsa: vsc73xx: use defined values in phy operations")
https://lore.kernel.org/20240813104039.429b9fe6@canb.auug.org.au
Resolve by using FIELD_PREP(), Stephen's resolution is simpler.
Adjacent changes:
net/vmw_vsock/af_vsock.c
69139d2919dd ("vsock: fix recursive ->recvmsg calls")
744500d81f81 ("vsock: add support for SIOCOUTQ ioctl")
Link: https://patch.msgid.link/20240815141149.33862-1-pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In axiethernet header fix register defines comment description to be
inline with IP documentation. It updates MAC configuration register,
MDIO configuration register and frame filter control description.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver")
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace all occurences of (1<<x) by BIT(x) to get rid of checkpatch.pl
"CHECK" output "Prefer using the BIT macro".
Signed-off-by: Appana Durga Kedareswara Rao <appana.durga.rao@xilinx.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add description of mdio enable, mdio disable and mdio wait functions.
Add description of skb pointer in axidma_bd data structure.
Remove 'phy_node' description in axienet local data structure since
it is not a valid struct member.
Correct description of struct axienet_option.
Fix below kernel-doc warnings in drivers/net/ethernet/xilinx/:
1) xilinx_axienet_mdio.c:1: warning: no structured comments found
2) xilinx_axienet.h:379: warning: Function parameter or struct member
'skb' not described in 'axidma_bd'
3) xilinx_axienet.h:538: warning: Excess struct member 'phy_node'
description in 'axienet_local'
4) xilinx_axienet.h:1002: warning: expecting prototype for struct
axiethernet_option. Prototype was for struct axienet_option instead
Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/20240328110713.12885-1-suraj.gupta2@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add dmaengine framework to communicate with the xilinx DMAengine
driver(AXIDMA).
Axi ethernet driver uses separate channels for transmit and receive.
Add support for these channels to handle TX and RX with skb and
appropriate callbacks. Also add axi ethernet core interrupt for
dmaengine framework support.
The dmaengine framework was extended for metadata API support.
However it still needs further enhancements to make it well suited for
ethernet usecases. The ethernet features i.e ethtool set/get of DMA IP
properties, ndo_poll_controller,(mentioned in TODO) are not supported
and it requires follow-up discussions.
dmaengine support has a dependency on xilinx_dma as it uses
xilinx_vdma_channel_set_config() API to reset the DMA IP
which internally reset MAC prior to accessing MDIO.
Benchmark with netperf:
xilinx-zcu102-20232:~$ netperf -H 192.168.10.20 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.10.20 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
131072 16384 16384 10.02 886.69
xilinx-zcu102-20232:~$ netperf -H 192.168.10.20 -t UDP_STREAM
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.10.20 () port 0 AF_INET
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 10.00 15851 0 830.66
212992 10.00 15851 830.66
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1700074613-1977070-4-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The axiethernet driver has inbuilt dma programming. In order to add
dmaengine support and make it's integration seamless the current axidma
inbuilt programming code is put under use_dmaengine check.
It also performs minor code reordering to minimize conditional
use_dmaengine checks and there is no functional change. It uses
"dmas" property to identify whether it should use a dmaengine
framework or inbuilt axidma programming.
Signed-off-by: Sarath Babu Naidu Gaddam <sarath.babu.naidu.gaddam@amd.com>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com>
Link: https://lore.kernel.org/r/1700074613-1977070-3-git-send-email-radhey.shyam.pandey@amd.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Both axienet_mdio_{enable/disable} functions are no longer used in
xilinx_axienet_main.c due to 253761a0e61b7. And axienet_mdio_disable is
not even used in the mdio.c. So unexport and remove them.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cleaning some static warnings of open parenthesis.
Signed-off-by: huangjunxian <huangjunxian6@hisilicon.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The RX and TX byte/packet statistics in this driver could be overflowed
relatively quickly on a 32-bit platform. Switch these stats to use the
u64_stats infrastructure to avoid this.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Link: https://lore.kernel.org/r/20220829233901.3429419-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
CONFIG_64BIT is not sufficient for checking for availability of
iowrite64() and friends.
Also, the out_addr helpers need to be inline.
Fixes: b690f8df6497 ("net: axienet: Use iowrite64 to write all 64b descriptor pointers")
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
According to commit f735c40ed93c ("net: axienet: Autodetect 64-bit DMA
capability") and AXI-DMA spec (pg021), on 64-bit capable dma, only
writing MSB part of tail descriptor pointer causes DMA engine to start
fetching descriptors. However, we found that it is true only if dma is in
idle state. In other words, dma would use a tailp even if it only has LSB
updated, when the dma is running.
The non-atomicity of this behavior could be problematic if enough
delay were introduced in between the 2 writes. For example, if an
interrupt comes right after the LSB write and the cpu spends long
enough time in the handler for the dma to get back into idle state by
completing descriptors, then the seconcd write to MSB would treat dma
to start fetching descriptors again. Since the descriptor next to the
one pointed by current tail pointer is not filled by the kernel yet,
fetching a null descriptor here causes a dma internal error and halt
the dma engine down.
We suggest that the dma engine should start process a 64-bit MMIO write
to the descriptor pointer only if ONE 32-bit part of it is written on all
states. Or we should restrict the use of 64-bit addressable dma on 32-bit
platforms, since those devices have no instruction to guarantee the write
to LSB and MSB part of tail pointer occurs atomically to the dma.
initial condition:
curp = x-3;
tailp = x-2;
LSB = x;
MSB = 0;
cpu: |dma:
iowrite32(LSB, tailp) | completes #(x-3) desc, curp = x-3
... | tailp updated
=> irq | completes #(x-2) desc, curp = x-2
... | completes #(x-1) desc, curp = x-1
... | ...
... | completes #x desc, curp = tailp = x
<= irqreturn | reaches tailp == curp = x, idle
iowrite32(MSB, tailp + 4) | ...
| tailp updated, starts fetching...
| fetches #(x + 1) desc, sees cntrl = 0
| post Tx error, halt
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reported-by: Max Hsu <max.hsu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently it is not safe to config the IP as 64-bit addressable on 32-bit
archectures, which cannot perform a double-word store on its descriptor
pointers. The pointer is 64-bit wide if the IP is configured as 64-bit,
and the device would process the partially updated pointer on some
states if the pointer was updated via two store-words. To prevent such
condition, we force a probe fail if we discover that the IP has 64-bit
capability but it is not running on a 64-Bit kernel.
This is a series of patch (1/2). The next patch must be applied in order
to make 64b DMA safe on 64b archectures.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reported-by: Max Hsu <max.hsu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver was using the TX IRQ handler to perform all TX completion
tasks. Under heavy TX network load, this can cause significant irqs-off
latencies (found to be in the hundreds of microseconds using ftrace).
This can cause other issues, such as overrunning serial UART FIFOs when
using high baud rates with limited UART FIFO sizes.
Switch to using a NAPI poll handler to perform the TX completion work
to get this out of hard IRQ context and avoid the IRQ latency impact.
A separate poll handler is used for TX and RX since they have separate
IRQs on this controller, so that the completion work for each of them
stays on the same CPU as the interrupt.
Testing on a Xilinx MPSoC ZU9EG platform using iperf3 from a Linux PC
through a switch at 1G link speed showed no significant change in TX or
RX throughput, with approximately 941 Mbps before and after. Hard IRQ
time in the TX throughput test was significantly reduced from 12% to
below 1% on the CPU handling TX interrupts, with total hard+soft IRQ CPU
usage dropping from about 56% down to 48%.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
the struct member `phy_node` of struct axienet_local is not used by the
driver anymore after initialization. It might be a remnent of old code
and could be removed.
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the ability to configure the RX/TX coalesce timer with ethtool.
Change default setting to scale with the clock rate rather than being a
fixed number of clock cycles.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that NAPI has been implemented, the hardware interrupt mitigation
mechanism is not needed to avoid excessive interrupt load in most cases.
Reduce the default RX interrupt threshold to 1 to reduce introduced
latency. This can be increased with ethtool if desired if some applications
still want to reduce interrupts.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement NAPI and GRO receive. In addition to better performance, this
also avoids handling RX packets in hard IRQ context, which reduces the
IRQ latency impact to other devices.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert axienet to use the phylink_pcs layer, resulting in it no longer
being a legacy driver.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This driver was only enabling the first clock on the device, regardless
of its name. However, this controller logic can have multiple clocks
which should all be enabled. Add support for enabling additional clocks.
The clock names used are matching those used in the Xilinx version of this
driver as well as the Xilinx device tree generator, except for mgt_clk
which is not present there.
For backward compatibility, if no named clocks are present, the first
clock present is used for determining the MDIO bus clock divider.
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In setups with fixed-link settings there is no mdio node in DTS.
axienet_probe() already handles that gracefully but lp->mii_bus is
then NULL.
Fix code that tries to blindly grab the MDIO lock by introducing two helper
functions that make the locking conditional.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Newer versions of the Xilinx AXI Ethernet core (specifically version 7.2 or
later) allow the core to be configured with a PHY interface mode of "Both",
allowing either 1000BaseX or SGMII modes to be selected at runtime. Add
support for this in the driver to allow better support for applications
which can use both fiber and copper SFP modules.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce helper functions to enable/disable MDIO interface clock. This
change serves a preparatory patch for the coming feature to dynamically
control the management bus clock.
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Update the axienet driver to properly support the Xilinx PCS/PMA PHY
component which is used for 1000BaseX and SGMII modes, including
properly configuring the auto-negotiation mode of the PHY and reading
the negotiated state from the PHY.
Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Link: https://lore.kernel.org/r/20201028171429.1699922-1-robert.hancock@calian.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a spelling mistake in a comment. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When newer revisions of the Axienet IP are configured for a 64-bit bus,
we *need* to write to the MSB part of the an address registers,
otherwise the IP won't recognise this as a DMA start condition.
This is even true when the actual DMA address comes from the lower 4 GB.
To autodetect this configuration, at probe time we write all 1's to such
an MSB register, and see if any bits stick. If this is configured for a
32-bit bus, those MSB registers are RES0, so reading back 0 indicates
that no MSB writes are necessary.
On the other hands reading anything other than 0 indicated the need to
write the MSB registers, so we set the respective flag.
The actual DMA mask stays at 32-bit for now. To help bisecting, a
separate patch will enable allocations from higher addresses.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Newer revisions of the AXI DMA IP (>= v7.1) support 64-bit addresses,
both for the descriptors itself, as well as for the buffers they are
pointing to.
This is realised by adding "MSB" words for the next and phys pointer
right behind the existing address word, now named "LSB". These MSB words
live in formerly reserved areas of the descriptor.
If the hardware supports it, write both words when setting an address.
The buffer address is handled by two wrapper functions, the two
occasions where we set the next pointers are open coded.
For now this is guarded by a flag which we don't set yet.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Newer revisions of the IP don't have these registers. Since we don't
really use them, just drop them from the ethtools dump.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The DMA error handler routine is currently a tasklet, scheduled to run
after the DMA error IRQ was handled.
However it needs to take the MDIO mutex, which is not allowed to do in a
tasklet. A kernel (with debug options) complains consequently:
[ 614.050361] net eth0: DMA Tx error 0x174019
[ 614.064002] net eth0: Current BD is at: 0x8f84aa0ce
[ 614.080195] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
[ 614.109484] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 40, name: kworker/u4:4
[ 614.135428] 3 locks held by kworker/u4:4/40:
[ 614.149075] #0: ffff000879863328 ((wq_completion)rpciod){....}, at: process_one_work+0x1f0/0x6a8
[ 614.177528] #1: ffff80001251bdf8 ((work_completion)(&task->u.tk_work)){....}, at: process_one_work+0x1f0/0x6a8
[ 614.209033] #2: ffff0008784e0110 (sk_lock-AF_INET-RPC){....}, at: tcp_sendmsg+0x24/0x58
[ 614.235429] CPU: 0 PID: 40 Comm: kworker/u4:4 Not tainted 5.6.0-rc3-00926-g4a165a9d5921 #26
[ 614.260854] Hardware name: ARM Test FPGA (DT)
[ 614.274734] Workqueue: rpciod rpc_async_schedule
[ 614.289022] Call trace:
[ 614.296871] dump_backtrace+0x0/0x1a0
[ 614.308311] show_stack+0x14/0x20
[ 614.318751] dump_stack+0xbc/0x100
[ 614.329403] ___might_sleep+0xf0/0x140
[ 614.341018] __might_sleep+0x4c/0x80
[ 614.352201] __mutex_lock+0x5c/0x8a8
[ 614.363348] mutex_lock_nested+0x1c/0x28
[ 614.375654] axienet_dma_err_handler+0x38/0x388
[ 614.389999] tasklet_action_common.isra.15+0x160/0x1a8
[ 614.405894] tasklet_action+0x24/0x30
[ 614.417297] efi_header_end+0xe0/0x494
[ 614.429020] irq_exit+0xd0/0xd8
[ 614.439047] __handle_domain_irq+0x60/0xb0
[ 614.451877] gic_handle_irq+0xdc/0x2d0
[ 614.463486] el1_irq+0xcc/0x180
[ 614.473451] __tcp_transmit_skb+0x41c/0xb58
[ 614.486513] tcp_write_xmit+0x224/0x10a0
[ 614.498792] __tcp_push_pending_frames+0x38/0xc8
[ 614.513126] tcp_rcv_established+0x41c/0x820
[ 614.526301] tcp_v4_do_rcv+0x8c/0x218
[ 614.537784] __release_sock+0x5c/0x108
[ 614.549466] release_sock+0x34/0xa0
[ 614.560318] tcp_sendmsg+0x40/0x58
[ 614.571053] inet_sendmsg+0x40/0x68
[ 614.582061] sock_sendmsg+0x18/0x30
[ 614.593074] xs_sendpages+0x218/0x328
[ 614.604506] xs_tcp_send_request+0xa0/0x1b8
[ 614.617461] xprt_transmit+0xc8/0x4f0
[ 614.628943] call_transmit+0x8c/0xa0
[ 614.640028] __rpc_execute+0xbc/0x6f8
[ 614.651380] rpc_async_schedule+0x28/0x48
[ 614.663846] process_one_work+0x298/0x6a8
[ 614.676299] worker_thread+0x40/0x490
[ 614.687687] kthread+0x134/0x138
[ 614.697804] ret_from_fork+0x10/0x18
[ 614.717319] xilinx_axienet 7fe00000.ethernet eth0: Link is Down
[ 615.748343] xilinx_axienet 7fe00000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
Since tasklets are not really popular anymore anyway, lets convert this
over to a work queue, which can sleep and thus can take the MDIO mutex.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert this driver to use the phylink API rather than the legacy PHY
API. This allows for better support for SFP modules connected using a
1000BaseX or SGMII interface.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously this driver only handled interrupts from the DMA RX and TX
blocks, not from the Ethernet core itself. Add optional support for
the Ethernet core interrupt, which is used to detect rx_missed and
framing errors signalled by the hardware. In order to use this
interrupt, a third interrupt needs to be specified in the device tree.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for setting the RX and TX ring sizes for this driver using
ethtool. Also increase the default RX ring size as the previous default
was far too low for good performance in some configurations.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The Xilinx DMA blocks each have their own reset register, but they both
reset the entire DMA engine, so only one of them needs to be reset.
Also, when stopping the device, we need to not just command the DMA
blocks to stop, but wait for them to stop, and trigger a device reset
to ensure that they are completely stopped.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MDIO clock divisor register setting was only applied on the initial
startup when the driver was loaded. However, this setting is cleared
when the device is reset, such as would occur when the interface was
taken down and brought up again, and so the MDIO bus would be
non-functional afterwards.
Split up the MDIO bus setup and enable into separate functions and
re-enable the bus after a device reset, to ensure that the MDIO
registers are set properly. This also allows us to remove direct access
to MDIO registers in xilinx_axienet_main.c and centralize them all in
xilinx_axienet_mdio.c.
Also, lock the MDIO bus lock around the device reset process, to avoid
MDIO accesses from occurring while the MDIO is disabled during the reset.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver was previously always calculating the MDIO clock divisor
(from AXI bus clock to MDIO bus clock) based on the CPU clock frequency,
assuming that it is the same as the AXI bus frequency, but that
simplistic method only works on the MicroBlaze platform.
Add support for specifying the clock used for the device in the device
tree using the clock framework. If the clock is specified then it will
be used when calculating the clock divisor. The previous CPU clock
detection method is left for backward compatibility if no clock is
specified.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MDIO bus for this driver was being named using the result of
of_address_to_resource on a node which may not have any resource on it,
but the return value of that call was not checked so it was using some
random value in the bus name. Change to name the MDIO bus based on the
resource start of the actual Ethernet register block.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver was using in_be32 and out_be32 IO accessors which do not
exist on most platforms. Also, the use of big-endian accessors does not
seem correct as this hardware is accessed over an AXI bus which, to the
extent it has an endian-ness, is little-endian. Switch to standard
ioread32/iowrite32 accessors.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver was casting skb pointers to u32 and storing them as such in
the DMA buffer descriptor, which is obviously broken on 64-bit. The area
of the buffer descriptor being used is not accessed by the hardware and
has sufficient room for a 32 or 64-bit pointer, so just store the skb
pointer as such.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|