OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@
----------------------------
this replaces the existing counters implementation, which just
collected the stats in the softc, but didn't really provide a way
for a person to read them.
em counters get cleared on read. a lot of them are 32bit, so to
avoid overflow the counters are polled and the newly accumulated
values are added to some 64 bit counters in software.
tested by hrvoje popovski and SAITOH Masanobu
ok mpi@
----------------------------
ok dlg@ tobhe@
----------------------------
this is a step toward deprecating softclock based livelock detection.
----------------------------
- return an error if em_rxfill() fails when setting up the ring, this
means em_get_buf() couldn't get a mbuf.
- disable "Drop Enable" to have the same behavior when queues > 1
- use local variables for statistics in preparation for using the
counters_add(9) API to not trash values
- extend hw_stats to print per-queue counters
Tested by Hrvoje Popovski, ok jmatthew@
----------------------------
descriptors runs below the low watermark.
The em(4) firmware seems not to work properly with just a few descriptors in
the receive ring. Thus, we use the low water mark as an indicator instead of
zero descriptors, which causes deadlocks.
ok kettenis@
----------------------------
but on selected ARM64 machines with non-cache-coherent PCIe controllers this
makes em(4) work reliably. Without it the network controller's view of the
head and tail get out of sync. The reason remains unclear. It could be an
issue in our arm64 bus dma code, it could be an issue in the em(4) code, or
maybe the hardware itself just doesn't cope well with non-coherent memory.
Linux maps them coherent as well, and it might actually be better to map
them that way, since otherwise we might spend a lot of time flushing our
caches.
ok kettenis@ deraadt@
----------------------------
Tested by Hrvoje Popovski and jmatthew@, ok jmatthew@
----------------------------
The current implementation still uses a single queue but already establishes
a different handler for link interrupts. This is done in preparation for
multi-queue support.
Based on a bigger diff from haesbaert@ and on the FreeBSD code.
Tested by Hrvoje Popovski and jmatthew@, ok jmatthew@
----------------------------
Tested by Hrvoje Popovski, ok jmatthew@
----------------------------
No functional change.
----------------------------
Move the tx/rx descriptors to dedicated structures similar to what already
exist in ix(4).
Only one queue is currently used, no real architectural change introduced
in this diff.
Extracted from a big diff from haesbaert@ via patrick@.
Tested by Hrvoje Popovski and jmatthew@, ok jmatthew@
----------------------------
- Abstract the allocation/freeing of TX/RX ring into em_dma_malloc().
This will ease the introduction of multiple rings.
- Split the 82576 variant out of 82575. The distinction is necessary
when it comes to setting multiple queues.
- Change multiple TX/RX related macros to take an index argument
  corresponding to a ring. Currently only indexes 0 and 1 are used.
- Gather and print more stats counters
- Switch to using a function, like FreeBSD, to translate 82542
registers and get rid of a set of defines.
Tested by many, thanks! ok mlarkin@, jmatthew@
----------------------------
em had rxr, but didn't use a timeout cos it claimed to generate an
RX overflow interrupt when packets fell off slots in the ring. turns
out that's a lie on at least one chip, so add the timeout like other
drivers.
this was hit by mlarkin@, who had nfs and bufs steal all the packets
and memory for packets from em, which didn't recover after the
memory had been released back to the system.
----------------------------
https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/i218-i219-ethernet-connection-spec-update.pdf?asset=9561
ok mikeb@ jsg@
----------------------------
Print the error code if hardware initialization failed.
If EM_DEBUG is defined, print the phy/mac type during attach.
ok mikeb@ jsg@
----------------------------
Going by changes in FreeBSD and Linux it is almost identical to pch_spt
but doesn't need one of the workarounds for a pch_spt specific errata.
----------------------------
So the em(4) driver never got out of that state. Better compare
the new link state value with the old one, like other drivers do.
bug report Matthias Pitzl; OK deraadt@
----------------------------
Expanded version of a diff from claudio@ who tested on x270 ok kettenis@
----------------------------
an ifq to transmit a packet is picked by the current traffic
conditioner (ie, priq or hfsc) by providing an index into an array
of ifqs. by default interfaces get a single ifq but can ask for
more using if_attach_queues().
the vast majority of our drivers still think there's a 1:1 mapping
between interfaces and transmit queues, so their if_start routines
take an ifnet pointer instead of a pointer to the ifqueue struct.
instead of changing all the drivers in the tree, drivers can opt
into using an if_qstart routine and setting the IFXF_MPSAFE flag.
the stack provides a compatibility wrapper from the new if_qstart
handler to the previous if_start handlers if IFXF_MPSAFE isnt set.
enabling hfsc on an interface configures it to transmit everything
through the first ifq. any other ifqs are left configured as priq,
but unused, when hfsc is enabled.
getting this in now so everyone can kick the tyres.
ok mpi@ visa@ (who provided some tweaks for cnmac).
----------------------------
this means packets are consistently counted in one place, unlike the
many and various ways that drivers thought they should do it.
ok mpi@ deraadt@
----------------------------
this means that the ethernet header and therefore its payload will
be aligned correctly for the stack. without this em and ix are
suffering a 30 to 40 percent hit in forwarding performance because
the ethernet stack expects to be able to prepend 8 bytes for an
ethernet header so it can guarantee its alignment. because em and
ix only had 6 bytes where the ethernet header was, it always prepends
an mbuf which turns out to be expensive. this way the prepend will
be cheap because the 8 byte space will exist.
2k+ETHER_ALIGN clusters will end up using the newly created mcl2k2
pool.
the regression was isolated and the fix tested by hrvoje popovski.
ok mikeb@
----------------------------
from Christian Ehrhardt; input jsg@; OK deraadt@ sthen@ mpi@ jsg@
tested by sthen@ jca@ benno@ bluhm@
----------------------------
now that start and txeof can run on different cpus, txeof could
have freed the mbuf before bpf got to it.
----------------------------
no one could understand how em_txeof worked, so i rewrote it.
this also gets rid of the sc_tx_desc_free var that needed atomic
ops. space to use in em_start and space to free in em_txeof is now
calculated from the producer and consumer.
testers have reported better responsiveness with this. somehow.
if em issues persist after this, im rolling back to pre-mpsafe changes.
----------------------------
shorten a bunch of variable names while here.
----------------------------
this lets us do the syncs once for a fill of the ring instead of
once for every packet put onto the ring. it mirrors how we try to
do things for tx.
----------------------------
we dont support user config of the ring size, especially before attach time,
and the dmamem api takes care of rounding up to PAGE_SIZE if it needs
to.
----------------------------
makes it more consistent with the rest of the tree.
----------------------------
this is mostly work by kettenis and claudio, with further work from
me to make the transmit side from the stack mpsafe.
there's a watchdog issue that will be worked on in tree after this
change.
tested by hrvoje popovski and gregor best
ok mpi@ claudio@ deraadt@ jmatthew@
----------------------------
the possible number of slots a packet can use on the tx ring.
to make it easier to reserve and account for space on the ring,
half the number of dma descriptors on those chips so the number of
slots can stay the same.
ok claudio@
----------------------------
there are two things shared between the network stack and drivers
in the send path: the send queue and the IFF_OACTIVE flag. the send
queue is now protected by a mutex. this diff makes the oactive
functionality mpsafe too.
IFF_OACTIVE is part of if_flags. there are two problems with that.
firstly, if_flags is a short and we dont have any MI atomic operations
to manipulate a short. secondly, while we could make the IFF_OACTIVE
manipulation mpsafe, all changes to other flags would have to be made
safe at the same time, otherwise a read-modify-write cycle on their
updates could clobber the oactive change.
instead, this moves the oactive mark into struct ifqueue and provides
an API for changing it. there's ifq_set_oactive, ifq_clr_oactive,
and ifq_is_oactive. these are modelled on ifsq_set_oactive,
ifsq_clr_oactive, and ifsq_is_oactive in dragonflybsd.
this diff includes changes to all the drivers manipulating IFF_OACTIVE
to now use the ifq_{set,clr,is}_oactive API too.
ok kettenis@ mpi@ jmatthew@ deraadt@
----------------------------
KERNEL_LOCK.
A piece is still not right as many people reported a "watchdog timeout"
problem.
This basically brings us back to r1.305.
ok dlg@, jmatthew@
----------------------------
the code is refactored so the IFQ macros call newly implemented ifq
functions. the ifq code is split so each discipline (priq and hfsc
in our case) is an opaque set of operations that the common ifq
code can call. the common code does the locking, accounting (ifq_len
manipulation), and freeing of the mbuf if the disciplines enqueue
function rejects it. theyre kind of like bufqs in the block layer
with their fifo and nscan disciplines.
the new api also supports atomic switching of disciplines at runtime.
the hfsc setup in pf_ioctl.c has been tweaked to build a complete
hfsc_if structure which it attaches to the send queue in a single
operation, rather than attaching to the interface up front and
building up a list of queues.
the send queue is now mutexed, which raises the expectation that
packets can be enqueued or purged on one cpu while another cpu is
dequeueing them in a driver for transmission. a lot of drivers use
IFQ_POLL to peek at an mbuf and attempt to fit it on the ring before
committing to it with a later IFQ_DEQUEUE operation. if the mbuf
gets freed in between the POLL and DEQUEUE operations, fireworks
will ensue.
to avoid this, the ifq api introduces ifq_deq_begin, ifq_deq_rollback,
and ifq_deq_commit. ifq_deq_begin allows a driver to take the ifq
mutex and get a reference to the mbuf they wish to try and tx. if
there's space, they can ifq_deq_commit it to remove the mbuf and
release the mutex. if there's no space, ifq_deq_rollback simply
releases the mutex. this api was developed to make updating the
drivers using IFQ_POLL easy, instead of having to do significant
semantic changes to avoid POLL that we cannot test on all the
hardware.
the common code has been tested pretty hard, and all the driver
modifications are straightforward except for de(4). if that breaks
it can be dealt with later.
ok mpi@ jmatthew@