| Commit message | Author | Age | Files | Lines |
| |
deraadt@ says i broke hppa :(
| |
also do the ethertype comparison before the conversion above.
| |
ok dlg@
| |
OK dlg@, bluhm@
No Opinion mpi@
Not against it claudio@
| |
Most clonable interface drivers (except bridge, enc, loop, pppx,
switch, trunk and vlan) initialise the send queue's length to IFQ_MAXLEN
during *_clone_create() even though ifq_init(), which is eventually called
through if_attach(), does the same.
Remove all early "ifq_set_maxlen(&ifq->if_snd, IFQ_MAXLEN);" lines to leave
it to ifq_init() and have clonable drivers a tad more in sync.
OK mvs
| |
the interface input handler lists were originally set up to help
us during the initial mpsafe network stack work. at the time not all
the virtual ethernet interfaces (vlan, svlan, bridge, trunk, etc)
were mpsafe, so we wanted a way to avoid them by default, and only
take the kernel lock hit when they were specifically enabled on the
interface. since then, they have been fixed up to be mpsafe.
i could leave the list in place, but it has some semantic problems.
because virtual interfaces filter packets based on the order they
were attached to the parent interface, you can get packets taken
away in surprising ways, especially when you reboot and netstart
does something different to what you did by hand. by hardcoding the
order that things like vlan and bridge get to look at packets, we
can document the behaviour and get consistency.
it also means we can get rid of a use of SRPs which were difficult
to replace with SMRs. the interface input handler list is an SRPL,
which we would like to deprecate. it turns out that you can sleep
during stack processing, which you're not supposed to do with SRPs
or SMRs, but SRPs are a lot more forgiving and it worked.
lastly, it turns out that this code is faster than the input list
handling, so lots of winning all around.
special thanks to hrvoje popovski and aaron bieber for testing.
this has been in snaps as part of a larger diff for over a week.
| |
ok dlg@ tobhe@
| |
i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.
| |
the intersection of the capabilities of the ports, allowing use of
vlan and checksum offloads if supported by all ports. Since this works
the same way as updating hardmtu, do them both at the same time.
ok dlg@
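
The intersection described above can be sketched as a bitwise AND over the ports' capability masks. This is only an illustration, not the aggr(4) code: the `CAP_*` bits, `struct port` and `aggr_capabilities()` are hypothetical names, and advertising nothing when there are no ports is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; the kernel has its own flag names. */
#define CAP_CSUM_IPV4	0x01u
#define CAP_CSUM_TCPV4	0x02u
#define CAP_VLAN_HWTAG	0x04u

struct port {
	uint32_t capabilities;	/* offloads this port supports */
};

/*
 * Return the capabilities supported by every port, i.e. the
 * intersection of all the masks. With no ports (an assumption),
 * advertise nothing.
 */
uint32_t
aggr_capabilities(const struct port *ports, size_t nports)
{
	uint32_t caps;
	size_t i;

	if (nports == 0)
		return (0);

	caps = ports[0].capabilities;
	for (i = 1; i < nports; i++)
		caps &= ports[i].capabilities;

	return (caps);
}
```

An offload survives only if every port has the bit set, so adding a single port without checksum offload removes it from the aggregate.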
| |
aggr_p_dtor() calls ifpromisc(), and ifpromisc() callers need to
be holding NET_LOCK to make changes to if_flags and if_pcount, and
before calling the interface's ioctl to apply the flag change.
i found this while reading code with my eyes, and was able to trigger
the NET_ASSERT_LOCKED in the vlan_ioctl path.
ok visa@
| |
coverity CID 1486819
pointed out by and ok tobhe@
| |
this lets aggr come up on boot if there's a race with it being
brought up and the ports being up.
reported by holger glaess on misc@ and debugged with hrvoje popovski.
tested by hrvoje popovski too.
| |
Spotted by Hrvoje Popovski using witness(4)
OK dlg@
| |
this means we don't truncate sockaddr_in6, which in turn means we
don't end up using garbage or zeros on the underlying ports when
requesting they set up hardware filters for multicast addresses.
vlan(4) uses sockaddr_storage like this too for the same thing.
discovered by jmatthew@ because ipv6 on top of aggr wasn't working
unless tcpdump was running.
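
The truncation problem can be shown with the standard socket address types. This is a sketch of the size mismatch only: `struct mc_entry_small`, `struct mc_entry_big` and `mc_store()` are hypothetical names, not the driver's.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical records of a multicast address joined on the aggr. */
struct mc_entry_small {
	struct sockaddr addr;		/* too small for IPv6 */
};

struct mc_entry_big {
	struct sockaddr_storage addr;	/* large enough for any family */
};

/*
 * Copy as much of a sockaddr_in6 as fits in dst_len bytes and
 * return how many bytes were lost to truncation.
 */
size_t
mc_store(void *dst, size_t dst_len, const struct sockaddr_in6 *sin6)
{
	size_t len = sizeof(*sin6);
	size_t copied = len < dst_len ? len : dst_len;

	memcpy(dst, sin6, copied);
	return (len - copied);
}
```

A nonzero return means part of the IPv6 address never reached the stored copy, which is the kind of garbage the ports would then be asked to filter on.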
| |
it's no longer necessary to hold NET_LOCK to call interface hook
adds or dels now, but it is necessary not to hold NET_LOCK when
calling some barrier functions.
found by hrvoje popovski
| |
this is largely mechanical, except for carp. this moves the addition
of the carp link state hook after we're committed to using the new
interface as a carpdev. because the add can't fail, we avoid a
complicated unwind dance. also, this tweaks the carp linkstate hook
so it only updates the relevant carp interface, not all of the
carpdevs on the parent.
hrvoje popovski has tested an early version of this diff and it's
generally ok, but there's some splasserts that this diff fires that
i'll fix in an upcoming diff.
ok claudio@
| |
the main semantic change is that things registering detach hooks
have to allocate and set a task structure that then gets added to
the list. this means if the task is allocated up front (eg, as part
of carp's softc or bridge's port structure), it avoids the possibility
that adding a hook can fail. a lot of drivers weren't checking for
failure, and unwinding state in the event of failure in other parts
was error prone.
while doing this i discovered that the list operations have to be
in a particular order, but drivers weren't doing that consistently
either. this diff wraps the list ops up so you have to seriously
go out of your way to screw them up.
i've also sprinkled some NET_ASSERT_LOCKED around the list operations
so we can make sure there's no potential for the list to be corrupted,
especially while it's being run.
hrvoje popovski has tested this a bit, and some issues he discovered
have been fixed.
ok sashan@
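
The idea of embedding the hook in the caller's own structure, so that adding it needs no allocation and cannot fail, can be sketched in userland with queue(3) macros. `struct hook` here is a stand-in for the kernel task structure, and every name below is hypothetical.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Stand-in for the task structure; the caller provides the storage. */
struct hook {
	TAILQ_ENTRY(hook) h_entry;
	void (*h_fn)(void *);
	void *h_arg;
};
TAILQ_HEAD(hook_list, hook);

/* Hypothetical driver state with the hook embedded up front. */
struct port_softc {
	int p_detached;		/* times the detach hook has run */
	struct hook p_dhook;
};

void
hook_set(struct hook *h, void (*fn)(void *), void *arg)
{
	h->h_fn = fn;
	h->h_arg = arg;
}

/* No allocation happens here, so there is no failure to unwind. */
void
hook_add(struct hook_list *list, struct hook *h)
{
	TAILQ_INSERT_TAIL(list, h, h_entry);
}

void
hook_del(struct hook_list *list, struct hook *h)
{
	TAILQ_REMOVE(list, h, h_entry);
}

void
hooks_run(struct hook_list *list)
{
	struct hook *h;

	TAILQ_FOREACH(h, list, h_entry)
		(*h->h_fn)(h->h_arg);
}

void
port_detached(void *arg)
{
	struct port_softc *sc = arg;

	sc->p_detached++;
}
```

Because `hook_add()` returns void, callers have no error path to forget. The sketch leaves out locking and the ordering constraints mentioned above.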
| |
i hope, i didn't test this that hard.
| |
useful for debugging.
| |
excluding HALF_DUPLEX just seems mean.
| |
by notify i mean we send an lacp packet with our collecting and
distributing flags cleared, which should tell the remote system
that it should no longer handle packets on their port as part of
their aggregation. this is implemented by "unselecting" a port.
if an active port is going away, ie, being removed from an aggr via
"ifconfig aggr0 -trunkport port0", all that happens is software
state on our side changes and we stop considering the interface as
part of the aggr interface. the partner system is otherwise oblivious
and can continue to send us packets until its expiry timeout fires
because it doesn't know any better.
we already intercept a port's ioctl handling, so if someone goes
"ifconfig portX down" while it is attached to an aggr, we can catch
that before the underlying driver actually tears the rings down, and
we still have a chance to try and send a packet to the peer. this
is useful because our drivers generally do not drop the physical
link, so again, the partner system is oblivious to the change on
our side until its expiry timer fires.
expiry timeouts can be up to 90 seconds away, which is a lot of
traffic to blackhole. sending the notification to the partner means
they withdraw this link at the same time the local system is pulling
the port out of the aggregation. hopefully. it is possible the
packet is lost, but this is a good start.
the only caveat to this is that my implementation ignores the transmit
state machine from the lacp spec, and may cause more than 3 lacp
packets per second to be transmitted to the partner system. oh
well.
i should look at the marker protocol too.
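
The notification amounts to sending the actor state with two bits cleared. The bit values below are the ones the LACPDU state field uses in the standard; the macro names and `lacp_unselect_state()` are hypothetical, and this is a sketch of the flag handling only, not of how the driver builds or sends the packet.

```c
#include <stdint.h>

/* Bits of the LACPDU actor/partner state field. */
#define LACP_STATE_ACTIVITY	0x01
#define LACP_STATE_TIMEOUT	0x02
#define LACP_STATE_AGGREGATION	0x04
#define LACP_STATE_SYNC		0x08
#define LACP_STATE_COLLECTING	0x10
#define LACP_STATE_DISTRIBUTING	0x20
#define LACP_STATE_DEFAULTED	0x40
#define LACP_STATE_EXPIRED	0x80

/*
 * Actor state to advertise when a port is being unselected: the
 * same state with collecting and distributing cleared, so the
 * partner stops using the link as part of its aggregation.
 */
uint8_t
lacp_unselect_state(uint8_t state)
{
	return (state & (uint8_t)~(LACP_STATE_COLLECTING |
	    LACP_STATE_DISTRIBUTING));
}
```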
| |
this doesn't seem to be mentioned in the spec, but is a sensible
thing to do if you think about it. all the switches i've tried also
do this, so there's some consensus about it being sensible.
this is done in the link state handler rather than being added to
one of the state machines. the idea is to keep the state machines
as close to what's in the spec as possible.
| |
without this it looks like debug output loses info because of how
the uct was shortcutted.
no functional change, just prettier printfs.
| |
previously it would only run the selection logic if the peer
information changed, but it is possible to be in the current state
with stale partner info. that can happen if the port becomes
disabled/disconnected, which unwinds the mux machine, but doesn't
clear the partner info. when the link is enabled again we re-enter
the current state, but because the partner info is the same we
didn't run the selection logic, which in turn didn't let the mux
machine move forward again.
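
A toy model of the stale partner info case is sketched below. It is not the driver's state machine: the structures, the `always_select` switch and the use of a single `selected` flag to stand in for the selection logic and the mux machine are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical subset of what a port remembers about its partner. */
struct partner {
	unsigned char system[6];
	unsigned short key;
};

struct port {
	struct partner partner;	/* kept across link down */
	bool selected;		/* stands in for selection + mux */
};

static bool
partner_eq(const struct partner *a, const struct partner *b)
{
	return (memcmp(a->system, b->system, sizeof(a->system)) == 0 &&
	    a->key == b->key);
}

/* Link loss unwinds the mux state but leaves the partner info alone. */
void
port_link_down(struct port *p)
{
	p->selected = false;
}

/*
 * Handle a received LACPDU. With always_select false the selection
 * step only runs when the partner info changed (the old behaviour);
 * with it true the step runs on every packet.
 */
void
port_rx(struct port *p, const struct partner *info, bool always_select)
{
	bool changed = !partner_eq(&p->partner, info);

	p->partner = *info;
	if (changed || always_select)
		p->selected = true;
}
```

In this model, receiving the same partner info after `port_link_down()` leaves the port unselected under the old behaviour, which matches the stuck state described above.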
| |
lacp didn't come up again after i replaced some optics with dacs, and it
has to be because of a problem around the selection logic. this will let
me narrow it down.
| |
ether_cmp goes away, ether_is_eq becomes ETHER_IS_EQ, ether_is_zero
becomes ETHER_IS_ANYADDR.
ether_is_slow is kept locally, but renamed to ETHER_IS_SLOWADDR to
better match what comes from if_ether.h.
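
For readers without the headers at hand, macros with these names could look roughly like the sketch below. The definitions are assumptions written for illustration, not copied from if_ether.h or the driver; the slow protocols multicast address 01:80:c2:00:00:02 is the one LACP frames are sent to.

```c
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN	6

/* Assumed helper constants; the names are hypothetical. */
static const uint8_t ether_anyaddr[ETHER_ADDR_LEN] = { 0 };
static const uint8_t ether_slowaddr[ETHER_ADDR_LEN] =
    { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x02 };

/* True if both addresses are byte for byte identical. */
#define ETHER_IS_EQ(_a, _b) \
	(memcmp((_a), (_b), ETHER_ADDR_LEN) == 0)

/* True for the all-zero address. */
#define ETHER_IS_ANYADDR(_a)	ETHER_IS_EQ((_a), ether_anyaddr)

/* True for the slow protocols multicast address. */
#define ETHER_IS_SLOWADDR(_a)	ETHER_IS_EQ((_a), ether_slowaddr)
```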
| |
it's the same, but there was a misleading comment on the same line
which this cleans up too.
| |
this probably explains why i've seen a box decide not to use a
distributing port, even though the state machine and all the lacp
state flags say it's fine. it may also explain why jmatthew@ has
seen a port still transmitting after it's been removed from an
aggr(4).
| |
make setting a trunkport's mtu to its current mtu a nop. set a
trunkport's mtu to the aggr mtu when the port is getting added. set
the mtu on all trunkports when the aggr mtu is set so things look
consistent. restore a trunkport's mtu when it is removed from an
aggr.
this is mostly cosmetic since the mtu on trunkports isn't really
used anywhere.
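
The four mtu rules above can be modelled in a few lines. This is a sketch with hypothetical names (`struct port`, `port_set_mtu()` and friends), not the driver's code, and the `nchanges` counter exists only to make the nop visible.

```c
#include <stddef.h>

struct port {
	unsigned int mtu;	/* current mtu */
	unsigned int saved_mtu;	/* mtu the port had before joining */
	unsigned int nchanges;	/* times the mtu was really changed */
};

/* Setting the mtu to its current value is a nop. */
void
port_set_mtu(struct port *p, unsigned int mtu)
{
	if (p->mtu == mtu)
		return;
	p->mtu = mtu;
	p->nchanges++;
}

/* A port takes on the aggr mtu when it is added. */
void
port_add(struct port *p, unsigned int aggr_mtu)
{
	p->saved_mtu = p->mtu;
	port_set_mtu(p, aggr_mtu);
}

/* Changing the aggr mtu is pushed down to every port. */
void
aggr_set_mtu(struct port *ports, size_t nports, unsigned int mtu)
{
	size_t i;

	for (i = 0; i < nports; i++)
		port_set_mtu(&ports[i], mtu);
}

/* A port gets its original mtu back when it is removed. */
void
port_remove(struct port *p)
{
	port_set_mtu(p, p->saved_mtu);
}
```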
|
802.1AX (formerly known as 802.3ad) describes the Link Aggregation
Control Protocol (LACP) and how to use it in a bunch of different
state machines to control when to bundle interfaces into an
aggregation.
technically the trunk(4) driver already implements support for
802.1AX, but it had a couple of problems i struggled to deal with
as part of that driver. firstly, i couldn't easily make the output
path in trunk mpsafe without getting bogged down, and the state
machine handling had a few hard to diagnose edge cases that i couldn't
figure out.
the new driver has an mpsafe output path, and implements ifq bypass
like vlan(4) does. this means output with aggr(4) is up to twice
as fast as trunk(4). the implementation of the state machines as
per the standard means the driver behaves more correctly in edge
cases like when a physical link looks like it is up, but is logically
unidirectional.
the code has been good enough for me to use in production, but it
does need more work. that can happen in tree now instead of carrying
a large diff around.
some testing by ccardenas@, hrvoje popovski, and jmatthew@
ok deraadt@ ccardenas@ jmatthew@