| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
| |
if the bridge is supposed to carry vlan packets, assuming it's an
s-vlan component and should allow certain group addresses to cross
between "customer" bridges.
i should probably let some of these groups fall back through to the
calling ether_input rather than drop them.
|
| |
|
|
|
|
|
|
|
|
| |
testing has shown up to a 30% improvement in the veb forwarding
rate with this change.
an earlier diff was tested by hrvoje popovski
tested on amd64 and sparc64
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the main bits are ether_addr_to_e64 and ether_e64_to_addr for loading
an ethernet address into a uint64_t and vice versa. there are also
some macros for testing if an address in a uint64_t is multicast,
broadcast, anyaddr, or if it's an 802.1q reserved multicast group
address.
the reason for this functionality is once you have an ethernet
address as a uint64_t, operations like compares, bit tests, and
so on are fast and easy.
tested on amd64 and sparc64
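
a minimal userland sketch of the idea described above, not the code from the tree: the struct, function and macro names prefixed with sketch_/SKETCH_ are hypothetical, and the byte order (first octet most significant) is an assumption made for the example.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_ADDR_LEN 6

/* Hypothetical stand-in for an ethernet address: six octets in wire order. */
struct sketch_ether_addr {
	uint8_t octet[SKETCH_ADDR_LEN];
};

/* Pack the six octets into the low 48 bits, first octet most significant. */
static uint64_t
sketch_addr_to_e64(const struct sketch_ether_addr *ea)
{
	uint64_t e64 = 0;
	size_t i;

	for (i = 0; i < SKETCH_ADDR_LEN; i++)
		e64 = (e64 << 8) | ea->octet[i];

	return (e64);
}

/* Unpack the low 48 bits back into six octets. */
static void
sketch_e64_to_addr(struct sketch_ether_addr *ea, uint64_t e64)
{
	size_t i = SKETCH_ADDR_LEN;

	do {
		ea->octet[--i] = (uint8_t)e64;
		e64 >>= 8;
	} while (i > 0);
}

/* The group (multicast) bit is the low bit of the first octet. */
#define SKETCH_E64_IS_MULTICAST(e)	(((e) & 0x010000000000ULL) != 0)
#define SKETCH_E64_IS_BROADCAST(e)	((e) == 0xffffffffffffULL)
#define SKETCH_E64_IS_ANYADDR(e)	((e) == 0x000000000000ULL)
/* 802.1q reserved groups: 01:80:c2:00:00:00 to 01:80:c2:00:00:0f */
#define SKETCH_E64_IS_8021Q_RSVD(e) \
	(((e) & 0xfffffffffff0ULL) == 0x0180c2000000ULL)
```

with the address in a single integer, each test is one mask and one compare rather than a loop over six bytes.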
|
|
|
|
| |
ok dlg
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the first cut of this diff was made with coccinelle using this spatch:
@rule@
type caddr_t;
expression m, off, len, cp;
@@
-m_copydata(m, off, len, (caddr_t)cp)
+m_copydata(m, off, len, cp)
i had to fix its opinionated idea of formatting by hand though, so
i'm not sure it was worth it.
ok deraadt@ bluhm@
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
another bridge feature i'm not convinced people actually use.
ok jmatthew@ claudio@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the guts of this are in the etherbridge code which i added for
veb and used in bpe. there's a bit of boilerplate to make sure that
the addresses used for the endpoints will work with the tunnel
addresses that have been configured, but it's not too bad.
again, this is hard to use because ifconfig doesn't (yet) know how
to put ethernet addresses into the "add address" ioctl.
these ioctls could be used for things like evpn via bgpd though.
not sure if that's interesting to anyone though. it would probably
be more useful on vxlan interfaces.
|
|
|
|
|
|
|
| |
the guts of this are in the etherbridge code which i just added for
veb, so this code is very minimal. it's hard to use though cos
ifconfig doesn't (yet) know how to put ethernet addresses into the
"add address" ioctl.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
switch_clone_destroy(). This fixes netlock assertion within underlay
ifpromisc(). The problem was reported by hrvoje@ [1].
"why not" by deraadt@
1. https://marc.info/?l=openbsd-bugs&m=161338077403538&w=2
|
| |
|
|
|
|
| |
ok deraadt@ dlg@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
i found the Transparent Network Security Policy Enforcement paper
by angelos and jason was useful for understanding the background
and why you'd want to do this.
the implementation is a little bit different to the bridge one
because i've tweaked the order that pf and ipsec processing happens,
depending on which direction the packet is going over the bridge.
bridge always runs ipsec processing before pf, no matter which
direction the packet is going. packets going into veb, pf runs first
and then ipsec input processing is allowed to happen. in the outgoing
direction ipsec happens first and then pf. pf runs before ipsec in
the inbound direction so pf can apply policy to ipsec encapsulated
packets before they hit ipsec. this allows you to apply policy to both
the encrypted and unencrypted packets in both directions.
the code is disabled for now. this is mostly because i want veb(4)
to have a good chance at operating outside the netlock, and i'm
pretty sure the ipsec stack isn't ready for that yet. the other
reason why it's disabled is getting a test setup is effort, but i
want to sleep.
|
|
|
|
|
|
|
|
|
| |
using the ipv6 next protocol header probably doesn't work. it also
probably doesn't matter cos i'm not sure anyone uses this feature in
bridge. or maybe there isn't anyone who uses ipv6. both are plausible
options.
hahaha^Wok patrick@
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
i'm considering converting ethernet addresses into uint64_ts to make
comparisons (and masking) easier. i'm trialling it here, and it
doesn't seem like the worst.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
my intention is to replace bridge(4), but the way it works is
different enough from bridge that a name change is justified
to distinguish them. it also makes it easier to commit it to the
tree and work on it in parallel to bridge, and allows a window of
migration.
the main difference between veb(4) and bridge(4) is how they use
interfaces as ports. veb takes over interfaces completely and only
uses them to receive and transmit ethernet packets. bridge also uses
each interface as a port to the ethernet segment it's connected to,
but also tries to continue supporting the use of the interface as
a way to talk to the network stack on the local system. supporting
the use of interfaces for both external and local communication is
where most of my confusion with bridge comes from, both when i'm
trying to operate it and also understand the code. changing this
semantic is where most of the simplification in veb comes from
compared to bridge.
because veb takes over interfaces, the ethernet network set up on
a veb is isolated from the host network stack. by default veb does
not interact with pf or the ip (and mpls) stacks. to enable pf for
ip frames going over veb ports link1 on the veb interface must be
set. to have the stack interact with a veb network, vport interfaces
must be created and added as ports to a veb.
the vport interface driver is provided as part of veb, and is handled
specially by veb. veb usually prevents the use of ports by the stack
for sending and receiving packets, but that's why vports exist, so
veb has special handling for them.
veb already supports a lot of the other features that bridge has,
including bridge rules and protected domains, but i got tired of
working out of the tree and stopped implementing them. the main
outstanding features are better address table management, the
blocknonip flag on ports, transparent ipsec interception, and
spanning tree. i may not bother with spanning tree unless someone
tells me that they actually use it.
the core ethernet learning bridge functionality is provided by the
etherbridge code that was factored out of nvgre and bpe. veb is
already (a lot) faster than bridge, and is better prepared to operate
in parallel on multiple CPUs concurrently.
thanks to hrvoje popovski for testing some earlier versions of this.
discussed with many
ok patrick@ jmatthew@
|
|
|
|
|
|
|
| |
reassembly, reinsert the fragment into the lookup table with correct
index.
Reported-by: syzbot+d043455a5346f726f1c4@syzkaller.appspotmail.com
OK claudio@
|
| |
|
|
|
|
|
|
|
| |
the "ports" that nvgre provides to etherbridge are ip addresses
used in the underlay network.
ok patrick@ jmatthew@
|
|
|
|
|
|
|
|
|
| |
it's pretty straightforward since etherbridge was mostly based on
this code in the first place. the etherbridge_ops that bpe provides
to etherbridge set entries up to point at mac addresses in the
underlay network.
ok patrick@ jmatthew@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this allows for the factoring out of the learning bridge code i
wrote in bpe and nvgre, and should be reusable for other drivers
needing a mac learning bridge.
the core data structures are an etherbridge struct to represent the
learning bridge, eb_entry structs for each mac address entry that
the bridge knows about, and an etherbridge_ops struct that drivers
fill in so that they can use this code.
eb_entry structs are stored in a hash table made up of SMR_TAILQs
to support lookups of entries quickly and concurrently in the
forwarding path. they are also stored in a locked red-black tree
to help manage the uniqueness of the mac address in the table.
the etherbridge_ops handlers mostly deal with comparing and testing
the "ports" associated with mac address table entries. the "port"
that a mac address entry is associated with is opaque to the
etherbridge code, which allows for this code to be used by nvgre
and bpe which map mac addresses inside the bridge to addresses in
their underlay networks. it also supports traditional bridges where
"ports" are actual interfaces.
ok patrick@ jmatthew@
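
a rough sketch of the shape described above, assuming a much simpler layout than the real code: one entry per hash bucket, no SMR lists, no red-black tree and no locking. all sketch_ names are hypothetical; the point is only that the table code never looks inside a "port" and leaves comparisons to a handler the driver supplies.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical ops a driver fills in; the port type stays opaque here. */
struct sketch_bridge_ops {
	int (*port_eq)(void *cookie, const void *a, const void *b);
};

struct sketch_entry {
	uint64_t addr;	/* mac address packed into the low 48 bits */
	void *port;	/* opaque: an interface, an underlay address, ... */
	int used;
};

#define SKETCH_TABLE_SIZE 64	/* arbitrary size for the example */

struct sketch_bridge {
	const struct sketch_bridge_ops *ops;
	void *cookie;
	struct sketch_entry table[SKETCH_TABLE_SIZE];
};

static size_t
sketch_hash(uint64_t addr)
{
	addr ^= addr >> 24;
	return ((size_t)(addr % SKETCH_TABLE_SIZE));
}

/* Learn that addr was seen on port. Returns 1 if the table changed. */
static int
sketch_learn(struct sketch_bridge *eb, uint64_t addr, void *port)
{
	struct sketch_entry *e = &eb->table[sketch_hash(addr)];

	if (e->used && e->addr == addr &&
	    eb->ops->port_eq(eb->cookie, e->port, port))
		return (0);	/* already known on this port */

	/* Simplification: a colliding address just replaces the old entry. */
	e->addr = addr;
	e->port = port;
	e->used = 1;
	return (1);
}

/* Return the port for addr, or NULL if the caller should flood. */
static void *
sketch_resolve(const struct sketch_bridge *eb, uint64_t addr)
{
	const struct sketch_entry *e = &eb->table[sketch_hash(addr)];

	return ((e->used && e->addr == addr) ? e->port : NULL);
}

/* Example port_eq for a bridge whose ports are plain interface pointers. */
static int
sketch_ptr_eq(void *cookie, const void *a, const void *b)
{
	(void)cookie;
	return (a == b);
}
```

a tunnel driver would supply a different port_eq that compares underlay addresses instead of pointers, without the table code changing.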
|
| |
|
|
|
|
|
|
| |
if_vinput requires mpsafe interface counters, so add those in. this
factors out some more code between drivers. monitor mode will work
on these interfaces now too.
|
|
|
|
|
|
|
|
| |
using if_vinput factors out a lot of repeated code between tunnel
drivers, and it means monitor mode works on gre and mgre now too.
make the l2 gre interfaces do some things in the same order while
here.
|
|
|
|
|
|
| |
if_vinput requires mpsafe interface counters, so gif is a bit more
mpsafe now than it was before. using if_vinput means monitor mode
works on gif now too.
|
|
|
|
|
|
|
|
| |
the l3 protocol input to push the packet is based on a value in
m->m_pkthdr.ph_family, which tunnel drivers should set before calling
if_vinput.
add p2p_bpf_mtap to call bpf_mtap_af also using m->m_pkthdr.ph_family.
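
a small sketch of dispatching on an address family carried with the packet, as described above. the struct, enum and function names are hypothetical stand-ins rather than the kernel's types, and only the two families available in portable headers are handled.

```c
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical result of choosing a layer 3 input path. */
enum sketch_input {
	SKETCH_IN_IPV4,
	SKETCH_IN_IPV6,
	SKETCH_IN_DROP
};

/* Hypothetical packet header: the tunnel driver sets family on receive. */
struct sketch_pkt {
	int family;
};

/* Pick the layer 3 input path from the family the driver recorded. */
static enum sketch_input
sketch_p2p_input(const struct sketch_pkt *m)
{
	switch (m->family) {
	case AF_INET:
		return (SKETCH_IN_IPV4);
	case AF_INET6:
		return (SKETCH_IN_IPV6);
	default:
		return (SKETCH_IN_DROP);	/* unknown family */
	}
}
```

keeping the family in the packet header means the shared input function needs no per-driver knowledge of the encapsulation.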
|
|
|
|
|
|
|
| |
tun (not tap) input packets are written from userland in the same
format that its bpf dlt is expecting, so we can push the packet
straight into bpf with bpf_mtap. this is more correct than using
bpf_mtap_ether for tun.
|
|
|
|
|
| |
call (*ifp->if_bpf_mtap) instead of bpf_mtap_ether in ifiq_input
and if_vinput.
|
|
|
|
|
|
|
|
| |
the network stack is now responsible for calling bpf for packets
that the interface receives, and we so far got away with using
bpf_mtap_ether for everything. this doesn't work if layer 3 input
goes through the same functions, so letting drivers specify the
appropriate bpf mtap function means they will be able to cope.
|
|
|
|
|
|
|
|
|
|
|
| |
an example use of this is when you have a span port on a switch and
you want to be able to see the packets coming out of it with tcpdump,
but do not want these packets to enter the network stack for
processing. this is particularly important if the span port is
pushing a copy of any packets related to the machine doing the
monitoring as it will confuse pf states and the stack.
ok benno@
|
| |
|
| |
|
|
|
|
|
|
|
| |
if you have multiple links to the same destination, this will let
you use them with route-to/reply-to/dup-to.
ok claudio@
|
|
|
|
|
|
|
| |
context so we always have `curproc'. Also protocol control block is not
required for soreserve() so we can do it before `rop' allocation.
ok bluhm@
|
|
|
|
|
|
|
| |
table. Hence we have to grab both the pf lock and the pf state lock.
Found by dlg@
ok bluhm@ sashan@
|
|
|
|
|
|
| |
addresses that come from pf cannot be right, so remove the code.
Coverity CID 1501718
OK dlg@ claudio@
|
|
|
|
|
|
|
|
|
| |
fully initialized because we initialize `if_groups' after linking. It's
not triggered because if_attach() and if_unit(9) are serialized by
kernel lock and `ifp' is often filled by nulls. Move `if_groups'
initialization to if_attach_common() to prevent this.
ok bluhm@ claudio@ deraadt@
|
|
|
|
|
|
|
|
|
|
|
|
| |
the kernel made the unique check before truncating with strlcpy().
So there could be two interface groups with the same name. The kif
is created by a name lookup. The truncated names are equal, so
there was only one kif owned by both groups. When the groups got
destroyed, the single kif was removed twice from the RB tree.
Check length of group name before doing the unique check.
The empty group name was allowed and is now invalid.
Reported-by: syzbot+f47e8296ebd559f9bbff@syzkaller.appspotmail.com
OK deraadt@ gnezdo@ anton@ mvs@ claudio@
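
A minimal sketch of the ordering described above, assuming a hypothetical fixed-size name table; the names prefixed with sketch_ and the buffer size are illustrative, not the kernel's. The length check runs first, so a name that would be truncated never reaches the uniqueness lookup or the copy.

```c
#include <string.h>
#include <stddef.h>

#define SKETCH_NAMESIZE	16	/* assumed buffer size, including the NUL */
#define SKETCH_MAXGROUPS 8	/* arbitrary limit for the example */

struct sketch_groups {
	char names[SKETCH_MAXGROUPS][SKETCH_NAMESIZE];
	size_t n;
};

/* Return 1 if name is non-empty and fits without truncation. */
static int
sketch_group_name_ok(const char *name)
{
	size_t len = strlen(name);

	return (len > 0 && len < SKETCH_NAMESIZE);
}

/* Add a group. Returns 0 on success, -1 if invalid, duplicate or full. */
static int
sketch_group_add(struct sketch_groups *g, const char *name)
{
	size_t i;

	/* Validate the length before anything else looks at the name. */
	if (!sketch_group_name_ok(name))
		return (-1);

	/* The stored and the compared names are now the same string. */
	for (i = 0; i < g->n; i++) {
		if (strcmp(g->names[i], name) == 0)
			return (-1);
	}

	if (g->n == SKETCH_MAXGROUPS)
		return (-1);

	memcpy(g->names[g->n++], name, strlen(name) + 1);
	return (0);
}
```

With the checks in the other order, two long names sharing their first 15 characters would both pass the uniqueness test and then be stored as the same truncated string.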
|
|
|
|
|
|
|
|
|
|
| |
pppac_ioctl() be called on dying pppac(4) interface. But now if_detach()
makes dying `ifp' inaccessible and waits for references which are in-use
in ioctl(2) path. This logic is not required anymore. Also if_detach()
was moved before klist_invalidate() to prevent the case where
pppac_qstart() bumps `sc_rsel'.
ok yasuoka@
|
|
|
|
|
|
|
|
|
| |
since the actual modification of the state table is done by a call to
pf_state_insert(), which takes the pf state lock itself. Other calls
to pfsync_state_import() also only have the pf lock.
Reported-by: syzbot+d6ea8620b43dc69ecbc6@syzkaller.appspotmail.com
ok bluhm@
|
|
|
|
|
| |
Silence from the network group
ok sashan@
|