summaryrefslogtreecommitdiffstats
path: root/sys/net (follow)
Commit message (Collapse)AuthorAgeFilesLines
* cut nvgre(4) over to use common etherbridge code.dlg2021-02-211-315/+127
| | | | | | | the "ports" that nvgre provides to etherbridge are ip addresses used in the underlay network. ok patrick@ jmatthew@
* cut bpe(4) over to using the common etherbridge code.dlg2021-02-211-290/+125
| | | | | | | | | it's pretty straightforward since etherbridge was mostly based on this code in the first place. the etherbridge_ops that bpe provides to etherbridge set entries up to point at mac addresses in the underlay network. ok patrick@ jmatthew@
* add etherbridge, the guts of a learning bridge that can be reused.dlg2021-02-212-0/+689
| | | | | | | | | | | | | | | | | | | | | | | | | | this allows for the factoring out of the learning bridge code i wrote in bpe and nvme, and should be reusable for other drivers needing a mac learning bridge. the core data structures are an etherbridge struct to represent the learning bridge, eb_entry structs for each mac address entry that the bridge knows about, and an etherbridge_ops struct that drivers fill in so that they can use this code. eb_entry structs are stored in a hash table made up of SMR_TAILQs to support lookups of entries quickly and concurrently in the forwarding path. they are also stored in a locked red-black tree to help manage the uniqueness of the mac address in the table. the etherbridge_ops handlers mostly deal with comparing and testing the "ports" associated with mac address table entries. the "port" that a mac address entry is associated with is opaque to the etherbridge code, which allows for this code to be used by nvgre and bpe which map mac addresses inside the bridge to addresses in their underlay networks. it also supports traditional bridges where "ports" are actual interfaces. ok patrick@ jmatthew@
* add stoeplitz_eaddr, for getting a hash value from an ethernet address.dlg2021-02-212-2/+16
|
* move from calling l3 protocol input handlers to using if_vinput.dlg2021-02-202-40/+14
| | | | | | if_vinput requires mpsafe interface counters, so add those in. this factors out some more code between drivers. monitor mode will work on these interfaces now too.
* move gre and mgre from calling l3 input handlers to using if_vinput.dlg2021-02-201-46/+11
| | | | | | | | using if_vinput factors out a lot of repeated code between tunnel drivers, and it means monitor mode works on gre and mgre now too. make the l2 gre interfaces do some things in the same order while here.
* move gif from calling l3 protocol input handlers to using if_vinput.dlg2021-02-201-25/+5
| | | | | | if_vinput requires mpsafe interface counters, so gif is a bit more mpsafe now than it was before. using if_vinput means monitor mode works on gif now too.
* add p2p_input, like ether_input but for l3 tunnel interfaces.dlg2021-02-202-2/+44
| | | | | | | | the l3 protocol input to push the packet is based on a value in m->m_pkthdr.ph_family, which tunnel drivers should set before calling if_vinput. add p2p_bpf_mtap to call bpf_mtap_af also using m->m_pkthdr.ph_family.
* let tun use bpf_mtap for handling input packets.dlg2021-02-201-1/+4
| | | | | | | tun (not tap) input packets are written from userland in the same format that it's bpf dlt is expecting, so we can push the packet straight into bpf with bpf_mtap. this is more correct that using bpf_mtap_ether for tun.
* default interfaces to bpf_mtap_ether for their if_bpf_mtap handler.dlg2021-02-202-4/+8
| | | | | call (*ifp->if_bpf_mtap) instead of bpf_mtap_ether in ifiq_input and if_vinput.
* give interfaces an if_bpf_mtap handler.dlg2021-02-201-1/+2
| | | | | | | | the network stack is now responsible for calling bpf for packets that the interface receives, and we so far got away with using bpf_mtap_ether for everything. this doesn't work if layer 3 input goes through the same functions, so letting drivers specify the appropriate bpf mtap function means they will be able to cope.
* add a MONITOR flag to ifaces to say they're only used for watching packets.dlg2021-02-203-8/+12
| | | | | | | | | | | an example use of this is when you have a span port on a switch and you want to be able to see the packets coming out of it with tcpdump, but do not want these packets to enter the network stack for processing. this is particularly important if the span port is pushing a copy of any packets related to the machine doing the monitoring as it will confuse pf states and the stack. ok benno@
* we dont need to wrap some short lines.dlg2021-02-191-5/+3
|
* check the state for PF_ROUTE when undeferring a packet, not the rule.dlg2021-02-191-2/+2
|
* use rtalloc_mpath in pf_route and pf_route6.dlg2021-02-161-3/+4
| | | | | | | if you have multiple links to the same destination, this will let you use them with route-to/reply-to/dup-to. ok claudio@
* Simplify error path in in route_attach(). We always call it in threadmvs2021-02-151-10/+4
| | | | | | | context so we always have `curproc' Also protocol control block is not required for soreserve() so we can do it before `rop' allocation. ok bluhm@
* pf_remove_divert_state() is an entry point into pf, modifying the pf statepatrick2021-02-121-1/+7
| | | | | | | table. Hence we have to grab both the pf lock and the pf state lock. Found by dlg@ ok bluhm@ sashan@
* Fix null pointer dereference in pf_route6(). Embedding scope intobluhm2021-02-121-3/+1
| | | | | | addresses that come from pf cannot be right, so remove the code. Coverity CID 1501718 OK dlg@ claudio@
* We link `ifp' to `if_list' before we perform if_attachsetup(). It is notmvs2021-02-111-3/+2
| | | | | | | | | fully initialized because we initialize `if_groups' after linking. It's not triggered because if_attach() and if_unit(9) are serialized by kernel lock and `ifp' is often filled by nulls. Move `if_groups' initialization to if_attach_common() to prevent this. ok bluhm@ claudio@ deraadt@
* Interface group names must fit into IFNAMSIZ and be unique. Butbluhm2021-02-101-3/+5
| | | | | | | | | | | | the kernel made the unique check before trunkating with strlcpy(). So there could be two interface groups with the same name. The kif is created by a name lookup. The trunkated names are equal, so there was only one kif owned by both groups. When the groups got destroyed, the single kif was removed twice from the RB tree. Check length of group name before doing the unique check. The empty group name was allowed and is now invalid. Reported-by: syzbot+f47e8296ebd559f9bbff@syzkaller.appspotmail.com OK deraadt@ gnezdo@ anton@ mvs@ claudio@
* Remove `sc_dead' logic from pppac(4). It is used to preventmvs2021-02-101-9/+3
| | | | | | | | | | pppac_ioctl() be called on dying pppac(4) interface. But now if_detach() makes dying `ifp' inaccessible and waits for references which are in-use in ioctl(2) path. This logic is not required anymore. Also if_detach() was moved before klist_invalidate() to prevent the case while pppac_qstart() bump `sc_rsel'. ok yasuoka@
* pfsync_state_import() must not be called with the pf state lock held,patrick2021-02-091-3/+1
| | | | | | | | | since the actual modification of the state table is done by a call to pf_state_insert(), which takes the pf state lock itself. Other calls to pfsync_state_import() also only have the pf lock. Reported-by: syzbot+d6ea8620b43dc69ecbc6@syzkaller.appspotmail.com ok bluhm@
* Activate use of PF_LOCK() by removing the WITH_PF_LOCK ifdefs.patrick2021-02-094-40/+4
| | | | | Silence from the network group ok sashan@
* Start refcounting interface groups with 1. if_creategroup() returnsbluhm2021-02-081-8/+13
| | | | | | | a new object that is already refcounted, so carp attach does not reach into internal structures. Add kasserts to detect counter overflow or underflow. OK mvs@
* Simplex interface sends packet back without hardware checksumbluhm2021-02-061-2/+6
| | | | | | | | offloading. The checksum must be calculated in software. Use the same condition in ether_resolve() to send the broadcast packet back to the stack and in in_ifcap_cksum() to force software checksumming. This fixes regress/sys/kern/sosplice/loop. OK procter@
* Fix whitespace.bluhm2021-02-051-3/+3
|
* make if_pfsync.c a better friend with PF_LOCKsashan2021-02-043-178/+383
| | | | | | | | | | | | The code delivered in this change is currently disabled. Brave souls may enable the code by adding -DWITH_PF_LOCK when building customized kernel. Big thanks goes to Hrvoje@ for providing test equipment and testing. As soon as we enter the next release cycle, the WITH_PF_LOCK will be defined as default option for MP kernels. OK dlg@
* change pf_route so pf only runs when packets enter and leave the stack.dlg2021-02-031-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | before this change pf_route operated on the semantic that pf runs when packets go over an interface, so when pf_route changed which interface the packet was on it would run pf_test again. this change changes (restores) the semantic that pf is only supposed to run when packets go in or out of the network stack, even if route-to is responsibly for short circuiting past the network stack. just to be clear, for normal packets (ie, those not touched by route-to/reply-to/dup-to), there isn't a difference between running pf when packets enter or leave the stack, or having pf run when a packet goes over an interface. the main reason for this change is that running the same packet through pf multiple times creates confusion for the state table. by default, pf states are floating, meaning that packets are matched to states regardless of which interface they're going over. if a packet leaving on em0 is rerouted out em1, both traversals will end up using the same state, which at best will make the accounting look weird, or at worst fail some checks in the state and get dropped. another reason for this commit is is to make handling of the changes that route-to makes consistent with other changes that are made to packet. eg, when nat is applied to a packet, we don't run pf_test again with the new addresses. the main caveat with this diff is you can't have one rule that pushes a packet out a different interface, and then have a rule on that second interface that NATs the packet. i'm not convinced this ever worked reliably or was used much anyway, so we don't think it's a big concern. discussed with many, with special thanks to bluhm@, sashan@ and sthen@ for weathering most of that pain. ok claudio@ sashan@ jmatthew@
* Netlock should be grabbed before pppx_if_find() call in pppxwrite().mvs2021-02-011-3/+5
| | | | | | | Otherwise this `pxi' can be killed by concurrent thread after context switch caused by following netlock. ok yasuoka@
* Remove dummy TUNSIFMODE ioctl(2) call from pppac(4) and npppd(8). Sincemvs2021-02-011-10/+1
| | | | | | OpenBSD 6.7 npppd(8) can't work over tun(4). ok yasuoka@
* ifunit() was fully replaced by if_unit(9) and should go away.mvs2021-02-012-20/+8
| | | | ok bluhm@ dlg@
* change route-to so it sends packets to IPs instead of interfaces.dlg2021-02-013-118/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this is a significant (and breaking) reworking of the policy based routing that pf can do. the intention is to make it as easy as nat/rdr to use, and more robust when it's operating. the main reasons for this change are: - route-to, reply-to, and dup-to do not work with pfsync this is because the information about where to route-to is stored in rules, and it is hard to have a ruleset synced between firewalls, and impossible to have them synced 100% of the time. - i can make my boxes panic in certain situations using route-to yeah... - the configuration and syntax for route-to rules are confusing. the argument to route-to and co is an interace name with an optional ip address. there are several problems with this. one is that people tend to think about routing as sending packets to peers by their address, not by the interface they're reachable on. another is that we currently have no way to synchronise interface topology information between firewalls, so using an interface to say where packets go means we can't do failover of these states with pfsync. another is that a change in routing topology means a host may become reachable over a different interface. tying routing policy to interfaces gets in the way of failover and load balancing. this change does the following: - stores the route info in the state instead of the pf rule this allows route-to to keep working when the ruleset changes, and allows route-to info to be sent over pfsync. there's enough spare bits in pfsync messages that the protocol doesnt break. the caveat is that route-to becomes tied to pass rules that create state, like rdr-to and nat-to. - the argument to route-to etc is a destination ip address it's not limited to a next-hop address (thought a next-hop can be a destination address). this allows for the failover and load balancing referred to above. - deprecates the address@interface host syntax in pfctl because routing is done entirely by IPs, the interface is derived from the route lookup, not pf. any attempt to use the @interface syntax will fail now in all contexts. there's enthusiasm from proctor@ jmatthew@ and others ok sashan@ bluhm@
* bridge(4): convert ifunit() to if_unit(9)mvs2021-01-282-16/+38
| | | | ok bluhm@ sashan@
* trunk(4): convert ifunit to if_unit(9)mvs2021-01-281-9/+21
| | | | ok bluhm@
* handle "once" rules before letting pfsync defer tx of a packet.dlg2021-01-281-15/+15
| | | | | | | | | | | | | pfsync may want to defer the transmission of a packet. it does this so it can try and get a state over to a peer firewall before a host may send a reply to the peer, which would get dropped cos there's no matching state. i think the once rule processing should happen before that. the state is created from the rule, whether the packet the state is for goes out immediately or not shouldn't matter. ok sashan@
* if the route resolved in pf_route is invalid, generate an icmp error.dlg2021-01-271-1/+10
| | | | | | of course this is limited to the !dup-to case. ok sashan@ bluhm@
* have pf_route{,6} clear the pf_pdesc mbuf ref early for route-to/reply-to.dlg2021-01-271-5/+3
| | | | | | | | | | | | | | | | pf_route and pf_route6 are called to take over delivery of the packet with route-to and reply-to instead of letting it get processed normally. for the dup-to handling, it copies the mbuf but leaves the original mbuf in place. pf_route takes over the packet by clearing the mbuf pointer in the pf_pdesc struct. this diff moves the clearing of that pointer to the start of the function, rather than checking for dup-to again on the way out of the function. i think this is better because it means that it's more robust in the face of future code changes. even if that's not true, it's still shorter code in a forwarding path. ok sashan@ jmatthew@
* don't run copies of packets made by dup-to through pf_test.dlg2021-01-271-3/+3
| | | | | | | | | | | | | | dup-to is kind of like what you do with a span port, but is a bit more fine grained. it copies packets in a connection out an interface so that connection can be monitored. it doesnt make sense for pf to see the copied packets and try to match or create new states for them either. at best it needs config to stop pf seeing the copies (eg, set skip on $dup_to_tgt_if). at worst it breaks the connections you're monitoring because the states in pf get confused. found while discussing larger route-to changes on tech@. ok bluhm@ sashan@
* We have this sequence in bridge(4) ioctl(2) path:mvs2021-01-254-81/+54
| | | | | | | | | | | | | | | | | | | | ifs = ifunit(req->ifbr_ifsname); if (ifs == NULL) { error = ENOENT; break; } if (ifs->if_bridgeidx != ifp->if_index) { error = ESRCH; break; } bif = bridge_getbif(ifs); This sequence repeats 8 times. Also we don't check value returned by bridge_getbig() before use. Newly introduced bridge_getbig() function replaces this sequence. This not only reduces duplicated code but also makes `bif' dereference safe. ok bluhm@
* Fix wg(4) ioctl to be able to handle multiple wgpeers.yasuoka2021-01-251-5/+10
| | | | | | Diff from Yuichiro NAITO. ok procter
* vlan(4): convert ifunit() to if_unit(9)mvs2021-01-211-9/+15
| | | | ok dlg@ kn@
* let vfs keep track of nonblocking state for us.dlg2021-01-212-9/+4
| | | | ok claudio@ mvs@
* An invalid packet may not have set src and dst in packet descriptor.bluhm2021-01-201-7/+9
| | | | | | Add a NULL check to prevent crash in pflog(4) introduced in previous commit. Reported-by: syzbot+c6d2f2ad34b822bce98a@syzkaller.appspotmail.com
* Print rewritten addresses in tcpdump(8) logged with pflog(4) forbluhm2021-01-201-3/+10
| | | | | | | | rdr-to, nat-to, af-to rules. The kernel uses the information from the packet description and fills it into the fields in the pflog header. While doing this, it is trival to figure out whether the packet has been rewritten. OK sashan@
* pflog(4) tried to log the translated packet with rdr-to, nat-to,bluhm2021-01-192-149/+5
| | | | | | | | | | | | | | and af-to addresses and ports applied. Therefore it created a mbuf chain on the stack with a partial copy. This is too complicated for IP options, extension header, NAT46 af-to, and fragmented mbuf chains. It even caused a crash in syzkaller. Usually the length checks in pf_setup_pdesc() rejected the faked mbuf and the goto copy logged the packet unmodified. Remove the pflog_mtap() function and call bpf_mtap_hdr() directly. As the old buggy code was bypassed in most cases, tcpdump(8) output of pflog does not change. Uncondionally log the unmodified packet. Reported-by: syzbot+947e89e06ac3fec187d0@syzkaller.appspotmail.com OK sashan@
* pipex(4): convert ifunit() to if_unit(9)mvs2021-01-191-2/+5
| | | | ok dlg@
* switch(4): convert ifunit to if_unit(9)mvs2021-01-192-21/+38
| | | | ok dlg@
* pppoe(4): convert ifunit() to if_unit(9)mvs2021-01-191-2/+4
| | | | ok dlg@ kn@
* pipex(4): convert ifunit() to if_unit(9)mvs2021-01-191-5/+12
| | | | ok dlg@
* gre(4): convert ifunit() to if_unit(9)mvs2021-01-191-3/+6
| | | | ok dlg@