path: root/sys/net
Commit message (Author | Age | Files | Lines)
...
* Start refcounting interface groups with 1. if_creategroup() returns a new object that is already refcounted, so carp attach does not reach into internal structures. Add kasserts to detect counter overflow or underflow. OK mvs@
  (bluhm | 2021-02-08 | 1 | -8/+13)
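A minimal sketch of the counting rule this commit describes, assuming a simplified userland model: the constructor hands back an object whose count is already 1, and assert(3) stands in for the kernel's KASSERT checks on overflow and underflow. The struct and function names (`struct group`, `group_create()`, `group_ref()`, `group_unref()`) are hypothetical and are not the actual if_creategroup() code.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an interface group. */
struct group {
	char		name[16];
	unsigned int	refcnt;
};

/* Create returns an object that already holds one reference. */
struct group *
group_create(const char *name)
{
	struct group *g = calloc(1, sizeof(*g));

	if (g == NULL)
		return NULL;
	/* calloc zeroed the buffer, so the name stays NUL-terminated. */
	strncpy(g->name, name, sizeof(g->name) - 1);
	g->refcnt = 1;			/* the caller owns this reference */
	return g;
}

void
group_ref(struct group *g)
{
	assert(g->refcnt != 0);		/* never resurrect a dead object */
	assert(g->refcnt != UINT_MAX);	/* detect counter overflow */
	g->refcnt++;
}

/* Returns 1 if this call dropped the last reference and freed g. */
int
group_unref(struct group *g)
{
	assert(g->refcnt != 0);		/* detect counter underflow */
	if (--g->refcnt == 0) {
		free(g);
		return 1;
	}
	return 0;
}
```

Starting at 1 means a caller such as carp attach just keeps the returned reference instead of incrementing a field inside the object itself.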
* Simplex interface sends packet back without hardware checksum offloading. The checksum must be calculated in software. Use the same condition in ether_resolve() to send the broadcast packet back to the stack and in in_ifcap_cksum() to force software checksumming. This fixes regress/sys/kern/sosplice/loop. OK procter@
  (bluhm | 2021-02-06 | 1 | -2/+6)
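Calculating the checksum in software means computing the ones'-complement Internet checksum on the CPU. The following is a generic RFC 1071 sketch, not OpenBSD's in_cksum() or in_ifcap_cksum(); the name `in_cksum_sw()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: ones'-complement Internet checksum (RFC 1071)
 * over a byte buffer, as a software fallback when hardware
 * checksum offloading cannot be used.
 */
uint16_t
in_cksum_sw(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;

	/* Sum the data as 16-bit big-endian words. */
	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	/* A trailing odd byte is padded with a zero byte. */
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;
	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Appending the computed checksum to the data and summing again yields 0, which is how a receiver verifies it.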
* Fix whitespace.
  (bluhm | 2021-02-05 | 1 | -3/+3)
* make if_pfsync.c a better friend with PF_LOCK
  The code delivered in this change is currently disabled. Brave souls may enable the code by adding -DWITH_PF_LOCK when building a customized kernel. Big thanks go to Hrvoje@ for providing test equipment and testing. As soon as we enter the next release cycle, WITH_PF_LOCK will be defined as the default option for MP kernels. OK dlg@
  (sashan | 2021-02-04 | 3 | -178/+383)
* change pf_route so pf only runs when packets enter and leave the stack.
  before this change pf_route operated on the semantic that pf runs when packets go over an interface, so when pf_route changed which interface the packet was on it would run pf_test again. this change changes (restores) the semantic that pf is only supposed to run when packets go in or out of the network stack, even if route-to is responsible for short circuiting past the network stack. just to be clear, for normal packets (ie, those not touched by route-to/reply-to/dup-to), there isn't a difference between running pf when packets enter or leave the stack, or having pf run when a packet goes over an interface.
  the main reason for this change is that running the same packet through pf multiple times creates confusion for the state table. by default, pf states are floating, meaning that packets are matched to states regardless of which interface they're going over. if a packet leaving on em0 is rerouted out em1, both traversals will end up using the same state, which at best will make the accounting look weird, or at worst fail some checks in the state and get dropped.
  another reason for this commit is to make handling of the changes that route-to makes consistent with other changes that are made to packets. eg, when nat is applied to a packet, we don't run pf_test again with the new addresses.
  the main caveat with this diff is you can't have one rule that pushes a packet out a different interface, and then have a rule on that second interface that NATs the packet. i'm not convinced this ever worked reliably or was used much anyway, so we don't think it's a big concern.
  discussed with many, with special thanks to bluhm@, sashan@ and sthen@ for weathering most of that pain. ok claudio@ sashan@ jmatthew@
  (dlg | 2021-02-03 | 1 | -3/+3)
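To illustrate why running pf twice confuses floating states, here is a toy model, assuming a state keyed only on addresses and ports (no interface), so a rerouted packet that is filtered once per interface is counted twice. All names are hypothetical; real pf state lookup is far more involved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical floating state: the key is addresses and ports only,
 * with no interface, so it matches on whichever interface the
 * packet is currently crossing.
 */
struct state {
	uint32_t	src, dst;
	uint16_t	sport, dport;
	unsigned long	packets;	/* per-state packet counter */
};

/* Toy stand-in for one pf_test() pass: count the packet on a match. */
static void
filter_pass(struct state *st, uint32_t src, uint32_t dst,
    uint16_t sport, uint16_t dport)
{
	if (st->src == src && st->dst == dst &&
	    st->sport == sport && st->dport == dport)
		st->packets++;
}

/*
 * per_interface != 0 models the old semantic (filter on every
 * interface the packet crosses); 0 models the restored one (filter
 * once as the packet leaves the stack).
 */
void
send_rerouted(struct state *st, int per_interface)
{
	filter_pass(st, 1, 2, 1000, 80);		/* leaving via em0 */
	if (per_interface)
		filter_pass(st, 1, 2, 1000, 80);	/* rerouted out em1 */
}
```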
* Netlock should be grabbed before the pppx_if_find() call in pppxwrite().
  Otherwise this `pxi' can be killed by a concurrent thread after the context switch caused by the following netlock. ok yasuoka@
  (mvs | 2021-02-01 | 1 | -3/+5)
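The fix is an ordering rule: take the lock, then look the object up, then use it, then drop the lock. A minimal userland sketch of that pattern, assuming a pthread mutex as a stand-in for the kernel net lock and a hypothetical `struct session` list in place of the pppx interface lookup:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical registry of sessions keyed by an integer id. */
struct session {
	int		 id;
	unsigned long	 opackets;
	struct session	*next;
};

static pthread_mutex_t	 reg_lock = PTHREAD_MUTEX_INITIALIZER;
static struct session	*reg_head;

/* Caller must hold reg_lock. */
static struct session *
session_find_locked(int id)
{
	struct session *s;

	for (s = reg_head; s != NULL; s = s->next)
		if (s->id == id)
			return s;
	return NULL;
}

void
session_insert(struct session *s)
{
	pthread_mutex_lock(&reg_lock);
	s->next = reg_head;
	reg_head = s;
	pthread_mutex_unlock(&reg_lock);
}

/*
 * Take the lock *before* the lookup, so no other thread can unlink
 * and free the session between finding it and using it.
 */
int
session_write(int id)
{
	struct session *s;
	int error = 0;

	pthread_mutex_lock(&reg_lock);
	s = session_find_locked(id);
	if (s == NULL)
		error = -1;
	else
		s->opackets++;	/* s cannot go away while we hold the lock */
	pthread_mutex_unlock(&reg_lock);
	return error;
}
```

Doing the lookup first and locking afterwards would leave a window in which the pointer can go stale, which is exactly what the commit closes.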
* Remove dummy TUNSIFMODE ioctl(2) call from pppac(4) and npppd(8). Since OpenBSD 6.7 npppd(8) can't work over tun(4). ok yasuoka@
  (mvs | 2021-02-01 | 1 | -10/+1)
* ifunit() was fully replaced by if_unit(9) and should go away.
  ok bluhm@ dlg@
  (mvs | 2021-02-01 | 2 | -20/+8)
* change route-to so it sends packets to IPs instead of interfaces.
  this is a significant (and breaking) reworking of the policy based routing that pf can do. the intention is to make it as easy as nat/rdr to use, and more robust when it's operating.
  the main reasons for this change are:
  - route-to, reply-to, and dup-to do not work with pfsync
    this is because the information about where to route-to is stored in rules, and it is hard to have a ruleset synced between firewalls, and impossible to have them synced 100% of the time.
  - i can make my boxes panic in certain situations using route-to
    yeah...
  - the configuration and syntax for route-to rules are confusing.
    the argument to route-to and co is an interface name with an optional ip address. there are several problems with this. one is that people tend to think about routing as sending packets to peers by their address, not by the interface they're reachable on. another is that we currently have no way to synchronise interface topology information between firewalls, so using an interface to say where packets go means we can't do failover of these states with pfsync. another is that a change in routing topology means a host may become reachable over a different interface. tying routing policy to interfaces gets in the way of failover and load balancing.
  this change does the following:
  - stores the route info in the state instead of the pf rule
    this allows route-to to keep working when the ruleset changes, and allows route-to info to be sent over pfsync. there are enough spare bits in pfsync messages that the protocol doesn't break. the caveat is that route-to becomes tied to pass rules that create state, like rdr-to and nat-to.
  - the argument to route-to etc is a destination ip address
    it's not limited to a next-hop address (though a next-hop can be a destination address). this allows for the failover and load balancing referred to above.
  - deprecates the address@interface host syntax in pfctl
    because routing is done entirely by IPs, the interface is derived from the route lookup, not pf. any attempt to use the @interface syntax will fail now in all contexts.
  there's enthusiasm from procter@ jmatthew@ and others
  ok sashan@ bluhm@
  (dlg | 2021-02-01 | 3 | -118/+82)
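A minimal sketch of the new model, under the assumption of a toy IPv4 routing table: the state keeps only a destination address, and the outgoing interface falls out of a longest-prefix-match lookup at send time. `struct route`, `route_lookup()` and `struct state_rt` are hypothetical; this is not pf's or the kernel's route lookup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal routing table entry. */
struct route {
	uint32_t	 dst;		/* network, host byte order */
	int		 plen;		/* prefix length, 0..32 */
	const char	*ifname;	/* outgoing interface */
};

static uint32_t
prefix_mask(int plen)
{
	/* Shifting a 32-bit value by 32 is undefined, so special-case /0. */
	return plen == 0 ? 0 : 0xffffffffU << (32 - plen);
}

/* Longest-prefix match: the interface is a result of the lookup. */
const char *
route_lookup(const struct route *rt, size_t n, uint32_t addr)
{
	const struct route *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t m = prefix_mask(rt[i].plen);

		if ((addr & m) == (rt[i].dst & m) &&
		    (best == NULL || rt[i].plen > best->plen))
			best = &rt[i];
	}
	return best != NULL ? best->ifname : NULL;
}

/*
 * Hypothetical state: it stores only the route-to destination IP,
 * never an interface, so it can be synced to a peer unchanged.
 */
struct state_rt {
	uint32_t	rt_addr;
};

const char *
state_route_ifname(const struct state_rt *st, const struct route *rt,
    size_t n)
{
	return route_lookup(rt, n, st->rt_addr);
}
```

If the routing table changes, the same state resolves to the new interface without being rewritten, which is what lets it fail over to a peer with a different interface layout.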
* bridge(4): convert ifunit() to if_unit(9)
  ok bluhm@ sashan@
  (mvs | 2021-01-28 | 2 | -16/+38)
* trunk(4): convert ifunit() to if_unit(9)
  ok bluhm@
  (mvs | 2021-01-28 | 1 | -9/+21)
* handle "once" rules before letting pfsync defer tx of a packet.
  pfsync may want to defer the transmission of a packet. it does this so it can try and get a state over to a peer firewall before a host may send a reply to the peer, which would get dropped cos there's no matching state. i think the once rule processing should happen before that. the state is created from the rule; whether the packet the state is for goes out immediately or not shouldn't matter. ok sashan@
  (dlg | 2021-01-28 | 1 | -15/+15)
* if the route resolved in pf_route is invalid, generate an icmp error.
  of course this is limited to the !dup-to case. ok sashan@ bluhm@
  (dlg | 2021-01-27 | 1 | -1/+10)
* have pf_route{,6} clear the pf_pdesc mbuf ref early for route-to/reply-to.
  pf_route and pf_route6 are called to take over delivery of the packet with route-to and reply-to instead of letting it get processed normally. for the dup-to handling, it copies the mbuf but leaves the original mbuf in place. pf_route takes over the packet by clearing the mbuf pointer in the pf_pdesc struct. this diff moves the clearing of that pointer to the start of the function, rather than checking for dup-to again on the way out of the function. i think this is better because it makes the code more robust in the face of future code changes. even if that's not true, it's still shorter code in a forwarding path. ok sashan@ jmatthew@
  (dlg | 2021-01-27 | 1 | -5/+3)
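The ownership-transfer idea can be sketched in a few lines of C, assuming hypothetical `struct pkt` and `struct pdesc` types in place of the real mbuf and pf_pdesc: the non-dup path steals the pointer before doing anything else, so every return path, including early error exits, leaves the descriptor empty, while the dup path works on a copy and leaves the original with the caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an mbuf and a packet descriptor. */
struct pkt { int len; };
struct pdesc { struct pkt *m; };

static int delivered;

static void
deliver(struct pkt *p)
{
	delivered++;
	free(p);
}

/*
 * Take over the packet: steal the pointer from the descriptor first,
 * so the caller sees pd->m == NULL whichever way we return.
 */
void
route_takeover(struct pdesc *pd, int dup)
{
	struct pkt *p;

	if (dup) {
		p = malloc(sizeof(*p));
		if (p == NULL)
			return;
		*p = *pd->m;		/* copy; the original stays with the caller */
	} else {
		p = pd->m;
		pd->m = NULL;		/* ownership moves to us immediately */
	}

	if (p->len <= 0) {		/* early error exit needs no pd cleanup */
		free(p);
		return;
	}
	deliver(p);
}
```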
* don't run copies of packets made by dup-to through pf_test.
  dup-to is kind of like what you do with a span port, but is a bit more fine grained. it copies packets in a connection out an interface so that connection can be monitored. it doesn't make sense for pf to see the copied packets and try to match or create new states for them either. at best it needs config to stop pf seeing the copies (eg, set skip on $dup_to_tgt_if). at worst it breaks the connections you're monitoring because the states in pf get confused. found while discussing larger route-to changes on tech@. ok bluhm@ sashan@
  (dlg | 2021-01-27 | 1 | -3/+3)
* We have this sequence in the bridge(4) ioctl(2) path:
      ifs = ifunit(req->ifbr_ifsname);
      if (ifs == NULL) {
          error = ENOENT;
          break;
      }
      if (ifs->if_bridgeidx != ifp->if_index) {
          error = ESRCH;
          break;
      }
      bif = bridge_getbif(ifs);
  This sequence repeats 8 times. Also we don't check the value returned by bridge_getbif() before use. Newly introduced bridge_getbig() function replaces this sequence. This not only reduces duplicated code but also makes `bif' dereference safe. ok bluhm@
  (mvs | 2021-01-25 | 4 | -81/+54)
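A sketch of folding the repeated lookup-and-check sequence into one helper, assuming simplified hypothetical types and a hypothetical `bridge_member()` name; the commit's own helper is not reproduced here. Returning an errno and passing the member state out through a pointer forces every caller through the same checks. Returning ESRCH when the member state is missing is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified interface record. */
struct iface {
	const char	*name;
	unsigned int	 index;
	unsigned int	 bridgeidx;	/* index of the bridge we belong to */
	void		*bridgeport;	/* per-member bridge state, may be NULL */
};

static struct iface *ifs_table[8];	/* hypothetical interface list */

static struct iface *
lookup_ifname(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(ifs_table) / sizeof(ifs_table[0]); i++)
		if (ifs_table[i] != NULL &&
		    strcmp(ifs_table[i]->name, name) == 0)
			return ifs_table[i];
	return NULL;
}

/*
 * One helper instead of eight copies: look the member up by name,
 * check that it belongs to this bridge, and hand back its state only
 * after a NULL check, or return an errno.
 */
int
bridge_member(const struct iface *br, const char *name, void **bifp)
{
	struct iface *ifs;

	ifs = lookup_ifname(name);
	if (ifs == NULL)
		return ENOENT;
	if (ifs->bridgeidx != br->index)
		return ESRCH;
	if (ifs->bridgeport == NULL)	/* assumption: treat as not a member */
		return ESRCH;
	*bifp = ifs->bridgeport;
	return 0;
}
```

Each ioctl case then shrinks to `error = bridge_member(...); if (error) break;`, and no caller can dereference an unchecked member pointer.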
* Fix wg(4) ioctl to be able to handle multiple wgpeers.
  Diff from Yuichiro NAITO. ok procter
  (yasuoka | 2021-01-25 | 1 | -5/+10)
* vlan(4): convert ifunit() to if_unit(9)
  ok dlg@ kn@
  (mvs | 2021-01-21 | 1 | -9/+15)
* let vfs keep track of nonblocking state for us.
  ok claudio@ mvs@
  (dlg | 2021-01-21 | 2 | -9/+4)
* An invalid packet may not have set src and dst in the packet descriptor.
  Add a NULL check to prevent a crash in pflog(4) introduced in the previous commit. Reported-by: syzbot+c6d2f2ad34b822bce98a@syzkaller.appspotmail.com
  (bluhm | 2021-01-20 | 1 | -7/+9)
* Print rewritten addresses in tcpdump(8) logged with pflog(4) for rdr-to, nat-to, af-to rules. The kernel uses the information from the packet description and fills it into the fields in the pflog header. While doing this, it is trivial to figure out whether the packet has been rewritten. OK sashan@
  (bluhm | 2021-01-20 | 1 | -3/+10)
* pflog(4) tried to log the translated packet with rdr-to, nat-to, and af-to addresses and ports applied. Therefore it created an mbuf chain on the stack with a partial copy. This is too complicated for IP options, extension headers, NAT46 af-to, and fragmented mbuf chains. It even caused a crash in syzkaller. Usually the length checks in pf_setup_pdesc() rejected the faked mbuf and the goto copy logged the packet unmodified. Remove the pflog_mtap() function and call bpf_mtap_hdr() directly. As the old buggy code was bypassed in most cases, tcpdump(8) output of pflog does not change. Unconditionally log the unmodified packet. Reported-by: syzbot+947e89e06ac3fec187d0@syzkaller.appspotmail.com OK sashan@
  (bluhm | 2021-01-19 | 2 | -149/+5)
* pipex(4): convert ifunit() to if_unit(9)
  ok dlg@
  (mvs | 2021-01-19 | 1 | -2/+5)
* switch(4): convert ifunit() to if_unit(9)
  ok dlg@
  (mvs | 2021-01-19 | 2 | -21/+38)
* pppoe(4): convert ifunit() to if_unit(9)
  ok dlg@ kn@
  (mvs | 2021-01-19 | 1 | -2/+4)
* pipex(4): convert ifunit() to if_unit(9)
  ok dlg@
  (mvs | 2021-01-19 | 1 | -5/+12)
* gre(4): convert ifunit() to if_unit(9)
  ok dlg@
  (mvs | 2021-01-19 | 1 | -3/+6)
* tpmr(4): convert ifunit() to if_unit(9)
  ok dlg@
  (mvs | 2021-01-19 | 1 | -11/+7)
* bpe(4): convert ifunit() to if_unit(9)
  ok dlg@
  (mvs | 2021-01-19 | 1 | -8/+15)
* aggr(4): convert ifunit() to if_unit(9)
  ok dlg@
  (mvs | 2021-01-19 | 1 | -16/+21)
* Convert ifunit() to if_unit(9).
  ok sashan@
  (mvs | 2021-01-18 | 1 | -2/+5)
* Introduce new function if_unit(9). This function returns a pointer to the interface descriptor corresponding to the unique name. This descriptor is guaranteed to be valid until if_put(9) is called on the returned pointer. if_unit(9) should replace the already existing ifunit(), which returns a descriptor that is not safe to dereference after a context switch. This allows us to avoid some use-after-free issues in the ioctl(2) path. Also this unifies interface descriptor usage. ok claudio@ sashan@
  (mvs | 2021-01-18 | 2 | -19/+55)
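The contract described here (lookup by name returns a referenced pointer that stays usable until it is released) can be sketched in userland as follows. `struct iface`, `iface_unit()` and `iface_put()` are hypothetical stand-ins for if_unit(9) and if_put(9); the sketch assumes the detach path, which is not shown, waits for the count to reach zero before freeing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface object with a reference count. */
struct iface {
	char		 name[16];
	unsigned int	 refs;		/* protected by list_lock */
	struct iface	*next;
};

static pthread_mutex_t	 list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct iface	*list_head;

/*
 * Look up by unique name and take a reference while still holding
 * the list lock, so the object cannot be freed between the lookup
 * and the increment. The caller may then sleep safely.
 */
struct iface *
iface_unit(const char *name)
{
	struct iface *ifp;

	pthread_mutex_lock(&list_lock);
	for (ifp = list_head; ifp != NULL; ifp = ifp->next) {
		if (strcmp(ifp->name, name) == 0) {
			ifp->refs++;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	return ifp;			/* NULL if no such interface */
}

/* Drop the reference taken by iface_unit(); NULL is allowed. */
void
iface_put(struct iface *ifp)
{
	if (ifp == NULL)
		return;
	pthread_mutex_lock(&list_lock);
	assert(ifp->refs > 0);
	ifp->refs--;
	pthread_mutex_unlock(&list_lock);
}
```

Accepting NULL in the put routine lets callers release unconditionally on every exit path, which keeps ioctl handlers simple.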
* don't encode the mbuf prio as part of the vlan tag in bpf_mtap_ether.
  the vlan tag we're injecting into the mbuf chain is either straight off the wire and therefore already has the vlan priority encoded, or is straight after it's been set up by vlan(4), which also has the prio already encoded. ok kn@ visa@ mvs@
  (dlg | 2021-01-17 | 1 | -8/+2)
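For reference, the 802.1Q Tag Control Information packs a 3-bit priority code point (PCP), a 1-bit drop eligible indicator (DEI) and a 12-bit VLAN ID into 16 bits, so a tag taken off the wire already carries its priority. A small sketch of that layout; the helper names are hypothetical and say nothing about how vlan(4) maps mbuf priorities to PCP values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * 802.1Q TCI layout: PCP in bits 15-13, DEI in bit 12,
 * VLAN ID in bits 11-0. Helper names are hypothetical.
 */
#define TCI_PCP_SHIFT	13
#define TCI_DEI		0x1000
#define TCI_VID_MASK	0x0fff

static inline uint16_t
tci_make(unsigned int pcp, int dei, unsigned int vid)
{
	return (uint16_t)(((pcp & 0x7) << TCI_PCP_SHIFT) |
	    (dei ? TCI_DEI : 0) | (vid & TCI_VID_MASK));
}

static inline unsigned int
tci_pcp(uint16_t tci)
{
	return tci >> TCI_PCP_SHIFT;
}

static inline unsigned int
tci_vid(uint16_t tci)
{
	return tci & TCI_VID_MASK;
}
```

Because the priority already lives in the top three bits, passing the tag through unchanged preserves it; OR-ing in another priority would corrupt it.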
* The sysctl variable net.inet.ip.forwarding is checked before ip_input() passes the packet to ip_forward(). But with an af-to rule, pf(4) calls ip_forward() directly. Check the forwarding sysctl also in pf to get consistent behavior. This requires setting both ip and ip6 forwarding to get packet flow in both directions over af-to rules. OK kn@
  (bluhm | 2021-01-16 | 1 | -7/+19)
* Remove a check that bypasses pf state tests. It dates back to 2003 when NAT was implemented differently. Now it does not seem to make sense anymore. sashan@ has identified cases where it does harm. dlg@ wants to remove it to simplify route-to code. from dlg@; OK sashan@
  (bluhm | 2021-01-15 | 1 | -7/+1)
* Fix build without carp: ifp0 is only used within #if NCARP > 0.
  ok kn mvs
  (tb | 2021-01-14 | 1 | -2/+7)
* Link pflog(4) instances to the `pflog_ifs' list instead of allocating the `pflogifs' array. This was done to prevent panics caused by the internal malloc(9) limit. Also we avoid the case where a single pflog(4) interface with a high index allocates an array for all indices below it and eats up kernel memory. Since there are very few pflog(4) interfaces, linear search has no performance impact. ok bluhm@ claudio@ kn@
  (mvs | 2021-01-13 | 2 | -52/+31)
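A sketch of the data-structure change, assuming a hypothetical `struct logif` in place of the pflog softc: instances live on a linked list and are found by linear search, so creating unit 1000 allocates one node rather than an array with 1001 slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical log interface: only existing units are allocated. */
struct logif {
	int		 unit;
	struct logif	*next;
};

static struct logif *logifs;	/* list head instead of a unit-indexed array */

struct logif *
logif_create(int unit)
{
	struct logif *l = malloc(sizeof(*l));

	if (l == NULL)
		return NULL;
	l->unit = unit;
	l->next = logifs;
	logifs = l;
	return l;
}

/* Linear search; cheap because there are only a handful of units. */
struct logif *
logif_lookup(int unit)
{
	struct logif *l;

	for (l = logifs; l != NULL; l = l->next)
		if (l->unit == unit)
			return l;
	return NULL;
}

/* Unlink and free one instance; the other nodes are untouched. */
void
logif_destroy(struct logif *target)
{
	struct logif **lp;

	for (lp = &logifs; *lp != NULL; lp = &(*lp)->next) {
		if (*lp == target) {
			*lp = target->next;
			free(target);
			return;
		}
	}
}
```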
* Send without kernel lock
  The output path can run without kernel lock just fine as is. Looking at the CVS log, it seems this was not done during import because IFXF_MPSAFE only became a thing afterwards. OK mvs
  (kn | 2021-01-13 | 1 | -10/+7)
* Sometimes a user ID was logged in pflog(4) although the logopt of the rule did not specify it. Check the option again for the log rule in case another rule has triggered a socket lookup. Remove logopt group; it is not documented and cannot work as struct pfloghdr does not contain a gid. Rename PF_LOG_SOCKET_LOOKUP to PF_LOG_USER to express what it does. The lookup involved is only an implementation detail. OK kn@ sashan@ mvs@
  (bluhm | 2021-01-12 | 2 | -5/+5)
* Remove unused start routine
  pflog(4) does not send or generate packets by design. OK mvs sashan
  (kn | 2021-01-11 | 1 | -12/+1)
* Enforce range with sysctl_int_bounded in etherip_sysctl
  OK millert@
  (gnezdo | 2021-01-09 | 1 | -2/+3)
* Enforce range with sysctl_int_bounded in pipex_sysctl
  OK millert@
  (gnezdo | 2021-01-09 | 1 | -3/+3)
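The idea behind a bounded sysctl handler, as used in the two commits above, is to validate a new value against a fixed range before storing it. A minimal sketch with a hypothetical `set_int_bounded()`; it does not reproduce the real sysctl_int_bounded() signature, which also deals with copying the old and new values between kernel and userland.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical bounded integer setter: reject values outside
 * [minval, maxval] and leave the stored value untouched.
 */
int
set_int_bounded(int *valp, int newval, int minval, int maxval)
{
	if (newval < minval || newval > maxval)
		return EINVAL;
	*valp = newval;
	return 0;
}
```

For a boolean knob such as an "allow" toggle, the bounds would be 0 and 1, so any other value is refused instead of being silently stored.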
* Syzkaller has found a stack overflow in socket splicing. Broadcast packets were resent through simplex broadcast delivery and socket splicing. Although there is an M_LOOP check in somove(9), it did not take effect. if_input_local() cleared the M_BCAST and M_MCAST flags with m_resethdr(). As if_input_local() is used for broadcast and multicast delivery, it was a mistake to delete them. Keep the M_BCAST and M_MCAST mbuf flags when packets are reinjected into the network stack. Reported-by: syzbot+a43ace363f1b663238f8@syzkaller.appspotmail.com OK anton@; discussed with claudio@
  (bluhm | 2021-01-09 | 1 | -2/+5)
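A sketch of the flag handling, assuming hypothetical flag bits and a simplified header struct rather than the real mbuf definitions: the header is reset as before, but the broadcast and multicast markers are saved first and restored afterwards, so code further up the stack can still tell the packet arrived as broadcast or multicast.

```c
#include <assert.h>

/* Hypothetical flag bits; the real mbuf flag values differ. */
#define PKT_HDR		0x0001
#define PKT_BCAST	0x0100
#define PKT_MCAST	0x0200

/* Hypothetical, simplified packet header. */
struct pkthdr {
	int	flags;
	int	rcvif;		/* receiving interface index */
	int	csum_flags;	/* checksum offload state */
};

/* Reset per-hop header state but keep the link-level cast markers. */
void
pkt_reset_keep_cast(struct pkthdr *h)
{
	int keep = h->flags & (PKT_BCAST | PKT_MCAST);

	/* Wipe the header the way a full reset would... */
	h->flags = PKT_HDR;
	h->rcvif = 0;
	h->csum_flags = 0;
	/* ...then put back the broadcast/multicast bits. */
	h->flags |= keep;
}
```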
* don't check local carp addresses as part of the antispoof checks.
  bridge(4) drops packets coming from somewhere else that have a source MAC address that's owned by one of the interfaces that's a member of the bridge. because this check was done with bridge_ourether, it included the addresses of active carp interfaces hanging off these member interfaces. this meant if the local machine is the carp master while another machine is trying to preempt it by sending hellos, the packets from the other machine were dropped because the local one is already the master. carp roles are supposed to move around an l2 network, so another host sending a packet with a carp mac address is actually normal and necessary. found by and fix tested by stsp@ ok stsp@ claudio@
  (dlg | 2021-01-08 | 1 | -2/+3)
* pppoeintr() is no more
  (kn | 2021-01-05 | 1 | -2/+1)
* Process pppoe(4) packets directly, do not queue through netisr
  Less scheduling, lock contention and queues. Previously, if_netisr() handled the net lock around those calls; now if_input_process() does it before calling ether_input(), so no need to add or remove NET_*LOCK() anywhere. OK mvs claudio
  (kn | 2021-01-04 | 5 | -41/+11)
* Remove kernel lock from pppoe(4) input path
  "struct pppoe_softc" documents no member being protected by the kernel lock (alone); further review of the code paths starting from pppoeintr() shows no sleeping points which must be avoided in the softnet thread. Everything is fine as is to run without the big lock, so remove it. Tests sthen Feedback mpi mvs OK mvs claudio
  (kn | 2021-01-04 | 1 | -3/+1)
* Minor refactoring in pf(4). Note that struct pfsync_state is no longer memcopied but assigned. Alignment should not be an issue as it is __packed. Part of a larger diff from dlg@; OK dlg@ sashan@
  (bluhm | 2021-01-04 | 2 | -35/+19)
* Remove unused `pipex_iface_context' struct.
  ok ok@ yasuoka@
  (mvs | 2021-01-04 | 2 | -18/+2)
* Don't call if_deactivate() in switch_clone_destroy(). The following if_detach() will do this. ok kn@
  (mvs | 2021-01-02 | 1 | -2/+1)