summaryrefslogtreecommitdiffstats
path: root/sys/net/pf.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* spellingjsg2021-03-101-2/+2
| | | | ok gnezdo@ semarie@ mpi@
* Refactor ip_fragment() and ip6_fragment(). Use a mbuf list tobluhm2021-03-011-17/+13
| | | | | | | | | | simplify the handling of the fragment list. Now the functions ip_fragment() and ip6_fragment() always consume the mbuf. They free the mbuf and mbuf list in case of an error and take care about the counter. Adjust the code a bit to make v4 and v6 look similar. Fixes a potential mbuf leak when pf_route6() called pf_refragment6() and it failed. Now the mbuf is always freed by ip6_fragment(). OK dlg@ mvs@
* Use NULL instead of 0 in `m_nextpkt' assignment.mvs2021-02-231-2/+2
| | | | ok deraadt@ dlg@
* use rtalloc_mpath in pf_route and pf_route6.dlg2021-02-161-3/+4
| | | | | | | if you have multiple links to the same destination, this will let you use them with route-to/reply-to/dup-to. ok claudio@
* pf_remove_divert_state() is an entry point into pf, modifying the pf statepatrick2021-02-121-1/+7
| | | | | | | table. Hence we have to grab both the pf lock and the pf state lock. Found by dlg@ ok bluhm@ sashan@
* Fix null pointer dereference in pf_route6(). Embedding scope intobluhm2021-02-121-3/+1
| | | | | | addresses that come from pf cannot be right, so remove the code. Coverity CID 1501718 OK dlg@ claudio@
* make if_pfsync.c a better friend with PF_LOCKsashan2021-02-041-4/+4
| | | | | | | | | | | | The code delivered in this change is currently disabled. Brave souls may enable the code by adding -DWITH_PF_LOCK when building customized kernel. Big thanks goes to Hrvoje@ for providing test equipment and testing. As soon as we enter the next release cycle, the WITH_PF_LOCK will be defined as default option for MP kernels. OK dlg@
* change pf_route so pf only runs when packets enter and leave the stack.dlg2021-02-031-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | before this change pf_route operated on the semantic that pf runs when packets go over an interface, so when pf_route changed which interface the packet was on it would run pf_test again. this change changes (restores) the semantic that pf is only supposed to run when packets go in or out of the network stack, even if route-to is responsibly for short circuiting past the network stack. just to be clear, for normal packets (ie, those not touched by route-to/reply-to/dup-to), there isn't a difference between running pf when packets enter or leave the stack, or having pf run when a packet goes over an interface. the main reason for this change is that running the same packet through pf multiple times creates confusion for the state table. by default, pf states are floating, meaning that packets are matched to states regardless of which interface they're going over. if a packet leaving on em0 is rerouted out em1, both traversals will end up using the same state, which at best will make the accounting look weird, or at worst fail some checks in the state and get dropped. another reason for this commit is is to make handling of the changes that route-to makes consistent with other changes that are made to packet. eg, when nat is applied to a packet, we don't run pf_test again with the new addresses. the main caveat with this diff is you can't have one rule that pushes a packet out a different interface, and then have a rule on that second interface that NATs the packet. i'm not convinced this ever worked reliably or was used much anyway, so we don't think it's a big concern. discussed with many, with special thanks to bluhm@, sashan@ and sthen@ for weathering most of that pain. ok claudio@ sashan@ jmatthew@
* change route-to so it sends packets to IPs instead of interfaces.dlg2021-02-011-108/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this is a significant (and breaking) reworking of the policy based routing that pf can do. the intention is to make it as easy as nat/rdr to use, and more robust when it's operating. the main reasons for this change are: - route-to, reply-to, and dup-to do not work with pfsync this is because the information about where to route-to is stored in rules, and it is hard to have a ruleset synced between firewalls, and impossible to have them synced 100% of the time. - i can make my boxes panic in certain situations using route-to yeah... - the configuration and syntax for route-to rules are confusing. the argument to route-to and co is an interace name with an optional ip address. there are several problems with this. one is that people tend to think about routing as sending packets to peers by their address, not by the interface they're reachable on. another is that we currently have no way to synchronise interface topology information between firewalls, so using an interface to say where packets go means we can't do failover of these states with pfsync. another is that a change in routing topology means a host may become reachable over a different interface. tying routing policy to interfaces gets in the way of failover and load balancing. this change does the following: - stores the route info in the state instead of the pf rule this allows route-to to keep working when the ruleset changes, and allows route-to info to be sent over pfsync. there's enough spare bits in pfsync messages that the protocol doesnt break. the caveat is that route-to becomes tied to pass rules that create state, like rdr-to and nat-to. - the argument to route-to etc is a destination ip address it's not limited to a next-hop address (thought a next-hop can be a destination address). this allows for the failover and load balancing referred to above. - deprecates the address@interface host syntax in pfctl because routing is done entirely by IPs, the interface is derived from the route lookup, not pf. any attempt to use the @interface syntax will fail now in all contexts. there's enthusiasm from proctor@ jmatthew@ and others ok sashan@ bluhm@
* handle "once" rules before letting pfsync defer tx of a packet.dlg2021-01-281-15/+15
| | | | | | | | | | | | | pfsync may want to defer the transmission of a packet. it does this so it can try and get a state over to a peer firewall before a host may send a reply to the peer, which would get dropped cos there's no matching state. i think the once rule processing should happen before that. the state is created from the rule, whether the packet the state is for goes out immediately or not shouldn't matter. ok sashan@
* if the route resolved in pf_route is invalid, generate an icmp error.dlg2021-01-271-1/+10
| | | | | | of course this is limited to the !dup-to case. ok sashan@ bluhm@
* have pf_route{,6} clear the pf_pdesc mbuf ref early for route-to/reply-to.dlg2021-01-271-5/+3
| | | | | | | | | | | | | | | | pf_route and pf_route6 are called to take over delivery of the packet with route-to and reply-to instead of letting it get processed normally. for the dup-to handling, it copies the mbuf but leaves the original mbuf in place. pf_route takes over the packet by clearing the mbuf pointer in the pf_pdesc struct. this diff moves the clearing of that pointer to the start of the function, rather than checking for dup-to again on the way out of the function. i think this is better because it means that it's more robust in the face of future code changes. even if that's not true, it's still shorter code in a forwarding path. ok sashan@ jmatthew@
* don't run copies of packets made by dup-to through pf_test.dlg2021-01-271-3/+3
| | | | | | | | | | | | | | dup-to is kind of like what you do with a span port, but is a bit more fine grained. it copies packets in a connection out an interface so that connection can be monitored. it doesnt make sense for pf to see the copied packets and try to match or create new states for them either. at best it needs config to stop pf seeing the copies (eg, set skip on $dup_to_tgt_if). at worst it breaks the connections you're monitoring because the states in pf get confused. found while discussing larger route-to changes on tech@. ok bluhm@ sashan@
* pflog(4) tried to log the translated packet with rdr-to, nat-to,bluhm2021-01-191-10/+3
| | | | | | | | | | | | | | and af-to addresses and ports applied. Therefore it created a mbuf chain on the stack with a partial copy. This is too complicated for IP options, extension header, NAT46 af-to, and fragmented mbuf chains. It even caused a crash in syzkaller. Usually the length checks in pf_setup_pdesc() rejected the faked mbuf and the goto copy logged the packet unmodified. Remove the pflog_mtap() function and call bpf_mtap_hdr() directly. As the old buggy code was bypassed in most cases, tcpdump(8) output of pflog does not change. Uncondionally log the unmodified packet. Reported-by: syzbot+947e89e06ac3fec187d0@syzkaller.appspotmail.com OK sashan@
* The sysctl variable net.inet.ip.forwarding is checked beforebluhm2021-01-161-7/+19
| | | | | | | | | ip_input() passes the packet to ip_forward(). But with an af-to rule, pf(4) calls ip_forward() directly. Check the forwarding sysctl also in pf to get consistent behavior. This requires to set both ip and ip6 forwarding to get packet flow in both directions over af-to rules. OK kn@
* Remove a check that bypasses pf state tests. It dates back to 2003bluhm2021-01-151-7/+1
| | | | | | | when NAT was implemented differently. Now it does not seem to make sense anymore. sashan@ has identified cases where it does harm. dlg@ wants to remove it to simplify route-to code. from dlg@; OK sashan@
* Fix build without carp: ifp0 is only used within #if NCARP > 0.tb2021-01-141-2/+7
| | | | ok kn mvs
* Minor refactoring in pf(4). Note that struct pfsync_state is nobluhm2021-01-041-18/+4
| | | | | | longer memcopied but assigned. Alignment should not be an issue as it is __packed. Part of a larger diff from dlg@; OK dlg@ sashan@
* when setting a flowid, set the M_FLOWID csum_flags bit too.dlg2020-12-101-2/+4
| | | | | | | | | | | | | this "fixes" TCP going over an interface with fq codel enabled. the way the codel code classifies a packet without a flowid set is to randomly assign it to a bucket. this in turn means that packets will get reordered, and tcp hates that. sthen was able to find a test case and narrow down at which time the problem appeared, helped greatly. tested by sthen@ and millert@ ok sashan@ jmatthew@
* synproxy should be processing incoming SYN packets only.sashan2020-12-071-2/+2
| | | | | | issue noticed by sthen@. fix discussed with bluhm@ and procter@ OK bluhm@, kn@, procter@
* Use interface index instead of pointer to `ifnet' in carp(4).mvs2020-07-241-9/+15
| | | | ok sashan@
* kernel: use gettime(9)/getuptime(9) in lieu of time_second(9)/time_uptime(9)cheloha2020-06-241-20/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | time_second(9) and time_uptime(9) are widely used in the kernel to quickly get the system UTC or system uptime as a time_t. However, time_t is 64-bit everywhere, so it is not generally safe to use them on 32-bit platforms: you have a split-read problem if your hardware cannot perform atomic 64-bit reads. This patch replaces time_second(9) with gettime(9), a safer successor interface, throughout the kernel. Similarly, time_uptime(9) is replaced with getuptime(9). There is a performance cost on 32-bit platforms in exchange for eliminating the split-read problem: instead of two register reads you now have a lockless read loop to pull the values from the timehands. This is really not *too* bad in the grand scheme of things, but compared to what we were doing before it is several times slower. There is no performance cost on 64-bit (__LP64__) platforms. With input from visa@, dlg@, and tedu@. Several bugs squashed by visa@. ok kettenis@
* make ph_flowid in mbufs 16bits by storing whether it's set in csum_flags.dlg2020-06-171-5/+3
| | | | | i've been wanting to do this for a while, and now that we've got stoeplitz and it gives us 16 bits, it seems like the right time.
* "set delay" never worked as committed: the delay field was not copiedotto2019-11-171-3/+3
| | | | | in and the pf_pktdelay struct ws not declared and initialzed properly. ok rob@ kn@
* Use -1 to indicate an invalid uid/gid, not UID_MAX and GID_MAX.millert2019-10-171-7/+7
| | | | | This is clearer and more consistent with the rest of the kernel. OK deraadt@ sashan@
* pf_state_insert() must grab state lock exclusivelysashan2019-08-291-2/+9
| | | | ok bluhm@
* pf.conf "set timeout interval 1" causes kernel crashsashan2019-08-261-2/+5
| | | | | | (bug reported and fix tested by Kor) ok kn@
* follow up to 'once rule' expirationsashan2019-07-181-3/+4
| | | | ok lteo@
* This commit fixes two bugs involving PF once rules:lteo2019-07-181-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. If a packet happens to match an expired once rule before the rule is removed by the purge thread, the rule will be added to the pf_rule_gcl list again, eventually causing a kernel crash when the purge thread tries to remove the expired rule multiple times; and 2. A packet that matches an expired once rule will still cause a state to be created, so a once rule is not truly a one shot rule while it is in that expired-but-not-purged time window. To fix both bugs, add a check in pf_test_rule() to prevent expired once rules from being added to pf_rule_gcl. The check is added "early" in pf_test_rule() to prevent any new connections from creating state if they match the expired once rule. This commit also includes a tweak by sashan@ to ensure that only one PF task will mark a once rule as expired. Here is sashan@'s commentary: "As soon as there will be more PF tasks running in parallel, we would be able to hit similar crash you are fixing now. The rules are covered by read lock, so with more PF tasks there might be two packets racing to expire at once rule at the same time. Using atomic_cas() is sufficient measure to serialize competing packets." tested by abieber@ who reported the kernel crash on bugs@ ok sashan@
* fix NULL pointer dereference, reported and fix tested by sthensashan2019-07-111-3/+5
| | | | ok yasuoka
* Fix previous commit which made src-node have a reference for the kif.yasuoka2019-07-091-1/+3
| | | | | | | Src-node should use the reference counter since it might live longer than its table entry, rule or the associated states. OK sashan
* When source address tracking record is used for "route-to", the nextyasuoka2019-07-021-3/+4
| | | | | | | hop interface configured with "route-to" was not used. Keep the interface within the pf_src_node and use it when the record is used. OK sashan
* Link the state and the source track to keep the source track whileyasuoka2019-07-011-10/+9
| | | | | | there are states which refer it. OK sashan
* States in pf(4) let ICMP and ICMP6 packets pass if they have abluhm2019-03-201-4/+24
| | | | | | | | | | | packet in their payload that matches an exiting connection. It was not checked whether the outer ICMP packet has the same destination IP as the source IP of the inner protocol packet. Enforce that these addresses match, to prevent ICMP packets that do not make sense. Issue found by Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken, Luca Moro at Synacktiv.com OK sashan@
* Use timeout_add_sec() instead of timeout_add() with a multiplication with hzclaudio2018-12-171-2/+2
| | | | OK kn@, florian@, visa@, cheloha@
* Remove useless macroskn2018-12-101-32/+34
| | | | | | These are just unhelpful case conversion. OK sashan henning
* in the "pf: key search" debug message, add the direction. interface *and*henning2018-11-151-2/+3
| | | | dir make debugging much easier than the if alone.
* - pf: honor quick on anchor rulessashan2018-10-161-4/+5
| | | | | | | | | Regression has been introduced in version 1.1024 (a 6.2 time frame). It's been discovered and reported by Fabian Mueller-Knapp. Fair amount of credit goes to kn@, benno@ and henning@ for pointing me to releveant section of pf.conf(5). Fabian and kn@ also did test the patch. OK kn@, henning@
* Honor quick on anchor ruleskn2018-10-041-1/+7
| | | | | | | | | | | | | | | | | When evaluating the anchor's ruleset, prevent clobbering it's very own `quick' test result by blindly setting it. This makes the following pf.conf work as intended (packets would be blocked since `quick' had no effect): anchor quick { pass } block Broken since after 6.1 release as reported by Fabian Mueller-Knapp, thanks! OK henning sashan
* Add reference counting for inet pcb, this will be needed when webluhm2018-09-131-7/+27
| | | | | | start locking the socket. An inp can be referenced by the PCB queue and hashes, by a pf mbuf header, or by a pf state key. OK visa@
* - moving state look up outside of PF_LOCK()sashan2018-09-111-33/+117
| | | | | | | | | | | | | | this change adds a pf_state_lock rw-lock, which protects consistency of state table in PF. The code delivered in this change is guarded by 'WITH_PF_LOCK', which is still undefined. People, who are willing to experiment and want to run it must do two things: - compile kernel with -DWITH_PF_LOCK - bump NET_TASKQ from 1 to ... sky is the limit, (just select some sensible value for number of tasks your system is able to handle) OK bluhm@
* Fix arguments of pf_purge_expired_{src_nodes,rules}()sf2018-07-221-2/+2
| | | | | | | | | | | | | | | Due to the missing "void", this extern void pf_purge_expired_src_nodes(); is no prototype but a declaration. It is enough to suppress the 'implicit declaration' warning but it does not allow the compiler to check the arguments passed to the calls of the function. Fix the prototypes and don't pass the waslocked argument anymore. It has been removed a year ago. ok sashan henning
* trade few 'goto unlock: for 'break' in pf_test()sashan2018-07-121-6/+5
| | | | OK mpi@, OK henning@, OK jca@
* the STATE_LOOKUP macro made sense ages ago. It stopped making sensehenning2018-07-111-56/+69
| | | | | when we moved most of the functionality into a function. g/c the macro and just call the function. ok mpi jca
* in pf_set_protostate(), only decrement the half-open states counter whenhenning2018-07-101-2/+2
| | | | | | | | | | the state was created on this host, i. e. not for those pfsync-imported. whether pfsync-imported states should be accounted is a seperate discussion, but as things are, we only increment the counter in pf_create_state(), and imported states don't excercise that path. probably fixes the half-open states accounting underflow-wraparounds that some people have been seeing. ok sashan
* provide a generic packet delay functionality. packets to be delayed are markedhenning2018-07-101-3/+43
| | | | | | by pf in the packet header. pf_delay_pkt reads the delay value from the packet header, schedules a timeout and re-queues the packet when the timeout fires. ok benno sashan
* Refactor the six ways to find TCP options into one new function. As a result:procter2018-06-181-106/+91
| | | | | | | - MSS and WSCALE option candidates must now meet their min type length. - 'max-mss' is now more tolerant of malformed option lists. These changes were immaterial to the live traffic I've examined. OK sashan@ mpi@
* The function pf_create_state() calls pf_set_protostate() beforebluhm2018-06-041-2/+3
| | | | | | | | | | pf_state_insert(), so the state key has not been set. When inlining, the compiler recognized the NULL pointer dereference in s->key[PF_SK_STACK]->proto and optimized it away. But if pf.c was compiled with -fno-inline, the system crashed during boot. Add a NULL check in pf_set_protostate() to handle the situation when the function is called. OK sashan@ henning@
* While sending the pf syncookie, we are holding the pf lock. Sobluhm2018-06-011-2/+2
| | | | | goto unlock when leaving this block. OK sashan@ henning@
* pf route-to should not send packets from 127.0.0.1 or ::1 addressbluhm2018-05-101-3/+5
| | | | | | | | to the network. This is necessary for locally generated icmp packets that would be dropped otherwise. Refine this check to modify only the source address of packets that go to the external network. This allows route-to tricks on loopback interface. OK sashan@