| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
ok gnezdo@ semarie@ mpi@
|
|
|
|
|
|
|
|
|
|
| |
simplify the handling of the fragment list. Now the functions
ip_fragment() and ip6_fragment() always consume the mbuf. They
free the mbuf and mbuf list in case of an error and take care about
the counter. Adjust the code a bit to make v4 and v6 look similar.
Fixes a potential mbuf leak when pf_route6() called pf_refragment6()
and it failed. Now the mbuf is always freed by ip6_fragment().
OK dlg@ mvs@
|
|
|
|
| |
ok deraadt@ dlg@
|
|
|
|
|
|
|
| |
if you have multiple links to the same destination, this will let
you use them with route-to/reply-to/dup-to.
ok claudio@
|
|
|
|
|
|
|
| |
table. Hence we have to grab both the pf lock and the pf state lock.
Found by dlg@
ok bluhm@ sashan@
|
|
|
|
|
|
| |
addresses that come from pf cannot be right, so remove the code.
Coverity CID 1501718
OK dlg@ claudio@
|
|
|
|
|
|
|
|
|
|
|
|
| |
The code delivered in this change is currently disabled. Brave souls
may enable the code by adding -DWITH_PF_LOCK when building customized
kernel. Big thanks goes to Hrvoje@ for providing test equipment and
testing.
As soon as we enter the next release cycle, the WITH_PF_LOCK will be
defined as default option for MP kernels.
OK dlg@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
before this change pf_route operated on the semantic that pf runs
when packets go over an interface, so when pf_route changed which
interface the packet was on it would run pf_test again. this change
changes (restores) the semantic that pf is only supposed to run
when packets go in or out of the network stack, even if route-to
is responsibly for short circuiting past the network stack.
just to be clear, for normal packets (ie, those not touched by
route-to/reply-to/dup-to), there isn't a difference between running
pf when packets enter or leave the stack, or having pf run when a
packet goes over an interface.
the main reason for this change is that running the same packet
through pf multiple times creates confusion for the state table.
by default, pf states are floating, meaning that packets are matched
to states regardless of which interface they're going over. if a
packet leaving on em0 is rerouted out em1, both traversals will end
up using the same state, which at best will make the accounting
look weird, or at worst fail some checks in the state and get
dropped.
another reason for this commit is is to make handling of the changes
that route-to makes consistent with other changes that are made to
packet. eg, when nat is applied to a packet, we don't run pf_test
again with the new addresses.
the main caveat with this diff is you can't have one rule that
pushes a packet out a different interface, and then have a rule on
that second interface that NATs the packet. i'm not convinced this
ever worked reliably or was used much anyway, so we don't think
it's a big concern.
discussed with many, with special thanks to bluhm@, sashan@ and
sthen@ for weathering most of that pain.
ok claudio@ sashan@ jmatthew@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this is a significant (and breaking) reworking of the policy based
routing that pf can do. the intention is to make it as easy as
nat/rdr to use, and more robust when it's operating.
the main reasons for this change are:
- route-to, reply-to, and dup-to do not work with pfsync
this is because the information about where to route-to is stored in
rules, and it is hard to have a ruleset synced between firewalls,
and impossible to have them synced 100% of the time.
- i can make my boxes panic in certain situations using route-to
yeah...
- the configuration and syntax for route-to rules are confusing.
the argument to route-to and co is an interace name with an optional
ip address. there are several problems with this. one is that people
tend to think about routing as sending packets to peers by their
address, not by the interface they're reachable on. another is that
we currently have no way to synchronise interface topology information
between firewalls, so using an interface to say where packets go
means we can't do failover of these states with pfsync. another
is that a change in routing topology means a host may become
reachable over a different interface. tying routing policy to
interfaces gets in the way of failover and load balancing.
this change does the following:
- stores the route info in the state instead of the pf rule
this allows route-to to keep working when the ruleset changes, and
allows route-to info to be sent over pfsync. there's enough spare bits
in pfsync messages that the protocol doesnt break.
the caveat is that route-to becomes tied to pass rules that create
state, like rdr-to and nat-to.
- the argument to route-to etc is a destination ip address
it's not limited to a next-hop address (thought a next-hop can be a
destination address). this allows for the failover and load balancing
referred to above.
- deprecates the address@interface host syntax in pfctl
because routing is done entirely by IPs, the interface is derived from
the route lookup, not pf. any attempt to use the @interface syntax
will fail now in all contexts.
there's enthusiasm from proctor@ jmatthew@ and others
ok sashan@ bluhm@
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pfsync may want to defer the transmission of a packet. it does this so
it can try and get a state over to a peer firewall before a host may
send a reply to the peer, which would get dropped cos there's no
matching state.
i think the once rule processing should happen before that. the state
is created from the rule, whether the packet the state is for goes out
immediately or not shouldn't matter.
ok sashan@
|
|
|
|
|
|
| |
of course this is limited to the !dup-to case.
ok sashan@ bluhm@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pf_route and pf_route6 are called to take over delivery of the
packet with route-to and reply-to instead of letting it get processed
normally. for the dup-to handling, it copies the mbuf but leaves
the original mbuf in place. pf_route takes over the packet by
clearing the mbuf pointer in the pf_pdesc struct. this diff moves
the clearing of that pointer to the start of the function, rather
than checking for dup-to again on the way out of the function.
i think this is better because it means that it's more robust in
the face of future code changes. even if that's not true, it's still
shorter code in a forwarding path.
ok sashan@ jmatthew@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dup-to is kind of like what you do with a span port, but is a bit
more fine grained. it copies packets in a connection out an interface
so that connection can be monitored. it doesnt make sense for pf
to see the copied packets and try to match or create new states for
them either. at best it needs config to stop pf seeing the copies
(eg, set skip on $dup_to_tgt_if). at worst it breaks the connections
you're monitoring because the states in pf get confused.
found while discussing larger route-to changes on tech@.
ok bluhm@ sashan@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and af-to addresses and ports applied. Therefore it created a mbuf
chain on the stack with a partial copy. This is too complicated
for IP options, extension header, NAT46 af-to, and fragmented mbuf
chains. It even caused a crash in syzkaller. Usually the length
checks in pf_setup_pdesc() rejected the faked mbuf and the goto
copy logged the packet unmodified. Remove the pflog_mtap() function
and call bpf_mtap_hdr() directly. As the old buggy code was bypassed
in most cases, tcpdump(8) output of pflog does not change.
Uncondionally log the unmodified packet.
Reported-by: syzbot+947e89e06ac3fec187d0@syzkaller.appspotmail.com
OK sashan@
|
|
|
|
|
|
|
|
|
| |
ip_input() passes the packet to ip_forward(). But with an af-to
rule, pf(4) calls ip_forward() directly. Check the forwarding
sysctl also in pf to get consistent behavior. This requires to set
both ip and ip6 forwarding to get packet flow in both directions
over af-to rules.
OK kn@
|
|
|
|
|
|
|
| |
when NAT was implemented differently. Now it does not seem to make
sense anymore. sashan@ has identified cases where it does harm.
dlg@ wants to remove it to simplify route-to code.
from dlg@; OK sashan@
|
|
|
|
| |
ok kn mvs
|
|
|
|
|
|
| |
longer memcopied but assigned. Alignment should not be an issue
as it is __packed.
Part of a larger diff from dlg@; OK dlg@ sashan@
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this "fixes" TCP going over an interface with fq codel enabled. the
way the codel code classifies a packet without a flowid set is to
randomly assign it to a bucket. this in turn means that packets
will get reordered, and tcp hates that.
sthen was able to find a test case and narrow down at which time
the problem appeared, helped greatly.
tested by sthen@ and millert@
ok sashan@ jmatthew@
|
|
|
|
|
|
| |
issue noticed by sthen@. fix discussed with bluhm@ and procter@
OK bluhm@, kn@, procter@
|
|
|
|
| |
ok sashan@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
time_second(9) and time_uptime(9) are widely used in the kernel to
quickly get the system UTC or system uptime as a time_t. However,
time_t is 64-bit everywhere, so it is not generally safe to use them
on 32-bit platforms: you have a split-read problem if your hardware
cannot perform atomic 64-bit reads.
This patch replaces time_second(9) with gettime(9), a safer successor
interface, throughout the kernel. Similarly, time_uptime(9) is replaced
with getuptime(9).
There is a performance cost on 32-bit platforms in exchange for
eliminating the split-read problem: instead of two register reads you
now have a lockless read loop to pull the values from the timehands.
This is really not *too* bad in the grand scheme of things, but
compared to what we were doing before it is several times slower.
There is no performance cost on 64-bit (__LP64__) platforms.
With input from visa@, dlg@, and tedu@.
Several bugs squashed by visa@.
ok kettenis@
|
|
|
|
|
| |
i've been wanting to do this for a while, and now that we've got
stoeplitz and it gives us 16 bits, it seems like the right time.
|
|
|
|
|
| |
in and the pf_pktdelay struct ws not declared and initialzed properly.
ok rob@ kn@
|
|
|
|
|
| |
This is clearer and more consistent with the rest of the kernel.
OK deraadt@ sashan@
|
|
|
|
| |
ok bluhm@
|
|
|
|
|
|
| |
(bug reported and fix tested by Kor)
ok kn@
|
|
|
|
| |
ok lteo@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. If a packet happens to match an expired once rule before the rule is removed
by the purge thread, the rule will be added to the pf_rule_gcl list again,
eventually causing a kernel crash when the purge thread tries to remove the
expired rule multiple times; and
2. A packet that matches an expired once rule will still cause a state to be
created, so a once rule is not truly a one shot rule while it is in that
expired-but-not-purged time window.
To fix both bugs, add a check in pf_test_rule() to prevent expired once rules
from being added to pf_rule_gcl. The check is added "early" in pf_test_rule()
to prevent any new connections from creating state if they match the expired
once rule.
This commit also includes a tweak by sashan@ to ensure that only one PF task
will mark a once rule as expired. Here is sashan@'s commentary:
"As soon as there will be more PF tasks running in parallel, we would be
able to hit similar crash you are fixing now. The rules are covered by
read lock, so with more PF tasks there might be two packets racing
to expire at once rule at the same time. Using atomic_cas() is sufficient
measure to serialize competing packets."
tested by abieber@ who reported the kernel crash on bugs@
ok sashan@
|
|
|
|
| |
ok yasuoka
|
|
|
|
|
|
|
| |
Src-node should use the reference counter since it might live longer
than its table entry, rule or the associated states.
OK sashan
|
|
|
|
|
|
|
| |
hop interface configured with "route-to" was not used. Keep the
interface within the pf_src_node and use it when the record is used.
OK sashan
|
|
|
|
|
|
| |
there are states which refer it.
OK sashan
|
|
|
|
|
|
|
|
|
|
|
| |
packet in their payload that matches an exiting connection. It was
not checked whether the outer ICMP packet has the same destination
IP as the source IP of the inner protocol packet. Enforce that
these addresses match, to prevent ICMP packets that do not make
sense.
Issue found by Nicolas Collignon, Corentin Bayet, Eloi Vanderbeken,
Luca Moro at Synacktiv.com
OK sashan@
|
|
|
|
| |
OK kn@, florian@, visa@, cheloha@
|
|
|
|
|
|
| |
These are just unhelpful case conversion.
OK sashan henning
|
|
|
|
| |
dir make debugging much easier than the if alone.
|
|
|
|
|
|
|
|
|
| |
Regression has been introduced in version 1.1024 (a 6.2 time frame).
It's been discovered and reported by Fabian Mueller-Knapp. Fair amount
of credit goes to kn@, benno@ and henning@ for pointing me to releveant
section of pf.conf(5). Fabian and kn@ also did test the patch.
OK kn@, henning@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When evaluating the anchor's ruleset, prevent clobbering it's very own
`quick' test result by blindly setting it.
This makes the following pf.conf work as intended (packets would be blocked
since `quick' had no effect):
anchor quick {
pass
}
block
Broken since after 6.1 release as reported by Fabian Mueller-Knapp, thanks!
OK henning sashan
|
|
|
|
|
|
| |
start locking the socket. An inp can be referenced by the PCB queue
and hashes, by a pf mbuf header, or by a pf state key.
OK visa@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
this change adds a pf_state_lock rw-lock, which protects consistency
of state table in PF. The code delivered in this change is guarded
by 'WITH_PF_LOCK', which is still undefined. People, who are willing
to experiment and want to run it must do two things:
- compile kernel with -DWITH_PF_LOCK
- bump NET_TASKQ from 1 to ... sky is the limit,
(just select some sensible value for number of tasks your
system is able to handle)
OK bluhm@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to the missing "void", this
extern void pf_purge_expired_src_nodes();
is no prototype but a declaration. It is enough to suppress the
'implicit declaration' warning but it does not allow the compiler to
check the arguments passed to the calls of the function.
Fix the prototypes and don't pass the waslocked argument anymore. It has
been removed a year ago.
ok sashan henning
|
|
|
|
| |
OK mpi@, OK henning@, OK jca@
|
|
|
|
|
| |
when we moved most of the functionality into a function. g/c the macro
and just call the function. ok mpi jca
|
|
|
|
|
|
|
|
|
|
| |
the state was created on this host, i. e. not for those pfsync-imported.
whether pfsync-imported states should be accounted is a seperate discussion,
but as things are, we only increment the counter in pf_create_state(), and
imported states don't excercise that path.
probably fixes the half-open states accounting underflow-wraparounds that
some people have been seeing.
ok sashan
|
|
|
|
|
|
| |
by pf in the packet header. pf_delay_pkt reads the delay value from the packet
header, schedules a timeout and re-queues the packet when the timeout fires.
ok benno sashan
|
|
|
|
|
|
|
| |
- MSS and WSCALE option candidates must now meet their min type length.
- 'max-mss' is now more tolerant of malformed option lists.
These changes were immaterial to the live traffic I've examined.
OK sashan@ mpi@
|
|
|
|
|
|
|
|
|
|
| |
pf_state_insert(), so the state key has not been set. When inlining,
the compiler recognized the NULL pointer dereference in
s->key[PF_SK_STACK]->proto and optimized it away. But if pf.c was
compiled with -fno-inline, the system crashed during boot. Add a
NULL check in pf_set_protostate() to handle the situation when the
function is called.
OK sashan@ henning@
|
|
|
|
|
| |
goto unlock when leaving this block.
OK sashan@ henning@
|
|
|
|
|
|
|
|
| |
to the network. This is necessary for locally generated icmp packets
that would be dropped otherwise. Refine this check to modify only
the source address of packets that go to the external network. This
allows route-to tricks on loopback interface.
OK sashan@
|