| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were multiple places where a rwlock was used despite never
rlocking, so just change these into mtxs. This was done with the aid of
Coccinelle's spatch, using this input:
#spatch -j 4 --recursive-includes --include-headers-for-types --include-headers --in-place --macro-file <seebelow.h>
virtual after_start
@initialize:ocaml@
@@
let has_write_table = Hashtbl.create 101
let has_read_table = Hashtbl.create 101
let ok i m =
let entry = (i,m) in
Hashtbl.mem has_write_table entry && not(Hashtbl.mem has_read_table entry)
@hasw depends on !after_start@
identifier i,m;
struct i x;
@@
(
rw_wlock(&x.m)
|
rw_wunlock(&x.m)
)
@script:ocaml@
i << hasw.i;
m << hasw.m;
@@
Hashtbl.replace has_write_table (i,m) ()
@hasr depends on !after_start@
identifier i,m;
struct i x;
@@
(
rw_rlock(&x.m)
|
rw_runlock(&x.m)
)
@script:ocaml@
i << hasr.i;
m << hasr.m;
@@
Hashtbl.replace has_read_table (i,m) ()
@finalize:ocaml depends on !after_start@
wt << merge.has_write_table;
rt << merge.has_read_table;
@@
let redo ts dst =
List.iter (Hashtbl.iter (fun k _ -> Hashtbl.add dst k ())) ts in
redo wt has_write_table;
redo rt has_read_table;
let it = new iteration() in
it#add_virtual_rule After_start;
it#register()
(* ----------------------------------------------------------- *)
@depends on after_start@
identifier i;
identifier m : script:ocaml(i) { ok i m };
@@
struct i {
...
- struct rwlock m;
+ struct mtx m;
...
}
@depends on after_start disable fld_to_ptr@
identifier m;
identifier i : script:ocaml(m) { ok i m };
struct i x;
@@
- rw_wlock
+ mtx_lock
(&x.m)
@depends on after_start disable fld_to_ptr@
identifier m;
identifier i : script:ocaml(m) { ok i m };
struct i x;
@@
- rw_wunlock
+ mtx_unlock
(&x.m)
@depends on after_start disable fld_to_ptr@
identifier m;
expression e;
identifier i : script:ocaml(m) { ok i m };
struct i x;
@@
- rw_init(&x.m, e);
+ mtx_init(&x.m, e, NULL, MTX_DEF);
@depends on after_start disable fld_to_ptr@
identifier m;
identifier i : script:ocaml(m) { ok i m };
struct i x;
@@
- rw_destroy
+ mtx_destroy
(&x.m)
@depends on after_start disable fld_to_ptr, ptr_to_array@
identifier m;
identifier i : script:ocaml(m) { ok i m };
struct i *x;
@@
- rw_wlock
+ mtx_lock
(&x->m)
@depends on after_start disable fld_to_ptr, ptr_to_array@
identifier m;
identifier i : script:ocaml(m) { ok i m };
struct i *x;
@@
- rw_wunlock
+ mtx_unlock
(&x->m)
@depends on after_start disable fld_to_ptr, ptr_to_array@
identifier m;
expression e;
identifier i : script:ocaml(m) { ok i m };
struct i *x;
@@
- rw_init(&x->m, e);
+ mtx_init(&x->m, e, NULL, MTX_DEF);
@depends on after_start disable fld_to_ptr, ptr_to_array@
identifier m;
identifier i : script:ocaml(m) { ok i m };
struct i *x;
@@
- rw_destroy
+ mtx_destroy
(&x->m)
A few macros needed to be provided manually for the parser to work:
#define LIST_HEAD(x,y) int
#define TAILQ_HEAD(x,y) int
#define STAILQ_HEAD(x,y) int
#define CK_LIST_HEAD(x,y) int
#define CK_LIST_ENTRY(x) int
#define LIST_ENTRY(x) int
#define TAILQ_ENTRY(x) int
#define STAILQ_ENTRY(x) int
Co-authored-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
| |
Before, most uses of rwlock and mtx never called the destroy method,
which might cause problems for witness.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
| |
This was added to 12.1 in a security fix, but wasn't really wired up
properly, so this effectively disables it from packet resetting, which
is a bummer, but it's more preferable than hacking this in bad ways.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Hopefully bad tests will cause the module to not insert, so the CI picks
this up. It looks like a failure to insert the module at the moment
actually causes another crash, though:
Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex if_cloners lock (if_cloners lock) r = 0 (0xffffffff81d9a9b8) locked @ /usr/src/sys/net/if_clone.c:447
stack backtrace:
#0 0xffffffff80c66181 at witness_debugger+0x71
#1 0xffffffff80c6729d at witness_warn+0x40d
#2 0xffffffff8109499e at trap_pfault+0x7e
#3 0xffffffff81093fab at trap+0x2ab
#4 0xffffffff810687f8 at calltrap+0x8
#5 0xffffffff82925610 at wg_module_event_handler+0x120
#6 0xffffffff80bd53c3 at module_register_init+0xd3
#7 0xffffffff80bc5c61 at linker_load_module+0xc01
#8 0xffffffff80bc73b9 at kern_kldload+0xe9
#9 0xffffffff80bc74db at sys_kldload+0x5b
#10 0xffffffff810952f7 at amd64_syscall+0x147
#11 0xffffffff8106911e at fast_syscall_common+0xf8
Fatal trap 12: page fault while in kernel mode
cpuid = 9; apic id = 09
fault virtual address = 0x70
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d18e37
stack pointer = 0x28:0xfffffe0115fb35a0
frame pointer = 0x28:0xfffffe0115fb35c0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1587 (kldload)
trap number = 12
panic: page fault
cpuid = 9
time = 1621380034
KDB: stack backtrace:
#0 0xffffffff80c44695 at kdb_backtrace+0x65
#1 0xffffffff80bf9d01 at vpanic+0x181
#2 0xffffffff80bf9ad3 at panic+0x43
#3 0xffffffff81094917 at trap_fatal+0x387
#4 0xffffffff810949b7 at trap_pfault+0x97
#5 0xffffffff81093fab at trap+0x2ab
#6 0xffffffff810687f8 at calltrap+0x8
#7 0xffffffff82925610 at wg_module_event_handler+0x120
#8 0xffffffff80bd53c3 at module_register_init+0xd3
#9 0xffffffff80bc5c61 at linker_load_module+0xc01
#10 0xffffffff80bc73b9 at kern_kldload+0xe9
#11 0xffffffff80bc74db at sys_kldload+0x5b
#12 0xffffffff810952f7 at amd64_syscall+0x147
#13 0xffffffff8106911e at fast_syscall_common+0xf8
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes:
#12 0xffffffff80f20105 in uma_zalloc_arg (zone=<optimized out>,
udata=<optimized out>, udata@entry=0x0, flags=flags@entry=257)
at /usr/src/sys/vm/uma_core.c:3420
#13 0xffffffff82922844 in uma_zalloc (zone=<unavailable>, flags=257)
at /usr/src/sys/vm/uma.h:358
#14 wg_packet_alloc (m=0xfffff801154bfa00) at if_wg.c:1769
#15 wg_send_keepalive (peer=0xfffff800075dd000,
peer@entry=<error reading variable: value is not available>)
at if_wg.c:1291
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise CK_LIST_REMOVE might be called twice on the same element.
Running the following trigger will reproduce the bug that Manojav
reported:
#!/usr/local/bin/bash
NUM_PEER=50
peer_args=( )
for ((i=0; i<NUM_PEER; i++)); do
port="$RANDOM"
private="$(wg genkey)"
ifconfig "wg$i" create
wg set "wg$i" listen-port "$port" private-key <(echo "$private")
peer_args+=( peer "$(wg pubkey <<<"$private")" endpoint "127.0.0.1:$port" persistent-keepalive 1 )
done
for ((i=0; i<NUM_PEER; i++)); do
wg set "wg$i" "${peer_args[@]}"
ifconfig "wg$i" up
done
wg
while true; do
for ((i=0; i<NUM_PEER; i++)); do
ifconfig "wg$i" down
ifconfig "wg$i" up
done
done
Reported-by: Manojav Sridhar <manojav@manojav.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
| |
This takes into account mismatched MTUs. Borrows the
"calculate_skb_padding" function from Linux.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The rest of the code uses this, so go with it for now. Maybe later ncon
will want to clean up everything to be this way, but for now keep it
consistent.
This partially reverts commit a1fdf6646b16ec26c741089102346f5455dc5fed,
but doesn't reintroduce the bug that it had fixed.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
| |
This makes the crypto a *lot* faster. We might later revert this if we
use opencrypto's fancy page table mapping scheme. But for now, it's
useful. We do it early, rather than before calling decrypt/encrypt, so
that the various other m_pullups that we have wind up being no-ops.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
| |
Rather than making a tiny mbuf and then allocating another one in
m_copyback, just make one larger one. These packets are generally pretty
small anyway.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
| |
Either m_pullup reallocates, in which case wg_packet_free winds up
freeing something that's already been freed, or it fails and frees m,
and then wg_packet_free tries to free it again. In both cases, massive
memory corruption ensues.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
| |
This can be called when locks are held in upper parts of the stack,
resulting in witness splats.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
| |
In the occasion that noise_begin_session returns != 0, we could
accidentally leak the remote refcount, as the caller to
consume_response only expects *rp to be set when ret == 0.
The only situation we could leak this is if we cannot allocate memory
for the new keypair.
Signed-off-by: Matt Dunwoodie <ncon@noconroy.net>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The netmap code goes directly to if_transmit, which means it'll bypass
if_output, in which case, there's no packet allocated. Also, we're
relying on if_output's sockaddr structure to be legit, but who knows
what types of userspace hijynxes can forge this. Rather than relying on
that kind of black magic, determine the AF from the actual packet
contents. But still insist that it agrees with the sockaddr.
The extraction of the type from AF_UNSPEC follows the same pattern as
if_gif and if_gre.
We also use this as an opportunity to send icmp error messages in the
right place.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
| |
If sc is null, jumping to increment the counter means crash.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
| |
The lack of 64bit atomic helpers on 32bit is an annoyance.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
| |
We do a pretty nasty hack in the allowedips selftest to avoid having to
allocate more memory. Seems to work.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The peer (and keypair and local) lifecycle are managed through EPOCH and
refcounts. Primarily this is used in wg_noise to keep track of active
keypairs, however we can also use it to be sure no more peer references
exist. The structures are linked as such, so noise_remote cannot be
freed until all noise_keypairs are freed, and noise_local cannot be
freed until all noise_remotes are freed.
noise_keypair -> noise_remote -> noise_local
Therefore, if you hold a keypair reference you can be sure that remote
and local will still be around.
There are three main ways peers are referenced:
1) Incoming packets
1.a) Incoming handshake packets are passed to noise_consume_*,
which will (on success) return a refcounted remote which is dropped
at the end of wg_handshake.
1.b) Incoming cookie packets will have their index looked which will (on
success) return a refcounted remote, which is also dropped at the
end of wg_handshake.
1.c) Incoming data packets will have their index looked up which will
(on success) return a refcounted keypair. This keypair will be
dropped after the packet has been passed up the network stack, or
otherwise freed.
2) Outgoing data packets
2.a) Outgoing data packets are first looked up by wg_aip_lookup, which
returns a peer pointer, with an incremented remote refcount. This
is then dropped in wg_transmit after adding the packet to the
staged queue and sending the staged queue.
2.b) Packets in the staged queue do not hold any refcount for the remote
or keypair, because they do not reference the peer in any way, they
are just in the queue.
2.c) Packets finally get a refcoutned keypair in wg_peer_send_staged,
which is dropped after the packet is sent out the UDP socket, or
otherwise freed.
3) wg_timers system
3.a) The wg_timers system holds a reference to the peer whenever a
callout is scheduled. Instead of holding a refcount, we instead
disable the peer's timers, such that no callouts can be scheduled.
Some rationale for changes here:
We move the p_{send,recv} taskqgroup_detach into peer_free_deferred as
they will NULL fields in p_{send,recv}. If there are packets being
processed in wg_{en,de}crypt, then a call tou GROUPTASK_ENQUEUE will
dereference a NULL pointer. In general, we remove all references to the
peer in wg_peer_destroy, and free/deinit all the peer members once no
more references to the remote exist, in wg_peer_free_deferred.
Currently we take a refcount in wg_aip_lookup, which is to be sure that
the peer reference is valid for the entirety of wg_transmit. We do not
care about the refcount in wg_decrypt. It might be worth considering
storing the remote pointer in the allowedip entry, but it could be
argued both ways. For the time being, this is still correct. We don't
have a refcount for the peer stored in the allowedip table, as it is
protected by the table lock. One note here is the NULL p_remote check is
necessary to support selftest/allowedips.c, which does not specify a
p_remote. If we update the tests, then we may remove this check.
There are two added p_enabled checks, in run_retry_handshake and
run_send_keepalive. This is to align them with the other callout_reset
calls. In the case of p_zero_key_material, if we have set p_enabled =
false, then we subsequently clear keypairs and handshakes (on wg_down),
or we free the peer which will clear the keypairs for us.
We want to hold a refcount of remote in wg_{en,de}crypt to ensure that
the peer is still valid in the call to GROUPTASK_ENQUEUE. If we don't
then peer may become invalid after setting p_state. Another thread may
take the packet, put the keypair refcount and free the peer prior to the
call to GROUPTASK_ENQUEUE.
We no longer need to hold (haven't for a while) the EPOCH in
wg_send_initiation and wg_send_response, as we hold valid references for
the duration. This could be either a refcount of a remote or through the
wg_timers system as described above.
We also fix some refcount leaks in wgc_set.
Notes:
We may want to pull NET_EPOCH_WAIT out of wg_timers_disable, to improve
performance. However, we can destroy 20000 peers in less than 20ms so
the performance is not critical for this snapshot and can be addressed
later.
Finally, there is the special case of noise_remote_arg, which stores the
corresponding peer pointer. The peer is not refcounted however it will
have the same scope as the remote. In otherwords it is valid until we
call noise_remote_put on the remote.
Signed-off-by: Matt Dunwoodie <ncon@noconroy.net>
|
|
|
|
| |
Signed-off-by: Matt Dunwoodie <ncon@noconroy.net>
|
|
|
|
|
|
|
| |
We might add locks and things later. Mainly it doesn't cost much and
makes things easier/safer to reason about.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
| |
Good for hygiene, but also, lock hardening traps on double initialization
if we don't do this.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
| |
This doesn't add any reference counting, opting instead to go right to
the free. This could cause problems, but hopefully not.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Here we remove all M_WAITOK checks, because we don't want to hang while
trying to allocate memory. It is better to return an error so the user
can try again later.
We also make sure to check all the return codes in peer and interface
allocation. The structure of those functions is:
1) Allocate all memory
2) Initialise fields in order of the struct
3) Cleanup gotos
Signed-off-by: Matt Dunwoodie <ncon@noconroy.net>
|
|
|
|
| |
Signed-off-by: Matt Dunwoodie <ncon@noconroy.net>
|
|
|
|
|
|
| |
The rest of the stack does this.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
| |
This is already done anyway by if_clone_detach, so let that happen.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
|
|
| |
IPs mean different things per-vnet.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
|
|
| |
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|