wireguard-go - Go implementation of WireGuard

	Commit message	Author	Age	Files	Lines
...
*	device: use container/list instead of open coding it	Jason A. Donenfeld	2021-02-10	2	-37/+25
\| \| \| \| \| \| \|	This linked list implementation is awful, but maybe Go 2 will help eventually, and at least we're not open coding the hlist any more. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: retry Up() in up/down test	Jason A. Donenfeld	2021-02-10	1	-2/+13
\| \| \| \| \| \| \| \|	We're losing our ownership of the port when bringing the device down, which means another test process could reclaim it. Avoid this by retrying for 4 seconds. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: flush peer queues before starting device	Jason A. Donenfeld	2021-02-10	2	-24/+30
\| \| \| \| \| \| \|	In case some old packets snuck in there before, this flushes before starting afresh. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: create peer queues at peer creation time	Jason A. Donenfeld	2021-02-10	1	-6/+3
\| \| \| \| \| \| \|	Rather than racing with Start(), since we're never destroying these queues, we just set the variables at creation time. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: return error from Up() and Down()	Jason A. Donenfeld	2021-02-10	2	-18/+27
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	rwcancel: add an explicit close call	Jason A. Donenfeld	2021-02-09	1	-0/+1
\| \| \| \| \| \|	This lets us collect FDs even if the GC doesn't do it for us. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: handshake routine writes into encryption queue	Jason A. Donenfeld	2021-02-09	2	-1/+5
\| \| \| \| \| \| \|	Since RoutineHandshake calls peer.SendKeepalive(), it potentially is a writer into the encryption queue, so we need to bump the wg count. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: make RoutineReadFromTUN keep encryption queue alive	Josh Bleecher Snyder	2021-02-09	2	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RoutineReadFromTUN can trigger a call to SendStagedPackets. SendStagedPackets attempts to protect against sending on the encryption queue by checking peer.isRunning and device.isClosed. However, those are subject to TOCTOU bugs. If that happens, we get this:
	goroutine 1254 [running]:
	golang.zx2c4.com/wireguard/device.(*Peer).SendStagedPackets(0xc000798300)
		.../wireguard-go/device/send.go:321 +0x125
	golang.zx2c4.com/wireguard/device.(*Device).RoutineReadFromTUN(0xc000014780)
		.../wireguard-go/device/send.go:271 +0x21c
	created by golang.zx2c4.com/wireguard/device.NewDevice
		.../wireguard-go/device/device.go:315 +0x298
	Fix this with a simple, big hammer: Keep the encryption queue alive as long as it might be written to. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
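One way to "keep the queue alive as long as it might be written to" is to guard the channel close with a `sync.WaitGroup` that counts potential writers. The sketch below uses hypothetical names (`queue`, `newQueue`) and is not claimed to match the code in the commit.

```go
package main

import (
	"fmt"
	"sync"
)

// queue wraps a channel that is closed only after every registered
// writer, plus the owner, has released its reference.
type queue struct {
	c  chan int
	wg sync.WaitGroup
}

func newQueue(size int) *queue {
	q := &queue{c: make(chan int, size)}
	q.wg.Add(1) // the owner's reference
	go func() {
		q.wg.Wait() // wait for the owner and all writers
		close(q.c)  // now nobody can send on a closed channel
	}()
	return q
}

func main() {
	q := newQueue(8)

	q.wg.Add(1) // register a writer while the owner's reference is still held
	go func() {
		defer q.wg.Done()
		for i := 1; i <= 3; i++ {
			q.c <- i
		}
	}()

	q.wg.Done() // the owner is finished; the close waits for the writer

	sum := 0
	for v := range q.c {
		sum += v
	}
	fmt.Println(sum)
}
```

Each writer must call `Add` before the owner's `Done`, otherwise `Add` could race with `Wait`.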
*	device: only allocate peer queues once	Josh Bleecher Snyder	2021-02-09	1	-4/+4
\| \| \| \| \| \| \| \| \|	This serves two purposes. First, it makes repeatedly stopping then starting a peer cheaper. Second, it prevents a data race observed accessing the queues. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: clarify device.state.state docs (again)	Josh Bleecher Snyder	2021-02-09	1	-2/+4
\| \| \| \|	Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: run fewer iterations in TestUpDown	Josh Bleecher Snyder	2021-02-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	The high iteration count was useful when TestUpDown was the nexus of new bugs to investigate. Now that it has stabilized, that's less valuable. And it slows down running the tests and crowds out other tests. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: run fewer trials in TestWaitPool when race detector enabled	Josh Bleecher Snyder	2021-02-09	3	-0/+24
\| \| \| \| \| \| \|	On a many-core machine with the race detector enabled, this test can take several minutes to complete. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: remove nil elem check in finalizers	Josh Bleecher Snyder	2021-02-09	1	-6/+0
\| \| \| \| \| \|	This is not necessary, and removing it speeds up detection of UAF bugs. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: rename unsafeRemovePeer to removePeerLocked	Jason A. Donenfeld	2021-02-09	1	-9/+5
\| \| \| \| \| \|	This matches the new naming scheme of upLocked and downLocked. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: remove deviceStateNew	Jason A. Donenfeld	2021-02-09	2	-20/+8
\| \| \| \| \| \| \|	It's never used and we won't have a use for it. Also, move to go-running stringer, for those without GOPATHs. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: fix comment typo and shorten state.mu.Lock to state.Lock	Jason A. Donenfeld	2021-02-09	2	-13/+12
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: fix typo in comment	Jason A. Donenfeld	2021-02-09	1	-1/+1
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: fix alignment on 32-bit machines and test for it	Jason A. Donenfeld	2021-02-09	2	-8/+2
\| \| \| \| \| \| \| \| \|	The test previously checked the offset within a substruct, not the offset within the allocated struct, so this adds the two together. It then fixes an alignment crash on 32-bit machines. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: do not log on idempotent device state change	Jason A. Donenfeld	2021-02-09	1	-1/+0
\| \| \| \| \| \| \|	Part of being actually idempotent is that we shouldn't penalize code that takes advantage of this property with a log splat. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: do not attach finalizer to non-returned object	Jason A. Donenfeld	2021-02-09	5	-20/+22
\| \| \| \| \| \| \| \|	Before, the code attached a finalizer to an object that wasn't returned, resulting in immediate garbage collection. Instead return the actual pointer. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: lock elem in autodraining queue before freeing	Jason A. Donenfeld	2021-02-09	1	-0/+2
\| \| \| \| \| \| \|	Without this, we wind up freeing packets that the encryption/decryption queues still have, resulting in a UaF. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: remove listen port race in tests	Jason A. Donenfeld	2021-02-09	1	-63/+43
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: generate test keys on the fly	Jason A. Donenfeld	2021-02-09	1	-6/+21
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: remove mutex from Peer send/receive	Josh Bleecher Snyder	2021-02-08	4	-16/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The immediate motivation for this change is an observed deadlock.
	1. A goroutine calls peer.Stop. That calls peer.queue.Lock().
	2. Another goroutine is in RoutineSequentialReceiver. It receives an elem from peer.queue.inbound.
	3. The peer.Stop goroutine calls close(peer.queue.inbound), close(peer.queue.outbound), and peer.stopping.Wait(). It blocks waiting for RoutineSequentialReceiver and RoutineSequentialSender to exit.
	4. The RoutineSequentialReceiver goroutine calls peer.SendStagedPackets(). SendStagedPackets attempts peer.queue.RLock(). That blocks forever because the peer.Stop goroutine holds a write lock on that mutex.
	A background motivation for this change is that it can be expensive to have a mutex in the hot code path of RoutineSequential*. The mutex was necessary to avoid attempting to send elems on a closed channel. This commit removes that danger by never closing the channel. Instead, we send a sentinel nil value on the channel to indicate to the receiver that it should exit. The only problem with this is that if the receiver exits, we could write an elem into the channel which would never get received. If it never gets received, it cannot get returned to the device pools. To work around this, we use a finalizer. When the channel can be GC'd, the finalizer drains any remaining elements from the channel and restores them to the device pool.
	After that change, peer.queue.RWMutex no longer makes sense where it is. It is only used to prevent concurrent calls to Start and Stop. Move it to a more sensible location and make it a plain sync.Mutex.
	Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: create channels.go	Josh Bleecher Snyder	2021-02-08	2	-61/+69
\| \| \| \| \| \| \|	We have a bunch of stupid channel tricks, and I'm about to add more. Give them their own file. This commit is 100% code movement. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: print direction when ping transit fails	Josh Bleecher Snyder	2021-02-08	1	-3/+9
\| \| \| \|	Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: separate timersInit from timersStart	Josh Bleecher Snyder	2021-02-08	2	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	timersInit sets up the timers. It need only be done once per peer. timersStart does the work to prepare the timers for a newly running peer. It needs to be done every time a peer starts. Separate the two and call them in the appropriate places. This prevents data races on the peer's timers fields when starting and stopping peers. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: don't track device interface state in RoutineTUNEventReader	Josh Bleecher Snyder	2021-02-08	1	-7/+4
\| \| \| \| \| \| \|	We already track this state elsewhere. No need to duplicate. The cost of calling changeState is negligible. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: improve MTU change handling	Josh Bleecher Snyder	2021-02-08	1	-8/+15
\| \| \| \| \| \| \| \| \| \| \|	The old code silently accepted negative MTUs. It also set MTUs above the maximum. It also had hard to follow deeply nested conditionals. Add more paranoid handling, and make the code more straight-line. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
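A straight-line version of such validation might look like the sketch below. The `maxMTU` constant, the error value and the function name are hypothetical; the commit's actual limit and its handling of out-of-range values may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// maxMTU is a hypothetical upper bound chosen for this sketch.
const maxMTU = 65535

var errNegativeMTU = errors.New("negative MTU")

// sanitizeMTU rejects negative values and clamps oversized ones,
// using early returns instead of nested conditionals.
func sanitizeMTU(mtu int) (int, error) {
	if mtu < 0 {
		return 0, errNegativeMTU
	}
	if mtu > maxMTU {
		return maxMTU, nil
	}
	return mtu, nil
}

func main() {
	for _, mtu := range []int{-1, 1420, 100000} {
		got, err := sanitizeMTU(mtu)
		fmt.Println(mtu, got, err)
	}
}
```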
*	device: remove device.state.stopping from RoutineTUNEventReader	Josh Bleecher Snyder	2021-02-08	2	-2/+1
\| \| \| \| \| \| \| \| \| \|	The TUN event reader does three things: Change MTU, device up, and device down. Changing the MTU after the device is closed does no harm. Device up and device down don't make sense after the device is closed, but we can check that condition before proceeding with changeState. There's thus no reason to block device.Close on RoutineTUNEventReader exiting. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: overhaul device state management	Josh Bleecher Snyder	2021-02-08	8	-139/+188
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit simplifies device state management. It creates a single unified state variable and documents its semantics. It also makes state changes more atomic. As an example of the sort of bug that occurred due to non-atomic state changes, the following sequence of events used to occur approximately every 2.5 million test runs:
	* RoutineTUNEventReader received an EventDown event.
	* It called device.Down, which called device.setUpDown.
	* That set device.state.changing, but did not yet attempt to lock device.state.Mutex.
	* Test completion called device.Close.
	* device.Close locked device.state.Mutex.
	* device.Close blocked on a call to device.state.stopping.Wait.
	* device.setUpDown then attempted to lock device.state.Mutex and blocked.
	Deadlock results. setUpDown cannot progress because device.state.Mutex is locked. Until setUpDown returns, RoutineTUNEventReader cannot call device.state.stopping.Done. Until device.state.stopping.Done gets called, device.state.stopping.Wait is blocked. As long as device.state.stopping.Wait is blocked, device.state.Mutex cannot be unlocked.
	This commit fixes that deadlock by holding device.state.mu when checking that the device is not closed.
	Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
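The fix described above amounts to checking for the closed state and applying the transition under the same lock. Here is a minimal sketch with a single state variable; the type, constants and method names are hypothetical and much simpler than the real device.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type deviceState int

const (
	stateDown deviceState = iota
	stateUp
	stateClosed
)

var errClosed = errors.New("device closed")

type device struct {
	mu    sync.Mutex
	state deviceState // the single source of truth, guarded by mu
}

// changeState checks for closure and applies the transition while holding mu,
// so Close cannot slip in between the check and the change.
func (d *device) changeState(want deviceState) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.state == stateClosed {
		return errClosed
	}
	d.state = want // setting the same state twice is a harmless no-op
	return nil
}

func (d *device) Close() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.state = stateClosed
}

func main() {
	d := &device{}
	fmt.Println(d.changeState(stateUp))
	d.Close()
	fmt.Println(d.changeState(stateDown))
}
```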
*	device: remove unnecessary zeroing in peer.SendKeepalive	Josh Bleecher Snyder	2021-02-08	1	-1/+0
\| \| \| \| \| \|	elem.packet is always already nil. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: remove device.state.stopping from RoutineHandshake	Josh Bleecher Snyder	2021-02-08	2	-5/+1
\| \| \| \| \| \|	It is no longer necessary. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: remove device.state.stopping from RoutineDecryption	Josh Bleecher Snyder	2021-02-08	2	-5/+3
\| \| \| \| \| \| \|	It is no longer necessary, as of 454de6f3e64abd2a7bf9201579cd92eea5280996 (device: use channel close to shut down and drain decryption channel). Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: take peer handshake when reinitializing last sent handshake	Jason A. Donenfeld	2021-02-03	1	-1/+4
\| \| \| \| \| \|	This papers over other unrelated races, unfortunately. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: fix goroutine leak test	Josh Bleecher Snyder	2021-02-03	1	-8/+9
\| \| \| \| \| \| \| \| \| \|	The leak test had rare flakes. If a system goroutine started at just the wrong moment, you'd get a false positive. Instead of looping until the goroutines look good and then checking, exit completely as soon as the number of goroutines looks good. Also, check more frequently, in an attempt to complete faster. Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: add up/down stress test	Jason A. Donenfeld	2021-02-03	1	-0/+35
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: pass cfg strings around in tests instead of reader	Jason A. Donenfeld	2021-02-03	1	-9/+7
\| \| \| \| \| \|	This makes it easier to tag things onto the end manually for quick hacks. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: benchmark the waitpool to compare it to the prior channels	Jason A. Donenfeld	2021-02-03	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Here is the old implementation:
	type WaitPool struct {
		c chan interface{}
	}
	func NewWaitPool(max uint32, new func() interface{}) *WaitPool {
		p := &WaitPool{c: make(chan interface{}, max)}
		for i := uint32(0); i < max; i++ {
			p.c <- new()
		}
		return p
	}
	func (p *WaitPool) Get() interface{} {
		return <-p.c
	}
	func (p *WaitPool) Put(x interface{}) {
		p.c <- x
	}
	It performs worse than the new one:
	name         old time/op  new time/op  delta
	WaitPool-16  16.4µs ± 5%  15.1µs ± 3%  -7.86%  (p=0.008 n=5+5)
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: test that we do not leak goroutines	Josh Bleecher Snyder	2021-02-03	1	-0/+31
\| \| \| \|	Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: tie encryption queue lifetime to the peers that write to it	Josh Bleecher Snyder	2021-02-03	3	-4/+6
\| \| \| \| \|	Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
*	device: use a waiting sync.Pool instead of a channel	Jason A. Donenfeld	2021-02-02	4	-67/+116
\| \| \| \| \| \|	Channels are FIFO which means we have guaranteed cache misses. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: reduce number of append calls when padding	Jason A. Donenfeld	2021-01-29	1	-5/+2
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: use int64 instead of atomic.Value for time stamp	Jason A. Donenfeld	2021-01-29	2	-13/+27
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: use new model queues for handshakes	Jason A. Donenfeld	2021-01-29	2	-79/+52
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: simplify peer queue locking	Jason A. Donenfeld	2021-01-29	4	-147/+70
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: reduce nesting when staging packet	Jason A. Donenfeld	2021-01-28	1	-6/+6
\| \| \| \| \|	Suggested-by: Josh Bleecher Snyder <josh@tailscale.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	global: bump copyright	Jason A. Donenfeld	2021-01-28	34	-34/+34
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: do not allow get to run while set runs	Jason A. Donenfeld	2021-01-28	2	-3/+7
\| \| \| \|	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
*	device: avoid hex allocations in IpcGet	Jason A. Donenfeld	2021-01-28	2	-15/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	
	benchmark            old ns/op   new ns/op   delta
	BenchmarkUAPIGet-16  2872        2157        -24.90%
	benchmark            old allocs  new allocs  delta
	BenchmarkUAPIGet-16  30          18          -40.00%
	benchmark            old bytes   new bytes   delta
	BenchmarkUAPIGet-16  737         256         -65.26%
	Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>