aboutsummaryrefslogtreecommitdiffstatshomepage
Commit message (Collapse)AuthorAgeFilesLines
...
* queueing: get rid of per-peer ring buffersJason A. Donenfeld2021-02-189-93/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having two ring buffers per-peer means that every peer results in two massive ring allocations. On an 8-core x86_64 machine, this commit reduces the per-peer allocation from 18,688 bytes to 1,856 bytes, which is an 90% reduction. Ninety percent! With some single-machine deployments approaching 500,000 peers, we're talking about a reduction from 7 gigs of memory down to 700 megs of memory. In order to get rid of these per-peer allocations, this commit switches to using a list-based queueing approach. Currently GSO fragments are chained together using the skb->next pointer (the skb_list_* singly linked list approach), so we form the per-peer queue around the unused skb->prev pointer (which sort of makes sense because the links are pointing backwards). Use of skb_queue_* is not possible here, because that is based on doubly linked lists and spinlocks. Multiple cores can write into the queue at any given time, because its writes occur in the start_xmit path or in the udp_recv path. But reads happen in a single workqueue item per-peer, amounting to a multi-producer, single-consumer paradigm. The MPSC queue is implemented locklessly and never blocks. However, it is not linearizable (though it is serializable), with a very tight and unlikely race on writes, which, when hit (some tiny fraction of the 0.15% of partial adds on a fully loaded 16-core x86_64 system), causes the queue reader to terminate early. However, because every packet sent queues up the same workqueue item after it is fully added, the worker resumes again, and stopping early isn't actually a problem, since at that point the packet wouldn't have yet been added to the encryption queue. These properties allow us to avoid disabling interrupts or spinning. The design is based on Dmitry Vyukov's algorithm [1]. Performance-wise, ordinarily list-based queues aren't preferable to ringbuffers, because of cache misses when following pointers around. However, we *already* have to follow the adjacent pointers when working through fragments, so there shouldn't actually be any change there. A potential downside is that dequeueing is a bit more complicated, but the ptr_ring structure used prior had a spinlock when dequeueing, so all and all the difference appears to be a wash. Actually, from profiling, the biggest performance hit, by far, of this commit winds up being atomic_add_unless(count, 1, max) and atomic_ dec(count), which account for the majority of CPU time, according to perf. In that sense, the previous ring buffer was superior in that it could check if it was full by head==tail, which the list-based approach cannot do. But all and all, this enables us to get massive memory savings, allowing WireGuard to scale for real world deployments, without taking much of a performance hit. [1] http://www.1024cores.net/home/lock-free-algorithms/queues/intrusive-mpsc-node-based-queue Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: do not generate ICMP for non-IP packetsJason A. Donenfeld2021-02-181-3/+4
| | | | | | | | | If skb->protocol doesn't match the actual skb->data header, it's probably not a good idea to pass it off to icmp{,v6}_ndo_send, which is expecting to reply to a valid IP packet. So this commit has that early mismatch case jump to a later error label. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* selftests: test multiple parallel streamsJason A. Donenfeld2021-02-181-1/+17
| | | | | | | | | | In order to test ndo_start_xmit being called in parallel, explicitly add separate tests, which should all run on different cores. This should help tease out bugs associated with queueing up packets from different cores in parallel. Currently, it hasn't found those types of bugs, but given future planned work, this is a useful regression to avoid. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* peer: put frequently used members above cache linesJason A. Donenfeld2021-02-081-2/+2
| | | | | | | | | | The is_dead boolean is checked for every single packet, while the internal_id member is used basically only for pr_debug messages. So it makes sense to hoist up is_dead into some space formerly unused by a struct hole, while demoting internal_api to below the lowest struct cache line. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: redefine version constants for sublevel>=256Jason A. Donenfeld2021-02-072-0/+11
| | | | | | | | With the 4.4.256 and 4.9.256 kernels, the previous calculation for integer comparison overflowed. This commit redefines the broken constants to have more space for the sublevel. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: remove unused version.h headersJason A. Donenfeld2021-02-072-2/+0
| | | | | | We don't need this in all files, and it just complicates things. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bumpv1.0.20210124Jason A. Donenfeld2021-01-242-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: skb_mark_not_on_list was backported to 4.14Jason A. Donenfeld2021-01-241-1/+1
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: SYM_FUNC_* was backported to c8sJason A. Donenfeld2021-01-131-1/+12
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bumpv1.0.20201221Jason A. Donenfeld2020-12-212-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* socket: remove bogus __be32 annotationJann Horn2020-12-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | The endpoint->src_if4 has nothing to do with fixed-endian numbers; remove the bogus annotation. This was introduced in https://git.zx2c4.com/wireguard-monolithic-historical/commit?id=14e7d0a499a676ec55176c0de2f9fcbd34074a82 in the historical WireGuard repo because the old code used to zero-initialize multiple members as follows: endpoint->src4.s_addr = endpoint->src_if4 = fl.saddr = 0; Because fl.saddr is fixed-endian and an assignment returns a value with the type of its left operand, this meant that sparse detected an assignment between values of different endianness. Since then, this assignment was already split up into separate statements; just the cast survived. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: avoid double unlikely() notation when using IS_ERR()Antonio Quartulli2020-12-212-3/+3
| | | | | | | | | | | | The definition of IS_ERR() already applies the unlikely() notation when checking the error status of the passed pointer. For this reason there is no need to have the same notation outside of IS_ERR() itself. Clean up code by removing redundant notation. Signed-off-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* simd: detect -rt kernels >= 5.4Jason A. Donenfeld2020-12-191-1/+1
| | | | | | | | | The 5.4 series of -rt kernels moved from PREEMPT_RT_BASE/PREEMPT_RT_FULL to PREEMPT_RT, so we have to account for it here. Otherwise users get scheduling-while-atomic splats. Reported-by: Erik Schuitema <erik@essd.nl> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* gitignore: ignore intermediary build fileL.W.Reek2020-12-161-0/+1
| | | | | Signed-off-by: L.W.Reek <syphyr@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: drop rhel 8.2, add rhel 8.4 supportJason A. Donenfeld2020-12-141-8/+5
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bumpv1.0.20201112Jason A. Donenfeld2020-11-122-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* qemu: bump default testing versionJason A. Donenfeld2020-11-121-1/+1
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: SYM_FUNC_{START,END} were backported to 5.4Jason A. Donenfeld2020-11-121-1/+1
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* qemu: drop build support for rhel 8.2Jason A. Donenfeld2020-11-041-1/+0
| | | | | | This reverts commit feb89cab65c6ab1a6cbeeaaeb11b1a174772cea8. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* netns: check that route_me_harder packets use the right skJason A. Donenfeld2020-10-292-0/+11
| | | | | | | | | | | | If netfilter changes the packet mark, the packet is rerouted. The ip_route_me_harder family of functions fails to use the right sk, opting to instead use skb->sk, resulting in a routing loop when used with tunnels. Fixing this inside of the compat layer with skb_orphan would work but would cause other problems, by disabling TSQ, so instead we warn if the calling kernel hasn't yet backported the fix for this. Reported-by: Chen Minqiang <ptpt52@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* noise: take lock when removing handshake entry from tableJason A. Donenfeld2020-09-091-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric reported that syzkaller found a race of this variety: CPU 1 CPU 2 -------------------------------------------|--------------------------------------- wg_index_hashtable_replace(old, ...) | if (hlist_unhashed(&old->index_hash)) | | wg_index_hashtable_remove(old) | hlist_del_init_rcu(&old->index_hash) | old->index_hash.pprev = NULL hlist_replace_rcu(&old->index_hash, ...) | *old->index_hash.pprev | Syzbot wasn't actually able to reproduce this more than once or create a reproducer, because the race window between checking "hlist_unhashed" and calling "hlist_replace_rcu" is just so small. Adding an mdelay(5) or similar there helps make this demonstrable using this simple script: #!/bin/bash set -ex trap 'kill $pid1; kill $pid2; ip link del wg0; ip link del wg1' EXIT ip link add wg0 type wireguard ip link add wg1 type wireguard wg set wg0 private-key <(wg genkey) listen-port 9999 wg set wg1 private-key <(wg genkey) peer $(wg show wg0 public-key) endpoint 127.0.0.1:9999 persistent-keepalive 1 wg set wg0 peer $(wg show wg1 public-key) ip link set wg0 up yes link set wg1 up | ip -force -batch - & pid1=$! yes link set wg1 down | ip -force -batch - & pid2=$! wait The fundumental underlying problem is that we permit calls to wg_index_ hashtable_remove(handshake.entry) without requiring the caller to take the handshake mutex that is intended to protect members of handshake during mutations. This is consistently the case with calls to wg_index_ hashtable_insert(handshake.entry) and wg_index_hashtable_replace( handshake.entry), but it's missing from a pertinent callsite of wg_ index_hashtable_remove(handshake.entry). So, this patch makes sure that mutex is taken. The original code was a little bit funky though, in the form of: remove(handshake.entry) lock(), memzero(handshake.some_members), unlock() remove(handshake.entry) The original intention of that double removal pattern outside the lock appears to be some attempt to prevent insertions that might happen while locks are dropped during expensive crypto operations, but actually, all callers of wg_index_hashtable_insert(handshake.entry) take the write lock and then explicitly check handshake.state, as they should, which the aforementioned memzero clears, which means an insertion should already be impossible. And regardless, the original intention was necessarily racy, since it wasn't guaranteed that something else would run after the unlock() instead of after the remove(). So, from a soundness perspective, it seems positive to remove what looks like a hack at best. The crash from both syzbot and from the script above is as follows: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline] RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174 Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5 RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010 RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000 R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000 R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500 FS: 0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820 wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline] wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Note that this fixes the same issue as the previous commit, but in a more direct way. Upstream, the commit message of that previous commit has been changed to: wireguard: peerlookup: take lock before checking hash in replace operation Eric's suggested fix for the previous commit's mentioned race condition was to simply take the table->lock in wg_index_hashtable_replace(). The table->lock of the hash table is supposed to protect the bucket heads, not the entires, but actually, since all the mutator functions are already taking it, it makes sense to take it too for the test to hlist_unhashed, as a defense in depth measure, so that it no longer races with deletions, regardless of what other locks are protecting individual entries. This is sensible from a performance perspective because, as Eric pointed out, the case of being unhashed is already the unlikely case, so this won't add common contention. And comparing instructions, this basically doesn't make much of a difference other than pushing and popping %r13, used by the new `bool ret`. More generally, I like the idea of locking consistency across table mutator functions, and this might let me rest slightly easier at night. Since we've already tagged it, we're not going to change it at this point, but I include mention of it here for reference. Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bumpv1.0.20200908Jason A. Donenfeld2020-09-082-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* peerlookup: take lock before checking hash in replace operationJason A. Donenfeld2020-09-081-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric reported that syzkaller found a race of this variety: CPU 1 CPU 2 -------------------------------------------|--------------------------------------- wg_index_hashtable_replace(old, ...) | if (hlist_unhashed(&old->index_hash)) | | wg_index_hashtable_remove(old) | hlist_del_init_rcu(&old->index_hash) | old->index_hash.pprev = NULL hlist_replace_rcu(&old->index_hash, ...) | *old->index_hash.pprev | The table->lock of the hash table is supposed to protect the bucket heads, not the entires, but actually, since all the mutator functions are already taking it, it makes sense to take it too for the test to hlist_unhashed, so that it no longer races with deletions. This is fine because, as Eric pointed out, the case of being unhashed is already the unlikely case, so this won't add common contention. And comparing instructions, this basically doesn't make much of a difference other than pushing and popping %r13, used by the new `bool ret`. The syzkaller crash is as follows: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 7395 Comm: kworker/0:3 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: wg-kex-wg1 wg_packet_handshake_receive_worker RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline] RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174 Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5 RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010 RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000 R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000 R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500 FS: 0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: wg_noise_handshake_begin_session+0x752/0xc9a drivers/net/wireguard/noise.c:820 wg_receive_handshake_packet drivers/net/wireguard/receive.c:183 [inline] wg_packet_handshake_receive_worker+0x33b/0x730 drivers/net/wireguard/receive.c:220 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Modules linked in: ---[ end trace 0d737db78b72da84 ]--- RIP: 0010:hlist_replace_rcu include/linux/rculist.h:505 [inline] RIP: 0010:wg_index_hashtable_replace+0x176/0x330 drivers/net/wireguard/peerlookup.c:174 Code: 00 fc ff df 48 89 f9 48 c1 e9 03 80 3c 01 00 0f 85 44 01 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 45 10 48 89 c6 48 c1 ee 03 <80> 3c 0e 00 0f 85 06 01 00 00 48 85 d2 4c 89 28 74 47 e8 a3 4f b5 RSP: 0018:ffffc90006a97bf8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888050ffc4f8 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88808e04e010 RBP: ffff88808e04e000 R08: 0000000000000001 R09: ffff8880543d0000 R10: ffffed100a87a000 R11: 000000000000016e R12: ffff8880543d0000 R13: ffff88808e04e008 R14: ffff888050ffc508 R15: ffff888050ffc500 FS: 0000000000000000(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000f5505db0 CR3: 0000000097cf7000 CR4: 00000000001526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: backport NLA policy macrosJason A. Donenfeld2020-08-271-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* netlink: consistently use NLA_POLICY_MIN_LEN()Johannes Berg2020-08-271-2/+2
| | | | | | | | | | Change places that open-code NLA_POLICY_MIN_LEN() to use the macro instead, giving us flexibility in how we handle the details of the macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* netlink: consistently use NLA_POLICY_EXACT_LEN()Johannes Berg2020-08-271-5/+5
| | | | | | | | | | | Change places that open-code NLA_POLICY_EXACT_LEN() to use the macro instead, giving us flexibility in how we handle the details of the macro. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: backport kfree_sensitive and switch to itJason A. Donenfeld2020-08-273-3/+7
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: drop support for SUSE 15.1Jason A. Donenfeld2020-07-291-10/+7
| | | | | | | | Now that WireGuard is properly supported by 15.2 and people have had sufficient time to upgrade, we can drop support for 15.1 in this compat module. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bumpv1.0.20200729Jason A. Donenfeld2020-07-292-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: add missing headers for ip_tunnel_parse_protocolJason A. Donenfeld2020-07-291-0/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: ipv6_dst_lookup_flow was ported to rhel 7.9 betaJason A. Donenfeld2020-07-291-1/+4
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: allow override of depmod basedirRicardo Mendoza2020-07-291-1/+2
| | | | | | | | | When building in an environment with a different modules install path we need to be able to also override the depmod basedir flag. Signed-off-by: Ricardo Mendoza <ricmm@pantacor.com> [zx2c4: changed name of env var and added quotes to argument] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: rhel 8.3 beta removed nf_nat_core.hJason A. Donenfeld2020-07-291-1/+1
| | | | | Reported-by: Vladimir Benes <vbenes@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bumpv1.0.20200712Jason A. Donenfeld2020-07-122-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: backport ip_tunnel_parse_protocol and ip_tunnel_header_opsJason A. Donenfeld2020-06-301-0/+22
| | | | | | | These are required for moving wg_examine_packet_protocol out of wireguard and into upstream. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* queueing: make use of ip_tunnel_parse_protocolJason A. Donenfeld2020-06-302-18/+3
| | | | | | | | | Now that wg_examine_packet_protocol has been added for general consumption as ip_tunnel_parse_protocol, it's possible to remove wg_examine_packet_protocol and simply use the new ip_tunnel_parse_protocol function directly. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: implement header_ops->parse_protocol for AF_PACKETJason A. Donenfeld2020-06-301-0/+1
| | | | | | | | | | | | | | | | WireGuard uses skb->protocol to determine packet type, and bails out if it's not set or set to something it's not expecting. For AF_PACKET injection, we need to support its call chain of: packet_sendmsg -> packet_snd -> packet_parse_headers -> dev_parse_header_protocol -> parse_protocol Without a valid parse_protocol, this returns zero, and wireguard then rejects the skb. So, this wires up the ip_tunnel handler for layer 3 packets for that case. Reported-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: SUSE 15.1 is the final SUSE we need to supportJason A. Donenfeld2020-06-291-8/+8
| | | | | | >=15.2 is in SUSE's kernel now. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: rhel 8.3 backported skb_reset_redirectJason A. Donenfeld2020-06-291-1/+4
| | | | | Reported-by: Vladimir Benes <vbenes@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* receive: account for napi_gro_receive never returning GRO_DROPJason A. Donenfeld2020-06-291-8/+2
| | | | | | | | | | The napi_gro_receive function no longer returns GRO_DROP ever, making handling GRO_DROP dead code. This commit removes that dead code. Further, it's not even clear that device drivers have any business in taking action after passing off received packets; that's arguably out of their hands. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bumpv1.0.20200623Jason A. Donenfeld2020-06-232-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* netns: workaround bad 5.2.y backportJason A. Donenfeld2020-06-221-1/+2
| | | | | | | | ca7a03c4175 was backported to 5.2 to fix 7d9e5f422150, but 7d9e5f422150 wasn't added until 5.3, so this fix for a reference underflow in 5.3 becomes a memory leak in 5.2. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* device: avoid circular netns referencesJason A. Donenfeld2020-06-226-46/+71
| | | | | | | | | | | | | | | | Before, we took a reference to the creating netns if the new netns was different. This caused issues with circular references, with two wireguard interfaces swapping namespaces. The solution is to rather not take any extra references at all, but instead simply invalidate the creating netns pointer when that netns is deleted. In order to prevent this from happening again, this commit improves the rough object leak tracking by allowing it to account for created and destroyed interfaces, aside from just peers and keys. That then makes it possible to check for the object leak when having two interfaces take a reference to each others' namespaces. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* noise: do not assign initiation time in if conditionFrank Werner-Krippendorf2020-06-211-2/+2
| | | | | | | | | Fixes an error condition reported by checkpatch.pl which caused by assigning a variable in an if condition in wg_noise_handshake_consume_ initiation(). Signed-off-by: Frank Werner-Krippendorf <mail@hb9fxq.ch> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* Kbuild: remove -fvisibility=hidden from cflagsJason A. Donenfeld2020-06-181-1/+1
| | | | | | | | | | | | | This was originally done in 2015 as a means of decreasing module size, but it has the effect of creating JUMP11 relocations on ARM when compiled in THUMB2 mode without CONFIG_THUMB2_AVOID_R_ARM_THM_JUMP11=y, which results in `B ...` instructions being generated with jumps that are too far, rather than `B.W ...` instructions, which can handle the larger sized jump. Get rid of the old hack, which had minimum utility anyway. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: drop centos 8.1 support as 8.2 is now outJason A. Donenfeld2020-06-151-7/+4
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* version: bumpv1.0.20200611Jason A. Donenfeld2020-06-112-2/+2
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: remove stale suse supportJason A. Donenfeld2020-06-041-11/+3
| | | | | | | | The 42.x series is no longer supported, and the 15.2 kernel is getting a proper backport, so at the moment, we only care about supporting 15.1. Eventually we'll drop that too. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* compat: bionic-hwe-5.0/disco kernel backported skb_reset_redirect and ipv6 flowJason A. Donenfeld2020-05-281-2/+4
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* qemu: mark per_cpu_load_addr as static for gcc-10Jason A. Donenfeld2020-05-281-0/+1
| | | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>