path: root/src/crypto
Commit message | Author | Age | Files | Lines
* crypto: curve25519-x86_64: use in/out register constraints more precisely | Jason A. Donenfeld | 2021-12-13 | 1 | -293/+504

  Rather than passing all variables as modified outputs, pass the ones
  that are only read as input operands. This helps with old gcc versions
  when alternatives are additionally used, and lets gcc's codegen be a
  little bit more efficient. This also syncs up with the latest
  Vale/EverCrypt output.

  This also forward-ports 3c9f3b6 ("crypto: curve25519-x86_64: solve
  register constraints with reserved registers").

  Cc: Aymeric Fromherz <aymeric.fromherz@inria.fr>
  Cc: Mathias Krause <minipli@grsecurity.net>
  Link: https://lore.kernel.org/wireguard/1554725710.1290070.1639240504281.JavaMail.zimbra@inria.fr/
  Link: https://github.com/project-everest/hacl-star/pull/501
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
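  A minimal sketch of the constraint style this moves to, with
  illustrative operand names rather than the file's actual variables:
  pointers the assembly only dereferences stay in the input list, and
  only genuinely modified operands remain outputs.

      #include <linux/types.h>

      /* Hypothetical fmul-like wrapper: 'tmp' is advanced by the
       * sequence, so it stays an in/out ("+&r") operand, while 'out',
       * 'f1' and 'f2' are only ever read and can be plain "r" inputs. */
      static inline void fmul_sketch(u64 *out, const u64 *f1,
                                     const u64 *f2, u64 *tmp)
      {
              asm volatile(
                      /* ... multiply and reduce sequence elided ... */
                      ""
                      : "+&r"(tmp)
                      : "r"(out), "r"(f1), "r"(f2)
                      : "%rax", "%rcx", "%rdx", "%r8", "%r9",
                        "memory", "cc");
      }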
* crypto: curve25519-x86_64: solve register constraints with reserved registers | Mathias Krause | 2021-12-06 | 1 | -4/+4

  The register constraints for the inline assembly in fsqr() and fsqr2()
  are pretty tight on what the compiler may assign to the remaining
  three register variables. The clobber list only allows the following
  to be used: RDI, RSI, RBP and R12. With RAP reserving R12 and a kernel
  built with CONFIG_FRAME_POINTER=y claiming RBP, only two registers are
  left, so the compiler rightfully complains about impossible
  constraints.

  Provide alternatives that allow a memory reference for 'out' to solve
  the allocation constraint dilemma for this configuration.

  Also make 'out' an input-only operand, as it is only used as such.
  This not only allows gcc to optimize its usage further, but also works
  around older gcc versions that apparently fail to handle multiple
  alternatives correctly, as in failing to initialize the 'out' operand
  with its input value.

  Signed-off-by: Mathias Krause <minipli@grsecurity.net>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
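  The shape of the fix, as a rough sketch with assumed operand names
  (not the actual fsqr() body): 'out' may live in a register or a stack
  slot, and is listed as an input since the pointer itself is never
  written.

      #include <linux/types.h>

      static inline void fsqr_sketch(u64 *out, u64 *f, u64 *tmp)
      {
              /* "rm" lets gcc pick a memory operand for 'out' when
               * R12 (RAP) and RBP (frame pointer) are both off
               * limits, relieving the register pressure. */
              asm volatile(
                      /* ... squaring sequence elided ... */
                      ""
                      : "+&r"(f), "+&r"(tmp)
                      : "rm"(out)
                      : "%rax", "%rbx", "%rcx", "%rdx", "%r8",
                        "memory", "cc");
      }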
* compat: remove unused version.h headers | Jason A. Donenfeld | 2021-02-07 | 1 | -1/+0

  We don't need this in all files, and it just complicates things.

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* crypto: do not export symbols | Jason A. Donenfeld | 2020-04-14 | 5 | -19/+0

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* curve25519-x86_64: avoid use of r12 | Jason A. Donenfeld | 2020-02-19 | 1 | -55/+55

  This causes problems with RAP and KERNEXEC for PaX, as r12 is a
  reserved register. It also leads to a more compact instruction
  encoding, saving about 100 cycles.

  Suggested-by: PaX Team <pageexec@freemail.hu>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20poly1305: defensively protect against large inputs | Jason A. Donenfeld | 2020-02-06 | 1 | -1/+3

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* curve25519: x86_64: replace with formally verified implementation | Jason A. Donenfeld | 2020-01-21 | 3 | -2308/+1300

  This comes from INRIA's HACL*/Vale. It implements the same algorithm
  and implementation strategy as the code it replaces, only this code
  has been formally verified, sans the base point multiplication, which
  uses code similar to prior, only it uses the formally verified field
  arithmetic alongside reproducible ladder generation steps. This
  doesn't have a pure-bmi2 version, which means Haswell no longer
  benefits, but the increased (doubled) code complexity is not worth it
  for a single generation of chips that's already old.

  Performance-wise, this is around 1% slower on older
  microarchitectures, and slightly faster on newer microarchitectures,
  mainly 10nm ones or backports of 10nm to 14nm. This implementation is
  "everest" below:

  Xeon E5-2680 v4 (Broadwell)
     armfazh: 133340 cycles per call
     everest: 133436 cycles per call

  Xeon Gold 5120 (Sky Lake Server)
     armfazh: 112636 cycles per call
     everest: 113906 cycles per call

  Core i5-6300U (Sky Lake Client)
     armfazh: 116810 cycles per call
     everest: 117916 cycles per call

  Core i7-7600U (Kaby Lake)
     armfazh: 119523 cycles per call
     everest: 119040 cycles per call

  Core i7-8750H (Coffee Lake)
     armfazh: 113914 cycles per call
     everest: 113650 cycles per call

  Core i9-9880H (Coffee Lake Refresh)
     armfazh: 112616 cycles per call
     everest: 114082 cycles per call

  Core i3-8121U (Cannon Lake)
     armfazh: 113202 cycles per call
     everest: 111382 cycles per call

  Core i7-8265U (Whiskey Lake)
     armfazh: 127307 cycles per call
     everest: 127697 cycles per call

  Core i7-8550U (Kaby Lake Refresh)
     armfazh: 127522 cycles per call
     everest: 127083 cycles per call

  Xeon Platinum 8275CL (Cascade Lake)
     armfazh: 114380 cycles per call
     everest: 114656 cycles per call

  Achieving these kinds of results with formally verified code is quite
  remarkable, especially considering that performance is favorable for
  newer chips.

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: fix up spelling | Josh Soref | 2019-12-12 | 1 | -1/+1

  Signed-off-by: Josh Soref <jsoref@gmail.com>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20poly1305: double check the sgmiter logic with test | Jason A. Donenfeld | 2019-12-06 | 1 | -8/+59

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* crypto: use new assembler macros for 5.5 | Jason A. Donenfeld | 2019-12-05 | 5 | -14/+14

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20poly1305: port to sgmitter for 5.5 | Jason A. Donenfeld | 2019-12-05 | 3 | -114/+143

  I'm not totally comfortable with these changes yet, and it'll require
  some more scrutiny. But it's a start.

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
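  For context, the sg_miter walking pattern this ports to looks roughly
  like the sketch below; the real code threads chacha20poly1305 state
  through the loop, and the function name here is illustrative.

      #include <linux/scatterlist.h>

      /* Walk a scatterlist in mapped chunks; sg_miter takes care of
       * the page mapping and unmapping previously done by hand. */
      static void walk_sg_sketch(struct scatterlist *sg, unsigned int nents)
      {
              struct sg_mapping_iter miter;

              sg_miter_start(&miter, sg, nents,
                             SG_MITER_TO_SG | SG_MITER_ATOMIC);
              while (sg_miter_next(&miter)) {
                      /* miter.addr points at miter.length mapped
                       * bytes; encrypt or decrypt them in place. */
              }
              sg_miter_stop(&miter);
      }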
* blake2s: spacing | Jason A. Donenfeld | 2019-06-03 | 2 | -123/+123

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* curve25519: not all linkers support bmi2 and adx | Jason A. Donenfeld | 2019-06-02 | 2 | -6/+48

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* blake2s: add ssse3 to nobs | Jason A. Donenfeld | 2019-05-31 | 1 | -1/+2

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* blake2s: do not use xgetbv for ssse3 detection | Jason A. Donenfeld | 2019-05-31 | 1 | -3/+1

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
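  The reasoning, sketched below with illustrative names: xgetbv reads
  XCR0, which only matters for the YMM and wider state that AVX code
  touches; SSSE3 uses only XMM state, which the kernel always enables,
  so the CPUID feature bit alone suffices.

      #include <asm/cpufeature.h>

      static bool blake2s_use_ssse3 __read_mostly;

      static void __init blake2s_fpu_init_sketch(void)
      {
              /* No xgetbv()/XCR0 check needed for plain SSSE3. */
              blake2s_use_ssse3 = boot_cpu_has(X86_FEATURE_SSSE3);
      }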
* zinc: update copyright | Jason A. Donenfeld | 2019-05-29 | 2 | -2/+2

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* blake2s: shorten ssse3 loop | Samuel Neves | 2019-05-29 | 1 | -857/+66

  This (mostly) preserves the performance (as measured on Haswell and
  *lake) of the last commit, but it drastically reduces code size.

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* blake2s,chacha: latency tweak | Samuel Neves | 2019-05-29 | 5 | -618/+982

  In every odd-numbered round, instead of operating over the state

      x00 x01 x02 x03
      x05 x06 x07 x04
      x10 x11 x08 x09
      x15 x12 x13 x14

  we operate over the rotated state

      x03 x00 x01 x02
      x04 x05 x06 x07
      x09 x10 x11 x08
      x14 x15 x12 x13

  The advantage here is that this requires no changes to the
  'x04 x05 x06 x07' row, which is in the critical path. This results in
  a noticeable latency improvement of roughly R cycles, for R diagonal
  rounds in the primitive.

  In the case of BLAKE2s, which I also moved from requiring AVX to only
  requiring SSSE3, we save approximately 30 cycles per compression
  function call on Haswell and Skylake. In other words, this is an
  improvement of ~0.6 cpb.

  This idea was pointed out to me by Shunsuke Shimizu, though it appears
  to have been around for longer.

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
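  A userspace SSE sketch of the tweaked diagonalization (the in-kernel
  code is generated assembly, not intrinsics; function name here is
  illustrative): rows a, c and d are rotated into the new layout, while
  row b, the critical-path row, is left untouched.

      #include <immintrin.h>

      /* Lane 0 is the leftmost word in the matrices above. */
      static inline void diagonalize_sketch(__m128i *a, __m128i *c,
                                            __m128i *d)
      {
              *a = _mm_shuffle_epi32(*a, _MM_SHUFFLE(2, 1, 0, 3)); /* x03 x00 x01 x02 */
              *c = _mm_shuffle_epi32(*c, _MM_SHUFFLE(0, 3, 2, 1)); /* x09 x10 x11 x08 */
              *d = _mm_shuffle_epi32(*d, _MM_SHUFFLE(1, 0, 3, 2)); /* x14 x15 x12 x13 */
      }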
* zinc: arm64: use cpu_get_elf_hwcap accessor for 5.2 | Jason A. Donenfeld | 2019-05-29 | 2 | -2/+2

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* kbuild: account for recent upstream changes | Jason A. Donenfeld | 2019-05-29 | 1 | -1/+1

  Apparently cdd750bfb1f76fe9be8cfb53cbe77b2e811081ab changed things, so
  we fall back onto this hack.

  Reported-by: Alex Xu <alex@alxu.ca>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* blake2s: remove outlen parameter from final | Jason A. Donenfeld | 2019-03-27 | 2 | -8/+7

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* blake2s: simplify | Samuel Neves | 2019-03-27 | 2 | -40/+12

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: name enums | Jason A. Donenfeld | 2019-02-04 | 1 | -2/+2

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* noise: store clamped key instead of raw key | Jason A. Donenfeld | 2019-02-03 | 5 | -14/+13

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20poly1305: permit unaligned strides on certain platforms | Jason A. Donenfeld | 2019-02-03 | 1 | -18/+14

  The map allocations required to fix this are mostly slower than
  unaligned paths.

  Reported-by: Louis Sautier <sbraz@gentoo.org>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: normalize -> clamp | Jason A. Donenfeld | 2019-01-23 | 4 | -17/+10

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* global: update copyright | Jason A. Donenfeld | 2019-01-07 | 37 | -37/+37

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* makefile: use immediate expansion and use correct template patterns | Jason A. Donenfeld | 2018-12-18 | 1 | -6/+6

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: do not define unused asm function | Jason A. Donenfeld | 2018-12-07 | 1 | -4/+2

  This causes RAP to be unhappy, and we're not using it anyway.

  Reported-by: Ivan J. <parazyd@dyne.org>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20,poly1305: simplify perlasm fanciness | Jason A. Donenfeld | 2018-12-07 | 3 | -75/+69

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20,poly1305: do not use xlate | Jason A. Donenfeld | 2018-11-19 | 3 | -1496/+73

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* poly1305: make frame pointers for auxiliary calls | Samuel Neves | 2018-11-17 | 1 | -31/+43

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* crypto: better path resolution and more specific generated .S | Jason A. Donenfeld | 2018-11-16 | 1 | -6/+5

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20,poly1305: don't do compiler testing in generator and remove xor helper | Jason A. Donenfeld | 2018-11-15 | 2 | -30/+39

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* crypto: resolve target prefix on buggy kernels | Jason A. Donenfeld | 2018-11-15 | 1 | -1/+6

  We also move to .SECONDARY, since older kernels don't use targets like
  that.

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* poly1305: cleanup leftover debugging changes | Jason A. Donenfeld | 2018-11-15 | 1 | -3/+3

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* poly1305: only export neon symbols when in use | Jason A. Donenfeld | 2018-11-15 | 1 | -2/+6

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20,poly1305: fix up for win64 | Samuel Neves | 2018-11-15 | 2 | -27/+29

  These don't help us, but it is important to keep this working for when
  it's re-added to cryptogams.

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* perlasm: avoid rep ret | Jason A. Donenfeld | 2018-11-15 | 1 | -1/+1

  The original hardcodes returns as .byte 0xf3,0xc3, aka "rep ret". We
  replace this with a plain "ret". "rep ret" was meant to help with AMD
  K8 chips, cf. http://repzret.org/p/repzret. It makes no sense to
  continue to use this kludge for code that won't even run on ancient
  AMD chips.

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* poly1305: specialize to wireguard | Jason A. Donenfeld | 2018-11-15 | 1 | -11/+20

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: specialize to wireguard | Jason A. Donenfeld | 2018-11-15 | 2 | -20/+38

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* perlasm: cleanup whitespace | Jason A. Donenfeld | 2018-11-15 | 1 | -5/+5

  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* poly1305: adjust to kernel | Samuel Neves | 2018-11-15 | 1 | -220/+291

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: cleaner function declarations | Samuel Neves | 2018-11-14 | 1 | -23/+23

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: normalize names | Samuel Neves | 2018-11-14 | 1 | -71/+71

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: fixup win64 stack offsets | Samuel Neves | 2018-11-14 | 1 | -129/+129

  We don't need to do this for kernel purposes, but it's polite to leave
  things unbroken.

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: simplify stack unwinding on ChaCha20_ctr32 | Samuel Neves | 2018-11-14 | 1 | -10/+8

  objtool did not quite understand the stack arithmetic employed here.

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: use DRAP idiom | Samuel Neves | 2018-11-14 | 1 | -236/+235

  This effectively means swapping the usage of %r9 and %r10 globally.

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: add hchacha_ssse3 | Samuel Neves | 2018-11-14 | 1 | -0/+39

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20: begin adapting to kernel setting | Samuel Neves | 2018-11-14 | 2 | -68/+116

  Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
  Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>