| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | blake2s: spacing | Jason A. Donenfeld | 2019-06-03 | 2 | -123/+123 |
| | |||||
* | curve25519: not all linkers support bmi2 and adx | Jason A. Donenfeld | 2019-06-02 | 2 | -6/+48 |
| | |||||
* | blake2s: add ssse3 to nobs | Jason A. Donenfeld | 2019-05-31 | 1 | -1/+2 |
| | |||||
* | blake2s: do not use xgetbv for ssse3 detection | Jason A. Donenfeld | 2019-05-31 | 1 | -3/+1 |
| | |||||
* | zinc: update copyright | Jason A. Donenfeld | 2019-05-29 | 2 | -2/+2 |
| | |||||
* | blake2s: shorten ssse3 loop | Samuel Neves | 2019-05-29 | 1 | -857/+66 |
| | This (mostly) preserves the performance (as measured on Haswell and *lake) of last commit, but it drastically reduces code size. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | blake2s,chacha: latency tweak | Samuel Neves | 2019-05-29 | 5 | -618/+982 |
| | In every odd-numbered round, instead of operating over the state [x00 x01 x02 x03; x05 x06 x07 x04; x10 x11 x08 x09; x15 x12 x13 x14] we operate over the rotated state [x03 x00 x01 x02; x04 x05 x06 x07; x09 x10 x11 x08; x14 x15 x12 x13]. The advantage here is that this requires no changes to the 'x04 x05 x06 x07' row, which is in the critical path. This results in a noticeable latency improvement of roughly R cycles, for R diagonal rounds in the primitive. In the case of BLAKE2s, which I also moved from requiring AVX to only requiring SSSE3, we save approximately 30 cycles per compression function call on Haswell and Skylake. In other words, this is an improvement of ~0.6 cpb. This idea was pointed out to me by Shunsuke Shimizu, though it appears to have been around for longer. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | zinc: arm64: use cpu_get_elf_hwcap accessor for 5.2 | Jason A. Donenfeld | 2019-05-29 | 2 | -2/+2 |
| | |||||
* | kbuild: account for recent upstream changes | Jason A. Donenfeld | 2019-05-29 | 1 | -1/+1 |
| | Apparently cdd750bfb1f76fe9be8cfb53cbe77b2e811081ab changed things, so we fall back onto this hack. Reported-by: Alex Xu <alex@alxu.ca> | ||||
* | blake2s: remove outlen parameter from final | Jason A. Donenfeld | 2019-03-27 | 2 | -8/+7 |
| | |||||
* | blake2s: simplify | Samuel Neves | 2019-03-27 | 2 | -40/+12 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: name enums | Jason A. Donenfeld | 2019-02-04 | 1 | -2/+2 |
| | |||||
* | noise: store clamped key instead of raw key | Jason A. Donenfeld | 2019-02-03 | 5 | -14/+13 |
| | |||||
* | chacha20poly1305: permit unaligned strides on certain platforms | Jason A. Donenfeld | 2019-02-03 | 1 | -18/+14 |
| | The map allocations required to fix this are mostly slower than unaligned paths. Reported-by: Louis Sautier <sbraz@gentoo.org> | ||||
* | global: normalize -> clamp | Jason A. Donenfeld | 2019-01-23 | 4 | -17/+10 |
| | |||||
* | global: update copyright | Jason A. Donenfeld | 2019-01-07 | 37 | -37/+37 |
| | |||||
* | makefile: use immediate expansion and use correct template patterns | Jason A. Donenfeld | 2018-12-18 | 1 | -6/+6 |
| | |||||
* | chacha20: do not define unused asm function | Jason A. Donenfeld | 2018-12-07 | 1 | -4/+2 |
| | This causes RAP to be unhappy, and we're not using it anyway. Reported-by: Ivan J. <parazyd@dyne.org> | ||||
* | chacha20,poly1305: simplify perlasm fanciness | Jason A. Donenfeld | 2018-12-07 | 3 | -75/+69 |
| | |||||
* | chacha20,poly1305: do not use xlate | Jason A. Donenfeld | 2018-11-19 | 3 | -1496/+73 |
| | |||||
* | poly1305: make frame pointers for auxiliary calls | Samuel Neves | 2018-11-17 | 1 | -31/+43 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | crypto: better path resolution and more specific generated .S | Jason A. Donenfeld | 2018-11-16 | 1 | -6/+5 |
| | |||||
* | chacha20,poly1305: don't do compiler testing in generator and remove xor helper | Jason A. Donenfeld | 2018-11-15 | 2 | -30/+39 |
| | |||||
* | crypto: resolve target prefix on buggy kernels | Jason A. Donenfeld | 2018-11-15 | 1 | -1/+6 |
| | We also move to .SECONDARY, since older kernels don't use targets like that. | ||||
* | poly1305: cleanup leftover debugging changes | Jason A. Donenfeld | 2018-11-15 | 1 | -3/+3 |
| | |||||
* | poly1305: only export neon symbols when in use | Jason A. Donenfeld | 2018-11-15 | 1 | -2/+6 |
| | |||||
* | chacha20,poly1305: fix up for win64 | Samuel Neves | 2018-11-15 | 2 | -27/+29 |
| | These don't help us, but it is important to keep this working for when it's re-added to cryptogams. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | perlasm: avoid rep ret | Jason A. Donenfeld | 2018-11-15 | 1 | -1/+1 |
| | The original hardcodes returns as .byte 0xf3,0xc3, aka "rep ret". We replace this by "ret". "rep ret" was meant to help with AMD K8 chips, cf. http://repzret.org/p/repzret. It makes no sense to continue to use this kludge for code that won't even run on ancient AMD chips. | ||||
* | poly1305: specialize to wireguard | Jason A. Donenfeld | 2018-11-15 | 1 | -11/+20 |
| | |||||
* | chacha20: specialize to wireguard | Jason A. Donenfeld | 2018-11-15 | 2 | -20/+38 |
| | |||||
* | perlasm: cleanup whitespace | Jason A. Donenfeld | 2018-11-15 | 1 | -5/+5 |
| | |||||
* | poly1305: adjust to kernel | Samuel Neves | 2018-11-15 | 1 | -220/+291 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: cleaner function declarations | Samuel Neves | 2018-11-14 | 1 | -23/+23 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: normalize names | Samuel Neves | 2018-11-14 | 1 | -71/+71 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: fixup win64 stack offsets | Samuel Neves | 2018-11-14 | 1 | -129/+129 |
| | We don't need to do this for kernel purposes, but it's polite to leave things unbroken. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: simplify stack unwinding on ChaCha20_ctr32 | Samuel Neves | 2018-11-14 | 1 | -10/+8 |
| | objtool did not quite understand the stack arithmetic employed here. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: use DRAP idiom | Samuel Neves | 2018-11-14 | 1 | -236/+235 |
| | This effectively means swapping the usage of %r9 and %r10 globally. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: add hchacha_ssse3 | Samuel Neves | 2018-11-14 | 1 | -0/+39 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: begin adapting to kernel setting | Samuel Neves | 2018-11-14 | 2 | -68/+116 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20,poly1305: switch to perlasm originals on x86_64 | Samuel Neves | 2018-11-14 | 5 | -5424/+9596 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20,poly1305: use CONFIG_KERNEL_MODE_NEON in .pl on arm | Jason A. Donenfeld | 2018-11-14 | 4 | -8/+11 |
| | While Andy is right to desire a separation between compiler defines and project defines, there are simply too many odd kernel configurations and we require testing for CONFIG_KERNEL_MODE_NEON. | ||||
* | chacha20,poly1305: switch to perlasm originals on mips and arm | Jason A. Donenfeld | 2018-11-14 | 12 | -6104/+5570 |
| | We also separate out Eric Biggers' Cortex A7 implementation into its own file. | ||||
* | global: various formatting tweaks | Jason A. Donenfeld | 2018-11-13 | 2 | -2/+1 |
| | |||||
* | curve25519-x86_64: this was relicensed to BSD-3-Clause upstream | Jason A. Donenfeld | 2018-10-27 | 1 | -1/+1 |
| | |||||
* | poly1305-donna64: mark large constants as ULL | Jason A. Donenfeld | 2018-10-27 | 1 | -24/+24 |
| | |||||
* | crypto: clean up remaining .h->.c | Jason A. Donenfeld | 2018-10-07 | 8 | -10/+10 |
| | |||||
* | crypto: use BIT(i) & bitmap instead of (bitmap >> i) & 1 | Jason A. Donenfeld | 2018-10-07 | 1 | -2/+2 |
| | Pros: clearer if you're not familiar with the shift idiom, uses kernel macro. Cons: doesn't work any more if the lvalue ever ceases to be a bool. Neutral: generates the same machine code. Suggested-by: Sultan Alsawaf <sultanxda@gmail.com> | ||||
* | crypto: disable broken implementations in selftests | Jason A. Donenfeld | 2018-10-07 | 1 | -9/+8 |
| | |||||
* | crypto: test all SIMD combinations | Jason A. Donenfeld | 2018-10-06 | 20 | -40/+82 |
| | |||||
* | global: rename include'd C files to be .c | Jason A. Donenfeld | 2018-10-06 | 19 | -28/+28 |
| | This is done by 259 other files in the kernel tree: `linux $ rg '#include.*\.c' -l | wc -l` prints `259`. Suggested-by: Sultan Alsawaf <sultanxda@gmail.com> |
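
The "blake2s,chacha: latency tweak" message above describes two ways of lining up the diagonals of the 4x4 state. The following is a minimal sketch, not the assembly from the commit: it only manipulates the labels x00..x15 to check that both layouts expose the same four diagonals as columns, and that the tweaked layout leaves the 'x04 x05 x06 x07' row in place. The helper names (`rotl`, `diagonalize`, `columns`) are hypothetical and chosen for illustration.

```python
def rotl(row, n):
    """Rotate a list left by n positions (a negative n rotates right)."""
    n %= len(row)
    return row[n:] + row[:n]


# Row-major 4x4 state, labelled as in the commit message.
STATE = [[f"x{4 * r + c:02d}" for c in range(4)] for r in range(4)]


def diagonalize(state, shifts):
    """Rotate each row left by the matching entry of shifts."""
    return [rotl(row, s) for row, s in zip(state, shifts)]


def columns(state):
    """Return the set of columns; each column is one diagonal of STATE."""
    return {tuple(row[c] for row in state) for c in range(4)}


# Usual layout: rows rotated left by 0, 1, 2, 3.
classic = diagonalize(STATE, (0, 1, 2, 3))
# Layout from the commit: first row rotated right by 1, second row untouched.
tweaked = diagonalize(STATE, (-1, 0, 1, 2))

if __name__ == "__main__":
    for name, layout in (("classic", classic), ("tweaked", tweaked)):
        print(name)
        for row in layout:
            print("  " + " ".join(row))
    print("same diagonals:", columns(classic) == columns(tweaked))
```

Both layouts feed the same four word groups to the column-wise round function; only the order of the columns differs, which is why the second row can stay where it is.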
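
The trade-off listed under "crypto: use BIT(i) & bitmap instead of (bitmap >> i) & 1" can be illustrated with plain integers. This is a sketch under stated assumptions: `BIT` here is a Python stand-in for the kernel macro, the function names are hypothetical, and `bool()` is used to model assignment to a C `bool`. It does not reproduce the code touched by the commit.

```python
def BIT(i):
    """Stand-in for the kernel's BIT() macro: a mask with only bit i set."""
    return 1 << i


def bit_by_shift(bitmap, i):
    """Shift idiom: the result is always 0 or 1."""
    return (bitmap >> i) & 1


def bit_by_mask(bitmap, i):
    """Mask idiom: the result is 0 or (1 << i), not necessarily 1."""
    return BIT(i) & bitmap


if __name__ == "__main__":
    bitmap = 0b0110
    for i in range(4):
        s = bit_by_shift(bitmap, i)
        m = bit_by_mask(bitmap, i)
        # Converted to a boolean the two idioms agree; as raw integers they may not.
        print(f"bit {i}: shift={s} mask={m} agree_as_bool={bool(s) == bool(m)}")
```

As long as the result is stored in a boolean, both forms behave identically. If the destination became an integer type, the mask form would store `1 << i` instead of 1, which is the "Cons" noted in the message.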