| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | simd: encapsulate fpu amortization into nice functions | Jason A. Donenfeld | 2018-06-17 | 3 | -47/+66 |
| | |||||
* | chacha20poly1305: use slow crypto on -rt kernels on arm too | Jason A. Donenfeld | 2018-06-14 | 1 | -1/+1 |
| | |||||
* | chacha20poly1305: use slow crypto on -rt kernels | Jason A. Donenfeld | 2018-06-13 | 1 | -1/+1 |
| | In rt kernels, spinlocks call schedule(), which means preemption can't be disabled. The FPU disables preemption. Hence, we can either restructure things to move the calls to kernel_fpu_begin/end to be really close to the actual crypto routines, or we can do the slower lazier solution of just not using the FPU at all on -rt kernels. This patch goes with the latter lazy solution. The reason why we don't place the calls to kernel_fpu_begin/end close to the crypto routines in the first place is that they're very expensive, as it usually involves a call to XSAVE. So on sane kernels, we benefit from only having to call it once. | ||||
* | chacha20: add missing include to header | Jason A. Donenfeld | 2018-06-02 | 1 | -0/+1 |
| | |||||
* | poly1305: mips: compute S on fly | René van Dorst | 2018-05-31 | 1 | -31/+22 |
| | This reduces memory access and the total opaque size. Signed-off-by: René van Dorst <opensource@vdorst.com> | ||||
* | crypto: consistent constification | Jason A. Donenfeld | 2018-05-31 | 6 | -23/+23 |
| | |||||
* | chacha20poly1305: combine stack variables into union | Jason A. Donenfeld | 2018-05-31 | 1 | -54/+53 |
| | |||||
* | chacha20poly1305: split up into separate files | Jason A. Donenfeld | 2018-05-31 | 6 | -614/+724 |
| | |||||
* | curve25519: x86_64: make symbol static | Jason A. Donenfeld | 2018-05-29 | 1 | -2/+2 |
| | |||||
* | curve25519: x86_64: satisfy sparse | Jason A. Donenfeld | 2018-05-29 | 1 | -260/+260 |
| | |||||
* | chacha20poly1305: add mips32 implementation | René van Dorst | 2018-05-18 | 3 | -5/+912 |
| | Signed-off-by: René van Dorst <opensource@vdorst.com> | ||||
* | chacha20poly1305: make gcc 8.1 happy | Samuel Neves | 2018-05-13 | 1 | -2/+2 |
| | GCC 8.1 does not know about the invariant `0 <= ctx->num < POLY1305_BLOCK_SIZE`. This results in a warning that `memcpy(ctx->data + num, inp, len);` may overflow the `data` field, which is correct for arbitrary values of `num`. To make the invariant explicit we ensure that `num` is in the required range. An alternative would be to change `ctx->num` to a 4-bit bitfield at the point of declaration. This changes the code from `test ebp, ebp; jz end` to `and ebp, 15; jz end`, which have identical performance characteristics. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | poly1305: do not place constants in different sections | Jason A. Donenfeld | 2018-04-18 | 1 | -14/+1 |
| | We're referencing these constants as one contiguous blob, so if there's any merging that goes on with other constants elsewhere (such as the kernel's current poly1305 implementation that we hope to replace), then these will be reordered and have the wrong values. | ||||
* | blake2s: remove unused helper | Jason A. Donenfeld | 2018-04-16 | 1 | -5/+0 |
| | |||||
* | chacha20poly1305: put magic constant behind macro | Jason A. Donenfeld | 2018-04-05 | 1 | -2/+4 |
| | |||||
* | curve25519: precomp const correctness | Jason A. Donenfeld | 2018-03-09 | 1 | -24/+22 |
| | |||||
* | curve25519: memzero in batches | Jason A. Donenfeld | 2018-03-09 | 1 | -140/+124 |
| | |||||
* | curve25519: use cmov instead of xor for cswap | Jason A. Donenfeld | 2018-03-09 | 1 | -12/+39 |
| | Also add cselect optimization. | ||||
* | curve25519: use precomp implementation instead of sandy2x | Jason A. Donenfeld | 2018-03-09 | 3 | -3437/+2070 |
| | It's faster and doesn't use the FPU. | ||||
* | crypto: read only after init | Jason A. Donenfeld | 2018-03-02 | 4 | -10/+11 |
| | |||||
* | blake2s: use union instead of casting | Jason A. Donenfeld | 2018-02-14 | 1 | -18/+16 |
| | This deals with alignment more easily and also helps squelch a clang-analyzer warning. | ||||
* | curve25519: replace fiat64 with faster hacl64 | Jason A. Donenfeld | 2018-02-01 | 3 | -470/+883 |
| | This reverts commit da4ff396cc5d5e0ff21f9ecbc2f951c048c63fff and adds some optimizations to hacl64. | ||||
* | curve25519: replace hacl64 with fiat64 | Jason A. Donenfeld | 2018-02-01 | 3 | -871/+470 |
| | For now, it's faster: hacl64: 109782 cycles per call; fiat64: 108984 cycles per call. It's quite possible this commit will be reverted with nice changes from INRIA, though. | ||||
* | chacha20poly1305: better buffer alignment | Jason A. Donenfeld | 2018-01-30 | 1 | -9/+8 |
| | |||||
* | chacha20poly1305: use existing rol32 function | Jason A. Donenfeld | 2018-01-30 | 1 | -9/+4 |
| | |||||
* | poly1305: add poly-specific self-tests | Jason A. Donenfeld | 2018-01-19 | 2 | -0/+2 |
| | |||||
* | curve25519-fiat32: uninline certain functions | Jason A. Donenfeld | 2018-01-18 | 1 | -4/+4 |
| | While this has a negative performance impact on x86_64, it has a positive performance impact on smaller machines, which is where we're actually using this code. For example, an A53: Before: fiat32: 228605 cycles per call; After: fiat32: 188307 cycles per call. | ||||
* | curve25519: wire up new impls and remove donna | Jason A. Donenfeld | 2018-01-18 | 3 | -1454/+3 |
| | |||||
* | curve25519: resolve symbol clash between fe types | Jason A. Donenfeld | 2018-01-18 | 1 | -7/+7 |
| | |||||
* | curve25519: import 64-bit hacl-star implementation | Jason A. Donenfeld | 2018-01-18 | 1 | -0/+739 |
| | |||||
* | curve25519: import 32-bit fiat-crypto implementation | Jason A. Donenfeld | 2018-01-18 | 1 | -0/+838 |
| | |||||
* | curve25519: modularize implementation | Jason A. Donenfeld | 2018-01-18 | 5 | -1610/+1640 |
| | |||||
* | poly1305: remove indirect calls | Samuel Neves | 2018-01-18 | 1 | -79/+96 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | global: year bump | Jason A. Donenfeld | 2018-01-03 | 16 | -16/+16 |
| | |||||
* | crypto: compile on UML | Jason A. Donenfeld | 2017-12-13 | 4 | -2/+8 |
| | We basically just don't use FPU in UML. | ||||
* | chacha20poly1305: wire up avx512vl for skylake-x | Jason A. Donenfeld | 2017-12-11 | 2 | -4/+17 |
| | |||||
* | chacha20: avx512vl implementation | Samuel Neves | 2017-12-11 | 2 | -0/+571 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | poly1305: fix avx512f alignment bug | Samuel Neves | 2017-12-11 | 1 | -1/+1 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20poly1305: cleaner generic code | Jason A. Donenfeld | 2017-12-11 | 1 | -90/+49 |
| | |||||
* | blake2s-x86_64: fix spacing | Jason A. Donenfeld | 2017-12-09 | 1 | -70/+70 |
| | |||||
* | global: add SPDX tags to all files | Greg Kroah-Hartman | 2017-12-09 | 16 | -247/+57 |
| | It's good to have SPDX identifiers in all files as the Linux kernel developers are working to add these identifiers to all files. Update all files with the correct SPDX license identifier based on the license text of the project or based on the license in the file itself. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Modified-by: Jason A. Donenfeld <Jason@zx2c4.com> | ||||
* | chacha20-arm: fix with clang -fno-integrated-as. | David Benjamin | 2017-12-03 | 1 | -1/+3 |
| | The __clang__-guarded #defines cause gas to complain if clang is passed -fno-integrated-as. Emitting .syntax unified when those are used fixes this. Reviewed-by: Andy Polyakov <appro@openssl.org> Reviewed-by: Kurt Roeckx <kurt@roeckx.be> | ||||
* | poly1305: update x86-64 kernel to AVX512F only | Samuel Neves | 2017-12-03 | 2 | -138/+132 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | curve25519: explictly depend on AS_AVX | Jason A. Donenfeld | 2017-11-28 | 1 | -3/+3 |
| | |||||
* | curve25519: modularize dispatch | Jason A. Donenfeld | 2017-11-28 | 1 | -91/+82 |
| | |||||
* | blake2s: tweak avx512 code | Samuel Neves | 2017-11-26 | 1 | -64/+47 |
| | This is not as ideal as using zmm, but zmm downclocks. And it's not as fast single-threaded as using the gathers. But it is faster when multithreaded, which is what WireGuard is doing. Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | chacha20: directly assign constant and initial state | Jason A. Donenfeld | 2017-11-23 | 1 | -59/+20 |
| | |||||
* | blake2s: hmac space optimization | Samuel Neves | 2017-11-22 | 1 | -16/+12 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | blake2s: AVX512F+VL implementation | Samuel Neves | 2017-11-22 | 2 | -0/+132 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
* | poly1305-avx512: requires AVX512F+VL+BW | Samuel Neves | 2017-11-22 | 1 | -1/+6 |
| | Signed-off-by: Samuel Neves <sneves@dei.uc.pt> | ||||
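
The 2018-05-13 entry "chacha20poly1305: make gcc 8.1 happy" describes making the invariant `0 <= ctx->num < POLY1305_BLOCK_SIZE` explicit so that the compiler can bound the `memcpy()` destination. The following is a minimal userspace sketch of that idea, not the code from the commit: `struct poly_buf` and `poly_buf_append()` are hypothetical names, the context is reduced to the partial-block buffer, and the block size of 16 is inferred from the `and ebp, 15` mentioned in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POLY1305_BLOCK_SIZE 16 /* assumed: 16 bytes, a power of two */

/* Hypothetical, simplified context: only the partial-block buffer. */
struct poly_buf {
	uint8_t data[POLY1305_BLOCK_SIZE];
	size_t num; /* bytes currently buffered; invariant: num < 16 */
};

/*
 * Copy at most one block's worth of input into the buffer and return the
 * number of bytes consumed. Masking num puts its range in plain sight of
 * the compiler, so the memcpy() bound no longer depends on an invariant
 * that only the programmer knows about.
 */
static size_t poly_buf_append(struct poly_buf *ctx, const uint8_t *inp,
			      size_t len)
{
	size_t num = ctx->num & (POLY1305_BLOCK_SIZE - 1); /* 0..15 */
	size_t room = POLY1305_BLOCK_SIZE - num;           /* 1..16 */

	if (len > room)
		len = room;
	memcpy(ctx->data + num, inp, len);
	/* A real implementation would process a full block here;
	 * this sketch only wraps the fill counter back to zero. */
	ctx->num = (num + len) & (POLY1305_BLOCK_SIZE - 1);
	return len;
}

/* Usage example: two appends that together fill exactly one block. */
static void poly_buf_selftest(void)
{
	struct poly_buf ctx = { {0}, 0 };
	const uint8_t msg[20] = "0123456789abcdefghi";

	assert(poly_buf_append(&ctx, msg, 10) == 10);
	assert(poly_buf_append(&ctx, msg + 10, 10) == 6);
	assert(memcmp(ctx.data, msg, POLY1305_BLOCK_SIZE) == 0);
}
```

The mask is a no-op whenever the invariant already holds, which matches the message's remark that the generated code differs only in a `test` becoming an `and`. The 4-bit bitfield the message mentions as an alternative would encode the same range in the type instead.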