path: root/src/crypto
Commit message | Author | Age | Files | Lines
...
* curve25519: use precomp implementation instead of sandy2x | Jason A. Donenfeld | 2018-03-09 | 3 | -3437/+2070
    It's faster and doesn't use the FPU.
* crypto: read only after init | Jason A. Donenfeld | 2018-03-02 | 4 | -10/+11
* blake2s: use union instead of casting | Jason A. Donenfeld | 2018-02-14 | 1 | -18/+16
    This deals with alignment more easily and also helps squelch a
    clang-analyzer warning.
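
As a rough illustration of the union-versus-cast idea in the blake2s entry above, the sketch below overlays a byte view and a word view of the same buffer. The type and function names (`block_buf`, `xor_words`) are hypothetical and are not taken from the WireGuard source; the point is only that the word member gives the whole union 32-bit alignment, so no `(uint32_t *)` cast of a byte pointer is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type and function names, for illustration only. */
union block_buf {
    uint8_t  bytes[64];  /* byte view, e.g. for copying input in */
    uint32_t words[16];  /* word view; gives the union 32-bit alignment */
};

/* XOR a word mask into the buffer through the word view, without
 * casting a possibly misaligned byte pointer to uint32_t *. */
static void xor_words(union block_buf *buf, const uint32_t mask[16])
{
    for (size_t i = 0; i < 16; ++i)
        buf->words[i] ^= mask[i];
}
```

Because the compiler knows the alignment of the union, static analyzers have no pointer cast to flag here.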
* curve25519: replace fiat64 with faster hacl64 | Jason A. Donenfeld | 2018-02-01 | 3 | -470/+883
    This reverts commit da4ff396cc5d5e0ff21f9ecbc2f951c048c63fff and adds
    some optimizations to hacl64.
* curve25519: replace hacl64 with fiat64 | Jason A. Donenfeld | 2018-02-01 | 3 | -871/+470
    For now, it's faster:

    hacl64: 109782 cycles per call
    fiat64: 108984 cycles per call

    It's quite possible this commit will be reverted with nice changes from
    INRIA, though.
* chacha20poly1305: better buffer alignment | Jason A. Donenfeld | 2018-01-30 | 1 | -9/+8
* chacha20poly1305: use existing rol32 function | Jason A. Donenfeld | 2018-01-30 | 1 | -9/+4
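
For context on the rol32 entry above: the kernel provides `rol32()` in `<linux/bitops.h>`. The sketch below is a userspace stand-in, not the code from this commit; `rotl32` and `chacha_quarter_round` are illustrative names. It shows the kind of rotate-left helper the ChaCha20 quarter round relies on.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's rol32(); the name is illustrative.
 * Masking the shift keeps both shifts in range, so shift == 0 is safe. */
static inline uint32_t rotl32(uint32_t word, unsigned int shift)
{
    return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/* ChaCha20 quarter round on four 32-bit state words, using rotation
 * amounts 16, 12, 8 and 7. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

The quarter round test vector in RFC 7539, section 2.1.1, is a convenient way to check a helper like this.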
* poly1305: add poly-specific self-tests | Jason A. Donenfeld | 2018-01-19 | 2 | -0/+2
* curve25519-fiat32: uninline certain functions | Jason A. Donenfeld | 2018-01-18 | 1 | -4/+4
    While this has a negative performance impact on x86_64, it has a
    positive performance impact on smaller machines, which is where we're
    actually using this code. For example, an A53:

    Before: fiat32: 228605 cycles per call
    After:  fiat32: 188307 cycles per call
* curve25519: wire up new impls and remove donna | Jason A. Donenfeld | 2018-01-18 | 3 | -1454/+3
* curve25519: resolve symbol clash between fe types | Jason A. Donenfeld | 2018-01-18 | 1 | -7/+7
* curve25519: import 64-bit hacl-star implementation | Jason A. Donenfeld | 2018-01-18 | 1 | -0/+739
* curve25519: import 32-bit fiat-crypto implementation | Jason A. Donenfeld | 2018-01-18 | 1 | -0/+838
* curve25519: modularize implementation | Jason A. Donenfeld | 2018-01-18 | 5 | -1610/+1640
* poly1305: remove indirect calls | Samuel Neves | 2018-01-18 | 1 | -79/+96
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* global: year bump | Jason A. Donenfeld | 2018-01-03 | 16 | -16/+16
* crypto: compile on UML | Jason A. Donenfeld | 2017-12-13 | 4 | -2/+8
    We basically just don't use FPU in UML.
* chacha20poly1305: wire up avx512vl for skylake-x | Jason A. Donenfeld | 2017-12-11 | 2 | -4/+17
* chacha20: avx512vl implementation | Samuel Neves | 2017-12-11 | 2 | -0/+571
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* poly1305: fix avx512f alignment bug | Samuel Neves | 2017-12-11 | 1 | -1/+1
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20poly1305: cleaner generic code | Jason A. Donenfeld | 2017-12-11 | 1 | -90/+49
* blake2s-x86_64: fix spacing | Jason A. Donenfeld | 2017-12-09 | 1 | -70/+70
* global: add SPDX tags to all files | Greg Kroah-Hartman | 2017-12-09 | 16 | -247/+57
    It's good to have SPDX identifiers in all files as the Linux kernel
    developers are working to add these identifiers to all files.

    Update all files with the correct SPDX license identifier based on the
    license text of the project or based on the license in the file itself.
    The SPDX identifier is a legally binding shorthand, which can be used
    instead of the full boiler plate text.

    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Modified-by: Jason A. Donenfeld <Jason@zx2c4.com>
* chacha20-arm: fix with clang -fno-integrated-as. | David Benjamin | 2017-12-03 | 1 | -1/+3
    The __clang__-guarded #defines cause gas to complain if clang is passed
    -fno-integrated-as. Emitting .syntax unified when those are used fixes
    this.

    Reviewed-by: Andy Polyakov <appro@openssl.org>
    Reviewed-by: Kurt Roeckx <kurt@roeckx.be>
* poly1305: update x86-64 kernel to AVX512F only | Samuel Neves | 2017-12-03 | 2 | -138/+132
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* curve25519: explictly depend on AS_AVX | Jason A. Donenfeld | 2017-11-28 | 1 | -3/+3
* curve25519: modularize dispatch | Jason A. Donenfeld | 2017-11-28 | 1 | -91/+82
* blake2s: tweak avx512 code | Samuel Neves | 2017-11-26 | 1 | -64/+47
    This is not as ideal as using zmm, but zmm downclocks. And it's not as
    fast single-threaded as using the gathers. But it is faster when
    multithreaded, which is what WireGuard is doing.

    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20: directly assign constant and initial state | Jason A. Donenfeld | 2017-11-23 | 1 | -59/+20
* blake2s: hmac space optimization | Samuel Neves | 2017-11-22 | 1 | -16/+12
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* blake2s: AVX512F+VL implementation | Samuel Neves | 2017-11-22 | 2 | -0/+132
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* poly1305-avx512: requires AVX512F+VL+BW | Samuel Neves | 2017-11-22 | 1 | -1/+6
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20poly1305: poly cleans up its own state | Jason A. Donenfeld | 2017-11-22 | 1 | -5/+1
* poly1305-x86_64: unclobber %rbp | Samuel Neves | 2017-11-22 | 1 | -131/+145
    OpenSSL's Poly1305 kernels use %rbp as a scratch register. However, the
    kernel expects rbp to be a valid frame pointer at any given time in
    order to do proper unwinding. Thus we need to alter the code in order
    to preserve it.

    The most straightforward manner in which this was accomplished was by
    replacing $d3 in poly1305-x86_64.pl -- formerly %r10 -- by %rdi, and
    replace %rbp by %r10. Because %rdi, a pointer to the context structure,
    does not change and is not used by poly1305_iteration, it is safe to
    use it here, and the overhead of saving and restoring it should be
    minimal.

    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* poly1305: import MIPS64 primitive from OpenSSL | Jason A. Donenfeld | 2017-11-22 | 3 | -9/+401
* chacha20poly1305: import ARM primitives from OpenSSL | Jason A. Donenfeld | 2017-11-22 | 11 | -1025/+5513
    ARMv4-ARMv8, with NEON for ARMv7 and ARMv8.
* chacha20poly1305: import x86_64 primitives from OpenSSL | Samuel Neves | 2017-11-22 | 9 | -2455/+5236
    x86_64 only at the moment. SSSE3, AVX, AVX2, AVX512.

    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* curve25519-neon: compile in thumb mode | Jason A. Donenfeld | 2017-11-14 | 2 | -6/+6
    In thumb mode, it's not possible to use sp as an operand of and, so we
    have to muck around with r3 as a scratch register.
* curve25519: reject deriving from NULL private keys | Jason A. Donenfeld | 2017-11-11 | 1 | -0/+7
    These aren't actually valid 25519 points pre-normalization, and doing
    this is required to make unsetting private keys based on all zeros.
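
The check described in the entry above might look something like the following sketch. It is not the code from this commit; `KEY_LEN`, `key_is_all_zero` and `check_private_key` are hypothetical names, and the actual change may detect the all-zero key differently. The loop ORs every byte together so its running time does not depend on where the first nonzero byte sits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 32  /* Curve25519 private keys are 32 bytes */

/* Hypothetical helper: true if every byte of the key is zero.
 * All bytes are always read, regardless of their values. */
static bool key_is_all_zero(const uint8_t key[KEY_LEN])
{
    uint8_t acc = 0;

    for (size_t i = 0; i < KEY_LEN; ++i)
        acc |= key[i];
    return acc == 0;
}

/* Hypothetical wrapper: returns 0 if the key may be used for public key
 * derivation, -1 if it is the all-zero (NULL) key and must be rejected. */
static int check_private_key(const uint8_t key[KEY_LEN])
{
    return key_is_all_zero(key) ? -1 : 0;
}
```

A caller would run a check like this before deriving a public key, treating a rejected key as "no private key set".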
* receive: hoist fpu outside of receive loop | Jason A. Donenfeld | 2017-11-10 | 2 | -15/+13
* curve25519: only enable int128 if compiler support is sound | Jason A. Donenfeld | 2017-10-31 | 1 | -1/+1
* global: style nits | Jason A. Donenfeld | 2017-10-31 | 4 | -129/+198
* qemu: allow for cross compilation | Jason A. Donenfeld | 2017-10-31 | 1 | -3/+3
* crypto/avx: make sure we can actually use ymm registers | Jason A. Donenfeld | 2017-10-31 | 3 | -3/+3
* blake2: include headers for macros | Jason A. Donenfeld | 2017-10-31 | 1 | -0/+2
* blake2s: modernize API and have faster _final | Jason A. Donenfeld | 2017-10-17 | 2 | -48/+64
* crypto/x86_64: satisfy stack validation 2.0 | Jason A. Donenfeld | 2017-10-09 | 3 | -31/+29
    We change this to look like the code gcc generates, so as to keep the
    objtool checker somewhat happy.
* global: use _WG prefix for include guards | Jason A. Donenfeld | 2017-10-03 | 3 | -9/+9
    Suggested-by: Sultan Alsawaf <sultanxda@gmail.com>
* global: satisfy bitshift pedantry | Jason A. Donenfeld | 2017-10-03 | 1 | -7/+7
    Suggested-by: Sultan Alsawaf <sultanxda@gmail.com>
* curve25519-neon-arm: force ARM encoding, since this is unrepresentable in Thumb | Jason A. Donenfeld | 2017-10-02 | 1 | -0/+1