path: root/src/crypto/zinc/chacha20/chacha20-x86_64.pl
Commit message [Author, Age, Files, Lines]
* crypto: use new assembler macros for 5.5 [Jason A. Donenfeld, 2019-12-05, 1 file, -2/+2]
* zinc: update copyright [Jason A. Donenfeld, 2019-05-29, 1 file, -1/+1]
* blake2s,chacha: latency tweak [Samuel Neves, 2019-05-29, 1 file, -24/+24]
    In every odd-numbered round, instead of operating over the state

        x00 x01 x02 x03
        x05 x06 x07 x04
        x10 x11 x08 x09
        x15 x12 x13 x14

    we operate over the rotated state

        x03 x00 x01 x02
        x04 x05 x06 x07
        x09 x10 x11 x08
        x14 x15 x12 x13

    The advantage here is that this requires no changes to the
    'x04 x05 x06 x07' row, which is in the critical path. This results in a
    noticeable latency improvement of roughly R cycles, for R diagonal rounds
    in the primitive.

    In the case of BLAKE2s, which I also moved from requiring AVX to only
    requiring SSSE3, we save approximately 30 cycles per compression function
    call on Haswell and Skylake. In other words, this is an improvement of
    ~0.6 cpb.

    This idea was pointed out to me by Shunsuke Shimizu, though it appears to
    have been around for longer.

    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
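The equivalence of the two layouts described in the latency-tweak message can be checked with a small scalar model. The following is a minimal Python sketch, not the perlasm from this file: the helper names (`rot`, `column_round`, `diagonal_round_classic`, `diagonal_round_tweaked`) are hypothetical, and the rows are plain lists standing in for the four SIMD registers. It assumes the standard ChaCha quarter-round and shows that rotating rows 0, 2 and 3 while leaving the 'x04 x05 x06 x07' row untouched feeds the same diagonals to the quarter-round as the usual rotation of rows 1, 2 and 3.

```python
MASK = 0xFFFFFFFF  # ChaCha works on 32-bit words


def rotl32(v, n):
    """Rotate a 32-bit word left by n bits."""
    return ((v << n) & MASK) | (v >> (32 - n))


def quarter_round(a, b, c, d):
    """Standard ChaCha quarter-round on four 32-bit words."""
    a = (a + b) & MASK; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK; b = rotl32(b ^ c, 7)
    return a, b, c, d


def rot(row, n):
    """Rotate a 4-lane row left by n lanes (negative n rotates right)."""
    n %= 4
    return row[n:] + row[:n]


def column_round(rows):
    """Apply the quarter-round to each column of a 4x4 state."""
    cols = [quarter_round(*col) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def diagonal_round_classic(rows):
    """Usual layout: rotate rows 1, 2, 3 so diagonals line up as columns."""
    a, b, c, d = rows
    a, b, c, d = column_round([a, rot(b, 1), rot(c, 2), rot(d, 3)])
    return [a, rot(b, -1), rot(c, -2), rot(d, -3)]


def diagonal_round_tweaked(rows):
    """Layout from the commit message: row 1 (x04..x07) is never rotated."""
    a, b, c, d = rows
    a, b, c, d = column_round([rot(a, -1), b, rot(c, 1), rot(d, 2)])
    return [rot(a, 1), b, rot(c, -1), rot(d, -2)]


# Arbitrary test state: ChaCha constants in row 0, small integers elsewhere.
state = [
    [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
assert diagonal_round_classic(state) == diagonal_round_tweaked(state)
```

Both functions compute the same diagonal round; the difference is only which rows pay for a lane shuffle. In this model the tweaked variant never shuffles the second row, the one the commit message identifies as being on the critical path.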
* global: update copyright [Jason A. Donenfeld, 2019-01-07, 1 file, -1/+1]
* chacha20: do not define unused asm function [Jason A. Donenfeld, 2018-12-07, 1 file, -4/+2]
    This causes RAP to be unhappy, and we're not using it anyway.

    Reported-by: Ivan J. <parazyd@dyne.org>
* chacha20,poly1305: simplify perlasm fanciness [Jason A. Donenfeld, 2018-12-07, 1 file, -34/+32]
* chacha20,poly1305: do not use xlate [Jason A. Donenfeld, 2018-11-19, 1 file, -25/+34]
* chacha20,poly1305: don't do compiler testing in generator and remove xor helper [Jason A. Donenfeld, 2018-11-15, 1 file, -15/+19]
* chacha20,poly1305: fix up for win64 [Samuel Neves, 2018-11-15, 1 file, -1/+1]
    These don't help us, but it is important to keep this working for when
    it's re-added to cryptogams.

    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20: specialize to wireguard [Jason A. Donenfeld, 2018-11-15, 1 file, -12/+21]
* chacha20: cleaner function declarations [Samuel Neves, 2018-11-14, 1 file, -23/+23]
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20: normalize names [Samuel Neves, 2018-11-14, 1 file, -71/+71]
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20: fixup win64 stack offsets [Samuel Neves, 2018-11-14, 1 file, -129/+129]
    We don't need to do this for kernel purposes, but it's polite to leave
    things unbroken.

    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20: simplify stack unwinding on ChaCha20_ctr32 [Samuel Neves, 2018-11-14, 1 file, -10/+8]
    objtool did not quite understand the stack arithmetic employed here.

    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20: use DRAP idiom [Samuel Neves, 2018-11-14, 1 file, -236/+235]
    This effectively means swapping the usage of %r9 and %r10 globally.

    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20: add hchacha_ssse3 [Samuel Neves, 2018-11-14, 1 file, -0/+39]
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20: begin adapting to kernel setting [Samuel Neves, 2018-11-14, 1 file, -67/+114]
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>
* chacha20,poly1305: switch to perlasm originals on x86_64 [Samuel Neves, 2018-11-14, 1 file, -0/+4005]
    Signed-off-by: Samuel Neves <sneves@dei.uc.pt>