Age | Commit message (Collapse) | Author | Files | Lines |
|
Instantiate crc-pclmul-template.S for crc_t10dif and delete the original
PCLMULQDQ optimized implementation. This has the following advantages:
- Less CRC-variant-specific code.
- VPCLMULQDQ support, greatly improving performance on sufficiently long
messages on newer CPUs.
- A faster reduction from 128 bits to the final CRC.
- Support for i386.
Benchmark results on AMD Ryzen 9 9950X (Zen 5) using crc_kunit:
Length Before After
------ ------ -----
1 440 MB/s 386 MB/s
16 1865 MB/s 2008 MB/s
64 4343 MB/s 6917 MB/s
127 5440 MB/s 8909 MB/s
128 5533 MB/s 12150 MB/s
200 5908 MB/s 14423 MB/s
256 15870 MB/s 21288 MB/s
511 14219 MB/s 25840 MB/s
512 18361 MB/s 37797 MB/s
1024 19941 MB/s 61374 MB/s
3173 20461 MB/s 74909 MB/s
4096 21310 MB/s 78919 MB/s
16384 21663 MB/s 85012 MB/s
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20250210174540.161705-6-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
With the "crct10dif" algorithm having been removed from the crypto API,
crc_t10dif_is_optimized() is no longer used.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250208175647.12333-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Move the x86 CRC-T10DIF assembly code into the lib directory and wire it
up to the library interface. This allows it to be used without going
through the crypto API. It remains usable via the crypto API too via
the shash algorithms that use the library interface. Thus all the
arch-specific "shash" code becomes unnecessary and is removed.
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20241202012056.209768-5-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|