<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-dev/arch/arm/crypto/Makefile, branch master</title>
<subtitle>Linux kernel development work - see feature branches</subtitle>
<id>https://git.zx2c4.com/linux-dev/atom/arch/arm/crypto/Makefile?h=master</id>
<link rel='self' href='https://git.zx2c4.com/linux-dev/atom/arch/arm/crypto/Makefile?h=master'/>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/'/>
<updated>2022-06-10T08:43:49Z</updated>
<entry>
<title>crypto: blake2s - remove shash module</title>
<updated>2022-06-10T08:43:49Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2022-05-28T19:44:07Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=2d16803c562ecc644803d42ba98a8e0aef9c014e'/>
<id>urn:sha1:2d16803c562ecc644803d42ba98a8e0aef9c014e</id>
<content type='text'>
BLAKE2s has no currently known use as an shash. Just remove all of this
unnecessary plumbing. Removing this shash was something we talked about
back when we were making BLAKE2s a built-in, but I simply never got
around to doing it. So this completes that project.

Importantly, this fixs a bug in which the lib code depends on
crypto_simd_disabled_for_test, causing linker errors.

Also add more alignment tests to the selftests and compare SIMD and
non-SIMD compression functions, to make up for what we lose from
testmgr.c.

Reported-by: gaochao &lt;gaochao49@huawei.com&gt;
Cc: Eric Biggers &lt;ebiggers@kernel.org&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: stable@vger.kernel.org
Fixes: 6048fdcc5f26 ("lib/crypto: blake2s: include as built-in")
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>lib/crypto: blake2s: include as built-in</title>
<updated>2022-01-06T23:25:25Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2021-12-22T13:56:58Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=6048fdcc5f269c7f31d774c295ce59081b36e6f9'/>
<id>urn:sha1:6048fdcc5f269c7f31d774c295ce59081b36e6f9</id>
<content type='text'>
In preparation for using blake2s in the RNG, we change the way that it
is wired-in to the build system. Instead of using ifdefs to select the
right symbol, we use weak symbols. And because ARM doesn't need the
generic implementation, we make the generic one default only if an arch
library doesn't need it already, and then have arch libraries that do
need it opt-in. So that the arch libraries can remain tristate rather
than bool, we then split the shash part from the glue code.

Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Acked-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Cc: linux-kbuild@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
</content>
</entry>
<entry>
<title>crypto: arm - use a pattern rule for generating *.S files</title>
<updated>2021-05-14T11:07:54Z</updated>
<author>
<name>Masahiro Yamada</name>
<email>masahiroy@kernel.org</email>
</author>
<published>2021-04-25T17:57:32Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=8116138cbfcee80b1bf9b57073278dcd86b44656'/>
<id>urn:sha1:8116138cbfcee80b1bf9b57073278dcd86b44656</id>
<content type='text'>
Unify similar build rules.

Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>crypto: arm - generate *.S by Perl at build time instead of shipping them</title>
<updated>2021-05-14T11:07:54Z</updated>
<author>
<name>Masahiro Yamada</name>
<email>masahiroy@kernel.org</email>
</author>
<published>2021-04-25T17:57:31Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=7c0303ff7e67b637c47d8afee533ca9e2a02359b'/>
<id>urn:sha1:7c0303ff7e67b637c47d8afee533ca9e2a02359b</id>
<content type='text'>
Generate *.S by Perl like arch/{mips,x86}/crypto/Makefile.

Signed-off-by: Masahiro Yamada &lt;masahiroy@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>crypto: arm/blake2b - add NEON-accelerated BLAKE2b</title>
<updated>2021-01-02T21:41:39Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2020-12-23T08:10:03Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=1862eb007367f9e4cfd52d0406742de337b28ebf'/>
<id>urn:sha1:1862eb007367f9e4cfd52d0406742de337b28ebf</id>
<content type='text'>
Add a NEON-accelerated implementation of BLAKE2b.

On Cortex-A7 (which these days is the most common ARM processor that
doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as
SHA-256, and slightly faster than SHA-1.  It is also almost three times
as fast as the generic implementation of BLAKE2b:

	Algorithm            Cycles per byte (on 4096-byte messages)
	===================  =======================================
	blake2b-256-neon     14.0
	sha1-neon            16.3
	blake2s-256-arm      18.8
	sha1-asm             20.8
	blake2s-256-generic  26.0
	sha256-neon	     28.9
	sha256-asm	     32.0
	blake2b-256-generic  38.9

This implementation isn't directly based on any other implementation,
but it borrows some ideas from previous NEON code I've written as well
as from chacha-neon-core.S.  At least on Cortex-A7, it is faster than
the other NEON implementations of BLAKE2b I'm aware of (the
implementation in the BLAKE2 official repository using intrinsics, and
Andrew Moon's implementation which can be found in SUPERCOP).  It does
only one block at a time, so it performs well on short messages too.

NEON-accelerated BLAKE2b is useful because there is interest in using
BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
devices that lack the ARMv8 Crypto Extensions) to replace SHA-1.  On
these devices, the performance cost of upgrading to SHA-256 may be
unacceptable, whereas BLAKE2b-256 would actually improve performance.

Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
BLAKE2b is actually faster than BLAKE2s.  This is because NEON supports
64-bit operations, and because BLAKE2s's block size is too small for
NEON to be helpful for it.  The best I've been able to do with BLAKE2s
on Cortex-A7 is 18.8 cpb with an optimized scalar implementation.

(I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but
they're more complex as they require running multiple hashes at once.
Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7,
so I expect that any speedup from BLAKE2sp or BLAKE3 would come only
from the smaller number of rounds, not from the extra parallelism.)

For now this BLAKE2b implementation is only wired up to the shash API,
since there is no library API for BLAKE2b yet.  However, I've tried to
keep things consistent with BLAKE2s, e.g. by defining
blake2b_compress_arch() which is analogous to blake2s_compress_arch()
and could be exported for use by the library API later if needed.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Tested-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>crypto: arm/blake2s - add ARM scalar optimized BLAKE2s</title>
<updated>2021-01-02T21:41:39Z</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2020-12-23T08:09:59Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=5172d322d34c30fb926b29aeb5a064e1fd8a5e13'/>
<id>urn:sha1:5172d322d34c30fb926b29aeb5a064e1fd8a5e13</id>
<content type='text'>
Add an ARM scalar optimized implementation of BLAKE2s.

NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too
small for NEON to help.  Each NEON instruction would depend on the
previous one, resulting in poor performance.

With scalar instructions, on the other hand, we can take advantage of
ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an
implementation get runs much faster than the C implementation.

Performance results on Cortex-A7 in cycles per byte using the shash API:

	4096-byte messages:
		blake2s-256-arm:     18.8
		blake2s-256-generic: 26.0

	500-byte messages:
		blake2s-256-arm:     20.3
		blake2s-256-generic: 27.9

	100-byte messages:
		blake2s-256-arm:     29.7
		blake2s-256-generic: 39.2

	32-byte messages:
		blake2s-256-arm:     50.6
		blake2s-256-generic: 66.2

Except on very short messages, this is still slower than the NEON
implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8,
and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively.
However, optimized BLAKE2s is useful for cases where BLAKE2s is used
instead of BLAKE2b, such as WireGuard.

This new implementation is added in the form of a new module
blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it
provides blake2s_compress_arch() for use by the library API as well as
optionally register the algorithms with the shash API.

Acked-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Tested-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>crypto: arm/curve25519 - wire up NEON implementation</title>
<updated>2019-11-17T01:02:44Z</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2019-11-08T12:22:38Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=d8f1308a025fc7e00414194ed742d5f05a21e13c'/>
<id>urn:sha1:d8f1308a025fc7e00414194ed742d5f05a21e13c</id>
<content type='text'>
This ports the SUPERCOP implementation for usage in kernel space. In
addition to the usual header, macro, and style changes required for
kernel space, it makes a few small changes to the code:

  - The stack alignment is relaxed to 16 bytes.
  - Superfluous mov statements have been removed.
  - ldr for constants has been replaced with movw.
  - ldreq has been replaced with moveq.
  - The str epilogue has been made more idiomatic.
  - SIMD registers are not pushed and popped at the beginning and end.
  - The prologue and epilogue have been made idiomatic.
  - A hole has been removed from the stack, saving 32 bytes.
  - We write-back the base register whenever possible for vld1.8.
  - Some multiplications have been reordered for better A7 performance.

There are more opportunities for cleanup, since this code is from qhasm,
which doesn't always do the most opportune thing. But even prior to
extensive hand optimizations, this code delivers significant performance
improvements (given in get_cycles() per call):

		      ----------- -------------
	             | generic C | this commit |
	 ------------ ----------- -------------
	| Cortex-A7  |     49136 |       22395 |
	 ------------ ----------- -------------
	| Cortex-A17 |     17326 |        4983 |
	 ------------ ----------- -------------

Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
[ardb: - move to arch/arm/crypto
       - wire into lib/crypto framework
       - implement crypto API KPP hooks ]
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation</title>
<updated>2019-11-17T01:02:42Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2019-11-08T12:22:25Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=a6b803b3ddc793d6db0c16f12fc12d30d20fa9cc'/>
<id>urn:sha1:a6b803b3ddc793d6db0c16f12fc12d30d20fa9cc</id>
<content type='text'>
This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305 implementation
for NEON authored by Andy Polyakov, and contributed by him to the OpenSSL
project. The file 'poly1305-armv4.pl' is taken straight from this upstream
GitHub repository [0] at commit ec55a08dc0244ce570c4fc7cade330c60798952f,
and already contains all the changes required to build it as part of a
Linux kernel module.

[0] https://github.com/dot-asm/cryptogams

Co-developed-by: Andy Polyakov &lt;appro@cryptogams.org&gt;
Signed-off-by: Andy Polyakov &lt;appro@cryptogams.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>crypto: arm/chacha - remove dependency on generic ChaCha driver</title>
<updated>2019-11-17T01:02:40Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ardb@kernel.org</email>
</author>
<published>2019-11-08T12:22:14Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=b36d8c09e710c71f6a9690b6586fea2d1c9e1e27'/>
<id>urn:sha1:b36d8c09e710c71f6a9690b6586fea2d1c9e1e27</id>
<content type='text'>
Instead of falling back to the generic ChaCha skcipher driver for
non-SIMD cases, use a fast scalar implementation for ARM authored
by Eric Biggers. This removes the module dependency on chacha-generic
altogether, which also simplifies things when we expose the ChaCha
library interface from this module.

Signed-off-by: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
<entry>
<title>crypto: arm - use Kconfig based compiler checks for crypto opcodes</title>
<updated>2019-10-23T08:46:56Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2019-10-11T09:08:00Z</published>
<link rel='alternate' type='text/html' href='https://git.zx2c4.com/linux-dev/commit/?id=b4d0c0aad57ac3bd1b5141bac5ab1ab1d5e442b3'/>
<id>urn:sha1:b4d0c0aad57ac3bd1b5141bac5ab1ab1d5e442b3</id>
<content type='text'>
Instead of allowing the Crypto Extensions algorithms to be selected when
using a toolchain that does not support them, and complain about it at
build time, use the information we have about the compiler to prevent
them from being selected in the first place. Users that are stuck with
a GCC version &lt;4.8 are unlikely to care about these routines anyway, and
it cleans up the Makefile considerably.

While at it, add explicit 'armv8-a' CPU specifiers to the code that uses
the 'crypto-neon-fp-armv8' FPU specifier so we don't regress Clang, which
will complain about this in version 10 and later.

Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
</content>
</entry>
</feed>
