linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-10-05	crypto: aegis128-neon - use Clang compatible cflags for ARM	Ard Biesheuvel	1	-1/+1
	The next version of Clang will start policing compiler command line options, and will reject combinations of -march and -mfpu that it thinks are incompatible. This results in errors like clang-10: warning: ignoring extension 'crypto' because the 'armv7-a' architecture does not support it [-Winvalid-command-line-argument] /tmp/aegis128-neon-inner-5ee428.s: Assembler messages: /tmp/aegis128-neon-inner-5ee428.s:73: Error: selected processor does not support `aese.8 q2,q14' in ARM mode when buiding the SIMD aegis128 code for 32-bit ARM, given that the 'armv7-a' -march argument is considered to be compatible with the ARM crypto extensions. Instead, we should use armv8-a, which does allow the crypto extensions to be enabled. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-09-21	Merge tag 'for-5.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm	Linus Torvalds	1	-0/+1
	Pull device mapper updates from Mike Snitzer: - crypto and DM crypt advances that allow the crypto API to reclaim implementation details that do not belong in DM crypt. The wrapper template for ESSIV generation that was factored out will also be used by fscrypt in the future. - Add root hash pkcs#7 signature verification to the DM verity target. - Add a new "clone" DM target that allows for efficient remote replication of a device. - Enhance DM bufio's cache to be tailored to each client based on use. Clients that make heavy use of the cache get more of it, and those that use less have reduced cache usage. - Add a new DM_GET_TARGET_VERSION ioctl to allow userspace to query the version number of a DM target (even if the associated module isn't yet loaded). - Fix invalid memory access in DM zoned target. - Fix the max_discard_sectors limit advertised by the DM raid target; it was mistakenly storing the limit in bytes rather than sectors. - Small optimizations and cleanups in DM writecache target. - Various fixes and cleanups in DM core, DM raid1 and space map portion of DM persistent data library. * tag 'for-5.4/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (22 commits) dm: introduce DM_GET_TARGET_VERSION dm bufio: introduce a global cache replacement dm bufio: remove old-style buffer cleanup dm bufio: introduce a global queue dm bufio: refactor adjust_total_allocated dm bufio: call adjust_total_allocated from __link_buffer and __unlink_buffer dm: add clone target dm raid: fix updating of max_discard_sectors limit dm writecache: skip writecache_wait for pmem mode dm stats: use struct_size() helper dm crypt: omit parsing of the encapsulated cipher dm crypt: switch to ESSIV crypto API template crypto: essiv - create wrapper template for ESSIV generation dm space map common: remove check for impossible sm_find_free() return value dm raid1: use struct_size() with kzalloc() dm writecache: optimize performance by sorting the blocks for writeback_all dm writecache: add unlikely for getting two block with same LBA dm writecache: remove unused member pointer in writeback_struct dm zoned: fix invalid memory access dm verity: add root hash pkcs#7 signature verification ...
2019-09-03	crypto: essiv - create wrapper template for ESSIV generation	Ard Biesheuvel	1	-0/+1
	Implement a template that wraps a (skcipher,shash) or (aead,shash) tuple so that we can consolidate the ESSIV handling in fscrypt and dm-crypt and move it into the crypto API. This will result in better test coverage, and will allow future changes to make the bare cipher interface internal to the crypto subsystem, in order to increase robustness of the API against misuse. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-15	crypto: arm64/aegis128 - implement plain NEON version	Ard Biesheuvel	1	-1/+8
	Provide a version of the core AES transform to the aegis128 SIMD code that does not rely on the special AES instructions, but uses plain NEON instructions instead. This allows the SIMD version of the aegis128 driver to be used on arm64 systems that do not implement those instructions (which are not mandatory in the architecture), such as the Raspberry Pi 3. Since GCC makes a mess of this when using the tbl/tbx intrinsics to perform the sbox substitution, preload the Sbox into v16..v31 in this case and use inline asm to emit the tbl/tbx instructions. Clang does not support this approach, nor does it require it, since it does a much better job at code generation, so there we use the intrinsics as usual. Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-15	crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics	Ard Biesheuvel	1	-0/+12
	Provide an accelerated implementation of aegis128 by wiring up the SIMD hooks in the generic driver to an implementation based on NEON intrinsics, which can be compiled to both ARM and arm64 code. This results in a performance of 2.2 cycles per byte on Cortex-A53, which is a performance increase of ~11x compared to the generic code. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-15	crypto: aegis128 - add support for SIMD acceleration	Ard Biesheuvel	1	-0/+1
	Add some plumbing to allow the AEGIS128 code to be built with SIMD routines for acceleration. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-02	crypto: jitterentropy - build without sanitizer	Arnd Bergmann	1	-0/+2
	Recent clang-9 snapshots double the kernel stack usage when building this file with -O0 -fsanitize=kernel-hwaddress, compared to clang-8 and older snapshots, this changed between commits svn364966 and svn366056: crypto/jitterentropy.c:516:5: error: stack frame size of 2640 bytes in function 'jent_entropy_init' [-Werror,-Wframe-larger-than=] int jent_entropy_init(void) ^ crypto/jitterentropy.c:185:14: error: stack frame size of 2224 bytes in function 'jent_lfsr_time' [-Werror,-Wframe-larger-than=] static __u64 jent_lfsr_time(struct rand_data *ec, __u64 time, __u64 loop_cnt) ^ I prepared a reduced test case in case any clang developers want to take a closer look, but from looking at the earlier output it seems that even with clang-8, something was very wrong here. Turn off any KASAN and UBSAN sanitizing for this file, as that likely clashes with -O0 anyway. Turning off just KASAN avoids the warning already, but I suspect both of these have undesired side-effects for jitterentropy. Link: https://godbolt.org/z/fDcwZ5 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-08-02	Revert "crypto: aegis128 - add support for SIMD acceleration"	Herbert Xu	1	-12/+0
	This reverts commit ecc8bc81f2fb3976737ef312f824ba6053aa3590 ("crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics") and commit 7cdc0ddbf74a19cecb2f0e9efa2cae9d3c665189 ("crypto: aegis128 - add support for SIMD acceleration"). They cause compile errors on platforms other than ARM because the mechanism to selectively compile the SIMD code is broken. Repoted-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: aegis128 - provide a SIMD implementation based on NEON intrinsics	Ard Biesheuvel	1	-0/+11
	Provide an accelerated implementation of aegis128 by wiring up the SIMD hooks in the generic driver to an implementation based on NEON intrinsics, which can be compiled to both ARM and arm64 code. This results in a performance of 2.2 cycles per byte on Cortex-A53, which is a performance increase of ~11x compared to the generic code. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: aegis128 - add support for SIMD acceleration	Ard Biesheuvel	1	-0/+1
	Add some plumbing to allow the AEGIS128 code to be built with SIMD routines for acceleration. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: aegis128l/aegis256 - remove x86 and generic implementations	Ard Biesheuvel	1	-2/+0
	Three variants of AEGIS were proposed for the CAESAR competition, and only one was selected for the final portfolio: AEGIS128. The other variants, AEGIS128L and AEGIS256, are not likely to ever turn up in networking protocols or other places where interoperability between Linux and other systems is a concern, nor are they likely to be subjected to further cryptanalysis. However, uninformed users may think that AEGIS128L (which is faster) is equally fit for use. So let's remove them now, before anyone starts using them and we are forced to support them forever. Note that there are no known flaws in the algorithms or in any of these implementations, but they have simply outlived their usefulness. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: morus - remove generic and x86 implementations	Ard Biesheuvel	1	-2/+0
	MORUS was not selected as a winner in the CAESAR competition, which is not surprising since it is considered to be cryptographically broken [0]. (Note that this is not an implementation defect, but a flaw in the underlying algorithm). Since it is unlikely to be in use currently, let's remove it before we're stuck with it. [0] https://eprint.iacr.org/2019/172.pdf Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-06	crypto: xxhash - Implement xxhash support	Nikolay Borisov	1	-0/+1
	xxhash is currently implemented as a self-contained module in /lib. This patch enables that module to be used as part of the generic kernel crypto framework. It adds a simple wrapper to the 64bit version. I've also added test vectors (with help from Nick Terrell). The upstream xxhash code is tested by running hashing operation on random 222 byte data with seed values of 0 and a prime number. The upstream test suite can be found at https://github.com/Cyan4973/xxHash/blob/cf46e0c/xxhsum.c#L664 Essentially hashing is run on data of length 0,1,14,222 with the aforementioned seed values 0 and prime 2654435761. The particular random 222 byte string was provided to me by Nick Terrell by reading /dev/random and the checksums were calculated by the upstream xxsum utility with the following bash script: dd if=/dev/random of=TEST_VECTOR bs=1 count=222 for a in 0 1; do for l in 0 1 14 222; do for s in 0 2654435761; do echo algo $a length $l seed $s; head -c $l TEST_VECTOR \| ~/projects/kernel/xxHash/xxhsum -H$a -s$s done done done This produces output as follows: algo 0 length 0 seed 0 02cc5d05 stdin algo 0 length 0 seed 2654435761 02cc5d05 stdin algo 0 length 1 seed 0 25201171 stdin algo 0 length 1 seed 2654435761 25201171 stdin algo 0 length 14 seed 0 c1d95975 stdin algo 0 length 14 seed 2654435761 c1d95975 stdin algo 0 length 222 seed 0 b38662a6 stdin algo 0 length 222 seed 2654435761 b38662a6 stdin algo 1 length 0 seed 0 ef46db3751d8e999 stdin algo 1 length 0 seed 2654435761 ac75fda2929b17ef stdin algo 1 length 1 seed 0 27c3f04c2881203a stdin algo 1 length 1 seed 2654435761 4a15ed26415dfe4d stdin algo 1 length 14 seed 0 3d33dc700231dfad stdin algo 1 length 14 seed 2654435761 ea5f7ddef9a64f80 stdin algo 1 length 222 seed 0 5f3d3c08ec2bef34 stdin algo 1 length 222 seed 2654435761 6a9df59664c7ed62 stdin algo 1 is xx64 variant, algo 0 is the 32 bit variant which is currently not hooked up. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-05-30	crypto: cryptd - move kcrypto_wq into cryptd	Eric Biggers	1	-2/+0
	kcrypto_wq is only used by cryptd, so move it into cryptd.c and change the workqueue name from "crypto" to "cryptd". Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-18	crypto: ecrdsa - add EC-RDSA (GOST 34.10) algorithm	Vitaly Chikunov	1	-0/+8
	Add Elliptic Curve Russian Digital Signature Algorithm (GOST R 34.10-2012, RFC 7091, ISO/IEC 14888-3) is one of the Russian (and since 2018 the CIS countries) cryptographic standard algorithms (called GOST algorithms). Only signature verification is supported, with intent to be used in the IMA. Summary of the changes: * crypto/Kconfig: - EC-RDSA is added into Public-key cryptography section. * crypto/Makefile: - ecrdsa objects are added. * crypto/asymmetric_keys/x509_cert_parser.c: - Recognize EC-RDSA and Streebog OIDs. * include/linux/oid_registry.h: - EC-RDSA OIDs are added to the enum. Also, a two currently not implemented curve OIDs are added for possible extension later (to not change numbering and grouping). * crypto/ecc.c: - Kenneth MacKay copyright date is updated to 2014, because vli_mmod_slow, ecc_point_add, ecc_point_mult_shamir are based on his code from micro-ecc. - Functions needed for ecrdsa are EXPORT_SYMBOL'ed. - New functions: vli_is_negative - helper to determine sign of vli; vli_from_be64 - unpack big-endian array into vli (used for a signature); vli_from_le64 - unpack little-endian array into vli (used for a public key); vli_uadd, vli_usub - add/sub u64 value to/from vli (used for increment/decrement); mul_64_64 - optimized to use __int128 where appropriate, this speeds up point multiplication (and as a consequence signature verification) by the factor of 1.5-2; vli_umult - multiply vli by a small value (speeds up point multiplication by another factor of 1.5-2, depending on vli sizes); vli_mmod_special - module reduction for some form of Pseudo-Mersenne primes (used for the curves A); vli_mmod_special2 - module reduction for another form of Pseudo-Mersenne primes (used for the curves B); vli_mmod_barrett - module reduction using pre-computed value (used for the curve C); vli_mmod_slow - more general module reduction which is much slower (used when the modulus is subgroup order); vli_mod_mult_slow - modular multiplication; ecc_point_add - add two points; ecc_point_mult_shamir - add two points multiplied by scalars in one combined multiplication (this gives speed up by another factor 2 in compare to two separate multiplications). ecc_is_pubkey_valid_partial - additional samity check is added. - Updated vli_mmod_fast with non-strict heuristic to call optimal module reduction function depending on the prime value; - All computations for the previously defined (two NIST) curves should not unaffected. * crypto/ecc.h: - Newly exported functions are documented. * crypto/ecrdsa_defs.h - Five curves are defined. * crypto/ecrdsa.c: - Signature verification is implemented. * crypto/ecrdsa_params.asn1, crypto/ecrdsa_pub_key.asn1: - Templates for BER decoder for EC-RDSA parameters and public key. Cc: linux-integrity@vger.kernel.org Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-18	crypto: ecc - make ecc into separate module	Vitaly Chikunov	1	-1/+1
	ecc.c have algorithms that could be used togeter by ecdh and ecrdsa. Make it separate module. Add CRYPTO_ECC into Kconfig. EXPORT_SYMBOL and document to what seems appropriate. Move structs ecc_point and ecc_curve from ecc_curve_defs.h into ecc.h. No code changes. Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-03-07	lib/lzo: separate lzo-rle from lzo	Dave Rodgman	1	-1/+1
	To prevent any issues with persistent data, separate lzo-rle from lzo so that it is treated as a separate algorithm, and lzo is still available. Link: http://lkml.kernel.org/r/20190205155944.16007-3-dave.rodgman@arm.com Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com> Cc: Matt Sealey <matt.sealey@arm.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <nitingupta910@gmail.com> Cc: Richard Purdie <rpurdie@openedhand.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Sonny Rao <sonnyrao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-07	crypto: user - made crypto_user_stat optional	Corentin Labbe	1	-1/+2
	Even if CRYPTO_STATS is set to n, some part of CRYPTO_STATS are compiled. This patch made all part of crypto_user_stat uncompiled in that case. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-20	crypto: adiantum - add Adiantum support	Eric Biggers	1	-0/+1
	Add support for the Adiantum encryption mode. Adiantum was designed by Paul Crowley and is specified by our paper: Adiantum: length-preserving encryption for entry-level processors (https://eprint.iacr.org/2018/720.pdf) See our paper for full details; this patch only provides an overview. Adiantum is a tweakable, length-preserving encryption mode designed for fast and secure disk encryption, especially on CPUs without dedicated crypto instructions. Adiantum encrypts each sector using the XChaCha12 stream cipher, two passes of an ε-almost-∆-universal (εA∆U) hash function, and an invocation of the AES-256 block cipher on a single 16-byte block. On CPUs without AES instructions, Adiantum is much faster than AES-XTS; for example, on ARM Cortex-A7, on 4096-byte sectors Adiantum encryption is about 4 times faster than AES-256-XTS encryption, and decryption about 5 times faster. Adiantum is a specialization of the more general HBSH construction. Our earlier proposal, HPolyC, was also a HBSH specialization, but it used a different εA∆U hash function, one based on Poly1305 only. Adiantum's εA∆U hash function, which is based primarily on the "NH" hash function like that used in UMAC (RFC4418), is about twice as fast as HPolyC's; consequently, Adiantum is about 20% faster than HPolyC. This speed comes with no loss of security: Adiantum is provably just as secure as HPolyC, in fact slightly more secure. Like HPolyC, Adiantum's security is reducible to that of XChaCha12 and AES-256, subject to a security bound. XChaCha12 itself has a security reduction to ChaCha12. Therefore, one need not "trust" Adiantum; one need only trust ChaCha12 and AES-256. Note that the εA∆U hash function is only used for its proven combinatorical properties so cannot be "broken". Adiantum is also a true wide-block encryption mode, so flipping any plaintext bit in the sector scrambles the entire ciphertext, and vice versa. No other such mode is available in the kernel currently; doing the same with XTS scrambles only 16 bytes. Adiantum also supports arbitrary-length tweaks and naturally supports any length input >= 16 bytes without needing "ciphertext stealing". For the stream cipher, Adiantum uses XChaCha12 rather than XChaCha20 in order to make encryption feasible on the widest range of devices. Although the 20-round variant is quite popular, the best known attacks on ChaCha are on only 7 rounds, so ChaCha12 still has a substantial security margin; in fact, larger than AES-256's. 12-round Salsa20 is also the eSTREAM recommendation. For the block cipher, Adiantum uses AES-256, despite it having a lower security margin than XChaCha12 and needing table lookups, due to AES's extensive adoption and analysis making it the obvious first choice. Nevertheless, for flexibility this patch also permits the "adiantum" template to be instantiated with XChaCha20 and/or with an alternate block cipher. We need Adiantum support in the kernel for use in dm-crypt and fscrypt, where currently the only other suitable options are block cipher modes such as AES-XTS. A big problem with this is that many low-end mobile devices (e.g. Android Go phones sold primarily in developing countries, as well as some smartwatches) still have CPUs that lack AES instructions, e.g. ARM Cortex-A7. Sadly, AES-XTS encryption is much too slow to be viable on these devices. We did find that some "lightweight" block ciphers are fast enough, but these suffer from problems such as not having much cryptanalysis or being too controversial. The ChaCha stream cipher has excellent performance but is insecure to use directly for disk encryption, since each sector's IV is reused each time it is overwritten. Even restricting the threat model to offline attacks only isn't enough, since modern flash storage devices don't guarantee that "overwrites" are really overwrites, due to wear-leveling. Adiantum avoids this problem by constructing a "tweakable super-pseudorandom permutation"; this is the strongest possible security model for length-preserving encryption. Of course, storing random nonces along with the ciphertext would be the ideal solution. But doing that with existing hardware and filesystems runs into major practical problems; in most cases it would require data journaling (like dm-integrity) which severely degrades performance. Thus, for now length-preserving encryption is still needed. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-20	crypto: nhpoly1305 - add NHPoly1305 support	Eric Biggers	1	-0/+1
	Add a generic implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. CONFIG_NHPOLY1305 is not selectable by itself since there won't be any real reason to enable it without also enabling Adiantum support. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-20	crypto: chacha20-generic - refactor to allow varying number of rounds	Eric Biggers	1	-1/+1
	In preparation for adding XChaCha12 support, rename/refactor chacha20-generic to support different numbers of rounds. The justification for needing XChaCha12 support is explained in more detail in the patch "crypto: chacha - add XChaCha12 support". The only difference between ChaCha{8,12,20} are the number of rounds itself; all other parts of the algorithm are the same. Therefore, remove the "20" from all definitions, structures, functions, files, etc. that will be shared by all ChaCha versions. Also make ->setkey() store the round count in the chacha_ctx (previously chacha20_ctx). The generic code then passes the round count through to chacha_block(). There will be a ->setkey() function for each explicitly allowed round count; the encrypt/decrypt functions will be the same. I decided not to do it the opposite way (same ->setkey() function for all round counts, with different encrypt/decrypt functions) because that would have required more boilerplate code in architecture-specific implementations of ChaCha and XChaCha. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-16	crypto: streebog - add Streebog hash function	Vitaly Chikunov	1	-0/+1
	Add GOST/IETF Streebog hash function (GOST R 34.11-2012, RFC 6986) generic hash transformation. Cc: linux-integrity@vger.kernel.org Signed-off-by: Vitaly Chikunov <vt@altlinux.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-28	crypto: ofb - add output feedback mode	Gilad Ben-Yossef	1	-0/+1
	Add a generic version of output feedback mode. We already have support of several hardware based transformations of this mode and the needed test vectors but we somehow missed adding a generic software one. Fix this now. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-28	crypto: user - Implement a generic crypto statistics	Corentin Labbe	1	-0/+1
	This patch implement a generic way to get statistics about all crypto usages. Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-04	crypto: x86 - remove SHA multibuffer routines and mcryptd	Ard Biesheuvel	1	-1/+0
	As it turns out, the AVX2 multibuffer SHA routines are currently broken [0], in a way that would have likely been noticed if this code were in wide use. Since the code is too complicated to be maintained by anyone except the original authors, and since the performance benefits for real-world use cases are debatable to begin with, it is better to drop it entirely for the moment. [0] https://marc.info/?l=linux-crypto-vger&m=153476243825350&w=2 Suggested-by: Eric Biggers <ebiggers@google.com> Cc: Megha Dey <megha.dey@linux.intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-09-04	crypto: speck - remove Speck	Jason A. Donenfeld	1	-1/+0
	These are unused, undesired, and have never actually been used by anybody. The original authors of this code have changed their mind about its inclusion. While originally proposed for disk encryption on low-end devices, the idea was discarded [1] in favor of something else before that could really get going. Therefore, this patch removes Speck. [1] https://marc.info/?l=linux-crypto-vger&m=153359499015659 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Eric Biggers <ebiggers@google.com> Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-31	crypto: morus - Mark MORUS SIMD glue as x86-specific	Ondrej Mosnacek	1	-2/+0
	Commit 56e8e57fc3a7 ("crypto: morus - Add common SIMD glue code for MORUS") accidetally consiedered the glue code to be usable by different architectures, but it seems to be only usable on x86. This patch moves it under arch/x86/crypto and adds 'depends on X86' to the Kconfig options and also removes the prompt to hide these internal options from the user. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-19	crypto: morus - Add common SIMD glue code for MORUS	Ondrej Mosnacek	1	-0/+2
	This patch adds a common glue code for optimized implementations of MORUS AEAD algorithms. Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-19	crypto: morus - Add generic MORUS AEAD implementations	Ondrej Mosnacek	1	-0/+2
	This patch adds the generic implementation of the MORUS family of AEAD algorithms (MORUS-640 and MORUS-1280). The original authors of MORUS are Hongjun Wu and Tao Huang. At the time of writing, MORUS is one of the finalists in CAESAR, an open competition intended to select a portfolio of alternatives to the problematic AES-GCM: https://competitions.cr.yp.to/caesar-submissions.html https://competitions.cr.yp.to/round3/morusv2.pdf Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-05-19	crypto: aegis - Add generic AEGIS AEAD implementations	Ondrej Mosnacek	1	-0/+3
	This patch adds the generic implementation of the AEGIS family of AEAD algorithms (AEGIS-128, AEGIS-128L, and AEGIS-256). The original authors of AEGIS are Hongjun Wu and Bart Preneel. At the time of writing, AEGIS is one of the finalists in CAESAR, an open competition intended to select a portfolio of alternatives to the problematic AES-GCM: https://competitions.cr.yp.to/caesar-submissions.html https://competitions.cr.yp.to/round3/aegisv11.pdf Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-04-21	crypto: zstd - Add zstd support	Nick Terrell	1	-0/+1
	Adds zstd support to crypto and scompress. Only supports the default level. Previously we held off on this patch, since there weren't any users. Now zram is ready for zstd support, but depends on CONFIG_CRYPTO_ZSTD, which isn't defined until this patch is in. I also see a patch adding zstd to pstore [0], which depends on crypto zstd. [0] lkml.kernel.org/r/9c9416b2dff19f05fb4c35879aaa83d11ff72c92.1521626182.git.geliangtang@gmail.com Signed-off-by: Nick Terrell <terrelln@fb.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-04-07	kbuild: rename -asn1.[ch] to .asn1.[ch]	Masahiro Yamada	1	-5/+5
	Our convention is to distinguish file types by suffixes with a period as a separator. -asn1.[ch] is a different pattern from other generated sources such as .lex.c, .tab.[ch], .dtb.S, etc. More confusing, files with '-asn1.[ch]' are generated files, but '_asn1.[ch]' are checked-in files: net/netfilter/nf_conntrack_h323_asn1.c include/linux/netfilter/nf_conntrack_h323_asn1.h include/linux/sunrpc/gss_asn1.h Rename generated files to *.asn1.[ch] for consistency. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-04-07	kbuild: clean up *-asn1.[ch] patterns from top-level Makefile	Masahiro Yamada	1	-2/+0
	Clean up these patterns from the top Makefile to omit 'clean-files' in each Makefile. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-03-16	crypto: sm4 - introduce SM4 symmetric cipher algorithm	Gilad Ben-Yossef	1	-0/+1
	Introduce the SM4 cipher algorithms (OSCCA GB/T 32907-2016). SM4 (GBT.32907-2016) is a cryptographic standard issued by the Organization of State Commercial Administration of China (OSCCA) as an authorized cryptographic algorithms for the use within China. SMS4 was originally created for use in protecting wireless networks, and is mandated in the Chinese National Standard for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure) (GB.15629.11-2003). Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-09	crypto: cfb - add support for Cipher FeedBack mode	James Bottomley	1	-0/+1
	TPM security routines require encryption and decryption with AES in CFB mode, so add it to the Linux Crypto schemes. CFB is basically a one time pad where the pad is generated initially from the encrypted IV and then subsequently from the encrypted previous block of ciphertext. The pad is XOR'd into the plain text to get the final ciphertext. https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#CFB Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-03-03	crypto: ablk_helper - remove ablk_helper	Eric Biggers	1	-1/+0
	All users of ablk_helper have been converted over to crypto_simd, so remove ablk_helper. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-02-22	crypto: speck - add support for the Speck block cipher	Eric Biggers	1	-0/+1
	Add a generic implementation of Speck, including the Speck128 and Speck64 variants. Speck is a lightweight block cipher that can be much faster than AES on processors that don't have AES instructions. We are planning to offer Speck-XTS (probably Speck128/256-XTS) as an option for dm-crypt and fscrypt on Android, for low-end mobile devices with older CPUs such as ARMv7 which don't have the Cryptography Extensions. Currently, such devices are unencrypted because AES is not fast enough, even when the NEON bit-sliced implementation of AES is used. Other AES alternatives such as Twofish, Threefish, Camellia, CAST6, and Serpent aren't fast enough either; it seems that only a modern ARX cipher can provide sufficient performance on these devices. This is a replacement for our original proposal (https://patchwork.kernel.org/patch/10101451/) which was to offer ChaCha20 for these devices. However, the use of a stream cipher for disk/file encryption with no space to store nonces would have been much more insecure than we thought initially, given that it would be used on top of flash storage as well as potentially on top of F2FS, neither of which is guaranteed to overwrite data in-place. Speck has been somewhat controversial due to its origin. Nevertheless, it has a straightforward design (it's an ARX cipher), and it appears to be the leading software-optimized lightweight block cipher currently, with the most cryptanalysis. It's also easy to implement without side channels, unlike AES. Moreover, we only intend Speck to be used when the status quo is no encryption, due to AES not being fast enough. We've also considered a novel length-preserving encryption mode based on ChaCha20 and Poly1305. While theoretically attractive, such a mode would be a brand new crypto construction and would be more complicated and difficult to implement efficiently in comparison to Speck-XTS. There is confusion about the byte and word orders of Speck, since the original paper doesn't specify them. But we have implemented it using the orders the authors recommended in a correspondence with them. The test vectors are taken from the original paper but were mapped to byte arrays using the recommended byte and word orders. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-20	crypto: aes-generic - fix aes-generic regression on powerpc	Arnd Bergmann	1	-1/+1
	My last bugfix added -Os on the command line, which unfortunately caused a build regression on powerpc in some configurations. I've done some more analysis of the original problem and found slightly different workaround that avoids this regression and also results in better performance on gcc-7.0: -fcode-hoisting is an optimization step that got added in gcc-7 and that for all gcc-7 versions causes worse performance. This disables -fcode-hoisting on all compilers that understand the option. For gcc-7.1 and 7.2 I found the same performance as my previous patch (using -Os), in gcc-7.0 it was even better. On gcc-8 I could see no change in performance from this patch. In theory, code hoisting should not be able make things better for the AES cipher, so leaving it disabled for gcc-8 only serves to simplify the Makefile change. Reported-by: kbuild test robot <fengguang.wu@intel.com> Link: https://www.mail-archive.com/linux-crypto@vger.kernel.org/msg30418.html Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651 Fixes: 148b974deea9 ("crypto: aes-generic - build with -Os on gcc-7+") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-01-12	crypto: aes-generic - build with -Os on gcc-7+	Arnd Bergmann	1	-0/+1
	While testing other changes, I discovered that gcc-7.2.1 produces badly optimized code for aes_encrypt/aes_decrypt. This is especially true when CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely large stack usage that in turn might cause kernel stack overflows: crypto/aes_generic.c: In function 'aes_encrypt': crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=] crypto/aes_generic.c: In function 'aes_decrypt': crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=] I verified that this problem exists on all architectures that are supported by gcc-7.2, though arm64 in particular is less affected than the others. I also found that gcc-7.1 and gcc-8 do not show the extreme stack usage but still produce worse code than earlier versions for this file, apparently because of optimization passes that generally provide a substantial improvement in object code quality but understandably fail to find any shortcuts in the AES algorithm. Possible workarounds include a) disabling -ftree-pre and -ftree-sra optimizations, this was an earlier patch I tried, which reliably fixed the stack usage, but caused a serious performance regression in some versions, as later testing found. b) disabling UBSAN on this file or all ciphers, as suggested by Ard Biesheuvel. This would lead to massively better crypto performance in UBSAN-enabled kernels and avoid the stack usage, but there is a concern over whether we should exclude arbitrary files from UBSAN at all. c) Forcing the optimization level in a different way. Similar to a), but rather than deselecting specific optimization stages, this now uses "gcc -Os" for this file, regardless of the CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option. This is a reliable workaround for the stack consumption on all architecture, and I've retested the performance results now on x86, cycles/byte (lower is better) for cbc(aes-generic) with 256 bit keys: -O2 -Os gcc-6.3.1 14.9 15.1 gcc-7.0.1 14.7 15.3 gcc-7.1.1 15.3 14.7 gcc-7.2.1 16.8 15.9 gcc-8.0.0 15.5 15.6 This implements the option c) by enabling forcing -Os on all compiler versions starting with gcc-7.1. As a workaround for PR83356, it would only be needed for gcc-7.2+ with UBSAN enabled, but since it also shows better performance on gcc-7.1 without UBSAN, it seems appropriate to use the faster version here as well. Side note: during testing, I also played with the AES code in libressl, which had a similar performance regression from gcc-6 to gcc-7.2, but was three times slower overall. It might be interesting to investigate that further and possibly port the Linux implementation into that. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651 Cc: Richard Biener <rguenther@suse.de> Cc: Jakub Jelinek <jakub@gcc.gnu.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-11-14	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Linus Torvalds	1	-0/+1
	Pull crypto updates from Herbert Xu: "Here is the crypto update for 4.15: API: - Disambiguate EBUSY when queueing crypto request by adding ENOSPC. This change touches code outside the crypto API. - Reset settings when empty string is written to rng_current. Algorithms: - Add OSCCA SM3 secure hash. Drivers: - Remove old mv_cesa driver (replaced by marvell/cesa). - Enable rfc3686/ecb/cfb/ofb AES in crypto4xx. - Add ccm/gcm AES in crypto4xx. - Add support for BCM7278 in iproc-rng200. - Add hash support on Exynos in s5p-sss. - Fix fallback-induced error in vmx. - Fix output IV in atmel-aes. - Fix empty GCM hash in mediatek. Others: - Fix DoS potential in lib/mpi. - Fix potential out-of-order issues with padata" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits) lib/mpi: call cond_resched() from mpi_powm() loop crypto: stm32/hash - Fix return issue on update crypto: dh - Remove pointless checks for NULL 'p' and 'g' crypto: qat - Clean up error handling in qat_dh_set_secret() crypto: dh - Don't permit 'key' or 'g' size longer than 'p' crypto: dh - Don't permit 'p' to be 0 crypto: dh - Fix double free of ctx->p hwrng: iproc-rng200 - Add support for BCM7278 dt-bindings: rng: Document BCM7278 RNG200 compatible crypto: chcr - Replace _manual_ swap with swap macro crypto: marvell - Add a NULL entry at the end of mv_cesa_plat_id_table[] hwrng: virtio - Virtio RNG devices need to be re-registered after suspend/resume crypto: atmel - remove empty functions crypto: ecdh - remove empty exit() MAINTAINERS: update maintainer for qat crypto: caam - remove unused param of ctx_map_to_sec4_sg() crypto: caam - remove unneeded edesc zeroization crypto: atmel-aes - Reset the controller before each use crypto: atmel-aes - properly set IV after {en,de}crypt hwrng: core - Reset user selected rng by writing "" to rng_current ...
2017-11-02	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	Greg Kroah-Hartman	1	-0/+1
	Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a /uapi/ one with no licensing information in it, - file was a /uapi/ one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non /uapi/ files that summary was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a /uapi/ path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the /uapi/ ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------\|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-22	crypto: sm3 - add OSCCA SM3 secure hash	Gilad Ben-Yossef	1	-0/+1
	Add OSCCA SM3 secure hash (OSCCA GM/T 0004-2012 SM3) generic hash transformation. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-06-10	crypto: ecdh - add privkey generation support	Tudor-Dan Ambarus	1	-4/+5
	Add support for generating ecc private keys. Generation of ecc private keys is helpful in a user-space to kernel ecdh offload because the keys are not revealed to user-space. Private key generation is also helpful to implement forward secrecy. If the user provides a NULL ecc private key, the kernel will generate it and further use it for ecdh. Move ecdh's object files below drbg's. drbg must be present in the kernel at the time of calling. Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Reviewed-by: Stephan Müller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11	crypto: improve gcc optimization flags for serpent and wp512	Arnd Bergmann	1	-0/+2
	An ancient gcc bug (first reported in 2003) has apparently resurfaced on MIPS, where kernelci.org reports an overly large stack frame in the whirlpool hash algorithm: crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=] With some testing in different configurations, I'm seeing large variations in stack frames size up to 1500 bytes for what should have around 300 bytes at most. I also checked the reference implementation, which is essentially the same code but also comes with some test and benchmarking infrastructure. It seems that recent compiler versions on at least arm, arm64 and powerpc have a partial fix for this problem, but enabling "-fsched-pressure", but even with that fix they suffer from the issue to a certain degree. Some testing on arm64 shows that the time needed to hash a given amount of data is roughly proportional to the stack frame size here, which makes sense given that the wp512 implementation is doing lots of loads for table lookups, and the problem with the overly large stack is a result of doing a lot more loads and stores for spilled registers (as seen from inspecting the object code). Disabling -fschedule-insns consistently fixes the problem for wp512, in my collection of cross-compilers, the results are consistently better or identical when comparing the stack sizes in this function, though some architectures (notable x86) have schedule-insns disabled by default. The four columns are: default: -O2 press: -O2 -fsched-pressure nopress: -O2 -fschedule-insns -fno-sched-pressure nosched: -O2 -no-schedule-insns (disables sched-pressure) default press nopress nosched alpha-linux-gcc-4.9.3 1136 848 1136 176 am33_2.0-linux-gcc-4.9.3 2100 2076 2100 2104 arm-linux-gnueabi-gcc-4.9.3 848 848 1048 352 cris-linux-gcc-4.9.3 272 272 272 272 frv-linux-gcc-4.9.3 1128 1000 1128 280 hppa64-linux-gcc-4.9.3 1128 336 1128 184 hppa-linux-gcc-4.9.3 644 308 644 276 i386-linux-gcc-4.9.3 352 352 352 352 m32r-linux-gcc-4.9.3 720 656 720 268 microblaze-linux-gcc-4.9.3 1108 604 1108 256 mips64-linux-gcc-4.9.3 1328 592 1328 208 mips-linux-gcc-4.9.3 1096 624 1096 240 powerpc64-linux-gcc-4.9.3 1088 432 1088 160 powerpc-linux-gcc-4.9.3 1080 584 1080 224 s390-linux-gcc-4.9.3 456 456 624 360 sh3-linux-gcc-4.9.3 292 292 292 292 sparc64-linux-gcc-4.9.3 992 240 992 208 sparc-linux-gcc-4.9.3 680 592 680 312 x86_64-linux-gcc-4.9.3 224 240 272 224 xtensa-linux-gcc-4.9.3 1152 704 1152 304 aarch64-linux-gcc-7.0.0 224 224 1104 208 arm-linux-gnueabi-gcc-7.0.1 824 824 1048 352 mips-linux-gcc-7.0.0 1120 648 1120 272 x86_64-linux-gcc-7.0.1 240 240 304 240 arm-linux-gnueabi-gcc-4.4.7 840 392 arm-linux-gnueabi-gcc-4.5.4 784 728 784 320 arm-linux-gnueabi-gcc-4.6.4 736 728 736 304 arm-linux-gnueabi-gcc-4.7.4 944 784 944 352 arm-linux-gnueabi-gcc-4.8.5 464 464 760 352 arm-linux-gnueabi-gcc-4.9.3 848 848 1048 352 arm-linux-gnueabi-gcc-5.3.1 824 824 1064 336 arm-linux-gnueabi-gcc-6.1.1 808 808 1056 344 arm-linux-gnueabi-gcc-7.0.1 824 824 1048 352 Trying the same test for serpent-generic, the picture is a bit different, and while -fno-schedule-insns is generally better here than the default, -fsched-pressure wins overall, so I picked that instead. default press nopress nosched alpha-linux-gcc-4.9.3 1392 864 1392 960 am33_2.0-linux-gcc-4.9.3 536 524 536 528 arm-linux-gnueabi-gcc-4.9.3 552 552 776 536 cris-linux-gcc-4.9.3 528 528 528 528 frv-linux-gcc-4.9.3 536 400 536 504 hppa64-linux-gcc-4.9.3 524 208 524 480 hppa-linux-gcc-4.9.3 768 472 768 508 i386-linux-gcc-4.9.3 564 564 564 564 m32r-linux-gcc-4.9.3 712 576 712 532 microblaze-linux-gcc-4.9.3 724 392 724 512 mips64-linux-gcc-4.9.3 720 384 720 496 mips-linux-gcc-4.9.3 728 384 728 496 powerpc64-linux-gcc-4.9.3 704 304 704 480 powerpc-linux-gcc-4.9.3 704 296 704 480 s390-linux-gcc-4.9.3 560 560 592 536 sh3-linux-gcc-4.9.3 540 540 540 540 sparc64-linux-gcc-4.9.3 544 352 544 496 sparc-linux-gcc-4.9.3 544 344 544 496 x86_64-linux-gcc-4.9.3 528 536 576 528 xtensa-linux-gcc-4.9.3 752 544 752 544 aarch64-linux-gcc-7.0.0 432 432 656 480 arm-linux-gnueabi-gcc-7.0.1 616 616 808 536 mips-linux-gcc-7.0.0 720 464 720 488 x86_64-linux-gcc-7.0.1 536 528 600 536 arm-linux-gnueabi-gcc-4.4.7 592 440 arm-linux-gnueabi-gcc-4.5.4 776 448 776 544 arm-linux-gnueabi-gcc-4.6.4 776 448 776 544 arm-linux-gnueabi-gcc-4.7.4 768 448 768 544 arm-linux-gnueabi-gcc-4.8.5 488 488 776 544 arm-linux-gnueabi-gcc-4.9.3 552 552 776 536 arm-linux-gnueabi-gcc-5.3.1 552 552 776 536 arm-linux-gnueabi-gcc-6.1.1 560 560 776 536 arm-linux-gnueabi-gcc-7.0.1 616 616 808 536 I did not do any runtime tests with serpent, so it is possible that stack frame size does not directly correlate with runtime performance here and it actually makes things worse, but it's more likely to help here, and the reduced stack frame size is probably enough reason to apply the patch, especially given that the crypto code is often used in deep call chains. Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/ Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149 Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-02-11	crypto: aes - add generic time invariant AES cipher	Ard Biesheuvel	1	-0/+1
	Lookup table based AES is sensitive to timing attacks, which is due to the fact that such table lookups are data dependent, and the fact that 8 KB worth of tables covers a significant number of cachelines on any architecture, resulting in an exploitable correlation between the key and the processing time for known plaintexts. For network facing algorithms such as CTR, CCM or GCM, this presents a security risk, which is why arch specific AES ports are typically time invariant, either through the use of special instructions, or by using SIMD algorithms that don't rely on table lookups. For generic code, this is difficult to achieve without losing too much performance, but we can improve the situation significantly by switching to an implementation that only needs 256 bytes of table data (the actual S-box itself), which can be prefetched at the start of each block to eliminate data dependent latencies. This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the ordinary generic AES driver manages 18 cycles per byte on this hardware). Decryption is substantially slower. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Herbert Xu	1	-0/+1
	Merge the crypto tree to pull in chelsio chcr fix.
2016-11-30	crypto: rsa - Add Makefile dependencies to fix parallel builds	David Michael	1	-0/+1
	Both asn1 headers are included by rsa_helper.c, so rsa_helper.o should explicitly depend on them. Signed-off-by: David Michael <david.michael@coreos.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-28	crypto: simd - Add simd skcipher helper	Herbert Xu	1	-0/+2
	This patch adds the simd skcipher helper which is meant to be a replacement for ablk helper. It replaces the underlying blkcipher interface with skcipher, and also presents the top-level algorithm as an skcipher. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-11-01	crypto: acomp - fix dependency in Makefile	Giovanni Cabiddu	1	-2/+3
	Fix dependency between acomp and scomp that appears when acomp is built as module Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-10-25	crypto: acomp - add driver-side scomp interface	Giovanni Cabiddu	1	-0/+1
	Add a synchronous back-end (scomp) to acomp. This allows to easily expose the already present compression algorithms in LKCF via acomp. Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>