linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2018-08-25	crypto: caam/qi - fix error path in xts setkey	Horia Geantă	1	-4/+2
	xts setkey callback returns 0 on some error paths. Fix this by returning -EINVAL. Cc: <stable@vger.kernel.org> # 4.12+ Fixes: b189817cf789 ("crypto: caam/qi - add ablkcipher and authenc algorithms") Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-25	crypto: caam/jr - fix descriptor DMA unmapping	Horia Geantă	1	-1/+2
	Descriptor address needs to be swapped to CPU endianness before being DMA unmapped. Cc: <stable@vger.kernel.org> # 4.8+ Fixes: 261ea058f016 ("crypto: caam - handle core endianness != caam endianness") Reported-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07	crypto: arm64/ghash-ce - implement 4-way aggregation	Ard Biesheuvel	2	-51/+142
	Enhance the GHASH implementation that uses 64-bit polynomial multiplication by adding support for 4-way aggregation. This more than doubles the performance, from 2.4 cycles per byte to 1.1 cpb on Cortex-A53. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07	crypto: arm64/ghash-ce - replace NEON yield check with block limit	Ard Biesheuvel	2	-32/+23
	Checking the TIF_NEED_RESCHED flag is disproportionately costly on cores with fast crypto instructions and comparatively slow memory accesses. On algorithms such as GHASH, which executes at ~1 cycle per byte on cores that implement support for 64 bit polynomial multiplication, there is really no need to check the TIF_NEED_RESCHED particularly often, and so we can remove the NEON yield check from the assembler routines. However, unlike the AEAD or skcipher APIs, the shash/ahash APIs take arbitrary input lengths, and so there needs to be some sanity check to ensure that we don't hog the CPU for excessive amounts of time. So let's simply cap the maximum input size that is processed in one go to 64 KB. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07	crypto: hisilicon - sec_send_request() can be static	kbuild test robot	1	-1/+1
	Fixes: 915e4e8413da ("crypto: hisilicon - SEC security accelerator driver") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07	lib/mpi: remove redundant variable esign	Colin Ian King	1	-2/+1
	Variable esign is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'esign' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07	crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable	Ard Biesheuvel	2	-41/+49
	Squeeze out another 5% of performance by minimizing the number of invocations of kernel_neon_begin()/kernel_neon_end() on the common path, which also allows some reloads of the key schedule to be optimized away. The resulting code runs at 2.3 cycles per byte on a Cortex-A53. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07	crypto: arm64/aes-ce-gcm - implement 2-way aggregation	Ard Biesheuvel	2	-68/+52
	Implement a faster version of the GHASH transform which amortizes the reduction modulo the characteristic polynomial across two input blocks at a time. On a Cortex-A53, the gcm(aes) performance increases 24%, from 3.0 cycles per byte to 2.4 cpb for large input sizes. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07	crypto: arm64/aes-ce-gcm - operate on two input blocks at a time	Ard Biesheuvel	2	-69/+161
	Update the core AES/GCM transform and the associated plumbing to operate on 2 AES/GHASH blocks at a time. By itself, this is not expected to result in a noticeable speedup, but it paves the way for reimplementing the GHASH component using 2-way aggregation. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-07	crypto: arm64 - revert NEON yield for fast AEAD implementations	Ard Biesheuvel	2	-146/+80
	As it turns out, checking the TIF_NEED_RESCHED flag after each iteration results in a significant performance regression (~10%) when running fast algorithms (i.e., ones that use special instructions and operate in the < 4 cycles per byte range) on in-order cores with comparatively slow memory accesses such as the Cortex-A53. Given the speed of these ciphers, and the fact that the page based nature of the AEAD scatterwalk API guarantees that the core NEON transform is never invoked with more than a single page's worth of input, we can estimate the worst case duration of any resulting scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages, processing a page's worth of input at 4 cycles per byte results in a delay of ~250 us, which is a reasonable upper bound. So let's remove the yield checks from the fused AES-CCM and AES-GCM routines entirely. This reverts commit 7b67ae4d5ce8e2f912377f5fbccb95811a92097f and partially reverts commit 7c50136a8aba8784f07fb66a950cc61a7f3d2ee3. Fixes: 7c50136a8aba ("crypto: arm64/aes-ghash - yield NEON after every ...") Fixes: 7b67ae4d5ce8 ("crypto: arm64/aes-ccm - yield NEON after every ...") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: dh - make crypto_dh_encode_key() make robust	Eric Biggers	1	-14/+16
	Make it return -EINVAL if crypto_dh_key_len() is incorrect rather than overflowing the buffer. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: dh - fix calculating encoded key size	Eric Biggers	2	-7/+7
	It was forgotten to increase DH_KPP_SECRET_MIN_SIZE to include 'q_size', causing an out-of-bounds write of 4 bytes in crypto_dh_encode_key(), and an out-of-bounds read of 4 bytes in crypto_dh_decode_key(). Fix it, and fix the lengths of the test vectors to match this. Reported-by: syzbot+6d38d558c25b53b8f4ed@syzkaller.appspotmail.com Fixes: e3fe0ae12962 ("crypto: dh - add public key verification test") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: ccp - Check for NULL PSP pointer at module unload	Tom Lendacky	1	-0/+3
	Should the PSP initialization fail, the PSP data structure will be freed and the value contained in the sp_device struct set to NULL. At module unload, psp_dev_destroy() does not check if the pointer value is NULL and will end up dereferencing a NULL pointer. Add a pointer check of the psp_data field in the sp_device struct in psp_dev_destroy() and return immediately if it is NULL. Cc: <stable@vger.kernel.org> # 4.16.x- Fixes: 2a6170dfe755 ("crypto: ccp: Add Platform Security Processor (PSP) device support") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: arm/chacha20 - always use vrev for 16-bit rotates	Eric Biggers	1	-6/+4
	The 4-way ChaCha20 NEON code implements 16-bit rotates with vrev32.16, but the one-way code (used on remainder blocks) implements it with vshl + vsri, which is slower. Switch the one-way code to vrev32.16 too. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: ccree - allow bigger than sector XTS op	Gilad Ben-Yossef	1	-4/+1
	The ccree driver had a sanity check that we are not asked to encrypt an XTS buffer bigger than a sane sector size since XTS IV needs to include the sector number in the IV so this is not expected in any real use case. Unfortunately, this breaks cryptsetup benchmark test which has a synthetic performance test using 64k buffer of data with the same IV. Remove the sanity check and allow the user to hang themselves and/or run benchmarks if they so wish. Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: ccree - zero all of request ctx before use	Gilad Ben-Yossef	1	-3/+3
	In certain error path req_ctx->iv was being freed despite not being allocated because it was not initialized to NULL. Rather than play whack a mole with the structure various field, zero it before use. This fixes a kernel panic that may occur if an invalid buffer size was requested triggering the bug above. Fixes: 63ee04c8b491 ("crypto: ccree - add skcipher support") Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: ccree - remove cipher ivgen left overs	Gilad Ben-Yossef	3	-18/+2
	IV generation is not available via the skcipher interface. Remove the left over support of it from the ablkcipher days. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: ccree - drop useless type flag during reg	Gilad Ben-Yossef	3	-30/+1
	Drop the explicit setting of CRYPTO_ALG_TYPE_AEAD or CRYPTO_ALG_TYPE_SKCIPHER flags during alg registration as they are set anyway by the framework. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: ablkcipher - fix crash flushing dcache in error path	Eric Biggers	1	-31/+26
	Like the skcipher_walk and blkcipher_walk cases: scatterwalk_done() is only meant to be called after a nonzero number of bytes have been processed, since scatterwalk_pagedone() will flush the dcache of the previous page. But in the error case of ablkcipher_walk_done(), e.g. if the input wasn't an integer number of blocks, scatterwalk_done() was actually called after advancing 0 bytes. This caused a crash ("BUG: unable to handle kernel paging request") during '!PageSlab(page)' on architectures like arm and arm64 that define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was page-aligned as in that case walk->offset == 0. Fix it by reorganizing ablkcipher_walk_done() to skip the scatterwalk_advance() and scatterwalk_done() if an error has occurred. Reported-by: Liu Chao <liuchao741@huawei.com> Fixes: bf06099db18a ("crypto: skcipher - Add ablkcipher_walk interfaces") Cc: <stable@vger.kernel.org> # v2.6.35+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: blkcipher - fix crash flushing dcache in error path	Eric Biggers	1	-28/+26
	Like the skcipher_walk case: scatterwalk_done() is only meant to be called after a nonzero number of bytes have been processed, since scatterwalk_pagedone() will flush the dcache of the previous page. But in the error case of blkcipher_walk_done(), e.g. if the input wasn't an integer number of blocks, scatterwalk_done() was actually called after advancing 0 bytes. This caused a crash ("BUG: unable to handle kernel paging request") during '!PageSlab(page)' on architectures like arm and arm64 that define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was page-aligned as in that case walk->offset == 0. Fix it by reorganizing blkcipher_walk_done() to skip the scatterwalk_advance() and scatterwalk_done() if an error has occurred. This bug was found by syzkaller fuzzing. Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE: #include <linux/if_alg.h> #include <sys/socket.h> #include <unistd.h> int main() { struct sockaddr_alg addr = { .salg_type = "skcipher", .salg_name = "ecb(aes-generic)", }; char buffer[4096] __attribute__((aligned(4096))) = { 0 }; int fd; fd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(fd, (void *)&addr, sizeof(addr)); setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16); fd = accept(fd, NULL, NULL); write(fd, buffer, 15); read(fd, buffer, 15); } Reported-by: Liu Chao <liuchao741@huawei.com> Fixes: 5cde0af2a982 ("[CRYPTO] cipher: Added block cipher type") Cc: <stable@vger.kernel.org> # v2.6.19+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: skcipher - fix crash flushing dcache in error path	Eric Biggers	1	-26/+27
	scatterwalk_done() is only meant to be called after a nonzero number of bytes have been processed, since scatterwalk_pagedone() will flush the dcache of the previous page. But in the error case of skcipher_walk_done(), e.g. if the input wasn't an integer number of blocks, scatterwalk_done() was actually called after advancing 0 bytes. This caused a crash ("BUG: unable to handle kernel paging request") during '!PageSlab(page)' on architectures like arm and arm64 that define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was page-aligned as in that case walk->offset == 0. Fix it by reorganizing skcipher_walk_done() to skip the scatterwalk_advance() and scatterwalk_done() if an error has occurred. This bug was found by syzkaller fuzzing. Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE: #include <linux/if_alg.h> #include <sys/socket.h> #include <unistd.h> int main() { struct sockaddr_alg addr = { .salg_type = "skcipher", .salg_name = "cbc(aes-generic)", }; char buffer[4096] __attribute__((aligned(4096))) = { 0 }; int fd; fd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(fd, (void *)&addr, sizeof(addr)); setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16); fd = accept(fd, NULL, NULL); write(fd, buffer, 15); read(fd, buffer, 15); } Reported-by: Liu Chao <liuchao741@huawei.com> Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface") Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: skcipher - remove unnecessary setting of walk->nbytes	Eric Biggers	1	-1/+0
	Setting 'walk->nbytes = walk->total' in skcipher_walk_first() doesn't make sense because actually walk->nbytes needs to be set to the length of the first step in the walk, which may be less than walk->total. This is done by skcipher_walk_next() which is called immediately afterwards. Also walk->nbytes was already set to 0 in skcipher_walk_skcipher(), which is a better default value in case it's forgotten to be set later. Therefore, remove the unnecessary assignment to walk->nbytes. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: scatterwalk - remove scatterwalk_samebuf()	Eric Biggers	1	-7/+0
	scatterwalk_samebuf() is never used. Remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: scatterwalk - remove 'chain' argument from scatterwalk_crypto_chain()	Eric Biggers	5	-13/+7
	All callers pass chain=0 to scatterwalk_crypto_chain(). Remove this unneeded parameter. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: skcipher - fix aligning block size in skcipher_copy_iv()	Eric Biggers	1	-1/+1
	The ALIGN() macro needs to be passed the alignment, not the alignmask (which is the alignment minus 1). Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface") Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	arm64: dts: hisi: add SEC crypto accelerator nodes for hip07 SoC	Jonathan Cameron	1	-0/+284
	Enable all 4 SEC units available on d05 boards. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-08-03	crypto: hisilicon - SEC security accelerator driver	Jonathan Cameron	8	-0/+2895
	This accelerator is found inside hisilicon hip06 and hip07 SoCs. Each instance provides a number of queues which feed a different number of backend acceleration units. The queues are operating in an out of order mode in the interests of throughput. The silicon does not do tracking of dependencies between multiple 'messages' or update of the IVs as appropriate for training. Hence where relevant we need to do this in software. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>