linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2019-07-26	crypto: arm64/aes-ce - switch to library version of key expansion routine	Ard Biesheuvel	2	-8/+11
	Switch to the new AES library that also provides an implementation of the AES key expansion routine. This removes the dependency on the generic AES cipher, allowing it to be omitted entirely in the future. While at it, remove some references to the table based arm64 version of AES and replace them with AES library calls as well. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: arm64/aes-neonbs - switch to library version of key expansion routine	Ard Biesheuvel	2	-4/+5
	Switch to the new AES library that also provides an implementation of the AES key expansion routine. This removes the dependency on the generic AES cipher, allowing it to be omitted entirely in the future. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: arm64/aes-ccm - switch to AES library	Ard Biesheuvel	2	-13/+7
	The CCM code calls directly into the scalar table based AES cipher for arm64 from the fallback path, and since this implementation is known to be non-time invariant, doing so from a time invariant SIMD cipher is a bit nasty. So let's switch to the AES library - this makes the code more robust, and drops the dependency on the generic AES cipher, allowing us to omit it entirely in the future. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: arm/aes-neonbs - switch to library version of key expansion routine	Ard Biesheuvel	2	-3/+3
	Switch to the new AES library that also provides an implementation of the AES key expansion routine. This removes the dependency on the generic AES cipher, allowing it to be omitted entirely in the future. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: arm64/ghash - switch to AES library	Ard Biesheuvel	2	-22/+11
	The GHASH code uses the generic AES key expansion routines, and calls directly into the scalar table based AES cipher for arm64 from the fallback path, and since this implementation is known to be non-time invariant, doing so from a time invariant SIMD cipher is a bit nasty. So let's switch to the AES library - this makes the code more robust, and drops the dependency on the generic AES cipher, allowing us to omit it entirely in the future. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: safexcel/aes - switch to library version of key expansion routine	Ard Biesheuvel	2	-2/+2
	Switch to the new AES library that also provides an implementation of the AES key expansion routine. This removes the dependency on the generic AES cipher, allowing it to be omitted entirely in the future. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: cesa/aes - switch to library version of key expansion routine	Ard Biesheuvel	2	-2/+2
	Switch to the new AES library that also provides an implementation of the AES key expansion routine. This removes the dependency on the generic AES cipher, allowing it to be omitted entirely in the future. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: padlock/aes - switch to library version of key expansion routine	Ard Biesheuvel	2	-2/+2
	Switch to the new AES library that also provides an implementation of the AES key expansion routine. This removes the dependency on the generic AES cipher, allowing it to be omitted entirely in the future. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: x86/aes - drop scalar assembler implementations	Ard Biesheuvel	5	-665/+0
	The AES assembler code for x86 isn't actually faster than code generated by the compiler from aes_generic.c, and considering the disproportionate maintenance burden of assembler code on x86, it is better just to drop it entirely. Modern x86 systems will use AES-NI anyway, and given that the modules being removed have a dependency on aes_generic already, we can remove them without running the risk of regressions. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: x86/aes-ni - switch to generic for fallback and key routines	Ard Biesheuvel	3	-22/+8
	The AES-NI code contains fallbacks for invocations that occur from a context where the SIMD unit is unavailable, which really only occurs when running in softirq context that was entered from a hard IRQ that was taken while running kernel code that was already using the FPU. That means performance is not really a consideration, and we can just use the new library code for this use case, which has a smaller footprint and is believed to be time invariant. This will allow us to drop the non-SIMD asm routines in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: aes - create AES library based on the fixed time AES code	Ard Biesheuvel	5	-300/+394
	Take the existing small footprint and mostly time invariant C code and turn it into a AES library that can be used for non-performance critical, casual use of AES, and as a fallback for, e.g., SIMD code that needs a secondary path that can be taken in contexts where the SIMD unit is off limits (e.g., in hard interrupts taken from kernel context) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: aes/fixed-time - align key schedule with other implementations	Ard Biesheuvel	1	-31/+21
	The fixed time AES code mangles the key schedule so that xoring the first round key with values at fixed offsets across the Sbox produces the correct value. This primes the D-cache with the entire Sbox before any data dependent lookups are done, making it more difficult to infer key bits from timing variances when the plaintext is known. The downside of this approach is that it renders the key schedule incompatible with other implementations of AES in the kernel, which makes it cumbersome to use this implementation as a fallback for SIMD based AES in contexts where this is not allowed. So let's tweak the fixed Sbox indexes so that they add up to zero under the xor operation. While at it, increase the granularity to 16 bytes so we cover the entire Sbox even on systems with 16 byte cachelines. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: aes - rename local routines to prevent future clashes	Ard Biesheuvel	6	-24/+24
	Rename some local AES encrypt/decrypt routines so they don't clash with the names we are about to introduce for the routines exposed by the generic AES library. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: arm/aes-ce - cosmetic/whitespace cleanup	Ard Biesheuvel	1	-60/+56
	Rearrange the aes_algs[] array for legibility. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure - add support for 0 length HMAC messages	Pascal van Leeuwen	1	-3/+44
	This patch adds support for the specific corner case of performing HMAC on an empty string (i.e. payload length is zero). This solves the last failing cryptomgr extratests for HMAC. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure - add support for arbitrary size hash/HMAC updates	Pascal van Leeuwen	2	-158/+269
	This patch fixes an issue with hash and HMAC operations that perform "large" intermediate updates (i.e. combined size > 2 hash blocks) by actually making use of the hardware's hash continue capabilities. The original implementation would cache these updates in a buffer that was 2 hash blocks in size and fail if all update calls combined would overflow that buffer. Which caused the cryptomgr extra tests to fail. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure - let HW deal with initial hash digest	Pascal van Leeuwen	1	-65/+6
	The driver was loading the initial digest for hash operations into the hardware explicitly, but this is not needed as the hardware can handle that by itself, which is more efficient and avoids any context record coherence issues. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure: back out parts of earlier HMAC update workaround	Pascal van Leeuwen	1	-19/+13
	This patch backs out some changes done with commit 082ec2d48467 - "add support for HMAC updates" as that update just works around the issue for the basic tests by providing twice the amount of buffering, but this does not solve the case of much larger data blocks such as those performed by the extra tests. This is in preparation of an actual solution in the next patch(es), which does not actually require any extra buffering at all. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure - fix EINVAL error (buf overflow) for AEAD decrypt	Pascal van Leeuwen	2	-5/+4
	This patch fixes a buffer overflow error returning -EINVAL for AEAD decrypt operations by NOT appending the (already verified) ICV to the output packet (which is not expected by the API anyway). With this fix, all testmgr AEAD (extra) tests now pass. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure - fix scatter/gather list to descriptor conversion	Pascal van Leeuwen	1	-46/+136
	Fixed issues with the skcipher and AEAD scatter/gather list to engine descriptor conversion code which caused either too much or too little buffer space to be provided to the hardware. This caused errors with the testmgr extra tests, either kernel panics (on x86-EIP197-FPGA) or engine descriptor errors 0x1, 0x8 or 0x9 (on Macchiatobin e.g. Marvell A8K). With this patch in place, all skcipher and AEAD (extra) tests pass. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure - fix incorrect skcipher output IV	Pascal van Leeuwen	1	-41/+43
	This patch fixes corruption issues with the skcipher output IV witnessed on x86+EIP197-FPGA (devboard). The original fix, commit 57660b11d5ad ("crypto: inside-secure - implement IV retrieval"), attempted to write out the result IV through the context record. However, this is not a reliable mechanism as there is no way of knowing the hardware context update actually arrived in memory, so it is possible to read the old contents instead of the updated IV. (and indeed, this failed for the x86/FPGA case) The alternative approach used here recognises the fact that the result IV for CBC is actually the last cipher block, which is the last input block in case of decryption and the last output block in case of encryption. So the result IV is taken from the input data buffer respectively the output data buffer instead, which is reliable. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure - silently return -EINVAL for input error cases	Pascal van Leeuwen	1	-5/+20
	Driver was printing an error message for certain input error cases that should just return -EINVAL, which caused the related testmgr extra tests to flood the kernel message log. Ensured those cases remain silent while making some other device-specific errors a bit more verbose. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: inside-secure - keep ivsize for DES ECB modes at 0	Pascal van Leeuwen	1	-2/+0
	The driver incorrectly advertised the IV size for DES and 3DES ECB mode as being the DES blocksize of 8. This is incorrect as ECB mode does not need any IV. Signed-off-by: Pascal van Leeuwen <pvanleeuwen@verimatrix.com> Acked-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: ccree - notify TEE on FIPS tests errors	Gilad Ben-Yossef	1	-0/+23
	Register a FIPS test failure notifier and use it to notify TEE side of FIPS test failures on our side prior to panic. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: fips - add FIPS test failure notification chain	Gilad Ben-Yossef	3	-1/+21
	Crypto test failures in FIPS mode cause an immediate panic, but on some system the cryptographic boundary extends beyond just the Linux controlled domain. Add a simple atomic notification chain to allow interested parties to register to receive notification prior to us kicking the bucket. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: ccree - account for TEE not ready to report	Gilad Ben-Yossef	1	-1/+7
	When ccree driver runs it checks the state of the Trusted Execution Environment CryptoCell driver before proceeding. We did not account for cases where the TEE side is not ready or not available at all. Fix it by only considering TEE error state after sync with the TEE side driver. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Fixes: ab8ec9658f5a ("crypto: ccree - add FIPS support") CC: stable@vger.kernel.org # v4.17+ Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: ccree - drop legacy ivgen support	Gilad Ben-Yossef	9	-466/+17
	ccree had a mechanism for IV generation which was not compatible with the Linux seqiv or echainiv iv generator and was never used in any of the upstream versions so drop all the code implementing it. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-26	crypto: ccree - fix spelling mistake "configration" -> "configuration"	Colin Ian King	1	-1/+1
	There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-07-21	Linus 5.3-rc1	Linus Torvalds	1	-2/+2

2019-07-21	iommu/amd: fix a crash in iova_magazine_free_pfns	Qian Cai	1	-1/+1
	The commit b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method") incorrectly changed the checking from dma_ops_alloc_iova() in map_sg() causes a crash under memory pressure as dma_ops_alloc_iova() never return DMA_MAPPING_ERROR on failure but 0, so the error handling is all wrong. kernel BUG at drivers/iommu/iova.c:801! Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0 Call Trace: free_cpu_cached_iovas+0xbd/0x150 alloc_iova_fast+0x8c/0xba dma_ops_alloc_iova.isra.6+0x65/0xa0 map_sg+0x8c/0x2a0 scsi_dma_map+0xc6/0x160 pqi_aio_submit_io+0x1f6/0x440 [smartpqi] pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi] scsi_queue_rq+0x79c/0x1200 blk_mq_dispatch_rq_list+0x4dc/0xb70 blk_mq_sched_dispatch_requests+0x249/0x310 __blk_mq_run_hw_queue+0x128/0x200 blk_mq_run_work_fn+0x27/0x30 process_one_work+0x522/0xa10 worker_thread+0x63/0x5b0 kthread+0x1d2/0x1f0 ret_from_fork+0x22/0x40 Fixes: b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-21	hexagon: switch to generic version of pte allocation	Mike Rapoport	1	-32/+2
	The hexagon implementation pte_alloc_one(), pte_alloc_one_kernel(), pte_free_kernel() and pte_free() is identical to the generic except of lack of __GFP_ACCOUNT for the user PTEs allocation. Switch hexagon to use generic version of these functions. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-20	typo fix: it's d_make_root, not d_make_inode...	Al Viro	1	-1/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-20	dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples	Rob Herring	1	-0/+4
	Now that examples are validated against the DT schema, an error with required 'clocks' property missing is exposed: Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \ pinctrl@40020000: gpio@0: 'clocks' is a required property Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \ pinctrl@50020000: gpio@1000: 'clocks' is a required property Documentation/devicetree/bindings/pinctrl/st,stm32-pinctrl.example.dt.yaml: \ pinctrl@50020000: gpio@2000: 'clocks' is a required property Add the missing 'clocks' properties to the examples to fix the errors. Fixes: 2c9239c125f0 ("dt-bindings: pinctrl: Convert stm32 pinctrl bindings to json-schema") Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: linux-gpio@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20	dt-bindings: iio: ad7124: Fix dtc warnings in example	Rob Herring	1	-33/+38
	With the conversion to DT schema, the examples are now compiled with dtc. The ad7124 binding example has the following warning: Documentation/devicetree/bindings/iio/adc/adi,ad7124.example.dts:19.11-21: \ Warning (reg_format): /example-0/adc@0:reg: property has invalid length (4 bytes) (#address-cells == 1, #size-cells == 1) There's a default #size-cells and #address-cells values of 1 for examples. For examples needing different values such as this one on a SPI bus, they need to provide a SPI bus parent node. Fixes: 26ae15e62d3c ("Convert AD7124 bindings documentation to YAML format.") Cc: Jonathan Cameron <jic23@kernel.org> Cc: linux-iio@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20	dt-bindings: iio: avia-hx711: Fix avdd-supply typo in example	Rob Herring	1	-1/+1
	Now that examples are validated against the DT schema, a typo in avia-hx711 example generates a warning: Documentation/devicetree/bindings/iio/adc/avia-hx711.example.dt.yaml: weight: 'avdd-supply' is a required property Fix the typo. Fixes: 5150ec3fe125 ("avia-hx711.yaml: transform DT binding to YAML") Cc: Andreas Klinger <ak@it-klinger.de> Cc: Jonathan Cameron <jic23@kernel.org> Cc: linux-iio@vger.kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20	dt-bindings: pinctrl: aspeed: Fix AST2500 example errors	Rob Herring	1	-4/+1
	The schema examples are now validated against the schema itself. The AST2500 pinctrl schema has a couple of errors: Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.example.dt.yaml: \ example-0: $nodename:0: 'example-0' does not match '^(bus\|soc\|axi\|ahb\|apb)(@[0-9a-f]+)?$' Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.example.dt.yaml: \ pinctrl: aspeed,external-nodes: [[1, 2]] is too short Fixes: 0a617de16730 ("dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema") Cc: Andrew Jeffery <andrew@aj.id.au> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Joel Stanley <joel@jms.id.au> Cc: linux-aspeed@lists.ozlabs.org Cc: linux-gpio@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Acked-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20	dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errors	Rob Herring	2	-2/+6
	The Aspeed pinctl schema have errors in the 'compatible' schema: Documentation/devicetree/bindings/pinctrl/aspeed,ast2400-pinctrl.yaml: \ properties:compatible:enum: ['aspeed', 'ast2400-pinctrl', 'aspeed', 'g4-pinctrl'] has non-unique elements Documentation/devicetree/bindings/pinctrl/aspeed,ast2500-pinctrl.yaml: \ properties:compatible:enum: ['aspeed', 'ast2500-pinctrl', 'aspeed', 'g5-pinctrl'] has non-unique elements Flow style sequences have to be quoted if the vales contain ','. Fix this by using the more common one line per entry formatting. Fixes: 0a617de16730 ("dt-bindings: pinctrl: aspeed: Convert AST2500 bindings to json-schema") Fixes: 07457937bb5c ("dt-bindings: pinctrl: aspeed: Convert AST2400 bindings to json-schema") Cc: Andrew Jeffery <andrew@aj.id.au> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Joel Stanley <joel@jms.id.au> Cc: linux-aspeed@lists.ozlabs.org Cc: linux-gpio@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Acked-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20	dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodes	Rob Herring	1	-82/+61
	Matching on the 'cpus' node was a bad choice because the schema is incorrectly applied to non-RiscV cpus nodes. As we now have a common cpus schema which checks the general structure, it is also redundant to do so in the Risc-V CPU schema. The downside is one could conceivably mix different architecture's cpu nodes or have typos in the compatible string. The latter problem pretty much exists for every schema. Acked-by: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20	dt-bindings: Ensure child nodes are of type 'object'	Rob Herring	6	-0/+8
	Properties which are child node definitions need to have an explict type. Otherwise, a matching (DT) property can silently match when an error is desired. Fix this up tree-wide. Once this is fixed, the meta-schema will enforce this on any child node definitions. Cc: Chen-Yu Tsai <wens@csie.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Marek Vasut <marek.vasut@gmail.com> Cc: Richard Weinberger <richard@nod.at> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: linux-mtd@lists.infradead.org Cc: linux-gpio@vger.kernel.org Cc: linux-stm32@st-md-mailman.stormreply.com Cc: linux-spi@vger.kernel.org Acked-by: Miquel Raynal <miquel.raynal@bootlin.com> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: Rob Herring <robh@kernel.org>
2019-07-20	x86/entry/64: Prevent clobbering of saved CR2 value	Thomas Gleixner	1	-1/+10
	The recent fix for CR2 corruption introduced a new way to reliably corrupt the saved CR2 value. CR2 is saved early in the entry code in RDX, which is the third argument to the fault handling functions. But it missed that between saving and invoking the fault handler enter_from_user_mode() can be called. RDX is a caller saved register so the invoked function can freely clobber it with the obvious consequences. The TRACE_IRQS_OFF call is safe as it calls through the thunk which preserves RDX, but TRACE_IRQS_OFF_DEBUG is not because it also calls into C-code outside of the thunk. Store CR2 in R12 instead which is a callee saved register and move R12 to RDX just before calling the fault handler. Fixes: a0d14b8909de ("x86/mm, tracing: Fix CR2 corruption") Reported-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907201020540.1782@nanos.tec.linutronix.de
2019-07-20	smp: Warn on function calls from softirq context	Peter Zijlstra	1	-0/+16
	It's clearly documented that smp function calls cannot be invoked from softirq handling context. Unfortunately nothing enforces that or emits a warning. A single function call can be invoked from softirq context only via smp_call_function_single_async(). The only legit context is task context, so add a warning to that effect. Reported-by: luferry <luferry@163.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190718160601.GP3402@hirez.programming.kicks-ass.net
2019-07-20	KVM: x86: Add fixed counters to PMU filter	Eric Hankland	3	-12/+35
	Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist fixed counters. Signed-off-by: Eric Hankland <ehankland@google.com> [No need to check padding fields for zero. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20	KVM: nVMX: do not use dangling shadow VMCS after guest reset	Paolo Bonzini	1	-1/+7
	If a KVM guest is reset while running a nested guest, free_nested will disable the shadow VMCS execution control in the vmcs01. However, on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync the VMCS12 to the shadow VMCS which has since been freed. This causes a vmptrld of a NULL pointer on my machime, but Jan reports the host to hang altogether. Let's see how much this trivial patch fixes. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20	KVM: VMX: dump VMCS on failed entry	Paolo Bonzini	1	-0/+1
	This is useful for debugging, and is ratelimited nowadays. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20	KVM: x86/vPMU: refine kvm_pmu err msg when event creation failed	Like Xu	1	-2/+2
	If a perf_event creation fails due to any reason of the host perf subsystem, it has no chance to log the corresponding event for guest which may cause abnormal sampling data in guest result. In debug mode, this message helps to understand the state of vPMC and we may not limit the number of occurrences but not in a spamming style. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20	KVM: s390: Use kvm_vcpu_wake_up in kvm_s390_vcpu_wakeup	Wanpeng Li	1	-20/+3
	Use kvm_vcpu_wake_up() in kvm_s390_vcpu_wakeup(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20	KVM: Boost vCPUs that are delivering interrupts	Wanpeng Li	3	-5/+10
	Inspired by commit 9cac38dd5d (KVM/s390: Set preempted flag during vcpu wakeup and interrupt delivery), we want to also boost not just lock holders but also vCPUs that are delivering interrupts. Most smp_call_function_many calls are synchronous, so the IPI target vCPUs are also good yield candidates. This patch introduces vcpu->ready to boost vCPUs during wakeup and interrupt delivery time; unlike s390 we do not reuse vcpu->preempted so that voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin, but vmx_vcpu_pi_put is not affected (VT-d PI handles voluntary preemption separately, in pi_pre_block). Testing on 80 HT 2 socket Xeon Skylake server, with 80 vCPUs VM 80GB RAM: ebizzy -M vanilla boosting improved 1VM 21443 23520 9% 2VM 2800 8000 180% 3VM 1800 3100 72% Testing on my Haswell desktop 8 HT, with 8 vCPUs VM 8GB RAM, two VMs, one running ebizzy -M, the other running 'stress --cpu 2': w/ boosting + w/o pv sched yield(vanilla) vanilla boosting improved 1570 4000 155% w/ boosting + w/ pv sched yield(vanilla) vanilla boosting improved 1844 5157 179% w/o boosting, perf top in VM: 72.33% [kernel] [k] smp_call_function_many 4.22% [kernel] [k] call_function_i 3.71% [kernel] [k] async_page_fault w/ boosting, perf top in VM: 38.43% [kernel] [k] smp_call_function_many 6.31% [kernel] [k] async_page_fault 6.13% libc-2.23.so [.] __memcpy_avx_unaligned 4.88% [kernel] [k] call_function_interrupt Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20	KVM: selftests: Remove superfluous define from vmx.c	Thomas Huth	1	-2/+0
	The code in vmx.c does not use "program_invocation_name", so there is no need to "#define _GNU_SOURCE" here. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20	KVM: SVM: Fix detection of AMD Errata 1096	Liran Alon	1	-7/+35
	When CPU raise #NPF on guest data access and guest CR4.SMAP=1, it is possible that CPU microcode implementing DecodeAssist will fail to read bytes of instruction which caused #NPF. This is AMD errata 1096 and it happens because CPU microcode reading instruction bytes incorrectly attempts to read code as implicit supervisor-mode data accesses (that is, just like it would read e.g. a TSS), which are susceptible to SMAP faults. The microcode reads CS:RIP and if it is a user-mode address according to the page tables, the processor gives up and returns no instruction bytes. In this case, GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly return 0 instead of the correct guest instruction bytes. Current KVM code attemps to detect and workaround this errata, but it has multiple issues: 1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1, which is required for encountering a SMAP fault. 2) It assumes SMAP faults can only occur when guest CPL==3. However, in case guest CR4.SMEP=0, the guest can execute an instruction which reside in a user-accessible page with CPL<3 priviledge. If this instruction raise a #NPF on it's data access, then CPU DecodeAssist microcode will still encounter a SMAP violation. Even though no sane OS will do so (as it's an obvious priviledge escalation vulnerability), we still need to handle this semanticly correct in KVM side. Note that (2) is a useful optimization, because CR4.SMAP=1 is an easy triggerable condition and guests usually enable SMAP together with SMEP. If the vCPU has CR4.SMEP=1, the errata could indeed be encountered onlt at guest CPL==3; otherwise, the CPU would raise a SMEP fault to guest instead of #NPF. We keep this condition to avoid false positives in the detection of the errata. In addition, to avoid future confusion and improve code readbility, include details of the errata in code and not just in commit message. Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)") Cc: Singh Brijesh <brijesh.singh@amd.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-07-20	KVM: LAPIC: Inject timer interrupt via posted interrupt	Wanpeng Li	7	-36/+87
	Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs reside. There is no hardware virtual timer on Intel for guest like ARM, so both programming timer in guest and the emulated timer fires incur vmexits. This patch tries to avoid vmexit when the emulated timer fires, at least in dedicated instance scenario when nohz_full is enabled. In that case, the emulated timers can be offload to the nearest busy housekeeping cpus since APICv has been found for several years in server processors. The guest timer interrupt can then be injected via posted interrupts, which are delivered by the housekeeping cpu once the emulated timer fires. The host should tuned so that vCPUs are placed on isolated physical processors, and with several pCPUs surplus for busy housekeeping. If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode, ~3% redis performance benefit can be observed on Skylake server, and the number of external interrupt vmexits drops substantially. Without patch VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% ) While with patch: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% ) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>