wireguard-linux - WireGuard for the Linux kernel

Age	Commit message (Collapse)	Author	Files	Lines
2016-03-14	drm/nouveau/instmem/gk20a: set DMA mask early	Alexandre Courbot	1	-0/+9
	DMA mask is typically set in nouveau_ttm_init(), but this function is called late during initialization and GK20A's instmem will have called DMA functions before this happens. Having a wrongly set DMA mask can result in the use of unneeded bounce buffers. Set it early to avoid this. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gm206: remove implementation, it's now identical to gm200	Ben Skeggs	6	-93/+1
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gm200: switch over to using sw_nonctx from firmware	Ben Skeggs	3	-204/+1
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gm200: switch over to using sw_method_init from firmware	Ben Skeggs	3	-358/+1
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gm200: switch over to using sw_bundle_init from firmware	Ben Skeggs	3	-278/+1
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gm200: switch over to using sw_ctx from firmware	Ben Skeggs	3	-317/+1
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/bios/extdev: also parse v4.1 table	Karol Herbst	1	-1/+1
	Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Martin Peres <martin.peres@free.fr>
2016-03-14	drm/nouveau/hwmon: don't require therm to be valid to get any data	Karol Herbst	1	-20/+19
	Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Martin Peres <martin.peres@free.fr>
2016-03-14	drm/nouveau/hwmon: add power consumption	Karol Herbst	2	-0/+37
	v2: expose only if the sensor reading is valid Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Martin Peres <martin.peres@free.fr>
2016-03-14	drm/nouveau/iccsense: implement for ina209, ina219 and ina3221	Karol Herbst	8	-1/+291
	based on Martins initial work v3: fix ina2x9 calculations v4: don't kmalloc(0), fix the lsb/pga stuff v5: add a field to tell if the power reading may be invalid add nkvm_iccsense_read_all function check for the device on the i2c bus Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Martin Peres <martin.peres@free.fr>
2016-03-14	drm/nouveau/nvbios/iccsense: add parsing of the SENSE table	Martin Peres	3	-0/+117
	Karol Herbst: v4: don't kmalloc(0) v5: stricter validation Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Martin Peres <martin.peres@free.fr>
2016-03-14	drm/nouveau/subdev/iccsense: add new subdev for power sensors	Martin Peres	9	-146/+216
	Signed-off-by: Karol Herbst <nouveau@karolherbst.de> Reviewed-by: Martin Peres <martin.peres@free.fr>
2016-03-14	drm/nouveau/secboot/gm20b: add secure boot support	Alexandre Courbot	5	-3/+242
	Add secure boot support for the GM20B chip found in Tegra X1. Secure boot on Tegra works slightly differently from desktop, notably in the way the WPR region is set up. In addition, the firmware bootloaders use a slightly different header format. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/secboot/gm200: add secure-boot support	Alexandre Courbot	6	-7/+1674
	Add secure-boot for the dGPU set of GM20X chips, using the PMU as the high-secure falcon. This work is based on Deepak Goyal's initial port of Secure Boot to Nouveau. v2. use proper memory target function Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gm200: do not load firmware for secure falcons	Alexandre Courbot	1	-6/+15
	Secure falcons' firmware is managed by secboot. Do not load it in GR for them. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gf100: add support for securely-managed falcons	Alexandre Courbot	1	-4/+24
	Start securely-managed falcons using secboot functions since the process for them is different from just writing CPUCTL. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/core: add support for secure boot	Alexandre Courbot	9	-0/+401
	On GM200 and later GPUs, firmware for some essential falcons (notably GR ones) must be authenticated by a NVIDIA-produced signature and loaded by a high-secure falcon in order to be able to access privileged registers, in a process known as Secure Boot. Secure Boot requires building a binary blob containing the firmwares and signatures of the falcons to be loaded. This blob is then given to a high-secure falcon running a signed loader firmware that copies the blob into a write-protected region, checks that the signatures are valid, and finally loads the verified firmware into the managed falcons and switches them to privileged mode. This patch adds infrastructure code to support this process on chips that require it. v2: - The IRQ mask of the PMU falcon was left - replace it with the proper irq_mask variable. - The falcon reset procedure expecting a falcon in an initialized state, which was accidentally provided by the PMU subdev. Make sure that secboot can manage the falcon on its own. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gm200: load external firmware and bundles	Alexandre Courbot	5	-29/+54
	Load firmware and bundles in GM200's constructor. The previously called GF100 function did not care about the bundles. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gk20a: share external bundles loading functions	Alexandre Courbot	2	-3/+11
	There functions are going to be used by other chips that rely on NVIDIA-provided firmware. Export them. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gk20a: simplify external bundle loading functions	Alexandre Courbot	1	-47/+73
	Make these functions easier to use by handling memory management from within. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gf100: load firmware in outer function	Alexandre Courbot	2	-10/+22
	The firmwares required by GR may vary from chip to chip, especially with the introduction of secure boot and NVIDIA-provided firmwares. Move the firmware loading outside of gf100_gr_ctor so other chips may still call it while managing their firmwares themselves. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gk20a: move firmware bundle release to gf100	Alexandre Courbot	4	-19/+12
	Some members of gf100_gr were freed by the gk20a driver. That's not where it should be done - free them in gf100 so other chips that use NVIDIA-provided firmware free these structures properly. This also removes the need for a GK20A-specific destructor. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/core: add gpuobj memcpy helper functions	Alexandre Courbot	2	-0/+24
	Add memcpy functions to copy a buffer to a gpuobj and vice-versa. This will be used by the secure boot code. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gm200: enable graphics device	Ben Skeggs	1	-0/+1
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gm200: s/gm204/gm200/	Ben Skeggs	10	-128/+128
	Most of the per-chipset differences will go away when we fully switch to using the register lists provided by the firmware files, which will leave all the remaining code "belonging" to GM200. This is a preemptive rename from GM204 to GM200. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/bios/devinit: properly handle unknown generic conditions	Ben Skeggs	1	-2/+3
	Upon encountering an unknown condition code, the script interpreter is supposed to skip 'size' bytes and continue at the next devinit token. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/bios/devinit: rename INIT_DP_CONDITION to INIT_GENERIC_CONDITION	Ben Skeggs	1	-5/+5
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/ltc/gm107: fix slice intr offset	Ben Skeggs	1	-1/+1
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/devinit/gf100-: detect if BIOS invoked devinit	Alexandre Courbot	4	-3/+16
	It is not advisable to perform devinit if it has already been done. VBIOS will very likely have invoked devinit if the GPU is the primary graphics device, but there is no accurate way to detect this fact yet. This patch adds such a method for gf100 and later chips, by means of the NV_PTOP_SCRATCH1_DEVINIT_COMPLETED bit. This bit is set to 1 by devinit, and reset to 0 when the GPU is powered. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/devinit/nv50: remove unneeded variable	Alexandre Courbot	1	-7/+6
	We never use any nv50-specific member in this nv50_devinit_preinit(). Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau: s/gm204/gm200/ in a number of places	Ben Skeggs	39	-150/+150
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau: recognise GM200 chipset	Ben Skeggs	1	-0/+31
	Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/device/tegra: fix uninitialized IRQ number	Alexandre Courbot	1	-1/+0
	nvkm_device_tegra_new initializes the irq member of the Tegra device to -1 in order to signal that it is uninitialized. However, nvkm_device_tegra_fini tests it against 0 to check whether an IRQ has been allocated or not. This leads to free_irq being called on -1 during device initialization. Fix this by using 0 as the uninitialized value everywhere. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/device: call nvkm_device_fini if nvkm_device_init fails	Alexandre Courbot	1	-0/+2
	nvkm_device_fini is never called if a failure occurs in nvkm_device_init, even when unloading the module. This can lead to a resources leak (one example is the Tegra interrupt which would never be freed in that case). Fix this by calling nvkm_device_fini in nvkm_device_init's failure path. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/gr/gf100: use the nvkm_firmware functions	Alexandre Courbot	1	-15/+3
	Use the nvkm_firmware_* functions when loading external firmware to avoid duplicate code. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/core: add firmware handling functions	Alexandre Courbot	3	-0/+73
	Add two functions nvkm_firmware_get() and nvkm_firmware_put() to load a firmware file and free its resources, respectively. Since firmware files are becoming a necessity for new GPUs, and their location has been standardized to nvidia/chip/, this will prevent duplicate and error-prone name-generation code. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-14	drm/nouveau/ltc/gm107: wait on relevant bit in gm107_ltc_cbc_wait	Alexandre Courbot	1	-4/+2
	Patch "ltc/gm107: use nvkm_mask to set cbc_ctrl1" sets the 3rd bit of the CTRL1 register instead of writing it entirely in gm107_ltc_cbc_clear(). As a counterpart, gm107_ltc_cbc_wait() must also be modified to wait on that single bit only, otherwise a timeout may occur if some other bit of that register is set. This happened at least on GM206 when running glmark2-drm. While we are at it, use the more compact nvkm_wait_msec() to wait for the bit to clear. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2016-03-13	drm/exynos: add DRM_EXYNOS_GEM_MAP ioctl	Joonyoung Shim	4	-1/+31
	The commit d931589c01a2 ("drm/exynos: remove DRM_EXYNOS_GEM_MAP_OFFSET ioctl") removed it same with the ioctl that this patch adds. The reason that removed DRM_EXYNOS_GEM_MAP_OFFSET was we could use DRM_IOCTL_MODE_MAP_DUMB. Both did exactly same thing. Now we again will revive it as DRM_EXYNOS_GEM_MAP because of render node. DRM_IOCTL_MODE_MAP_DUMB isn't permitted in render node. Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2016-03-11	ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window	Thomas Petazzoni	9	-19/+19
	When the Crypto SRAM mappings were added to the Device Tree files describing the Armada XP boards in commit c466d997bb16 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards"), the fact that those mappings were overlaping with the PCIe memory aperture was overlooked. Due to this, we currently have for all Armada XP platforms a situation that looks like this: Memory mapping on Armada XP boards with internal registers at 0xf1000000: - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf8000000 -> 0xffe0000 126M PCIe memory aperture - 0xf8100000 -> 0xf8110000 64KB Crypto SRAM #0 => OVERLAPS WITH PCIE ! - 0xf8110000 -> 0xf8120000 64KB Crypto SRAM #1 => OVERLAPS WITH PCIE ! - 0xffe00000 -> 0xfff00000 1M PCIe I/O aperture - 0xfff0000 -> 0xffffffff 1M BootROM The overlap means that when PCIe devices are added, depending on their memory window needs, they might or might not be mapped into the physical address space. Indeed, they will not be mapped if the area allocated in the PCIe memory aperture by the PCI core overlaps with one of the Crypto SRAM. Typically, a Intel IGB PCIe NIC that needs 8MB of PCIe memory will see its PCIe memory window allocated from 0xf80000000 for 8MB, which overlaps with the Crypto SRAM windows. Due to this, the PCIe window is not created, and any attempt to access the PCIe window makes the kernel explode: [ 3.302213] igb: Copyright (c) 2007-2014 Intel Corporation. [ 3.307841] pci 0000:00:09.0: enabling device (0140 -> 0143) [ 3.313539] mvebu_mbus: cannot add window '4:f8', conflicts with another window [ 3.320870] mvebu-pcie soc:pcie-controller: Could not create MBus window at [mem 0xf8000000-0xf87fffff]: -22 [ 3.330811] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf08c0018 This problem does not occur on Armada 370 boards, because we use the following memory mapping (for boards that have internal registers at 0xf1000000): - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 => OK ! - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM Obviously, the solution is to align the location of the Crypto SRAM mappings of Armada XP to be similar with the ones on Armada 370, i.e have them between the "internal registers" area and the beginning of the PCIe aperture. However, we have a special case with the OpenBlocks AX3-4 platform, which has a 128 MB NOR flash. Currently, this NOR flash is mapped from 0xf0000000 to 0xf8000000. This is possible because on OpenBlocks AX3-4, the internal registers are not at 0xf1000000. And this explains why the Crypto SRAM mappings were not configured at the same place on Armada XP. Hence, the solution is two-fold: (1) Move the NOR flash mapping on Armada XP OpenBlocks AX3-4 from 0xe8000000 to 0xf0000000. This frees the 0xf0000000 -> 0xf80000000 space. (2) Move the Crypto SRAM mappings on Armada XP to be similar to Armada 370 (except of course that Armada XP has two Crypto SRAM and not one). After this patch, the memory mapping on Armada XP boards with registers at 0xf1 is: - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 - 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1 - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM And the memory mapping for the special case of the OpenBlocks AX3-4 (internal registers at 0xd0000000, NOR of 128 MB): - 0x00000000 -> 0xc0000000 3G RAM - 0xd0000000 -> 0xd1000000 1M internal registers - 0xe800000 -> 0xf0000000 128M NOR flash - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 - 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1 - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM Fixes: c466d997bb16 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards") Reported-by: Phil Sutter <phil@nwl.cc> Cc: Phil Sutter <phil@nwl.cc> Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2016-03-11	drm/i915: Actually retry with bit-banging after GMBUS timeout	Ville Syrjälä	1	-0/+6
	After the GMBUS transfer times out, we set force_bit=1 and return -EAGAIN expecting the i2c core to call the .master_xfer hook again so that we will retry the same transfer via bit-banging. This is in case the gmbus hardware is somehow faulty. Unfortunately we left adapter->retries to 0, meaning the i2c core didn't actually do the retry. Let's tell the core we want one retry when we return -EAGAIN. Note that i2c-algo-bit also uses this retry count for some internal retries, so we'll end up increasing those a bit as well. Cc: Jani Nikula <jani.nikula@intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Fixes: bffce907d640 ("drm/i915: abstract i2c bit banging fallback in gmbus xfer") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1457366220-29409-2-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 8b1f165a4a8f64c28cf42d10e1f4d3b451dedc51) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-03-10	[media] media-device: map new functions into old types for legacy API	Mauro Carvalho Chehab	2	-1/+28
	The legacy media controller userspace API exposes entity types that carry both type and function information. The new API replaces the type with a function. It preserves backward compatibility by defining legacy functions for the existing types and using them in drivers. This works fine, as long as newer entity functions won't be added. Unfortunately, some tools, like media-ctl with --print-dot argument rely on the now legacy MEDIA_ENT_T_V4L2_SUBDEV and MEDIA_ENT_T_DEVNODE numeric ranges to identify what entities will be shown. Also, if the entity doesn't match those ranges, it will ignore the major/minor information on devnodes, and won't be getting the devnode name via udev or sysfs. As we're now adding devices outside the old range, the legacy ioctl needs to map the new entity functions into a type at the old range, or otherwise we'll have a regression. Detected on all released media-ctl versions (e. g. versions <= 1.10). Fix this by deriving the type from the function to emulate the legacy API if the function isn't in the legacy functions range. Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-03-10	dmaengine: at_xdmac: fix residue computation	Ludovic Desroches	1	-3/+39
	When computing the residue we need two pieces of information: the current descriptor and the remaining data of the current descriptor. To get that information, we need to read consecutively two registers but we can't do it in an atomic way. For that reason, we have to check manually that current descriptor has not changed. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Suggested-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Reported-by: David Engraf <david.engraf@sysgo.com> Tested-by: David Engraf <david.engraf@sysgo.com> Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Cc: stable@vger.kernel.org #4.1 and later Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-03-10	KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0	Paolo Bonzini	1	-1/+3
	KVM has special logic to handle pages with pte.u=1 and pte.w=0 when CR0.WP=1. These pages' SPTEs flip continuously between two states: U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed) and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed). When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1. When guest EFER has the NX bit cleared, the reserved bit check thinks that the latter state is invalid; teach it that the smep_andnot_wp case will also use the NX bit of SPTEs. Cc: stable@vger.kernel.org Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.inel.com> Fixes: c258b62b264fdc469b6d3610a907708068145e3b Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-10	KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo	Paolo Bonzini	2	-14/+25
	Yes, all of these are needed. :) This is admittedly a bit odd, but kvm-unit-tests access.flat tests this if you run it with "-cpu host" and of course ept=0. KVM runs the guest with CR0.WP=1, so it must handle supervisor writes specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0. When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and restarts execution. This will still cause a user write to fault, while supervisor writes will succeed. User reads will fault spuriously now, and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads will be enabled and supervisor writes disabled, going back to the originary situation where supervisor writes fault spuriously. When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0. If the guest has not enabled NX, the result is a continuous stream of page faults due to the NX bit being reserved. The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry control, so they do not use user-return notifiers for EFER---if they did, EFER.NX would be forced to the same value as the host). There is another bug in the reserved bit check, which I've split to a separate patch for easier application to stable kernels. Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-10	s390/mm: four page table levels vs. fork	Martin Schwidefsky	2	-10/+30
	The fork of a process with four page table levels is broken since git commit 6252d702c5311ce9 "[S390] dynamic page tables." All new mm contexts are created with three page table levels and an asce limit of 4TB. If the parent has four levels dup_mmap will add vmas to the new context which are outside of the asce limit. The subsequent call to copy_page_range will walk the three level page table structure of the new process with non-zero pgd and pud indexes. This leads to memory clobbers as the pgd_index and the pud_index is added to the mm->pgd pointer without a pgd_deref in between. The init_new_context() function is selecting the number of page table levels for a new context. The function is used by mm_init() which in turn is called by dup_mm() and mm_alloc(). These two are used by fork() and exec(). The init_new_context() function can distinguish the two cases by looking at mm->context.asce_limit, for fork() the mm struct has been copied and the number of page table levels may not change. For exec() the mm_alloc() function set the new mm structure to zero, in this case a three-level page table is created as the temporary stack space is located at STACK_TOP_MAX = 4TB. This fixes CVE-2016-2143. Reported-by: Marcin Kościelnicki <koriakin@0x04.net> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-09	ext4: iterate over buffer heads correctly in move_extent_per_page()	Eryu Guan	1	-0/+1
	In commit bcff24887d00 ("ext4: don't read blocks from disk after extents being swapped") bh is not updated correctly in the for loop and wrong data has been written to disk. generic/324 catches this on sub-page block size ext4. Fixes: bcff24887d00 ("ext4: don't read blocks from disk after extentsbeing swapped") Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-09	dma-mapping: avoid oops when parameter cpu_addr is null	Zhen Lei	1	-1/+1
	To keep consistent with kfree, which tolerate ptr is NULL. We do this because sometimes we may use goto statement, so that success and failure case can share parts of the code. But unfortunately, dma_free_coherent called with parameter cpu_addr is null will cause oops, such as showed below: Unable to handle kernel paging request at virtual address ffffffc020d3b2b8 pgd = ffffffc083a61000 [ffffffc020d3b2b8] pgd=0000000000000000, pud=0000000000000000 CPU: 4 PID: 1489 Comm: malloc_dma_1 Tainted: G O 4.1.12 #1 Hardware name: ARM64 (DT) PC is at __dma_free_coherent.isra.10+0x74/0xc8 LR is at __dma_free+0x9c/0xb0 Process malloc_dma_1 (pid: 1489, stack limit = 0xffffffc0837fc020) [...] Call trace: __dma_free_coherent.isra.10+0x74/0xc8 __dma_free+0x9c/0xb0 malloc_dma+0x104/0x158 [dma_alloc_coherent_mtmalloc] kthread+0xec/0xfc Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09	mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers	Jan Stancek	1	-2/+2
	Replace ENOTSUPP with EOPNOTSUPP. If hugepages are not supported, this value is propagated to userspace. EOPNOTSUPP is part of uapi and is widely supported by libc libraries. It gives nicer message to user, rather than: # cat /proc/sys/vm/nr_hugepages cat: /proc/sys/vm/nr_hugepages: Unknown error 524 And also LTP's proc01 test was failing because this ret code (524) was unexpected: proc01 1 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524 proc01 2 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524 proc01 3 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524 Signed-off-by: Jan Stancek <jstancek@redhat.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09	memremap: check pfn validity before passing to pfn_to_page()	Ard Biesheuvel	1	-2/+2
	In memremap's helper function try_ram_remap(), we dereference a struct page pointer that was derived from a PFN that is known to be covered by a 'System RAM' iomem region, and is thus assumed to be a 'valid' PFN, i.e., a PFN that has a struct page associated with it and is covered by the kernel direct mapping. However, the assumption that there is a 1:1 relation between the System RAM iomem region and the kernel direct mapping is not universally valid on all architectures, and on ARM and arm64, 'System RAM' may include regions for which pfn_valid() returns false. Generally speaking, both __va() and pfn_to_page() should only ever be called on PFNs/physical addresses for which pfn_valid() returns true, so add that check to try_ram_remap(). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09	mm, thp: fix migration of PTE-mapped transparent huge pages	Kirill A. Shutemov	1	-1/+1
	We don't have native support of THP migration, so we have to split huge page into small pages in order to migrate it to different node. This includes PTE-mapped huge pages. I made mistake in refcounting patchset: we don't actually split PTE-mapped huge page in queue_pages_pte_range(), if we step on head page. The result is that the head page is queued for migration, but none of tail pages: putting head page on queue takes pin on the page and any subsequent attempts of split_huge_pages() would fail and we skip queuing tail pages. unmap_and_move_huge_page() will eventually split the huge pages, but only one of 512 pages would get migrated. Let's fix the situation. Fixes: 248db92da13f2507 ("migrate_pages: try to split pages on queuing") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>