linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-03-13	MIPS: smp.c: Fix uninitialised temp_foreign_map	James Hogan	1	-0/+1
	When calculate_cpu_foreign_map() recalculates the cpu_foreign_map cpumask it uses the local variable temp_foreign_map without initialising it to zero. Since the calculation only ever sets bits in this cpumask any existing bits at that memory location will remain set and find their way into cpu_foreign_map too. This could potentially lead to cache operations suboptimally doing smp calls to multiple VPEs in the same core, even though the VPEs share primary caches. Therefore initialise temp_foreign_map using cpumask_clear() before use. Fixes: cccf34e9411c ("MIPS: c-r4k: Fix cache flushing for MT cores") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12759/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-03-13	MIPS: Fix build error when SMP is used without GIC	Hauke Mehrtens	1	-3/+4
	The MIPS_GIC_IPI should only be selected when MIPS_GIC is also selected, otherwise it results in a compile error. smp-gic.c uses some functions from include/linux/irqchip/mips-gic.h like plat_ipi_call_int_xlate() which are only added to the header file when MIPS_GIC is set. The Lantiq SoC does not use the GIC, but supports SMP. The calls top the functions from smp-gic.c are already protected by some #ifdefs The first part of this was introduced in commit 72e20142b2bf ("MIPS: Move GIC IPI functions out of smp-cmp.c") Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Cc: Paul Burton <paul.burton@imgtec.com> Cc: stable@vger.kernel.org # v3.15+ Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12774/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-03-13	ld-version: Fix awk regex compile failure	James Hogan	1	-1/+1
	The ld-version.sh script fails on some versions of awk with the following error, resulting in build failures for MIPS: awk: scripts/ld-version.sh: line 4: regular expression compile failed (missing '(') This is due to the regular expression ".*)", meant to strip off the beginning of the ld version string up to the close bracket, however brackets have a meaning in regular expressions, so lets escape it so that awk doesn't expect a corresponding open bracket. Fixes: ccbef1674a15 ("Kbuild, lto: add ld-version and ld-ifversion ...") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: James Hogan <james.hogan@imgtec.com> Tested-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Cc: Michal Marek <mmarek@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-mips@linux-mips.org Cc: linux-kbuild@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 4.4.x- Patchwork: https://patchwork.linux-mips.org/patch/12838/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-03-13	MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4780	Aaro Koskinen	1	-1/+1
	Ingenic SoC declares ZBOOT support, but debug definitions are missing for MACH_JZ4780 resulting in a build failure when DEBUG_ZBOOT is set. The UART addresses are same as with JZ4740, so fix by covering JZ4780 with those as well. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12830/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-03-13	tools/power turbostat: bugfix: TDP MSRs print bits fixing	Chen Yu	1	-10/+10
	MSR_CONFIG_TDP_NOMINAL: should print all 8 bits of base_ratio (bit 0:7) 0xFF MSR_CONFIG_TDP_LEVEL_1: should print all 15 bits of PKG_MIN_PWR_LVL1 (bit 48:62) 0x7FFF should print all 15 bits of PKG_MAX_PWR_LVL1 (bit 32:46) 0x7FFF should print all 8 bits of LVL1_RATIO (bit 16:23) 0xFF should print all 15 bits of PKG_TDP_LVL1 (bit 0:14) 0x7FFF And the same modification to MSR_CONFIG_TDP_LEVEL_2. MSR_TURBO_ACTIVATION_RATIO: should print all 8 bits of MAX_NON_TURBO_RATIO (bit 0:7) 0xFF Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump	Len Brown	1	-1/+1
	MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=0: unlimited) should print as MSR_NHM_SNB_PKG_CST_CFG_CTL: 0x1e008008 (...pkg-cstate-limit=8: unlimited) Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: call __cpuid() instead of __get_cpuid()	Len Brown	1	-7/+7
	turbostat already checks whether calling each cpuid leavf is legal, and it doesn't look at the function return value, so call the simpler gcc intrinsic __cpuid() instead of __get_cpuid(). syntax only, no functional change Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: indicate SMX and SGX support	Len Brown	1	-1/+27
	SGX presence is related to a SKL power workaround, so lets show when that is enabled. Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: detect and work around syscall jitter	Len Brown	1	-1/+50
	The accuracy of Bzy_Mhz and Busy% depend on reading the TSC, APERF, and MPERF close together in time. When there is a very short measurement interval, or a large system is profoundly idle, the changes in APERF and MPERF may be very small. They can be small enough that an expensive interrupt between reading APERF and MPERF can cause the APERF/MPERF ratio to become inaccurate, resulting in invalid calculation and display of Bzy_MHz. A dummy APERF read of APERF makes this problem much more rare. Apparently this 1st systemn call after exiting a long stretch of idle is when we typically see expensive timer interrupts that cause large jitter. For the cases that dummy APERF read fails to prevent, we compare the latency of the APERF and MPERF reads. If they differ by more than 2x, we re-issue them. Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: show GFX%rc6	Len Brown	1	-0/+45
	The column "GFX%c6" show the percentage of time the GPU is in the "render C6" state, rc6. Deep package C-states on several systems depend on the GPU being in RC6. This information comes from the counter /sys/class/drm/card0/power/rc6_residency_ms, as read before and after the measurement interval. Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: show GFXMHz	Len Brown	1	-1/+49
	Under the column "GFXMHz", show a snapshot of this attribute: /sys/class/graphics/fb0/device/drm/card0/gt_cur_freq_mhz This is an instantaneous snapshot of what sysfs presents at the end of the measurement interval. turbostat does not average or otherwise perform any math on this value. Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: show IRQs per CPU	Len Brown	1	-4/+122
	The new IRQ column shows how many interrupts have occurred on each CPU during the measurement inteval. This information comes from the difference between /proc/interrupts shapshots made before and after the measurement interval. The first row, the system summary, shows the sum of the IRQS for all CPUs during that interval. Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: make fewer systems calls	Len Brown	1	-10/+41
	skip the open(2)/close(2) on each msr read by keeping the /dev/cpu/*/msr files open. The remaining read(2) is generally far fewer cycles than the removed open(2) system call. Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: fix compiler warnings	Len Brown	1	-4/+4
	Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: add --out option for saving output in a file	Len Brown	2	-135/+160
	By default... Turbostat --debug gconfiguration info goes to stderr. In FORK mode, turbostat statistics go to stderr. In PERIODIC mode, turbostat statistics go to stdout. These defaults do not change, but an option "--out file" will send all output above only to the specified file. Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: re-name "%Busy" field to "Busy%"	Len Brown	2	-11/+11
	some tools processing turbostat output have difficulty with items that begin with %... Reported-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding	Hubert Chrzaniuk	1	-27/+26
	Following changes have been made: - changed MSR_NHM_TURBO_RATIO_LIMIT to MSR_TURBO_RATIO_LIMIT in debug print for consistency with Developer Manual - updated definition of bitfields in MSR_TURBO_RATIO_LIMIT and appropriate parsing code - added x200 to list of architectures that do not support Nahlem compatible definition of MSR_TURBO_RATIO_LIMIT register (x200 has the register but bits definition is custom) - fixed typo in code that parses MSR_TURBO_RATIO_LIMIT (logical instead of bitwise operator) - changed MSR_TURBO_RATIO_LIMIT parsing algorithm so the print out had the same order as implementations for other platforms Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: Intel Xeon x200: fix erroneous bclk value	Chrzaniuk, Hubert	1	-1/+1
	x200 does not enable any way to programmatically obtain bus clock speed. Bclk for the architecture has a fixed value of 100 MHz. At the same time x200 cannot be included in has_snb_msrs since it does not support C7 idle state. prior to this patch, MHz values reported on this chip were erroneously calculated using bclk of 133MHz, causing MHz values to be reported 33% higher than actual. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-13	tools/power turbostat: allow sub-sec intervals	Len Brown	2	-5/+17
	turbostat -i interval_sec will sample and display statistics every interval_sec. interval_sec used to be a whole number of seconds, but now we accept a decimal, as small as 0.001 sec (1 ms). Signed-off-by: Len Brown <len.brown@intel.com>
2016-03-12	block: don't optimize for non-cloned bio in bio_get_last_bvec()	Ming Lei	1	-5/+0
	For !BIO_CLONED bio, we can use .bi_vcnt safely, but it doesn't mean we can just simply return .bi_io_vec[.bi_vcnt - 1] because the start postion may have been moved in the middle of the bvec, such as splitting in the middle of bvec. Fixes: 7bcd79ac50d9(block: bio: introduce helpers to get the 1st and last bvec) Cc: stable@vger.kernel.org Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-03-12	x86/efi: Fix boot crash by always mapping boot service regions into new EFI page tables	Matt Fleming	1	-17/+62
	Some machines have EFI regions in page zero (physical address 0x00000000) and historically that region has been added to the e820 map via trim_bios_range(), and ultimately mapped into the kernel page tables. It was not mapped via efi_map_regions() as one would expect. Alexis reports that with the new separate EFI page tables some boot services regions, such as page zero, are not mapped. This triggers an oops during the SetVirtualAddressMap() runtime call. For the EFI boot services quirk on x86 we need to memblock_reserve() boot services regions until after SetVirtualAddressMap(). Doing that while respecting the ownership of regions that may have already been reserved by the kernel was the motivation behind this commit: 7d68dc3f1003 ("x86, efi: Do not reserve boot services regions within reserved areas") That patch was merged at a time when the EFI runtime virtual mappings were inserted into the kernel page tables as described above, and the trick of setting ->numpages (and hence the region size) to zero to track regions that should not be freed in efi_free_boot_services() meant that we never mapped those regions in efi_map_regions(). Instead we were relying solely on the existing kernel mappings. Now that we have separate page tables we need to make sure the EFI boot services regions are mapped correctly, even if someone else has already called memblock_reserve(). Instead of stashing a tag in ->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it generally makes no sense to mark a boot services region as required at runtime, it's pretty much guaranteed the firmware will not have already set this bit. For the record, the specific circumstances under which Alexis triggered this bug was that an EFI runtime driver on his machine was responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during SetVirtualAddressMap(). The event handler for this driver looks like this, sub rsp,0x28 lea rdx,[rip+0x2445] # 0xaa948720 mov ecx,0x4 call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720) mov r11,QWORD PTR [rip+0x2434] # 0xaa948720 xor eax,eax mov BYTE PTR [r11+0x1],0x1 add rsp,0x28 ret Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting instruction because ConvertPointer() was being called to convert the address 0x0000000000000000, which when converted is left unchanged and remains 0x0000000000000000. The output of the oops trace gave the impression of a standard NULL pointer dereference bug, but because we're accessing physical addresses during ConvertPointer(), it wasn't. EFI boot services code is stored at that address on Alexis' machine. Reported-by: Alexis Murzeau <amurzeau@gmail.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raphael Hertzog <hertzog@debian.org> Cc: Roger Shimizu <rogershimizu@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125 Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-12	x86/fpu: Fix eager-FPU handling on legacy FPU machines	Borislav Petkov	2	-2/+4
	i486 derived cores like Intel Quark support only the very old, legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and our FPU code wasn't handling the saving and restoring there properly in the 'eagerfpu' case. So after we made eagerfpu the default for all CPU types: 58122bf1d856 x86/fpu: Default eagerfpu=on on all CPUs these old FPU designs broke. First, Andy Shevchenko reported a splat: WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160 which was us trying to execute FXRSTOR on those machines even though they don't support it. After taking care of that, Bryan O'Donoghue reported that a simple FPU test still failed because we weren't initializing the FPU state properly on those machines. Take care of all that. Reported-and-tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-11	mm/mempool: avoid KASAN marking mempool poison checks as use-after-free	Matthew Dawson	1	-1/+1
	When removing an element from the mempool, mark it as unpoisoned in KASAN before verifying its contents for SLUB/SLAB debugging. Otherwise KASAN will flag the reads checking the element use-after-free writes as use-after-free reads. Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-11	ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window	Thomas Petazzoni	9	-19/+19
	When the Crypto SRAM mappings were added to the Device Tree files describing the Armada XP boards in commit c466d997bb16 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards"), the fact that those mappings were overlaping with the PCIe memory aperture was overlooked. Due to this, we currently have for all Armada XP platforms a situation that looks like this: Memory mapping on Armada XP boards with internal registers at 0xf1000000: - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf8000000 -> 0xffe0000 126M PCIe memory aperture - 0xf8100000 -> 0xf8110000 64KB Crypto SRAM #0 => OVERLAPS WITH PCIE ! - 0xf8110000 -> 0xf8120000 64KB Crypto SRAM #1 => OVERLAPS WITH PCIE ! - 0xffe00000 -> 0xfff00000 1M PCIe I/O aperture - 0xfff0000 -> 0xffffffff 1M BootROM The overlap means that when PCIe devices are added, depending on their memory window needs, they might or might not be mapped into the physical address space. Indeed, they will not be mapped if the area allocated in the PCIe memory aperture by the PCI core overlaps with one of the Crypto SRAM. Typically, a Intel IGB PCIe NIC that needs 8MB of PCIe memory will see its PCIe memory window allocated from 0xf80000000 for 8MB, which overlaps with the Crypto SRAM windows. Due to this, the PCIe window is not created, and any attempt to access the PCIe window makes the kernel explode: [ 3.302213] igb: Copyright (c) 2007-2014 Intel Corporation. [ 3.307841] pci 0000:00:09.0: enabling device (0140 -> 0143) [ 3.313539] mvebu_mbus: cannot add window '4:f8', conflicts with another window [ 3.320870] mvebu-pcie soc:pcie-controller: Could not create MBus window at [mem 0xf8000000-0xf87fffff]: -22 [ 3.330811] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf08c0018 This problem does not occur on Armada 370 boards, because we use the following memory mapping (for boards that have internal registers at 0xf1000000): - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 => OK ! - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM Obviously, the solution is to align the location of the Crypto SRAM mappings of Armada XP to be similar with the ones on Armada 370, i.e have them between the "internal registers" area and the beginning of the PCIe aperture. However, we have a special case with the OpenBlocks AX3-4 platform, which has a 128 MB NOR flash. Currently, this NOR flash is mapped from 0xf0000000 to 0xf8000000. This is possible because on OpenBlocks AX3-4, the internal registers are not at 0xf1000000. And this explains why the Crypto SRAM mappings were not configured at the same place on Armada XP. Hence, the solution is two-fold: (1) Move the NOR flash mapping on Armada XP OpenBlocks AX3-4 from 0xe8000000 to 0xf0000000. This frees the 0xf0000000 -> 0xf80000000 space. (2) Move the Crypto SRAM mappings on Armada XP to be similar to Armada 370 (except of course that Armada XP has two Crypto SRAM and not one). After this patch, the memory mapping on Armada XP boards with registers at 0xf1 is: - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 - 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1 - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM And the memory mapping for the special case of the OpenBlocks AX3-4 (internal registers at 0xd0000000, NOR of 128 MB): - 0x00000000 -> 0xc0000000 3G RAM - 0xd0000000 -> 0xd1000000 1M internal registers - 0xe800000 -> 0xf0000000 128M NOR flash - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 - 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1 - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM Fixes: c466d997bb16 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards") Reported-by: Phil Sutter <phil@nwl.cc> Cc: Phil Sutter <phil@nwl.cc> Cc: <stable@vger.kernel.org> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2016-03-11	drm/i915: Actually retry with bit-banging after GMBUS timeout	Ville Syrjälä	1	-0/+6
	After the GMBUS transfer times out, we set force_bit=1 and return -EAGAIN expecting the i2c core to call the .master_xfer hook again so that we will retry the same transfer via bit-banging. This is in case the gmbus hardware is somehow faulty. Unfortunately we left adapter->retries to 0, meaning the i2c core didn't actually do the retry. Let's tell the core we want one retry when we return -EAGAIN. Note that i2c-algo-bit also uses this retry count for some internal retries, so we'll end up increasing those a bit as well. Cc: Jani Nikula <jani.nikula@intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Fixes: bffce907d640 ("drm/i915: abstract i2c bit banging fallback in gmbus xfer") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1457366220-29409-2-git-send-email-ville.syrjala@linux.intel.com Reviewed-by: Jani Nikula <jani.nikula@intel.com> (cherry picked from commit 8b1f165a4a8f64c28cf42d10e1f4d3b451dedc51) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-03-11	ACPI / APEI: ERST: Fixed leaked resources in erst_init	Josh Hunt	1	-0/+3
	erst_init currently leaks resources allocated from its call to apei_resources_init(). The data allocated there gets copied into apei_resources_all and can be freed when we're done with it. Signed-off-by: Josh Hunt <johunt@akamai.com> Reviewed-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11	ACPI / APEI: Fix leaked resources	Josh Hunt	1	-2/+4
	We leak the NVS and arch resources (if used), in apei_resources_request. They are allocated to make sure we exclude them from the APEI resources, but they are never freed at the end of the function. Free them now. Signed-off-by: Josh Hunt <johunt@akamai.com> Reviewed-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11	intel_pstate: Do not skip samples partially	Rafael J. Wysocki	1	-5/+7
	If the current value of MPERF or the current value of TSC is the same as the previous one, respectively, intel_pstate_sample() bails out early and skips the sample. However, intel_pstate_adjust_busy_pstate() is still called in that case which is not correct, so modify intel_pstate_sample() to return a bool value indicating whether or not the sample has been taken and use it to decide whether or not to call intel_pstate_adjust_busy_pstate(). While at it, remove redundant parentheses from the MPERF/TSC check in intel_pstate_sample(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-03-11	intel_pstate: Remove freq calculation from intel_pstate_calc_busy()	Philippe Longepe	1	-8/+8
	Use a helper function to compute the average pstate and call it only where it is needed (only when tracing or in intel_pstate_get). Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11	intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()	Philippe Longepe	1	-3/+2
	The cpu_load algorithm doesn't need to invoke intel_pstate_calc_busy(), so move that call from intel_pstate_sample() to get_target_pstate_use_performance(). Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11	intel_pstate: Optimize calculation for max/min_perf_adj	Philippe Longepe	1	-2/+2
	mul_fp(int_tofp(A), B) expands to: ((A << FRAC_BITS) * B) >> FRAC_BITS, so the same result can be obtained via simple multiplication A * B. Apply this observation to max_perf * limits->max_perf and max_perf * limits->min_perf in intel_pstate_get_min_max()." Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-11	intel_pstate: Remove extra conversions in pid calculation	Philippe Longepe	1	-4/+4
	pid->setpoint and pid->deadband can be initialized in fixed point, so we can avoid the int_tofp in pid_calc. Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-10	i2c: designware: Add device HID for future AMD I2C controller	Xiangliang Yu	2	-0/+2
	Add device HID AMDI0010 to match the AMD ACPI Vendor ID (AMDI) that was registered in http://www.uefi.org/acpi_id_list, and the I2C controller on future AMD paltform will use the HID instead of AMD0010. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-10	device property: fix for a case of use-after-free	Heikki Krogerus	1	-8/+17
	In device_remove_property_set(), the secondary fwnode needs to be cleared before the pset is freed. This fixes a use-after-free when a property set is providing the primary fwnode. As a result of the fix, the primary fwnode may end up containing ERR_PTR(-ENODEV), so also adding checks for it to the property handling code. Reported-by: John Youn <John.Youn@synopsys.com> Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-10	ACPICA / Interpreter: Fix a regression triggered because of wrong Linux ECDT support	Lv Zheng	3	-39/+32
	It is reported that the following commit triggers regressions: Linux commit: efaed9be998b5ae0afb7458e057e5f4402b43fa0 ACPICA commit: 31178590dde82368fdb0f6b0e466b6c0add96c57 Subject: ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages This is because that the ECDT support is not corrected in Linux, and Linux requires to execute _REG for ECDT (though this sounds so wrong), we need to ensure acpi_gbl_namespace_initialized is set before ECDT probing in order for _REG to be executed. Since we have to move "acpi_gbl_namespace_initialized = TRUE" to the initialization step happening before ECDT probing, acpi_load_tables() is the best candidate for now. Thus this patch fixes the regression by doing so. But if the ECDT support is fixed, Linux will not execute _REG for ECDT, and ECDT probing will happen before acpi_load_tables(). At that time, we still want to ensure acpi_gbl_namespace_initialized is set after executing acpi_ns_initialize_objects() (under the condition of acpi_gbl_group_module_level_code = FALSE), this patch also moves acpi_ns_initialize_objects() to acpi_load_tables() accordingly. Since acpi_ns_initialize_objects() doesn't seem to be skippable, this patch also removes ACPI_NO_OBJECT_INIT for the one invoked in acpi_load_tables(). And since the default region handlers should always be installed before loading the tables, this patch also removes useless acpi_gbl_group_module_level_code check accordingly. Reported by Chris Bainbridge, Fixed by Lv Zheng. Fixes: efaed9be998b (ACPICA: Events: Enhance acpi_ev_execute_reg_method() to ensure no _REG evaluations can happen during OS early boot stages) Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-10	cpufreq: Move scheduler-related code to the sched directory	Rafael J. Wysocki	7	-88/+96
	Create cpufreq.c under kernel/sched/ and move the cpufreq code related to the scheduler to that file and to sched.h. Redefine cpufreq_update_util() as a static inline function to avoid function calls at its call sites in the scheduler code (as suggested by Peter Zijlstra). Also move the definition of struct update_util_data and declaration of cpufreq_set_update_util_data() from include/linux/cpufreq.h to include/linux/sched.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-03-10	MAINTAINERS: add a maintainer for the NAND subsystem	Boris BREZILLON	1	-0/+11
	Add myself as the maintainer of the NAND subsystem. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Acked-by: Richard Weinberger <richard@nod.at> Acked-by: Brian Norris <computersforpeace@gmail.com> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>
2016-03-10	[media] media-device: map new functions into old types for legacy API	Mauro Carvalho Chehab	2	-1/+28
	The legacy media controller userspace API exposes entity types that carry both type and function information. The new API replaces the type with a function. It preserves backward compatibility by defining legacy functions for the existing types and using them in drivers. This works fine, as long as newer entity functions won't be added. Unfortunately, some tools, like media-ctl with --print-dot argument rely on the now legacy MEDIA_ENT_T_V4L2_SUBDEV and MEDIA_ENT_T_DEVNODE numeric ranges to identify what entities will be shown. Also, if the entity doesn't match those ranges, it will ignore the major/minor information on devnodes, and won't be getting the devnode name via udev or sysfs. As we're now adding devices outside the old range, the legacy ioctl needs to map the new entity functions into a type at the old range, or otherwise we'll have a regression. Detected on all released media-ctl versions (e. g. versions <= 1.10). Fix this by deriving the type from the function to emulate the legacy API if the function isn't in the legacy functions range. Reported-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2016-03-10	dmaengine: at_xdmac: fix residue computation	Ludovic Desroches	1	-3/+39
	When computing the residue we need two pieces of information: the current descriptor and the remaining data of the current descriptor. To get that information, we need to read consecutively two registers but we can't do it in an atomic way. For that reason, we have to check manually that current descriptor has not changed. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Suggested-by: Cyrille Pitchen <cyrille.pitchen@atmel.com> Reported-by: David Engraf <david.engraf@sysgo.com> Tested-by: David Engraf <david.engraf@sysgo.com> Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Cc: stable@vger.kernel.org #4.1 and later Signed-off-by: Vinod Koul <vinod.koul@intel.com>
2016-03-10	x86/delay: Avoid preemptible context checks in delay_mwaitx()	Borislav Petkov	1	-1/+1
	We do use this_cpu_ptr(&cpu_tss) as a cacheline-aligned, seldomly accessed per-cpu var as the MONITORX target in delay_mwaitx(). However, when called in preemptible context, this_cpu_ptr -> smp_processor_id() -> debug_smp_processor_id() fires: BUG: using smp_processor_id() in preemptible [00000000] code: udevd/312 caller is delay_mwaitx+0x40/0xa0 But we don't care about that check - we only need cpu_tss as a MONITORX target and it doesn't really matter which CPU's var we're touching as we're going idle anyway. Fix that. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Huang Rui <ray.huang@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/20160309205622.GG6564@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10	KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0	Paolo Bonzini	1	-1/+3
	KVM has special logic to handle pages with pte.u=1 and pte.w=0 when CR0.WP=1. These pages' SPTEs flip continuously between two states: U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed) and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed). When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1. When guest EFER has the NX bit cleared, the reserved bit check thinks that the latter state is invalid; teach it that the smep_andnot_wp case will also use the NX bit of SPTEs. Cc: stable@vger.kernel.org Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.inel.com> Fixes: c258b62b264fdc469b6d3610a907708068145e3b Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-10	KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo	Paolo Bonzini	2	-14/+25
	Yes, all of these are needed. :) This is admittedly a bit odd, but kvm-unit-tests access.flat tests this if you run it with "-cpu host" and of course ept=0. KVM runs the guest with CR0.WP=1, so it must handle supervisor writes specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0. When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and restarts execution. This will still cause a user write to fault, while supervisor writes will succeed. User reads will fault spuriously now, and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads will be enabled and supervisor writes disabled, going back to the originary situation where supervisor writes fault spuriously. When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0. If the guest has not enabled NX, the result is a continuous stream of page faults due to the NX bit being reserved. The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry control, so they do not use user-return notifiers for EFER---if they did, EFER.NX would be forced to the same value as the host). There is another bug in the reserved bit check, which I've split to a separate patch for easier application to stable kernels. Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Fixes: f6577a5fa15d82217ca73c74cd2dcbc0f6c781dd Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-10	x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")	Yu-cheng Yu	2	-11/+4
	Leonid Shatz noticed that the SDM interpretation of the following recent commit: 394db20ca240741 ("x86/fpu: Disable AVX when eagerfpu is off") ... is incorrect and that the original behavior of the FPU code was correct. Because AVX is not stated in CR0 TS bit description, it was mistakenly believed to be not supported for lazy context switch. This turns out to be false: Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers: 'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task.' Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception Specification: 'AVX instructions refer to exceptions by classes that include #NM "Device Not Available" exception for lazy context switch.' So revert the commit. Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10	s390/mm: four page table levels vs. fork	Martin Schwidefsky	2	-10/+30
	The fork of a process with four page table levels is broken since git commit 6252d702c5311ce9 "[S390] dynamic page tables." All new mm contexts are created with three page table levels and an asce limit of 4TB. If the parent has four levels dup_mmap will add vmas to the new context which are outside of the asce limit. The subsequent call to copy_page_range will walk the three level page table structure of the new process with non-zero pgd and pud indexes. This leads to memory clobbers as the pgd_index and the pud_index is added to the mm->pgd pointer without a pgd_deref in between. The init_new_context() function is selecting the number of page table levels for a new context. The function is used by mm_init() which in turn is called by dup_mm() and mm_alloc(). These two are used by fork() and exec(). The init_new_context() function can distinguish the two cases by looking at mm->context.asce_limit, for fork() the mm struct has been copied and the number of page table levels may not change. For exec() the mm_alloc() function set the new mm structure to zero, in this case a three-level page table is created as the temporary stack space is located at STACK_TOP_MAX = 4TB. This fixes CVE-2016-2143. Reported-by: Marcin Kościelnicki <koriakin@0x04.net> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-09	ext4: iterate over buffer heads correctly in move_extent_per_page()	Eryu Guan	1	-0/+1
	In commit bcff24887d00 ("ext4: don't read blocks from disk after extents being swapped") bh is not updated correctly in the for loop and wrong data has been written to disk. generic/324 catches this on sub-page block size ext4. Fixes: bcff24887d00 ("ext4: don't read blocks from disk after extentsbeing swapped") Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-09	dma-mapping: avoid oops when parameter cpu_addr is null	Zhen Lei	1	-1/+1
	To keep consistent with kfree, which tolerate ptr is NULL. We do this because sometimes we may use goto statement, so that success and failure case can share parts of the code. But unfortunately, dma_free_coherent called with parameter cpu_addr is null will cause oops, such as showed below: Unable to handle kernel paging request at virtual address ffffffc020d3b2b8 pgd = ffffffc083a61000 [ffffffc020d3b2b8] pgd=0000000000000000, pud=0000000000000000 CPU: 4 PID: 1489 Comm: malloc_dma_1 Tainted: G O 4.1.12 #1 Hardware name: ARM64 (DT) PC is at __dma_free_coherent.isra.10+0x74/0xc8 LR is at __dma_free+0x9c/0xb0 Process malloc_dma_1 (pid: 1489, stack limit = 0xffffffc0837fc020) [...] Call trace: __dma_free_coherent.isra.10+0x74/0xc8 __dma_free+0x9c/0xb0 malloc_dma+0x104/0x158 [dma_alloc_coherent_mtmalloc] kthread+0xec/0xfc Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09	mm/hugetlb: use EOPNOTSUPP in hugetlb sysctl handlers	Jan Stancek	1	-2/+2
	Replace ENOTSUPP with EOPNOTSUPP. If hugepages are not supported, this value is propagated to userspace. EOPNOTSUPP is part of uapi and is widely supported by libc libraries. It gives nicer message to user, rather than: # cat /proc/sys/vm/nr_hugepages cat: /proc/sys/vm/nr_hugepages: Unknown error 524 And also LTP's proc01 test was failing because this ret code (524) was unexpected: proc01 1 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages: errno=???(524): Unknown error 524 proc01 2 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_hugepages_mempolicy: errno=???(524): Unknown error 524 proc01 3 TFAIL : proc01.c:396: read failed: /proc/sys/vm/nr_overcommit_hugepages: errno=???(524): Unknown error 524 Signed-off-by: Jan Stancek <jstancek@redhat.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09	memremap: check pfn validity before passing to pfn_to_page()	Ard Biesheuvel	1	-2/+2
	In memremap's helper function try_ram_remap(), we dereference a struct page pointer that was derived from a PFN that is known to be covered by a 'System RAM' iomem region, and is thus assumed to be a 'valid' PFN, i.e., a PFN that has a struct page associated with it and is covered by the kernel direct mapping. However, the assumption that there is a 1:1 relation between the System RAM iomem region and the kernel direct mapping is not universally valid on all architectures, and on ARM and arm64, 'System RAM' may include regions for which pfn_valid() returns false. Generally speaking, both __va() and pfn_to_page() should only ever be called on PFNs/physical addresses for which pfn_valid() returns true, so add that check to try_ram_remap(). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09	mm, thp: fix migration of PTE-mapped transparent huge pages	Kirill A. Shutemov	1	-1/+1
	We don't have native support of THP migration, so we have to split huge page into small pages in order to migrate it to different node. This includes PTE-mapped huge pages. I made mistake in refcounting patchset: we don't actually split PTE-mapped huge page in queue_pages_pte_range(), if we step on head page. The result is that the head page is queued for migration, but none of tail pages: putting head page on queue takes pin on the page and any subsequent attempts of split_huge_pages() would fail and we skip queuing tail pages. unmap_and_move_huge_page() will eventually split the huge pages, but only one of 512 pages would get migrated. Let's fix the situation. Fixes: 248db92da13f2507 ("migrate_pages: try to split pages on queuing") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-09	dax: check return value of dax_radix_entry()	Ross Zwisler	1	-1/+8
	dax_pfn_mkwrite() previously wasn't checking the return value of the call to dax_radix_entry(), which was a mistake. Instead, capture this return value and return the appropriate VM_FAULT_ value. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>