linux-dev - Linux kernel development work

Age	Commit message	Author	Files	Lines
2017-03-21	drm/exynos/decon5433: signal frame done interrupt at front porch	Andrzej Hajda	2	-1/+5
	DECON in case of video mode generates interrupt by default at start of vertical back porch. As this interrupt is used to generate VBLANK events more optimal point is start of vertical front porch. Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-03-21	drm/exynos/decon5433: fix vblank event handling	Andrzej Hajda	2	-1/+85
	Current implementation of event handling assumes that vblank interrupt is always called at the right time. It is not true, it can be delayed due to various reasons. As a result different races can happen. The patch fixes the issue by using hardware frame counter present in DECON to serialize vblank and commit completion events. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-03-21	drm/exynos: move crtc event handling to drivers callbacks	Andrzej Hajda	7	-13/+24
	CRTC event is currently sent with the next vblank, or instantly in case the crtc is being disabled. This approach usually works, but in corner cases it can result in premature event generation. Only the device driver is able to verify if the event can be sent. This patch is a first step in that direction - it moves event handling to the drivers. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-03-21	drm/exynos: Remove support for Exynos4415 (SoC not supported anymore)	Krzysztof Kozlowski	4	-32/+3
	Support for Exynos4415 is going away because there are no internal nor external users. Since commit 46dcf0ff0de3 ("ARM: dts: exynos: Remove exynos4415.dtsi"), the platform cannot be instantiated so remove also the drivers. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Kukjin Kim <kgene@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-03-21	drm/exynos/decon5433: & vs | typo	Dan Carpenter	1	-1/+1
	"&" was obviously intended instead of "|". The original condition is always true. Fixes: b93c2e8b5d9d ("drm/exynos/decon5433: configure sysreg in case of hardware trigger") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
2017-03-15	drm/amd/amdgpu: Fix debugfs reg read/write address width	Tom St Denis	1	-2/+2
	The MMIO space is wider now so we mask the lower 22 bits instead of 18. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
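The effect of widening the mask from 18 to 22 bits can be sketched in plain C. This is only an illustration: the helper names `example_low_bits_mask` and `example_mask_offset` are hypothetical and are not taken from the amdgpu debugfs code.

```c
#include <stdint.h>

/* Hypothetical helper: a mask covering the lowest nbits bits (nbits < 64). */
static uint64_t example_low_bits_mask(unsigned int nbits)
{
	return (UINT64_C(1) << nbits) - 1;
}

/* Hypothetical helper: keep only the lowest nbits bits of a register offset. */
static uint64_t example_mask_offset(uint64_t offset, unsigned int nbits)
{
	return offset & example_low_bits_mask(nbits);
}
```

With an 18-bit mask, an offset that has bit 21 set (for example 0x200000) is truncated to 0; with a 22-bit mask it is preserved.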
2017-03-15	drm/amdgpu/si: add dpm quirk for Oland	Alex Deucher	1	-0/+6
	OLAND 0x1002:0x6604 0x1028:0x066F 0x00 seems to have problems with higher sclks. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2017-03-15	drm/radeon/si: add dpm quirk for Oland	Alex Deucher	1	-0/+6
	OLAND 0x1002:0x6604 0x1028:0x066F 0x00 seems to have problems with higher sclks. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2017-03-14	drm: amd: remove broken include path	Arnd Bergmann	1	-2/+0
	The AMD ACP driver adds "-I../acp -I../acp/include" to the gcc command line, which makes no sense, since these are evaluated relative to the build directory. When we build with "make W=1", they instead cause a warning: cc1: error: ../acp/: No such file or directory [-Werror=missing-include-dirs] cc1: error: ../acp/include: No such file or directory [-Werror=missing-include-dirs] cc1: all warnings being treated as errors ../scripts/Makefile.build:289: recipe for target 'drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o' failed ../scripts/Makefile.build:289: recipe for target 'drivers/gpu/drm/amd/amdgpu/amdgpu_device.o' failed ../scripts/Makefile.build:289: recipe for target 'drivers/gpu/drm/amd/amdgpu/amdgpu_kms.o' failed This removes the subdir-ccflags variable that evidently did not serve any purpose here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-14	drm/amd/powerplay: fix copy error in smu7_clockpoweragting.c	Rex Zhu	1	-1/+1
	Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-14	drm/tilcdc: Set framebuffer DMA address to HW only if CRTC is enabled	Jyri Sarha	1	-12/+23
	Touching HW while clocks are off is a serious error and for instance breaks suspend functionality. After this patch tilcdc_crtc_update_fb() always updates the primary plane's framebuffer pointer, increases fb's reference count and stores vblank event. tilcdc_crtc_update_fb() only writes the fb's DMA address to HW if the crtc is enabled, as tilcdc_crtc_enable() takes care of writing the address on enable. This patch also refactors the tilcdc_crtc_update_fb() a bit. Number of subsequent small changes had made it almost unreadable. There should be no other functional changes but checking the CRTC's enable state. However, the locking goes a bit differently and some of the redundant checks have been removed in this new version. The enable_lock should be enough to protect the access to tilcdc_crtc->enabled. The irq_lock protects the access to last_vblank and next_fb. The check for vrefresh and last_vblank being valid is redundant, as the vrefresh should be always valid if the CRTC is enabled and now last_vblank should be too, because it is initialized to current time when CRTC raster is enabled. If for some reason the values are not correctly initialized the division by zero warning is quite appropriate. Signed-off-by: Jyri Sarha <jsarha@ti.com> Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2017-03-14	drm/tilcdc: Fix hardcoded fail-return value in tilcdc_crtc_create()	Jyri Sarha	1	-1/+1
	Fix badly hardcoded return value under fail-label. All goto branches to the label set the "ret"-variable accordingly. Signed-off-by: Jyri Sarha <jsarha@ti.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
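The pattern being fixed can be sketched as follows. The function names are hypothetical, and `-ENOMEM` as the hardcoded value is an assumption made for illustration; the commit message does not say which value was hardcoded.

```c
#include <errno.h>

/* Buggy pattern: the fail label discards the specific error stored in ret. */
static int example_create_buggy(int step_result)
{
	int ret = step_result;

	if (ret)
		goto fail;
	return 0;

fail:
	return -ENOMEM; /* hardcoded; the value is assumed for illustration */
}

/* Fixed pattern: every branch sets ret, so the label simply returns it. */
static int example_create_fixed(int step_result)
{
	int ret = step_result;

	if (ret)
		goto fail;
	return 0;

fail:
	return ret;
}
```

A caller that passes `-EINVAL` gets `-ENOMEM` back from the buggy variant and `-EINVAL` from the fixed one.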
2017-03-13	drm/i915: Fix forcewake active domain tracking	Tvrtko Ursulin	1	-7/+6
	In commit 003342a50021 ("drm/i915: Keep track of active forcewake domains in a bitmask") I forgot to adjust the newly introduce fw_domains_active state across reset. This caused the assert_forcewakes_inactive to trigger during suspend and resume if there were user held forcewakes. v2: Bitmask checks are required since vfuncs are not always present. v3: Move bitmask tracking to get/put vfunc for simplicity. (Chris Wilson) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: 003342a50021 ("drm/i915: Keep track of active forcewake domains in a bitmask") Testcase: igt/drv_suspend/forcewake Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: "Paneri, Praveen" <praveen.paneri@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: v4.10+ <stable@vger.kernel.org> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170310093249.4484-1-tvrtko.ursulin@linux.intel.com (cherry picked from commit b8473050805f35add97f3ff57570d55a01808df5) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-13	drm/i915: Nuke skl_update_plane debug message from the pipe update critical section	Maarten Lankhorst	1	-3/+0
	printks are slow so we should not be doing them from the vblank evade critical section. These could explain why we sometimes seem to blow past our 100 usec deadline. The problem has been there ever since commit c331879ce8ea ("drm/i915: skylake sprite plane scaling using shared scalers.") but it may not have been readily visible until commit e1edbd44e23b ("drm/i915: Complain if we take too long under vblank evasion.") increased our chances of noticing it. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1488974407-25175-1-git-send-email-maarten.lankhorst@linux.intel.com Fixes: c331879ce8ea ("drm/i915: skylake sprite plane scaling using shared scalers") Cc: <stable@vger.kernel.org> # v4.2+ Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> [mlankhorst: Add missing tags, point to the correct offending commit] (cherry picked from commit d38146b9ee16264ff9a88bf3391ab9f2f5af3646) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-13	drm/i915: use correct node for handling cache domain eviction	Matthew Auld	1	-4/+4
	It looks like we were incorrectly comparing vma->node against itself instead of the target node, when evicting for a node on systems where we need guard pages between regions with different cache domains. As a consequence we can end up trying to needlessly evict neighbouring nodes, even if they have the same cache domain, and if they were pinned we would fail the eviction. Fixes: 625d988acc28 ("drm/i915: Extract reserving space in the GTT to a helper") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: http://patchwork.freedesktop.org/patch/msgid/20170306235414.23407-3-matthew.auld@intel.com Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (cherry picked from commit fe65cbdbc97929e4a522716ed279a36783656142) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2017-03-13	uapi: fix drm/omap_drm.h userspace compilation errors	Dmitry V. Levin	1	-19/+19
	Consistently use types from linux/types.h like in other uapi drm/*_drm.h header files to fix the following drm/omap_drm.h userspace compilation errors: /usr/include/drm/omap_drm.h:36:2: error: unknown type name 'uint64_t' uint64_t param; /* in */ /usr/include/drm/omap_drm.h:37:2: error: unknown type name 'uint64_t' uint64_t value; /* in (set_param), out (get_param) */ /usr/include/drm/omap_drm.h:56:2: error: unknown type name 'uint32_t' uint32_t bytes; /* (for non-tiled formats) */ /usr/include/drm/omap_drm.h:58:3: error: unknown type name 'uint16_t' uint16_t width; /usr/include/drm/omap_drm.h:59:3: error: unknown type name 'uint16_t' uint16_t height; /usr/include/drm/omap_drm.h:65:2: error: unknown type name 'uint32_t' uint32_t flags; /* in */ /usr/include/drm/omap_drm.h:66:2: error: unknown type name 'uint32_t' uint32_t handle; /* out */ /usr/include/drm/omap_drm.h:67:2: error: unknown type name 'uint32_t' uint32_t __pad; /usr/include/drm/omap_drm.h:77:2: error: unknown type name 'uint32_t' uint32_t handle; /* buffer handle (in) */ /usr/include/drm/omap_drm.h:78:2: error: unknown type name 'uint32_t' uint32_t op; /* mask of omap_gem_op (in) */ /usr/include/drm/omap_drm.h:82:2: error: unknown type name 'uint32_t' uint32_t handle; /* buffer handle (in) */ /usr/include/drm/omap_drm.h:83:2: error: unknown type name 'uint32_t' uint32_t op; /* mask of omap_gem_op (in) */ /usr/include/drm/omap_drm.h:88:2: error: unknown type name 'uint32_t' uint32_t nregions; /usr/include/drm/omap_drm.h:89:2: error: unknown type name 'uint32_t' uint32_t __pad; /usr/include/drm/omap_drm.h:93:2: error: unknown type name 'uint32_t' uint32_t handle; /* buffer handle (in) */ /usr/include/drm/omap_drm.h:94:2: error: unknown type name 'uint32_t' uint32_t pad; /usr/include/drm/omap_drm.h:95:2: error: unknown type name 'uint64_t' uint64_t offset; /* mmap offset (out) */ /usr/include/drm/omap_drm.h:102:2: error: unknown type name 'uint32_t' uint32_t size; /* virtual size for mmap'ing (out) */ /usr/include/drm/omap_drm.h:103:2: error: unknown type name 'uint32_t' uint32_t __pad; Fixes: ef6503e89194 ("drm: Kbuild: add omap_drm.h to the installed headers") Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2017-03-13	drm/omap: fix dmabuf mmap for dma_alloc'ed buffers	Tomi Valkeinen	1	-3/+0
	omap_gem_dmabuf_mmap() returns an error (with a WARN) when called for a buffer which is allocated with dma_alloc_*(). This prevents dmabuf mmap from working on SoCs without DMM, e.g. AM4 and OMAP3. I could not find any reason for omap_gem_dmabuf_mmap() rejecting such buffers, and just removing the if() fixes the limitation. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
2017-03-12	Linux 4.11-rc2	Linus Torvalds	1	-1/+1

2017-03-12	x86/tlb: Fix tlb flushing when lguest clears PGE	Daniel Borkmann	1	-1/+1
	Fengguang reported random corruptions from various locations on x86-32 after commits d2852a224050 ("arch: add ARCH_HAS_SET_MEMORY config") and 9d876e79df6a ("bpf: fix unlocking of jited image when module ronx not set") that uses the former. While x86-32 doesn't have a JIT like x86_64, the bpf_prog_lock_ro() and bpf_prog_unlock_ro() got enabled due to ARCH_HAS_SET_MEMORY, whereas Fengguang's test kernel doesn't have module support built in and therefore never had the DEBUG_SET_MODULE_RONX setting enabled. After investigating the crashes further, it turned out that using set_memory_ro() and set_memory_rw() didn't have the desired effect, for example, setting the pages as read-only on x86-32 would still let probe_kernel_write() succeed without error. This behavior would manifest itself in situations where the vmalloc'ed buffer was accessed prior to set_memory_*() such as in case of bpf_prog_alloc(). In cases where it wasn't, the page attribute changes seemed to have taken effect, leading to the conclusion that a TLB invalidate didn't happen. Moreover, it turned out that this issue reproduced with qemu in "-cpu kvm64" mode, but not for "-cpu host". When the issue occurs, change_page_attr_set_clr() did trigger a TLB flush as expected via __flush_tlb_all() through cpa_flush_range(), though. There are 3 variants for issuing a TLB flush: invpcid_flush_all() (depends on CPU feature bits X86_FEATURE_INVPCID, X86_FEATURE_PGE), cr4 based flush (depends on X86_FEATURE_PGE), and cr3 based flush. For "-cpu host" case in my setup, the flush used invpcid_flush_all() variant, whereas for "-cpu kvm64", the flush was cr4 based. Switching the kvm64 case to cr3 manually worked fine, and further investigating the cr4 one turned out that X86_CR4_PGE bit was not set in cr4 register, meaning the __native_flush_tlb_global_irq_disabled() wrote cr4 twice with the same value instead of clearing X86_CR4_PGE in the first write to trigger the flush. 
It turned out that X86_CR4_PGE was cleared from cr4 during init from lguest_arch_host_init() via adjust_pge(). The X86_FEATURE_PGE bit is also cleared from there due to concerns of using PGE in guest kernel that can lead to hard to trace bugs (see bff672e630a0 ("lguest: documentation V: Host") in init()). The CPU feature bits are cleared in dynamic boot_cpu_data, but they never propagated to __flush_tlb_all() as it uses static_cpu_has() instead of boot_cpu_has() for testing which variant of TLB flushing to use, meaning they still used the old setting of the host kernel. Clearing via setup_clear_cpu_cap(X86_FEATURE_PGE) so this would propagate to static_cpu_has() checks is too late at this point as sections have been patched already, so for now, it seems reasonable to switch back to boot_cpu_has(X86_FEATURE_PGE) as it was prior to commit c109bf95992b ("x86/cpufeature: Remove cpu_has_pge"). This lets the TLB flush trigger via cr3 as originally intended, properly makes the new page attributes visible and thus fixes the crashes seen by Fengguang. Fixes: c109bf95992b ("x86/cpufeature: Remove cpu_has_pge") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: bp@suse.de Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: lkp@01.org Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernrl.org/r/20170301125426.l4nf65rx4wahohyl@wfg-t540p.sh.intel.com Link: http://lkml.kernel.org/r/25c41ad9eca164be4db9ad84f768965b7eb19d9e.1489191673.git.daniel@iogearbox.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
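The failure of the cr4-based flush can be illustrated with a toy userspace model. This is a sketch under stated assumptions, not kernel code: `struct example_cpu`, `example_write_cr4()` and `example_flush_via_cr4()` are hypothetical names, and the model only encodes the fact that a write to cr4 which changes the PGE bit (bit 7) invalidates the TLB, including global entries.

```c
#include <stdint.h>

#define EXAMPLE_CR4_PGE (UINT32_C(1) << 7) /* CR4.PGE is bit 7 */

/* Toy model of a CPU: the cr4 value plus a count of global TLB flushes. */
struct example_cpu {
	uint32_t cr4;
	unsigned int global_flushes;
};

/* Model: a cr4 write flushes global TLB entries only if PGE changes. */
static void example_write_cr4(struct example_cpu *cpu, uint32_t val)
{
	if ((cpu->cr4 ^ val) & EXAMPLE_CR4_PGE)
		cpu->global_flushes++;
	cpu->cr4 = val;
}

/* The cr4-based variant: clear PGE in the first write, then restore cr4. */
static void example_flush_via_cr4(struct example_cpu *cpu)
{
	uint32_t cr4 = cpu->cr4;

	example_write_cr4(cpu, cr4 & ~EXAMPLE_CR4_PGE);
	example_write_cr4(cpu, cr4);
}
```

In this model, a CPU whose cr4 has PGE set sees a flush on both writes, while a CPU whose PGE bit was already cleared (as after lguest's adjust_pge()) writes the same value twice and never flushes, which matches the behaviour described above.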
2017-03-11	score: Fix implicit includes now failing build after extable change	Guenter Roeck	2	-0/+3
	After changing from module.h to extable.h, score builds fail with: arch/score/kernel/traps.c: In function 'do_ri': arch/score/kernel/traps.c:248:4: error: implicit declaration of function 'user_disable_single_step' arch/score/mm/extable.c: In function 'fixup_exception': arch/score/mm/extable.c:32:38: error: dereferencing pointer to incomplete type arch/score/mm/extable.c:34:24: error: dereferencing pointer to incomplete type because extable.h doesn't drag in the same amount of headers as the module.h did. Add in the headers which were implicitly expected. Fixes: 90858794c960 ("module.h: remove extable.h include now users have migrated") Signed-off-by: Guenter Roeck <linux@roeck-us.net> [PG: tweak commit log; refresh for sched header refactoring.] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2017-03-10	kexec, x86/purgatory: Unbreak it and clean it up	Thomas Gleixner	10	-46/+78
	The purgatory code defines global variables which are referenced via a symbol lookup in the kexec code (core and arch). A recent commit addressing sparse warnings made these static and thereby broke kexec_file. Why did this happen? Simply because the whole machinery is undocumented and lacks any form of forward declarations. The variable names are unspecific and lack a prefix, so adding forward declarations creates shadow variables in the core code. Aside of that the code relies on magic constants and duplicate struct definitions with no way to ensure that these things stay in sync. The section placement of the purgatory variables happened by chance and not by design. Unbreak kexec and clean up the mess: - Add proper forward declarations and document the usage - Use common struct definition - Use the proper common defines instead of magic constants - Add a purgatory_ prefix to have a proper name space - Use ARRAY_SIZE() instead of a homebrewed reimplementation - Add proper sections to the purgatory variables [ From Mike ] Fixes: 72042a8c7b01 ("x86/purgatory: Make functions and variables static") Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Nicholas Mc Guire <der.herr@hofr.at> Cc: Borislav Petkov <bp@alien8.de> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Tobin C. Harding" <me@tobin.cc> Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1703101315140.3681@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-10	drm/amdgpu: fix parser init error path to avoid crash in parser fini	Dave Airlie	1	-0/+2
	If we don't reset the chunk info in the error path, the subsequent fini path will double free. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
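The general pattern (reset state in the init error path so that a later fini is harmless) can be sketched like this. The structure and function names are hypothetical and greatly simplified; they are not the amdgpu parser code, and the `inject_failure` parameter exists only to exercise the error path.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified parser state. */
struct example_parser {
	int *chunks;
	unsigned int nchunks;
};

/* Init: on failure, free the chunks AND reset the bookkeeping. */
static int example_parser_init(struct example_parser *p, unsigned int n,
			       int inject_failure)
{
	p->chunks = calloc(n, sizeof(*p->chunks));
	if (!p->chunks) {
		p->nchunks = 0;
		return -ENOMEM;
	}
	p->nchunks = n;

	if (inject_failure) {
		free(p->chunks);
		/* Without these two resets, fini would free chunks again. */
		p->chunks = NULL;
		p->nchunks = 0;
		return -EINVAL;
	}
	return 0;
}

/* Fini: safe after a failed init because free(NULL) is a no-op. */
static void example_parser_fini(struct example_parser *p)
{
	free(p->chunks);
	p->chunks = NULL;
	p->nchunks = 0;
}
```

Calling `example_parser_fini()` after a failed `example_parser_init()` is then safe, because the stale pointer no longer survives the error path.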
2017-03-10	drm/amd/amdgpu: Disable GFX_PG on Carrizo until compute issues solved	Tom St Denis	1	-1/+1
	Currently compute jobs will stall if GFX_PG is enabled. Until this is resolved we'll disable GFX_PG. Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
2017-03-10	drm: mali-dp: Fix smart layer not going to composition	Mihail Atanassov	3	-3/+18
	Use rectangle 1 as a generic plane. Existing code already sets the smart layer bounding box size + offset. The rectangles' offsets are relative to the bounding box, so there is no need to set R1's offset (reset value is 0), just its size which is the same as the bounding box. Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
2017-03-10	drm: mali-dp: Remove mclk rate management	Mihail Atanassov	1	-2/+1
	The rate of mclk depends on the use-case. If no downscaling is required, then mclk == pxlclk is a valid option; with downscaling however, the rate at which mclk runs determines how much a plane can be downscaled before composition. This is a system integration + power management issue that is more suited to firmware rather than this driver. Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com> Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
2017-03-10	x86/reboot/quirks: Fix typo in ASUS EeeBook X205TA reboot quirk	Matjaz Hegedic	1	-1/+1
	The reboot quirk for ASUS EeeBook X205TA contains a typo in DMI_PRODUCT_NAME, improperly referring to X205TAW instead of X205TA, which prevents the quirk from being triggered. The model X205TAW already has a reboot quirk of its own. This fix simply removes the inappropriate final letter W. Fixes: 90b28ded88dd ("x86/reboot/quirks: Add ASUS EeeBook X205TA reboot quirk") Signed-off-by: Matjaz Hegedic <matjaz.hegedic@gmail.com> Link: http://lkml.kernel.org/r/1489064417-7445-1-git-send-email-matjaz.hegedic@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-10	drm: mxsfb: Implement drm_panel handling	Fabio Estevam	1	-0/+4
	Currently when the 'power-supply' regulator is passed via device tree it does not actually work since drm_panel_prepare()/drm_panel_enable() are never called. Quoting Thierry Reding: "It should really call drm_panel_prepare() and drm_panel_enable() while switching on the display pipeline and drm_panel_disable(), followed by drm_panel_unprepare() while switching off the display pipeline." So do as suggested, so that the 'power-supply' regulator can be functional. Reported-by: Breno Lima <breno.lima@nxp.com> Suggested-by: Thierry Reding <thierry.reding@gmail.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Tested-by: Marek Vasut <marex@denx.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-10	drm: mxsfb_crtc: Fix the framebuffer misplacement	Fabio Estevam	1	-2/+2
	Currently the framebuffer content is displayed with incorrect offsets in both the vertical and horizontal directions. The fbdev version of the driver does not show this problem. Breno Lima dumped the eLCDIF controller registers on both the drm and fbdev drivers and noticed that the VDCTRL3 register is configured incorrectly in the drm driver. The fbdev driver calculates the vertical and horizontal wait counts of the VDCTRL3 register by doing: back porch + sync length. Looking at the horizontal and vertical timing diagram from include/drm/drm_modes.h this value corresponds to: crtc_[hv]total - crtc_[hv]sync_start So fix the VDCTRL3 register setting accordingly so that the eLCDIF controller can properly show the framebuffer content in the correct position. Reported-by: Breno Lima <breno.lima@nxp.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Tested-by: Breno Lima <breno.lima@nxp.com> Tested-by: Marek Vasut <marex@denx.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
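The equivalence between "back porch + sync length" and "total - sync_start" can be checked with a short sketch. The struct below is a hypothetical stand-in for one axis of a display mode (it is not struct drm_display_mode), and the 800-pixel timing used in the note is made up for illustration.

```c
/* Hypothetical one-axis timing: display, sync_start, sync_end, total. */
struct example_timing {
	unsigned int display;
	unsigned int sync_start;
	unsigned int sync_end;
	unsigned int total;
};

/* Wait count as the fbdev driver is described to compute it. */
static unsigned int example_wait_cnt_from_porch(const struct example_timing *t)
{
	unsigned int back_porch = t->total - t->sync_end;
	unsigned int sync_len = t->sync_end - t->sync_start;

	return back_porch + sync_len;
}

/* The same value expressed with the mode fields named in the commit. */
static unsigned int example_wait_cnt(const struct example_timing *t)
{
	return t->total - t->sync_start;
}
```

For a made-up horizontal timing of display 800, sync_start 840, sync_end 888 and total 928, both functions yield 88 (back porch 40 plus sync length 48).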
2017-03-10	drm: mxsfb: Fix crash when provided invalid DT bindings	Marek Vasut	1	-0/+4
	The mxsfb driver will crash if the mxsfb DT node has a subnode, but the content of the subnode is not an of-graph binding with an endpoint linking to panel. The crash was triggered by providing old-style panel bindings to the mxsfb driver instead of the new of-graph ones. The problem happens in mxsfb_create_output(), which is invoked from mxsfb_load(). The mxsfb_create_output() iterates over all mxsfb DT subnode endpoints and tries to bind a panel on each endpoint. If there is any problem binding the panel, that is, mxsfb->panel == NULL, this function will return an error code, otherwise success 0 is returned. If the subnodes do not specify of-graph binding with an endpoint, the iteration over endpoints in mxsfb_create_output() will have zero cycles and the function will immediately return 0, but the mxsfb->panel will remain NULL. This is propagated back into the mxsfb_load(), which does not detect any problem and expects that the mxsfb->panel is valid, thus calls mxsfb_panel_attach(). But since mxsfb->panel == NULL, mxsfb_panel_attach() is called with first argument NULL and this crashes the kernel. This patch fixes the problem by explicitly checking for valid mxsfb->panel at the end of the iteration in mxsfb_create_output(). Signed-off-by: Marek Vasut <marex@denx.de> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Stefan Agner <stefan@agner.ch> Cc: Breno Matheus Lima <brenomatheus@gmail.com> Tested-by: Breno Lima <breno.lima@nxp.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
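The zero-iteration case can be reproduced with a simplified model. All names below are hypothetical, endpoints are modelled as plain strings, and the `-ENODEV` return value is an assumption for illustration; the commit message does not state which error code the real check returns.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; panel is NULL until bound. */
struct example_output {
	const char *panel;
};

/* Buggy: with zero endpoints the loop never runs and 0 is returned. */
static int example_create_output_buggy(struct example_output *out,
				       const char *const *endpoints,
				       size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		out->panel = endpoints[i];
		if (!out->panel)
			return -ENODEV; /* assumed error code */
	}
	return 0;
}

/* Fixed: explicitly check that a panel was bound after the iteration. */
static int example_create_output_fixed(struct example_output *out,
				       const char *const *endpoints,
				       size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		out->panel = endpoints[i];
		if (!out->panel)
			return -ENODEV; /* assumed error code */
	}
	return out->panel ? 0 : -ENODEV;
}
```

With `count == 0`, the buggy variant reports success while `panel` is still NULL, so a caller that trusts the return value dereferences NULL; the fixed variant reports an error instead.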
2017-03-10	drm: mxsfb: fix pixel clock polarity	Stefan Agner	1	-2/+9
	The DRM subsystem specifies the pixel clock polarity from a controllers perspective: DRM_BUS_FLAG_PIXDATA_NEGEDGE means the controller drives the data on pixel clocks falling edge. That is the controllers DOTCLK_POL=0 (Default is data launched at negative edge). Also change the data enable logic to be high active by default and only change if explicitly requested via bus_flags. With that defaults are: - Data enable: high active - Pixel clock polarity: controller drives data on negative edge Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Marek Vasut <marex@denx.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-10	drm: mxsfb: use bus_format to determine LCD bus width	Stefan Agner	2	-2/+33
	The LCD bus width does not need to align with the pixel format. The LCDIF controller automatically converts between pixel formats and bus width by padding or dropping LSBs. The DRM subsystem has the notion of bus_format which allows to determine what bus_formats are supported by the display. Choose the first available or fallback to 24 bit if none are available. Signed-off-by: Stefan Agner <stefan@agner.ch> Acked-by: Marek Vasut <marex@denx.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-03-09	userfaultfd: remove wrong comment from userfaultfd_ctx_get()	David Hildenbrand	1	-2/+0
	It's a void function, so there is no return value; Link: http://lkml.kernel.org/r/20170309150817.7510-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	fat: fix using uninitialized fields of fat_inode/fsinfo_inode	OGAWA Hirofumi	1	-1/+12
	Recently fallocate patch was merged and it uses MSDOS_I(inode)->mmu_private at fat_evict_inode(). However, fat_inode/fsinfo_inode that was introduced in past didn't initialize MSDOS_I(inode) properly. With those combinations, it became the cause of accessing random entry in FAT area. Link: http://lkml.kernel.org/r/87pohrj4i8.fsf@mail.parknet.co.jp Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Reported-by: Moreno Bartalucci <moreno.bartalucci@tecnorama.it> Tested-by: Moreno Bartalucci <moreno.bartalucci@tecnorama.it> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	sh: cayman: IDE support fix	Bartlomiej Zolnierkiewicz	1	-2/+0
	Remove incorrect CONFIG_IDE ifdef (CONFIG_IDE config option is for internal drivers/ide/ use) and make IDE hardware interface always initialized (not only when IDE subsystem is built-in). This patch allows Cayman board to work with modular IDE subsystem support and removes the requirement of having the whole core IDE subsystem built-in when using libata PATA support. Link: http://lkml.kernel.org/r/1990884.yFoE6lSB9G@amdc3058 Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	kasan: fix races in quarantine_remove_cache()	Dmitry Vyukov	1	-6/+36
	quarantine_remove_cache() frees all pending objects that belong to the cache, before we destroy the cache itself. However there are currently two possibilities how it can fail to do so. First, another thread can hold some of the objects from the cache in temp list in quarantine_put(). quarantine_put() has a windows of enabled interrupts, and on_each_cpu() in quarantine_remove_cache() can finish right in that window. These objects will be later freed into the destroyed cache. Then, quarantine_reduce() has the same problem. It grabs a batch of objects from the global quarantine, then unlocks quarantine_lock and then frees the batch. quarantine_remove_cache() can finish while some objects from the cache are still in the local to_free list in quarantine_reduce(). Fix the race with quarantine_put() by disabling interrupts for the whole duration of quarantine_put(). In combination with on_each_cpu() in quarantine_remove_cache() it ensures that quarantine_remove_cache() either sees the objects in the per-cpu list or in the global list. Fix the race with quarantine_reduce() by protecting quarantine_reduce() with srcu critical section and then doing synchronize_srcu() at the end of quarantine_remove_cache(). I've done some assessment of how good synchronize_srcu() works in this case. And on a 4 CPU VM I see that it blocks waiting for pending read critical sections in about 2-3% of cases. Which looks good to me. I suspect that these races are the root cause of some GPFs that I episodically hit. Previously I did not have any explanation for them. 
BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8 IP: qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155 PGD 6aeea067 PUD 60ed7067 PMD 0 Oops: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 13667 Comm: syz-executor2 Not tainted 4.10.0+ #60 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88005f948040 task.stack: ffff880069818000 RIP: 0010:qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:155 RSP: 0018:ffff88006981f298 EFLAGS: 00010246 RAX: ffffea0000ffff00 RBX: 0000000000000000 RCX: ffffea0000ffff1f RDX: 0000000000000000 RSI: ffff88003fffc3e0 RDI: 0000000000000000 RBP: ffff88006981f2c0 R08: ffff88002fed7bd8 R09: 00000001001f000d R10: 00000000001f000d R11: ffff88006981f000 R12: ffff88003fffc3e0 R13: ffff88006981f2d0 R14: ffffffff81877fae R15: 0000000080000000 FS: 00007fb911a2d700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c8 CR3: 0000000060ed6000 CR4: 00000000000006f0 Call Trace: quarantine_reduce+0x10e/0x120 mm/kasan/quarantine.c:239 kasan_kmalloc+0xca/0xe0 mm/kasan/kasan.c:590 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544 slab_post_alloc_hook mm/slab.h:456 [inline] slab_alloc_node mm/slub.c:2718 [inline] kmem_cache_alloc_node+0x1d3/0x280 mm/slub.c:2754 __alloc_skb+0x10f/0x770 net/core/skbuff.c:219 alloc_skb include/linux/skbuff.h:932 [inline] _sctp_make_chunk+0x3b/0x260 net/sctp/sm_make_chunk.c:1388 sctp_make_data net/sctp/sm_make_chunk.c:1420 [inline] sctp_make_datafrag_empty+0x208/0x360 net/sctp/sm_make_chunk.c:746 sctp_datamsg_from_user+0x7e8/0x11d0 net/sctp/chunk.c:266 sctp_sendmsg+0x2611/0x3970 net/sctp/socket.c:1962 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x660/0x810 net/socket.c:1685 SyS_sendto+0x40/0x50 net/socket.c:1653 I am not sure about backporting. 
The bug is quite hard to trigger, I've seen it few times during our massive continuous testing (however, it could be cause of some other episodic stray crashes as it leads to memory corruption...). If it is triggered, the consequences are very bad -- almost definite bad memory corruption. The fix is non trivial and has chances of introducing new bugs. I am also not sure how actively people use KASAN on older releases. [dvyukov@google.com: - sorted includes] Link: http://lkml.kernel.org/r/20170309094028.51088-1-dvyukov@google.com Link: http://lkml.kernel.org/r/20170308151532.5070-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	kasan: resched in quarantine_remove_cache()	Dmitry Vyukov	1	-1/+8
	We see reported stalls/lockups in quarantine_remove_cache() on machines with large amounts of RAM. quarantine_remove_cache() needs to scan the whole quarantine in order to take out all objects belonging to the cache. Quarantine is currently 1/32 of RAM, e.g. on a machine with 256GB of memory that will be 8GB. Moreover, quarantine scanning is a walk over an uncached linked list, which is slow. Add cond_resched() after scanning of each non-empty batch of objects. Batches are specifically kept at a reasonable size for quarantine_put(). On a machine with 256GB of RAM we should have ~512 non-empty batches, each with 16MB of objects. Link: http://lkml.kernel.org/r/20170308154239.25440-1-dvyukov@google.com Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Greg Thelen <gthelen@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
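The sizing figures quoted in this message can be checked with a few lines of arithmetic. The sketch below is not kernel code; the function and macro names are made up for illustration, and the only inputs are the 1/32 fraction and the 16MB batch size stated in the commit message.

```c
#include <stdint.h>

/* Assumed constants, taken from the commit message (not from kernel headers). */
#define QUARANTINE_FRACTION 32ULL     /* quarantine is 1/32 of RAM */
#define BATCH_BYTES (16ULL << 20)     /* roughly 16MB of objects per batch */

/* Total quarantine size in bytes for a machine with ram_bytes of memory. */
uint64_t quarantine_bytes(uint64_t ram_bytes)
{
	return ram_bytes / QUARANTINE_FRACTION;
}

/* Number of full batches, i.e. how many reschedule points one scan gets. */
uint64_t quarantine_batches(uint64_t ram_bytes)
{
	return quarantine_bytes(ram_bytes) / BATCH_BYTES;
}
```

For `256ULL << 30` bytes of RAM this gives an 8GB quarantine split into 512 batches, which matches the numbers in the message.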
2017-03-09	mm: do not call mem_cgroup_free() from within mem_cgroup_alloc()	Tahsin Erdogan	1	-3/+8
	mem_cgroup_free() indirectly calls wb_domain_exit() which is not prepared to deal with a struct wb_domain object that hasn't executed wb_domain_init(). For instance, the following warning message is printed by lockdep if alloc_percpu() fails in mem_cgroup_alloc(): INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 1950 Comm: mkdir Not tainted 4.10.0+ #151 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x67/0x99 register_lock_class+0x36d/0x540 __lock_acquire+0x7f/0x1a30 lock_acquire+0xcc/0x200 del_timer_sync+0x3c/0xc0 wb_domain_exit+0x14/0x20 mem_cgroup_free+0x14/0x40 mem_cgroup_css_alloc+0x3f9/0x620 cgroup_apply_control_enable+0x190/0x390 cgroup_mkdir+0x290/0x3d0 kernfs_iop_mkdir+0x58/0x80 vfs_mkdir+0x10e/0x1a0 SyS_mkdirat+0xa8/0xd0 SyS_mkdir+0x14/0x20 entry_SYSCALL_64_fastpath+0x18/0xad Add __mem_cgroup_free() which skips wb_domain_exit(). This is used by both mem_cgroup_free() and mem_cgroup_alloc() clean up. Fixes: 0b8f73e104285 ("mm: memcontrol: clean up alloc, online, offline, free functions") Link: http://lkml.kernel.org/r/20170306192122.24262-1-tahsin@google.com Signed-off-by: Tahsin Erdogan <tahsin@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
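The pattern described here, a raw free helper shared by the full destructor and the allocator's error path, can be sketched in userspace C. All names below (`struct group`, `group_free_raw`, `domain_init`, and so on) are hypothetical stand-ins, not the memcg code; `group_free_raw` plays the role that `__mem_cgroup_free()` plays in the patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mem_cgroup. */
struct group {
	int *stats;         /* allocated early in group_alloc() */
	bool domain_ready;  /* set once domain_init() has run */
};

int domain_exit_calls;      /* counts teardown calls, for demonstration only */

static void domain_init(struct group *g)
{
	g->domain_ready = true;
}

static void domain_exit(struct group *g)
{
	g->domain_ready = false;
	domain_exit_calls++;
}

/* Releases memory only; safe on a partially constructed object. */
static void group_free_raw(struct group *g)
{
	free(g->stats);
	free(g);
}

/* Full teardown; valid only for objects that group_alloc() returned. */
void group_free(struct group *g)
{
	domain_exit(g);
	group_free_raw(g);
}

/* fail_stats simulates the allocation failure described in the message. */
struct group *group_alloc(bool fail_stats)
{
	struct group *g = calloc(1, sizeof(*g));

	if (!g)
		return NULL;
	g->stats = fail_stats ? NULL : calloc(16, sizeof(int));
	if (!g->stats)
		goto fail;
	domain_init(g);
	return g;
fail:
	group_free_raw(g);  /* not group_free(): domain_init() never ran */
	return NULL;
}
```

The point of the split is that the error path never runs teardown code for a subsystem that was not initialised, while the normal destructor still does both steps.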
2017-03-09	thp: fix another corner case of munlock() vs. THPs	Kirill A. Shutemov	1	-5/+4
	The following test case triggers BUG() in munlock_vma_pages_range(): #include <fcntl.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> int main(int argc, char *argv[]) { int fd; system("mount -t tmpfs -o huge=always none /mnt"); fd = open("/mnt/test", O_CREAT | O_RDWR); ftruncate(fd, 4UL << 20); mmap(NULL, 4UL << 20, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_LOCKED, fd, 0); mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0); munlockall(); return 0; } The second mmap() creates a PTE-mapping of the first huge page in the file. It makes the kernel munlock the page, as we never keep a PTE-mapped page mlocked. On munlockall(), when we handle the vma created by the first mmap(), munlock_vma_page() returns page_mask == 0, as the page is not mlocked anymore. On the next iteration follow_page_mask() returns a tail page, but page_mask is HPAGE_NR_PAGES - 1. It makes us skip to the first tail page of the next huge page and step on VM_BUG_ON_PAGE(PageMlocked(page)). The fix is to not use the page_mask from follow_page_mask() at all. It has no use for us. Link: http://lkml.kernel.org/r/20170302150252.34120-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	rmap: fix NULL-pointer dereference on THP munlocking	Kirill A. Shutemov	1	-6/+7
	The following test case triggers a NULL-pointer dereference in try_to_unmap_one(): #include <fcntl.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> int main(int argc, char *argv[]) { int fd; system("mount -t tmpfs -o huge=always none /mnt"); fd = open("/mnt/test", O_CREAT | O_RDWR); ftruncate(fd, 2UL << 20); mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_LOCKED, fd, 0); mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0); munlockall(); return 0; } Apparently, there's a case when we call try_to_unmap() on huge PMDs: it's TTU_MUNLOCK. Let's handle this case correctly. Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()") Link: http://lkml.kernel.org/r/20170302151159.30592-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	mm/memblock.c: fix memblock_next_valid_pfn()	AKASHI Takahiro	1	-1/+4
	Obviously, we should not access memblock.memory.regions[right] if 'right' is outside of the half-open range [0, memblock.memory.cnt). Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") Link: http://lkml.kernel.org/r/20170303023745.9104-1-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Paul Burton <paul.burton@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
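A minimal userspace sketch of the kind of bounds check this message calls for is shown below. It is not the memblock implementation: `struct region`, `next_valid_pfn()` and the pfn-based layout are simplifying assumptions made for illustration. The relevant part is the final check that `right` is a valid index before `regions[right]` is read.

```c
#include <stddef.h>

/* Hypothetical, simplified region descriptor in pfn units. */
struct region {
	unsigned long start_pfn;  /* first valid pfn */
	unsigned long end_pfn;    /* one past the last valid pfn */
};

/*
 * Return the first valid pfn strictly greater than pfn, or max_pfn when
 * there is none. regions[] must be sorted and non-overlapping.
 */
unsigned long next_valid_pfn(const struct region *regions, size_t cnt,
			     unsigned long pfn, unsigned long max_pfn)
{
	size_t left = 0, right = cnt;

	pfn++;  /* candidate: the pfn right after the one passed in */

	/* Binary search for a region containing the candidate. */
	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (pfn < regions[mid].start_pfn)
			right = mid;
		else if (pfn >= regions[mid].end_pfn)
			left = mid + 1;
		else
			return pfn;  /* the candidate is itself valid */
	}

	/* right may equal cnt here, so check before indexing. */
	if (right == cnt)
		return max_pfn;
	if (regions[right].start_pfn < max_pfn)
		return regions[right].start_pfn;
	return max_pfn;
}
```

When the search falls off the end of the array, `right` equals `cnt`, and the function returns `max_pfn` instead of reading one element past the last region.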
2017-03-09	userfaultfd: selftest: vm: allow to build in vm/ directory	Andrea Arcangeli	1	-0/+4
	linux/tools/testing/selftests/vm $ make gcc -Wall -I ../../../../usr/include compaction_test.c -lrt -o /compaction_test /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.4/../../../../x86_64-pc-linux-gnu/bin/ld: cannot open output file /compaction_test: Permission denied collect2: error: ld returned 1 exit status make: *** [../lib.mk:54: /compaction_test] Error 1 Since commit a8ba798bc8ec ("selftests: enable O and KBUILD_OUTPUT") selftests/vm build fails if run from the "selftests/vm" directory, but it works in the selftests/ directory. It's quicker to be able to do a local vm-only build after a tree wipe and this patch allows for it again. Link: http://lkml.kernel.org/r/20170302173738.18994-4-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: userfaultfd_remove revalidate vma in MADV_DONTNEED	Andrea Arcangeli	3	-13/+47
	userfaultfd_remove() has to be executed before zapping the pagetables, or UFFDIO_COPY could keep filling pages after zap_page_range returned, which would result in non-zero data after a MADV_DONTNEED. However, userfaultfd_remove() may have to release the mmap_sem. This was handled correctly in MADV_REMOVE, but MADV_DONTNEED accessed a potentially stale vma (the very vma passed to zap_page_range(vma, ...)). The fix consists of revalidating the vma in case userfaultfd_remove() had to release the mmap_sem. This also optimizes away an unnecessary down_read/up_read in the MADV_REMOVE case if UFFD_EVENT_FORK had to be delivered. It all remains zero runtime cost in case CONFIG_USERFAULTFD=n, as userfaultfd_remove() will be defined as "true" at build time. Link: http://lkml.kernel.org/r/20170302173738.18994-3-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: fix fork fctx->new memleak	Mike Rapoport	1	-0/+9
	We have a memleak in the ->new ctx: if the uffd of the parent is closed before the fork event is read, nothing frees the new context. Link: http://lkml.kernel.org/r/20170302173738.18994-2-aarcange@redhat.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	mm/cgroup: avoid panic when init with low memory	Laurent Dufour	1	-2/+5
	The system may panic when initialisation is done when almost all the memory is assigned to the huge pages using the kernel command line parameter hugepage=xxxx. Panic may occur like this: Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc000000000302b88 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 [ 0.082424] NUMA pSeries Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-15-generic #16-Ubuntu task: c00000021ed01600 task.stack: c00000010d108000 NIP: c000000000302b88 LR: c000000000270e04 CTR: c00000000016cfd0 REGS: c00000010d10b2c0 TRAP: 0300 Not tainted (4.9.0-15-generic) MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>[ 0.082770] CR: 28424422 XER: 00000000 CFAR: c0000000003d28b8 DAR: 0000000000000000 DSISR: 40000000 SOFTE: 1 GPR00: c000000000270e04 c00000010d10b540 c00000000141a300 c00000010fff6300 GPR04: 0000000000000000 00000000026012c0 c00000010d10b630 0000000487ab0000 GPR08: 000000010ee90000 c000000001454fd8 0000000000000000 0000000000000000 GPR12: 0000000000004400 c00000000fb80000 00000000026012c0 00000000026012c0 GPR16: 00000000026012c0 0000000000000000 0000000000000000 0000000000000002 GPR20: 000000000000000c 0000000000000000 0000000000000000 00000000024200c0 GPR24: c0000000016eef48 0000000000000000 c00000010fff7d00 00000000026012c0 GPR28: 0000000000000000 c00000010fff7d00 c00000010fff6300 c00000010d10b6d0 NIP mem_cgroup_soft_limit_reclaim+0xf8/0x4f0 LR do_try_to_free_pages+0x1b4/0x450 Call Trace: do_try_to_free_pages+0x1b4/0x450 try_to_free_pages+0xf8/0x270 __alloc_pages_nodemask+0x7a8/0xff0 new_slab+0x104/0x8e0 ___slab_alloc+0x620/0x700 __slab_alloc+0x34/0x60 kmem_cache_alloc_node_trace+0xdc/0x310 mem_cgroup_init+0x158/0x1c8 do_one_initcall+0x68/0x1d0 kernel_init_freeable+0x278/0x360 kernel_init+0x24/0x170 ret_from_kernel_thread+0x5c/0x74 Instruction dump: eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 3d230001 e9499a42 3d220004 3929acd8 794a1f24 7d295214 eac90100 
<e9360000> 2fa90000 419eff74 3b200000 ---[ end trace 342f5208b00d01b6 ]--- This is a chicken-and-egg issue where the kernel tries to get free memory when allocating per-node data in mem_cgroup_init(), but in that path mem_cgroup_soft_limit_reclaim() is called, which assumes that these data are allocated. As mem_cgroup_soft_limit_reclaim() is best effort, it should return when these data are not yet allocated. This patch also fixes a potential null pointer access in mem_cgroup_remove_from_trees() and mem_cgroup_update_tree(). Link: http://lkml.kernel.org/r/1487856999-16581-2-git-send-email-ldufour@linux.vnet.ibm.com Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
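The "best effort, so return early" guard can be illustrated with a small userspace sketch. Everything here is hypothetical (`struct node_tree`, `soft_limit_tree`, `MAX_NODES`, `soft_limit_reclaim()`); it only shows the shape of a reclaim helper that tolerates being called before its per-node data exists, not the memcg data structures.

```c
#include <stddef.h>

/* Hypothetical per-node data; the slot stays NULL until init allocates it. */
struct node_tree {
	unsigned long excess_pages;  /* pages that could be reclaimed */
};

#define MAX_NODES 4
struct node_tree *soft_limit_tree[MAX_NODES];

/*
 * Best-effort reclaim: returns the number of pages reclaimed, or 0 when the
 * per-node data does not exist yet (for example when reclaim is triggered
 * by the very allocation that is setting this data up).
 */
unsigned long soft_limit_reclaim(int nid)
{
	struct node_tree *tree;
	unsigned long reclaimed;

	if (nid < 0 || nid >= MAX_NODES)
		return 0;

	tree = soft_limit_tree[nid];
	if (!tree)
		return 0;  /* not allocated yet: skip instead of dereferencing NULL */

	reclaimed = tree->excess_pages;
	tree->excess_pages = 0;
	return reclaimed;
}
```

Because the caller treats a return of 0 as "nothing reclaimed", skipping the work during early init changes no semantics for later calls.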
2017-03-09	drivers/md/bcache/util.h: remove duplicate inclusion of blkdev.h	Masanari Iida	1	-1/+0
	Link: http://lkml.kernel.org/r/20170226060230.11555-1-standby24x7@gmail.com Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Coly Li <colyli@suse.de> Cc: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	mm/vmstats: add thp_split_pud event for clarity	Yisheng Xie	3	-1/+7
	We added support for PUD-sized transparent hugepages; however, we count the event "thp split pud" into the thp_split_pmd event. To separate the event count of thp split pud from pmd, add a new event named thp_split_pud. Link: http://lkml.kernel.org/r/1488282380-5076-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Hugh Dickins <hughd@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	include/linux/fs.h: fix unsigned enum warning with gcc-4.2	Arnd Bergmann	1	-1/+1
	With arm-linux-gcc-4.2, almost every file we build in the kernel ends up with this warning: include/linux/fs.h:2648: warning: comparison of unsigned expression < 0 is always false Later versions don't have this problem, but it's easy enough to work around. Link: http://lkml.kernel.org/r/20161216105634.235457-12-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: release all ctx in dup_userfaultfd_complete	Andrea Arcangeli	1	-13/+5
	Don't stop running dup_fctx() even if userfaultfd_event_wait_completion fails as it has to run userfaultfd_ctx_put on all ctx to pair against the userfaultfd_ctx_get that was run on all fctx->orig in dup_userfaultfd. Link: http://lkml.kernel.org/r/20170224181957.19736-4-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
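The reference-pairing argument can be made concrete with a small sketch. The types and functions below (`struct ctx`, `struct fork_ctx`, `event_wait_completion()`, `dup_complete()`) are hypothetical simplifications, not the userfaultfd code; the assumption modelled is that the per-entry completion step drops the reference whether or not event delivery succeeds.

```c
#include <stddef.h>

/* Hypothetical reference-counted context. */
struct ctx {
	int refcount;
};

/* Hypothetical list entry: one reference on orig was taken when queued. */
struct fork_ctx {
	struct ctx *orig;
	int fail;  /* simulate a failed event delivery for this entry */
};

static void ctx_put(struct ctx *c)
{
	c->refcount--;
}

/* Delivers the event and drops the queued reference in either outcome. */
static int event_wait_completion(struct fork_ctx *f)
{
	ctx_put(f->orig);
	return f->fail ? -1 : 0;
}

/* Walk every entry; remember the first error but never stop early. */
int dup_complete(struct fork_ctx *list, size_t n)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		int err = event_wait_completion(&list[i]);

		if (err && !ret)
			ret = err;
	}
	return ret;
}
```

Returning at the first failure would leave the remaining entries with an unmatched reference, which is the leak this change is meant to avoid.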
2017-03-09	userfaultfd: non-cooperative: robustness check	Andrea Arcangeli	1	-2/+7
	Similar to the handle_userfault() case, also make sure to never attempt to send any event past the PF_EXITING point of no return. This is purely a robustness check. Link: http://lkml.kernel.org/r/20170224181957.19736-3-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09	userfaultfd: non-cooperative: rollback userfaultfd_exit	Andrea Arcangeli	5	-43/+1
	Patch series "userfaultfd non-cooperative further update for 4.11 merge window". Unfortunately I noticed one relevant bug in userfaultfd_exit while doing more testing. I've been doing testing before, and this was also tested by the kbuild bot and exercised by the selftest, but this bug never reproduced before. I dropped userfaultfd_exit as a result. I dropped it because of the implementation difficulty in receiving signals in __mmput and because I think -ENOSPC as the result from the background UFFDIO_COPY should be enough already. Before I decided to remove userfaultfd_exit, I noticed userfaultfd_exit wasn't exercised by the selftest, and when I tried to exercise it, after moving it to a more correct place in __mmput where it would make more sense and where the vma list is stable, it resulted in the event_wait_completion in D state. So then I added the second patch to be sure that even if we call userfaultfd_event_wait_completion too late during task exit(), we won't risk generating tasks in D state. The same check exists in handle_userfault() for the same reason, except it makes a difference there, while here it is just a robustness check and it's run under WARN_ON_ONCE. While looking at the userfaultfd_event_wait_completion() function I looked back at its callers too, and I think it's not ok to stop executing dup_fctx on the fcs list, because we rely on userfaultfd_event_wait_completion to execute userfaultfd_ctx_put(fctx->orig), which is paired against userfaultfd_ctx_get(fctx->orig) in dup_userfaultfd just before list_add(fcs). This change only takes care of fctx->orig, but this area also needs further review looking for similar problems in fctx->new. The only patch that is urgent is the first, because it's a use-after-free during an SMP race condition that affects all processes if CONFIG_USERFAULTFD=y. It is very hard to reproduce, though, and probably impossible without SLUB poisoning enabled. 
This patch (of 3): I once reproduced this oops with the userfaultfd selftest; it's not easily reproducible and it requires SLUB poisoning to reproduce. general protection fault: 0000 [#1] SMP Modules linked in: CPU: 2 PID: 18421 Comm: userfaultfd Tainted: G ------------ T 3.10.0+ #15 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014 task: ffff8801f83b9440 ti: ffff8801f833c000 task.ti: ffff8801f833c000 RIP: 0010:[<ffffffff81451299>] [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0 RSP: 0018:ffff8801f833fe80 EFLAGS: 00010202 RAX: ffff8801f833ffd8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff8801f83b9440 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800baf18600 RBP: ffff8801f833fee8 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: ffffffff8127ceb3 R12: 0000000000000000 R13: ffff8800baf186b0 R14: ffff8801f83b99f8 R15: 00007faed746c700 FS: 0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007faf0966f028 CR3: 0000000001bc6000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: do_exit+0x297/0xd10 SyS_exit+0x17/0x20 tracesys+0xdd/0xe2 Code: 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 83 ec 58 48 8b 1f 48 85 db 75 11 eb 73 66 0f 1f 44 00 00 48 8b 5b 10 48 85 db 74 64 <4c> 8b a3 b8 00 00 00 4d 85 e4 74 eb 41 f6 84 24 2c 01 00 00 80 RIP [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0 RSP <ffff8801f833fe80> ---[ end trace 9fecd6dcb442846a ]--- In the debugger I located the "mm" pointer in the stack, and walking mm->mmap->vm_next through to the end shows the vma->vm_next list is fully consistent and is a null-terminated list as expected. So this has to be an SMP race condition where userfaultfd_exit was running while the vma list was being modified by another CPU. 
When userfaultfd_exit() ran, one of the ->vm_next pointers pointed to SLAB_POISON (RBX is the vma pointer and is 0x6b6b..). The reason is that it's not running in __mmput but while there are still other threads running, and it's not holding the mmap_sem (it can't, as it has to wait for the event to be received by the manager). So this is a use-after-free that was happening for all processes. One more implementation problem aside from the race condition: userfaultfd_exit really has to check a flag in mm->flags before walking the vma list, or it's going to slow down the exit() path for regular tasks. One more implementation problem: at that point signals can't be delivered, so it would also create a task in D state if the manager doesn't read the event. The major design issue: it overall looks superfluous, as the manager can check for -ENOSPC in the background transfer: if (mmget_not_zero(ctx->mm)) { [..] } else { return -ENOSPC; } It's safer to roll it back and re-introduce it later, if at all. [rppt@linux.vnet.ibm.com: documentation fixup after removal of UFFD_EVENT_EXIT] Link: http://lkml.kernel.org/r/1488345437-4364-1-git-send-email-rppt@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/20170224181957.19736-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
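The `mmget_not_zero()` check quoted in this message follows the general "take a reference only if the count is still non-zero" idiom. The sketch below shows that idiom in userspace with C11 atomics; `struct mm`, `mm_get_not_zero()`, `mm_put()` and `background_copy()` are hypothetical names for illustration and are not the kernel's implementation.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an address space with a user count. */
struct mm {
	atomic_int users;
};

/* Take a reference only if the count has not already dropped to zero. */
bool mm_get_not_zero(struct mm *mm)
{
	int old = atomic_load(&mm->users);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&mm->users, &old, old + 1))
			return true;
	}
	return false;
}

void mm_put(struct mm *mm)
{
	atomic_fetch_sub(&mm->users, 1);
}

/* One background copy step: 0 on success, -ENOSPC if the target is gone. */
int background_copy(struct mm *mm)
{
	if (!mm_get_not_zero(mm))
		return -ENOSPC;
	/* ... the copy into the target address space would happen here ... */
	mm_put(mm);
	return 0;
}
```

A manager that sees `-ENOSPC` from the copy step learns that the monitored process has exited, which is the reason the message gives for not needing a separate exit event.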