linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2016-05-02	mmc: dw_mmc-exynos: remove dw_mci_exynos_setup_clock	Shawn Lin	1	-8/+0
	We combine what dw_mci_exynos_setup_clock does with init hook to simplify the code Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: dw_mmc-rockchip: remove setup_clock for rockchip	Shawn Lin	1	-8/+4
	We remove setup_clock hook and combine it into init hook to simplify the code Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: dw_mmc: exynos: add the function for controlling SMU	Jaehoon Chung	1	-2/+13
	Some of Exynos has the Security management Unit(SMU). This patch adds the function for controlling SMU. In future, if exynos needs to control SMU, it can be implemented in "config_smu" function, not "init" function. Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: dw_mmc: remove unused EVENT_XFER_ERROR	Shawn Lin	1	-1/+0
	EVENT_XFER_ERROR isn't been used now, so it can be removed. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: dw_mmc: avoid using dmaengine_terminate_all	Shawn Lin	1	-1/+1
	dmaengine_terminate_all is deprecated and should be replaced by more explicit synchronous and asynchronous terminate functions. This change is based on the commit b36f09c3c441 ("dmaengine: Add transfer termination synchronization support"). Currently dw_mci_stop_dma may be called under the spinlock, let's migrate dmaengine_terminate_all to async terminate. This could avoid the race condition of use-after-free resouce of dmaengine once slave-dma driver implement the synchronize method. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: dw_mmc: fix warning reported by kernel-doc	Shawn Lin	1	-1/+10
	Try to fix the warning reported by: scripts/kernel-doc -man -v include/linux/mmc/dw_mmc.h > /dev/null warning: No description found for parameter 'irq_lock' warning: No description found for parameter 'stop_abort' warning: No description found for parameter 'prev_blksz' warning: No description found for parameter 'timing' warning: No description found for parameter 'ring_size' warning: No description found for parameter 'dms' warning: No description found for parameter 'phy_regs' warning: No description found for parameter 'fifoth_val' warning: No description found for parameter 'vqmmc_enabled' warning: No description found for parameter 'cmd11_timer' warning: Excess struct/union/enum/typedef member 'card_tasklet' description in 'dw_mci' Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: dw_mmc-rockchip: fix failing to mount partition with "discard"	Shawn Lin	1	-0/+9
	Without MMC_CAP_ERASE support, we fail to mount partition with "discard" option since mmc_queue_setup_discard is limited for checking mmc_can_erase. Without doing mmc_queue_setup_discard, blk_queue_discard fails to test QUEUE_FLAG_DISCARD flag, so we get the following log from f2fs(actually similar to other file system): mounting with "discard" option, but the device does not support discard Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: dw_mmc-rockchip: remove dw_mci_rockchip_pmops	Shawn Lin	1	-21/+1
	dw_mci_rockchip_pmops just copy-paste what dw_mci_pltfm_pmops have done. Let's remove it. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: sh_mobile_sdhi: Add UHS-I mode support	Wolfram Sang	2	-0/+55
	Implement voltage switch, supporting modes up to SDR-50. Based on work by Shinobu Uehara, Rob Taylor, William Towle and Ian Molton. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: host: add note that set_ios needs to handle 0Hz properly	Wolfram Sang	1	-10/+21
	While here, refactor the comments so that they are before the declaration they are referring to. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: tmio: stop clock when 0Hz is requested	Wolfram Sang	1	-24/+26
	Setting frequency to 0 is not enough, the clock explicitly has to be disabled. Otherwise voltage switching (which needs SDCLK to be quiet) fails for various cards. Because we now do the 'new_clock == 0' check right at the beginning, the indentation level of the rest of the code can be decreased a little. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: tmio: always start clock after frequency calculation	Wolfram Sang	1	-18/+16
	Starting the clock is always done after frequency change anyhow, so we can do it directly after the clock calculation and remove the specific calls. This is the first part of doing proper clock de-/activation at calculation time. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: tmio: Add UHS-I mode support	Wolfram Sang	3	-1/+15
	Based on work by Shinobu Uehara and Ben Dooks. This adds the voltage switch operation needed for all UHS-I modes, but not the tuning needed for SDR-104 which will come later. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: tmio, sh_mobile_sdhi: Add support for variable input clock frequency	Ben Hutchings	3	-16/+66
	Currently tmio_mmc assumes that the input clock frequency is fixed and only its own clock divider can be changed. This is not true in the case of sh_mobile_sdhi; we can use the clock API to change it. In tmio_mmc: - Delegate setting of f_min from tmio to the clk_enable operation (if implemented), as it can be smaller than f_max / 512 - Add an optional clk_update operation called from tmio_mmc_set_clock() that updates the input clock frequency - Rename tmio_mmc_clk_update() to tmio_mmc_clk_enable(), to avoid confusion with the clk_update operation In sh_mobile_sdhi: - Make the setting of f_max conditional; it should be set through the max-frequency property in the device tree in future - Set f_min based on the input clock's minimum frequency - Implement the clk_update operation, selecting the best input clock frequency for the bus frequency that's wanted sh_mobile_sdhi_clk_update() is loosely based on Kuninori Morimoto's work in sh_mmcif. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: tmio, sh_mobile_sdhi: Pass tmio_mmc_host ptr to clk_{enable, disable} ops	Ben Hutchings	3	-11/+9
	Change the clk_enable operation to take a pointer to the struct tmio_mmc_host and have it set f_max. For consistency, also change the clk_disable operation to take a pointer to struct tmio_mmc_host. Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: core: Provide tracepoints for request processing	Baolin Wang	2	-0/+189
	This patch provides some tracepoints for the lifecycle of a mmc request from starting to completion to help with performance analysis of MMC subsystem. Signed-off-by: Baolin Wang <baolin.wang@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: omap_hsmmc: pass omap_hsmmc_host pointer directly	Andreas Fenkart	1	-10/+9
	unnecessary indirection via 'struct device' back to omap_hsmmc_host Signed-off-by: Andreas Fenkart <afenkart@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: davinci: remove matching string	David Lechner	1	-1/+1
	The string "MMCSDCLK" is not actually used for clock lookup, so can be removed. Signed-off-by: David Lechner <david@lechnology.com> Acked-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: davinci_mmc: Use dma_request_chan() to requesting DMA channel	Peter Ujfalusi	1	-39/+14
	With the new dma_request_chan() the client driver does not need to look for the DMA resource and it does not need to pass filter_fn anymore. By switching to the new API the driver can now support deferred probing against DMA. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: sh_mmci: Get rid of wrapper function for regulators	Ulf Hansson	1	-12/+4
	As there are two callers of sh_mmcif_set_power() and because its only additional action is to check for a valid regulator, let's just remove it. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: sh_mmcif: Restructure ->set_ios()	Ulf Hansson	1	-29/+16
	Both from a runtime PM and clock management point of view, the ->set_ios() code is unnecessary complex. A suboptimal path is also executed when the mmc core requests a clock rate of zero. As that happens during the card initialization phase, trying to save power by decreasing the runtime PM usage count and gating the clock via clk_disable_unprepare() is just superfluous. Moreover, from a runtime PM point of view the core will anyway keep the device active during the entire card initialization phase. Restructure the code to rely on the ios->power_mode to understand when the runtime PM usage count needs to be increased. Let's also deal with clock rate changes by simply applying the rate. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-02	mmc: sh_mmcif: Make sure the device stays active when needed in ->probe()	Ulf Hansson	1	-7/+8
	While accessing the device, make sure it stays active by increasing the runtime PM usage count for it. Let's also defer to enable runtime PM until we really need access to the device. This also enables the error path in ->probe() to become simpler. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-05-01	Linux 4.6-rc6	Linus Torvalds	1	-2/+2

2016-04-29	EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback	Tony Luck	2	-2/+2
	Both of these drivers can return NOTIFY_BAD, but this terminates processing other callbacks that were registered later on the chain. Since the driver did nothing to log the error it seems wrong to prevent other interested parties from seeing it. E.g. neither of them had even bothered to check the type of the error to see if it was a memory error before the return NOTIFY_BAD. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>
2016-04-28	Documentation/sysctl/vm.txt: update numa_zonelist_order description	Xishi Qiu	1	-9/+10
	Commit 3193913ce62c ("mm: page_alloc: default node-ordering on 64-bit NUMA, zone-ordering on 32-bit") changes the default value of numa_zonelist_order. Update the document. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	lib/stackdepot.c: allow the stack trace hash to be zero	Alexander Potapenko	1	-4/+0
	Do not bail out from depot_save_stack() if the stack trace has zero hash. Initially depot_save_stack() silently dropped stack traces with zero hashes, however there's actually no point in reserving this zero value. Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	rapidio: fix potential NULL pointer dereference	Vladimir Zapolskiy	1	-2/+2
	The change fixes improper check for a returned error value by class_create() function, which on error returns ERR_PTR() value, thus the original check always results in a dead code on error path. Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	mm/memory-failure: fix race with compound page split/merge	Konstantin Khlebnikov	1	-1/+9
	get_hwpoison_page() must recheck relation between head and tail pages. n-horiguchi said: without this recheck, the race causes kernel to pin an irrelevant page, and finally makes kernel crash for refcount mismatch. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	ocfs2/dlm: return zero if deref_done message is successfully handled	xuejiufei	1	-0/+2
	dlm_deref_lockres_done_handler() should return zero if the message is successfully handled. Fixes: 60d663cb5273 ("ocfs2/dlm: add DEREF_DONE message"). Signed-off-by: xuejiufei <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	Ananth has moved	Ananth N Mavinakayanahalli	1	-1/+1
	The current ID is going away soon... update email address Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	kcov: don't profile branches in kcov	Andrey Ryabinin	1	-0/+1
	Profiling 'if' statements in __sanitizer_cov_trace_pc() leads to unbound recursion and crash: __sanitizer_cov_trace_pc() -> ftrace_likely_update -> __sanitizer_cov_trace_pc() ... Define DISABLE_BRANCH_PROFILING to disable this tracer. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	kcov: don't trace the code coverage code	James Morse	1	-1/+1
	Kcov causes the compiler to add a call to __sanitizer_cov_trace_pc() in every basic block. Ftrace patches in a call to _mcount() to each function it has annotated. Letting these mechanisms annotate each other is a bad thing. Break the loop by adding 'notrace' to __sanitizer_cov_trace_pc() so that ftrace won't try to patch this code. This patch lets arm64 with KCOV and STACK_TRACER boot. Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	mm: wake kcompactd before kswapd's short sleep	Vlastimil Babka	1	-14/+14
	When kswapd goes to sleep it checks if the node is balanced and at first it sleeps only for HZ/10 time, then rechecks if the node is still balanced and nobody has woken it during the initial sleep. Only then it goes fully sleep until an allocation slowpath wakes it up again. For higher-order allocations, waking up kcompactd is done only before the full sleep. This turns out to be an issue in case another high-order allocation fails during the initial sleep. It will wake kswapd up, however kswapd considers the zone balanced from the order-0 perspective, and will just quickly try to sleep again. So if there's a longer stream of high-order allocations hitting the slowpath and waking up kswapd, it might never actually wake up kcompactd, which may be considered a regression from kswapd-based compaction. In the worst case, it might be that a single allocation that cannot direct reclaim/compact itself is waking kswapd in the retry loop and preventing kcompactd from being woken up and unblocking it. This patch makes sure kcompactd is woken up in such situations by simply moving the wakeup before the short initial sleep. More efficient solution would be to wake kcompactd immediately instead of kswapd if the node is already order-0 balanced, but in that case we should also move reset_isolation_suitable() call to kcompactd so it's not adding to the allocator's latency. Since it's late in the 4.6 cycle, let's go with the simpler change for now. Fixes: accf62422b3a ("mm, kswapd: replace kswapd compaction with waking up kcompactd") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	.mailmap: add Frank Rowand	Frank Rowand	1	-0/+3
	Set current email address to replace obsolete email addresses. Signed-off-by: Frank Rowand <frank.rowand@sonymobile.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	mm/hwpoison: fix wrong num_poisoned_pages accounting	Minchan Kim	1	-1/+7
	Currently, migration code increses num_poisoned_pages on failed migration page as well as successfully migrated one at the trial of memory-failure. It will make the stat wrong. As well, it marks the page as PG_HWPoison even if the migration trial failed. It would mean we cannot recover the corrupted page using memory-failure facility. This patches fixes it. Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	mm: call swap_slot_free_notify() with page lock held	Minchan Kim	1	-1/+5
	Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in page_swap_info. The reason is that page_endio in rw_page unlocks the page if read I/O is completed so we need to hold a PG_lock again to check PageSwapCache. Otherwise, the page can be removed from swapcache. Kernel BUG at c00f9040 [verbose debug info unavailable] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM Modules linked in: CPU: 4 PID: 13446 Comm: RenderThread Tainted: G W 3.10.84-g9f14aec-dirty #73 task: c3b73200 ti: dd192000 task.ti: dd192000 PC is at page_swap_info+0x10/0x2c LR is at swap_slot_free_notify+0x18/0x6c pc : [<c00f9040>] lr : [<c00f5560>] psr: 400f0113 sp : dd193d78 ip : c2deb1e4 fp : da015180 r10: 00000000 r9 : 000200da r8 : c120fe08 r7 : 00000000 r6 : 00000000 r5 : c249a6c0 r4 : = c249a6c0 r3 : 00000000 r2 : 40080009 r1 : 200f0113 r0 : = c249a6c0 ..<snip> .. Call Trace: page_swap_info+0x10/0x2c swap_slot_free_notify+0x18/0x6c swap_readpage+0x90/0x11c read_swap_cache_async+0x134/0x1ac swapin_readahead+0x70/0xb0 handle_pte_fault+0x320/0x6fc handle_mm_fault+0xc0/0xf0 do_page_fault+0x11c/0x36c do_DataAbort+0x34/0x118 Fixes: 3f2b1a04f44933f2 ("zram: revive swap_slot_free_notify") Signed-off-by: Minchan Kim <minchan@kernel.org> Tested-by: Kyeongdon Kim <kyeongdon.kim@lge.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	mm: vmscan: reclaim highmem zone if buffer_heads is over limit	Minchan Kim	1	-1/+1
	We have been reclaimed highmem zone if buffer_heads is over limit but commit 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()") changed the behavior so it doesn't reclaim highmem zone although buffer_heads is over the limit. This patch restores the logic. Fixes: 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from shrink_zone()") Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	numa: fix /proc/<pid>/numa_maps for THP	Gerald Schaefer	3	-3/+72
	In gather_pte_stats() a THP pmd is cast into a pte, which is wrong because the layouts may differ depending on the architecture. On s390 this will lead to inaccurate numa_maps accounting in /proc because of misguided pte_present() and pte_dirty() checks on the fake pte. On other architectures pte_present() and pte_dirty() may work by chance, but there may be an issue with direct-access (dax) mappings w/o underlying struct pages when HAVE_PTE_SPECIAL is set and THP is available. In vm_normal_page() the fake pte will be checked with pte_special() and because there is no "special" bit in a pmd, this will always return false and the VM_PFNMAP \| VM_MIXEDMAP checking will be skipped. On dax mappings w/o struct pages, an invalid struct page pointer would then be returned that can crash the kernel. This patch fixes the numa_maps THP handling by introducing new "_pmd" variants of the can_gather_numa_stats() and vm_normal_page() functions. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [4.3+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check	Konstantin Khlebnikov	1	-4/+2
	Khugepaged detects own VMAs by checking vm_file and vm_ops but this way it cannot distinguish private /dev/zero mappings from other special mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap. This fixes false-positive VM_BUG_ON and prevents installing THP where they are not expected. Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com Fixes: 78f11a255749 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	mailmap: fix Krzysztof Kozlowski's misspelled name	Krzysztof Kozlowski	1	-0/+1
	Patchwork introduced a garbled Polish character in commit 1e3012d0fdc5 ("crypto: s5p-sss - Use memcpy_toio for iomem annotated memory") so fix the mail mapping. Additionally prefer to use kernel.org account for personal work, instead of my gmail address. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	thp: keep huge zero page pinned until tlb flush	Kirill A. Shutemov	3	-3/+13
	Andrea has found[1] a race condition on MMU-gather based TLB flush vs split_huge_page() or shrinker which frees huge zero under us (patch 1/2 and 2/2 respectively). With new THP refcounting, we don't need patch 1/2: mmu_gather keeps the page pinned until flush is complete and the pin prevents the page from being split under us. We still need patch 2/2. This is simplified version of Andrea's patch. We don't need fancy encoding. [1] http://lkml.kernel.org/r/1447938052-22165-1-git-send-email-aarcange@redhat.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	mm: exclude HugeTLB pages from THP page_mapped() logic	Steve Capper	1	-0/+2
	HugeTLB pages cannot be split, so we use the compound_mapcount to track rmaps. Currently page_mapped() will check the compound_mapcount, but will also go through the constituent pages of a THP compound page and query the individual _mapcount's too. Unfortunately, page_mapped() does not distinguish between HugeTLB and THP compound pages and assumes that a compound page always needs to have HPAGE_PMD_NR pages querying. For most cases when dealing with HugeTLB this is just inefficient, but for scenarios where the HugeTLB page size is less than the pmd block size (e.g. when using contiguous bit on ARM) this can lead to crashes. This patch adjusts the page_mapped function such that we skip the unnecessary THP reference checks for HugeTLB pages. Fixes: e1534ae95004 ("mm: differentiate page_mapped() from page_mapcount() for compound pages") Signed-off-by: Steve Capper <steve.capper@arm.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	kexec: export OFFSET(page.compound_head) to find out compound tail page	Atsushi Kumagai	1	-0/+1
	PageAnon() always look at head page to check PAGE_MAPPING_ANON and tail page's page->mapping has just a poisoned data since commit 1c290f642101 ("mm: sanitize page->mapping for tail pages"). If makedumpfile checks page->mapping of a compound tail page to distinguish anonymous page as usual, it must fail in newer kernel. So it's necessary to export OFFSET(page.compound_head) to avoid checking compound tail pages. The problem is that unnecessary hugepages won't be removed from a dump file in kernels 4.5.x and later. This means that extra disk space would be consumed. It's a problem, but not critical. Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	kexec: update VMCOREINFO for compound_order/dtor	Atsushi Kumagai	1	-2/+4
	makedumpfile refers page.lru.next to get the order of compound pages for page filtering. However, now the order is stored in page.compound_order, hence VMCOREINFO should be updated to export the offset of page.compound_order. The fact is, page.compound_order was introduced already in kernel 4.0, but the offset of it was the same as page.lru.next until kernel 4.3, so this was not actual problem. The above can be said also for page.lru.prev and page.compound_dtor, it's necessary to detect hugetlbfs pages. Further, the content was changed from direct address to the ID which means dtor. The problem is that unnecessary hugepages won't be removed from a dump file in kernels 4.4.x and later. This means that extra disk space would be consumed. It's a problem, but not critical. Signed-off-by: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-28	RDMA/nes: don't leak skb if carrier down	Florian Westphal	1	-3/+0
	Alternatively one could free the skb, OTOH I don't think this test is useful so just remove it. Cc: <linux-rdma@vger.kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28	drm/vmwgfx: Fix order of operation	Sinclair Yeh	1	-3/+3
	mode->hdisplay * (var->bits_per_pixel + 7) gets evaluated before the division, potentially making the pitch larger than it should be. Since the original intention is to do a div-round-up, just use the macro instead. Signed-off-by: Sinclair Yeh <syeh@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
2016-04-28	drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.	Charmaine Lee	1	-4/+4
	Instead of calling vmw_cmd_ok, call vmw_cmd_dx_cid_check to validate the context id for query commands. Signed-off-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-04-28	drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION	Charmaine Lee	1	-1/+1
	Fixes piglit tests nv_conditional_render-* crashes. Signed-off-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2016-04-28	IB/security: Restrict use of the write() interface	Jason Gunthorpe	7	-1/+40
	The drivers/infiniband stack uses write() as a replacement for bi-directional ioctl(). This is not safe. There are ways to trigger write calls that result in the return structure that is normally written to user space being shunted off to user specified kernel memory instead. For the immediate repair, detect and deny suspicious accesses to the write API. For long term, update the user space libraries and the kernel API to something that doesn't present the same security vulnerabilities (likely a structured ioctl() interface). The impacted uAPI interfaces are generally only available if hardware from drivers/infiniband is installed in the system. Reported-by: Jann Horn <jann@thejh.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [ Expanded check to all known write() entry points ] Cc: stable@vger.kernel.org Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28	IB/hfi1: Use kernel default llseek for ui device	Dean Luick	1	-23/+2
	The ui device llseek had a mistake with SEEK_END and did not fully follow seek semantics. Correct all this by using a kernel supplied function for fixed size devices. Cc: Al Viro <viro@ZenIV.linux.org.uk> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>