linux-dev - Linux kernel development work

Age	Commit message (Collapse)	Author	Files	Lines
2014-02-13	lto: Make asmlinkage __visible	Andi Kleen	1	-2/+2
	Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391846481-31491-3-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	x86, lto: Disable LTO for the x86 VDSO	Andi Kleen	1	-3/+7
	The VDSO does not play well with LTO, so just disable LTO for it. Also pass a 32bit linker flag for the 32bit version. [ hpa: change braces to parens to match kernel Makefile style ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391846481-31491-1-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	initconst, x86: Fix initconst mistake in ts5500 code	Andi Kleen	1	-1/+1
	const data must be initconst. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-14-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	initconst: Fix initconst mistake in dcdbas	Andi Kleen	1	-1/+1
	const must be __initconst. Cc: Douglas_Warzecha@dell.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-13-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage: Make trace_hardirqs_on/off_caller visible	Andi Kleen	1	-2/+2
	These functions are called from assembler, and thus need to be __visible. Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-12-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage, x86: Fix 32bit memcpy for LTO	Andi Kleen	1	-3/+3
	These functions can be called implicitely from gcc, and thus need to be visible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-11-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage Make __stack_chk_failed and memcmp visible	Andi Kleen	2	-2/+2
	In LTO symbols implicitely referenced by the compiler need to be visible. Earlier these symbols were visible implicitely from being exported, but we disabled implicit visibility fo EXPORTs when modules are disabled to improve code size. So now these symbols have to be marked visible explicitely. Do this for __stack_chk_fail (with stack protector) and memcmp. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-10-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage	Andi Kleen	1	-0/+4
	Mark the rwsem functions that can be called from assembler asmlinkage. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-9-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage: Make main_extable_sort_needed visible	Andi Kleen	1	-1/+1
	main_extable_sort_needed is used by the build system and needs to be a normal ELF symbol. Make it visible so that LTO does not remove or mangle it. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-8-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage, mutex: Mark __visible	Andi Kleen	1	-5/+5
	Various kernel/mutex.c functions can be called from inline assembler, so they should be all global and __visible. Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-7-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage: Make trace_hardirq visible	Andi Kleen	1	-2/+2
	Can be called from assembler code. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-6-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage: Make lockdep_sys_exit asmlinkage	Andi Kleen	2	-2/+2
	lockdep_sys_exit can be called from assembler code, so make it asmlinkage. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-5-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage, pnp: Make variables used from assembler code visible	Andi Kleen	1	-4/+5
	Mark variables referenced from assembler files visible. This fixes compile problems with LTO. Cc: Jaroslav Kysela <perex@perex.cz> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-4-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage: Make jiffies visible	Andi Kleen	2	-2/+2
	Jiffies is referenced by the linker script, so it has to be visible. Handled both the generic and the x86 version. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-3-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage: Make __iowrite32_copy visible	Andi Kleen	1	-1/+1
	This is a assembler function on x86, so it should be visible. Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-2-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13	asmlinkage, kvm: Make kvm_rebooting visible	Andi Kleen	1	-1/+1
	kvm_rebooting is referenced from assembler code, thus needs to be visible. Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1391845930-28580-1-git-send-email-ak@linux.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-09	Linux 3.14-rc2	Linus Torvalds	1	-1/+1

2014-02-09	fix a kmap leak in virtio_console	Al Viro	1	-6/+3
	While we are at it, don't do kmap() under kmap_atomic(), especially for a page we'd allocated with GFP_KERNEL. It's spelled "page_address", and had that been more than that, we'd have a real trouble - kmap_high() can block, and doing that while holding kmap_atomic() is a Bad Idea(tm). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-09	fix O_SYNC\|O_APPEND syncing the wrong range on write()	Al Viro	7	-25/+14
	It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support) when sync_page_range() had been introduced; generic_file_write{,v}() correctly synced pos_after_write - written .. pos_after_write - 1 but generic_file_aio_write() synced pos_before_write .. pos_before_write + written - 1 instead. Which is not the same thing with O_APPEND, obviously. A couple of years later correct variant had been killed off when everything switched to use of generic_file_aio_write(). All users of generic_file_aio_write() are affected, and the same bug has been copied into other instances of ->aio_write(). The fix is trivial; the only subtle point is that generic_write_sync() ought to be inlined to avoid calculations useless for the majority of calls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-08	Btrfs: fix data corruption when reading/updating compressed extents	Filipe David Borba Manana	1	-0/+2
	When using a mix of compressed file extents and prealloc extents, it is possible to fill a page of a file with random, garbage data from some unrelated previous use of the page, instead of a sequence of zeroes. A simple sequence of steps to get into such case, taken from the test case I made for xfstests, is: _scratch_mkfs _scratch_mount "-o compress-force=lzo" $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar This results in the following file items in the fs tree: item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160 inode generation 6 transid 6 size 542872 block group 0 mode 100600 item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16 inode ref index 2 namelen 6 name: foobar item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 extent data disk byte 0 nr 0 gen 6 extent data offset 0 nr 24576 ram 266240 extent compression 0 item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53 prealloc data disk byte 12849152 nr 241664 gen 6 prealloc data offset 0 nr 241664 item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53 extent data disk byte 12845056 nr 4096 gen 6 extent data offset 0 nr 20480 ram 20480 extent compression 2 item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53 prealloc data disk byte 13090816 nr 405504 gen 6 prealloc data offset 0 nr 258048 The on disk extent at offset 266240 (which corresponds to 1 single disk block), contains 5 compressed chunks of file data. Each of the first 4 compress 4096 bytes of file data, while the last one only compresses 3024 bytes of file data. Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 = 1072 bytes) should always return zeroes (our next extent is a prealloc one). The solution here is the compression code path to zero the remaining (untouched) bytes of the last page it uncompressed data into, as the information about how much space the file data consumes in the last page is not known in the upper layer fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing the remainder of the page but only if it corresponds to the last page of the inode and if the inode's size is not a multiple of the page size. This would cause not only returning random data on reads, but also permanently storing random data when updating parts of the region that should be zeroed. For the example above, it means updating a single byte in the region [285648 ; 286720[ would store that byte correctly but also store random data on disk. A test case for xfstests follows soon. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08	Btrfs: don't loop forever if we can't run because of the tree mod log	Josef Bacik	1	-0/+1
	A user reported a 100% cpu hang with my new delayed ref code. Turns out I forgot to increase the count check when we can't run a delayed ref because of the tree mod log. If we can't run any delayed refs during this there is no point in continuing to look, and we need to break out. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08	btrfs: reserve no transaction units in btrfs_ioctl_set_features	David Sterba	1	-1/+1
	Added in patch "btrfs: add ioctls to query/change feature bits online" modifications to superblock don't need to reserve metadata blocks when starting a transaction. Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08	btrfs: commit transaction after setting label and features	Jeff Mahoney	1	-2/+2
	The set_fslabel ioctl uses btrfs_end_transaction, which means it's possible that the change will be lost if the system crashes, same for the newly set features. Let's use btrfs_commit_transaction instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08	Btrfs: fix assert screwup for the pending move stuff	Josef Bacik	1	-5/+3
	Wang noticed that he was failing btrfs/030 even though me and Filipe couldn't reproduce. Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT set, which meant that a key part of Filipe's original patch was not being built in. This appears to be a mess up with merging Filipe's patch as it does not exist in his original patch. Fix this by changing how we make sure del_waiting_dir_move asserts that it did not error and take the function out of the ifdef check. This makes btrfs/030 pass with the assert on or off. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Filipe Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-02-08	jfs: fix generic posix ACL regression	Dave Kleikamp	1	-7/+7
	I missed a couple errors in reviewing the patches converting jfs to use the generic posix ACL function. Setting ACL's currently fails with -EOPNOTSUPP. Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reported-by: Michael L. Semon <mlsemon35@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-02-08	watchdog: dw_wdt: Add dependency on HAS_IOMEM	Richard Weinberger	1	-0/+1
	On archs like S390 or um this driver cannot build nor work. Make it depend on HAS_IOMEM to bypass build failures. drivers/built-in.o: In function `dw_wdt_drv_probe': drivers/watchdog/dw_wdt.c:302: undefined reference to `devm_ioremap_resource' Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2014-02-07	libceph: do not dereference a NULL bio pointer	Ilya Dryomov	1	-2/+6
	Commit f38a5181d9f3 ("ceph: Convert to immutable biovecs") introduced a NULL pointer dereference, which broke rbd in -rc1. Fix it. Cc: Kent Overstreet <kmo@daterainc.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-07	libceph: take map_sem for read in handle_reply()	Ilya Dryomov	1	-6/+11
	Handling redirect replies requires both map_sem and request_mutex. Taking map_sem unconditionally near the top of handle_reply() avoids possible race conditions that arise from releasing request_mutex to be able to acquire map_sem in redirect reply case. (Lock ordering is: map_sem, request_mutex, crush_mutex.) Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-07	libceph: factor out logic from ceph_osdc_start_request()	Ilya Dryomov	1	-23/+39
	Factor out logic from ceph_osdc_start_request() into a new helper, __ceph_osdc_start_request(). ceph_osdc_start_request() now amounts to taking locks and calling __ceph_osdc_start_request(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2014-02-07	arm64: defconfig: Expand default enabled features	Mark Rutland	2	-4/+15
	FPGA implementations of the Cortex-A57 and Cortex-A53 are now available in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for Versatile Express. As these attach to a Motherboard Express V2M-P1 it would be useful to have support for some V2M-P1 peripherals enabled by default. Additionally a couple of of features have been introduced since the last defconfig update (CMA, jump labels) that would be good to have enabled by default to ensure they are build and boot tested. This patch updates the arm64 defconfig to enable support for these devices and features. The arm64 Kconfig is modified to select HAVE_PATA_PLATFORM, which is required to enable support for the CompactFlash controller on the V2M-P1. A few options which don't need to appear in defconfig are trimmed: * BLK_DEV - selected by default * EXPERIMENTAL - otherwise gone from the kernel * MII - selected by drivers which require it * USB_SUPPORT - selected by default Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07	arm64: asm: remove redundant "cc" clobbers	Will Deacon	4	-25/+21
	cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers from inline asm blocks that only use these instructions to implement conditional branches. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07	arm64: atomics: fix use of acquire + release for full barrier semantics	Will Deacon	5	-18/+35
	Linux requires a number of atomic operations to provide full barrier semantics, that is no memory accesses after the operation can be observed before any accesses up to and including the operation in program order. On arm64, these operations have been incorrectly implemented as follows: // A, B, C are independent memory locations <Access [A]> // atomic_op (B) 1: ldaxr x0, [B] // Exclusive load with acquire <op(B)> stlxr w1, x0, [B] // Exclusive store with release cbnz w1, 1b <Access [C]> The assumption here being that two half barriers are equivalent to a full barrier, so the only permitted ordering would be A -> B -> C (where B is the atomic operation involving both a load and a store). Unfortunately, this is not the case by the letter of the architecture and, in fact, the accesses to A and C are permitted to pass their nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the store-release on B). This is a clear violation of the full barrier requirement. The simple way to fix this is to implement the same algorithm as ARMv7 using explicit barriers: <Access [A]> // atomic_op (B) dmb ish // Full barrier 1: ldxr x0, [B] // Exclusive load <op(B)> stxr w1, x0, [B] // Exclusive store cbnz w1, 1b dmb ish // Full barrier <Access [C]> but this has the undesirable effect of introducing two full barrier instructions. A better approach is actually the following, non-intuitive sequence: <Access [A]> // atomic_op (B) 1: ldxr x0, [B] // Exclusive load <op(B)> stlxr w1, x0, [B] // Exclusive store with release cbnz w1, 1b dmb ish // Full barrier <Access [C]> The simple observations here are: - The dmb ensures that no subsequent accesses (e.g. the access to C) can enter or pass the atomic sequence. - The dmb also ensures that no prior accesses (e.g. the access to A) can pass the atomic sequence. - Therefore, no prior access can pass a subsequent access, or vice-versa (i.e. A is strictly ordered before C). - The stlxr ensures that no prior access can pass the store component of the atomic operation. The only tricky part remaining is the ordering between the ldxr and the access to A, since the absence of the first dmb means that we're now permitting re-ordering between the ldxr and any prior accesses. From an (arbitrary) observer's point of view, there are two scenarios: 1. We have observed the ldxr. This means that if we perform a store to [B], the ldxr will still return older data. If we can observe the ldxr, then we can potentially observe the permitted re-ordering with the access to A, which is clearly an issue when compared to the dmb variant of the code. Thankfully, the exclusive monitor will save us here since it will be cleared as a result of the store and the ldxr will retry. Notice that any use of a later memory observation to imply observation of the ldxr will also imply observation of the access to A, since the stlxr/dmb ensure strict ordering. 2. We have not observed the ldxr. This means we can perform a store and influence the later ldxr. However, that doesn't actually tell us anything about the access to [A], so we've not lost anything here either when compared to the dmb variant. This patch implements this solution for our barriered atomic operations, ensuring that we satisfy the full barrier requirements where they are needed. Cc: <stable@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-06	hwmon: (da9055) Remove use of regmap_irq_get_virq()	Adam Thomson	1	-4/+0
	Remove use of regmap_irq_get_virq() in driver probe which was conflicting with use of platform_get_irq_byname(). platform_get_irq_byname() already returns the VIRQ number due to MFD core translation so using regmap_irq_get_virq() on that returned value results in an incorrect IRQ being requested. The driver probes then fail because of this. Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2014-02-06	mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq	KOSAKI Motohiro	1	-2/+4
	To use spin_{un}lock_irq is dangerous if caller disabled interrupt. During aio buffer migration, we have a possibility to see the following call stack. aio_migratepage [disable interrupt] migrate_page_copy clear_page_dirty_for_io set_page_dirty __set_page_dirty_buffers __set_page_dirty spin_lock_irq This mean, current aio migration is a deadlockable. spin_lock_irqsave is a safer alternative and we should use it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: David Rientjes rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06	arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.	Tang Chen	1	-8/+11
	The following path will cause array out of bound. memblock_add_region() will always set nid in memblock.reserved to MAX_NUMNODES. In numa_register_memblks(), after we set all nid to correct valus in memblock.reserved, we called setup_node_data(), and used memblock_alloc_nid() to allocate memory, with nid set to MAX_NUMNODES. The nodemask_t type can be seen as a bit array. And the index is 0 ~ MAX_NUMNODES-1. After that, when we call node_set() in numa_clear_kernel_node_hotplug(), the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~ MAX_NUMNODES-1]. See below: numa_init() \|---> numa_register_memblks() \| \|---> memblock_set_node(memory) set correct nid in memblock.memory \| \|---> memblock_set_node(reserved) set correct nid in memblock.reserved \| \|...... \| \|---> setup_node_data() \| \|---> memblock_alloc_nid() here, nid is set to MAX_NUMNODES (1024) \|...... \|---> numa_clear_kernel_node_hotplug() \|---> node_set() here, we have an index 1024, and overflowed This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix this problem. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Dave Jones <davej@redhat.com> Cc: David Rientjes <rientjes@google.com> Tested-by: Dave Jones <davej@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06	arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()	Tang Chen	1	-1/+1
	On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug() was not initialized. So we need to initialize it. [akpm@linux-foundation.org: use NODE_MASK_NONE, per David] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Dave Jones <davej@redhat.com> Reported-by: David Rientjes <rientjes@google.com> Tested-by: Dave Jones <davej@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06	mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()	KOSAKI Motohiro	1	-2/+3
	During aio stress test, we observed the following lockdep warning. This mean AIO+numa_balancing is currently deadlockable. The problem is, aio_migratepage disable interrupt, but __set_page_dirty_nobuffers unintentionally enable it again. Generally, all helper function should use spin_lock_irqsave() instead of spin_lock_irq() because they don't know caller at all. other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&ctx->completion_lock)->rlock); <Interrupt> lock(&(&ctx->completion_lock)->rlock); * DEADLOCK * dump_stack+0x19/0x1b print_usage_bug+0x1f7/0x208 mark_lock+0x21d/0x2a0 mark_held_locks+0xb9/0x140 trace_hardirqs_on_caller+0x105/0x1d0 trace_hardirqs_on+0xd/0x10 _raw_spin_unlock_irq+0x2c/0x50 __set_page_dirty_nobuffers+0x8c/0xf0 migrate_page_copy+0x434/0x540 aio_migratepage+0xb1/0x140 move_to_new_page+0x7d/0x230 migrate_pages+0x5e5/0x700 migrate_misplaced_page+0xbc/0xf0 do_numa_page+0x102/0x190 handle_pte_fault+0x241/0x970 handle_mm_fault+0x265/0x370 __do_page_fault+0x172/0x5a0 do_page_fault+0x1a/0x70 page_fault+0x28/0x30 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06	mm/swap: fix race on swap_info reuse between swapoff and swapon	Weijie Yang	1	-1/+10
	swapoff clear swap_info's SWP_USED flag prematurely and free its resources after that. A concurrent swapon will reuse this swap_info while its previous resources are not cleared completely. These late freed resources are: - p->percpu_cluster - swap_cgroup_ctrl[type] - block_device setting - inode->i_flags &= ~S_SWAPFILE This patch clears the SWP_USED flag after all its resources are freed, so that swapon can reuse this swap_info by alloc_swap_info() safely. [akpm@linux-foundation.org: tidy up code comment] Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06	swap: add a simple detector for inappropriate swapin readahead	Shaohua Li	2	-5/+62
	This is a patch to improve swap readahead algorithm. It's from Hugh and I slightly changed it. Hugh's original changelog: swapin readahead does a blind readahead, whether or not the swapin is sequential. This may be ok on harddisk, because large reads have relatively small costs, and if the readahead pages are unneeded they can be reclaimed easily - though, what if their allocation forced reclaim of useful pages? But on SSD devices large reads are more expensive than small ones: if the readahead pages are unneeded, reading them in caused significant overhead. This patch adds very simplistic random read detection. Stealing the PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages() simply looks at readahead's current success rate, and narrows or widens its readahead window accordingly. There is little science to its heuristic: it's about as stupid as can be whilst remaining effective. The table below shows elapsed times (in centiseconds) when running a single repetitive swapping load across a 1000MB mapping in 900MB ram with 1GB swap (the harddisk tests had taken painfully too long when I used mem=500M, but SSD shows similar results for that). Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch which Shaohua showed to be defective; HughNew this Nov 14 patch, with page_cluster as usual at default of 3 (8-page reads); HughPC4 this same patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0 (1-page reads: no readahead). HDD for swapping to harddisk, SSD for swapping to VertexII SSD. Seq for sequential access to the mapping, cycling five times around; Rand for the same number of random touches. Anon for a MAP_PRIVATE anon mapping; Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs. One weakness of Shaohua's vma/anon_vma approach was that it did not optimize Shmem: seen below. Konstantin's approach was perhaps mistuned, 50% slower on Seq: did not compete and is not shown below. HDD Vanilla Shaohua HughOld HughNew HughPC4 HughPC0 Seq Anon 73921 76210 75611 76904 78191 121542 Seq Shmem 73601 73176 73855 72947 74543 118322 Rand Anon 895392 831243 871569 845197 846496 841680 Rand Shmem 1058375 1053486 827935 764955 764376 756489 SSD Vanilla Shaohua HughOld HughNew HughPC4 HughPC0 Seq Anon 24634 24198 24673 25107 21614 70018 Seq Shmem 24959 24932 25052 25703 22030 69678 Rand Anon 43014 26146 28075 25989 26935 25901 Rand Shmem 45349 45215 28249 24268 24138 24332 These tests are, of course, two extremes of a very simple case: under heavier mixed loads I've not yet observed any consistent improvement or degradation, and wider testing would be welcome. Shaohua Li: Test shows Vanilla is slightly better in sequential workload than Hugh's patch. I observed with Hugh's patch sometimes the readahead size is shrinked too fast (from 8 to 1 immediately) in sequential workload if there is no hit. And in such case, continuing doing readahead is good actually. I don't prepare a sophisticated algorithm for the sequential workload because so far we can't guarantee sequential accessed pages are swap out sequentially. So I slightly change Hugh's heuristic - don't shrink readahead size too fast. Here is my test result (unit second, 3 runs average): Vanilla Hugh New Seq 356 370 360 Random 4525 2447 2444 Attached graph is the swapin/swapout throughput I collected with 'vmstat 2'. The first part is running a random workload (till around 1200 of the x-axis) and the second part is running a sequential workload. swapin and swapout throughput are almost identical in steady state in both workloads. These are expected behavior. while in Vanilla, swapin is much bigger than swapout especially in random workload (because wrong readahead). Original patches by: Shaohua Li and Konstantin Khlebnikov. [fengguang.wu@intel.com: swapin_nr_pages() can be static] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Shaohua Li <shli@fusionio.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06	ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters	Zongxun Wang	3	-3/+83
	Even if using the same jbd2 handle, we cannot rollback a transaction. So once some error occurs after successfully allocating clusters, the allocated clusters will never be used and it means they are lost. For example, call ocfs2_claim_clusters successfully when expanding a file, but failed in ocfs2_insert_extent. So we need free the allocated clusters if they are not used indeed. Signed-off-by: Zongxun Wang <wangzongxun@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06	Documentation/kernel-parameters.txt: fix memmap= language	Randy Dunlap	1	-4/+4
	Clean up descriptions of memmap= boot options. Add periods (full stops), drop commas, change "used" to "reserved" or "marked". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Andiry Xu <andiry.xu@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06	x86, microcode, AMD: Unify valid container checks	Borislav Petkov	1	-14/+29
	For additional coverage, BorisO and friends unknowlingly did swap AMD microcode with Intel microcode blobs in order to see what happens. What did happen on 32-bit was [ 5.722656] BUG: unable to handle kernel paging request at be3a6008 [ 5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0 [ 5.722716] pdpt = 0000000000000000 pde = 0000000000000000 because there was a valid initrd there but without valid microcode in it and the container check happened after the relocated ramdisk handling on 32-bit, which was clearly wrong. While at it, take care of the ramdisk relocation on both 32- and 64-bit as it is done on both. Also, comment what we're doing because this code is a bit tricky. Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.de Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-02-06	x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y	Peter Oberparleiter	1	-0/+1
	Commit d61931d89b, "x86: Add optimized popcnt variants" introduced compile flag -fcall-saved-rdi for lib/hweight.c. When combined with options -fprofile-arcs and -O2, this flag causes gcc to generate broken constructor code. As a result, a 64 bit x86 kernel compiled with CONFIG_GCOV_PROFILE_ALL=y prints message "gcov: could not create file" and runs into sproadic BUGs during boot. The gcc people indicate that these kinds of problems are endemic when using ad hoc calling conventions. It is therefore best to treat any file compiled with ad hoc calling conventions as an isolated environment and avoid things like profiling or coverage analysis, since those subsystems assume a "normal" calling conventions. This patch avoids the bug by excluding lib/hweight.o from coverage profiling. Reported-by: Meelis Roos <mroos@linux.ee> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
2014-02-06	pinctrl: tegra: return correct error type	Laxman Dewangan	1	-1/+1
	When memory allocation failed, drive should return error as ENOMEM. Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2014-02-06	pinctrl: do not init debugfs entries for unimplemented functionalities	Florian Vaussard	1	-2/+4
	Commit c420619 "pinctrl: pinconf: remove checks on ops->pin_config_get" removed the check on (ops != NULL) when performing pinconf_pins_show() or pinconf_groups_show(). As these entries are always enabled, even if pinconf is not supported, reading will result in an oops due to NULL ops. Instead of checking for ops, remove the corresponding debugfs entries if pinconf and/or pinmux are not implemented. Tested on OMAP3 (pinctrl-single). Cc: stable@vger.kernel.org Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2014-02-06	MIPS: fpu.h: Fix build when CONFIG_BUG is not set	Aaro Koskinen	1	-0/+2
	__enable_fpu produces a build failure when CONFIG_BUG is not set: In file included from arch/mips/kernel/cpu-probe.c:24:0: arch/mips/include/asm/fpu.h: In function '__enable_fpu': arch/mips/include/asm/fpu.h:77:1: error: control reaches end of non-void function [-Werror=return-type] This is regression introduced in 3.14-rc1. Fix that. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Paul Burton <paul.burton@imgtec.com> Cc: John Crispin <blogic@openwrt.org> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/6504/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-02-06	arm64: barriers: allow dsb macro to take option parameter	Will Deacon	1	-1/+1
	The dsb instruction takes an option specifying both the target access types and shareability domain. This patch allows such an option to be passed to the dsb macro, resulting in potentially more efficient code. Currently the option is ignored until all callers are updated (unlike ARM, the option is mandated by the assembler). Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-06	drm/radeon: allow geom rings to be setup on r600/r700 (v2)	Dave Airlie	3	-3/+19
	the evergreen CS parser has allowed this for a while, just port the code to the r600 one. This is required before geom shaders can be made work. v2: agd5f: minor cleanup and add additional 7xx reg. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-02-06	drm/mgag200,ast,cirrus: fix regression with drm_can_sleep conversion	Dave Airlie	3	-3/+3
	I totally sign inverted my way out of this one. Cc: stable@vger.kernel.org Reported-by: "Sabrina Dubroca" <sd@queasysnail.net> Signed-off-by: Dave Airlie <airlied@redhat.com>
2014-02-05	x86/efi: Allow mapping BGRT on x86-32	Matt Fleming	1	-4/+6
	CONFIG_X86_32 doesn't map the boot services regions into the EFI memory map (see commit 700870119f49 ("x86, efi: Don't map Boot Services on i386")), and so efi_lookup_mapped_addr() will fail to return a valid address. Executing the ioremap() path in efi_bgrt_init() causes the following warning on x86-32 because we're trying to ioremap() RAM, WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0() Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1 Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013 00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310 00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff 00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88 Call Trace: [<c09a5196>] dump_stack+0x41/0x52 [<c0448c1e>] warn_slowpath_common+0x7e/0xa0 [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0 [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0 [<c0448ce2>] warn_slowpath_null+0x22/0x30 [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0 [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43 [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5 [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0 [<c043bc2b>] ioremap_nocache+0x1b/0x20 [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c [<c0cb01c8>] efi_bgrt_init+0x83/0x10c [<c0cafd82>] efi_late_init+0x8/0xa [<c0c9bab2>] start_kernel+0x3ae/0x3c3 [<c0c9b53b>] ? repair_env_string+0x51/0x51 [<c0c9b378>] i386_start_kernel+0x12e/0x131 Switch to using early_memremap(), which won't trigger this warning, and has the added benefit of more accurately conveying what we're trying to do - map a chunk of memory. This patch addresses the following bug report, https://bugzilla.kernel.org/show_bug.cgi?id=67911 Reported-by: Adam Williamson <awilliam@redhat.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Matt Fleming <matt.fleming@intel.com>