aboutsummaryrefslogtreecommitdiffstats
path: root/virt (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2015-06-24mm: fix mprotect() behaviour on VM_LOCKED VMAsKirill A. Shutemov1-0/+11
On mlock(2) we trigger COW on private writable VMA to avoid faults in future. mm/gup.c: 840 long populate_vma_page_range(struct vm_area_struct *vma, 841 unsigned long start, unsigned long end, int *nonblocking) 842 { ... 855 * We want to touch writable mappings with a write fault in order 856 * to break COW, except for shared mappings because these don't COW 857 * and we would not want to dirty them for nothing. 858 */ 859 if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE) 860 gup_flags |= FOLL_WRITE; But we miss this case when we make VM_LOCKED VMA writeable via mprotect(2). The test case: #define _GNU_SOURCE #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> #include <sys/resource.h> #include <sys/stat.h> #include <sys/time.h> #include <sys/types.h> #define PAGE_SIZE 4096 int main(int argc, char **argv) { struct rusage usage; long before; char *p; int fd; /* Create a file and populate first page of page cache */ fd = open("/tmp", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR); write(fd, "1", 1); /* Create a *read-only* *private* mapping of the file */ p = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0); /* * Since the mapping is read-only, mlock() will populate the mapping * with PTEs pointing to page cache without triggering COW. */ mlock(p, PAGE_SIZE); /* * Mapping became read-write, but it's still populated with PTEs * pointing to page cache. */ mprotect(p, PAGE_SIZE, PROT_READ | PROT_WRITE); getrusage(RUSAGE_SELF, &usage); before = usage.ru_minflt; /* Trigger COW: fault in mlock()ed VMA. */ *p = 1; getrusage(RUSAGE_SELF, &usage); printf("faults: %ld\n", usage.ru_minflt - before); return 0; } $ ./test faults: 1 Let's fix it by triggering populating of VMA in mprotect_fixup() on this condition. We don't care about population error as we don't in other similar cases i.e. mremap. [akpm@linux-foundation.org: tweak comment text] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24thp: cleanup how khugepaged enters freezerJiri Kosina1-3/+1
khugepaged_do_scan() checks in every iteration whether freezing(current) is true, and in such case breaks out of the loop, which causes try_to_freeze() to be called immediately afterwards in khugepaged_wait_work(). If nothing else, this causes unnecessary freezing(current) test, and also makes the way khugepaged enters freezer a bit less obvious than necessary. Let's just try to freeze directly, instead of splitting it into two (directly adjacent) phases. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24mm, hwpoison: remove obsolete "Notebook" todo listAndi Kleen1-7/+0
All the items mentioned here have been either addressed, or were not really needed. So just remove the comment. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24mm, hwpoison: add comment describing when to add new casesAndi Kleen1-0/+8
Here's another comment fix for hwpoison. It describes the "guiding principle" on when to add new memory error recovery code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24linux/slab.h: fix three off-by-one typos in commentRasmus Villemoes1-2/+2
The first is a keyboard-off-by-one, the other two the ordinary mathy kind. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24slab: correct size_index table before replacing the bootstrap kmem_cache_nodeDaniel Sanders4-15/+24
This patch moves the initialization of the size_index table slightly earlier so that the first few kmem_cache_node's can be safely allocated when KMALLOC_MIN_SIZE is large. There are currently two ways to generate indices into kmalloc_caches (via kmalloc_index() and via the size_index table in slab_common.c) and on some arches (possibly only MIPS) they potentially disagree with each other until create_kmalloc_caches() has been called. It seems that the intention is that the size_index table is a fast equivalent to kmalloc_index() and that create_kmalloc_caches() patches the table to return the correct value for the cases where kmalloc_index()'s if-statements apply. The failing sequence was: * kmalloc_caches contains NULL elements * kmem_cache_init initialises the element that 'struct kmem_cache_node' will be allocated to. For 32-bit Mips, this is a 56-byte struct and kmalloc_index returns KMALLOC_SHIFT_LOW (7). * init_list is called which calls kmalloc_node to allocate a 'struct kmem_cache_node'. * kmalloc_slab selects the kmem_caches element using size_index[size_index_elem(size)]. For MIPS, size is 56, and the expression returns 6. * This element of kmalloc_caches is NULL and allocation fails. * If it had not already failed, it would have called create_kmalloc_caches() at this point which would have changed size_index[size_index_elem(size)] to 7. I don't believe the bug to be LLVM specific but GCC doesn't normally encounter the problem. I haven't been able to identify exactly what GCC is doing better (probably inlining) but it seems that GCC is managing to optimize to the point that it eliminates the problematic allocations. This theory is supported by the fact that GCC can be made to fail in the same way by changing inline, __inline, __inline__, and __always_inline in include/linux/compiler-gcc.h such that they don't actually inline things. Signed-off-by: Daniel Sanders <daniel.sanders@imgtec.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24mm/slab_common: support the slub_debug boot option on specific object sizeGavin Guo2-23/+61
The slub_debug=PU,kmalloc-xx cannot work because in the create_kmalloc_caches() the s->name is created after the create_kmalloc_cache() is called. The name is NULL in the create_kmalloc_cache() so the kmem_cache_flags() would not set the slub_debug flags to the s->flags. The fix here set up a kmalloc_names string array for the initialization purpose and delete the dynamic name creation of kmalloc_caches. [akpm@linux-foundation.org: s/kmalloc_names/kmalloc_info/, tweak comment text] Signed-off-by: Gavin Guo <gavin.guo@canonical.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24xtensa: use for_each_sg()Akinobu Mita1-7/+12
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since xtensa doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24procfs: treat parked tasks as sleeping for task stateChris Metcalf1-0/+8
Allowing watchdog threads to be parked means that we now have the opportunity of actually seeing persistent parked threads in the output of /proc/<pid>/stat and /proc/<pid>/status. The existing code reported such threads as "Running", which is kind-of true if you think of the case where we park them as part of taking cpus offline. But if we allow parking them indefinitely, "Running" is pretty misleading, so we report them as "Sleeping" instead. We could simply report them with a new string, "Parked", but it feels like it's a bit risky for userspace to see unexpected new values; the output is already documented in Documentation/filesystems/proc.txt, and it seems like a mistake to change that lightly. The scheduler does report parked tasks with a "P" in debugging output from sched_show_task() or dump_cpu_task(), but that's a different API. Similarly, the trace_ctxwake_* routines report a "P" for parked tasks, but again, different API. This change seemed slightly cleaner than updating the task_state_array to have additional rows. TASK_DEAD should be subsumed by the exit_state bits; TASK_WAKEKILL is just a modifier; and TASK_WAKING can very reasonably be reported as "Running" (as it is now). Only TASK_PARKED shows up with unreasonable output here. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24watchdog: add watchdog_cpumask sysctl to assist nohzChris Metcalf6-5/+112
Change the default behavior of watchdog so it only runs on the housekeeping cores when nohz_full is enabled at build and boot time. Allow modifying the set of cores the watchdog is currently running on with a new kernel.watchdog_cpumask sysctl. In the current system, the watchdog subsystem runs a periodic timer that schedules the watchdog kthread to run. However, nohz_full cores are designed to allow userspace application code running on those cores to have 100% access to the CPU. So the watchdog system prevents the nohz_full application code from being able to run the way it wants to, thus the motivation to suppress the watchdog on nohz_full cores, which this patchset provides by default. However, if we disable the watchdog globally, then the housekeeping cores can't benefit from the watchdog functionality. So we allow disabling it only on some cores. See Documentation/lockup-watchdogs.txt for more information. [jhubbard@nvidia.com: fix a watchdog crash in some configurations] Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24smpboot: allow excluding cpus from the smpboot threadsChris Metcalf2-1/+63
This patch series allows the watchdog to run by default only on the housekeeping cores when nohz_full is in effect; this seems to be a good compromise short of turning it off completely (since the nohz_full cores can't tolerate a watchdog). To provide customizability, we add /proc/sys/kernel/watchdog_cpumask so that the set of cores running the watchdog can be tuned to different values after bootup. To implement this customizability, we add a new smpboot_update_cpumask_percpu_thread() API to the smpboot_thread subsystem that lets us park or unpark "unwanted" threads. And now that threads can be parked for long periods of time, we tweak the /proc/<pid>/stat and /proc/<pid>/status code so parked threads aren't reported as running, which is otherwise confusing. This patch (of 3): This change allows some cores to be excluded from running the smp_hotplug_thread tasks. The following commit to update kernel/watchdog.c to use this functionality is the motivating example, and more information on the motivation is provided there. A new smp_hotplug_thread field is introduced, "cpumask", which is cpumask field managed by the smpboot subsystem that indicates whether or not the given smp_hotplug_thread should run on that core; the cpumask is checked when deciding whether to unpark the thread. To limit the cpumask to less than cpu_possible, you must call smpboot_update_cpumask_percpu_thread() after registering. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ulrich Obergfell <uobergfe@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24sparc: use for_each_sg()Akinobu Mita1-3/+5
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since sparc does select ARCH_HAS_SG_CHAIN, it is necessary to use for_each_sg() in order to loop over each sg element. This also help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24parisc: use for_each_sg()Akinobu Mita1-11/+16
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since parisc doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: mark local functions as staticJoseph Qi2-6/+6
Some functions are only used locally, so mark them as static. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: use swap() in ocfs2_double_lock()Fabian Frederick1-9/+2
Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: use swap() in swap_refcount_rec()Fabian Frederick1-4/+2
Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: use swap() in dx_leaf_sort_swap()Fabian Frederick1-4/+1
Use kernel.h macro definition. Thanks to Julia Lawall for Coccinelle scripting support. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: fix wrong check in ocfs2_direct_IO_get_blocksJoseph Qi2-3/+15
contig_blocks gotten from ocfs2_extent_map_get_blocks cannot be compared with clusters_to_alloc. So convert it to clusters first. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Weiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger()Xue jiufei1-3/+1
ocfs2_abort_trigger() use bh->b_assoc_map to get sb. But there's no function to set bh->b_assoc_map in ocfs2, it will trigger NULL pointer dereference while calling this function. We can get sb from bh->b_bdev->bd_super instead of b_assoc_map. [akpm@linux-foundation.org: update comment, per Joseph] Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: o2net: should remove debugfs in o2net_init() out branchalex chen1-1/+1
Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: remove OCFS2_IOCB_SEM lock type in direct ioWeiWei Wang3-37/+4
In ocfs2 direct read/write, OCFS2_IOCB_SEM lock type is used to protect inode->i_alloc_sem rw semaphore lock in the earlier kernel version. However, in the latest kernel, inode->i_alloc_sem rw semaphore lock is not used at all, so OCFS2_IOCB_SEM lock type needs to be removed. Signed-off-by: Weiwei Wang <wangww631@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: do not BUG if jbd2_journal_dirty_metadata failsJoseph Qi1-1/+14
jbd2_journal_dirty_metadata may fail. Currently it cannot take care of non zero return value and just BUG in ocfs2_journal_dirty. This patch is aborting the handle and journal instead of BUG. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: remove BUG_ON(!empty_extent) in __ocfs2_rotate_tree_left()Xue jiufei1-1/+2
ocfs2_rotate_tree_left() calls __ocfs2_rotate_tree_left() for left rotation while non-rightmost path containing an empty extent in the leaf block. __ocfs2_rotate_tree_left() returns -EAGAIN if right subtree having an empty extent and pass the empty_extent_path to caller. The caller ocfs2_rotate_tree_left() will restart rotation from the returned path. It will trigger the BUG_ON(!ocfs2_is_empty_extent) when the et on disk is as follows: eb0 is the leaf block of path(say path_a) passed to ocfs2_rotate_tree_left, which has an empty rec[0]. eb1 is the leaf block of path(say path_b) that just right to path_a, which has no empty record. eb2 is the leaf block of path(say path_c) that just right to path_b, which has an empty rec[0]. And path_c is also the rightmost path. Now we want to remove the empty rec[0] in eb0: ocfs2_rotate_tree_left: -> call __ocfs2_rotate_tree_left with path_a as its input *path* -> call ocfs2_rotate_subtree_left with path_a as its input *left_path* and path_b as its input *right_path*. it will move rec[0] in eb1 to eb0, and rec[0] in eb0 is not empty now. -> continue to call ocfs2_rotate_subtree_left with path_b as its input *left_path* and path_c as its input *right_path*, and return -EAGAIN because eb2 has an empty rec[0] -> call __ocfs2_rotate_tree_left with path_c as it input, rotate all records in eb2 to left and return 0. -> call __ocfs2_rotate_tree_left with path_a as its input, and triggers the BUG_ON(!ocfs2_is_empty_extent) as the rec[0] in eb0 is not empty. So the BUG_ON() should be removed and return 0 if rec[0] is no longer an empty extent. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: return error when ocfs2_figure_merge_contig_type() failsXue jiufei1-10/+24
ocfs2_figure_merge_contig_type() still returns CONTIG_NONE when some error occurs which will cause an unpredictable error. So return a proper errno when ocfs2_figure_merge_contig_type() fails. Signed-off-by: joyce.xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2/dlm: cleanup unused function __dlm_wait_on_lockres_flags_setJoseph Qi1-1/+0
__dlm_wait_on_lockres_flags_set() is declared but not implemented and used. So remove it. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: use retval instead of status for checking errorDaeseok Youn1-10/+10
The use of 'status' in __ocfs2_add_entry() can return wrong value. Some functions' return value in __ocfs2_add_entry(), i.e ocfs2_journal_access_di() is saved to 'status'. But 'status' is not used in 'bail' label for returning result of __ocfs2_add_entry(). So use retval instead of status. Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: fix a tiny race when truncate dio orohaned entryJoseph Qi4-46/+39
Once dio crashed it will leave an entry in orphan dir. And orphan scan will take care of the clean up. There is a tiny race case that the same entry will be truncated twice and then trigger the BUG in ocfs2_del_inode_from_orphan. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: remove __mlog_cpu_guessAndrew Morton1-16/+3
raw_smp_processor_id() is the means of avoiding the runtime preemptibility check. [akpm@linux-foundation.org: fix printk warning] Cc: Joe Perches <joe@perches.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24ocfs2: reduce object size of mlog usesJoe Perches2-30/+59
Using a function for __mlog_printk instead of a macro reduces the object size of built-in.o by about 190KB, or ~18% overall (x86-64 defconfig with all ocfs2 options) $ size fs/ocfs2/built-in.o* text data bss dec hex filename 870954 118471 134408 1123833 1125f9 fs/ocfs2/built-in.o,new 1064081 118071 134408 1316560 1416d0 fs/ocfs2/built-in.o.old Miscellanea: - Move the used-once __mlog_cpu_guess statement expression macro to the masklog.c file above the use in __mlog_printk function - Simplify the mlog macro moving the and/or logic and level code into __mlog_printk [akpm@linux-foundation.org: export __mlog_printk() to other ocfs2 modules] Signed-off-by: Joe Perches <joe@perches.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24configfs: unexport/make static config_item_init()Fabian Frederick2-3/+1
config_item_init() is only used in item.c Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24NTFS: use kvfree() in ntfs_free()Pekka Enberg1-6/+1
Use kvfree() instead of open-coding it. Signed-off-by: Pekka Enberg <penberg@kernel.org> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24fsnotify: remove obsolete documentationNikolay Borisov1-2/+0
should_send_event is no longer part of struct fsnotify_ops, so remove it. Signed-off-by: Nikolay Borisov <kernel@kyup.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24powerpc: use for_each_sg()Akinobu Mita1-5/+5
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since powerpc does select ARCH_HAS_SG_CHAIN, it is necessary to use for_each_sg() in order to loop over each sg element. This also help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24metag: use for_each_sg()Akinobu Mita1-5/+9
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since metag doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: James Hogan <james.hogan@imgtec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-23KVM: s390: clear floating interrupt bitmap and parametersJens Freimann1-0/+3
commit 6d3da24141 ("KVM: s390: deliver floating interrupts in order of priority") introduced a regression for the reset handling. We don't clear the bitmap of pending floating interrupts and interrupt parameters. This could result in stale interrupts even after a reset. Let's fix this by clearing the pending bitmap and the parameters for service and machine check interrupts. Cc: stable@vger.kernel.org # 4.1 Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-23KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRsWei Huang1-42/+9
This patch enables AMD guest VM to access (R/W) PMU related MSRs, which include PERFCTR[0..3] and EVNTSEL[0..3]. Reviewed-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-23KVM: x86/vPMU: Implement AMD vPMU code for KVMWei Huang1-6/+116
This patch replaces the empty AMD vPMU functions (in pmu_amd.c) with real implementation. Reviewed-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-23KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatchWei Huang8-338/+606
This patch defines a new function pointer struct (kvm_pmu_ops) to support vPMU for both Intel and AMD. The functions pointers defined in this new struct will be linked with Intel and AMD functions later. In the meanwhile the struct that maps from event_sel bits to PERF_TYPE_HARDWARE events is renamed and moved from Intel specific code to kvm_host.h as a common struct. Reviewed-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-23KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmcWei Huang1-6/+18
This function will be part of the kvm_pmu_ops interface. Introduce it already. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-23HSI: nokia-modem: use flags argument of devm_gpiod_get to set directionUwe Kleine-König1-5/+2
Since 39b2bbe3d715 (gpio: add flags argument to gpiod_get*() functions) which appeared in v3.17-rc1, the gpiod_get* functions take an additional parameter that allows to specify direction and initial value for output. Use this to simplify the driver. Furthermore this is one caller less that stops us making the flags argument to gpiod_get*() mandatory. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Sebastian Reichel <sre@kernel.org>
2015-06-23HSI: nokia-modem: Reduce missing driver message to debug levelSebastian Reichel1-2/+2
Reduce message priority from dev_err to dev_dbg for missing cmt-speech or ssi-protocol drivers, since they will be probed again and it may result in spamming the boot log. Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Sebastian Reichel <sre@kernel.org>
2015-06-23HSI: cmt_speech: fix timestamp interfaceSebastian Reichel2-7/+18
The user interface for timestamps in the new cmt_speech driver is broken in multiple ways: - The layout is incompatible between 32-bit and 64-bit user space, because of the size differences in 'struct timespec'. This means that the driver can not work when used with 32-bit user space on a 64-bit kernel. - As there are plans to change 32-bit user space to use a 64-bit time_t type in the future, it will also be incompatible with new 32-bit user space. - It is using ktime_get_ts under it's deprecated alias (do_posix_clock_monotonic_gettime). To keep support for the user space tools written for this driver (which have lived many years out-of-tree), the interface has been hardened to unsigned 32-bit values. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sebastian Reichel <sre@kernel.org>
2015-06-22cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle stateShilpasri G Bhat2-0/+23
The idle cpus which stay in snooze for a long period can degrade the perfomance of the sibling cpus. If the cpu stays in snooze for more than target residency of the next available idle state, then exit from snooze. This gives a chance to the cpuidle governor to re-evaluate the last idle state of the cpu to promote it to deeper idle states. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-22x86: Load __USER_DS into DS/ES after resumeIngo Molnar1-2/+4
Srinivas Pandruvada reported a problem with system resume from suspend-to-RAM on 32-bit x86 systems where the DS register of the CPU is set to __KERNEL_DS instead of __USER_DS on return to user space which cases a General Protection Fault to occur. The issue is that DS is set to __KERNEL_DS by the ACPI resume code path while the SYSEXIT path never reloads DS/ES. It assumes they are still __USER_DS set at the SYSENTER time (Brian Gerst), so if the return to user space happens to be through SYSEXIT, it will lead to the reported GPF. Fix the problem by setting the DS and ES registers to __USER_DS as expected by the SYSEXIT path. Link: https://bugzilla.kernel.org/show_bug.cgi?id=61781 Link: http://marc.info/?l=linux-pm&m=143406648920385&w=2 Acked-by: Pavel Machek <pavel@ucw.cz> Tested-by: Pavel Machek <pavel@ucw.cz> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-22PM / OPP: Add binding for 'opp-suspend'Viresh Kumar1-0/+7
On few platforms, for power efficiency, we want the device to be configured for a specific OPP while we put the device in suspend state. Add an optional property in operating-points-v2 bindings for that. Suggested-by: Nishanth Menon <nm@ti.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Nishanth Menon <nm@ti.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-22PM / OPP: Allow multiple OPP tables to be passed via DTViresh Kumar1-0/+60
On some platforms (Like Qualcomm's SoCs), it is not decided until runtime on what OPPs to use. The OPP tables can be fixed at compile time, but which table to use is found out only after reading some efuses (sort of an prom) and knowing characteristics of the SoC. To support such platform we need to pass multiple OPP tables per device and hardware should be able to choose one and only one table out of those. Update operating-points-v2 bindings to support that. Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-22PM / OPP: Add new bindings to address shortcomings of existing bindingsViresh Kumar1-4/+377
Current OPP (Operating performance point) device tree bindings have been insufficient due to the inflexible nature of the original bindings. Over time, we have realized that Operating Performance Point definitions and usage is varied depending on the SoC and a "single size (just frequency, voltage) fits all" model which the original bindings attempted and failed. The proposed next generation of the bindings addresses by providing a expandable binding for OPPs and introduces the following common shortcomings seen with the original bindings: - Getting clock/voltage/current rails sharing information between CPUs. Shared by all cores vs independent clock per core vs shared clock per cluster. - Support for specifying current levels along with voltages. - Support for multiple regulators. - Support for turbo modes. - Other per OPP settings: transition latencies, disabled status, etc.? - Expandability of OPPs in future. This patch introduces new bindings "operating-points-v2" to get these problems solved. Refer to the bindings for more details. We now have multiple versions of OPP binding and only one of them should be used per device. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-22mfd: lpc_ich: Assign subdevice ids automaticallyMika Westerberg1-4/+4
Using -1 as platform device id means that the platform driver core will not assign any id to the device (the device name will not have id at all). This results problems on systems that have multiple PCHs (Platform Controller HUBs) because all of them also include their own copy of LPC device. All the subsequent device creations will fail because there already exists platform device with the same name. Fix this by passing PLATFORM_DEVID_AUTO as platform device id. This makes the platform device core to allocate new ids automatically. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2015-06-22mfd: si476x-i2c: Pass the IRQF_ONESHOT flagFabio Estevam1-1/+2
Since commit 1c6c69525b40eb76de8adf039409722015927dc3 ("genirq: Reject bogus threaded irq requests") threaded IRQs without a primary handler need to be requested with IRQF_ONESHOT, otherwise the request will fail. So pass the IRQF_ONESHOT flag in this case. The semantic patch that makes this change is available in scripts/coccinelle/misc/irqf_oneshot.cocci. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2015-06-22mfd: ab8500-gpadc: Pass the IRQF_ONESHOT flagFabio Estevam1-2/+4
Since commit 1c6c69525b40eb76de8adf039409722015927dc3 ("genirq: Reject bogus threaded irq requests") threaded IRQs without a primary handler need to be requested with IRQF_ONESHOT, otherwise the request will fail. So pass the IRQF_ONESHOT flag in this case. The semantic patch that makes this change is available in scripts/coccinelle/misc/irqf_oneshot.cocci. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>